Symptoms Predicting SARS-CoV-2 Test Results in Resident Physicians and Fellows in New York City

Accurate prediction of SARS-CoV-2 infection based on symptoms can be a cost-efficient tool for remote screening in healthcare settings with limited SARS-CoV-2 testing capacity. We used a machine learning approach to determine self-reported symptoms that best predict a positive SARS-CoV-2 test result in physician trainees from a large healthcare system in New York. We used survey data on symptoms history and SARS-CoV-2 testing results collected retrospectively from 328 physician trainees in the Mount Sinai Health System, over the period 1 February 2020 to 31 July 2020. Prospective data on symptoms reported prior to SARS-CoV-2 test results were available from the employee health service COVID-19 registry for 186 trainees and analyzed to confirm absence of recall bias. We estimated the associations between symptoms and IgG antibody and/or reverse transcriptase polymerase chain reaction test results using Bayesian generalized linear mixed effect regression models adjusted for confounders. We identified symptoms predicting a positive SARS-CoV-2 test result using extreme gradient boosting (XGBoost). Cough, chills, fever, fatigue, myalgia, headache, shortness of breath, diarrhea, nausea/vomiting, loss of smell, loss of taste, malaise and runny nose were associated with a positive SARS-CoV-2 test result. Loss of taste, myalgia, loss of smell, cough and fever were identified as key predictors for a positive SARS-CoV-2 test result in the XGBoost model. Inclusion of sociodemographic and occupational risk factors in the model improved prediction only slightly (from AUC = 0.822 to AUC = 0.838). Loss of taste, myalgia, loss of smell, cough and fever are key predictors for symptom-based screening of SARS-CoV-2 infection in healthcare settings with remote screening and/or limited testing capacity.


Introduction
Coronavirus Disease 2019 (COVID-19) was first confirmed in the US by the Centers for Disease Control and Prevention (CDC) on 20 January 2020 [1,2]. New York City was one of the first epicenters in the United States [3], with the first case reported in New York State on 1 March 2020 [4]. Healthcare workers (HCWs) were at high risk for SARS-CoV-2 infection during the earliest surge of the pandemic due to direct exposure to COVID-19 patients, shortages of personal protective equipment and uncertainty about infection control protocols and containment strategies [5][6][7]. According to the CDC, 49,370 (16%) out of 315,531 COVID-19 cases reported in the US between 12 February and 19 April 2020 were HCWs [8].
Accurate prediction of SARS-CoV-2 infection based on symptoms can be a cost-efficient tool for remote screening in healthcare settings with limited SARS-CoV-2 testing capacity. Several studies have been undertaken to identify the combination of symptoms most predictive of COVID-19 infection, to guide precautionary self-isolation measures and to control transmission of SARS-CoV-2 [9][10][11]. Population-based studies have identified loss of taste or smell and fever to be strongly associated with SARS-CoV-2 infection among other reported symptoms [9,11,12]. However, a meta-analysis of 28 studies in 119,883 HCWs who tested positive for SARS-CoV-2 infection found fever to be the most frequently reported symptom (27.5%), followed by cough (26.1%) and fatigue (23.4%), with substantial heterogeneity across studies conducted in China, the USA, the Netherlands, Germany and Spain [13]. Previous studies have included physicians, nurses, laboratory technicians and dentists, among other HCWs [8,13,14], but none has focused on physician trainees, who are a relatively younger and healthier subgroup of the HCW population. COVID-19 infection in HCWs leads to personnel shortages due to sick leave and isolation during the quarantine period and recovery [15], which can hamper the quality of healthcare provided [11,15]. Early detection of symptoms combined with rapid testing is a critical screening strategy to control COVID-19 transmission [16]. Further research can contribute to optimizing symptom-based screening among HCW subgroups for the timely diagnosis of SARS-CoV-2 infection and the implementation of containment strategies to prevent further transmission among HCWs and the immediate community. This knowledge can enable low-resource healthcare systems to initiate containment protocols effectively and to reduce the COVID-19 burden at a larger scale [11,15].
We therefore used a machine learning approach to investigate symptoms of SARS-CoV-2 infection that best predict IgG antibody and/or reverse transcriptase polymerase chain reaction test results in physician trainees from a large healthcare system in New York City. We further examined whether prediction is more accurate when combining information about reported symptoms with other risk factors for SARS-CoV-2 infection previously identified in physician trainees, including sociodemographic and occupational risk factors [5,14]. This study advances existing knowledge about symptom-based screening of COVID-19 infection as a useful, cost-efficient, remote screening tool in healthcare settings with limited SARS-CoV-2 testing capacity [9,17,18,19].

Study Design and Population
We conducted a retrospective cohort study of 328 physician trainees (residents and fellows) of Mount Sinai Health System (MSHS), which comprises eight hospitals in New York City and Long Island, NY. All active residents and clinical fellows from 1 January 2020 to 31 June 2020 (n = 2543) were eligible for this study. Eligible trainees were invited through email, text messages and phone calls to complete an online survey that collected information about sociodemographic, occupational and community factors related to SARS-CoV-2 infection, medical history and SARS-CoV-2 test results, as detailed previously [5]. Self-reported SARS-CoV-2 test results and prospective data on symptoms reported prior to SARS-CoV-2 testing were extracted from Mount Sinai's COVID-19 Employee Health Services (EHS) Registry. A total of 391 physician trainees responded to the survey invitation, of whom 328 had undergone at least one SARS-CoV-2 test at the time of survey completion and were included in the present study. Of those, 186 participants also had longitudinal data on symptomatology preceding the laboratory-confirmed SARS-CoV-2 tests available from the COVID-19 EHS registry. The study protocol was approved by the Institutional Review Board at Icahn School of Medicine at Mount Sinai. Written informed electronic consent was obtained from all study participants.

Mount Sinai Employee COVID-19 Testing and Assessment of SARS-CoV-2 Infection
On 6 March 2020, Mount Sinai's EHS established an online registry for employees to voluntarily report high-risk exposures and daily symptoms of COVID-19. RT-PCR swabs and IgG antibody testing were available at no cost to all symptomatic employees on 7 April 2020 and to asymptomatic employees by 6 May 2020. Sensitivity and specificity of the Mount Sinai Hospital Clinical Laboratory COVID-19 ELISA antibody test were 92.5% (95% CI: 80.1-97.4%) and 100% (95% CI: 95.1-100%), respectively [5,20]. The sensitivity and specificity of the Roche Cobas RT-PCR test offered were 100% [5,21]. SARS-CoV-2 infection status was assessed by the type of test (RT-PCR, IgG antibody test or both) and whether the results were positive or negative. Among the subset of 186 study participants who had prospective data recorded in the COVID-19 EHS registry, there was 100% agreement between the SARS-CoV-2 test results reported by the laboratory and the self-reported SARS-CoV-2 test results collected from participants during the survey [5].
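To make the reported sensitivity and specificity intervals concrete, the following is a minimal Python sketch of a Wilson score interval for a binomial proportion. The source does not state which interval method the laboratory used, nor the underlying counts, so both the choice of the Wilson method and the counts used in the example (37 of 40 for sensitivity, 75 of 75 for specificity) are assumptions made purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Illustrative only: the interval method used for the reported
    test characteristics is not stated in the source.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts, NOT taken from the source: 37 true positives out of
# 40 infected samples gives a point estimate of 92.5%.
sens_low, sens_high = wilson_interval(37, 40)
# Hypothetical counts: 75 true negatives out of 75 uninfected samples.
spec_low, spec_high = wilson_interval(75, 75)
print(f"sensitivity 95% CI: {sens_low:.3f} to {sens_high:.3f}")
print(f"specificity 95% CI: {spec_low:.3f} to {spec_high:.3f}")
```

With these hypothetical counts the intervals come out close to the reported ones, which shows only that the reported figures are compatible with a Wilson interval, not that this is how they were derived.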

Assessment of Symptoms
Participants were asked via survey to report the months over the study period in which they experienced cough, chills, fever, fatigue, myalgia, headache, shortness of breath, sore throat, diarrhea, nausea/vomiting, loss of sense of smell, loss of sense of taste, malaise and runny nose. Self-reported information was collected on the presence or absence of each symptom every month from February 2020 through June 2020. These retrospective survey data were further matched with the symptom information recorded by the COVID-19 EHS registry in real time for 186 participants, before the participants underwent a laboratory-confirmed SARS-CoV-2 test. Symptoms that were assessed in the prospective EHS registry were fever or chills, new onset persistent cough, shortness of breath, fatigue, muscle or body aches, headache, new loss of taste or smell, sore throat, new onset runny nose or nasal congestion not related to allergic rhinitis, nausea or vomiting and diarrhea. Agreement between retrospectively and prospectively collected data was 100% for all reported symptoms in the subset of 186 participants.

Assessment of Sociodemographic and Occupational Factors
The survey collected additional information regarding sociodemographic (sex, age, race) and occupational factors hypothesized to be associated with SARS-CoV-2 infection, as detailed previously [5]. Among the wide list of occupational factors examined, deployment to care for unfamiliar patient populations during the COVID-19 patient surge, assignment to in-patient medical-surgical units and training in high-risk procedural specialties were previously associated with increased odds of SARS-CoV-2 infection in this study population [5] and were, therefore, accounted for in the present analysis.

Statistical Analysis
All main analyses were performed on the whole data set of 328 participants who self-reported in survey responses undergoing at least one type of SARS-CoV-2 test over the study period. A schematic diagram of the analyses performed is shown in Supplementary Figure S1. Differences in symptoms and sociodemographic and occupational factors between SARS-CoV-2 test result groups were examined using Fisher's exact test for categorical variables and a Wilcoxon rank-sum test for continuous variables [5]. The odds ratios (95% CI) for the associations between each symptom and SARS-CoV-2 test result were estimated using Bayesian generalized linear mixed effect regression (BGlmer). Prediction analyses were performed using an extreme gradient boosting (XGBoost) model [22] that was trained exclusively on the symptoms experienced during the first wave. Percentage Shapley additive explanations (SHAP) [23] scores were used to show the contribution of each component in the prediction model. Two XGBoost models were examined and their accuracy was compared: (1) including only symptoms as predictors of SARS-CoV-2 test results, and (2) including symptoms and additionally sex, age, race and occupational risk factors that were associated with SARS-CoV-2 test results in previous analyses [5].
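As an illustration of the group comparison and odds ratio steps described above, the following minimal Python sketch computes a two-sided Fisher's exact test and a crude odds ratio for a single 2 × 2 table of symptom by test result. It is a sketch under stated assumptions, not the authors' code: the analyses in this study were run in SAS and R, the crude odds ratio with a Woolf (log-scale Wald) interval is a simpler stand-in for the adjusted BGlmer estimates, and the function names and example counts are hypothetical.

```python
from math import comb, exp, log, sqrt
from statistics import NormalDist


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(
        prob(x) for x in range(x_min, x_max + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


def odds_ratio_wald_ci(a, b, c, d, confidence=0.95):
    """Crude odds ratio with a Woolf (log-scale Wald) confidence interval."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, exp(log(estimate) - z * se), exp(log(estimate) + z * se)


# Hypothetical counts, NOT from the study:
# rows = symptom present / absent, columns = test positive / negative.
table = (20, 15, 30, 120)
p_value = fisher_exact_two_sided(*table)
or_est, ci_low, ci_high = odds_ratio_wald_ci(*table)
print(f"Fisher p = {p_value:.4g}, OR = {or_est:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

An adjusted model such as BGlmer additionally conditions on covariates and places priors on the coefficients, so its estimates will generally differ from this crude calculation.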
Sensitivity analysis was conducted on the data set of 186 participants who had prospective data on symptoms reported prior to SARS-CoV-2 testing and laboratory-confirmed SARS-CoV-2 test results through the EHS COVID-19 registry. Both the XGBoost model including only symptoms and the XGBoost model including symptoms and other risk factors were run for this subset.
For all statistical analyses, p-values were two-sided and the level of statistical significance was set at 0.05. All analyses were conducted using SAS version 9.4 (SAS Institute, Cary, North Carolina) or R version 4.1.0. The few missing covariate values were imputed using random forests with the "mice" R package. The prediction analysis was conducted using the XGBoost R package.
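The comparison between the symptoms-only model and the model with additional risk factors is summarized in this paper by the area under the ROC curve (AUC). As a minimal Python sketch of how such a comparison can be computed, the function below evaluates AUC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, counting ties as one half. The function name, labels and scores are hypothetical and are not study data; the authors' own computation was carried out in R.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 test results (1 = positive).
    scores: predicted probabilities or scores, one per label.
    """
    labels, scores = list(labels), list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # A positive outranking a negative counts 1; a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical test results and predicted scores from two hypothetical models.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
symptoms_only = [0.90, 0.60, 0.30, 0.40, 0.20, 0.35, 0.10, 0.05]
symptoms_plus_risk = [0.92, 0.65, 0.45, 0.40, 0.15, 0.30, 0.10, 0.05]
print("symptoms only:", roc_auc(y_true, symptoms_only))
print("symptoms + risk factors:", roc_auc(y_true, symptoms_plus_risk))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of a few hundred participants; a rank-based formulation gives the same value more efficiently on larger data sets.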

Symptoms Associated with SARS-CoV-2 Test Result
Associations between symptoms and SARS-CoV-2 test result did not substantially differ between the crude and multivariable-adjusted BGlmer regression models (Table 2). After adjustment for age, sex, race, change in usual patient population, medical-surgical unit and training specialty, 13 out of 14 symptoms were significantly associated with a positive SARS-CoV-2 test result. The strongest associations were observed for loss of taste (adjusted OR 9.77, 95% CI 9.68-9.87), loss of smell (adjusted OR 9.18, 95% CI 9.11-9.25) and fever (adjusted OR 9.17, 95% CI 2.20-38.3). Other symptoms associated with a positive SARS-CoV-2 test result were cough, chills, fatigue, myalgia, malaise and shortness of breath. The association between pharyngitis or sore throat and SARS-CoV-2 test result was not significant (OR 1.39, 95% CI 0.78-2.48) (Table 2).

Symptoms Predicting a Positive SARS-CoV-2 Test Result in the XGBoost Model
In the prediction model including only symptoms, loss of sense of taste, myalgia and loss of sense of smell were the top three predictors of a positive SARS-CoV-2 test result. Other predictors which ranked high were cough and fever (Figure 1).

Sensitivity Analysis
Restricted analysis of the 186 study participants with prospective EHS COVID-19 registry data showed similar results with loss of smell and myalgia among the top predictors for positive SARS-CoV-2 test results (Supplementary Figure S3). The statistics of the prediction models for EHS data can be found in Supplementary Table S1. The prediction model that included symptoms along with other risk factors had slightly better performance using the EHS data set, as was also observed in the main analyses of 328 participants (Supplementary Figure S4).

Discussion
In this study of residents and fellows from a large healthcare center in New York City, we found that self-reported symptoms-based screening alone can accurately predict a positive SARS-CoV-2 test result. Among a wide list of symptoms associated with SARS-CoV-2 infection examined, loss of smell, myalgia, loss of taste, cough and fever were found to be top predictors of a positive SARS-CoV-2 test result. Inclusion in the prediction models of sociodemographic (sex, age, race) and occupational risk factors previously shown to increase risk of SARS-CoV-2 infection did not substantially change results, but slightly increased prediction accuracy, suggesting that the combination of symptoms with other potentially known risk factors could further optimize screening of SARS-CoV-2 infection of physician trainees in healthcare settings with remote screening and/or limited testing capacity.
A previous population-based prospective cohort study in Spain using a machine learning approach noted olfactory dysfunction, gustatory dysfunction, fever, dry cough and asthenia (weakness) to be strong predictors of a positive SARS-CoV-2 RT-PCR result, but found no association between dyspnea, rhinorrhea or sore throat and a positive test result [24]. Another study analyzed about 42 prospective SARS-CoV-2 studies and also demonstrated that anosmia, ageusia, fatigue, fever and cough were associated with higher odds for SARS-CoV-2 infection [25]. Moreover, they noted that combining symptoms with other sociodemographic (age, gender, etc.) or community risk factors (e.g., travel history) may slightly improve the sensitivity of the prediction model [25]. In our study of young HCWs we observed similar findings, except that shortness of breath (dyspnea) and runny nose (rhinorrhea) were also significantly associated with SARS-CoV-2 infection. A meta-analysis of HCW studies also found the occurrence of lack of smell, fever and myalgia to be associated with higher odds of SARS-CoV-2 infection in symptomatic patients, and no significant association for fatigue and sore throat [26]. However, our results demonstrated an association between fatigue and SARS-CoV-2 infection in addition to lack of smell, fever and myalgia. We did not find sore throat or pharyngitis to be associated with SARS-CoV-2 infection, which is in agreement with prior evidence [24][25][26][27]. This previous meta-analysis only analyzed the abovementioned five symptoms in association with SARS-CoV-2 infection due to limited data available on symptoms reported in previous studies [26]. One previous study in the UK and USA of 18,401 participants that used smartphone-based apps for symptoms screening also found loss of smell, loss of taste, high temperature, persistent cough and loss of appetite as the top predictors of SARS-CoV-2 infection [9].
In our study, we did not assess loss of appetite, but we identified loss of smell, loss of taste, fever and cough as top predictors of a positive SARS-CoV-2 test in physician trainees. Additionally, results from a few other recent symptom-based COVID-19 screening studies further support our findings that loss of smell, loss of taste, fever, cough and myalgia are important predictors of SARS-CoV-2 infection [11,24,27].
Findings from our study remained robust in sensitivity analyses of a subset of 186 trainees with prospective, real-time data on symptoms reported prior to laboratory-confirmed SARS-CoV-2 test results available from the EHS COVID-19 registry data and, therefore, reverse causation bias is unlikely. Furthermore, we found perfect agreement (100%) between self-reports of SARS-CoV-2 test results and laboratory-confirmed SARS-CoV-2 test results in physician trainees with no evidence of recall bias in survey responses during the study period. Our study sample had a similar age range and race and specialty distributions compared to the total population of eligible residents and fellows for the present study, and therefore results should be more broadly representative of the origin cohort of trainees [5]. Study limitations include the lack of data on loss of appetite previously reported as a potentially important predictor of SARS-CoV-2 infection [9]. We further assessed symptoms predicting a positive SARS-CoV-2 test result during the first COVID-19 wave and prior to vaccination campaigns. Other factors related to SARS-CoV-2 infection such as specific variants or vaccination status might impact the prediction of a positive SARS-CoV-2 test result. Further research is needed to validate our findings in recent waves with new SARS-CoV-2 variants and after vaccination.

Conclusions
Our study focused specifically on a young and generally healthy group of residents and fellows of a large healthcare system and found that loss of smell, myalgia, loss of taste, cough and fever can serve as important predictors for symptom-based screening of SARS-CoV-2 infection in healthcare settings with limited testing capacity. Moreover, the predictive value of this method can further be enhanced by inclusion of other risk factors, such as sociodemographic and occupational risk factors known to be associated with SARS-CoV-2 infection risk in HCW populations. These findings can be helpful in certain health centers with remote screening and testing shortages, whenever IgG antibody or a reverse transcriptase polymerase chain reaction test is not available.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/covid3050049/s1, Figure S1: Flow chart of all the main analyses performed; Figure S2: Importance of each predictor's contribution using percentage (SHAP) scores in the prediction model among whole data set (n = 328), including symptoms and other covariates; Figure S3: Comparing the importance of each predictor's contribution using percentage (SHAP) scores in the two prediction models among EHS data set; Figure S4: ROC comparison of prediction models using different predictors and different data set; Table S1: Comparison of statistics in three prediction models.

Data Availability Statement:
The data presented in this study are available by request from the corresponding author. The data are not publicly available due to privacy restrictions.