Comparison of clinician diagnosis of COVID-19 with real time polymerase chain reaction in an adult-representative population in Sweden

Background: Due to the high transmissibility of SARS-CoV-2, accurate diagnosis is essential for effective infection control, but the gold standard, real-time reverse transcriptase-polymerase chain reaction (RT-PCR), is costly and slow, and test capacity has at times been insufficient. We compared the accuracy of clinician diagnosis of COVID-19 against RT-PCR in a general adult population.
Methods: COVID-19 diagnosis data up to 30th September 2021 for participants in an ongoing population-based cohort study of adults in Western Sweden were retrieved from registers, based on positive RT-PCR and clinician diagnosis using recommended ICD-10 codes. We calculated accuracy measures of clinician diagnosis using RT-PCR as reference for all subjects and stratified by age, gender, BMI, and comorbidity collected pre-COVID-19.
Results: Of 42,621 subjects, 3,936 (9.2%) and 5,705 (13.4%) had COVID-19 identified by RT-PCR and clinician diagnosis, respectively. Sensitivity and specificity of clinician diagnosis against RT-PCR were 78% (95%CI 77–80%) and 93% (95%CI 93–93%), respectively. Positive predictive value (PPV) was 54% (95%CI 53–55%), while negative predictive value (NPV) was 98% (95%CI 98–98%) and Youden’s index 71% (95%CI 70–72%). These estimates were similar between men and women, across age groups, BMI categories, and between patients with and without asthma. However, while specificity, NPV, and Youden’s index were similar between patients with and without chronic obstructive pulmonary disease (COPD), sensitivity was slightly higher in patients with (84% [95%CI 74–90%]) than those without (78% [95%CI 77–79%]) COPD.
Conclusions: The accuracy of clinician diagnosis for COVID-19 is adequate, regardless of gender, age, BMI, and asthma, and thus can be used for screening purposes to supplement RT-PCR.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12931-023-02315-7.


Introduction
As COVID-19 continues to spread in waves across the world, rapid and accurate diagnosis is essential to identify, isolate, and appropriately manage patients, thereby decreasing the rate of infectivity, morbidity, and mortality [1]. Robust and rapid diagnosis of COVID-19 also aids in surveillance, management and control of disease, epidemiologic characterization, contact tracing, and decision making for public health purposes [2,3]. However, at the beginning of the pandemic, diagnosis was challenging, primarily because of the disparate symptoms manifested by those infected, ranging from mild or no symptoms to life-threatening presentations [4]. In response, various diagnostic approaches were employed, which are classified based on their underlying indications and principles.
Diagnostic approaches currently being used for COVID-19 can be broadly divided into two basic categories: clinical and in vitro diagnostics [5][6][7]. The clinical diagnostic methods are based on assessment of symptoms, imaging techniques, and laboratory tests. Findings from these methods can be non-specific and insufficient to provide compelling evidence of COVID-19 infection [7]. The in vitro diagnostic methods are commonly divided into: (a) nucleic acid-based assays, in which RNA of the virus causing COVID-19, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is amplified; and (b) serological assays, in which antibodies/antigens specific to SARS-CoV-2 are targeted [8,9]. Real-time reverse transcriptase-polymerase chain reaction (RT-PCR) is one of the most sensitive and widely implemented nucleic acid-based assays [10,11], and is commonly considered to be the "gold standard" for diagnosis of COVID-19 [12,13].
Not all COVID-19 patients or suspected cases end up getting an RT-PCR test; some are instead examined by a clinician and classified using the recommended International Classification of Diseases (ICD) codes for COVID-19. It is unclear to what extent these clinician-based diagnoses correctly identify true COVID-19 cases. Only a few studies have evaluated the accuracy of COVID-19 diagnosis, and the majority of these compared computed tomography with RT-PCR. To our knowledge, the accuracy of the recommended ICD codes has not yet been validated in a population-based setting. Doing so will help to ascertain to what extent they can be used independently of, or as a complement to, RT-PCR for diagnosing COVID-19. The aim of this study was to compare primary and secondary care diagnosis of COVID-19 by a clinician using the recommended ICD codes with RT-PCR in an adult-representative population. Furthermore, in comparing the two diagnostic approaches, we evaluated whether accuracy of diagnosis differed by age, gender, BMI, and pre-COVID-19 obstructive airway diseases and comorbidities.

Methods
This analysis was based on the ongoing West Sweden Asthma Study (WSAS), which is a large population-representative longitudinal cohort study of adults (16-75 years at enrolment) randomly recruited from Västra Götaland county in western Sweden. WSAS comprises 42,621 subjects, of whom 18,087 were recruited in 2008 and 24,534 were recruited in 2016. A flowchart of the study cohort is shown in Fig. 1. All participants had pre-COVID questionnaire data collected in 2008 and/or 2016, covering various demographic, environmental, and socio-demographic data, as well as the presence of obstructive airway diseases/symptoms and comorbidities.
Using the unique personal identification number given to all residents in Sweden, we collected information on COVID-19 diagnosis for all participants in WSAS, both from the register hosting clinician diagnosis in Västra Götaland region (VEGA) and that hosting RT-PCR diagnosis (SmiNet). While VEGA is a regional database system by Region Västra Götaland that collects information on primary and secondary care contacts for western Sweden, SmiNet is a national database for reporting RT-PCR COVID-19 diagnosis run by the Swedish Public Health Agency. From both registers, we collected data on COVID-19 diagnosis up until 30th September 2021. In VEGA, COVID-19 cases were identified based on the recommended ICD-10 codes, including U07.1, U07.2, U08.9, U09.9, and U10.9. Estimates of the accuracy of clinician diagnosis were determined and compared against RT-PCR as the reference standard. The study was approved by the regional ethics board at the University of Gothenburg as well as the national ethics board.
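As an illustration of the case definition above, the following minimal Python sketch flags subjects with at least one of the listed ICD-10 codes. The record layout (keys `person_id` and `icd10`), the normalization of codes stored without a dot, and the toy records are assumptions made for this example; the actual structure of the VEGA register is not described here.

```python
# ICD-10 codes listed in the Methods as identifying a clinician diagnosis of COVID-19.
COVID_ICD10_CODES = {"U07.1", "U07.2", "U08.9", "U09.9", "U10.9"}


def normalize_code(code: str) -> str:
    """Upper-case a code and restore the dot if it was stored as e.g. 'U071'."""
    code = code.strip().upper()
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code


def clinician_diagnosed_ids(records):
    """Return the set of person IDs with at least one COVID-19 ICD-10 code.

    `records` is an iterable of dicts with the hypothetical keys
    'person_id' and 'icd10'; the real register layout is an assumption.
    """
    return {
        rec["person_id"]
        for rec in records
        if normalize_code(rec["icd10"]) in COVID_ICD10_CODES
    }


# Toy care-contact records, invented for illustration only.
toy_records = [
    {"person_id": 1, "icd10": "U07.1"},
    {"person_id": 2, "icd10": "J45.9"},  # asthma, not a COVID-19 code
    {"person_id": 3, "icd10": "u099"},   # stored without dot, lower case
    {"person_id": 1, "icd10": "U08.9"},  # second contact for person 1
]
print(sorted(clinician_diagnosed_ids(toy_records)))
```

Returning a set of person IDs means that a subject with several qualifying care contacts is counted once, which matches a per-subject case definition.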

Statistical analysis
All statistical analyses were carried out using Stata/SE version 17.0 (StataCorp, College Station, Texas, USA). The distribution of pre-COVID-19 demographics by COVID-19 diagnosis and by patterns of COVID-19 diagnosis using clinician assessment and RT-PCR were compared using the Pearson Chi-square test. To assess the performance of clinician diagnosis of COVID-19 against RT-PCR, we calculated sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV), each with its respective 95% confidence interval (95% CI) using Wilson's method without continuity correction [14]. Youden's index (sensitivity + specificity − 1) with 95% CI was obtained using the method based on the empirical proportion estimate proposed by Shan [15]. Youden's index summarizes how well a diagnostic test balances sensitivity and specificity. A value of 50% is commonly used as the cut-off above which a test is considered acceptable for diagnostic use. Accuracy estimates were calculated for the entire study population as well as by age, gender, BMI, and pre-COVID-19 obstructive airway diseases and comorbidities. A significance level of 0.05 was used.
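The interval and index described above can be sketched in a few lines of Python. This is not the Stata code used in the study: the function names are illustrative, the counts in the usage lines are hypothetical, and the confidence interval for Youden's index by Shan's method is not reproduced here.

```python
from math import sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_ci(successes: int, n: int, z: float = Z_95):
    """Wilson score interval for a proportion, without continuity correction."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's index: sensitivity + specificity - 1 (point estimate only)."""
    return sensitivity + specificity - 1


# Hypothetical counts: 80 of 100 reference-positive subjects flagged by the test.
low, high = wilson_ci(80, 100)
print(f"sensitivity 0.80, 95% CI {low:.3f} to {high:.3f}")
print(f"Youden's index at specificity 0.90: {youden_index(0.80, 0.90):.2f}")
```

Unlike the simple Wald interval, the Wilson interval stays within 0 and 1 and behaves well for proportions close to those bounds, which is relevant for estimates such as an NPV near 98%.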

Cohort characteristics
From the total of 42,621 subjects who participated in the WSAS pre-COVID-19 questionnaires, 6,560 COVID-19 cases were identified. Of these, 3,936 were diagnosed with RT-PCR, 5,705 were diagnosed by a clinician, and 3,081 had a COVID-19 infection diagnosed using both methods (Fig. 1). A comparison of pre-COVID-19 demographic factors between COVID-19 cases and non-cases is presented in Table 1. COVID-19 cases were younger than non-cases, but they were comparable regarding gender, smoking habits, BMI, growing up on a farm, and rural residence during childhood. Although COVID-19 cases were slightly more educated than non-cases, the two groups did not differ regarding social class classification. Regarding pre-COVID presence of respiratory diseases, COVID-19 cases and non-cases did not differ in reported clinician-diagnosed chronic obstructive pulmonary disease (COPD), but COVID-19 cases were slightly more likely to report clinician-diagnosed asthma, particularly allergic asthma, than non-cases. COVID-19 cases were also slightly more likely to report any respiratory symptom in the last year than non-cases. Regarding the presence and number of comorbidities, there were no differences between COVID-19 cases and non-cases (Table 1). Table 2 presents the number of COVID-19 cases diagnosed by a clinician, RT-PCR, and different combinations of the two approaches by pre-COVID-19 demographic factors. Being diagnosed by a clinician, regardless of RT-PCR diagnosis, increased with increasing age and increasing number of comorbidities. It was also more common among those with less than high school education than those with higher educational levels, and slightly more common among overweight or obese subjects than those with BMI < 25 kg/m². However, the proportion of clinician-diagnosed COVID-19 was similar between males and females, between smokers and non-smokers, and by social class classification levels (Table 2).
Being diagnosed by RT-PCR, regardless of clinician diagnosis, was more common among females than males and among non-smokers than current or past smokers. On the other hand, RT-PCR diagnosis was less common among those who grew up on a farm than those who did not, among those who had less than high school education than those with higher education levels, among those with two or more comorbidities than those with one or none, among those aged over 60 years than younger subjects, and among those with COPD, asthma, or any respiratory symptom in the last year than those without these respiratory disorders. There was no significant difference in being diagnosed by RT-PCR by BMI, rural residence during childhood, or social class classification (Table 2).
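The counts reported at the start of this section are enough to reconstruct the 2×2 table of clinician diagnosis against RT-PCR. The Python sketch below does so and checks internal consistency; the variable names are illustrative, and the cell values are derived here from the reported totals rather than quoted from a table.

```python
# Counts reported in the text for the whole cohort.
total = 42_621
pcr_positive = 3_936        # diagnosed by RT-PCR
clinician_positive = 5_705  # diagnosed by a clinician
both_positive = 3_081       # diagnosed by both methods
any_positive = 6_560        # COVID-19 cases by either method

# Cells of the 2x2 table, with RT-PCR as the reference standard.
true_pos = both_positive                        # clinician +, RT-PCR +
false_neg = pcr_positive - both_positive        # clinician -, RT-PCR +
false_pos = clinician_positive - both_positive  # clinician +, RT-PCR -
true_neg = total - any_positive                 # clinician -, RT-PCR -

# Consistency checks: the union of both methods and the cohort size.
assert pcr_positive + clinician_positive - both_positive == any_positive
assert true_pos + false_neg + false_pos + true_neg == total

print(f"TP={true_pos}, FN={false_neg}, FP={false_pos}, TN={true_neg}")
```

The first check confirms that the 6,560 cases equal the number diagnosed by RT-PCR plus the number diagnosed by a clinician minus those diagnosed by both.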

Relation of pre-COVID demographic factors to patterns of COVID-19 diagnosis
Being diagnosed by both a clinician and by RT-PCR was more common in those aged 45-60 years than other age groups, among overweight subjects than those with BMI ≥ 30 kg/m² or BMI < 25 kg/m², and among females compared to males. Subjects with COPD, asthma, and those with any respiratory symptom in the last year were less likely to be diagnosed using this approach than those without these respiratory disorders. Similarly, fewer were diagnosed by both a clinician and by RT-PCR with increasing number of comorbidities, as well as among current smokers compared to non-smokers and past smokers. However, it did not differ by level of education, being raised on a farm, or rural residence during childhood (Table 2).
Being diagnosed by a clinician but not confirmed by RT-PCR was more common among those aged 60 years and above compared to the younger age groups. This was also more common in males than females, in current smokers than past smokers or non-smokers, in those who grew up on a farm than those who did not, in those who had less than high school education than those with higher education levels, and in those with COPD, asthma, or any respiratory symptom in the last year than those without these respiratory disorders. Additionally, being diagnosed by a clinician without RT-PCR confirmation increased with increasing number of comorbidities (Table 2). On the other hand, the proportion of subjects diagnosed by RT-PCR but not by a clinician decreased with increasing age. It was lower among those with COPD than those without, among those who grew up on a farm than those who did not, and it decreased with increasing number of comorbidities. In contrast, diagnosis by only RT-PCR was more common among those with higher than lower school education, and among those with BMI < 25 kg/m² compared to overweight and obese subjects. For gender, asthma, and report of any respiratory symptom in the last year, there were no significant differences in being diagnosed by only RT-PCR (Table 2).
Table footnotes: † Following the classification of Standard för svensk yrkesklassificering (SSYK), based on International Standard Classification of Occupation 2008 (ISCO-08). ‡ Within the last year, any of the following: (a) attack of shortness of breath, or waking up with chest tightness, or any wheeze, longstanding cough; (b) dyspnea walking on level ground at normal pace, or recurrent wheezing; or (c) productive cough for periods of ≥ 3 months. § Asthma, COPD, diabetes, eczema, hypertension, rhinitis, sleep disorder. COPD: chronic obstructive pulmonary disease.

Comparison of clinician diagnosis and RT-PCR diagnosis of COVID-19
In all subjects, of those diagnosed using RT-PCR, clinician diagnosis correctly identified 78% as positive, and of those ruled out as negative by RT-PCR, clinician diagnosis correctly ruled out 93%. The validation estimates were as follows: sensitivity 0.78 (95% CI 0.77-0.80), specificity 0.93 (95% CI 0.93-0.93), PPV 0.54 (95% CI 0.53-0.55), NPV 0.98 (95% CI 0.98-0.98), and Youden's index 0.71 (95% CI 0.70-0.72). These estimates did not differ between males and females, but the sensitivity increased with increasing age, ranging from 0.69 (95% CI 0.64-0.70) for those aged ≤ 30 years to 0.85 (95% CI 0.82-0.87) for those aged > 60 years (Table 3). Stratifying the results by BMI, sensitivity was lowest among those with BMI < 25 kg/m² compared to those with higher BMI, but the specificity, PPV, and NPV were similar for all BMI groups (Table 4). While the sensitivity and PPV were higher among those without asthma, the specificity and NPV were similar between those with and without asthma. The specificity and NPV were also similar between those with and without COPD, but the sensitivity was higher among those with than those without COPD, in contrast to the case with asthma, while the PPV was higher among those without than among those with COPD (Table 4). Stratifying the results by the number of comorbidities, the specificity and NPV were similar across groups, but the sensitivity increased while the PPV decreased with increasing number of comorbidities (Table 4). With further division of comorbidities into "severe" (COPD, diabetes), "moderately severe" (asthma, hypertension), and "mild" conditions (eczema, rhinitis, and sleep disorders), different patterns of results were observed (Table 5). While the specificity was similar across groups, the sensitivity was highest and lowest in those with one and two "severe" comorbidities, respectively.
For those with "moderately severe" comorbidities, sensitivity increased with increasing number of comorbidities, but the specificity remained similar across groups. For those with "mild" comorbidities, both the sensitivity and specificity were similar across groups (Table 5).
When we excluded patients who received a clinician diagnosis before RT-PCR diagnosis became commonly used in Sweden (28th October 2020), the above results were overall comparable (Additional file 1: Tables S1-S3).
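The overall point estimates in this section can be checked against the cohort counts reported under Cohort characteristics (3,936 RT-PCR positive, 5,705 clinician positive, 3,081 positive by both, 42,621 subjects in total). The Python sketch below is a rough check only: the function name is illustrative, the 2×2 cells are derived from those counts rather than quoted from a table, and confidence intervals are not recomputed.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates of diagnostic accuracy from the cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "youden": sensitivity + specificity - 1,
    }


# Cells derived from the reported counts, with RT-PCR as reference:
# TP = 3081, FN = 3936 - 3081, FP = 5705 - 3081, TN = 42621 - 6560.
measures = accuracy_measures(tp=3081, fp=2624, fn=855, tn=36061)

# Overall estimates as reported in the text.
reported = {"sensitivity": 0.78, "specificity": 0.93,
            "ppv": 0.54, "npv": 0.98, "youden": 0.71}

for name, value in measures.items():
    print(f"{name}: computed {value:.3f}, reported {reported[name]:.2f}")
```

All five computed values round to the reported two-decimal estimates, which suggests that the overall results follow directly from the cohort counts.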

Summary of key findings
By comparing clinician diagnosis of COVID-19 based on the use of recommended ICD-10 codes with RT-PCR diagnosis, the results from the current study indicate that clinicians were able to correctly classify 78% of true COVID-19 cases (identified by RT-PCR), while 93% of those ruled out by RT-PCR were correctly classified as negative by clinicians. Furthermore, while comparison of clinician diagnosis with RT-PCR did not differ between males and females, there were differences by age, and pre-COVID BMI, COPD, asthma, and presence of comorbidities.

Strengths and limitations of the study
The present study has several strengths. The sample size was relatively large and representative of the general adult population of western Sweden. RT-PCR and clinician diagnosis data were gathered from databases with comprehensive coverage. Furthermore, we included several important background factors that have previously been suggested [1][2][3] to affect the risk of contracting COVID-19. However, some limitations should be considered. The pre-COVID-19 data were collected at least 4 years prior to COVID-19 infection. It is possible that comorbidity, weight, and smoking status changed during this period. Furthermore, data on weight, height, and comorbidities were assessed using a self-administered questionnaire, a data source which can be prone to misinterpretation [4] and inaccurate assessments by respondents. Several potentially important background factors, such as vaccination status, were unavailable and thus not included in the current analyses. The COVID-19 diagnosis data for this study covered the time up until the end of September 2021. Later data, especially for the large but less clinically severe omicron waves in 2022, could potentially have added valuable information on how clinician diagnosis has changed as the disease has changed. On the other hand, full coverage testing for COVID-19 in Sweden diminished during the omicron outbreak. Finally, it is unclear to what extent clinician diagnosis was influenced by the physician knowing the result of the RT-PCR test prior to the clinical diagnosis. We assume that in many cases the positive RT-PCR result was available to the physician (especially if the test was specifically ordered by the doctor), but there may be situations where the clinical diagnosis was set first and the RT-PCR result was obtained only later. In addition, some patients, generally those with mild symptoms, might not have had a clinical appointment despite a positive RT-PCR result. These may be considered potential sources of bias in our study.
In Sweden, RT-PCR is the most common test offered, but antigen tests are also used on a small scale, many of them being self-tests. It is unclear to what extent the antigen test results were available to the physician.

Comparison of findings with previous studies
The use of RT-PCR and a nasopharyngeal swab is the gold standard for detecting SARS-CoV-2 infection, but there have been few studies investigating the accuracy of COVID-19 diagnosis, the majority of which compared the accuracy of computed tomography (CT) with RT-PCR [16][17][18].
In the United States, Blatz et al. [19] conducted a validation study among paediatric inpatients and found that clinician diagnosis through the ICD code U07.1 had a sensitivity of 89.7% for identifying those with RT-PCR confirmed COVID-19 infection, as well as a specificity of 99.9%, PPV of 95.5%, and NPV of 99.7%. The sensitivity of 78% obtained in our study in Sweden is lower than the value obtained by Blatz et al. [19], whereas specificity of clinician diagnosis was similarly high in both countries. The difference in sensitivity between our study and the study conducted in the United States could potentially be attributed to differences in the study population. While our findings were based on a population-based sample of randomly selected adults and clinician diagnosis of COVID-19 from both primary and secondary care, the study by Blatz et al. [19] was a single-centre study of children recruited from an inpatient department, and thus was not population based. Moreover, while Blatz and colleagues used only one of the recommended ICD codes (U07.1) to define clinician diagnosis of COVID-19, the ICD codes used in our study were more comprehensive, including U07.1, U07.2, U08.9, U09.9, and U10.9.
With our data, the PPV of clinician diagnosis of COVID-19 was estimated at 54% (95% CI 53-55). This estimate was substantially lower than that of the study by Bodilsen et al. [20], in which medical records of 710 patients (median age of 61 years) admitted to departments of infectious diseases in Danish hospitals from 27 February to 4 May 2020 with an ICD-10 diagnosis code of COVID-19 were reviewed. They found an overall PPV of 99% (95% CI 99-100) for clinician diagnosis. This remained consistently high across all subgroups, including gender, age groups, calendar period, and when stratified by diagnosis code and department [20]. Since the predictive values of a test or diagnostic tool depend on the prevalence of the disease in the population, this could explain the difference between the PPV found in the Danish study and that found in our study. The Danish study was conducted among mainly older adults (mean age of 61 years) at a time when the COVID-19 pandemic was at its peak in Denmark. The population of WSAS was more encompassing, including adults from at least 20 years and upward, with most subjects within 30-60 years of age. According to some studies, older adults suffer disproportionately from the most severe outcomes of COVID-19 [21]. With age come additional pre-existing conditions, making older adults more vulnerable to developing a severe form of COVID-19 infection and possibly more predisposed to the infection [21].
Table 3 Estimates and 95% confidence interval (95% CI) of sensitivity, specificity, positive predictive value, negative predictive value, and Youden's index for COVID-19 clinical diagnosis against real-time reverse transcriptase-polymerase chain reaction (RT-PCR) in all COVID-19 cases, and by gender and age. NPV: negative predictive value. PPV: positive predictive value. Youden's index: sensitivity + specificity − 1.
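The dependence of PPV on prevalence noted above follows from Bayes' theorem and can be illustrated with a short Python sketch. It holds sensitivity and specificity fixed at this study's overall estimates (0.78 and 0.93); the prevalence values other than 9.2% are hypothetical and are not taken from the Danish study, whose sensitivity and specificity may also have differed.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Positive predictive value implied by Bayes' theorem."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


SENSITIVITY = 0.78  # overall estimate reported in this study
SPECIFICITY = 0.93  # overall estimate reported in this study

# 0.092 is the RT-PCR positive proportion in this cohort (9.2%);
# the remaining prevalence values are hypothetical.
for prevalence in (0.05, 0.092, 0.25, 0.50):
    ppv = ppv_from_prevalence(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV {ppv:.2f}")
```

At the cohort's RT-PCR positive proportion of 9.2%, this gives a PPV of roughly 0.53, close to the observed 54%, and the PPV rises steeply as prevalence increases, as would be expected in a hospitalized population at the peak of an outbreak.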

Interpretation of findings
Our findings indicate that clinician diagnosis of COVID-19 in a general adult population is adequate. Given the high cost, slow test turn-around, and varying test capacity of RT-PCR, diagnosis by a clinician's assessment can be a useful supplement at the population level. While the accuracy of clinician diagnosis did not differ between males and females, age-related differences were observed, particularly between the youngest and oldest age groups. This age difference could be due to older patients having a higher probability of presenting to the hospital following a COVID-19 infection than young adults. In a systematic review by Israfil et al. [22], older COVID-19 patients had faster disease progression, higher risk of severe heart attack, higher ICU admission rate, and higher mortality rate than younger patients; these factors could drive more frequent clinical contacts than among younger adults. In resource-constrained settings where RT-PCR test kits are limited, our findings are reassuring: given the high sensitivity in patients over the age of 60 years, clinician diagnosis can help compensate for the unavailability of RT-PCR tests in this age group. Sensitivity and PPV were higher in those without than in those with asthma, but the specificity and NPV were similar. In contrast, the sensitivity of clinician diagnosis was higher in those with COPD than in those without COPD, while the PPV was higher in those without COPD than in those with COPD, the specificity and NPV being similar. The reasons for the contrasting findings between patients with asthma and those with COPD are unclear, but studies have shown that adults with COPD are more affected by COVID-19 than adults with asthma. Karlson et al. [23] found that severe COVID-19 and mortality were more common among patients with COPD than those with asthma.
The larger proportion of adverse outcomes in COPD patients than asthma patients was attributed to the fact that COPD patients had a higher average age than asthma patients. Additionally, COPD patients are more prone to respiratory infections than asthma patients due to reduced innate and adaptive immune responses [24], which would increase the frequency of clinical contacts among COPD patients compared to asthma patients.
While the specificity remained similar, the sensitivity of clinician diagnosis of COVID-19 increased with increasing number of "severe" comorbidities, but not "moderately severe" or "mild" comorbidities. This is not surprising given that underlying health conditions or comorbidities, like hypertension or diabetes mellitus, have been identified as risk factors for COVID-19 and can facilitate a severe course and rapid progression of the disease [25]. This could explain why clinicians are more likely to diagnose COVID-19 in patients with increased number of comorbidities, particularly those with severe comorbidities. These results indicate that clinician diagnosis of COVID-19 is adequate in patients with underlying health conditions or comorbidities, and thus can be used for screening purposes to supplement RT-PCR in this group of patients.

Clinical implications of findings
The findings of this study indicate that in the general population, clinician diagnosis is adequate and valid for identifying adults with COVID-19, particularly in aged patients and those with underlying health conditions or comorbidities. These results are important and reassuring, particularly in areas where RT-PCR for COVID-19 testing is costly or access is insufficient, such as in low-resource settings. There have been reports of a lack of large-scale COVID-19 testing in many Sub-Saharan African countries, as well as long wait times for RT-PCR tests and long turnaround times due to the high volume of requests, frequent stockouts of reagent and sample collection kits, and power outages [26,27]. In such cases, clinician diagnosis of COVID-19 can supplement RT-PCR for COVID-19 diagnosis, allowing for timely provision of appropriate treatment as well as advice on prevention and isolation strategies to inform disease control response. However, this should be used with caution and should not entirely replace RT-PCR for COVID-19 diagnosis, since some of the symptoms of COVID-19 overlap with those of common infections, such as malaria, common cold, dengue, and pneumonia, making diagnosis difficult without an appropriate diagnostic test [26]. Furthermore, given that our accuracy estimates are based on population-level data, caution should be taken in interpreting the data in the clinical setting, for which clinical studies are imperative. Overall, the estimates of the Youden's index in the total population and by the examined subgroups were generally above 70%, well above the benchmark of 50% for an acceptable diagnostic test. This means that clinician diagnosis of COVID-19 using the recommended ICD codes has an acceptable balance between specificity and sensitivity.

Conclusion
The accuracy of clinician diagnosis for COVID-19 is adequate at the population level for adults, regardless of gender, pre-COVID-19 BMI, and obstructive airway diseases, and thus can be used for screening purposes to supplement RT-PCR, particularly among aged adults and those with increased number of comorbidities. Pre-COVID-19 factors may influence which diagnostic method identifies a COVID-19 case. Such information can be useful for planning future research and screening efforts for COVID-19 or other similar outbreaks.
Additional file 1. Results from sensitivity analyses.