Summary of Studies
The systematic search yielded 40,595 records. After duplicates were removed, 20,431 articles remained for title and abstract screening. Of these, 807 were considered eligible for full-text screening, and 114 were finally included. Three additional articles were included after screening the articles listed on the FIND website. Thus, a total of 117 articles, comprising 159 data sets and reporting on 24 unique iAg tests, were included in the review (Fig. 1).
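The screening counts can be reconciled with simple arithmetic. The sketch below derives the number of records removed at each step from the totals reported above. The per-step exclusion counts are computed here rather than quoted from the review, and the variable names are illustrative.

```python
# Screening-flow tally built only from the record counts reported in the text.
identified = 40_595      # records from the systematic search
after_dedup = 20_431     # records left for title/abstract screening
full_text = 807          # records eligible for full-text screening
included_search = 114    # records included after full-text screening
included_find = 3        # additional articles from the FIND website

# Derived (not reported) counts for each step of the flow
flow = {
    "duplicates removed": identified - after_dedup,
    "excluded at title/abstract": after_dedup - full_text,
    "excluded at full text": full_text - included_search,
    "total included": included_search + included_find,
}

for step, n in flow.items():
    print(f"{step}: {n:,}")
```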
Study Description
Of the 117 studies included in the review, 109 were conducted in high-income countries (HICs) and only seven in low- and middle-income countries (LMICs) (25–31). Two were multicenter studies, conducted in the USA and India and in the UK and the USA, respectively (32, 33).
A case-control design was used in 32 of the studies (27.4%) (34–65), while the remaining 85 (72.6%) were cohort studies. RT-PCR was the reference method in all but one study, which used viral culture (66). Of the 159 data sets, 44 (27.7%) reported on adult populations, 7 (4.4%) on children, and 32 (20.1%) on mixed populations; in the remaining 76 data sets (47.8%), the age group of the target population was not reported. Across all studies, the main reasons for testing were screening regardless of symptom status (70/159 data sets, 44.0%), contact investigations (67/159 data sets, 42.1%), and/or presence of symptoms (117/159 data sets, 73.6%). In 36 data sets (22.6%), the reasons for testing were not reported by the authors.
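All percentages in this paragraph use the 159 data sets as the denominator. A single data set could report several reasons for testing, so the reason-specific counts are not mutually exclusive and their sum exceeds 159. The minimal check below uses only the counts reported above. It assumes that the 36 data sets without a reported reason belong to no other category.

```python
TOTAL_DATA_SETS = 159

# Data sets per reason for testing; categories overlap ("and/or" in the text)
reasons = {
    "screening regardless of symptoms": 70,
    "contact investigation": 67,
    "presence of symptoms": 117,
    "not reported": 36,
}

# Share of all data sets, rounded to one decimal as in the text
shares = {reason: round(100 * n / TOTAL_DATA_SETS, 1) for reason, n in reasons.items()}
print(shares)

# Assumption: "not reported" data sets carry no other reason
with_reason = TOTAL_DATA_SETS - reasons["not reported"]
assignments = sum(n for r, n in reasons.items() if r != "not reported")
print(f"{assignments} reason assignments across {with_reason} data sets")
```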
The most common specimen used for iAg testing was nasopharyngeal (NP; 107 data sets, 67.3%). Other data sets used combined anterior nasal/mid-turbinate (AN/MT) specimens (35 data sets, 22.0%), saliva (3 data sets, 1.9%), or oropharyngeal (OP; 1 data set, 0.6%) specimens. The specimen type was unclear in seven studies (13 data sets, 8.2%). Two studies pooled nasopharyngeal samples from multiple patients for testing (52, 53; see also Supplementary File S2).
Of the 24 unique iAg tests evaluated across all studies, 15 were suitable for POC use and nine were lab-based immunoassays. Measured by the number of tests performed, the most frequently used iAg test was the Sofia SARS Antigen FIA test by Quidel (US; henceforth called Sofia), with 22 data sets (13.8%) and 20,970 tests (21.8%). The STANDARD F COVID-19 Ag FIA (SD Biosensor Inc., South Korea; henceforth called STANDARD F) was assessed in 18 data sets (11.3%) with 19,617 tests (20.4%), and the BD Veritor System for Rapid Detection of SARS-CoV-2 (Becton, Dickinson and Company [BD], MD, US; henceforth called BD Veritor) in 17 data sets (10.7%) with 11,878 tests (12.4%). These were followed by the LumiraDx SARS-CoV-2 Ag test (LumiraDx UK Ltd., UK; henceforth called LumiraDx), which was assessed in more data sets than any of the three (24; 15.1%) but with fewer tests (10,136; 10.5%). Additional details on each of the iAg tests included in the review are provided in the supplements (Table S1 and File S1).
Methodological Quality of Included Studies
The included studies had variable risk of bias but high applicability (Fig. 2). Only 37 data sets (23.3%) had a low risk of bias for patient selection, because they enrolled a representative study population and avoided inappropriate exclusions and a case-control design. Because most studies were carried out in routine practice settings, applicability of the study population to the review question in terms of patient selection was high in 145 data sets (91.2%) and unclear in the remaining 14 (8.8%).
The risk of bias in the interpretation of the index test was low for 59 (37.1%) data sets, because the index test was interpreted without knowledge of the reference standard results; however, the majority of data sets (n = 96; 60.4%) did not report whether the index test results were interpreted blinded. A predefined threshold was used in most data sets (n = 138; 86.8%), and the tests were conducted in accordance with the instructions for use (IFU) in 120 data sets (75.5%). Index test applicability was judged to be of low concern in these 120 data sets, which explicitly mentioned IFU compliance, and of high concern in the remaining 39 (24.5%).
In 104 data sets (65.4%), the selection, conduct, or interpretation of the reference standard was insufficiently described, resulting in an unclear risk of bias, primarily because it was not reported whether the reference standard results were interpreted blinded to the index test results. The risk of bias in this domain was low for the remaining 55 data sets (34.6%), because the reference standard was administered before the iAg test and/or the operator administering the reference standard was blinded to the iAg test results. Concern about the applicability of the reference standard was low for all data sets, because the target condition for this review was defined by RT-PCR or viral culture.
Index and reference testing used simultaneously collected samples in 140 (88.1%) of the data sets. A single assay was consistently used as the reference in 100 (62.9%) data sets, whereas multiple RT-PCR assays were used as the reference in 43 (27.0%) (specified in S1). Taking these factors into account, together with whether all patients were included in the analysis, the risk of bias related to flow and timing was judged to be low in 54.7% of the data sets, intermediate in 27.0%, high in 5.7%, and unclear in 12.6%.
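Domain-level summaries such as those in Fig. 2 come from tallying per-data-set judgments within each risk-of-bias domain. The minimal sketch below uses hypothetical judgments for the flow-and-timing domain and the four categories named in the text. The function name `summarize` and the input list are illustrative and not taken from the review.

```python
from collections import Counter

# Categories used in the text for the flow-and-timing domain
CATEGORIES = ("low", "intermediate", "high", "unclear")


def summarize(judgments):
    """Return the percentage of data sets falling in each risk-of-bias category."""
    counts = Counter(judgments)  # missing categories count as 0
    n = len(judgments)
    return {level: round(100 * counts[level] / n, 1) for level in CATEGORIES}


# Hypothetical flow-and-timing judgments for eight data sets (illustrative only)
flow_and_timing = ["low", "low", "unclear", "high", "low", "intermediate", "low", "unclear"]
print(summarize(flow_and_timing))
```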
The test manufacturers provided financial support for 41 (35.0%) of the studies. They also coauthored 15 of these and 2 additional studies, that is, 17 studies (14.5% of all studies). Moreover, a conflict of interest due to funding from or employment with the test manufacturer was disclosed in 34 studies (29.1%) (File S3).
Analysis of small-study effects, which may indicate publication bias, yielded no significant evidence of such effects (p = 0.39) (Figure S1).
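The text does not name the test used for small-study effects. A common choice in diagnostic accuracy reviews is Deeks' funnel plot asymmetry test. It regresses the log diagnostic odds ratio (DOR) on 1/sqrt(effective sample size, ESS), weighting each data set by its ESS, and a slope that differs from zero suggests asymmetry. The pure-Python sketch below assumes that test. It adds a 0.5 continuity correction to every cell and uses hypothetical 2x2 tables. It returns the slope and its t statistic but does not compute a p-value.

```python
import math


def log_dor(tp, fp, fn, tn, cc=0.5):
    """Log diagnostic odds ratio with a continuity correction added to every cell."""
    return math.log(((tp + cc) * (tn + cc)) / ((fp + cc) * (fn + cc)))


def effective_sample_size(n_diseased, n_non_diseased):
    """Effective sample size 4*n1*n2 / (n1 + n2), as used in Deeks' test."""
    return 4 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)


def wls(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b, se_b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ym - b * xm
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    # Residual variance is estimated from the data (k - 2 degrees of freedom)
    se_b = math.sqrt(rss / (len(x) - 2) / sxx)
    return a, b, se_b


def deeks_test(tables):
    """tables: (tp, fp, fn, tn) per data set, at least three.

    Returns the slope of log DOR on 1/sqrt(ESS) and its t statistic,
    to be compared with a t distribution on len(tables) - 2 df.
    """
    y = [log_dor(*t) for t in tables]
    ess = [effective_sample_size(tp + fn, fp + tn) for tp, fp, fn, tn in tables]
    x = [1 / math.sqrt(e) for e in ess]
    _, slope, se = wls(x, y, ess)
    return slope, slope / se


# Hypothetical 2x2 tables (tp, fp, fn, tn), one per data set
tables = [(45, 2, 15, 138), (30, 1, 10, 59), (80, 5, 20, 295), (12, 1, 6, 81)]
slope, t_stat = deeks_test(tables)
print(f"slope = {slope:.2f}, t = {t_stat:.2f}")
```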
Performance of iAg tests in comparison to RT-PCR and/or viral culture
The pooled estimates of sensitivity and specificity for all iAg tests were 76.0% (95% CI 72.7 to 79.0) and 98.5% (95% CI 98.1 to 98.8), respectively, based on the bivariate analysis of 127 data sets from 99 studies that evaluated 83,993 tests (Fig. 3A). This sensitivity estimate was slightly higher than the pooled sensitivity of 74.6% (95% CI 71.7 to 77.6) obtained from the univariate analysis of 144 data sets (Fig. 3B). The point estimate of pooled specificity was the same in the univariate analysis of 133 data sets (98.5%; 95% CI 98.0 to 98.9) (Fig. 3C).
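The univariate estimates (Fig. 3B and 3C) pool one proportion at a time. The minimal sketch below shows one common approach: DerSimonian-Laird random-effects pooling on the logit scale. The text does not state that this exact estimator was used. The 0.5 continuity correction applied to every data set is a simplification, and the counts are hypothetical. The bivariate model (Fig. 3A) jointly models logit sensitivity and logit specificity with correlated random effects and is not shown here.

```python
import math


def expit(t):
    """Inverse of the logit transformation."""
    return 1 / (1 + math.exp(-t))


def pool_proportions_dl(events, totals, z=1.96):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events, totals: per-data-set counts (e.g. antigen-positive and RT-PCR-positive
    samples for sensitivity); at least two data sets are required.
    Returns (pooled proportion, lower 95% bound, upper 95% bound).
    """
    y, v = [], []
    for x, n in zip(events, totals):
        a, b = x + 0.5, n - x + 0.5       # continuity-corrected cell counts
        y.append(math.log(a / b))          # logit of the corrected proportion
        v.append(1 / a + 1 / b)            # within-data-set variance of the logit
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)                    # between-data-set variance
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(mu), expit(mu - z * se), expit(mu + z * se)


tp = [45, 30, 80, 12]          # hypothetical antigen-positive counts
pcr_pos = [60, 40, 100, 18]    # hypothetical RT-PCR-positive counts
est, lo, hi = pool_proportions_dl(tp, pcr_pos)
print(f"pooled sensitivity {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The logit scale keeps the confidence limits inside 0 to 1, which matters for specificities close to 100%.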
Lumipulse G had the highest pooled sensitivity (86.5% [95% CI 79.9 to 91.2]) but the lowest pooled specificity (96.4% [95% CI 94.2 to 97.8]) among the eight tests that were eligible for test-specific meta-analysis (Fig. 4A). LIAISON had the lowest pooled sensitivity (62.5% [95% CI 47.1 to 75.8]). VITROS had the highest pooled specificity at 99.7% (95% CI 99.1 to 99.9). The POC-applicable digital immunoassay BD Veritor had a pooled sensitivity of 73.9% (95% CI 63.2 to 82.3) and a pooled specificity of 99.4% (95% CI 98.9 to 99.7). Among the fluorescence immunoassays (FIAs) with sufficient numbers of data sets (> 4), LumiraDx had the highest pooled sensitivity at 81.1% (95% CI 73.2 to 87.0) but the lowest specificity at 97.3% (95% CI 95.7 to 98.3).
The pooled sensitivity and specificity for IFU-conforming data sets (n = 95) were estimated to be 75.8% (95% CI 71.9 to 79.4) and 98.5% (95% CI 98.1 to 98.9), respectively (Fig. 4B). The pooled performance for data sets without reported IFU conformity showed slightly higher sensitivity (76.5%; 95% CI 70.0 to 82.0) and similar specificity (98.4%; 95% CI 97.4 to 99.0).
The highest pooled sensitivity, 78.2% (95% CI 74.7 to 85.5), was observed when wild-type SARS-CoV-2 was predominant (64 data sets, 50.4%) (Fig. 4C). The lowest pooled sensitivity, 54.8% (95% CI 37.3 to 71.2), was observed across studies conducted during the wave of the SARS-CoV-2 Alpha variant (11 data sets, 8.7%). Based on only six data sets, the pooled sensitivity during the Delta wave was intermediate, at 74.5% (95% CI 48.8 to 90), while the pooled specificity was the highest of the variant subgroups (99.2%; 95% CI 96.6 to 99.8). Only two studies were conducted during the Omicron wave (2 data sets, 1.6%), with sensitivities of 76.5% and 88.5% (63, 67).
When pooled accuracy was analyzed by intended setting, tests intended for laboratory use had a pooled sensitivity of 75.9% (95% CI 69.9 to 80.9), similar to that of POC tests (76.1%; 95% CI 72.1 to 79.7), and almost identical specificity (Fig. 4D).
When only NP samples (88 data sets) were considered, the pooled sensitivity and specificity were estimated to be 76.5% (95% CI 73.0 to 79.7) and 98.4% (95% CI 97.8 to 98.8), respectively (Fig. 4E). Analysis of combined AN/MT samples yielded a pooled sensitivity of 80.0%, with a wider confidence interval than for NP samples (95% CI 73.5 to 85.2), and a pooled specificity of 98.5% (95% CI 97.7 to 99.0).
Subgroup Analyses
By age
Thirty data sets with 14,451 samples from adults (age ≥ 18 years) were available for meta-analysis, yielding a pooled sensitivity and specificity of 72.9% (95% CI 63.2 to 80.9) and 98.8% (95% CI 98.0 to 99.3), respectively (Fig. 5A). Only five data sets with 1,655 samples were available for the pediatric group (age < 18 years). Compared with adults, this group showed a higher pooled sensitivity (81.9%, 95% CI 63.5 to 92.2) and a comparable pooled specificity (98.3%, 95% CI 95.9 to 99.3).
By presence of symptoms
The pooled sensitivity in the asymptomatic group (50.3%; 95% CI 33.5 to 67.0) was substantially lower than in the symptomatic group (79.9%; 95% CI 76.5 to 83.0) (Fig. 5B). Both subgroups had comparably high specificity.
By duration of symptoms
Data were available from 1,724 people tested within 7 days of symptom onset, compared with a very small number of patients (177) tested ≥ 7 days after symptom onset (Fig. 5B). The pooled sensitivity was 84.6% (95% CI 78.2 to 89.3) for people tested within 7 days of symptom onset but much lower, at only 57.8% (95% CI 48.5 to 66.6), for those tested ≥ 7 days after onset. The pooled specificity estimates were 98.4% (95% CI 97.3 to 99.1) in the < 7 days group and 97.0% (95% CI 86.2 to 99.4) in the ≥ 7 days group.
By Ct values
Fifty-five studies (255 data sets) reported performance by Ct value group, which allowed a univariate meta-analysis. This analysis showed that higher Ct values were associated with lower pooled sensitivity (Fig. 5C). For the Ct value groups < 20 and ≥ 20, the pooled sensitivities were 99.6% (95% CI 98.8 to 100.0) and 94.8% (95% CI 91.0 to 98.6), respectively. For the Ct value group < 25, the pooled sensitivity was 97.8% (95% CI 96.7 to 98.5), decreasing to 85.3% (95% CI 81.7 to 89.0) for the Ct value group < 30. The pooled sensitivity for the Ct value group ≥ 30 was very low, at 26.4% (95% CI 15.8 to 37.1).
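Sensitivity by Ct group is computed only among RT-PCR-positive samples, split by the RT-PCR Ct value. Lower Ct values correspond to more target RNA. The minimal sketch below shows the per-data-set step with hypothetical samples. Each cutoff gives a cumulative "below" group and an "at or above" group, mirroring groupings such as < 25 and ≥ 30. Values of this kind would then be pooled across data sets. The function name and the samples are illustrative.

```python
def sensitivity_by_ct(samples, cutoff):
    """Split RT-PCR-positive samples at a Ct cutoff.

    samples: list of (ct_value, antigen_positive) for RT-PCR-positive samples only.
    Returns (sensitivity for Ct < cutoff, sensitivity for Ct >= cutoff);
    an empty group yields NaN.
    """
    below = [pos for ct, pos in samples if ct < cutoff]
    above = [pos for ct, pos in samples if ct >= cutoff]

    def sens(group):
        # True counts as 1, False as 0
        return sum(group) / len(group) if group else float("nan")

    return sens(below), sens(above)


# Hypothetical RT-PCR-positive samples: (Ct value, antigen test positive?)
samples = [(16.2, True), (19.8, True), (22.5, True), (24.9, True),
           (27.3, True), (29.1, False), (31.4, False), (33.0, True), (35.6, False)]

for cutoff in (20, 25, 30):
    lo, hi = sensitivity_by_ct(samples, cutoff)
    print(f"Ct < {cutoff}: {lo:.2f}   Ct >= {cutoff}: {hi:.2f}")
```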
Sensitivity Analyses
When case-control studies were excluded, the pooled sensitivity and specificity remained similar to the overall estimates, at 76.6% (95% CI 72.4 to 80.3) and 98.5% (95% CI 98.0 to 98.9), respectively (Figure S2). Excluding preprints did not change sensitivity or specificity significantly (75.8% [95% CI 72.3 to 79.0] and 98.4% [95% CI 98.0 to 98.8]) (Figure S3). Data from manufacturer-independent studies (68 data sets) yielded a similar specificity of 98.4% (95% CI 97.8 to 98.8) and a slightly lower sensitivity of 74.4% (95% CI 69.5 to 78.7) (Figure S4).
The studies were also categorized by the income level of the country where participants were enrolled. No significant differences were found between HICs and LMICs in pooled sensitivity (HICs: 75.1%, 95% CI 71.5 to 78.4; LMICs: 76.6%, 95% CI 73.4 to 79.5) or specificity (HICs: 98.6%, 95% CI 98.2 to 98.9; LMICs: 97.1%, 95% CI 93.7 to 98.7), and the confidence intervals overlapped (Figure S5).
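A direct comparison of two subgroup estimates can complement the visual check of overlapping confidence intervals. The rough sketch below is not the analysis reported in the review. It back-calculates a logit-scale standard error from each reported 95% CI, assuming the interval is symmetric on the logit scale and the HIC and LMIC estimates are independent, and forms an approximate z statistic for the difference. The input values are the pooled estimates given above.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a logit-scale standard error from a 95% CI,
    assuming the interval is symmetric on the logit scale."""
    return (logit(upper) - logit(lower)) / (2 * z)


def z_for_difference(est1, ci1, est2, ci2):
    """Approximate z statistic for the difference between two independent
    pooled proportions, computed on the logit scale."""
    diff = logit(est2) - logit(est1)
    se_diff = math.hypot(se_from_ci(*ci1), se_from_ci(*ci2))
    return diff / se_diff


# Pooled estimates for HICs (first) and LMICs (second), as proportions
z_sens = z_for_difference(0.751, (0.715, 0.784), 0.766, (0.734, 0.795))
z_spec = z_for_difference(0.986, (0.982, 0.989), 0.971, (0.937, 0.987))
print(f"sensitivity z = {z_sens:.2f}, specificity z = {z_spec:.2f}")
```

Both statistics fall within ±1.96, consistent with the text. The specificity comparison is the closer of the two, mainly because of the wide LMIC interval.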