Importance of external quality assessment for SARS-CoV-2 antigen detection during the COVID-19 pandemic

Background: Antigen testing has become an essential part of fighting the ongoing COVID-19 pandemic. With the continual increase in available tests, independent and extensive comparative evaluations using data from external quality assessment (EQA) studies to evaluate test performance between different users are required. Objectives: An EQA scheme was established to assess the sensitivity of antigen tests and the potential impact of circulating SARS-CoV-2 strains on their performance. Study design: Panels were prepared for three challenges in 2021 containing inactivated SARS-CoV-2-positive samples of various genetic strains (including variants of concern, VOCs) at different concentrations, and negative samples. Data were analysed based on qualitative testing results in relation to the antigen test used. Results: Participants could register for each individual challenge in any combination. In total, 258 respondents from 27 countries worldwide submitted 472 datasets. All core samples were correctly reported by 76.7 to 83.1% of respondents at participant level and in 73.5 to 83.8% of datasets at dataset level. Differences in sensitivity were shown for viral loads and for SARS-CoV-2 strains/variants, including an impact on performance of a B.1.1.7-like mutant strain with a deletion in the nucleoprotein gene. Lateral flow rapid antigen tests showed a higher rate of false negatives in general compared with automated point-of-care tests and laboratory ELISA/immunoassays. Conclusions: EQA schemes can provide valuable data to inform participants about weaknesses in their testing process or methods and support ongoing assay evaluations for regulatory approval or post-market surveillance.


Introduction
Reliable SARS-CoV-2 diagnostic assays are an important component of COVID-19 infection and outbreak control. In late 2020/early 2021, antigen tests [laboratory-based and point-of-care (PoC) tests, including lateral flow rapid antigen tests (LFTs)] were integrated into testing strategies to complement molecular detection of SARS-CoV-2 in many countries [1][2][3]. While nucleic acid amplification tests (NAATs) remain the gold standard for SARS-CoV-2 detection due to their high sensitivity and specificity [4], antigen testing has become a diagnostic pillar and an essential part of fighting the ongoing pandemic. In the case of LFTs, the technology offers the possibility of simple, low-cost, early detection of infectious COVID-19 cases, and can be used for screening or testing in settings that are removed from clinical and laboratory environments (e.g., long-term care facilities, schools, or mobile testing units), especially where NAAT testing capacity is reduced or a rapid test turnaround time is required [5][6][7][8][9][10].
The number of commercially available antigen tests has increased dramatically [11]. In Europe, 950 CE-marked antigen tests (560 rapid) are currently on the market [12]. With continual increases in available tests, the number of validation studies has also increased but there are still only a small number of independent extensive comparative evaluations [13][14][15]. Also, data from external quality assessment (EQA) studies to evaluate the test performance amongst different users are required.
Here, we present the results of an EQA scheme with three challenges to assess the sensitivity of antigen tests and the potential impact of circulating SARS-CoV-2 strains on their performance, introduced in 2021 by Quality Control for Molecular Diagnostics (QCMD, Glasgow, Scotland, UK).

Sample preparation and composition of panels
Panels for the QCMD SARS-CoV-2 Antigen Testing EQA Study were produced under ISO 13485 manufacturing conditions for three distributions/challenges in 2021 (C1A: May to July, C1B: August to October, and C1C: November to December). Composition and design of the panels followed QCMD's Code of Practice and ISO 17043 requirements for proficiency testing. The panels included five or six samples, which were either preparations from inactivated SARS-CoV-2-positive supernatants of various strains (including variants of concern, VOCs) at different concentrations or negative samples containing only stabilisation buffer (virusPHIX-P9™, RNAssist, Cambridge, UK) or transport medium (proprietary recipe). The transport medium was chosen as the standard matrix for the EQA scheme.

Distribution of panels and EQA testing
EQA panels were distributed under ambient conditions as liquid samples to registered participants with instructions on how to process the samples and submit results. Participants were asked to treat and test the material according to their routine testing protocol. Therefore, both 'swab-based testing' (by placing a swab into the sample) and 'liquid sample-based testing' (by pipetting a specified volume of the sample into an assay buffer) were allowed in accordance with manufacturer instructions. Upon completion of the EQA testing, participants could report their results ('positive', 'negative', or 'not determined') using a result return form, accessible either via a mobile device by scanning a QR code provided with the panel or via the QCMD website (www.qcmd.org). Workflow details were collected as part of the results submission. Participants were also asked to provide information about their organisation type and their organisation's accreditation status.

Analysis of results from participants
Each submission using the result return form was considered to be a dataset for an individual antigen test method (workflow) together with the reported results for each sample. The use of multiple assays was recorded on separate forms. For this EQA scheme, qualitative results were evaluated. The assessment focussed on sensitivity of workflows at relevant viral loads (i.e., at least 6.4 ddPCR log10 copies/mL). However, the impact of circulating SARS-CoV-2 strains/variants was also taken into consideration.
Overall performance was determined by comparison of results against the EQA consensus. Where there were sufficient datasets available (≥5 datasets), the testing methods used were grouped to generate a specific method assessment group consensus. All participants received an EQA report with individual performance and peer group assessment on completion of a challenge.
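The grouping rule above can be illustrated with a short Python sketch. It assumes that each dataset carries an assay name and that assays with fewer than five datasets are pooled into a generic group; the pooling, the function name `assign_method_groups` and the label `other` are illustrative assumptions, not details taken from the EQA scheme.

```python
from collections import Counter

# Threshold stated in the text: a specific method group needs >= 5 datasets.
MIN_DATASETS = 5


def assign_method_groups(assays, min_datasets=MIN_DATASETS, fallback="other"):
    """Return one group label per dataset.

    assays: list of assay names, one entry per submitted dataset.
    Assays with at least `min_datasets` entries keep their own group;
    all others are pooled under `fallback` (an assumption of this sketch).
    """
    counts = Counter(assays)
    return [a if counts[a] >= min_datasets else fallback for a in assays]


# Hypothetical example: assay "A" has 5 datasets, assay "B" only 4.
example = ["A"] * 5 + ["B"] * 4
print(assign_method_groups(example))
```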
Participants were expected to report the 'core' proficiency samples correctly within the EQA scheme. For challenges C1A to C1C, these were all SARS-CoV-2-positive samples with a viral load of 6.4 ddPCR log10 copies/mL and the negative control in transport medium. For challenge C1A, the core samples additionally included the SARS-CoV-2-positive sample with 7.4 ddPCR log10 copies/mL and the negative control in stabilisation buffer.
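As an illustration of this scoring rule, the following Python sketch marks a dataset as passing only if every core sample matches its expected result. The sample identifiers and function names are hypothetical, and treating 'not determined' or a missing result as incorrect is an assumption made here for simplicity.

```python
def all_core_correct(reported, expected_core):
    """True if every core sample was reported with its expected result.

    reported: dict mapping sample id -> 'positive' | 'negative' | 'not determined'
    expected_core: dict mapping core sample id -> expected result
    Missing or 'not determined' results count as incorrect (assumption).
    """
    return all(reported.get(sample) == result
               for sample, result in expected_core.items())


def pass_rate(datasets, expected_core):
    """Return (number passing, total, percentage passing) over all datasets."""
    if not datasets:
        raise ValueError("at least one dataset is required")
    passed = sum(all_core_correct(d, expected_core) for d in datasets)
    return passed, len(datasets), 100.0 * passed / len(datasets)


# Hypothetical core panel: one positive and one negative core sample.
expected = {"S1": "positive", "S2": "negative"}
submissions = [
    {"S1": "positive", "S2": "negative", "S3": "negative"},  # passes
    {"S1": "negative", "S2": "negative"},                    # false negative
    {"S1": "positive", "S2": "not determined"},              # not determined
]
print(pass_rate(submissions, expected))
```

Educational samples (here `S3`) are ignored by construction, which mirrors the statement that they do not affect the performance assessment.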
Furthermore, during the testing process prior to distribution to participants, it was demonstrated that the number of positive tests decreased at a concentration of 6.4 ddPCR log10 copies/mL, indicating the observable cut-off for LFTs (data not shown). Educational samples with lower concentration for the different matrix types were included in the panels to further determine the sensitivity within a wider range of different testing methods without disadvantaging the performance assessment for participants.

Statistical analyses
Test performance data were displayed as frequencies and percentages of correct results. Binary logistic regression models were applied using a correct answer as the response variable. Odds ratios (ORs), their two-sided Wald 95% confidence intervals, and p-values for testing against an OR of 1 were reported. For comparison across or within technologies, the ORs were adjusted for strains/variants. All analyses were performed using the statistical software package SAS version 9.4 (SAS Institute Inc., Cary, USA).
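For orientation, the unadjusted version of such an odds ratio can be computed directly from a 2x2 table of correct and incorrect results. The Python sketch below is a simplification: it does not adjust for strains/variants as the regression models did, it is not the SAS procedure used in this study, and the counts in the usage example are invented for illustration.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a two-sided Wald 95% CI and p-value.

    a, b: correct / incorrect results in the group of interest
    c, d: correct / incorrect results in the reference group
    z: standard normal quantile for the 95% interval
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Wald test of H0: OR = 1, i.e. log(OR) = 0, two-sided.
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


# Invented counts: 90/100 correct in one group, 95/100 in the reference.
or_, lo, hi, p = odds_ratio_wald(90, 10, 95, 5)
print(f"OR = {or_:.3f}, 95% CI ({lo:.3f}-{hi:.3f}), p = {p:.4f}")
```

An OR below 1 means lower odds of a correct result than in the reference group. The difference is conventionally called significant at the 5% level when the confidence interval excludes 1.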

Results
In this EQA study, 197 participants from 27 countries worldwide (Fig. 1) took part in at least one of the three challenges (C1A, C1B, C1C). Notably, the largest number of participants were from Zambia (n = 89), as laboratories were enrolled by their public health authority, followed by participants from Great Britain (n = 32), Poland (n = 22), and Italy (n = 12). The remaining participants were from other countries and/or regions. As participants could register for each individual challenge in any combination, a total of 258 respondents (77 in C1A, 138 in C1B, and 43 in C1C) were counted, submitting 472 datasets (Table 1). Multiple datasets were reported by 42 respondents for the same assay or for a combination of different assays or technologies. Fig. 1 shows the proportions of participants and their frequency of participation as respondents per country in detail.
During the results submission, a total of 200 testing sites were reported; the absolute numbers are shown in Fig. 2. Most of the testing sites were microbiology laboratories (n = 152, 76.0%), followed by non-laboratory organisations (n = 41, 20.5%) and test manufacturers (n = 7, 3.5%). Of these, 49 (24.5%) had an accredited or certified quality management system (including regional requirements for four testing sites), 25 (12.5%) were pending accreditation (for ISO 15189/ISO 22870, or ISO 17025), 42 (21.0%) were not accredited or certified, five (2.5%) indicated that they did not require accreditation/certification, and 79 (39.5%) did not provide specific information for the survey. No notable difference in performance between 'laboratory testing sites' and 'non-laboratory testing sites' (excluding test manufacturers) was observed, with overall percentages of correctly reported results of 94.3% and 94.5%, respectively (p = 0.1479). However, it must be noted that the non-laboratory testing sites in this study were mostly run by testing professionals or were linked to clinical or diagnostic laboratories.
All core samples were correctly reported by 83.1% (64/77) of the participants and in 83.2% (119/143) of datasets for challenge C1A (correct range per core sample: 89.5-96.5%, Table 1).
For viral concentration in SARS-CoV-2-positive samples (in stabilisation buffer as well as transport medium), comparison against the results for a viral load of 7.4 ddPCR log10 copies/mL as the reference level showed that the percentage of correctly reported results was not significantly different for samples with a viral load of 6.4 ddPCR log10 copies/mL (p = 0.2328), but was significantly lower for 5.4 ddPCR log10 copies/mL (p < 0.0001). This viral load dependency agrees with results from the pre-testing process of the EQA scheme (data not shown).
For SARS-CoV-2 strains/variants in core samples with transport medium as standard matrix, compared to the results reported for the SARS-CoV-2 Lineage B1 strain as reference, the proportion of correctly reported results for the Alpha variant was not significantly different, but was higher for the Delta variant (p = 0.0272) and lower for the Alpha N mutant strain (p < 0.0001). The last observation indicates that deletions in the N gene can affect performance and increase the risk of false-negative tests.
In total, 32 LFTs, seven PoC platforms and four laboratory ELISA/immunoassay tests were used by the participants within the three EQA challenges. Supplementary Table S1 provides an overview of the assays with their respective number of datasets per challenge, grouped into corresponding method groups (specific method groups were assigned for assays with ≥5 datasets in all three challenges). All reported antigen tests target the N protein, except the tests provided by one manufacturer (Wuhan Life Origin Biotech Joint Stock), for which information was not available. However, in most cases the number of reported workflow datasets per assay is too limited to support strong statements here on the performance of individual assays. For all core samples in transport medium, the overall percentage of correctly reported results ranged between 88.9 and 97.7% for the evaluated method groups in Supplementary Table S1.
When comparing all technologies, the adjusted odds ratio for LFTs relative to automated PoC test platforms was significantly less than 1 [OR = 0.405, 95% confidence interval (0.227-0.723); p = 0.002], consistent with a lower observed percentage of correct results for the former. In comparison, the OR for laboratory ELISA/immunoassay tests was 0.626 [95% confidence interval (0.133-2.943); p = 0.553]. The observation for LFTs correlates with a higher rate of false negatives in comparison with the other technologies (p = 0.002 when including the results for the Alpha N mutant strain in the statistical analyses; p = 0.0145 when excluding them).

Discussion
This study provides a comparative performance evaluation of available antigen tests as applied by different users. The overall qualitative performance of participants was at an acceptable level within the three challenges and showed comparable success rates to our first EQA scheme for SARS-CoV-2 molecular detection in 2020 [19]. The study objective was the establishment of a new EQA programme for SARS-CoV-2 antigen detection to support testing infrastructures with high-quality proficiency testing options in line with the expansion of testing capabilities during the ongoing pandemic. Continuous monitoring improves quality within testing sites by assessing the performance of antigen tests in routine use.
It should be noted that the panels offered here focused on the sensitivity aspect (viral loads >10^6 copies/mL as acceptable limit of detection for LFTs [20]) with the impact of circulating SARS-CoV-2 strains/variants taken into consideration (as required for in vitro diagnostic products [21]). Performance with respect to specificity, i.e., false-positive results due to cross-reactivity with other human coronaviruses or respiratory viruses, was not specifically assessed within the study but may be included in future panels if applicable. Also, the number of negative samples was limited, and findings for these samples may not be as representative as the conclusions presented for the SARS-CoV-2-positive samples, which compare a wider peer group.
The EQA challenges showed that sensitivity and false-positivity in relation to true negative samples remained variable for individual methods. Workflows that did not correctly identify one or more of the core positive samples with SARS-CoV-2 Lineage B1, Alpha variant or Delta variant (emerging worldwide during 2020/2021 [22]) can result in misdiagnoses of infectious cases even with high viral loads. Because antigen tests are generally less sensitive than molecular tests, it is recommended that a negative antigen result be followed by NAAT to exclude an infection if a person is suspected of having COVID-19.
Compared to other technologies used in this study, LFTs showed in general a higher rate of false negatives. A recent manufacturer-independent review of a total of 122 CE-marked antigen rapid tests revealed that much of this can depend on the test itself, as one in five tests failed the minimum sensitivity of 75% for panel specimens with a quantification cycle (Cq) value ≤25 [15]. In addition to the viral load (as also shown in our study), intrinsic factors of antigen tests (i.e., test sensitivity/specificity), the testing process (e.g., using a swab or pipetting the sample) or the visual interpretation of test results (if a read-out device is not used) may influence performance and results. Results could therefore vary amongst participants even when the same test was used. This became apparent when results were accompanied by comments like 'faint test line' or 'weak positive', indicating that training of users in visual interpretation is important for reliable detection.
The results for the Alpha N mutant strain showed that visual interpretation of LFTs was quite challenging and that performance was also reduced in certain cases for some automated PoC platforms, although all laboratory ELISA/immunoassay tests could detect this mutant strain. It is assumed that VOCs have less impact on antigen tests than on molecular tests, as most of the variants (including Omicron) contain mutations mainly in the S (spike) gene, resulting in the observed S gene drop-out in particular molecular assays [23]. Nevertheless, this observation highlights the importance of continuously monitoring the potential impact of circulating variants and strains through EQA studies for antigen tests, where data are still limited.
It is remarkable that only 37% of the participating testing sites reported being accredited/certified or pending accreditation. Although we could not show a notable difference in performance between laboratory testing sites and non-laboratory testing sites, quality assurance is an important aspect, and regional requirements for antigen testing can vary for users, who are no longer limited to laboratory and testing professionals.
Regular participation in EQAs can help to verify whether samples are handled properly, results are interpreted correctly, and procedures are followed. EQA schemes can provide valuable data to inform participants about weaknesses in their testing process or methods and support ongoing assay evaluations for regulatory approval or post-market surveillance. This is of particular importance during the pandemic, when data about the relative performance of assays and independent extensive comparative evaluations are continuously required. As we move into a new phase of the pandemic, a single variant, Omicron, with multiple descendant lineages dominates the COVID-19 surge worldwide. To keep the design of EQA schemes state of the art and based on the latest available epidemiological information, it is important to include this and further relevant variants/strains in future schemes. This allows the quality of COVID-19 diagnostic testing to be continuously improved and ensured.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.