Increasing the Efficiency of a National Laboratory Response to COVID-19: a Nationwide Multicenter Evaluation of 47 Commercial SARS-CoV-2 Immunoassays by 41 Laboratories

ABSTRACT In response to the worldwide pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the subsequent antibody tests that flooded the market, a nationwide collaborative approach in the Netherlands was employed. Forty-one Dutch laboratories joined forces and shared their evaluation data to allow for the evaluation of a quantity of serological assays for SARS-CoV-2 that exceeds the capacity of each individual laboratory. As of April 2020, these performance data had been aggregated and shared in regularly updated reports with other laboratories, Dutch government, public health organizations, and the public. This frequently updated overview of assay performance increased the efficiency of our national laboratory response, supporting laboratories in their choice and implementation of assays. Aggregated performance data for 47 immunoassays for SARS-CoV-2 showed that none of the evaluated immunoassays that detect only IgM or IgA met the diagnostic criteria, indicating that they are not suitable for diagnosing acute infections. For the detection of IgG, only the Biozek Corona virus COVID rapid test, Euroimmun SARS-CoV-2 IgG, and Wantai SARS-CoV-2 antibody (Ab) ELISA met predefined performance criteria in hospitalized patients where samples were collected 14 days post-onset of symptoms (DPO), while for patients with mild or asymptomatic infections, only the Wantai SARS-CoV-2 Ab ELISA met the predefined performance criteria if samples were collected 14 days postonset. Here, we describe this unique nationwide collaboration during the onset of the COVID-19 pandemic; the collected data and their results are an example of what can be accomplished when forces are joined during a public health crisis.

including the Netherlands, had molecular testing implemented in at least one laboratory, providing the basis for a large scale up in molecular testing capabilities throughout Europe in the following weeks (2). Although laboratories in Europe were highly efficient, a complication was that most initially relied on the same protocols and platforms (3), while shortages in high-quality supplies for diagnostic testing were building up (4,5). These shortages even resulted in supranational inventories to identify critical issues in the supply chains and coordination of the procurement of supplies (6).
In March 2020, the Dutch government installed the National Test Capacity Coordination Structure (LCT) to monitor and ensure a sufficient and accurate test capacity across the nation (7). Although the primary focus and priority of the LCT was the molecular diagnosis of SARS-CoV-2, a Serology Taskforce was subsequently installed under the LCT when offers for serological assays were building up and concerns arose about an overstrained immunoassay market. The taskforce consisted of 10 medical microbiology experts and was coordinated by the Institute for Public Health and the Environment (RIVM). The taskforce was requested to advise the LCT on the use of serology in general and of specific immunoassays in patient management and the control of the pandemic. The efforts of the Serology Taskforce consisted of monitoring the usefulness of serological testing in different patient populations and study designs, advising on national policy regarding employment of serology tests for mitigation strategies, and coordinating an efficient laboratory response in the Netherlands regarding the application of serological tests.
One of the needs identified was to provide rapid and evidence-based advice on the use of specific immunoassays to support laboratories in their choice of assay implementation and to support the LCT in guaranteeing access to those tests. The market for immunoassays is overwhelming, with a total of 605 commercialized immunoassays listed by the Foundation for Innovative New Diagnostics (FIND) in their SARS-CoV-2 diagnostic pipeline as of 31 March 2021 (8).
In the Netherlands, a nationwide effort was undertaken to collect, aggregate, and share evaluation data on immunoassays at a national level. Here, we describe the outcomes of our unique approach to a collaborative Dutch laboratory response to the SARS-CoV-2 pandemic which resulted in a multicenter evaluation of 47 commercial SARS-CoV-2 immunoassays. The outcomes of the assay evaluations by 41 laboratories are presented as an example of what can be accomplished by such a nationwide approach in which forces are joined to support international SARS-CoV-2 laboratories in informed decision-making on immunoassay implementation.

MATERIALS AND METHODS
Data collection and dissemination of results. Inventories of ongoing immunoassay evaluations were carried out via the Dutch Society for Medical Microbiology (NVMM) (9). Starting 28 March 2020, weekly requests for data sharing were sent to the members of the NVMM: 440 medical microbiologists, medical molecular microbiologists, and their trainees, each employed by 1 of the approximately 50 registered Dutch medical microbiological laboratories. Data reported by laboratories that are ISO 15189:2012 (10) accredited with a flexible scope in the fields of medical microbiology for codes MM.VID.15 or MM.VID.16 or medical immunology for codes MI.IFS.01, MI.IFS.02, or MI.IFS.04 were summarized and shared in regularly updated reports (https://www.nvmm.nl/vereniging/nieuws/update-taskforce-serologie-15-juli-2020/). All laboratories voluntarily contributed data, accompanied by available metadata such as date of onset, sampling date, disease severity, and age of patients, and their permission for sharing their aggregated data was obtained.
By the end of July 2020, 41 laboratories were contributing to the rapid sharing of aggregated data across laboratories (Fig. 1). In the period from 13 April 2020 to 17 July 2020, 16 reports were shared. By then, information had been collected and shared for 47 different immunoassays. Data were collected and shared in updated reports on an almost weekly basis to ensure rapid access to new relevant data for (inter)national laboratories and other stakeholders.
The reports drafted between 13 April and 5 May 2020 were privately shared with Dutch medical microbiologists via the NVMM; the national Outbreak Management Team; the Dutch ministry of Public the Institute for Public Health and the Environment (RIVM) (12). Reports as of 2 July were disseminated in English upon many requests.
Multicenter immunoassay evaluation. Contributing laboratories selected the assays they evaluated and the evaluation panels. All laboratories performed their evaluations in accordance with the Declaration of Helsinki. Informed consents were obtained, or other procedures required by their local institutions regarding research to improve diagnostic procedures with the use of samples obtained for routine clinical diagnostics were followed. All evaluated assays as of 17 July 2020 and their details are depicted in Table 1.
For determination of the sensitivity of the immunoassays, samples were used from reverse transcription-PCR (RT-PCR)-confirmed COVID-19 patients from all age groups, although they were predominantly from adults (≥19 years old). Data on sensitivity were aggregated and stratified by severity of infection and timing of sample collection, i.e., before or after 14 days post-onset of symptoms (DPO). Hospitalized COVID-19 cases were classified as severe cases and nonhospitalized cases as mild. If disease severity or DPO was not known, samples were excluded. For determination of the specificity, both population samples collected before December 2019 and samples from syndromic patients with respiratory infections with potentially cross-reactive microorganisms, e.g., common coronaviruses, were included. Samples negative for SARS-CoV-2 using RT-PCR, which were obtained during the pandemic, were excluded. Equivocal results were considered positive in sensitivity as well as in specificity cohorts. The aggregated results from 41 laboratories of sensitivity and specificity, including the 95% confidence interval (CI) based on Wilson score (13), were reported here.
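The pooling and interval calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the taskforce's actual procedure: the function names, the per-laboratory count layout, and the example counts are hypothetical, and simple summation of counts across laboratories is assumed as the aggregation method. Only the Wilson score formula itself is standard.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)

def pooled_sensitivity(lab_counts):
    """Pool (positive_or_equivocal, tested) counts from several laboratories.

    Assumption: counts are simply summed per assay and stratum before the
    proportion and its interval are computed.
    """
    positives = sum(pos for pos, _ in lab_counts)
    tested = sum(n for _, n in lab_counts)
    low, high = wilson_interval(positives, tested)
    return positives / tested, low, high

# Hypothetical counts for one assay in one stratum (severe cases, >14 DPO):
# each tuple is (positive or equivocal results, RT-PCR-confirmed samples tested).
counts = [(48, 50), (29, 30), (19, 20)]
sens, low, high = pooled_sensitivity(counts)
print(f"sensitivity {sens:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

Specificity can be handled the same way by passing the number of negative results and the number of prepandemic or cross-reactivity samples tested.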
For individual patient diagnostics, the predefined performance criteria for IgM and IgG antibodies, for both separately, were >95% sensitivity and >98% specificity if samples were obtained after 14 DPO. The same performance criteria were posed for epidemiological and serological prevalence studies but only for IgG antibodies. These predefined performance criteria are not absolute but were recommendations from the Serology Taskforce based on expert opinion and also used by other European member states (14). However, the applicability of these criteria will have to be continuously assessed by local experts in each specific context of use.
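As a minimal sketch of how these thresholds could be applied to aggregated counts, consider the following. The class and field names and the example numbers are hypothetical, and it is an assumption here that the criteria are checked against point estimates rather than against confidence bounds.

```python
from dataclasses import dataclass

@dataclass
class AssayEvaluation:
    """Aggregated counts for one assay and antibody class (hypothetical layout)."""
    true_positives: int   # positive or equivocal results in confirmed cases, >14 DPO
    cases_tested: int     # RT-PCR-confirmed samples taken >14 DPO
    true_negatives: int   # negative results in the specificity cohort
    controls_tested: int  # prepandemic and cross-reactivity samples

    @property
    def sensitivity(self) -> float:
        return self.true_positives / self.cases_tested

    @property
    def specificity(self) -> float:
        return self.true_negatives / self.controls_tested

def meets_criteria(ev: AssayEvaluation,
                   min_sensitivity: float = 0.95,
                   min_specificity: float = 0.98) -> bool:
    """True only if both point estimates strictly exceed the thresholds."""
    return ev.sensitivity > min_sensitivity and ev.specificity > min_specificity

# Hypothetical example: 97% sensitivity but 97.5% specificity fails the criteria.
example = AssayEvaluation(true_positives=97, cases_tested=100,
                          true_negatives=195, controls_tested=200)
print(meets_criteria(example))
```

Because both conditions must hold, an assay with high sensitivity can still fail on specificity alone.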
Additional to determining the sensitivity of the immunoassays with RT-PCR as a reference, three laboratories determined test sensitivity with a virus neutralization test (VNT; 50% plaque reduction/neutralization titer [PRNT 50 ]) as a reference test, and these results were aggregated and reported here.
Evaluation of nationwide collaborative approach. In early July 2020, the added value of the collaborative national laboratory response was assessed with a short online questionnaire, created with the online tool Typeform and sent out through the NVMM.
This survey consisted of questions about (i) the already implemented immunoassays, (ii) information sources that were used for the assay selection, (iii) if this nationwide collaborative approach was considered valuable and contributing to an increase of efficiency in laboratory response, (iv) if laboratories would be interested in a similar approach for future infectious disease crises, and (v) if there were other laboratory-related activities that should be nationally coordinated during a next public health crisis. The

RESULTS
Multicenter immunoassay evaluation. The aggregated results from 41 laboratories of sensitivity and specificity, including the 95% CI based on Wilson score (13), as of 15 July 2020, were reported in Tables 2 and 3. A total of 17 laboratories had submitted data on test accuracy for 22 point-of-care (POC) tests (Table 2) and 39 laboratories for 25 ELISA and autoanalyzer tests (Table 3).
The Biosynex, Biozek, Cellex, Vomed, and Zhejiang Orient/Healgen tests were the only POC tests that met the predetermined criteria of >95% sensitivity combined with >98% specificity for diagnostics in severe infections with samples taken after 14 DPO, but only for IgG (or total Ig for Vomed). None of the POC tests complied with the predetermined criteria for IgM only or for patients with mild or asymptomatic infections (Table 2).
Although multiple ELISA and autoanalyzer assays testing IgM antibodies met the predetermined criteria for sensitivity (>95%) in patients with severe infections with a sample collection after 14 DPO, none of them reached a specificity of >98%, and therefore, they did not fulfill all criteria (Table 3). For IgG or IgTotal targeted assays, only Euroimmun IgG and Wantai Ab ELISAs met both sensitivity and specificity criteria in severe infections if samples were taken after 14 DPO (Table 3). For diagnostics in mild or asymptomatic infections, only Wantai Ab met the predetermined criteria for use in diagnostics if samples were taken after 14 DPO (Table 3).
Additionally, the results of the sensitivity of the immunoassays if virus neutralization tests (VNTs; PRNT50) were used as reference instead of RT-PCR were reported in Table 4. We observed a good sensitivity (>95%) in severe infections if samples were taken after 14 DPO for the POC tests InTec and Zhejiang Orient/Healgen and for ELISAs Euroimmun IgG and IgA and Wantai Ab (Table 4). In mild infections, we observed a good sensitivity for Zhejiang Orient/Healgen rapid test and for the Wantai Ab ELISA if samples were taken after 14 DPO (Table 4).
Evaluation of nationwide collaborative laboratory response. To assess how the joint collection and sharing of evaluation data of commercial immunoassays were perceived and whether they contributed to an improved laboratory response in the Netherlands, we sent out a short survey to the Dutch COVID-19 diagnostic laboratories. In total, 36 representatives from 34 of approximately 50 registered medical microbiological laboratories (60% to 70%) in the Netherlands responded to the survey (Fig. 2A). The results of the survey were summarized in Fig. 2.
Almost all laboratories (33/34, 97%) had implemented a serological assay for the detection of antibodies against SARS-CoV-2 (Fig. 2B). Most of them (80%) implemented at least one ELISA test. The Wantai SARS-CoV-2 Ab ELISA was implemented by the majority of laboratories (n = 23). The choice for the implemented test was for 24 (71%) laboratories based on the reports with shared evaluation data published by the Serology Taskforce. For four of them, these reports were the sole source on which they based their choice (Fig. 2B). Other reported information sources were comparisons of tests in their own laboratory (n = 20), the already local existing platforms and/or relationships with suppliers (n = 12), a literature review (n = 1), and the guarantee of the national stock of Wantai SARS-CoV-2 Ab ELISA (n = 10) (Fig. 2B).
A total of 28 (78%) of the respondents advised to continue the data sharing (Fig. 2C). Additionally, they provided a time frame for this continuation: 9 (32%) gave an exact time frame with an end date, and 19 (68%) gave an abstract time frame, based on knowledge that still needs to be gained (Fig. 2C). All but one of the respondents (97%) indicated that their laboratory benefitted from the joint collection and sharing of evaluation data, and the specific advantages were specified (Fig. 2D). Next to local added value, 89% (32 of 36) of the respondents thought that the (almost) real-time sharing of data increased the efficiency of the national laboratory response regarding serological testing for SARS-CoV-2 (Fig. 2E). Experiences were that it enabled laboratories to make a more rapid choice and quickly implement tests in a confusing and aggressive market (n = 10), that more data are at their disposal which results in more robust evaluations (n = 14), and that it avoided repetitive experiments in multiple laboratories and a subsequent waste of budget (n = 3).
All respondents considered that a similar approach should be employed during future epidemics. In total, 32 (89%) respondents reported that other activities, besides the sharing of evaluation data, should be coordinated at a national level in a future epidemic. Suggestions included providing standard sample panels or high-quality biobanking at the national level (n = 18); organizing external quality assessment panels (n = 5); providing and distributing material and reagents (n = 3); joint purchasing of assays, material, and reagents (n = 3); creating a consensus about the role and meaning of serology (n = 2); organizing interlaboratory communication, i.e., through webinars (n = 3); and performing one central evaluation of all tests, enabling local verification only (n = 2).

DISCUSSION
Upon the emergence of SARS-CoV-2 as a novel pathogen with pandemic spread, the diagnostic market was overflowing with assays for molecular detection; assays for detection of SARS-CoV-2-specific IgG, IgM, and/or IgA; and antigen tests. All medical laboratories need to validate any new assay before implementation for diagnostic purposes as part of their quality management system ISO 15189 (10). In a nonpandemic context, each laboratory individually evaluates and validates diagnostic tests for its own implementation, so that valuable assay performance data remain dispersed and inaccessible. However, the rapid pandemic spread of a novel pathogen, when sufficient test kits and properly defined evaluation panels are initially lacking, required a different, collaborative laboratory response to collect a robust quantity of test performance data within the short time frame that is needed for an adequate response.
Already in the early phase of the outbreak, the first immunoassay evaluation data were shared by a few laboratories, which prompted the Dutch government to establish a large stockpile of the Wantai SARS-CoV-2 Ab ELISA to guarantee availability for Dutch laboratories. In the following weeks, 41 laboratories joined forces and shared their ongoing evaluations on a weekly basis to enable the compilation of larger data sets for multiple serological tests that exceeded the capacity of each individual laboratory. The shared evaluation data were summarized and updated in reports that came out regularly and were made publicly available. These reports produced a complete overview of the performances of various serological tests at the service of all laboratories and policymaking institutes, instead of the otherwise valuable but less integrated information that one laboratory independently can yield. When using RT-PCR as a reference test, the aggregated data of the POC evaluations showed that five of the investigated POC antibody tests met the predetermined criteria for IgG or Ig total diagnostics in severe infections, where samples were collected after 14 DPO. However, results for 4 POC tests were based on fewer than 100 samples. Additionally, two ELISAs met the predetermined criteria for IgG or Ig total diagnostics in this patient group based on a sufficient number of samples. However, currently, in practice, the relevance and added value of serology-based diagnostics compared with other diagnostic methods that aim to directly detect the presence of virus seem to focus on patients with a negative SARS-CoV-2 RT-PCR and a persistent strong suspicion for COVID-19. Indeed, this added value was clearly demonstrated in the SARS-CoV-2 20C/H655Y hospital cluster in Brittany, France, in March 2021 where cases were confirmed based on serology, while RT-PCR on nasopharyngeal swabs failed (15).
None of the 22 investigated POC antibody tests met the predetermined criteria for IgM and IgG sensitivity and IgG specificity for use in patients with mild or asymptomatic infections if based on a sufficient number of diagnostic samples. The only test that met the criteria in the patient group with mild or asymptomatic SARS-CoV-2 infections is the Wantai SARS-CoV-2 Ab ELISA, which is based on the detection of total antibodies. None of the immunoassays that detect only IgM or IgA met the diagnostic criteria, indicating that they are not suitable for the diagnosis of acute infections. These data underline the importance of extensive validation in the right (sub)populations and settings. Before such an extensive validation, it is not appropriate to use (rapid) immunoassays for clinical decision making to guide dedicated measures for specific subpopulations and to guide general control measures.
Primarily, positive nucleic acid amplification testing (NAT) prior to sample collection for the use in immunoassays was used as a reference for sensitivity calculation. Because of the kinetics of an infection, PCR will be positive only in the acute stage, followed by IgM antibody production that wanes relatively fast, and IgG that will be detectable much longer (16). To limit the possibility of premature sample collection for antibody detection after positive SARS-CoV-2 PCR, only sensitivity measured in serological samples that were taken >14 DPO was considered reliable. However, even so, it cannot be completely ruled out that some confirmed patients did not develop any detectable immune response, thereby lowering the measured sensitivity (17). This could also partly explain the lower sensitivity in patient populations with mild infection, as their immune response is less intense than that of severe infections (18)(19)(20). The use of virus neutralization as a reference test aimed at determining the relationship between the outcomes of the routine serological assays and the presence of functional antibodies, i.e., neutralizing antibodies. However, this comparison was performed by only three laboratories in the country and yielded a data set that was too limited to draw firm conclusions.
One limitation of this study was that laboratories used their own protocols. Ideally, a joint national laboratory response is based on one common, standardized protocol for assay evaluation shared among the laboratories at the start of the outbreak. Use of a standardized protocol would greatly enhance the comparability between studies. Another limitation is that for some immunoassays, evaluation data from only a small number of samples were available. This limitation was due to (i) the fact that laboratories chose their own assays to evaluate and (ii) the period in the pandemic that certain assays became available, as some assays were scarce and/or laboratories had difficulty accessing positive sample material for mildly infected or asymptomatic patients. Because of these scarcities, a power analysis could not be performed before the study started, and all available samples and evaluation data were welcomed. Based on expert opinion, the results of test evaluations were considered reliable if at least 100 samples were used, as the width of the 95% confidence interval decreased with these sample numbers. The difficulty accessing positive sample material can be solved by, depending on the outbreak at hand, providing well-documented reference materials from a (virtual) national biobank.
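The effect of sample size on interval width can be illustrated with a short sketch. It assumes the Wilson score interval cited in the methods and a hypothetical observed proportion of 0.95. The function name and the chosen sample sizes are illustrative only, and the sketch does not reproduce the expert reasoning behind the 100-sample threshold.

```python
import math

def wilson_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Width (upper minus lower bound) of the Wilson score interval."""
    denom = 1 + z**2 / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return 2 * half

# Hypothetical observed proportion of 0.95 at several sample sizes.
for n in (20, 50, 100, 200):
    print(f"n = {n:3d}: 95% CI width = {wilson_width(0.95, n):.3f}")
```

At this proportion the interval narrows markedly between 20 and 100 samples, which is consistent with requiring a minimum sample number before treating an estimate as reliable.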
The evaluation of this specific activity of the national laboratory response to the emergence of SARS-CoV-2 showed that the collaborative effort was highly appreciated and directly informed decision making on implementation of diagnostic tests by individual laboratories. The consensus among the survey respondents was that the joint assay evaluation had increased the overall efficiency of the Dutch laboratory response. By sharing the weekly reports with the ECDC and WHO, the Dutch laboratories contributed to the EU and worldwide laboratory response to SARS-CoV-2 (8,14).
To conclude, the shared data generated by the joint approach in the Netherlands increased the efficiency of a nationwide laboratory response. First, it quickly confirmed the advice in the scientific brief from 8 April 2020 of the WHO that POC immunoassays should be used only for research purposes because of questionable performances (21). Second, there were concerns about quick implementation of immunoassays with unproven performance characteristics because there is considerable pressure on laboratories from the public and governments (22). Our data quickly gave laboratories an overview of the initial performances of the assays and enabled them to make a more evidence-based choice for quality assays. Third, these joint forces had a positive influence on the number of assays and samples that could be processed, despite the initial shortage of tests and samples from a variety of patient cohorts. Finally, this approach enabled laboratories to verify tests rather than evaluate and validate them extensively, preventing duplicate experiments and waste of budget in health care settings.
Many laboratories, Dutch governmental institutes, and public health institutes endorsed the value of the collaborative evaluations of immunoassays described here; therefore, this approach in which evaluation data are shared will be continued in the upcoming period. Currently, the international market of SARS-CoV-2 diagnostics also focuses on antigen tests, and the same collaborative approach is employed for these assays in the Netherlands.