Diagnostic and analytical performance evaluation of ten commercial assays for detecting SARS-CoV-2 humoral immune response

Objective Analytical validation of newly released SARS-CoV-2 antibody assays in the clinical laboratory is crucial to ensure sufficient performance with respect to their intended use. We aimed to assess the analytical and diagnostic performance of 8 (semi-)quantitative assays detecting anti-nucleocapsid IgG (Euroimmun, Id-vet) or total Ig (Roche), anti-spike protein IgG (Euroimmun, Theradiag, DiaSorin, Thermo Fisher) or both (Theradiag), and 2 rapid lateral flow assays (LFA) (AAZ-LMB and Theradiag). Methods Specificity was evaluated using a cross-reactivity panel of 85 pre-pandemic serum samples. Sensitivity was determined at both the manufacturer's cut-off and a 95% specificity cut-off, using 81 serum samples from patients with a positive rRT-PCR, and was analyzed as a function of time since symptom onset. Results Specificity for all assays ranged from 92.9% to 100% (Roche and Thermo Fisher), with the exception of the Theradiag IgM LFA (82.4%). Sensitivity in asymptomatic patients ranged between 41.7% and 58.3%. Sensitivity on samples taken <10 days after symptom onset was low (23.3%–66.7%) and increased on samples taken between 10 and 20 days and >20 days after symptom onset (80%–96% and 92.9%–100%, respectively). From 20 days after symptom onset, the Roche, Id-vet and Thermo Fisher assays all met the sensitivity (>95%) and specificity (>97%) targets determined by the WHO. Antibody signal response was significantly higher in the critically ill patient group. Conclusion Antibody detection can complement rRT-PCR for the diagnosis of COVID-19, especially in the later stage of the disease, or in asymptomatic patients for epidemiological purposes. Addition of IgM in LFAs did not improve sensitivity.


Introduction
In December 2019, several cases of pneumonia of unknown cause occurred in Wuhan, Hubei Province, China. On January 7, 2020, a novel betacoronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was isolated from the patients in Wuhan (Wang et al., 2020a). This virus is responsible for a viral pneumonia called coronavirus disease 2019 (COVID-19) (Lu et al., 2020). Although most people with COVID-19 have mild to moderate symptoms, the disease can cause severe medical complications such as acute respiratory distress syndrome, septic shock, bleeding and coagulation disorders, and can lead to death in predisposed people. Due to a combination of high human-to-human transmissibility, absence of natural immunity in the population and extensive international traffic, the virus has quickly spread around the world and evolved into a global pandemic (World Health Organization, 2020a; Worobey et al., 2020). At the time of writing, the virus is disrupting society globally. Therefore, rapid and correct identification of the virus is crucial, not only for the diagnosis of COVID-19 and subsequent correct treatment, but also to take the necessary isolation precautions and thereby avoid further spread. In addition, vaccination will probably soon become a possible means of reducing the spread of the virus by evoking humoral immunity in vaccinated people.
The current gold standard for the diagnosis of COVID-19 is the detection of viral RNA in respiratory tract samples with real-time reverse transcriptase-polymerase chain reaction (rRT-PCR) targeting SARS-CoV-2 specific sequences coding for spike (S), envelope (E), or nucleocapsid (N) proteins (Wang et al., 2020b; Chu et al., 2020; World Health Organization, 2020b; Bohn et al., 2020). rRT-PCR is highly sensitive and specific, especially in the acute phase of the infection (Infantino et al., 2020; Tang et al., 2020). The sensitivity of the PCR test depends on the time of sample collection in relation to the diagnostic testing window. Negative PCR results may be caused by an extremely low viral load when testing is done shortly after exposure or at late stages of infection. Maximum viral load in throat swabs was observed from 2 days before until 5 days after symptom onset (Wölfel et al., 2020; Kampf et al., 2020). The median [interquartile range] period between symptom onset and a negative rRT-PCR result has been reported to be 20 [17-24] days. Furthermore, a higher viral load and a longer mean duration of viral detection in respiratory samples correlate with disease severity (Clementi et al., n.d.). rRT-PCR can be falsely negative due to pre-analytical issues such as the sample collection technique. As expected, bronchoalveolar lavage (BAL) and sputum samples have been shown to contain a higher viral load, and thus to remain positive for a longer time, compared to nasopharyngeal, nose or throat samples (Wang et al., 2020b). However, the Centers for Disease Control and Prevention (CDC) recommends using upper respiratory specimens for initial diagnostic testing, for logistical purposes and to limit invasive sampling procedures (Centers for Disease Control and Prevention, 2020).
The nasopharyngeal swab (NP) is currently proposed as the gold standard sample for detection of SARS-CoV-2 due to the higher sensitivity for the detection of SARS-CoV-2 compared to oropharyngeal swabs and saliva samples (Wang et al., 2020b), restricting the latter sample types to specific screening strategies (Williams et al., 2020).
Serological assays have the potential to play a complementary role in the diagnosis of rRT-PCR-negative COVID-19 cases (Xiang et al., 2020). Seroconversion for SARS-CoV-2 is typically detected between 7 and 14 days post symptom onset (Guo et al., 2020;Burbelo et al., 2020;Long et al., 2020a). Among the four SARS-CoV-2 structural proteins, the spike (S) and nucleocapsid (N) proteins are the most immunogenic (Meyer et al., 2014;Qiu et al., 2005). Different types of tests are available to detect anti-SARS-CoV-2 antibodies: rapid lateral flow assays (LFA) as point of care tests, enzyme-linked immunosorbent assays (ELISA) and automated immunoassays. The contribution of serological assays to seroprevalence studies and evaluation of the results of vaccine trials is currently under debate (Okba et al., 2020a).
The aim of this study is to compare the diagnostic performance of ten commercial SARS-CoV-2 antibody assays: five ELISAs, one fluoroenzyme immunoassay (FEIA), two rapid LFAs, and two chemiluminescence immunoassays (CLIA) (Supplementary Material 1). The study was performed in co-operation with the Belgian Federal Agency for Medicines and Health Products (FAMHP), which had set up a validation scheme for serological SARS-CoV-2 assays whereby positively evaluated laboratory assays are reimbursed by the national health insurance.

Patient selection
This retrospective study was performed using 166 patient samples collected at the OLV Hospital Aalst, Belgium.
Specificity was assessed on a selection of 85 serum samples from unique patients, collected before the COVID-19 pandemic, from March 2017 to March 2020. This cross-reactivity panel consisted of a) 35 samples from patients with an rRT-PCR confirmed non-coronavirus respiratory pathogen infection, b) 19 samples from patients with an rRT-PCR confirmed non-SARS-CoV-2 coronavirus infection, c) 10 samples from patients with a confirmed systemic auto-immune rheumatic disease and d) 21 samples from patients with antibodies against other viral/bacterial/parasitic pathogens. A detailed description of this specificity cohort is given together with the results in Table 3.
Sensitivity was assessed on a selection of 81 serum samples from 77 patients with an rRT-PCR confirmed SARS-CoV-2 infection on nasopharyngeal swab. rRT-PCR was performed using an in-house method complying with the WHO guidelines. The time between symptom onset and sampling date was a) less than 10 days (n = 30), b) between 10 and 20 days (n = 25) and c) more than 20 days (n = 14). In addition, 12 samples from asymptomatic patients were included. The median time between symptom onset and serum sampling was 11 days (range 1-51). The group consisted of 53 male and 28 female patients with a median age of 66 years (range 17-97). Of note, in case of multiple samples per patient, only the first sample per time category was used to assess sensitivity.
All samples were stored at −20 °C until analysis.

Data collection
The protocol was approved by the local Ethics Committee OLV Hospital Aalst with Belgian registration number B126202000015. For all COVID-19 patients, disease severity status was collected. Patients were classified as a) "mild" if no hospital admission was required, b) "moderate" in case of admission to a non-ICU ward, c) "critical" in case of admission to the ICU or death and d) "asymptomatic". Serum samples from immunosuppressed patients (hematological malignancies, solid organ transplant) and from patients younger than one year were excluded from both the specificity and the sensitivity data sets.

Assays
Four new ELISAs, one FEIA and two new rapid LFAs were evaluated and compared to one established ELISA and two established CLIAs.
The established ELISA was Anti-SARS-CoV-2 (Euroimmun, Germany) targeting IgG anti-S antibodies (abbreviation: EI-S). The established CLIAs were LIAISON SARS-CoV-2 S1/S2 IgG (DiaSorin S.P.A., Italy) targeting IgG anti-S antibodies (abbreviation: DS-S) and Elecsys Anti-SARS-CoV-2 (Roche, Germany) targeting total Ig anti-N antibodies (abbreviation: R-N). A detailed description of the different assays, including the type of analyzer used, is provided in Supplementary Material 1. The samples were analyzed in batch under controlled pre-analytical conditions by the laboratory of OLV Hospital Aalst, according to the instructions of the collaborating companies.

Performance measures and statistical analysis
Analytical performance of each assay was assessed by calculating imprecision (coefficient of variation (CV), %) using the manufacturer's internal quality control materials (iQC) and three patient serum samples with a low, medium and high SARS-CoV-2 Ab concentration. All iQC samples were measured before and after every run during 10 runs (CLSI EP5-A2) (Clinical and Laboratory Standard Institute (CLSI), 2004). Linearity was assessed by diluting a high level serum SARS-CoV-2 Ab sample with increasing amounts of a serum sample with very low levels of SARS-CoV-2 Ab (CLSI EP06-A) (Clinical and Laboratory Standard Institute (CLSI), 2003).
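As an illustration of the imprecision measure described above, the following minimal sketch computes a coefficient of variation from replicate results. It is a simplified total CV (sample standard deviation divided by the mean), not the within-run and between-run decomposition of CLSI EP5-A2; the function name `cv_percent` and the replicate values are hypothetical and not taken from the study.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m

# Hypothetical replicate results (arbitrary units) for one control level.
replicates = [1.00, 1.10, 0.90, 1.05, 0.95]
print(f"CV = {cv_percent(replicates):.2f}%")
```

A sample diluted before measurement would be expected to show a somewhat higher CV than a ready-to-use control, because the dilution step adds its own variability.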
Diagnostic performance characteristics (sensitivity, specificity) were calculated for every SARS-CoV-2 antibody assay based on the manufacturer's cut-off and compared with the McNemar test. For calculation of performance characteristics, borderline results were considered positive. For (semi-)quantitative assays, receiver operating characteristic (ROC) curve analyses were performed to verify company cut-off values. Cut-off values at the 95% specificity level were determined, and the corresponding sensitivity for diagnosing COVID-19 was compared between the different assays. Finally, a Box and Whisker analysis was performed to compare the different disease severity patient cohorts. Quantitative variables are presented as median and range, and categorical variables as number and percentage or frequency. Data analysis was performed in MEDCALC® Statistical Software version 17.1 (MedCalc Software Ltd., Ostend, Belgium), except for ROC curves, which were generated in Microsoft Excel + Analyse-it® Software version 5.65.3 (Leeds, UK). A p-value <0.05 was considered statistically significant.
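The calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented in the statistical packages used: a sample is called positive when its signal exceeds the cut-off, the 95% specificity cut-off is taken as the lowest observed negative-sample signal that reaches the target, and the McNemar test is computed in its exact binomial form. All function names and all signal values are hypothetical.

```python
from math import comb

def sensitivity(pos_signals, cutoff):
    """Fraction of rRT-PCR-positive samples called positive (signal > cutoff)."""
    return sum(s > cutoff for s in pos_signals) / len(pos_signals)

def specificity(neg_signals, cutoff):
    """Fraction of pre-pandemic samples called negative (signal <= cutoff)."""
    return sum(s <= cutoff for s in neg_signals) / len(neg_signals)

def cutoff_at_specificity(neg_signals, target=0.95):
    """Lowest observed negative-sample signal that reaches the target specificity."""
    for c in sorted(set(neg_signals)):
        if specificity(neg_signals, c) >= target:
            return c
    raise ValueError("target specificity must not exceed 1.0")

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical signals (arbitrary units), not study data.
neg = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6,
       0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.2, 2.5]
pos = [0.5, 1.1, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

cut = cutoff_at_specificity(neg, 0.95)
print("cut-off:", cut)                          # 1.2
print("specificity:", specificity(neg, cut))    # 0.95
print("sensitivity:", sensitivity(pos, cut))    # 0.8

# Paired comparison of two assays on the same samples: b samples positive
# only in assay A, c samples positive only in assay B (hypothetical counts).
print("McNemar p:", mcnemar_exact(1, 7))        # 0.0703125
```

Fixing every assay at the same specificity in this way makes the resulting sensitivities directly comparable, which is not the case when each manufacturer's own cut-off is used.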

Patient demographics
An overview of the demographic features of the different patient cohorts is shown in Supplementary Material 2a-d. In general, we observed no significant difference in gender distribution between the sensitivity and specificity patient cohorts (p = 0.1174), but the sensitivity patient group was significantly older (p < 0.0001).

Imprecision
Results of the imprecision study are presented in Supplementary Material 3. For the Euroimmun ELISAs (EI-S & EI-N), the imprecision obtained for the patient sample iQC was higher than for the kit iQC. This can be explained by the fact that the kit iQCs are prediluted, so their imprecision results did not include a predilution step. This was not the case for the ELISAs of Theradiag (TD-S & TD-SN) and Id-vet (Id-N), which showed comparable imprecision results for the kit and patient sample iQC. Assays based on the ELISA format showed the highest CV% results.

Linearity
No deviation from linearity was revealed for any of the assays, which is illustrated in Supplementary Material 4a-h. The lower results for TD-S are related to imprecision rather than to non-linearity.

Sensitivity cohort
The evaluated SARS-CoV-2 antibody assays showed a significant difference in diagnostic performance on the 81 serum samples selected from patients with an rRT-PCR confirmed SARS-CoV-2 infection. Data of all (semi-)quantitative assays are shown in Table 1 and the corresponding receiver operating characteristic (ROC) curves in Fig. 1. The observed differences are mainly related to the diagnostic performance in the early phase of antibody detection. The diagnostic performance of the antibody tests increased with the time elapsed since symptom onset: the longer this period, the higher the diagnostic performance of all antibody tests and, consequently, the smaller the differences in diagnostic performance between tests. In addition, assays using a recombinant N-antigen generally showed higher sensitivities than those targeting the S-antigen, although specificity results were generally comparable. Based on these results, the R-N assay showed the best diagnostic performance characteristics.
Equally, all data on diagnostic performance characteristics of the rapid LFA are shown in Table 2. For all assays, Box and Whisker analysis revealed significantly higher antibody results in the 'critically ill' patient cohort (n = 33) compared to the 'moderately ill' cohort (n = 33) (Supplementary Material 5; for all assays p < 0.05). However, the proportion of samples collected ≥20 days after symptom onset was significantly higher in the 'critically ill' patient cohort (36% versus 3%; p = 0.0008), which could contribute to the higher antibody levels. No significant difference was observed for any assay when comparing antibody results between the 'asymptomatic' (n = 12) and 'mildly ill' (n = 3) cohorts, or between the 'mildly ill' and 'moderately ill' cohorts (p > 0.05).

Discussion
Since the start of the COVID-19 pandemic, an increasing number of serological SARS-CoV-2 assays have been introduced to the diagnostic market (Deeks et al., 2020). The expertise of laboratory professionals is critical in the validation of these diagnostic assays to ensure sufficient analytical performance with respect to the intended use (Vermeersch et al., 2020).

seroconversion for IgM and IgG (Long et al., 2020a; Zhao et al., 2020). Antibodies against the N protein are reported to appear earlier in infection than those against the S protein (Grzelak et al., 2020). In the subgroup of patients with <10 days of symptoms and in the asymptomatic cohort, we also found higher sensitivities for the N-based assays (46.7-50.0% and 50.0-58.3%, respectively) than for the S-based assays (23.3-36.7% and 41.7-50%, respectively), with significantly different areas under the receiver operating characteristic (ROC) curve (AUC) between some of the N- and S-protein based assays. However, the antigen source was clearly not the only factor contributing to diagnostic sensitivity (Table 1). Our data are in concordance with other head-to-head SARS-CoV-2 antibody comparison studies (Lassaunière et al., 2020; Van Elslande et al., 2020; National SARS-CoV-2 Serology Assay Evaluation Group, 2020; Pieri et al., 2020; Charpentier et al., 2020; Herroelen et al., 2020; Perkmann et al., 2020). When compared at the same level of specificity (95%), R-N showed the best overall sensitivity (84.0% [74.1-91.2]) and DS-S the lowest (61.7% [50.3-72.3]). For symptomatic patients, all tests except TD-S reached a sensitivity of 100% at ≥20 days after symptom onset (Table 1). At this time point, there were no significant differences in AUC between the serological tests (Fig. 1).
The sensitivity in the asymptomatic cohort was significantly lower than the overall sensitivity. Importantly, 9 of the 12 samples were taken <10 days after a positive rRT-PCR, and 4 of those 9 serum samples tested negative in all antibody assays. Most likely, the lower sensitivity can be attributed to early infection or to a difference in Ab kinetics, as described earlier in this patient category (Jiang et al., 2020). In the study of Jiang and colleagues, IgG/IgM titers and plasma neutralisation capacity were, at the time of virus clearance, significantly lower in recovered asymptomatic than in recovered symptomatic patients. Given that a major part of asymptomatic and pauci-symptomatic patients is not even tested for viral RNA, serology ultimately offers the greatest potential to understand the true scale of SARS-CoV-2 infections. The persistence of the SARS-CoV-2 specific antibody response is still under investigation. IgG levels and neutralizing antibodies have been observed to start decreasing within 2-3 months after infection in a high proportion of individuals who recovered from SARS-CoV-2 infection, especially in asymptomatic patients: 40.0% (12/30) of asymptomatic individuals, but only 12.9% (4/31) of symptomatic individuals, became seronegative for IgG eight weeks after hospital discharge (Long et al., 2020b).
The added value of SARS-CoV-2 antibody detection for the diagnosis of COVID-19 in rRT-PCR negative patients presenting in the late stage of the disease is well known (Long et al., 2020a). As shown in Table 1, antibody detection has a high sensitivity from 10 days after symptom onset onwards. In this regard, serology can potentially offer added value in patients with a single respiratory sample with low viral load (cycle threshold ≥32), to distinguish acute from past infection. At this stage of the pandemic, many people have already had a SARS-CoV-2 infection, some without knowing. As SARS-CoV-2 RNA remains detectable for several weeks to months after an infection, this can result in unexpected positive rRT-PCR results (with low viral load) at routine (pre-)admission screening of patients. In the absence of respiratory symptoms, the presence of antibodies in combination with a low viral load on rRT-PCR is highly suggestive of a late stage of the disease or a past infection. The patient can then be considered non-contagious and there is no need for specific isolation precautions. When the antibody test is negative, the high cycle threshold most likely indicates a very recent infection, and thus an infectious patient. In that case, a new respiratory sample for the detection of SARS-CoV-2 RNA is warranted to confirm a recent onset of infection. The potential role of serology in monitoring the immune response after vaccination is a further topic of research.
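The bracketed intervals reported with the sensitivities above are 95% confidence intervals. As a hedged illustration of how such an interval can be obtained for a proportion, the sketch below computes an exact (Clopper-Pearson) binomial interval by bisection. Two points are assumptions: the text does not state which interval method was used, and the count of 68 positives out of 81 samples is inferred from the reported 84.0% rather than given explicitly. The function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect(f, lo=0.0, hi=1.0, steps=100):
    """Root of f on [lo, hi], assuming f > 0 left of the root and f < 0 right of it."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes out of n."""
    # Lower limit: p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper limit: p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

# 68 of 81 is an inferred count (84.0%), used here only for illustration.
low, high = clopper_pearson(68, 81)
print(f"{100 * 68 / 81:.1f}% [{100 * low:.1f}-{100 * high:.1f}]")
```

With subgroups as small as n = 12 or n = 14, intervals of this kind become wide, which is worth keeping in mind when comparing assays within a single time category.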
based assays have been shown to be more specific compared to assays using full viral antigens (Okba et al., 2020b; Amanat et al., 2020) (Supplementary Material 1). Overall specificity in the samples collected prior to the pandemic (n = 85) ranged from 82.4% (LFA TD-S) to 100% (R-N & TF-S) (Table 1). Cross-reactivity is mainly attributed to antigens that are well conserved among different coronaviruses and to cross-reaction with antibodies associated with autoimmune diseases (Wang et al., 2004). When using antibody assays at the population level, a high specificity is of utmost importance, as even a small drop in specificity will seriously reduce the positive predictive value when prevalence is low (Galli and Plebani, 2020). In the future, if antibodies prove to be protective, false positive results could also have an important impact at the individual patient level if these results are used to decide whether or not to administer (re)vaccination or to use personal protective equipment.
In accordance with preceding studies (Long et al., 2020a; Okba et al., 2020a), we found that all (semi-)quantitative assays yielded significantly (p < 0.05) higher antibody levels in the 'critically ill' patient cohort compared to the 'moderately ill' cohort. However, the proportion of samples collected ≥20 days after symptom onset was also significantly higher in the 'critically ill' patient cohort (36% versus 3%; p = 0.0008), which could contribute to the higher antibody levels. Nevertheless, our observations are in line with earlier findings that antibody levels are associated with disease severity (Gudbjartsson et al., 2020).
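The dependence of the positive predictive value (PPV) on specificity follows directly from Bayes' rule and can be illustrated with a short sketch. The sensitivity, specificity and seroprevalence values below are hypothetical round numbers chosen for illustration, not results of this study, and the function name `ppv` is likewise an assumption.

```python
def ppv(sens, spec, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical scenario: 95% sensitivity and 5% seroprevalence.
for spec in (0.99, 0.97, 0.95):
    print(f"specificity {spec:.0%}: PPV = {ppv(0.95, spec, 0.05):.1%}")
```

In this hypothetical low-prevalence setting, lowering specificity from 99% to 95% reduces the PPV from roughly 83% to 50%, so that half of all positive results would be false positives.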
A strength of our study is the parallel evaluation of the diagnostic performance of several new serologic SARS-CoV-2 assays and assays with established diagnostic performance. Furthermore, we have performed a separate diagnostic performance analysis in asymptomatic people. In this subgroup, sensitivity proved to be lower than the overall sensitivity obtained for the different assays, as mentioned above. This is not surprising, taking into account the earlier mentioned difference in Ab kinetics in the asymptomatic population. A limitation of our study is that the samples used to evaluate specificity were all challenging. We thus expect a higher specificity in a routine laboratory setting. Another limitation is the limited sample size, which results in a small number of cases in the subgroup analyses concerning timing after symptom onset and severity of symptoms. Finally, the categorization of the patient cohorts as "mild", "moderate" or "critical" was based only on whether or not the patient was admitted to the hospital or intensive care unit. Information on the duration and severity of symptoms of individual cases is lacking, due to the retrospective design of this study.
Table 3 Overview of the cross reactivity of every SARS-CoV-2 antibody assay for the total specificity cohort.
We can conclude that, in this study, the R-N serological assay revealed the best overall performance. However, for the intended use of antibody detection (>20 days after symptom onset), the R-N, Id-N and TF-S assays all met the sensitivity (95-98%) and specificity (97-99%) targets determined by the WHO (World Health Organization, 2020c).

Funding
No funding was received for conducting this study.

Data availability
Data will be available from the author upon request.

Ethics approval
The protocol was approved by the local Ethics Committee OLV Hospital Aalst with Belgian registration number B126202000015. The procedures used in this study adhere to the tenets of the Declaration of Helsinki.

Authors' contributions
MM, EVH, LC, AB and LVH contributed to the study conceptualization. Data curation and project administration were performed by MM, EVH, LN, SVDB and LH. Formal data analysis was performed by LVH. MM, EVH, LN and LVH wrote the original draft. Review and editing of the manuscript were performed by MM, EVH, LN, LC, AB and LVH. The final manuscript was read and approved by all authors.

Declaration of Competing Interest
AB and LVH have been consultants for Thermo Fisher Scientific.

Acknowledgments
We thank Euroimmun, Id-vet, AAZ-LMB, Theradiag and Thermo Fisher Scientific for the donation of the assays. We are very grateful to the laboratory technicians for their most appreciated efforts.