SARS-CoV-2 serology: Validation of high-throughput chemiluminescent immunoassay (CLIA) platforms and a field study in British Columbia

Background: SARS-CoV-2 antibody testing is required for estimating population seroprevalence and for vaccine response studies. It may also increase case identification when used as an adjunct to routine molecular testing. We performed a validation study and evaluated automated high-throughput assays in a field study of COVID-19-affected care facilities.

Methods: Six automated assays were assessed: 1) DiaSorin LIAISON™ SARS-CoV-2 S1/S2 IgG; 2) Abbott ARCHITECT™ SARS-CoV-2 IgG; 3) Ortho VITROS™ Anti-SARS-CoV-2 Total; 4) Ortho VITROS™ Anti-SARS-CoV-2 IgG; 5) Siemens SARS-CoV-2 Total Assay; and 6) Roche Elecsys™ Anti-SARS-CoV-2. The validation study included 107 samples (42 known positive; 65 presumed negative). The field study included 296 samples (92 PCR positive; 204 PCR negative or not PCR tested). All samples were tested on all six assays.

Results: All assays had sensitivities >90% in the field study; in the validation study, 5/6 assays were >90% sensitive and DiaSorin was 79% sensitive. Specificities and negative predictive values were >95% for all assays. In the field study, estimated positive predictive values (PPVs) at 1–10% disease prevalence were 100% for Siemens, Abbott and Roche, while the DiaSorin and Ortho assays had lower PPVs at 1% prevalence that increased at 5–10% prevalence. In the field study, adding serology increased diagnoses by 16% compared with PCR testing alone.

Conclusions: All assays evaluated in this study demonstrated high sensitivity and specificity for samples collected at least 14 days after symptom onset, while sensitivity was variable 0–14 days after infection. Adding serology to the outbreak investigations increased case detection by 16%.


Introduction
Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), a member of the Betacoronavirus genus of the Coronaviridae family and the causative agent of COVID-19, has dominated international attention since its discovery in December 2019 and subsequent rapid spread. The virus exhibits varying degrees of similarity in structural and functional proteins with other Betacoronaviruses [1].
for false-negative molecular test results [3], and global shortages of molecular diagnostic reagents made it clear that other testing modalities, such as serology, are necessary to help estimate the true spread of this virus through populations. Furthermore, with multiple COVID-19 vaccines currently deployed worldwide and additional vaccine candidates in various stages of clinical trials, tests that accurately determine vaccine-induced seroconversion and differentiate between natural and vaccine-induced immunity will be necessary. The number of SARS-CoV-2 serological tests has grown rapidly, but their performance varies [4]. To be useful, these tests must demonstrate high performance not only in validation studies but also in clinical and epidemiological settings.
We undertook a multi-site laboratory validation study of six high-throughput SARS-CoV-2 chemiluminescent immunoassays (CLIAs) and subsequently evaluated the same assays in a serosurvey at two healthcare facilities affected by COVID-19 outbreaks.

Methods
Six high-throughput chemiluminescent immunoassay (CLIA) platforms were evaluated: 1) LIAISON™ SARS-CoV-2 S1/S2 IgG (DiaSorin IgG; DiaSorin, Italy); 2) ARCHITECT™ SARS-CoV-2 IgG (Abbott IgG; Abbott, USA); 3) VITROS™ Anti-SARS-CoV-2 Total (Ortho T) and 4) VITROS™ Anti-SARS-CoV-2 IgG (Ortho IgG; Ortho Clinical Diagnostics, USA); 5) SARS-CoV-2 Total Assay (Siemens T; Siemens, USA); and 6) Elecsys™ Anti-SARS-CoV-2 (Roche T; Roche, USA). Table 1 lists the SARS-CoV-2 viral antigen targets and performance characteristics claimed by each manufacturer. All of the serology platforms provide a semi-quantitative signal intensity, which is translated into a categorical reactive or non-reactive result. Signal-to-cut-off ratios for categorical interpretation are platform-specific (Supplementary Table 2). Reactive test results were examined for any association of semi-quantitative test signals with age and sex. Where signals fell outside the dynamic range of the platform (i.e., "above maximum" or "below minimum" signal), the upper or lower limit of the dynamic range was used as a proxy. Samples reactive on at least 5/6 platforms (consensus reactive) or non-reactive on at least 5/6 platforms (consensus non-reactive) were considered true positive and true negative serologic results, respectively.
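The two data-handling rules above (substituting the dynamic-range limit for out-of-range signals, and the at-least-5-of-6 consensus call) can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function names are hypothetical and the range limits passed in would be platform-specific values not given here.

```python
from typing import Sequence


def clamp_signal(signal: float, lower: float, upper: float) -> float:
    """Replace an out-of-range signal with the nearest limit of the dynamic range.

    `lower` and `upper` are hypothetical, platform-specific range limits.
    """
    return max(lower, min(signal, upper))


def consensus_call(reactive: Sequence[bool], threshold: int = 5) -> str:
    """Classify one sample from its categorical results on the six platforms.

    Returns "positive" if at least `threshold` platforms are reactive,
    "negative" if at least `threshold` are non-reactive, else "non-consensus".
    """
    n_reactive = sum(reactive)
    n_nonreactive = len(reactive) - n_reactive
    if n_reactive >= threshold:
        return "positive"
    if n_nonreactive >= threshold:
        return "negative"
    return "non-consensus"
```

With six platforms, a sample reactive on exactly four assays falls into neither consensus group, which matches how the field study treats its four non-consensus known-positive samples.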
Testing was performed in accordance with manufacturers' recommendations. A combination of In Vitro Diagnostic and Research Use Only test kits was used.

Validation using characterized samples
The validation panel consisted of 107 serum or plasma specimens. "Known positive" samples (n = 42) were from 37 COVID-19 patients previously diagnosed by PCR testing and were collected at different time points after symptom onset. One patient had two samples >14 days; three patients had one 0-14 and one >14 days; and one patient had one 0-14 and two >14 days. Most samples collected 0-14 days post-onset were from hospitalized patients, while those collected >14 days were from patients who were outpatients at the time of collection, with no information available on their hospitalization history or clinical course (Table 2a). All >14 days samples were collected within 3 months of either PCR-based diagnosis or symptom onset. Presumed negative samples (n = 65) were leftover frozen serum or plasma samples obtained prior to November 2019: 51 from prenatal and organ donor testing, which accounts for the higher proportion of females and younger ages in the panel; and 14 potentially cross-reactive samples (nine known to be serologically positive for another pathogen, and five with a PCR-confirmed positive result for another respiratory pathogen within 12 months of serum collection; Table 2b). All CLIA platforms were evaluated with the same set of samples to facilitate comparability of results.

COVID-19 outbreak field study
For the serosurvey of facilities affected by COVID-19 outbreaks, a total of 296 serum samples were collected from consenting residents and staff as part of a Public Health investigation. All samples were tested on the six CLIA platforms, and results were compared against each participant's COVID-19 status: known positive if the participant had ever tested PCR positive for SARS-CoV-2 and the sample was collected at least 14 days and less than 2 months post-onset (n = 92); and unknown if PCR negative or never PCR tested (n = 204). Specificity was estimated using consensus non-reactive serologic results as true negatives.
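One way to read the field-study reference standards is sketched below: sensitivity against PCR-confirmed participants, and specificity against consensus non-reactive samples. Restricting the specificity denominator to participants of unknown status is an assumption made here for illustration; the `Sample` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    pcr_positive: bool          # True for the "known positive" group
    results: dict[str, bool]    # platform name -> reactive?


def is_consensus_nonreactive(results: dict[str, bool], threshold: int = 5) -> bool:
    """At least `threshold` platforms non-reactive."""
    return sum(not r for r in results.values()) >= threshold


def field_sensitivity(samples: list[Sample], platform: str) -> float:
    """Fraction of PCR-confirmed participants reactive on `platform`."""
    positives = [s for s in samples if s.pcr_positive]
    return sum(s.results[platform] for s in positives) / len(positives)


def field_specificity(samples: list[Sample], platform: str, threshold: int = 5) -> float:
    """Fraction of consensus non-reactive samples (unknown-status participants
    only, an assumption) that are also non-reactive on `platform`."""
    negatives = [
        s for s in samples
        if not s.pcr_positive and is_consensus_nonreactive(s.results, threshold)
    ]
    return sum(not s.results[platform] for s in negatives) / len(negatives)
```

Note that a consensus-reactive participant of unknown status is excluded from the specificity denominator under this reading; such participants are the additional serology-detected cases.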

Statistical analysis
Ninety-five percent confidence intervals (95% CI) were calculated for overall agreement, kappa statistic, sensitivity, and specificity. Sensitivities and specificities were compared using McNemar's test, and p values <0.05 were deemed statistically significant. For each platform, median reactive test signals were compared between males and females, and between consensus and non-consensus reactive results, using the Mann-Whitney test. Pearson correlation was used to assess the relationship between test signals and age.
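The paired comparisons and agreement statistics above can be sketched with the standard library. The article does not state which McNemar variant or confidence-interval method was used, so the exact binomial McNemar test and the Wilson score interval below are assumptions. In the 2×2 table for two assays, `a` = both reactive, `b` = only the first reactive, `c` = only the second reactive, `d` = both non-reactive.

```python
import math


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p value from the discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to the smaller discordant count
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two assays' categorical results."""
    n = a + b + c + d
    p_obs = (a + d) / n  # overall (observed) agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    if p_exp == 1:
        return float("nan")  # kappa undefined when both assays give one category only
    return (p_obs - p_exp) / (1 - p_exp)


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion such as sensitivity or specificity."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half
```

With few discordant pairs the exact test cannot reach small p values: four discordant pairs all in one direction give exactly p = 0.125, so small validation subgroups have limited power to show differences.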

Ethics
The study was conducted under the BC Centre for Disease Control's legislated mandate for outbreak investigation. Ethics approval for the study was also obtained from the University of British Columbia Clinical Research Ethics Board (approval #H20-01090).

Validation study
Sensitivity, specificity and estimated positive (PPV) and negative (NPV) predictive values for each assay are shown in Table 3. DiaSorin IgG showed lower sensitivity than the other assays for samples collected 0-14 days post-onset (60% vs. 90-100%), but the difference was not statistically significant (all McNemar's p values ≥0.125). Overall sensitivity was also lowest for DiaSorin IgG (78.6%) and was highest for Ortho T (100%). Specificities were high for all assays (range 95.4% to 100%). Overall sensitivities and specificities did not differ significantly between any of the assays (all McNemar's p values ≥0.09). PPVs for DiaSorin IgG, Ortho IgG and Ortho T were lower than for the other assays, especially in lower prevalence scenarios. NPVs were high for all assays (range 97.6% to 100%).
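The dependence of predictive values on prevalence follows from Bayes' theorem. The sketch below computes PPV and NPV from sensitivity, specificity and an assumed prevalence; the example characteristics (95% sensitivity, 98% specificity) are illustrative values only, not figures from Table 3.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(infected | reactive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value: P(not infected | non-reactive)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical assay characteristics, chosen only to show the prevalence effect
    for prev in (0.01, 0.05, 0.10):
        print(f"prevalence {prev:.0%}: "
              f"PPV {ppv(0.95, 0.98, prev):.1%}, NPV {npv(0.95, 0.98, prev):.2%}")
```

With these illustrative values, PPV rises from about 32% at 1% prevalence to about 84% at 10%, while an assay with 100% specificity has a PPV of 100% at any prevalence. This is why assays with even a few false-reactive results have noticeably lower PPVs in low-prevalence scenarios.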
None of the three serum samples collected from patients previously diagnosed with endemic non-SARS-CoV-2 coronaviruses showed cross-reactivity on any of the assays. Of the endemic coronaviruses, HKU1 is a Betacoronavirus with higher homology to SARS-CoV-2, while NL63 and 229E are Alphacoronaviruses with lower homology. DiaSorin IgG was reactive for three presumed negative prenatal samples, and both Ortho T and Ortho IgG were reactive with one syphilis-positive sample (rapid plasma reagin [RPR] titer 1:32). These were presumed to be falsely reactive SARS-CoV-2 tests. No reactive results were observed for the Siemens T, Abbott IgG and Roche T assays for any of the presumed negative samples.
Overall assay agreements and kappa statistics are shown in Supplementary Table 1; overall agreement exceeded 90% for all assay pairs except DiaSorin IgG vs. Roche T (88.8%). The highest agreement was between Abbott IgG and Roche T (98.1%), both of which target the nucleocapsid antigen.

Outbreak field study
Of 296 samples in the outbreak investigation, 92 were from confirmed SARS-CoV-2 PCR-positive participants, and 204 were from participants who tested PCR negative or were not PCR tested. CLIA sensitivities for the PCR-confirmed participants ranged from 95.7% to 98.9% and were not significantly different (Table 4). All but four samples had consensus reactive CLIA results: 84/92 samples were reactive on all six assays and 4/92 on five. Of the four non-consensus results, three were reactive on four assays and one on three. Sensitivities in the field study were similar to those in the validation study for samples collected >14 days post-onset, except for DiaSorin IgG, which had higher sensitivity in the field study (95.7% vs. 78.6%). Estimated specificities of the assays ranged from 96.8% to 98.9%. Reactive results deemed to be false positive appeared to be randomly distributed among all assays. There were no significant differences in sensitivity or specificity between any of the assays using outbreak samples (all McNemar's p values ≥0.4).
There were 15 participants of initially unknown status who were consensus reactive (12 reactive on all six assays and 3 on five). Thus, adding serologic testing to the outbreak investigation identified 16.3% (15/92) more cases than PCR testing alone.
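The 16.3% figure uses the number of PCR-confirmed participants as its denominator. The same counts can also be expressed as the share of all identified cases (PCR-confirmed plus serology-only) found by serology alone; this second quantity is simple derived arithmetic, not a figure reported by the study.

```latex
\[
\text{additional yield relative to PCR}
  = \frac{\text{consensus-reactive participants of unknown status}}{\text{PCR-confirmed participants}}
  = \frac{15}{92} \approx 0.163 = 16.3\%
\]
\[
\text{serology-only share of all identified cases}
  = \frac{15}{92 + 15} = \frac{15}{107} \approx 0.140 = 14.0\%
\]
```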

Field study signal intensities
Median signal intensities did not differ significantly between males and females for any of the assays (Supplementary Table 2). In addition, there was no strong correlation between signal intensity and age for PCR-positive participants (data not shown).
Median signal intensities for consensus reactive results were 2- to 14-fold higher than for non-consensus reactive results; the differences were statistically significant for the Ortho T, Ortho IgG, Abbott IgG and Roche T assays (Supplementary Table 3). This suggests that falsely reactive results on a given platform tend to have lower signals. However, similarly low signals also occurred on all the assays among samples with consensus reactive results, so signal intensity alone is not a useful criterion for judging the likelihood that a reactive test is false.
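The consensus vs. non-consensus signal comparison can be sketched as a fold difference in medians plus a Mann-Whitney U test. This standard-library version is an assumption for illustration (the software used in the study is not stated): it assigns midranks to ties and uses a normal approximation without tie or continuity correction, which is crude for small groups such as the handful of non-consensus reactive results; exact methods are preferable there.

```python
import math
from statistics import median


def midranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U for x, two-sided p) using a normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def median_fold(consensus: list[float], non_consensus: list[float]) -> float:
    """Fold difference in median signal (consensus / non-consensus)."""
    return median(consensus) / median(non_consensus)
```

Because signals clamped at the top of the dynamic range become ties, heavy clamping (as noted for Siemens T in the Discussion) compresses the ranks and weakens this kind of comparison.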

Discussion
This study demonstrated high sensitivity for all the SARS-CoV-2 assays we assessed, although some differences were observed. In the validation study, for samples collected >14 days post-onset, sensitivity was >90% for all assays except DiaSorin IgG, whose sensitivity was lower than that of the other assays, though not significantly so. Charlton et al. [5] also reported low sensitivity (48%) for this assay at 0-14 days, which increased to 73% by days 15-21. Low sensitivity early in infection could result from an assay detecting only IgG antibodies, but the Ortho and Abbott IgG-specific assays both had higher 0-14 day sensitivities, similar to those of the total antibody assays. Other assay validation studies have reported similarly high sensitivities for samples collected after at least 14 days [6][7][8][9][10]. In the outbreak field study, where all samples were collected >14 days and within 2 months post-onset, all assays showed >95% sensitivity. Given the lower sensitivity of some of the assays 0-14 days after infection, and the resulting likelihood of missing early infections, current serological assays may not be useful diagnostic tools for routine clinical use. However, the outbreak investigations revealed a 16% increase in diagnostic yield compared with PCR testing alone, suggesting that serology would be useful as an adjunct to molecular testing in clinical situations where the PCR diagnostic window might have been missed [11]. The additional cases detected by serology are likely an underestimate, as other studies have demonstrated low or absent antibody responses in individuals with mild disease [12,13], whereas those requiring hospitalization tended to have higher responses [14].
In the field study, there was consensus agreement (at least five assays reactive or at least five assays non-reactive) for the majority of known positive (95.7%) and negative (98.4%) patient samples, but no assay had 100% accuracy. Discordant results may reflect variability in serological responses among patients or in antibody detection by the assays. When a given assay is used for clinical diagnostic purposes, especially where PPV is low, orthogonal testing with a second assay, perhaps with a different antigen target, might increase confidence that dual reactive results indicate true antibody positivity [15,16].
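The benefit of orthogonal testing can be sketched by treating a result as positive only when both assays are reactive. The calculation below assumes the two assays' errors are conditionally independent given true infection status, which is a strong simplifying assumption (errors in real assays are often correlated; choosing a different antigen target is one way to reduce that correlation). The assay characteristics used in the example are hypothetical.

```python
def serial_two_assay(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Sensitivity and specificity of the 'both reactive' rule, assuming
    conditional independence of the two assays (an assumption)."""
    se = se1 * se2                   # both must detect a true positive
    sp = 1 - (1 - sp1) * (1 - sp2)   # a false positive needs both to err
    return se, sp


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    prev = 0.01  # hypothetical low-prevalence setting
    single = ppv(0.95, 0.98, prev)
    dual = ppv(*serial_two_assay(0.95, 0.98, 0.95, 0.98), prev)
    print(f"single assay PPV {single:.1%}, dual-reactive PPV {dual:.1%}")
```

Under these assumptions the dual-reactive rule raises PPV from about 32% to about 96% at 1% prevalence, at the cost of some sensitivity (95% × 95% ≈ 90%).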
The utility of these serology platforms for estimating vaccine responses and population seroprevalence would be enhanced if the antibodies they detect were known to correlate with neutralizing capacity; anti-spike antibodies have been reported to correlate more closely with neutralizing antibodies than anti-nucleocapsid antibodies [17]. A study by Kohmer et al. [18] showed that all but one of the COVID-19 PCR-positive patients subsequently tested by serology had neutralizing antibodies, but that study evaluated different commercial platforms from those in our study.
Strengths of this study include the large number of assays evaluated and the availability of a large number of samples from SARS-CoV-2 outbreaks. A limitation of the validation study is the small number of samples collected early after infection, which may have affected the subgroup sensitivity analysis; the small number also precluded assessment of antibody kinetics during seroconversion. In addition, the large number of reactive Siemens T results that exceeded the upper limit of the assay's dynamic range likely affected the assessment of sex differences in test signals, although there were no significant sex differences for the other assays, which had almost no out-of-range results.
In conclusion, all six SARS-CoV-2 serologic assays evaluated in this study showed high sensitivity, specificity and positive predictive value for samples collected ≥14 days after onset, suggesting that these assays will be useful for assessment of population seroprevalence and response to vaccines. Furthermore, the addition of serology to PCR testing during outbreaks increased the overall case yield by 16%.

Declaration of Competing Interests
MK has received grants/contracts paid to his institution from Roche and Hologic related to human papillomavirus research, and from Siemens, DiaSorin, and Ortho, unrelated to the present work. DC has received travel expenses and speaker honoraria from Hologic, unrelated to the present work. All other authors report no conflicts of interest. The study sponsor had no role in the collection and interpretation of data, nor in the decision to submit the article for publication. All test reagents were supplied by the respective manufacturers at no charge. The manufacturers had no input into the study analysis or the decision to publish the results.