Anti-Spike Protein Assays to Determine SARS-CoV-2 Antibody Levels: a Head-to-Head Comparison of Five Quantitative Assays

ABSTRACT Reliable quantification of the antibody response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is highly relevant, e.g., for identifying possible vaccine failure and estimating the time of protection. Therefore, we evaluated five different anti-SARS-CoV-2 antibody assays regarding the quantification of anti-spike (S) antibodies. Sera from 69 SARS-CoV-2-naive individuals 21 ± 1 days after vaccination with a single dose of BNT162b2 (Pfizer/BioNTech) were tested using the following quantitative assays: Roche S total antibody, DiaSorin trimeric spike IgG, DiaSorin S1/S2 IgG, Abbott II IgG, and Serion/Virion IgG. Results were further compared to the percent inhibition calculated from a surrogate virus neutralization test (sVNT). Individual values were distributed over several orders of magnitude for all assays. Although the assays were in good overall agreement (ρ = 0.80 to 0.94), Passing-Bablok regression revealed systematic constant and proportional differences, which could not be eliminated by converting the results to binding antibody units (BAU) per milliliter, as suggested by the manufacturers. Seven (10%) individuals had negative sVNT results (i.e., <30% inhibition). These samples were identified by most assays and yielded significantly lower binding antibody levels. Although all assays showed good correlation, they were not interchangeable, even when converted to BAU per milliliter using the WHO international standard for SARS-CoV-2 immunoglobulin. This highlights the need for further standardization of SARS-CoV-2 serology.

IMPORTANCE Reliable quantification of the antibody response to SARS-CoV-2 is highly relevant, e.g., for identifying possible vaccine failure and estimating the time of protection. We compared the performance of five CE-marked tests that quantify antibodies against the viral spike protein. Our findings suggest that, although all assays showed good correlation, their results were not interchangeable, even when converted to BAU per milliliter using the WHO international standard for SARS-CoV-2 immunoglobulin. This highlights the need for further standardization of SARS-CoV-2 serology.

quantitative tests and were designed by the manufacturer to achieve the highest possible specificity and high sensitivity. High specificity was indispensable, especially at the beginning of the pandemic, because the extremely low seroprevalence rates led to many false positives and low positive predictive values even with tests having a specificity of 99% (6). In contrast, the sensitivity of SARS-CoV-2 testing was often reduced to ensure the high specificities needed for these assays (7). Suboptimal sensitivities were further aggravated by the lower antibody levels in mild/asymptomatic infections and, over the course of the pandemic, by the natural decline in antibody levels (8–13).
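The effect of low seroprevalence on the positive predictive value (PPV) can be made concrete with Bayes' rule. The following Python sketch is an illustration only: the 99% specificity is taken from the text, whereas the 90% sensitivity and the two prevalence values (1% and 20%) are assumed for the example and are not results of this study.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of positive test results that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Specificity of 99% as in the text; sensitivity and prevalences are hypothetical.
ppv_low = positive_predictive_value(sensitivity=0.90, specificity=0.99, prevalence=0.01)
ppv_high = positive_predictive_value(sensitivity=0.90, specificity=0.99, prevalence=0.20)

print(f"PPV at 1% seroprevalence:  {ppv_low:.2f}")
print(f"PPV at 20% seroprevalence: {ppv_high:.2f}")
```

Under these assumptions, fewer than half of all positive results would be true positives at 1% seroprevalence, whereas the same test would be far more informative at 20% seroprevalence.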
Various antigens have been used for this purpose, but essentially two types can be distinguished: nucleocapsid (NC)- and spike protein (S)-based assays (14). Antibodies directed against SARS-CoV-2-specific nucleocapsid antigens are induced early and strongly in most infected individuals due to the typically strong immunogenicity of the viral nucleocapsid (15). Furthermore, a very high specificity can be achieved by targeted modification of the nucleocapsid antigen so that no cross-reactivity is observed even with closely related viruses. The discriminatory properties of such nucleocapsid-based antibody assays can therefore be excellent (16,17). The physiological significance of these antibodies, on the other hand, is unclear, and these surrogate markers for a previous infection are unlikely to be functionally relevant for conferring protection or immunity. The antibodies that react with the spike protein, however, act differently. At least a proportion of these S-binding antibodies are likely to function as neutralizing antibodies (18). Thus, it is not surprising that numerous studies have shown a correlation between spike protein binding assays and various forms of functional virus neutralization assays (19–24).
In the context of SARS-CoV-2 vaccines, it is precisely these neutralizing antibodies that are of paramount importance. The primary goal of active immunization is to induce high levels of SARS-CoV-2-specific neutralizing antibodies that ideally prevent the pathogen's entry and thus infection, or stop the systemic spread to prevent disease (25). Functional virus neutralization assays are not feasible everywhere: assays with live viruses require biosafety level 3, and variants such as pseudotyped neutralization assays are also labor-intensive and cannot be performed at high throughput (26–28). In contrast to neutralization tests, classical antibody assays, which measure the reactivity of antibodies in serum/plasma with defined antigens, can be performed very rapidly and at high throughput.
Thus, anti-spike protein assays will play an important role in the future. However, these test systems must be able to reliably quantitate SARS-CoV-2-specific antibody levels, be comparable to each other, and have good to excellent agreement with the presence of neutralizing antibodies. The comparability of antibody assays is expected to be improved by the recent introduction of a first WHO international standard for anti-SARS-CoV-2 immunoglobulin (NIBSC code 20/136) with reference to neutralizing antibodies.
In the present work, we aimed to go a step further and compare five commercial quantitative anti-spike protein antibody assays (four of them with a manufacturer's correction factor for the WHO standard) head-to-head in serum samples from 69 individuals who received a single dose of BNT162b2 (Pfizer/BioNTech).

RESULTS
Measurement ranges differ between binding assays. Twenty-nine female (42%) and 40 male (58%) participants with a median age of 42 years (interquartile range, 29 to 51) were included. Results from the five different antibody binding assays are presented in Table 1 and Fig. 1. The measured values indicate that the numerical results are strongly dependent on the test system used. In the next step, we aimed to evaluate the overall agreement between the test systems.
Finally, the DiaSorin S1/S2 IgG and the Serion IgG correlated at an r value of 0.91, and the Passing-Bablok regression equation was Serion IgG = −50.9 + 1.78x. All described relationships, as well as related residual plots, are presented in Fig. 2.
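To illustrate how a Passing-Bablok regression line of this kind is obtained, the following minimal Python sketch implements the core of the estimator: the slope is the shifted median of all pairwise slopes, and the intercept is the median of the residual offsets. It is not the MedCalc implementation used in this study; confidence intervals and the linearity test are omitted, and the function name and the synthetic data are assumptions made for the example.

```python
import numpy as np


def passing_bablok(x, y):
    """Simplified Passing-Bablok fit; returns (intercept, slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)        # all pairs with i < j
    dx, dy = x[j] - x[i], y[j] - y[i]
    keep = ~((dx == 0) & (dy == 0))            # identical points define no slope
    dx, dy = dx[keep], dy[keep]
    with np.errstate(divide="ignore"):
        slopes = dy / dx                       # vertical pairs become +/- inf
    slopes = slopes[slopes != -1]              # slopes of exactly -1 are discarded
    slopes.sort()
    n = len(slopes)
    k = int(np.sum(slopes < -1))               # offset that shifts the median
    if n % 2:                                  # odd count: one shifted middle value
        slope = slopes[(n - 1) // 2 + k]
    else:                                      # even count: mean of two middle values
        slope = 0.5 * (slopes[n // 2 - 1 + k] + slopes[n // 2 + k])
    intercept = np.median(y - slope * x)
    return intercept, slope


# Synthetic, noise-free data (not study results): assay B = 5 + 2 * assay A.
assay_a = np.array([10.0, 25.0, 40.0, 80.0, 150.0, 300.0])
assay_b = 5.0 + 2.0 * assay_a
intercept, slope = passing_bablok(assay_a, assay_b)
print(f"assay B = {intercept:.1f} + {slope:.2f} * assay A")
```

A slope different from 1 indicates a systematic proportional difference between two assays, and an intercept different from 0 indicates a systematic constant difference.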
Furthermore, we assessed whether the classification of results into tertiles (0 to 33.3%, 33.4 to 66.7%, and 66.8 to 100%) was comparable between the assays.

In conclusion, the results of the investigated test systems correlate well but are not necessarily interchangeable. Several manufacturers provided conversion factors related to the WHO international standard for SARS-CoV-2 immunoglobulin, as described in Materials and Methods. Next, we wanted to clarify whether comparing values converted to binding antibody units (BAU) per milliliter instead of arbitrary values facilitates comparability.
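The tertile-based agreement check mentioned above can be sketched in Python as follows. Each assay's results are split into thirds, and agreement between the two classifications is summarized with Cohen's kappa using linear weights, as named in Materials and Methods. The helper names and the example values are hypothetical and serve only to show the calculation; they are not study data.

```python
import numpy as np


def tertile_labels(values):
    """Return 0, 1, or 2 for the lowest, middle, and highest third of the values."""
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, [1 / 3, 2 / 3])
    return np.searchsorted(cuts, values, side="left")


def linear_weighted_kappa(a, b, n_cat=3):
    """Cohen's kappa with linear disagreement weights for ordinal categories."""
    observed = np.zeros((n_cat, n_cat))
    for r, c in zip(a, b):
        observed[r, c] += 1                    # contingency table of the two ratings
    observed /= observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    idx = np.arange(n_cat)
    weights = np.abs(idx[:, None] - idx[None, :]) / (n_cat - 1)
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Made-up results for the same nine samples measured with two hypothetical assays.
assay_a = [3, 8, 15, 22, 40, 61, 90, 120, 200]
assay_b = [30, 55, 140, 90, 400, 700, 650, 1500, 2600]

kappa = linear_weighted_kappa(tertile_labels(assay_a), tertile_labels(assay_b))
print(f"linearly weighted kappa: {kappa:.2f}")
```

Because the classification depends only on rank within each assay, this comparison is unaffected by the different units and measurement ranges of the test systems.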
Associations between standardized binding assay results. The numbers of BAU per milliliter were calculated for the Abbott S IgG, the DiaSorin TriS IgG, and the Serion IgG, according to the recently proposed conversion factors. Results from the Roche S tAb ECLIA did not require conversion, as indicated by the manufacturer.
As shown in Fig. 3, the recalculation to BAU per milliliter did not solve the problem of high proportional errors. The smallest proportional error was observed for the relationship between Roche S tAb and Serion IgG. However, the same combination was characterized by comparatively high variability (r = 0.82).

Correlation of binding assay results with a surrogate neutralization assay. In a final step, the binding assays' results were compared to percent inhibition of a surrogate virus neutralization test (sVNT). In the sVNT, the tested samples yielded median values of 63% (50 to 76%), ranging from 6% to 92%. Figure 4A illustrates that all binding assays except the DiaSorin S1/S2 IgG showed a quadratic relationship with the sVNT. The binding assays also differentiated those values clustered in the upper range of the sVNT. However, for the DiaSorin S1/S2, the quadratic curve approached a straight line, indicating a mostly linear relationship between this binding assay and the sVNT within the observed range.
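As a rough illustration of the quadratic curve fitting described above, the following Python sketch fits a second-degree polynomial to percent inhibition as a function of a binding assay result and inspects the quadratic coefficient; a coefficient close to zero corresponds to a curve that approaches a straight line. The data are synthetic, and fitting on untransformed values with inhibition as the dependent variable is an assumption, since the text does not specify these details.

```python
import numpy as np


def fit_quadratic(binding, inhibition):
    """Least-squares fit of inhibition = c2*x**2 + c1*x + c0; returns (c2, c1, c0)."""
    x = np.asarray(binding, dtype=float)
    y = np.asarray(inhibition, dtype=float)
    c2, c1, c0 = np.polyfit(x, y, deg=2)       # coefficients, highest power first
    return c2, c1, c0


# Synthetic, made-up data (not study results): one saturating curve, one straight line.
x = np.array([5.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0, 140.0])
curved = 5.0 + 1.0 * x - 0.003 * x**2
straight = 10.0 + 0.5 * x

c2_curved, _, _ = fit_quadratic(x, curved)
c2_straight, _, _ = fit_quadratic(x, straight)
print(f"quadratic coefficient, curved data:   {c2_curved:.4f}")
print(f"quadratic coefficient, straight data: {c2_straight:.4f}")
```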

DISCUSSION
SARS-CoV-2 antibody assays have become important tools to evaluate the proportion of people affected by COVID-19 and to identify those who are still at risk of infection. Now, with the first vaccines available, a new field of use for SARS-CoV-2 antibody tests will open up. First, many vaccinated individuals will be interested in confirming their own vaccination success based on the detection of specific antibodies. Second, vaccination-induced antibodies may be used as a surrogate from which a protection correlate will be estimated. To date, only limited information on the performance of quantitative SARS-CoV-2 antibody assays is available, since most currently evaluated assays were developed in-house, as recently summarized by the CDC COVID-19 response group (29).
Preliminary performance data are available in the literature for only a few commercially available quantitative CE-marked test systems (17,20,30–32).
Although a protection correlate for immunity in SARS-CoV-2 has not been defined yet, it is useful to begin this important preliminary work now (33). Therefore, we compared different commercial SARS-CoV-2 antibody assays with spike protein reactivity using a vaccination cohort to give a first insight into the comparability of these assays.
With regard to the numerical results, we observed a broad distribution of values for each individual test system, so the results were presented on a logarithmic scale. This is in line with recently published reports on the antibody response after a single dose of BNT162b2 vaccine (3,4). Interestingly, in agreement with a study involving >500 participants in an identical study setting, we observed very similar mean values for the measurements with the DiaSorin S1/S2 IgG: 66.3 AU/ml versus 68.6 AU/ml (3). Therefore, it is reasonable to assume that our cohort is representative despite the moderate number of participants. In addition, we were able to show that the results of the different test systems varied by a factor of more than 50 in some cases. This leads to the initial conclusion that the numerical results of different test systems are unlikely to be directly comparable across the range of individual findings. Differences also occurred with respect to measurement ranges, and upper measurement limits were exceeded in 2 of 5 systems (DiaSorin TriS IgG and Serion IgG), although the study cohort reflects the antibody response before the administration of the second dose of the Pfizer/BioNTech vaccine in SARS-CoV-2-naive individuals. However, it must be mentioned that it is not yet known up to what level a differentiation of the obtained values is meaningful. Nevertheless, it can be assumed that the average values of fully vaccinated persons are significantly higher than those in our cohort, and thus, the upper measurement limits could frequently be exceeded in most assays. If clinically relevant, this could make additional dilution steps necessary, which are not yet taken into account by the manufacturers.
Despite the different levels of measurement, all systems showed good correlations with each other. When the measured values of the individual antibody tests were assigned to tertiles, good agreement was shown between the lowest third, the middle third, and the highest third of the results. For example, one individual with known immunosuppressive therapy consistently showed no formation of antibodies in all five antibody binding assays tested. With defined cutoffs for low or high vaccination titers of the different test systems, at least a partial transferability of a result from one test system to another may therefore be expected.
Such transferability of results could also be anticipated by referencing the antibody assays used to an international reference standard (29). Indeed, a first WHO international SARS-CoV-2 antibody standard with an assigned value of 1,000 BAU/ml has recently become available. This standard was used by the manufacturers for four of the five assays studied. However, this standardization was introduced not during the establishment of the test systems but post hoc, as a reference to define a conversion factor from the manufacturers' own units to BAU per milliliter. It is therefore not surprising that this subsequent correction did not reduce the existing systematic deviations (Fig. 2) between the different tests. Only the Roche S tAb and Serion IgG tests were able to approximate the equivalence line, although here a very wide scattering of values around the trend lines was observed.
The in vitro binding of infection-associated antibodies to pathogen-specific antigens in an antibody test is an important marker to document a past infection or vaccination. However, such binding does not necessarily say anything about the function of these antibodies (1). Only those antibodies that prevent the virus from binding to the cellular receptor, the ACE2 receptor, via the surface spike protein (34,35) act as neutralizing antibodies. Neutralization tests with live viruses can be performed only in very specialized laboratories and, unfortunately, in the case of SARS-CoV-2 are not standardized, making comparability almost impossible. For this reason, we chose to use a well-characterized surrogate virus neutralization test (sVNT) as a functional reference (36–38). In this assay, a simple enzyme-linked immunosorbent assay (ELISA) format is used to determine the extent to which neutralizing antibodies inhibit the binding of conjugated receptor-binding domain (RBD) protein to the plate-bound ACE2 receptor. The manufacturer suggests a threshold for positivity of 30% inhibition. With the exception of the Serion IgG assay, where the median of samples with negative sVNT results was borderline (14 U/ml), the medians of sVNT-negative samples were above the thresholds for positivity in all other test systems. This implies that the cutoff values given for the respective test systems are valid only for the diagnosis of a past infection and do not necessarily represent a threshold value for the presence of sufficient neutralizing activity.
In conclusion, we found good correlation between all evaluated assays; however, the values from the different test systems were not interchangeable, even when converted to BAU per milliliter using the WHO international standard for SARS-CoV-2 immunoglobulin. Furthermore, it should be noted that the thresholds for positivity provided by the manufacturers are of diagnostic value and are not indicative of sufficient inhibitory capacities.

MATERIALS AND METHODS
Study design and participants. This prospective observational study was performed using sera collected in February 2021 from 69 individuals without a previous SARS-CoV-2 infection in the course of a workplace vaccination campaign in the metropolitan area of Vienna, Austria. The samples were taken 21 ± 1 days (mean ± standard deviation) after the first dose of the Pfizer/BioNTech BNT162b2 vaccine. We included vaccinated persons rather than individuals with a history of SARS-CoV-2 infection in order to be able to compare test systems following a more or less standardized stimulus. A further inclusion criterion was an age of >18 years, whereas an insufficient amount of serum resulted in exclusion from the study. The study protocol was reviewed and approved by the Ethics Committee of the Medical University of Vienna (EK1066/2021). All participants provided written informed consent to donate blood for the evaluation of diagnostic test systems (EK404/2012). The study complied with the World Medical Association Declaration of Helsinki regarding ethical conduct of research involving human subjects.
Laboratory procedures. Serum was obtained and stored at 2 to 10°C for <7 days within the MedUni Wien Biobank, a centralized facility for the preparation and storage of biomaterial with certified quality management (ISO 9001:2015) (39). All analytical procedures were performed at the Department for Laboratory Medicine, Medical University of Vienna. The following CE-marked binding assays were applied.
The Roche Elecsys anti-SARS-CoV-2 S (Roche S tAb) is an electrochemiluminescence sandwich immunoassay (ECLIA) and detects total antibodies directed against the receptor-binding domain (RBD) of the viral spike (S) protein. It was measured on cobas e801 modular analyzers (Roche Diagnostics, Rotkreuz, Switzerland). The quantification range is between 0.4 and 2,500.0 U/ml, and 0.8 U/ml is used as a cutoff for positivity.
The Abbott SARS-CoV-2 IgG II Quant-test (Abbott S IgG) is a chemiluminescence microparticle immunoassay (CMIA). It quantifies IgG-type antibodies against the RBD of the viral S-protein on an Abbott Architect platform (Abbott, Abbott Park, IL, USA) between 21.0 and 40,000.0 AU/ml, with ≥50 AU/ml as a threshold for positivity.
We excluded prior SARS-CoV-2 infection by using the Roche Elecsys SARS-CoV-2 ECLIA on the cobas e801 analyzer (Roche), which detects total antibodies to the viral nucleocapsid antigen. These antibodies are not induced by vaccination with BNT162b2. This assay yields high diagnostic sensitivity (90%) and specificity (99.7%) for infections that occurred at least 14 days before blood withdrawal (5). As suggested by the manufacturer, results with a cutoff index (COI) of >1.000 were considered positive.
Statistical analysis. Continuous data are given as medians and interquartile ranges; categorical data are given as counts and percentages. Test systems were compared by Passing-Bablok regressions. This method reveals differences between two test systems by estimating the slope (systematic proportional differences) and the intercept (systematic constant differences) of a linear regression line. The advantage of this well-established method over conventional linear regressions is that no preconditions regarding the distribution of the measured values and the measurement errors have to be met (42). Besides Passing-Bablok regressions, Cohen's kappa (linear weights) and Spearman rank correlations were applied to evaluate the agreement between binding assays. Kappa is a measure of the agreement between two ratings. In detail, we evaluated the degree to which the different test systems agreed in classifying the test samples into tertiles (43). Spearman rank correlation is a method to describe relationships between two variables that do not have to be linear. The relationship between binding assays and results from the sVNT was described by quadratic curve fitting. Statistical significance was assumed if P values were below 0.05. All analyses were performed using MedCalc 19.6 (MedCalc, Ostend, Belgium), and graphs were drawn using GraphPad 9 (GraphPad, La Jolla, CA, USA). The underlying data can be requested by interested researchers from the corresponding author.

respective supplier (medac GmbH and DiaChrom), the Abbott S IgG kit and the DiaSorin TriS IgG kit were kindly provided by the manufacturers.
The MedUni Wien Biobank is part of the Austrian biobanking consortium BBMRI.at. There was no additional funding received for the present work.