Comparison of five Anti-SARS-CoV-2 antibody assays across three doses of BNT162b2 reveals insufficient standardization of SARS-CoV-2 serology

Objectives: To investigate the comparability of WHO standard-referenced commercial SARS-CoV-2 antibody tests over three doses of BNT162b2 vaccine and up to 14 months. Methods: 114 subjects (without previous SARS-CoV-2 infection or immunosuppressive medication) vaccinated with three doses of BNT162b2 were included in this study. Antibody levels were quantified 3 weeks after the first dose, 5–6 weeks and 7 months after the second dose, and 4–5 weeks and 4 months after the third dose using the Roche Elecsys SARS-CoV-2 S, the Abbott SARS-CoV-2 IgG II Quant, the DiaSorin LIAISON SARS-CoV-2 TrimericS IgG, the GenScript cPASS sVNT and the TECO sVNT assays. Results: For each time point analyzed, systematic differences are evident between the results in BAU/mL of the three antibody binding assays. The assay ratios change in a time-dependent manner even beyond administration of the third dose (Roche measuring 9 and 3 times higher than Abbott and DiaSorin, respectively). However, the changes decrease in magnitude with increasing time from the first dose. The IgG-based assays agree better with each other than with Roche (overall correlations: Abbott x DiaSorin: ρ = 0.94 vs. Abbott x Roche: ρ = 0.89, p < 0.0001; DiaSorin x Roche: ρ = 0.87, p < 0.0001), but results are not interchangeable. The sVNTs suggest an underestimation of antibody levels by Roche and slight overestimation by both IgG assays after the first vaccine dose. Conclusions: Standardization of SARS-CoV-2 antibody binding assays still needs to be improved to allow reliable use of different assay systems in longitudinal analyses.


Background
SARS-CoV-2 antibody tests are, besides their value in epidemiological and immunological research, essential tools to detect poor response to vaccination, especially in certain risk groups [1][2][3]. Many standardized immunoassays are now available for quantifying binding antibodies or estimating virus neutralization capacities [4,5]. Their diagnostic and clinical performance could be accurately assessed using pre-pandemic and well-characterized convalescent samples [6]. For individual test systems, correlations were obtained between binding antibody assay results (standardized in BAU/mL) and virus-neutralizing activity (using either live virus, pseudovirus, or surrogate neutralization assays). Thus, it was suggested that using these standardized values from easy-to-perform binding assays, one can infer the results of functional virus neutralization assays to approach the question of protective correlates or cut-offs for therapeutic/preventive use of SARS-CoV-2 monoclonal antibody therapies, which was also strongly endorsed by the scientific community [7].
Unfortunately, it is now well established that although manufacturers standardized the results of SARS-CoV-2 binding antibody tests using the first WHO immunoglobulin standard [8], the various test systems are not interchangeable. Therefore, when two different test systems are used sequentially to assess response to vaccination, the observed divergence could be misinterpreted as a biological increase or decrease in antibody levels. However, changes would be merely attributable to a hidden analytical variability.
Systematic differences between test systems of different providers can usually be detected by statistical means and corrected using a conversion formula. This, however, assumes that the differences between test results remain constant and do not change with time, immunological events, etc. There is some evidence that this is not the case for SARS-CoV-2 antibody tests. Their agreement may, for example, be time-dependent, whereby the conversion factor between two tests can even invert over time [9]. Previously, we showed that after vaccination with AZD1222 (AstraZeneca), the conversion factor between quantitative anti-Spike (S) antibody assays from Roche and Abbott evolved from 1:3 a few weeks after the first dose, over 2:1 before the second dose, to finally 5:1 three weeks after the second dose. The different detection mechanisms could explain the time-labile agreement between these two tests.
We, therefore, aimed to evaluate whether results from another IgG-based assay, the DiaSorin LIAISON Trimeric Spike assay (DiaSorin, Stillwater, USA), were in better agreement with the Abbott assay than with the Roche assay. Currently, mRNA vaccines are predominantly used [10]. For the widely used mRNA vaccine BNT162b2 (Pfizer/BioNTech), there is only limited work reporting the comparability of antibody results after one or two doses [5,11-14].
It remains to be clarified whether the described time-dependent differences between the test systems also occur after vaccination with BNT162b2 and how they develop after the third dose of vaccine. The present study aims to fill this gap.

Study design and participants
Samples from 114 of 124 participants from the MedUni Vienna Biobank cohort of healthy volunteers were included in this prospective performance evaluation study. The participants received their COVID vaccination with BNT162b2 as part of an occupational vaccination program outside of this study [5]. In brief, the first two doses were administered within three weeks. Participants received their second booster dose (3rd injection) a median of 273 (269-274) days after dose 2.
All individuals willing to participate were included unless SARS-CoV-2 infection before the first dose (n = 9) or serological nonresponse due to a severely compromised immune system (n = 1) led to exclusion. Blood was donated at the following time points: 3 weeks (21

Preanalytical procedures
All samples were collected during voluntary medical check-ups by an occupational physician. Samples were then transferred to the MedUni Wien Biobank, where they were processed and stored according to standard operating procedures in an ISO 9001:2015-certified environment [15].

Antibody assays
A detailed description of the antibody assays used (Roche SARS-CoV-2, Abbott S II IgG, DiaSorin Trimeric S, cPASS sVNT, TECO sVNT) is given in the supplement.

Statistical analysis
Continuous data are presented as medians and interquartile ranges, and categorical data as counts and percentages. Quantitative assays were compared by Passing-Bablok regression and Spearman's rank correlation. Groupwise comparisons were performed by Mann-Whitney U-tests or repeated-measures ANOVA with Bonferroni-corrected post-hoc t-tests. P-values <0.05 were considered statistically significant. The predictive performance of binding assay levels with respect to sVNTs was assessed using ROC (receiver operating characteristic) calculations. All calculations were performed using MedCalc v20 (MedCalc bvba, Ostend, Belgium) and Analyse-it 5.66 (Analyse-it Software Ltd, Leeds, UK).
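For readers unfamiliar with Passing-Bablok regression, the following is a minimal sketch of the estimator in plain Python. It is not the MedCalc or Analyse-it implementation used in this study; the function name `passing_bablok`, the toy values, and the handling of tied x-values (treated as infinitely steep slopes) are illustrative assumptions, and confidence intervals are omitted.

```python
from itertools import combinations
from statistics import median


def passing_bablok(x, y):
    """Return (slope, intercept) of a Passing-Bablok fit of y on x.

    Simplified sketch: no confidence intervals, no linearity test.
    """
    slopes = []
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # identical points carry no slope information
        if dx == 0:
            slopes.append(float("inf"))  # assumption: vertical pairs count as +infinity
            continue
        s = dy / dx
        if s == -1:
            continue  # slopes of exactly -1 are discarded by convention
        slopes.append(s)

    slopes.sort()
    n = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median

    if n % 2:  # odd number of slopes
        slope = slopes[(n - 1) // 2 + k]
    else:      # even number of slopes
        slope = 0.5 * (slopes[n // 2 - 1 + k] + slopes[n // 2 + k])

    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


# Hypothetical paired results from two assays (not study data).
assay_x = [1, 2, 3, 4, 5]
assay_y = [3, 5, 7, 9, 50]  # last value is a deliberate outlier

slope, intercept = passing_bablok(assay_x, assay_y)
print(slope, intercept)
```

Because the slope is a shifted median of all pairwise slopes, the single outlier in the toy data does not move the fitted line, which is one reason the method is popular for method-comparison studies.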

Agreement of three SARS-CoV-2 antibody binding assays across three doses BNT162b2
Comparisons of binding assay levels at T1 have already been reported earlier from the pre-final cohort [5]. Fig. 1a presents median levels for each time point and test system. Due to typical post-vaccination kinetics, antibody levels overlap significantly at certain time points. Thus, at time points 1 and 3 (Fig. 1a, blue) and, to some extent, also at time points 2 and 5 (Fig. 1a, red), there is a relevant overlap of measured values for the IgG assays (Abbott and DiaSorin) that is not seen for the sandwich assay (Roche). Fig. 1b shows pairwise comparisons of the three binding tests for all five time points. The above-mentioned lack of overlap in results for time points 1, 3, and 2, 5 in the Roche assay results in a time-dependent greater dispersion of the dot plots, forming distinct time-dependent populations above and below the line of equality.
Therefore, when comparing the IgG-based assays (Abbott and DiaSorin), the overall dispersion is significantly lower and the observed correlation higher (Abbott x DiaSorin: ρ = 0.94) than for the combinations with the Roche test (Abbott x Roche: ρ = 0.89, p < 0.0001; DiaSorin x Roche: ρ = 0.87, p < 0.0001). However, Fig. 1b reveals a higher variability for the Abbott and the DiaSorin assay if blood was drawn shortly after a booster shot (time points 2 and 4).
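A high rank correlation does not by itself make two assays interchangeable. The sketch below, in plain Python with hypothetical values (not study data), computes Spearman's ρ for two result series that differ by a constant factor of 9; the helper names `average_ranks` and `spearman_rho` are illustrative.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical BAU/mL values: assay B reads nine times higher than assay A.
assay_a = [10, 50, 100, 500, 1000]
assay_b = [9 * v for v in assay_a]

rho = spearman_rho(assay_a, assay_b)
print(rho)
```

In this toy case the ranks agree perfectly although every single value differs nine-fold, which is why correlation coefficients are read here together with the regression results rather than on their own.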
The Passing-Bablok regressions for the individual time points allow an even more detailed analysis of these associations (Fig. 2). The correlation coefficients between Abbott and DiaSorin are ≥ 0.90 at time points 1, 3, and 5 but decrease significantly after the first and second booster (T2: ρ = 0.68, p for difference < 0.0001; T4: ρ = 0.71, p < 0.001). Also, the previously described time-dependent higher variation of the Roche sandwich/IgG assay pairs is evident in Fig. 2. While the Roche assay detects much lower antibody levels three weeks after the first dose than Abbott or DiaSorin, levels rise and become comparable with the IgG assays five weeks after the second dose and finally prominently exceed the values measured with the IgG assays from the third time point. In contrast, the comparisons of the IgG assays across all time points do not show such dramatic changes or inverse correlations. Nevertheless, IgG assays also show time-dependent changes in both systematic differences (at all time points) and increased variability (at time points 2 and 4, after booster vaccinations).
To exclude the possibility that the observed associations were due to the inclusion of differing individuals at each time point, the calculations were repeated using only those 44 individuals from whom data were available at all time points. Again, time points 1 and 3, or 2 and 5, presented overlapping results in the IgG-based assays but not in the Roche assay. In the following steps, we evaluated how the variability in binding antibody levels is reflected in sVNT results throughout three doses of BNT162b2.

Changes and agreement of levels of two sVNT assays across 3 doses of BNT162b2
We applied two FDA-approved/CE-marked surrogate virus neutralization tests: the TECO sVNT (TECO Medical) and the cPASS (GenScript). Median values, ranges, and outliers for each time point are presented in Fig. 3a. At time points 2, 4, and 5, the assessed sVNTs were consistently above 95% and therefore not useful in further discrimination of different binding assay levels. [...] 6-96.8]%, p = 0.004), however, the TECO assay appeared to yield significantly lower results than the cPASS sVNT. In the cPASS sVNT, samples of one vaccinee showed a paradoxical course. After the booster shots, low (T2 = 8.1%) or even numerically negative (T4 = -31%, T5 = -5.9%) values were found. Using the TECO sVNT, this phenomenon could not be reproduced.
Applying a 20% cut-off for the TECO sVNT and 30% for the cPASS sVNT (as suggested by the manufacturers), both tests presented only a fair agreement (Kappa = 0.36 ± 0.19), as seven of the 410 samples were positive in only one of the two assays.
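That only seven discordant samples out of 410 can yield merely fair agreement follows from how Cohen's kappa corrects for chance when almost all samples are positive. The sketch below illustrates this in plain Python. The 2x2 cell counts are hypothetical: only the total (410) and the number of discordant samples (7) are taken from the text, and the split into the four cells is an assumption chosen for illustration.

```python
def cohens_kappa(both_pos, only_a, only_b, both_neg):
    """Cohen's kappa for two binary tests from the four cells of a 2x2 table."""
    n = both_pos + only_a + only_b + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = both_pos + only_a  # positives in test A
    b_pos = both_pos + only_b  # positives in test B
    # Agreement expected by chance from the marginal positive/negative rates.
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical split: 401 concordant positive, 2 concordant negative,
# 4 + 3 = 7 discordant, 410 samples in total.
kappa = cohens_kappa(both_pos=401, only_a=4, only_b=3, both_neg=2)
raw_agreement = (401 + 2) / 410
print(round(raw_agreement, 3), round(kappa, 2))
```

With these assumed counts the raw agreement exceeds 98%, yet kappa stays well below 0.5 because chance agreement is already very high when nearly every sample is positive.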

Correlations between binding assays and sVNTs
For the TECO sVNT, in addition to the semi-quantitative evaluation in %inhibition, a CE-labelled standard curve was generated. This allows the calculation of IU/mL standardized according to the first WHO standard for SARS-CoV-2 immunoglobulins. Then the quantitative and semi-quantitative readouts were correlated with the results of the binding assays. As presented in Fig. 4, there was no linear relationship between sVNTs and either of the binding assays. However, the graphs shown in Fig. 4 underline that the Roche assay underestimates samples with similar neutralizing capacity at time point 1. In contrast, the reverse appears true for the IgG-based assays, where higher levels of antibodies are measured at time 1 than the sVNT would suggest.
The next step was to narrow the range of antibody binding concentrations in which the sVNTs have sufficient discriminatory power. As shown in Supplementary Fig. 4, values above ca. 95% inhibition do not allow any further reasonable differentiation. The regression line approaches a horizontal line, and large differences in the values of the binding tests are associated with only small changes in the sVNT results. ROC analyses were calculated to determine at which binding assay level an sVNT value of 95% was most likely to be achieved (criterion associated with the maximum Youden index). These calculations are only meaningful for time points 1 and 3, since at the other time points the sVNT values already show >95% inhibition on average (see Fig. 3a). Again, a time-dependent relationship is evident: the criterion value in BAU/mL at which saturation of sVNT is reached with high probability depends not only on the binding assay used but particularly also on the time of blood sampling (Supplementary Table 2). Whereas the cut-off criterion for the Roche assay increases between T1 and T3, it decreases for the IgG-based assays. Therefore, it is not possible to establish a generally applicable threshold for the binding assays above which an sVNT determination is no longer useful; it can only be inferred that this value could be somewhere between 10² and 10³ BAU/mL.
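The criterion associated with the maximum Youden index can be sketched as follows in plain Python. This is an illustration of the principle only, not the MedCalc procedure used for the analyses; the function name `youden_cutoff` and the toy data are hypothetical.

```python
def youden_cutoff(levels, saturated):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    levels:    binding assay results (e.g. BAU/mL)
    saturated: True where the sVNT result is at or above 95% inhibition
    A sample is classified as 'saturated' when its level is >= cutoff.
    """
    n_pos = sum(saturated)
    n_neg = len(saturated) - n_pos
    best = None
    for cut in sorted(set(levels)):
        tp = sum(1 for v, s in zip(levels, saturated) if s and v >= cut)
        fp = sum(1 for v, s in zip(levels, saturated) if not s and v >= cut)
        sensitivity = tp / n_pos
        specificity = 1 - fp / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cut, j)
    return best


# Hypothetical binding levels and sVNT saturation flags (not study data).
levels = [50, 120, 200, 350, 600, 900, 1500, 2500]
saturated = [False, False, False, True, True, False, True, True]

cutoff, j = youden_cutoff(levels, saturated)
print(cutoff, j)
```

Running the same function separately on the samples of each time point and each binding assay would give one criterion value per combination, which is the kind of comparison summarized in the supplementary table.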

SARS-CoV-2 infection or vaccination leaves immunological traces by forming SARS-CoV-2 specific antibodies. The importance of those antibodies as a scientific and epidemiologic tool should be evident. For example, we have seen that heterologous vaccination regimens may offer advantages over homologous ones [16,17], that certain types of immune suppression have a particularly negative impact on vaccination response [18,19], that individuals with prior SARS-CoV-2 infection show robust antibody production after the first dose of vaccination [20][21][22], or that post-booster antibody levels can be predicted from pre-booster levels in BNT162b2 vaccination, although the strength of the association depends on the assay used [23,24].
Many SARS-CoV-2 antibody assays are now available that are based on spike protein antigens, provide quantitative results, and allow conversion to BAU/mL referenced to the WHO standard.
Nevertheless, we and others have shown that conversion of results to BAU/mL was insufficient to make different serological assays interchangeable [5,[24][25][26][27]. In AZD1222 vaccinated participants, we demonstrated for the first time that the correlation between two antibody test results changes depending on the time of blood collection [9].
Here we used a cohort receiving three doses of BNT162b2 and analyzed the serological response at five time points over 14 months using three different SARS-CoV-2 antibody binding tests (Abbott, DiaSorin, and Roche) and two sVNTs (GenScript cPass and TECO).
We could confirm the substantial time-dependent changes in the comparability of the different antibody tests (Fig. 3). Although the comparison between Abbott and Roche showed a nearly identical ratio for the time point after the first dose of vaccine (Abbott twice as high as Roche), the response to the second dose differed substantially between the AZD1222 and BNT162b2 cohorts (Roche 5 times higher than Abbott versus approximately equal). Thus, vaccine type may also influence the comparability. Nevertheless, the varying time interval from the first to the second vaccine dose (11 wks vs. 3 wks) must be pointed out. Indeed, it is likely that not the number of immunization events but rather the time since the first immunization might influence the relationship between the test systems (Suppl. Fig. 3). So, the previously described underestimation of antibody levels by the Roche assay three weeks after the first dose compared to the Abbott assay and the GenScript cPass sVNT could explain these observed discrepancies [5]. After 11 weeks instead of 3 weeks, the primary induced B cells may have already started producing more mature antibodies, which are then boosted by the second dose and can be better detected by the sandwich assay (Roche). Fittingly, a letter by Nakagama et al. reported the strong dependence of the Roche assay on high-avidity antibodies to efficiently bind the double-antigen sandwich [28]. The observed lower Roche levels after the first dose compared to Abbott and DiaSorin have also been confirmed in other studies [25,27].
Interestingly, the pair of IgG-based assays performed more uniformly than the pairs involving the Roche sandwich assay. These findings align with the described lower correlation coefficients for pairs involving the Roche test [26,27]. Measurements with the DiaSorin test consistently yielded higher values (1.5 to 3-fold higher) across all time points. In contrast, the comparisons of IgG-based tests with the sandwich assay show a broad spectrum from a factor of 0.1 to 6.
To illustrate this further, a value of roughly 160 BAU/mL in the Abbott assay or 500 BAU/mL in the DiaSorin assay corresponds to approximately 80 BAU/mL in the Roche assay when blood was drawn directly before the 2nd dose, but might be >800 BAU/mL in the Roche test when the sample was collected seven months after the 2nd dose.
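The arithmetic behind this illustration can be written out as a short sketch in Python. The values are the approximate figures quoted in the preceding paragraph; treating ">800" as exactly 800 BAU/mL is an assumption that yields a lower bound, and the function name `convert` is hypothetical.

```python
def convert(value, factor):
    """Apply a fixed multiplicative conversion factor between two assays."""
    return value * factor


# Approximate figures from the text (BAU/mL).
abbott_level = 160.0
roche_before_dose_2 = 80.0      # "approximately 80 BAU/mL"
roche_7_months_after = 800.0    # text says ">800"; used here as a lower bound

# Conversion factor Abbott -> Roche derived shortly before the 2nd dose.
factor_early = roche_before_dose_2 / abbott_level

# Applying that early factor to a sample drawn seven months after dose 2.
predicted_roche = convert(abbott_level, factor_early)
fold_error = roche_7_months_after / predicted_roche

print(factor_early, predicted_roche, fold_error)
```

A conversion factor derived at the early time point would thus understate the later Roche result by at least a factor of ten, which is the practical consequence of a time-dependent assay ratio.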
After the third dose, approximately 7-9 times higher values in BAU/mL are measured with the Roche assay than with the Abbott assay and 3-4 times higher than with the DiaSorin assay. However, it is striking that between time point 4 (4-5 weeks after the third dose) and time point 5 (4 months after the booster), there are no major differences in the comparability of the Roche test with Abbott or DiaSorin. A possible explanation would be that the maturation of the vaccine-induced antibodies, which may be necessary for optimal binding in the sandwich assay, progresses with the time interval from primary immunization. Consistent with this, we found that the change in the ratio decreases with increasing time interval from the first vaccine dose and flattens from time point 4 onwards for Roche/Abbott and Roche/DiaSorin, respectively; in contrast, no such time-dependent dynamics are found when comparing the two IgG assays (see Suppl. Fig. 3).
The SARS-CoV-2 immunoglobulin standard should be used only to directly compare test systems using the same isotype or target antigen [8]. The available data in the literature are controversial; e.g., Infantino et al. found a better agreement in test systems with the same antigens [26]. In contrast, Swadzba et al. could not confirm this [27]. In our work, we used two assays with S-RBD as antigen (Roche and Abbott), but since they are methodologically fundamentally different (sandwich vs. IgG assay), we cannot draw any concrete conclusions. However, our data demonstrate that the general assumption of better comparability when using the same antigen is not unconditionally valid. In this context, it should also be taken into account that minor differences in the antigens used by different manufacturers (which then nevertheless bear the same name, e.g., RBD) may influence the available epitopes and thus the probable reactivity with antibodies in the sample. Furthermore, the varying optimal measuring ranges, dilution protocols and the linearity behavior of different test systems should be considered as a possible cause for the limited comparability.
In addition to the antibody binding tests, we have also performed two surrogate virus neutralization tests (sVNT), which are closer to the functional gold standard of the live virus neutralization test [29]. However, since these sVNTs were developed primarily to detect infection, their measuring ranges are limiting for post-vaccination levels. For instance, in our cohort, we can discriminate a relevant spectrum of antibody levels only three weeks after the first (T1) and 7 months after the second dose (T3). At all other times, most samples are at or near the saturation level of the test systems. The sVNTs map the antibody kinetics of the binding assays, although there are some specific differences, and even between the two sVNTs, the concordance is not optimal. In contrast, Krütgen et al. [30] found a better correlation between the DiaSorin S1/S2 IgG assay and the cPASS as well as the TECO sVNT after mRNA-1273 vaccination, but this discrepancy might be explained by a 1:20 predilution of all samples, so that the measurement range for the tested assays was optimized.
Our most important finding regarding sVNT is that both the cPass and the TECO sVNT indicate that the Roche assay underestimates antibody levels at time point 1 compared to time point 3, whereas the opposite seems to be the case for the IgG assays. This picture is again in line with the assumption that Roche might tend to underestimate early developing antibodies.
This work has strengths and limitations: we are unaware of any study that has systematically used 5 SARS-CoV-2 antibody tests over three doses of vaccine up to 14 months to examine the comparability of SARS-CoV-2 antibody tests. A limiting factor is that of the 114 participants included, only 44 participants could be included for all blood collection time points. However, in a subgroup analysis, this subgroup did not differ from the entire group.
In summary, we show that WHO-standardized SARS-CoV-2 antibody test systems are not interchangeable. Results from different serological assays differ at each time point analyzed. We also see sustained time-dependent changes in comparability of SARS-CoV-2 antibody assays over three vaccine doses. Comparability is neither improved nor established by using the same target antigen or the same isotype. Therefore, using BAU/mL should not be interpreted to mean that different SARS-CoV-2 test systems are interchangeable. For longitudinal analyses, an immunoassay optimally adapted to the planned use should be chosen and retained throughout.

Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: The Department of Laboratory Medicine (Medical University of Vienna) received compensation for advertisement at scientific symposia from Roche, Abbott and DiaSorin and holds a grant for evaluating an in-vitro diagnostic device from Roche outside of the present study. NPN received a travel grant from DiaSorin. The manufacturers/distributors provided part of the analysis kits (Abbott, DiaSorin, medac, TECO).