Assay Harmonization Study To Measure Immune Response to SARS-CoV-2 Infection and Vaccines: a Serology Methods Study

ABSTRACT The coronavirus disease 2019 (COVID-19) pandemic presented the scientific community with an immediate need for accurate severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) serology assays, resulting in a rapid expansion of assay development, in some cases without rigorous quality control and validation, and with a wide range of performance characteristics. Vast amounts of data have been gathered on the SARS-CoV-2 antibody response; however, assessing assay performance and comparing results across assays have been challenging. This study seeks to analyze the reliability, sensitivity, specificity, and reproducibility of a set of widely used commercial, in-house, and neutralization serology assays, as well as provide evidence for the feasibility of using the World Health Organization (WHO) International Standard (IS) as a harmonization tool. This study also seeks to demonstrate that binding immunoassays may serve as a practical alternative for the serological study of large sample sets in lieu of expensive, complex, and less reproducible neutralization assays. In this study, commercial assays demonstrated the highest specificity, while in-house assays excelled in antibody sensitivity. As expected, neutralization assays demonstrated high levels of variability but overall good correlations with binding immunoassays, suggesting that binding immunoassays may be reasonably accurate as well as practical for the study of SARS-CoV-2 serology. All three assay types performed well after WHO IS standardization. The results of this study demonstrate that there are high-performing serology assays available to the scientific community to rigorously dissect antibody responses to infection and vaccination.

IMPORTANCE Previous studies have shown significant variability in SARS-CoV-2 antibody serology assays, highlighting the need for evaluation and comparison of these assays using the same set of samples covering a wide range of antibody responses induced by infection or vaccination. This study demonstrated that there are high performing assays that can be used reliably to evaluate immune responses to SARS-CoV-2 in the context of infection and vaccination. This study also demonstrated the feasibility of harmonizing these assays against the International Standard and provided evidence that the binding immunoassays may have high enough correlation with the neutralization assays to serve as a practical proxy. These results represent an important step in standardizing and harmonizing the many different serological assays used to evaluate COVID-19 immune responses in the population.
KEYWORDS serology, SARS-CoV-2, antibodies, harmonization

Given the above observations regarding SARS-CoV-2 serological assay variability, and the hundreds of COVID-19 serology tests available, including more than 86 with FDA emergency use authorization (EUA), there are several fundamental questions that comparison studies have sought to address (19). Studies exploring the reliability of different serology assays have found variable sensitivities and specificities (20–29). Some studies have illustrated variability by comparing measurements of seroprevalence in their sample population or the percentage of positive samples found among confirmed-infection or polymerase chain reaction (PCR)-verified samples (30,31). Finally, studies highlighting the current challenges of comparing different serology assay results have proposed a number of solutions, from harmonizing against the WHO International Standard (WHO IS) to comparing to an in-house reference standard (20,22,26,28,31,32). This is the first study to directly compare 27 serology assays between laboratories, using the same set of samples with a wide range of antibody levels. In addition, this study seeks to analyze the feasibility of using the WHO IS as a global harmonization tool to facilitate comparison of results between assays.
A number of serology method comparison studies have also sought to answer the question of how closely traditional enzyme-linked immunosorbent assay (ELISA) or binding immunoassays correlate with neutralization results (33). Bonhomme et al. reported a weak correlation between antibody binding and antibody neutralization in convalescent-phase serum samples, but a good correlation between binding and neutralization in sera from vaccinated individuals (21). Other studies have reported variable correlation coefficients between binding and neutralizing antibody titers ranging from 0.127 to 0.916 (22,27,34,35). This study has therefore evaluated the correlation between binding antibody assay and neutralization assay results across different labs, with the goal of addressing this issue.

RESULTS
Participating institutions, sample selection, and assay design. In total, 17 institutions participated in the study, contributing 27 selected assays: 18 antibody binding assays (including 2 multiplex [3-plex and 5-plex] assays) and 9 assays (7 live virus, 2 surrogate virus) measuring neutralization activity against SARS-CoV-2 (Table 1). To account for the antibody responses measured from the antigens included in the two multiplex assays, each antibody response from the individual antigens was analyzed independently, so the total number of Assay IDs assigned was 33. Participating laboratories were sent 3 frozen aliquots of the US SARS-CoV-2 Serology Standard, in addition to 3 frozen aliquots of a panel of 120 samples from uninfected, infected, vaccinated healthy volunteers, WHO IS, and monoclonal antibodies to SARS-CoV-2 (Table 2). The shipments to the participating institutions occurred between August 2021 and September 2021, and all the testing was completed and data sent back for analysis by April 2022. The vaccinated donors received the primary series of the Moderna, Pfizer, Johnson & Johnson, or AstraZeneca vaccine between December 2020 and April 2021. For each day of testing, the US Serology Standard was to be tested in triplicate, and the panel of 120 samples tested in duplicate. Participating laboratories were to follow in-house standard operating procedures when testing the samples. The 3 aliquots of sample or US Serology Standard were to be tested on 3 separate days to determine inter-day variability.
Assay specificity and sensitivity. Assays were categorized for evaluation as commercial, laboratory developed (LDT) or in-house, or neutralization. Specificity and sensitivity were measured for each assay and graphed for visualization of these performance factors (Fig. 1). Acceptable assay performance cutoffs for specificity and sensitivity were ≥93% and ≥90%, respectively (2). Most commercial assays demonstrated high specificity (≥93%) and sensitivity (≥90%), while in-house assays performed well in sensitivity (≥90%) (Fig. 1). Neutralization assays showed variable results across both measures (Fig. 1).
To determine if variation in sensitivity was affected by sample characteristics, the vaccination results were analyzed against infection according to assay class (Tables 3 to 5). In all three assay types, there did not appear to be any significant difference in sensitivity according to whether the sample was collected from an individual who had been infected versus an individual who had been vaccinated against SARS-CoV-2 (Tables 3 to 5). However, when results were separated according to known sample antibody levels, lower levels of antibody resulted in higher variability in sensitivity, meaning that some assays struggled to detect antibody at these levels (Table 6; Tables S1 and S2). Of note, the vaccinated samples were excluded from analysis with the nucleocapsid antibody-detecting assays.
Assay reproducibility. Assay reproducibility was measured for within-day replicates and across-day replicates (Tables 7 to 9). Commercial assays showed excellent reproducibility both within-day and across-day, with all showing variability below 20% (Table 7). In-house assays had acceptable levels of variability, with about half showing an overall percent variability below 20% (Table 8). Neutralization assays showed high levels of variability in both within-day and across-day replicates (Table 9).
Harmonization with the WHO International Standard. The three assay classes were harmonized against the WHO IS, and their measures of the US Serology Standard were compared against the calibrated value. The harmonized results of the commercial tests had high correlation with the calibrated antibody level value (Fig. 2A), as did many of the neutralization assays (Fig. 2C). The in-house assays had the highest variability when comparing the results against the US Serology Standard calibrated value (Fig. 2B).
Correlation between antibody level and neutralization assay results. Commercial and in-house assays were compared against neutralization using Pearson's correlation (Fig. 3). Commercial assays demonstrated high correlation with neutralization (Fig. 3A), while the in-house assays had somewhat lower levels of correlation (Fig. 3B).
Total 121
a Samples were from unique individuals.
b Collected from subjects who were unvaccinated and uninfected with SARS-CoV-2, and these samples were used for assay cutoff evaluations.
FIG 1 Sensitivity and specificity measurements of commercial, in-house, and neutralization serological assays. Graphical representation of the sensitivity and specificity of all tested assays. Specificity ≥93% and sensitivity ≥90% were considered acceptable. Data analysis was performed prior to harmonization.

DISCUSSION
The results of this study highlight the variability in performance of currently available anti-SARS-CoV-2 serological assays. Commercial assays were more accurate than in-house assays, and neutralization-based assays varied greatly in both sensitivity and specificity (Fig. 1). Most of the commercial assays presented with high specificity (≥93%), which is perhaps due to the high standards for validation in the commercial setting. The in-house lab-developed assays, on the other hand, excelled in sensitivity (with the majority reaching ≥90% sensitivity), while some also presented with high specificity (≥93%).
The neutralization assays had the most variability in both sensitivity and specificity, with 5/8 reaching a specificity above the 93% threshold, and 3/8 measuring above the 90% sensitivity threshold (Fig. 1). Note that only 8 of the 9 neutralization assays were evaluated for sensitivity and specificity because one did not have a predefined assay cutoff value, so seropositive and seronegative samples could not be determined. No neutralization assays measured above threshold for both parameters. Such variability is to be expected in neutralization assays, as these protocols are known to vary widely (8,12,36,37). This variability provides further evidence for the potential utility of using binding immunoassays as a proxy for neutralization in high-throughput and/or clinical testing of the population for immune response against the SARS-CoV-2 virus.
To understand the source of the variability in the assays, particularly in the case of the neutralization assays, the results were separated by vaccination versus infection sample type. There was no significant difference between the two sample types for any of the three assay classes (Tables 3 to 5). However, when the samples were separated according to antibody level, a relationship did emerge between the variability of the sensitivity of the assay and antibody level. Specifically, the variability in sensitivity of the assays was higher for samples that were known to have low levels of antibody (Table 6; Tables S1 and S2). These results suggest that a number of the assays did not reliably detect low levels of antibody, contributing to the variability in performance in all three classes. These results are in accordance with other studies both within and outside SARS-CoV-2 research, which have reported lower sensitivity in samples with lower antibody levels (24,30,38).
The assays were also evaluated for within-day and across-day variability to estimate reproducibility. Reproducibility of commercial assays was excellent, with all having a percent variability less than 20% (Table 7). The in-house assays had greater variability, but still exhibited acceptable reproducibility, with about half having a variability lower than 20% (Table 8). Mirroring the variability seen in sensitivity and specificity, neutralization assays exhibited low reproducibility, with the majority having a variability higher than 20% (Table 9). As stated above, this high variability was expected, as neutralization assays are known to have low reproducibility and high coefficients of variation because they are cell-based (8,12,36,37).
a FDA-recommended performance thresholds (pan-Ig and IgG): specificity ≥93%; sensitivity ≥90%. Data are highlighted in bold if below the FDA-recommended performance thresholds. Sensitivity and specificity were not evaluated for Assay ID 33 because no cutoff value was reported for the assay, and dashes were added to indicate that the data were not analyzed. Data analysis was performed prior to harmonization.
SARS-CoV-2 Serology Assay Harmonization Study
Microbiology Spectrum
An important question in the evaluation of serological assays is whether different tests can be compared to each other despite using different parameters, measures, and reagents as well as being developed and used in different laboratories. Using a method previously developed and employed for the evaluation of HPV serological assays, all three classes were harmonized against the WHO IS and used to measure the US Serology Standard. This strategy was utilized with the goal of comparing the results of the different assays for the US Serology Standard against the known value of the standard, which had previously been calibrated against the WHO IS. The commercial assays performed well in harmonization, with only one, which measured pan IgG against the S1 protein, measuring far below the calibrated value of the standard (Fig. 2A). This deviation may be due to the S1 antigen used in the assay not being properly recognized by the antibodies in the US Serology Standard. The in-house developed assays had the highest variability upon harmonization, as there was more scatter compared to commercial assays (Fig. 2B). Several of the neutralization assays after harmonization were able to measure the US Serology Standard around the assigned value (Fig. 2C). These results suggest that harmonization against the WHO IS is a feasible method for the comparison of different serological assays across platforms. These results also provide evidence that such standards are applicable for the evaluation of serological assays across different viruses, as the approach has been shown to be effective in both anti-HPV and SARS-CoV-2 antigen serology tests.
Finally, the commercial and in-house assays were evaluated for correlation with neutralization. Neutralization assays are considered the best way to measure the actual biological activity of anti-SARS-CoV-2 antibodies, as they measure total immunoglobulins (IgM, IgA, and IgG) that exhibit in vitro neutralizing activity (16,17). Neutralization will be especially important as new variants of interest continue to emerge. As the virus changes, existing antibodies that bind the SARS-CoV-2 virus may no longer actually neutralize the variant, and thus not confer effective immunity. Consequently, total antibody measures, such as those given by binding immunoassays, may not truly represent protection against the latest variants (16,18). However, neutralization assays are impractical for clinical or population-level use, as they do not lend themselves to multiplexing or high-throughput testing, are complex and time-consuming, and have low levels of sensitivity, specificity, and reproducibility (Table 9) (8,12). Binding immunoassays such as the commercial and in-house developed assays are far more practical as they are fast, high-throughput, amenable to multiplexing, and often reproducible (Tables 7 and 8) (8). However, these assays only detect one class of antibody at a time, recognize both conformational and nonconformational epitopes, and measure all antibodies that bind to the antigen in question, regardless of neutralization activity (8). Given these caveats, it is necessary to determine if binding immunoassays can be reliably used to measure immunogenicity of vaccines in place of neutralization, which is technically a more direct measure of the mechanism at work. In this study, both the commercial and the in-house assays correlated well with neutralization, with commercial assays having slightly higher correlations with neutralization than the in-house assays (Fig. 3). These results are consistent with some previous studies, while other studies have reported much lower levels of correlation (22,27,34,35). The lowest correlations were found in the samples with low antibody levels, suggesting that correlation with neutralization was also reduced by the inability of the assays to detect lower levels of antibody (data not shown).
The COVID-19 pandemic created a high demand for testing and for evaluating the immunogenicity of viral infection and vaccination at the population level, and with it unprecedented rates of scientific development in the field of serology (2–4). Previous studies have shown significant variability in these assays, highlighting both the significant need for evaluation and validation and the requirement for a method to easily and quickly compare results (2,8,12). This study evaluated the performance and reproducibility of commercial and in-house developed immunoassays and neutralization assays. This study demonstrated that there are high-performing assays that can be used reliably to evaluate immune responses to SARS-CoV-2 in the context of infection and vaccination. This study also demonstrated the feasibility of harmonizing these assays against the WHO IS and provided evidence that the binding immunoassays may have high enough correlation with neutralization to serve as a practical proxy. This study has potential for far-reaching effects in public health and serosurveillance. These results provide key information regarding performance characteristics of a multitude of assays using a common set of samples from infected and vaccinated individuals evaluated side-by-side and represent an important step in standardizing and harmonizing the many different serological assays used to evaluate SARS-CoV-2 antibody responses in the population.

MATERIALS AND METHODS
Study design. This serology comparison study was conceptualized as part of SeroNet and the Clinical and Translational Serology Task Force (CTTF) activities, a consortium of laboratories and institutions  Standard (n = 1), and monoclonal antibodies (n = 2) were selected and aliquoted in a manner to blind the recipient laboratories, so they would not be able to discern seronegative samples from seropositive samples.
All samples were categorized as seronegative, low, intermediate, or high responder as defined by preliminary evaluations at the FNL Serology Lab and antibody test results from CDC. The samples from SARS-CoV-2 uninfected/unvaccinated individuals were collected between July 2020 and January 2021, and these samples were selected based on a negative antibody response at both confirmatory laboratories (FNL Serology Lab and CDC). The categorization of the low, intermediate, and high antibody response groups was evaluated separately for the SARS-CoV-2 infected donors and vaccinated donors. As many of the vaccinated donor samples had strong antibody responses to vaccination, we contrived 34 vaccinated samples by diluting the serum from vaccinated individuals with seronegative serum or sera from other vaccinated donors to generate the three levels of antibody responses. Samples were sent to the participant laboratories for testing using the selected SARS-CoV-2 serology binding and viral neutralization assays (Table 2).
WHO International Standard (IS). The first WHO International Standard for anti-SARS-CoV-2 (WHO IS) was established and released in December 2020 and made available under code 20/136. The assigned unitage of the IS is 1,000 international units (IU) of neutralizing activity per milliliter, or 1,000 binding antibody units (BAU) per milliliter for binding antibody assays, when resuspended in 250 µL of water (11).
US SARS-CoV-2 Serology Standard. The US SARS-CoV-2 Serology Standard is a pool of plasma samples from four donors containing IgM and IgG targeting SARS-CoV-2 spike and nucleocapsid proteins (4). The US Serology Standard was calibrated against the WHO IS, showing a neutralization value of 813 IU/mL; antibody concentrations of 246 BAU/mL and 764 BAU/mL for anti-SARS-CoV-2 spike IgM and IgG, respectively; and 1,037 BAU/mL and 681 BAU/mL for nucleocapsid IgM and IgG, respectively.
Sample data. Basic characteristics of each individual and relevant vaccination information or infection status were captured. These data include, when appropriate, vaccine product, number of doses received, timing of additional doses, time between dose and blood draws. For infected patient samples, PCR confirmation or type of viral variant was recorded.
Immunoassays and laboratories. From 17 institutions, 27 assays measuring antibody concentrations were selected. These consisted of 18 antibody binding assays, including 2 multiplex (3-plex and 5-plex) assays, and 9 assays (7 live virus, 2 surrogate virus; Table 1) measuring neutralization activity against SARS-CoV-2. Although there were only 27 assays (2 multiplex), the total number of Assay IDs assigned was 33 to account for each antigen assessed individually. Each of these assays has been internally validated and was conducted in a laboratory with experience in testing large numbers of samples.
Statistical analysis. (i) Reproducibility. After log transformation, the intraclass correlation coefficient (ICC) and coefficient of variation (%CV) were calculated. ICCs were calculated by fitting mixed models. Confidence intervals for all estimates were calculated by a bootstrap procedure. Analyses were performed separately for each assay and each antigen of interest. We also considered subgroup analyses focusing on each of the study groups.
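As an illustration of the reproducibility calculation, the following is a minimal Python sketch of a %CV computed on log-transformed replicates. It assumes the log-normal definition %CV = 100 × sqrt(exp(s²) − 1), where s is the standard deviation of the natural-log values; the text does not state which %CV definition was used, the function name `percent_cv_log` is hypothetical, and the readings are invented. The mixed-model ICC and the bootstrap confidence intervals are not reproduced here.

```python
import math
import statistics

def percent_cv_log(replicates):
    """%CV of positive replicate readings, computed on the natural-log scale.

    Assumption: log-normal identity CV = sqrt(exp(s^2) - 1), with s the
    sample standard deviation of ln(values). The paper does not specify
    its %CV formula, so this is one common choice, not the authors' method.
    """
    logs = [math.log(v) for v in replicates]
    s = statistics.stdev(logs)
    return 100.0 * math.sqrt(math.exp(s * s) - 1.0)

# Hypothetical readings for one sample: duplicate wells on each of 3 days.
days = {
    "day1": [102.0, 98.0],
    "day2": [110.0, 115.0],
    "day3": [95.0, 91.0],
}

# Within-day variability: %CV of the duplicates on each day.
within_day = {day: percent_cv_log(vals) for day, vals in days.items()}

# Across-day variability: %CV of the per-day geometric means.
day_means = [statistics.geometric_mean(vals) for vals in days.values()]
across_day = percent_cv_log(day_means)

# Flag against the 20% variability level discussed in the Results.
acceptable = across_day < 20.0 and all(cv < 20.0 for cv in within_day.values())
```

Working on the log scale keeps the %CV independent of the reporting unit, which matters when assays report in different units.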
(ii) Sensitivity and specificity. Sensitivity and specificity of each assay were calculated, and 95% confidence intervals were estimated. Analyses were done separately for each assay.
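The sensitivity and specificity calculation can be sketched as follows. This is a minimal Python example under stated assumptions: the confidence interval method is not named in the text, so a Wilson score interval is used here as one reasonable option; the function names and the panel counts are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Assumption: the paper does not name its interval method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def sensitivity_specificity(is_positive, called_positive):
    """Return (sensitivity, its CI, specificity, its CI) from paired booleans."""
    pairs = list(zip(is_positive, called_positive))
    tp = sum(1 for truth, call in pairs if truth and call)
    fn = sum(1 for truth, call in pairs if truth and not call)
    tn = sum(1 for truth, call in pairs if not truth and not call)
    fp = sum(1 for truth, call in pairs if not truth and call)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, wilson_interval(tp, tp + fn), spec, wilson_interval(tn, tn + fp)

# Hypothetical panel: 90 known positives (85 detected) and
# 30 known negatives (29 correctly called negative).
truth = [True] * 90 + [False] * 30
calls = [True] * 85 + [False] * 5 + [False] * 29 + [True] * 1

sens, sens_ci, spec, spec_ci = sensitivity_specificity(truth, calls)

# Thresholds taken from the text: sensitivity >= 90%, specificity >= 93%.
meets_thresholds = sens >= 0.90 and spec >= 0.93
```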
(iii) Correlation. For each binding and neutralization assay, log-transformed measurements were plotted, and Pearson correlation coefficients were determined. Confidence intervals for all estimates were calculated by a bootstrap procedure, and analyses were done separately for each assay. Furthermore, subgroup analyses were performed where appropriate.
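The correlation analysis can be illustrated with a short Python sketch that computes a Pearson coefficient on log-transformed paired measurements and a percentile bootstrap confidence interval. The bootstrap variant (percentile, resampling sample pairs), the number of resamples, the log base, and the paired values are all assumptions made for illustration; the text only states that a bootstrap procedure was used.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples where r is undefined (zero variance).
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        stats.append(pearson(xs, ys))
    stats.sort()
    lo = stats[round((alpha / 2) * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi

# Hypothetical paired results for 10 samples.
binding = [5, 12, 30, 55, 120, 260, 480, 900, 1500, 3200]   # e.g., BAU/mL
neutral = [10, 8, 40, 35, 160, 200, 640, 500, 2000, 2500]   # e.g., titer

log_binding = [math.log10(v) for v in binding]
log_neutral = [math.log10(v) for v in neutral]

r = pearson(log_binding, log_neutral)
ci_low, ci_high = bootstrap_ci(log_binding, log_neutral)
```

Resampling whole sample pairs, rather than each assay's values separately, preserves the pairing that the correlation depends on.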
(iv) Population differences. Two parameters (sensitivity and correlation) were calculated in each population separately (natural infection [dominant and variant] and vaccinated).
(v) Assay cutoffs. We defined the 95th percentile of the antibody levels in the baseline samples to be the threshold for positivity. Therefore, the specificity of the resulting test based on this threshold should be approximately 0.95.
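The cutoff rule can be sketched in Python as below. The baseline readings are simulated and the function name `percentile_cutoff` is hypothetical; the percentile interpolation method is also an assumption (Python's default "exclusive" method), since the text does not specify one.

```python
import random
import statistics

def percentile_cutoff(baseline_values):
    """95th percentile of baseline (seronegative) readings.

    statistics.quantiles(..., n=20) returns 19 cut points; the last one
    is the 95th percentile (default 'exclusive' interpolation, assumed here).
    """
    return statistics.quantiles(baseline_values, n=20)[-1]

# Simulated baseline readings (hypothetical, arbitrary units).
rng = random.Random(1)
baseline = [rng.lognormvariate(0.0, 0.5) for _ in range(100)]

cutoff = percentile_cutoff(baseline)

# A sample is called positive when its reading exceeds the cutoff, so the
# specificity on the baseline set is the fraction at or below the cutoff.
specificity = sum(v <= cutoff for v in baseline) / len(baseline)
```

By construction, about 5% of baseline samples exceed the cutoff, which is why the resulting specificity is expected to be close to 0.95.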
(vi) Harmonization. The following formula was used to calculate the results calibrated to the WHO IS for each assay: (Geometric Mean of all reported values for a sample/Geometric Mean of all reported values for WHO IS sample) * 100 equals the harmonized result for the sample.
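The harmonization formula above can be written as a short Python sketch. The function name `harmonize` and the example readings are hypothetical; the factor of 100 is taken directly from the formula as stated in the text.

```python
import statistics

def harmonize(sample_values, who_is_values, scale=100.0):
    """Harmonized result = GM(sample values) / GM(WHO IS values) * scale.

    `scale` defaults to 100, the multiplier given in the text's formula.
    All values must be positive for the geometric mean to be defined.
    """
    gm_sample = statistics.geometric_mean(sample_values)
    gm_who_is = statistics.geometric_mean(who_is_values)
    return gm_sample / gm_who_is * scale

# Hypothetical replicate readings from one assay, in that assay's own units.
sample_readings = [220.0, 240.0, 230.0]
who_is_readings = [900.0, 950.0, 1000.0]

harmonized = harmonize(sample_readings, who_is_readings)

# The same readings expressed in a different arbitrary unit (x10) give the
# same harmonized value, since the unit cancels in the ratio.
rescaled = harmonize([v * 10 for v in sample_readings],
                     [v * 10 for v in who_is_readings])
```

Because the result is a ratio to the standard measured in the same assay, the assay-specific unit cancels, which is what allows results from different platforms to be placed on a common scale.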

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, DOCX file, 0.1 MB.