Dynamics of SARS-CoV-2 seroassay sensitivity: a systematic review and modelling study

Background: Serological surveys have been the gold standard for estimating numbers of SARS-CoV-2 infections, the dynamics of the epidemic and disease severity. The sensitivity of serological assays decays with time, which can bias their results, but there is a lack of guidelines to account for this phenomenon for SARS-CoV-2.

Aim: Our goal was to assess the sensitivity decay of seroassays for detecting SARS-CoV-2 infections and the dependence of this decay on assay characteristics, and to provide a simple method to correct for this phenomenon.

Methods: We performed a systematic review and meta-analysis of SARS-CoV-2 serology studies. We included studies testing previously diagnosed, unvaccinated individuals, and excluded studies of cohorts highly unrepresentative of the general population (e.g. hospitalised patients).

Results: Of the 488 screened studies, 76 studies reporting on 50 different seroassays were included in the analysis. Sensitivity decay depended strongly on the antigen and the analytic technique used by the assay, with average sensitivities at 6 months after infection ranging between 26% and 98% depending on assay characteristics. A third of the included assays departed considerably from manufacturer specifications after 6 months.

Conclusions: Seroassay sensitivity decay depends on assay characteristics, and for some types of assays it can make manufacturer specifications highly unreliable. We provide a tool to correct for this phenomenon and to assess the risk of decay for a given assay. Our analysis can guide the design and interpretation of serosurveys for SARS-CoV-2 and other pathogens and quantify systematic biases in the existing serology literature.


Introduction
Throughout the COVID-19 pandemic, policymakers have been guided by the number of past infections inferred from serological assays. Seroassays have been heavily used to estimate the proportion of individuals who have been infected, the rate of fatal or severe infections [1][2][3][4][5] and population-wide immunity [6][7][8], and to anticipate the effect of future infection waves [9,10], among other purposes.
However, antibody levels wane with time after infection [11], reducing the sensitivity of serological assays for detecting previous infections [12][13][14]. In the context of serosurveillance, we refer to the decay of assay sensitivity with time after seroconversion as seroreversion (by 'time', we mean the time elapsed between COVID-19 diagnosis and serological testing). Seroreversion is a major potential source of bias when estimating numbers of infections [1,15,16], and because these estimates guide public health policies such as vaccination programmes, it is important to account for it.
More broadly, understanding seroreversion is important for the management of other emerging infectious diseases, and the study of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections presents a unique opportunity for this. Firstly, SARS-CoV-2 is an emergent pathogen with distinctive symptoms and short incubation times, and the high rate of people seeking diagnosis and of doctors requesting tests allows for precise timing of epidemic waves and infections. Secondly, in some cohorts (e.g. serosurveys performed after the first epidemic waves), reinfections can be assumed to be rare. Thirdly, large numbers of serological surveys of SARS-CoV-2 infection were performed, using a wide range of assays and cohorts. Together, these features of the COVID-19 pandemic allow for a rich analysis of seroreversion.
We performed a systematic review and meta-analysis of COVID-19 serology studies to better characterise seroreversion across assays. We collected and curated time-specific sensitivity estimates from serological studies testing previously diagnosed COVID-19 patients who had not received COVID-19 vaccines. We analysed 76 of the 488 screened studies, encompassing 50 seroassays, 290 data points and 44,992 tests.
We present time-varying sensitivity estimates for the assays included in the analysis and show how seroreversion depends on assay characteristics. Finally, we compare time-varying sensitivities with manufacturer-reported sensitivities and estimate the risk of seroreversion bias in the literature, providing an overview of how seroreversion affected the performance of emergency-approved seroassays during the COVID-19 pandemic.

Literature search
We performed a systematic literature review of seroprevalence studies, including studies identified up to 13 July 2022 and using search parameters detailed in a prior publication [27].
We supplemented this analysis with searches of medRxiv, bioRxiv, PubMed, SSRN and Google Scholar on 30 June 2021, using the search terms "COVID-19 longitudinal, antibody waning", and on 15 February 2022, using the search terms "COVID-19 seroreversion". Additional studies were taken from a prior review [28]. If a study cited prior publications assessing seroreversion in the same research cohort, we included those prior publications.
Inclusion and exclusion criteria for studies to be included in the analysis are listed in the Supplement, section A. The results of the systematic search are summarised in Figure 1. Broadly, we excluded studies reporting on vaccinated individuals and on highly unrepresentative groups. For the final list of included studies, see Supplementary Table S1. Details of the included study cohorts (e.g. age, sex) are shown in Supplementary Table S2 and further discussed in Supplement section A. Most cohorts (90%) were serologically tested during 2020, indicating that the reinfection incidence is likely to be low in the analysed data [29] and that infections mainly correspond to the original SARS-CoV-2 variant [30]. A list of the included studies and search details is presented in the GitHub repository associated with the project.

Analysed assay characteristics
Serological assays differ in many characteristics; to keep model complexity low, we considered only some of them. We did not consider antibody isotype, because all assays detect IgG and a preliminary analysis did not show effects of including other isotypes (data not shown). We considered whether the assay was quantitative or a lateral flow assay (LFA). Guided by preliminary analyses (data not shown), we did not consider the specific type of quantitative readout technique. We considered all three targeted antigens: nucleocapsid, spike protein and S1 receptor-binding domain (RBD). For quantitative assays, we considered three types of antibody binding: indirect, competitive and direct (the latter also called double-antigen sandwich assays in the literature).

Statistical model
We fitted a Bayesian hierarchical logistic regression model to the data. For a given cohort of $N$ serologically tested individuals in a study (all of whom had a previous COVID-19 diagnosis), we modelled the number of positive results $x$ with a binomial likelihood with sensitivity $\theta$:

$$x \sim \mathrm{Binomial}(N, \theta)$$

Each cohort of $N$ individuals tested in a study was associated with a time of testing $t$ (i.e. the average time between COVID-19 diagnosis and serological testing for this cohort). Throughout the text, we refer to a cohort of individuals tested in a given study $s$, at a given time $t$, with an assay $a$, as a data point (e.g. a cohort tested at several different times corresponds to multiple data points). We modelled the sensitivity $\theta_{a,s,t}$ of each data point (assay $a$, study $s$, time $t$) through the logit link:

$$\operatorname{logit}(\theta_{a,s,t}) = \mu + u_a + u_s + (\beta + b_a)\,t$$

where $\mu$ is the mean intercept, $u_a$ and $u_s$ are the random effects of assay and study on the intercept, $\beta$ is the mean time slope and $b_a$ is the random effect of assay on the slope. We set flat priors for $\mu$ and $\beta$, and gamma priors with shape and rate parameters of 4 for the standard deviations $\sigma_{ua}$, $\sigma_{us}$ and $\sigma_{ba}$ of the random effects.

To study the effect of assay characteristics, we modified the logistic regression to include their effects on the slope:

$$\operatorname{logit}(\theta_{a,s,t}) = \mu + u_a + u_s + \left(\beta_{\mathrm{LFA}} L_a + \beta_{\mathrm{Direct}} D_a + \beta_{\mathrm{Competitive}} C_a + \beta_{\mathrm{Ag}(a)} + b_a\right) t$$

Parameters $\beta_{\mathrm{LFA}}$, $\beta_{\mathrm{Direct}}$ and $\beta_{\mathrm{Competitive}}$ are the effects on the time slope of using, respectively, LFA, quantitative-direct or quantitative-competitive assay designs. Variables $L_a$, $D_a$ and $C_a$ take values of 0 or 1 to indicate whether assay $a$ uses that design. We did not include an effect for the quantitative-indirect design, making it the baseline slope (thus, the parameters above indicate a difference relative to this design). Similarly, $\beta_{\mathrm{Nucleocapsid}}$, $\beta_{\mathrm{Spike}}$ and $\beta_{\mathrm{RBD}}$ are the effects of the antigen used on the time slope, where $\mathrm{Ag}(a)$ denotes the antigen targeted by assay $a$.
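The likelihood and link function of the first model can be made concrete with a short forward simulation. The sketch below is a hedged illustration, not the authors' Stan code: it assumes zero-mean normal random effects (the text specifies only their standard deviations), and all parameter values and function names are hypothetical rather than fitted.

```python
import math
import random


def inv_logit(z: float) -> float:
    """Map a log-odds value to a probability (inverse of the logit link)."""
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity(t: float, mu: float, beta: float,
                u_a: float = 0.0, u_s: float = 0.0, b_a: float = 0.0) -> float:
    """theta_{a,s,t} = inv_logit(mu + u_a + u_s + (beta + b_a) * t)."""
    return inv_logit(mu + u_a + u_s + (beta + b_a) * t)


def binomial_loglik(x: int, n: int, theta: float) -> float:
    """Log-probability of x positive tests among n previously diagnosed people."""
    return (math.log(math.comb(n, x))
            + x * math.log(theta) + (n - x) * math.log1p(-theta))


def simulate_data_point(t, n, mu, beta, sd_ua, sd_us, sd_ba, rng):
    """Draw one hypothetical data point (assay a, study s, time t).

    In the full hierarchical model, u_a and b_a are shared by every data point
    of assay a, and u_s by every data point of study s; here they are drawn
    fresh because only a single data point is simulated.
    """
    u_a = rng.gauss(0.0, sd_ua)  # assumed normal assay intercept effect
    u_s = rng.gauss(0.0, sd_us)  # assumed normal study intercept effect
    b_a = rng.gauss(0.0, sd_ba)  # assumed normal assay slope effect
    theta = sensitivity(t, mu, beta, u_a, u_s, b_a)
    x = sum(rng.random() < theta for _ in range(n))  # binomial draw as Bernoulli sum
    return x, theta


rng = random.Random(1)
# Hypothetical parameter values, chosen only for illustration.
x, theta = simulate_data_point(t=6, n=100, mu=2.0, beta=-0.3,
                               sd_ua=0.3, sd_us=0.8, sd_ba=0.1, rng=rng)
print(x, round(theta, 3), round(binomial_loglik(x, 100, theta), 2))
```

Fitting runs in the opposite direction: the parameters and random effects are inferred from the observed (x, N, t) of all data points jointly, which the study did in Stan.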
We fitted the models using Stan [31], with four chains of 4,000 draws each (1,000 warmup) and default parameters.
We assessed the model fits with a cross-validation analysis, leaving data points out of model fitting and obtaining sensitivity predictions for the left-out data. We repeated this procedure until we had a prediction for every data point. We used a tailored procedure that required every prediction to involve extrapolation of the model through time (for details see Supplement section B).

Estimation of testing times
When studies did not report the median time between diagnosis and serological testing for their cohort, we estimated these times using reported case curves [32] for the study's location (see details in Supplement section A).
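One simple way to turn a reported case curve into a testing time is to weight each day's reported cases by its distance to the serosurvey date. The sketch below assumes that approach purely for illustration; the procedure actually used is described in Supplement section A, and the function name and toy case counts are hypothetical.

```python
from datetime import date
from typing import Dict


def case_weighted_delay(daily_cases: Dict[date, float], survey_date: date) -> float:
    """Case-weighted mean number of days between diagnosis and the survey.

    Each day before survey_date contributes its number of reported cases as a
    weight; cases reported on or after the survey date are ignored.
    """
    weighted_days = 0.0
    total_cases = 0.0
    for day, cases in daily_cases.items():
        if day < survey_date:
            weighted_days += cases * (survey_date - day).days
            total_cases += cases
    if total_cases == 0:
        raise ValueError("no cases reported before the survey date")
    return weighted_days / total_cases


# Toy case curve (hypothetical): the 20 April cases fall after the survey.
cases = {date(2020, 3, 1): 10, date(2020, 3, 11): 30, date(2020, 4, 20): 5}
delay_days = case_weighted_delay(cases, survey_date=date(2020, 4, 10))
print(delay_days, delay_days / 30.44)  # days, and approximate months
```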

Data and code availability
All the data, code, literature pointers and review comments are available at the associated GitHub page.

Assay variability in seroreversion
We first fitted a model without assay characteristics. Figure 2 shows the slope of sensitivity decay obtained for each assay (the corresponding sensitivity-time curves are provided in Supplementary Figure S1). Estimated slopes were highly variable across assays: the standard deviations of the assay random effects were $\sigma_{ua}$ = 0.26 (95% credible interval (CrI): 0.19-0.36) for the intercept and $\sigma_{ba}$ = 0.66 (95% CrI: 0.31-1.04) for the slope. Interestingly, although most assays had decreasing sensitivity as expected (negative slopes), some assays had increasing sensitivity (positive slopes, shaded region in Figure 2). These positive slopes were not due to a lower starting sensitivity or to an initial increase followed by a decay: in an additional analysis in which separate early and late slopes were fitted to these assays (Supplementary Figure S2), both the early and the late changes in sensitivity were increasing. There was also considerable variability in the intercepts between different studies using the same assay, with a standard deviation of $\sigma_{us}$ = 0.81 (95% CrI: 0.67-0.97); this was larger than the between-assay standard deviation, underlining the importance of this source of variability.
We note that while some assays had many data points spanning several months, other assays had only a few time points (several such assays can be seen in Supplementary Figure S1). For the latter, our model's sensitivity estimates involved extrapolation across time. We tested the model's ability to extrapolate using a cross-validation procedure designed specifically for this purpose (method details in Supplement section B). The 95% CrI contained the validation data 91.7% of the time. For assays with fewer than nine data points (which accounted for 99 of the 290 data points), 95.1% of the data points were within the cross-validation CrI.
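Both coverage figures are simple proportions of held-out data points that fall inside their predicted intervals. The sketch below is a minimal illustration, assuming each held-out data point comes with an observed proportion positive (x/N) and a 95% predictive interval from a fit that excluded it; the function names and numbers are hypothetical. The default `max_points=8` mirrors "fewer than nine data points".

```python
from collections import Counter


def interval_coverage(observed, lower, upper):
    """Fraction of held-out observations that fall inside their predicted interval."""
    if not (len(observed) == len(lower) == len(upper)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


def coverage_for_sparse_assays(assay_ids, observed, lower, upper, max_points=8):
    """Coverage restricted to data points of assays with at most max_points points."""
    counts = Counter(assay_ids)
    keep = [i for i, a in enumerate(assay_ids) if counts[a] <= max_points]
    return interval_coverage([observed[i] for i in keep],
                             [lower[i] for i in keep],
                             [upper[i] for i in keep])


# Hypothetical held-out data points: observed x/N and 95% predictive intervals.
assay_ids = ["A", "A", "B", "C"]
observed = [0.90, 0.70, 0.50, 0.95]
lower = [0.80, 0.75, 0.30, 0.85]
upper = [0.97, 0.90, 0.60, 0.99]
print(interval_coverage(observed, lower, upper))
print(coverage_for_sparse_assays(assay_ids, observed, lower, upper, max_points=1))
```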

Assay characteristics determine seroreversion
Next, we analysed the relationship between assay characteristics and sensitivity decay. We fitted a model in which different assay characteristics affect the assay-specific slope. We included terms for each of the three antigens (nucleocapsid, spike and RBD) and for three assay designs (LFA, quantitative-direct and quantitative-competitive), leaving the fourth design (quantitative-indirect) as the baseline slope.
Both the analytic technique and the antigen had important effects on seroreversion (Figure 3, Table 1). The slope term for LFA assays was negative, $\beta_{\mathrm{LFA}}$ = −0.23 (95% CrI: −0.40 to −0.07), with $\beta_{\mathrm{LFA}} < 0$ in 99.6% of the posterior samples, indicating that their sensitivity decayed faster than that of quantitative-indirect assays. The slope term for quantitative-direct assays was $\beta_{\mathrm{Direct}}$ = 0.31 (95% CrI: 0.15-0.48), with $\beta_{\mathrm{Direct}} > 0$ in 99.9% of the posterior samples, indicating that they decayed more slowly. The term for quantitative-competitive assays was $\beta_{\mathrm{Competitive}}$ = −0.03 (95% CrI: −0.25 to 0.20), with $\beta_{\mathrm{Competitive}} > 0$ in 40.1% of the posterior samples, showing no clear difference from quantitative-indirect assays (possibly owing to the small number of assays in the quantitative-competitive group). Differences between analytic techniques can be seen by comparing the columns of Figure 3.
Regarding the antigen, assays targeting the nucleocapsid showed faster seroreversion than those targeting the spike protein ($\beta_{\mathrm{Nucleocapsid}} < \beta_{\mathrm{Spike}}$ in 99.7% of the posterior samples). Assays targeting the RBD had on average slower seroreversion than those targeting the spike protein, although the effect was not statistically significant ($\beta_{\mathrm{Spike}} < \beta_{\mathrm{RBD}}$ in 87.3% of the samples).
Differences between antigens can be appreciated by comparing the different rows of Figure 3.
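Statements such as "$\beta_{\mathrm{LFA}} < 0$ in 99.6% of the posterior samples" are proportions computed over MCMC draws. A minimal sketch follows; the draws below are synthetic stand-ins, not the study's posterior, and the function names are hypothetical. For comparisons between two parameters, draws must be paired by iteration, because real posterior draws of different parameters can be correlated, which independent synthetic draws do not capture.

```python
import random


def prob_negative(draws):
    """Share of posterior draws below zero, e.g. P(beta_LFA < 0 | data)."""
    return sum(d < 0 for d in draws) / len(draws)


def prob_less(draws_a, draws_b):
    """Share of paired draws with a < b, e.g. P(beta_Nucleocapsid < beta_Spike)."""
    if len(draws_a) != len(draws_b):
        raise ValueError("draws must be paired (same length)")
    return sum(a < b for a, b in zip(draws_a, draws_b)) / len(draws_a)


# Synthetic, independent draws standing in for MCMC output (illustration only).
rng = random.Random(0)
beta_lfa = [rng.gauss(-0.23, 0.08) for _ in range(10_000)]
beta_n = [rng.gauss(-0.4, 0.1) for _ in range(10_000)]
beta_s = [rng.gauss(-0.1, 0.1) for _ in range(10_000)]
print(prob_negative(beta_lfa), prob_less(beta_n, beta_s))
```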
To see how the different slopes translate into differences in sensitivity, the reader can compare the sensitivities of the different types of assays at a given delay between diagnosis and testing in Table 1. Note that there was considerable variability between different assays of the same type (i.e. between the black lines within the same panel of Figure 3). Assay-specific sensitivity profiles are provided in Supplementary Table S2. When estimating the extent of seroreversion for a given survey, assay-specific sensitivity estimates should be preferred over the coarser estimates provided for assay types.
Finally, we tested whether specificity was also related to assay characteristics. Since specificity does not have temporal dynamics, we analysed only point estimates (see the details of the model in Supplement section G). As for sensitivity, we found that LFA assays have on average lower specificity than quantitative assays ($\beta_{\mathrm{LFA}} < 0$ in 98.4% of the posterior samples). Unlike for sensitivity, we did not find significant differences among quantitative assay designs (e.g. $\beta_{\mathrm{Direct}} > 0$ in 85.0% of the posterior samples) or between antigens (e.g. $\beta_{\mathrm{RBD}} > \beta_{\mathrm{Nucleocapsid}}$ in 67.2% of the posterior samples). Differences in specificity between types of assays were of epidemiologically relevant magnitude (e.g. average specificity of 99.9% (95% CrI: 99.7-100) for RBD/quantitative-indirect assays and 98.8% (95% CrI: 96.6-99.7) for nucleocapsid/LFA assays; all estimates are available in Supplementary Table S4). As for sensitivity, we found considerable variability between studies reporting on the same assay ($\sigma_{us}$ = 0.61, 95% CrI: 0.20-1.14). Specificity data and the resulting fit are shown in Supplementary Figure S7.

Figure 3
Sensitivity profiles for different SARS-CoV-2 seroassay characteristics, January 2020-March 2022 (n = 276 cohorts)
The sensitivity profile across time for each kind of assay is shown in a separate panel. Rows indicate the targeted antigen, and columns indicate the analytic technique. For example, the panel in the second row and second column shows assays that target the spike protein and use a quantitative-indirect design. Red lines: mean sensitivity for each kind of assay; shaded regions: 95% credible intervals of the mean sensitivity of the group (i.e. not accounting for variability between assays); black lines: fits for individual assays; grey dots: data, with size proportional to the square root of sample size. Empty panels indicate that no assays with those characteristics were found for the analysis.
LFA: lateral flow assay; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2.

Manufacturer sensitivities and risk of bias in the literature
Although quantitatively estimating and correcting the seroreversion bias in the literature is outside the scope of the present work, we can coarsely estimate the risk of seroreversion bias across the literature.
We compared our estimates with the assay sensitivities provided by manufacturers, which report the percentage of serological samples from individuals diagnosed with COVID-19 that test positive (where manufacturer values were missing, we used values reported by the United States Food and Drug Administration (FDA) or reported by authors). At 4 months after diagnosis, 20% of the assays had sensitivities below 75% of the originally specified value; at 6 months after diagnosis, 34% of the assays did. Thus, a few months after a COVID-19 wave, some serological assays (mostly LFA assays and quantitative-indirect assays targeting nucleocapsid antibodies) can severely underestimate previous infections.
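The comparison reduces to checking, for each assay, whether its estimated sensitivity at a given time falls below 75% of its specified value. A minimal sketch with hypothetical numbers follows; it compares point estimates, which is an assumption, since the text does not state whether point estimates or full posteriors were used.

```python
def share_below_spec(estimated, specified, fraction=0.75):
    """Share of assays whose estimated sensitivity at a given time is below
    `fraction` times the manufacturer-specified sensitivity."""
    if len(estimated) != len(specified) or not estimated:
        raise ValueError("need one specified value per estimated value")
    below = sum(est < fraction * spec for est, spec in zip(estimated, specified))
    return below / len(estimated)


# Hypothetical sensitivities at 6 months vs manufacturer values (four assays).
estimated_6m = [0.95, 0.60, 0.70, 0.40]
manufacturer = [0.98, 0.90, 0.92, 0.95]
print(share_below_spec(estimated_6m, manufacturer))
```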
We further analysed what percentage of serosurveys reported in the literature were at high risk of seroreversion bias. As a reference, we used a comprehensive meta-analysis of the global evolution of SARS-CoV-2 seroprevalence [17], which used the publicly available SeroTracker dataset [33] and notes the lack of seroreversion adjustment as a limitation. We estimated the percentage of SeroTracker data points aligned with the World Health Organization (WHO) Unity protocol (i.e. those used in [17]) that used assays with high rates of seroreversion (LFA assays or nucleocapsid quantitative-indirect assays). Because seroreversion depends on both the assay used and the time elapsed between an epidemic wave and a serosurvey, we grouped the data by semester.
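The tabulation by semester can be sketched as follows, assuming calendar half-years and a simplified record of (sampling date, analytic technique, antigen) per data point. The record layout and labels are hypothetical and do not reflect SeroTracker's actual column names.

```python
from collections import defaultdict
from datetime import date


def is_high_risk(technique: str, antigen: str) -> bool:
    """High seroreversion risk: any LFA, or a nucleocapsid quantitative-indirect assay."""
    return technique == "LFA" or (
        technique == "quantitative-indirect" and antigen == "nucleocapsid")


def semester(d: date) -> str:
    """Calendar half-year label, e.g. '2020-H1' for January-June 2020."""
    return f"{d.year}-H{1 if d.month <= 6 else 2}"


def high_risk_by_semester(points):
    """Map semester -> (high-risk data points, total data points, percentage)."""
    counts = defaultdict(lambda: [0, 0])  # semester -> [high-risk, total]
    for sampling_date, technique, antigen in points:
        key = semester(sampling_date)
        counts[key][1] += 1
        if is_high_risk(technique, antigen):
            counts[key][0] += 1
    return {k: (high, total, 100.0 * high / total)
            for k, (high, total) in sorted(counts.items())}


# Hypothetical data points.
points = [
    (date(2020, 5, 1), "LFA", "spike"),
    (date(2020, 6, 30), "quantitative-direct", "RBD"),
    (date(2021, 8, 1), "quantitative-indirect", "nucleocapsid"),
]
print(high_risk_by_semester(points))
```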

Table 1
Estimated sensitivities of SARS-CoV-2 seroassays at each time after diagnosis, for each type of assay fitted in the analysis, January 2020-March 2022 (n = 276 cohorts)

Time after diagnosis (months) | Sensitivity by assay type in % (95% CrI)

Each row corresponds to a different type of test, with characteristics indicated in the first three columns. These estimates do not include the between-assay or between-study variability in their CrI. These sensitivities correspond to the red lines and shaded regions in Figure 3.

Table 2
Unity-aligned seroprevalence data points of the SeroTracker dataset [33] that use assays at high risk of seroreversion, defined as lateral flow assays or quantitative-indirect assays for SARS-CoV-2 nucleocapsid antibodies, January 2020-December 2021 (n = 1,592)

Period of serological sampling | SeroTracker data points at high seroreversion risk (total data points) | Percentage of assays at high seroreversion risk (%)

Table 2 shows that although the use of serological assays at high risk of seroreversion decreased over the course of the pandemic, such assays still constituted a considerable fraction of Unity-aligned data points until mid-2021.

Discussion
Serology-based estimates of infections are important for understanding COVID-19. Although accounting for seroreversion in these estimates is known to be important, appropriate data and guidelines to do so are lacking. Few studies correct for seroreversion [1,15,16,27,34,35], and the lack of robust assay-specific seroreversion estimates makes it uncertain how accurate existing adjustments are. We present the first large-scale systematic analysis of seroreversion across dozens of SARS-CoV-2 seroassays, making three major contributions to help understand and correct for seroreversion.
Firstly, we provide time-varying sensitivity estimates for 50 assays, as well as estimates of the average time-varying sensitivity for different assay types. These estimates can be used to adjust for seroreversion in the literature. Given the assay's identity (or its characteristics, for assays not represented in our sample) and the time elapsed between the epidemic wave and the serosurvey at the tested location (which can be estimated from case or death curves), a seroreversion-adjusted sensitivity estimate can be selected from our results. Using this sensitivity in the standard Rogan-Gladen formula yields seroprevalence estimates that are corrected for seroreversion. Importantly, this procedure performed well at predicting assay sensitivity in a rigorous cross-validation analysis.
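The correction amounts to plugging a time-dependent sensitivity into the Rogan-Gladen estimator, p = (p_obs + Sp - 1) / (Se + Sp - 1). The sketch below assumes linear interpolation between tabulated sensitivities, clamped outside the tabulated range, as a simplification of reading values off the fitted curves; the sensitivity table, specificity and apparent prevalence are hypothetical.

```python
from bisect import bisect_right


def sensitivity_at(months, values, t):
    """Linearly interpolate a tabulated sensitivity curve at time t (months).

    Values are clamped outside the tabulated range; this is an assumption,
    since the fitted logistic model would extrapolate instead.
    """
    if t <= months[0]:
        return values[0]
    if t >= months[-1]:
        return values[-1]
    i = bisect_right(months, t)
    t0, t1 = months[i - 1], months[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence, truncated to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    p = (apparent_prevalence + specificity - 1.0) / denom
    return min(max(p, 0.0), 1.0)


# Hypothetical assay: tabulated sensitivity at 0, 3 and 6 months after diagnosis.
months = [0.0, 3.0, 6.0]
sens = [0.95, 0.85, 0.70]
se_t = sensitivity_at(months, sens, t=4.5)  # survey 4.5 months after the wave
naive = rogan_gladen(0.08, sensitivity=0.95, specificity=0.99)
adjusted = rogan_gladen(0.08, sensitivity=se_t, specificity=0.99)
print(round(se_t, 3), round(naive, 4), round(adjusted, 4))
```

With the lower, time-adjusted sensitivity, the same apparent prevalence maps to a higher estimated true prevalence, which is the direction of the seroreversion bias described above.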
Our second contribution is quantifying how seroreversion depends on assay characteristics. We show that seroreversion depends heavily on the antigen and on the analytic technique. LFA assays (qualitative, rapid tests) show faster sensitivity decay, whereas quantitative assays with direct antibody binding show the slowest decay. This is in line with the high sensitivity of direct binding assays reported for other pathogens, which has been ascribed to factors such as lower sample dilution or the detection method not being limited to one class of antibodies [36,37]. Assays for nucleocapsid-targeting antibodies tended to decay faster than assays for spike protein antibodies, while assays targeting S1-RBD antibodies tended to decay more slowly (although this last effect was not significant at the 95% level). Interestingly, one type of assay, quantitative-direct assays targeting RBD-binding antibodies, had on average increasing sensitivity over time. This is in line with previous studies reporting increasing sensitivity for assays of this type and attributing this effect to prolonged antibody maturation [13,14]. Because reinfection incidence was likely to be low in our data, it is unlikely that these results reflect reinfections.
The striking differences between types of assays (e.g. average sensitivity at 6 months of 98% for S1-RBD-targeting quantitative-direct assays vs 26% for nucleocapsid LFA assays) underline the need for assay-specific corrections. For example, according to our results, the one-size-fits-all (i.e. not assay-specific) seroreversion rates used in two previous analyses of the SARS-CoV-2 infection fatality rate (a 5% monthly decrease [35] and a 190-day half-life [1]) would considerably overestimate or underestimate seroreversion for many assays. These results are in line with previous reports in the literature [13][14][15][20], although previous studies generally analysed fewer characteristics and did not quantify their effects. Our analysis also showed that specificity depends on assay characteristics.
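To see why a single rate cannot fit all assays, the two constant rates can be converted into the fraction of initial sensitivity retained after a given time. The sketch assumes that the 5% monthly decrease is relative (multiplicative) and uses 365.25/12 days per month; both are interpretive assumptions. The retained fractions are relative to initial sensitivity, so they are not directly comparable with the absolute sensitivities quoted above.

```python
DAYS_PER_MONTH = 365.25 / 12  # assumed average month length


def retained_monthly_decrease(months: float, monthly_decrease: float = 0.05) -> float:
    """Fraction of initial sensitivity left after a constant relative monthly decrease."""
    return (1.0 - monthly_decrease) ** months


def retained_half_life(days: float, half_life_days: float = 190.0) -> float:
    """Fraction of initial sensitivity left under exponential decay with a given half-life."""
    return 0.5 ** (days / half_life_days)


for m in (3, 6, 12):
    print(m, round(retained_monthly_decrease(m), 3),
          round(retained_half_life(m * DAYS_PER_MONTH), 3))
```

Under these assumptions, the two rates imply that about 74% and 51% of initial sensitivity, respectively, remains at 6 months, applied identically to every assay.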
These results will allow researchers to assess the risk of seroreversion bias in serosurveys, providing a valuable tool for the design of serological studies. For example, our results suggest that the strategy of comparing S1-RBD and nucleocapsid antibody prevalences to distinguish vaccine- and infection-induced population immunity can be affected by the different seroreversion rates of these assays [10,17,38].
Our third contribution is showing that, a few months after diagnosis, manufacturer specifications can be unreliable for a considerable fraction of assays. Relatedly, we show that a sizable fraction of the Unity-aligned serosurveys used in recent WHO estimates of global seroprevalence dynamics are at risk of seroreversion bias [17]. This underscores the potential of decaying sensitivity to bias our epidemiological understanding of COVID-19, and suggests that public health policymakers may have an interest in ensuring that assay manufacturers and regulatory bodies provide information and guidelines regarding seroreversion [39]. The sensitivity estimates presented here should provide a straightforward way to correct for seroreversion in such datasets and to quantify bias in the literature.
To our knowledge, this is the most comprehensive analysis, for any pathogen, of assay-specific serological sensitivity decay and its dependence on assay characteristics. This is possible because some characteristics of the COVID-19 pandemic have allowed for a richer seroreversion dataset than is probably available for any other pathogen (i.e. well-approximated infection-to-testing times, multiple seroassays, multiple studies per seroassay and first exposures to a novel pathogen). Thus, many of the conclusions drawn from this analysis may serve as a guide for other emerging and endemic pathogens.
Our study has some limitations. Firstly, although we included more assays than previous studies, many of the included assays had seroreversion data for only a few time points. Secondly, we were unable to test the effects of important parameters such as age or disease severity on seroreversion [13,14,25,40]. Relatedly, an ideal dataset would use a well-defined cohort representative of the general population, with known age, sex ratio, disease severity, infecting variant and occurrence of reinfections, but the available literature falls short of this ideal. This could introduce variability and biases into our estimates. We note, however, that our modelling framework is flexible and could be extended to account for these variables, given appropriate data. Thirdly, because we analysed test data conditional on individuals having a previous COVID-19 diagnosis, asymptomatic individuals were likely underrepresented in our sample. Finally, because our analysis included only data points on unvaccinated individuals, and most of the included data points were sampled in 2020 when SARS-CoV-2 variants of concern and reinfections were uncommon, it is unclear how our results would extrapolate to antibodies induced by vaccines, reinfections or new variants of the virus.

Conclusion
Accounting for seroreversion in serology-based estimates of infection numbers is important for understanding the COVID-19 pandemic and for the usefulness of continued serological testing to monitor the effects of COVID-19. Rapid LFA tests, as well as quantitative-indirect tests for nucleocapsid-targeting antibodies, have a high potential for seroreversion, whereas quantitative-direct assays are likely to be preferable for long-term serological surveillance. A considerable number of studies in the literature use assays with a high risk of seroreversion, indicating an important potential for bias. We present a simple method for researchers to account for seroreversion when analysing serological data and when designing serological studies. Because the data generated during the COVID-19 pandemic offer a unique opportunity to study the effects of seroreversion, this may also be of interest for the management of other pathogens and for serosurveillance more generally.

Ethical statement
This study exclusively used publicly available aggregate data sets and published research, and hence no ethics approval was required.