Antibody testing for COVID-19: A report from the National COVID Scientific Advisory Panel

Background: The COVID-19 pandemic caused >1 million infections during January-March 2020. There is an urgent need for reliable antibody detection approaches to support diagnosis, vaccine development, safe release of individuals from quarantine, and population lock-down exit strategies. We set out to evaluate the performance of ELISA and lateral flow immunoassay (LFIA) devices. Methods: We tested plasma for COVID (severe acute respiratory syndrome coronavirus 2; SARS-CoV-2) IgM and IgG antibodies by ELISA and using nine different LFIA devices. We used a panel of plasma samples from individuals who have had confirmed COVID infection based on a PCR result (n=40), and pre-pandemic negative control samples banked in the UK prior to December-2019 (n=142). Results: ELISA detected IgM or IgG in 34/40 individuals with a confirmed history of COVID infection (sensitivity 85%, 95%CI 70-94%), vs. 0/50 pre-pandemic controls (specificity 100% [95%CI 93-100%]). IgG levels were detected in 31/31 COVID-positive individuals tested ≥10 days after symptom onset (sensitivity 100%, 95%CI 89-100%). IgG titres rose during the 3 weeks post symptom onset and began to fall by 8 weeks, but remained above the detection threshold. Point estimates for the sensitivity of LFIA devices ranged from 55-70% versus RT-PCR and 65-85% versus ELISA, with specificity 95-100% and 93-100% respectively. Within the limits of the study size, the performance of most LFIA devices was similar. Conclusions: Currently available commercial LFIA devices do not perform sufficiently well for individual patient applications. However, ELISA can be calibrated to be specific for detecting and quantifying SARS-CoV-2 IgM and IgG and is highly sensitive for IgG from 10 days following first symptoms.


Introduction
The first cases of infection with a novel coronavirus (severe acute respiratory syndrome coronavirus 2; SARS-CoV-2) causing coronavirus infectious disease (COVID) emerged in Wuhan, China on December 31st, 2019 1 . Despite intensive containment efforts, there was rapid international spread and three months later, there had been over 1 million confirmed infections and 60,000 reported deaths 2 . Containment efforts have relied heavily on population quarantine ('lock-down') measures to restrict movement and reduce individual contacts 3,4 . To develop public health strategies for exit from lock-down, diagnostic testing urgently needs to be scaled-up, including both mass screening and screening of specific high-risk groups (contacts of confirmed cases, and healthcare workers and their families), in parallel with collecting data on recent and past infection at individual and population levels 2 .
Laboratory diagnosis of infection has mostly been based on real-time RT-PCR, typically targeting the viral RNA-dependent RNA polymerase (RdRp) or nucleocapsid (N) genes using swabs collected from the upper respiratory tract 5,6 . This requires specialist equipment, skilled laboratory staff and PCR reagents, creating diagnostic delays. RT-PCR from upper respiratory tract swabs may also be falsely negative due to quality or timing; viral loads in upper respiratory tract secretions peak in the first week of symptoms 7 , but may have declined below the limit of detection in those presenting later 8 . In individuals who have recovered, RT-PCR provides no information about prior exposure or immunity.
In contrast, assays that reliably detect antibody responses specific to SARS-CoV-2 could contribute to diagnosis of acute infection (via rises in IgM and IgG levels) and to identifying those infected with or without symptoms and recovered (via persisting IgG) 9 . Receptor-mediated viral entry to host cells occurs through interactions between the unique and highly-conserved viral spike (S) glycoprotein and the ACE2 cell receptor 10 . This S protein is the primary target of specific neutralising antibodies, and current SARS-CoV-2 serology assays therefore typically seek to identify these antibodies (Figure 1A-C). Rapid lateral flow immunoassay (LFIA) devices provide a quick, point-of-care approach to antibody testing. A sensitive and specific antibody assay could directly contribute to early identification and isolation of cases, address unknowns regarding the extent of infection to inform mathematical models and support individual or population-level release from lock-down. Laboratory-based ELISA platforms have also been evaluated as an approach to detection and quantification of SARS-CoV-2 antibodies 11 . However, before either laboratory assays or LFIA devices can be widely deployed, their performance needs to be carefully evaluated (Figure 1D, E) 12 . We therefore compared a novel laboratory-based ELISA assay with nine commercially-available LFIA devices using samples from patients with RT-PCR-confirmed infection, and negative pre-pandemic samples.

Research reporting
Samples. A total of 142 plasma samples designated seronegative for SARS-CoV-2 were collected from adults (≥18 years) in the UK before December 2019 (Underlying data, Table S1, including demographic details 13 ) from three ethically approved sources: healthy blood donors, organ donors on ICU following cerebral injury and healthy volunteers from a vaccine study.
In total, 40 plasma samples were collected from adults positive for SARS-CoV-2 by RT-PCR from an upper respiratory tract (nose/throat) swab tested in accredited laboratories (Underlying data, Table S1 13 ). Acute (≤28 days from symptom onset) and convalescent samples (>28 days) were included to optimise detection of SARS-CoV-2 specific IgM and IgG respectively (Figure 1B). Cases were classified following WHO criteria as critical (respiratory failure, septic shock, and/or multiple organ dysfunction/failure); severe (dyspnoea, respiratory frequency ≥30/minute, blood oxygen saturation ≤93%, PaO2/FiO2 ratio <300, and/or lung infiltrates >50% of the lung fields within 24-48 hours); or otherwise mild 14 . Among 22 acute cases, 9 were critical, 4 severe and 9 mild. All but one convalescent individual had mild disease; the other was asymptomatic and screened during enhanced contact tracing.

ELISA
We developed a novel ELISA targeting the SARS-CoV-2 spike protein. Recombinant SARS-CoV-2 trimeric spike protein was constructed as described 15 , using mammalian codon-optimized SARS2 Spike (1-1208, Genbank accession MN908947) with a GSAS substitution at the furin cleavage site (aa 682-685) and a double proline substitution at aa 986-987. The C-terminus was followed by a T4 fibritin motif, an HRV3C protease cleavage site, a TwinStrep Tag and an 8-HisTag. The gene was cloned into a pHLsec vector and expressed in 293T cells. A HisTrap HP column (cat no 17524701; Cytiva) was used to purify the recombinant S protein.
We used ELISA to detect antibodies to the S protein. MAXISORP immunoplates (442404; NUNC) were coated with StrepMAB-Classic (2-1507-001; iba). Plates were blocked with 2% skimmed milk in PBS for one hour and then incubated with 0.125 µg of soluble trimeric SARS-CoV-2 S protein or 2% skimmed milk in phosphate buffered saline. After one hour, plasma was added at 1:50 dilution, followed by ALP-conjugated anti-human IgG (A9544, RRID:AB_258459; Sigma) at 1:10,000 dilution or ALP-conjugated anti-human IgM (A9794, RRID:AB_258474; Sigma) at 1:5,000 dilution. The reaction was developed by the addition of PNPP substrate and stopped with 1.0 M NaOH. The absorbance was measured at 405 nm after 90 minutes, and a final optical density (OD) value was calculated by subtracting the background (skimmed milk) from the test value. The ELISA assay takes 5-6 hours to perform, with an experienced operator able to process up to five 96-well plates (480 samples including relevant controls).

LFIA
We tested LFIA devices designed to detect IgM, IgG or total antibodies to SARS-CoV-2 produced by nine manufacturers short-listed as a testing priority by the UK Government Department of Health and Social Care (DHSC), based on appraisals of device provenance and available performance data. Individual manufacturers did not approve release of device-level data, so device names are anonymised.
Testing was performed in strict accordance with the manufacturer's instructions for each device. Typically, this involved adding 5-20 µl of plasma to the sample well, and 80-100 µl of manufacturer's buffer to an adjacent well, followed by incubation at room temperature for 10-15 minutes. The result was based on the appearance of coloured bands, designated as positive (control and test bands present), negative (control band only), or invalid (no band, absent control band, or band in the wrong place) ( Figure 1C).
We recorded results in real-time on a password-protected electronic database, using pseudonymised sample identifiers, capturing the read-out from the device (positive/negative/invalid), operator, device, device batch number, and a timestamped photograph of the device.

Testing protocol
We tested 90 samples using ELISA to quantify IgM and IgG antibody in plasma designated SARS-CoV-2 negative (n=50) and positive (n=40). All positive samples were included, together with an unstratified random sample of negative plasma from healthy blood donors (n=23) and organ donors (n=27). We tested the nine different LFIA devices using between 39 and 165 individual plasma samples (8-23 and 31-142 samples designated SARS-CoV-2 positive and negative, respectively; Table S2 13 ). Total numbers varied according to the number of devices supplied to the DHSC; samples were otherwise selected at random.

Statistical analysis
Analyses were conducted using R (version 3.6.3) and Stata (version 15.1), with additional plots generated using GraphPad Prism (version 8.3.1). Binomial 95% confidence intervals (CI) were calculated for all proportions. The association between ELISA results and time since symptom onset, severity, need for hospital admission and age was estimated using multivariable linear regression, without variable selection. Non-linearity in relationships with continuous factors was included via natural cubic splines. Differences between LFIA devices were estimated using mixed effects logistic regression models, allowing for each device being tested on overlapping sample sets. Differences between devices were compared with Benjamini-Hochberg correction for multiple testing.
The UK Government DHSC selected the lateral flow devices for testing as described above. Otherwise, the funders had no role in study design or in the collection, analysis, and interpretation of data. Authors from DHSC contributed to writing of the report and to the decision to submit the paper for publication.
Considering the relationship between IgM and IgG titres and time since symptom onset ( Figures 2C, D), univariable regression models showed IgG antibody titres rising over the first 3 weeks from symptom onset. The lower bound of the pointwise 95%CI for the mean expected titre crosses our OD threshold between days 6-7 ( Figure 2D). However, given sampling variation, test performance is likely to be optimal from several days later. IgG titres fell during the second month after symptom onset but remained above the OD threshold. No temporal association was observed between IgM titres and time since symptom onset ( Figure 2C). There was no evidence that SARS-CoV-2 severity, need for hospital admission or patient age were associated with IgG or IgM titres in multivariable models (p>0.1, Table S3 13 ).

Detection of SARS-CoV-2 antibodies by LFIA vs. ELISA
We also considered performance relative to ELISA (Extended data, Table S5, Figure S1 13 ), because the LFIA devices target the same antibodies. We considered patients positive by this alternative standard if their IgG OD reading exceeded the threshold described above (since no samples were IgM-positive, IgG-negative). Sensitivity of antibody detection by LFIA ranged from 65% (95%CI 46-80%) to 85% (66-96%) and specificity from 93% (95%CI 83-98%) to 100% (94-100%); however, the device with the highest sensitivity had one of the lowest specificities (Extended data, Figure S1 13 ). There was no evidence of differences in sensitivity (p≥0.010, cf. p=0.0014 threshold) or specificity between devices (p≥0.19).
Of 50 designated negative samples tested by both ELISA and the nine different LFIA devices, nine separate samples generated at least one false-positive, on seven different LFIA devices (Figure 3). Four samples generating false-positive results did so on more than one LFIA device, despite the absence of quantifiable IgM or IgG on ELISA, potentially suggesting a specific attribute of the sample causing a cross-reaction on certain LFIA platforms.
Of the 22 samples collected from RT-PCR positive patients in the acute setting, six fell below the ELISA detection threshold for IgM or IgG; two of these six were positive on LFIA testing, each on one (different) device. Of the remaining 16 acute samples (all ELISA IgG-positive), only nine were consistently positive across all nine LFIA devices. Due to limited availability of LFIA devices, fewer tests were performed on the 18 convalescent samples with available ELISA data, all with quantifiable IgG ( Figure 2B, Figure 3A). Two had no antibody detected on any LFIA device, and only eight were consistently positive across all LFIA devices tested (between 1 and 9 devices tested per sample). Full metadata for results of ELISA and LFIA devices are available in Underlying data, Supplementary Table S6 13 .

Discussion
We here present the performance characteristics of a novel ELISA and nine selected LFIA devices for detecting SARS-CoV-2 IgM and IgG. Among 40 RT-PCR-confirmed positive patients, 85% had IgG detected by ELISA, including 100% of patients tested ≥10 days after symptom onset. A panel of LFIA devices had sensitivity between 55 and 70% against the reference-standard RT-PCR, or 65-85% against ELISA, with specificity of 95-100% and 93-100%, respectively. These estimates come with wide confidence intervals due to constraints on the number of devices made available. Comparable results have been obtained through a similar appraisal undertaken independently, in which specificity ranged from 84-100%, and the proportion of specimens testing positive increased over time from symptom onset, with >80% sensitivity achieved by some LFIA devices at later time points 16 . Our study, and these parallel data from another centre 16 , provide a benchmark against which to assess the performance of future antibody testing platforms, with the aim of guiding decisions about deploying antibody testing and informing the design of second-generation assays.
LFIA devices are cheap to manufacture, store and distribute, and could be used as a point-of-care test, offering an appealing approach to diagnostics and evaluating exposure, were adequate performance to be confirmed. A positive antibody test is currently regarded as a probable surrogate for immunity to reinfection. Secure confirmation of antibody status would therefore reduce anxiety, provide confidence to allow individuals to relax social distancing measures, and guide policy-makers in the staged release of population lock-down, potentially in tandem with digital approaches to contact tracing 17 . As a diagnostic tool, serology may have a role in combination with RT-PCR testing to improve sensitivity, particularly of cases presenting some time after symptom onset 18,19 . Reproducible methods to detect and quantify vaccine-mediated antibodies are also crucial as COVID vaccines enter clinical trials.
Appropriate thresholds for sensitivity and specificity depend on the primary purpose of the test. For diagnosis in symptomatic patients, high sensitivity is required (generally ≥90%). Specificity is less critical as some false-positives could be tolerated (provided other potential diagnoses are considered, and accepting that over-diagnosis causes unnecessary quarantine or hospital admission). However, if antibody tests were deployed as an individual-level approach to inform release from quarantine, then high specificity is essential, as false-positive results return non-immune individuals to risk of exposure. For this reason, the UK Medicines and Healthcare products Regulatory Agency has currently set a minimum 98% specificity threshold for LFIAs 20 .
Appraisal of test performance should also consider the influence of population prevalence, acknowledging that this changes over time, geography and within different population groups. The potential risk of a test providing false reassurance and release from lock-down of non-immune individuals can be considered as the proportion of all positive tests that are wrong. Based on the working 'best case' scenario of a LFIA test with 70% sensitivity and 98% specificity, the proportion of positive tests that are wrong is 35% at 5% population seroprevalence (19 false-positives/1000 tested), 10% at 20% seroprevalence (16 false-positives/1000) and 3% at 50% seroprevalence (10 false-positives/1000) (Figure 4).
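The arithmetic behind this 'best case' scenario can be reproduced with a short calculation. The sketch below is illustrative only and is not the method used to generate Figure 4; the function name `false_positive_breakdown` and the cohort of 1000 people tested are assumptions introduced here, while the sensitivity, specificity and seroprevalence values are those quoted in the text.

```python
def false_positive_breakdown(prevalence, sensitivity=0.70, specificity=0.98, n_tested=1000):
    """Return (true positives, false positives, proportion of positive tests that are wrong).

    Hypothetical helper: assumes a cohort of n_tested people with the given
    seroprevalence and a test with fixed sensitivity and specificity.
    """
    n_seropositive = prevalence * n_tested
    n_seronegative = n_tested - n_seropositive
    true_pos = sensitivity * n_seropositive          # seropositive people correctly detected
    false_pos = (1 - specificity) * n_seronegative   # seronegative people wrongly flagged
    proportion_wrong = false_pos / (true_pos + false_pos)
    return true_pos, false_pos, proportion_wrong


for prev in (0.05, 0.20, 0.50):
    tp, fp, wrong = false_positive_breakdown(prev)
    print(f"seroprevalence {prev:.0%}: {fp:.0f} false positives per 1000 tested, "
          f"{wrong:.0%} of positive tests wrong")
```

The proportion of positive tests that are wrong is one minus the positive predictive value, so it falls as seroprevalence rises even though sensitivity and specificity are held fixed.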
More data are needed to investigate antibody-positivity as a correlate of protective immunity. Indeed, pre-existing IgG could enhance disease in some situations 21 , with animal data demonstrating that SARS-CoV-2 anti-spike IgG contributes to a proinflammatory response associated with lung injury in macaques 22 . Our study, and another undertaken independently in parallel 11 , demonstrates accurate performance of ELISA targeting anti-spike protein antibodies. Additionally, our ELISA results are supported by context regarding disease severity and the time of sampling relative to symptom onset. Our data on the kinetics of antibody responses build upon studies of hospitalised patients in China reporting a median 11 days to seroconversion for total antibody, with IgM and IgG seroconversion at days 12 and 14, respectively 18 , and others that report 100% IgG positivity by 15-19 days 19,23 . Our ELISA data show IgG titres rose over the first 3 weeks of infection and that IgM testing identified no additional cases. Methods to enhance sensitivity, especially shortly after symptom onset, could consider different sample types (e.g. saliva), different antibody classes (e.g. IgA) 24 , T-cell assays or antigen detection 25 . In contrast to others 19,26-28 , we did not find evidence of an association between disease severity and antibody titres. We observed several LFIA false positives, which may have potentially resulted from cross-reactivity of non-specific antibodies (e.g. reflecting past exposure to other seasonal coronavirus infections).
The main study limitation is that numbers tested were too small to provide tight confidence intervals around performance estimates for any specific LFIA device. Expanding testing across diverse populations would increase certainty, but given the broadly comparable performance of different assays, the cost and manpower to test large numbers may not be justifiable. Demonstrating high specificity is particularly challenging; for example, if the true underlying value was 98%, 1000 negative controls would be required to estimate the specificity of an assay to ±1% with approximately 90% power. Full assessment should also include a range of geographical locations and ethnic groups, children, and those with immunological disease including autoimmune conditions and immunosuppression.
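To illustrate why precision on specificity depends so strongly on the number of negative controls, the following sketch computes an exact binomial 95% confidence interval for an observed proportion. It assumes the Clopper-Pearson method; the paper states only that binomial confidence intervals were calculated, so the choice of method is an assumption, as are the function names `binom_pmf`, `_solve` and `clopper_pearson` and the hypothetical 980/1000 example.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """Binomial probability P(X = i) for 0 < p < 1, computed in log space."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1 - p))


def _solve(f, target, increasing):
    """Bisection on (0, 1) for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for k successes in n trials (assumed method)."""
    def upper_tail(p):  # P(X >= k), increasing in p
        return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

    def lower_tail(p):  # P(X <= k), decreasing in p
        return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

    lo = 0.0 if k == 0 else _solve(upper_tail, alpha / 2, increasing=True)
    hi = 1.0 if k == n else _solve(lower_tail, alpha / 2, increasing=False)
    return lo, hi


# 50/50 mirrors the ELISA negative controls; 980/1000 is a hypothetical
# assay with 98% observed specificity on 1000 negative controls.
for k, n in [(50, 50), (980, 1000)]:
    lo, hi = clopper_pearson(k, n)
    print(f"{k}/{n}: {k / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Comparing the two intervals shows how much narrower the estimate becomes with 1000 negative controls than with 50. The sketch does not attempt to reproduce the power calculation quoted above.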
In summary, antibody testing is a crucial component of measures that may be required to inform release from lockdown. Our findings suggest that while current LFIA devices may provide some information for population-level surveys, their performance is inadequate for most individual patient applications. The ELISA we describe is currently being optimised and adapted to run on a high-throughput platform and provides promise for the development of reliable approaches to antibody detection that can support decision making for clinicians, the public health community, policy-makers and industry.

Data availability
File 'Supplementary material' (PDF) contains the following extended data:
• Supplementary methods.
• Figure S1. Sensitivity and specificity of lateral flow devices compared with RT-PCR confirmed cases and pre-pandemic controls (panels A and B) and compared with ELISA results (panels C and D).
• Figure S2. Comparison between ELISA and LFIA for SARS-CoV-2 designated negative and positive plasma.

The article by Adams and colleagues "Antibody testing for COVID-19: A report from the National COVID Scientific Advisory Panel" compares the performance of an "in-house" ELISA with nine lateral flow immunoassays (LFIA) from different companies, using samples from the pre-pandemic period and from RT-PCR-confirmed COVID-19 cases at different stages of the disease. The ELISA showed 85% sensitivity and 100% specificity. The LFIA showed higher sensitivity and specificity when compared with ELISA than with RT-PCR, and the performance of the LFIA devices was similar to each other. The authors conclude that ELISA performs better than LFIA for individual patient applications.

Comments
1. The authors describe that the ELISA used measured antibodies against the spike protein, but not much information is available about the different LFIA and what parts of the virus may have been used to develop and optimize the antibody detection. This information is important for a complete understanding of the comparison between the ELISA and the LFIA (Table 1).
2. The most commonly recommended quarantine/isolation period (which may be country-dependent) is 10-14 days from symptom onset, whereas the authors describe up to 28 days as the acute period. A justification may be necessary. Was this predefined, based on information around viral shedding or antibody kinetics (similar to data presented in Figure 2D)?
3. In line with Tables 1 and S4 and the false negative results, can the authors comment on the antibodies possibly targeting parts of the virus other than spike proteins (especially for the ELISA, since details of the LFIA are not available)?

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound?
2. Page 6, it was stated "We also considered performance relative to ELISA (Extended data, Table S5, Figure S1), because the LFIA devices target the same antibodies." Query: The viral protein used in each of the LFIA devices is not stated in the manuscript. Based on this sentence, are the authors stating that all of them use SARS-CoV-2 Spike residues 1-1208? If not, please state the viral protein or fragment used for each device.
3. Page 9, "… with animal data demonstrating that SARS-CoV-2 anti-spike IgG contributes to a proinflammatory response associated with lung injury in macaques." Query: SARS-CoV-2 should be SARS-CoV.
4. Page 10, it was stated "We observed several LFIA false positives, which may have potentially resulted from cross-reactivity of non-specific antibodies (e.g. reflecting past exposure to other seasonal coronavirus infections)." Query: Are these LFIA using the same viral protein or fragment? What is the sequence homology between SARS-CoV-2 and endemic coronaviruses for each viral protein used?
5. Page 10, it was stated "In contrast to others, we did not find evidence of an association between disease severity and antibody titres." Query: Is there any difference in study design between this study and the published studies cited? Were the published studies measuring anti-Spike binding antibodies or were they measuring neutralizing antibodies? Are the sample size and distribution of cases (mild vs severe, etc.) comparable? Please highlight any limitation that may explain the discrepancy.
6. For the simulations shown in Figure 4. Query: Please provide details on the method/program as well as the assumptions used.
7. The ELISA described by the authors is important and adds value to the development of serological assays for COVID-19. To briefly show the robustness of the assay, can the authors provide information on (a) intra-assay variability, (b) inter-assay variability, (c) inter-operator variation and (d) the turnaround time, i.e. from sample to result?
Is the work clearly and accurately presented and does it cite the current literature? Partly

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes