Comprehensive, comparative evaluation of 25 automated SARS-CoV-2 serology assays

ABSTRACT The onset of the COVID-19 pandemic resulted in hundreds of in vitro devices coming to market, facilitated by regulatory authorities allowing “Emergency Use,” without a prior comprehensive evaluation of performance. The World Health Organization released Target Product Profiles specifying acceptable performance characteristics for SARS-CoV-2 devices. We evaluated 16 automated serology test kits that detect IgG or Total antibodies to SARS-CoV-2, along with a further nine tests that detect IgM-specific antibodies. All IgG or Total antibody tests reported a concordance with recent infection at 83.9% or greater, with 11/16 tests having greater than 90% sensitivity. All 16 tests reported a specificity of 96.3% or greater. There was a low level of false reactivity when testing all 25 tests on panels of samples containing potentially cross-reacting or interfering substances. A range of results was reported when testing seroconversion and dilution panels. The study further demonstrates that a comprehensive evaluation of the performance of test kits assessed against defined specifications is essential for the selection of test kits, especially in a pandemic setting.
IMPORTANCE We have previously highlighted the fact that hundreds of SARS-CoV-2 serology tests were released months after the onset of the COVID-19 pandemic. Of the hundreds of studies investigating the test kits’ performance, few were comparative reports, using the same comprehensive sample set across multiple tests. Recently, we reported a comparative assessment of 35 rapid diagnostic tests (RDTs) or microtiter plate enzyme immunoassays (EIA) for use in low- and middle-income countries, using a large sample set from individuals with a history of COVID-19. Only a few tests met WHO Target Product Profile performance requirements. This study reports on the performance of a further 25 automated SARS-CoV-2 immunoassays using the same panel of samples. The results highlight the better analytical and clinical performance of automated serology test kits compared with RDTs, and the importance of independent comparative assessments to inform the use and procurement of these tests for both diagnostic and epidemiological investigations.

In November 2019, a novel acute respiratory disease (COVID-19) caused by a new coronavirus (SARS-CoV-2) was first recognized. Some of the first commercial diagnostic tests available on the market were rapid serology test kits (1). Automated tests for anti-SARS-CoV-2 antibodies emerged, on average, several months later, with the majority becoming available from the second half of 2020 onward. Test kit manufacturers were able to provide the tests under Emergency Use Listing (EUL), where regulators required limited clinical and analytical performance information (2,3). This allowed rapid access to diagnostics but removed the stringent regulatory requirements applied to other similar assays. Many studies reviewing the performance of these test kits were published (4–13), but few used a comprehensive panel of well-characterized samples to evaluate performance in the manner that regulators would normally require (1,14–16).
The National Serology Reference Laboratory, Australia (NRL), in collaboration with the World Health Organization (WHO), implemented a study to assess 34 RDTs and EIAs using a well-characterized panel of samples (17). In addition to this WHO study, NRL offered a similar evaluation service to other manufacturers of laboratory-based, automated SARS-CoV-2 serology tests, drawing from the same panels of samples. A total of 16 IgG or Total antibody test kits were included in the study. These test kits detected either IgG only or Total antibodies against viral spike or nucleocapsid antigens. A further nine test kits that detected anti-SARS-CoV-2 IgM were included in the study. Summary statistics for each individual test kit were published on the NRL website after each evaluation. This report presents a comparison of the performance of the tests.

RESULTS
The results of testing for each performance criterion are detailed below.

Concordance with recent infection
All 16 test kits testing for IgG or Total antibodies reported a concordance with recent infection of greater than 83.9%, with five test kits reporting less than 90% sensitivity (Fig. 1). The Ortho VITROS Total was the only test kit with 100% sensitivity.

Clinical specificity
All 16 test kits detecting IgG or Total antibodies reported a specificity of greater than 95%, with VIRCLIA IgG having the lowest at 96.3% (Fig. 1). Euroimmun Spike IgG, Sysmex HISCL N IgG, and Roche Elecsys N Total all reported 100%.

Analytical sensitivity/lot-to-lot variation
The test kits detecting IgG or Total antibodies reported reactivity when tested on three dilution series, ranging from a dilution of 1:2 to 1:1,024, indicating differences in analytical sensitivity between test kits, as well as between samples. No test kit reported a difference of greater than one-fold dilution between the two test kit lot numbers tested (Fig. 2).

Cross-reacting and interfering substances
All test kits, excluding the bioMerieux VIDAS IgM (due to insufficient tests), were tested against the 55-member cross-reacting panel and 35-member interfering substance panel (Table 1). Euroimmun NCP IgG (CMV IgM, Chlamydia psittaci IgM, and Influenza A positive) and VIRCLIA IgG (CMV IgM, HIV, and Parainfluenza positive) reported 3/55 cross-reacting samples as reactive, and Sysmex HISCL S-IgG (two rheumatoid factor-positive samples and one icteric sample) reported 3/35 interfering substance samples as reactive. All other test kits reported two or fewer reactive results for both the cross-reacting and interfering substance samples. Eight test kits (DiaSorin LIAISON Tri-S IgG, Ortho VITROS IgG, Ortho VITROS Total, Roche Elecsys S, Siemens Atellica Total, Sysmex HISCL N-IgG, Sysmex HISCL N-IgM, and Sysmex HISCL S-IgM) reported no cross-reacting or interfering reactivity.

Seroconversion panels
All 25 test kits were tested on the five seroconversion panels. Generally, IgG and IgM reactivity was detectable at approximately the same bleed for each individual. All IgM test kits demonstrated seroconversion followed by reversion to negativity on one or more of the panel samples (Fig. S1). All of the IgG and Total antibody tests remained reactive to the last bleed, the exception being Siemens Atellica IgG, which reported negative results for two of the last three bleeds of patient MRNCOV-512. These may be false-negative results. Of note, Sysmex HISCL N-IgM only reported that 3/60 samples

Sero-reversion panels
The nine test kits detecting IgM were tested on the sero-reversion panels. A range of reactivities was reported by each test kit, from all samples within a panel being reactive to all being non-reactive (Fig. S2). At least 4/9 test kits reported at least one reactive result for each of the 10 panels.

DISCUSSION
Serology tests for SARS-CoV-2 became available early in 2020, mainly in the form of RDTs (2). The diagnostic utility of these tests was unknown at the time, but many jurisdictions allowed their use under EUL (1,18,19). The advent of automated serology testing followed, with most major suppliers of continual access testing platforms developing and releasing serology test kits. A number of studies sought to assess the performance of these tests (6,10,13,20,21). In a recently published evaluation in collaboration with WHO, we described the performance of RDTs and EIAs used to detect antibodies to SARS-CoV-2 (17). The majority of the 34 tests evaluated failed to reach the WHO Target Product Profiles (TPP) (22), with sensitivity and specificity ranging from 60.1% to 100% and 56.0% to 100%, respectively.
In contrast, the automated assays evaluated using the same panel of samples reported superior performance characteristics to the RDTs and EIAs. The TPP for RDTs set sensitivity as acceptable at >90% and desirable at >95%, and specificity as acceptable at >97% and desirable at >99%, whereas for higher-throughput assays the acceptable and desirable sensitivities were >95% and >98% and the specificities >97% and >99%, respectively (22). Eight and 15 of the 16 automated test kits testing for IgG or Total antibodies to SARS-CoV-2 achieved desirable sensitivity and specificity performance, respectively.
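The TPP thresholds quoted above can be applied mechanically to a reported sensitivity or specificity. The following is a minimal sketch only: the function name `tpp_grade`, the dictionary layout, and the class labels `rdt` and `high_throughput` are illustrative assumptions and are not part of the TPP document or of the evaluation protocol.

```python
# WHO TPP thresholds as quoted in the text, in percent.
# Each tuple is (acceptable, desirable); both are strict ">" limits.
# The dictionary layout and key names are illustrative assumptions.
TPP = {
    "rdt": {"sensitivity": (90.0, 95.0), "specificity": (97.0, 99.0)},
    "high_throughput": {"sensitivity": (95.0, 98.0), "specificity": (97.0, 99.0)},
}


def tpp_grade(value_pct, metric, assay_class="high_throughput"):
    """Grade a percentage against the TPP limits for one metric."""
    acceptable, desirable = TPP[assay_class][metric]
    if value_pct > desirable:
        return "desirable"
    if value_pct > acceptable:
        return "acceptable"
    return "below TPP"


# Values taken from the Results: highest sensitivity and lowest specificity.
print(tpp_grade(100.0, "sensitivity"))  # Ortho VITROS Total
print(tpp_grade(96.3, "specificity"))   # VIRCLIA IgG
```

Because the limits are strict inequalities, a value exactly on a threshold falls into the lower grade in this sketch.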
Only five and six RDT or EIA test kits reported no false reactivity when tested on the 55 samples containing potentially cross-reacting substances and the 35 samples containing potentially interfering substances, respectively. One RDT reported 47/55 and 26/35 reactive results for the cross-reacting and interfering substance panels. Compared with the 35 RDT and EIA test kits previously evaluated, the automated tests reported fewer false reactive results when tested on samples containing potentially interfering substances, with no automated test reporting more than three false reactive results on either cross-reacting or interfering panel samples. Eight tests reported no false reactive results on either panel.
Testing the automated tests on seroconversion panels and dilution series demonstrated a range of analytical sensitivities for IgG, IgM, and Total antibody tests. This information may be useful in determining potential uses for serology assays, although it is well accepted that serology is not useful in the clinical diagnosis of COVID-19 (23). Understanding the rise and fall of antibodies post-infection may be useful in understanding if a person has been recently infected with SARS-CoV-2 even if rapid antigen or RNA tests are negative. The findings also contribute to the understanding that tests perform differently and that a scientifically robust assessment of their performance is vital for their selection and use.
The onset of the COVID-19 pandemic has highlighted some potential deficiencies in the way the scientific and regulatory communities react to such health emergencies. While the use of EUL for in vitro diagnostic devices (IVDs) served the purpose of allowing rapid access to these tools by health workers, the decision also allowed many inferior test kits onto the market. EUL generally allowed manufacturer-declared evidence, without comprehensive data to support the claims (23). It took several years for regulatory authorities to re-impose stringent regulatory requirements onto manufacturers. At the same time, numerous studies reported the performance characteristics of these test kits. Most of these studies used low-volume remnants of clinical samples and poorly designed protocols and drew questionable conclusions, resulting in conflicting assessments of performance. Many studies were published without peer review (14). A limitation of this study is that the panel of positive samples comprised commercially acquired samples from non-hospitalized patients in the USA or Germany, collected early in the pandemic. Therefore, the SARS-CoV-2 antibodies detected were post-infection with the Wuhan strain. The ability of assays to detect antibodies arising from infections with other variants of concern or post-immunization was not assessed.

It is important that lessons are learned from this situation, as outlined in a recent paper by FIND (2). The evaluation of the performance of test kits is a significant undertaking that requires well-developed protocols, panels of well-characterized samples, and thoughtful analysis and reporting of results. The 100 Days Mission report states that "Stringent Regulatory Authorities" should work together to define international assessment protocols and develop guiding principles, alongside more effective quality assurance processes (24). WHO has a network of IVD prequalification evaluation laboratories that perform this testing, as do several other expert laboratories such as the Paul Ehrlich Institute. We would strongly recommend that a network of laboratories such as these be strengthened to respond rapidly to future health emergencies such as the original SARS, MERS, Zika, COVID-19, Ebola, MPox, and other outbreaks that continually and increasingly arise. Access to clinical samples, ethics, material transfer agreements, importation permits, templated protocols, and other infrastructure required to evaluate novel IVDs in an emergency setting will be vital to advise governments, regulators, and health workers on the performance of these IVDs.

MATERIALS AND METHODS

Test kit selection
A detailed study protocol was developed, and samples used for testing were acquired. Each test kit manufacturer was provided the study protocol and was invited to participate in the evaluation. All manufacturers provided test kits and associated reagents to NRL at no charge. No exclusion criteria were implemented; however, test kits that solely detect IgM-specific SARS-CoV-2 antibodies were tested on a limited set of panels. In total, 25 test kits from nine manufacturers were included in the study (Table 3). Not all tests were commercially available, with some being for research use only.

Sample panels
The panels of samples used in the study are presented in detail elsewhere (17). Briefly, the test kits were evaluated using the following panels.

Sensitivity/concordance with recently confirmed SARS-CoV-2 infection
A total of 199 commercially acquired plasma samples were obtained from non-hospitalized individuals with a recent history of clinical infection with ancestral SARS-CoV-2, confirmed by various commercial NATs. All samples were collected between January and April 2020, and between 14 and 71 days post-infection.

Specificity
A total of 300 plasma samples obtained from healthy blood donors were stored in NRL's sample bank, having been collected prior to November 2019. These samples were assumed to be negative for SARS-CoV-2, and no additional confirmatory testing was performed.
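As a worked illustration of how sensitivity and specificity follow from these two panels, the sketch below computes a proportion with a 95% Wilson score interval. The choice of the Wilson interval, the function name `proportion_with_wilson_ci`, and the example counts are assumptions made for illustration; the study text does not state which interval method was used, and the counts are not reported results (they are chosen only because they round to percentages cited in this report).

```python
import math


def proportion_with_wilson_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts (NOT reported data): reactive results on the
# 199-member positive panel and non-reactive results on the
# 300-member negative panel.
sens, sens_lo, sens_hi = proportion_with_wilson_ci(167, 199)
spec, spec_lo, spec_hi = proportion_with_wilson_ci(289, 300)
print(f"sensitivity {100 * sens:.1f}% ({100 * sens_lo:.1f}-{100 * sens_hi:.1f})")
print(f"specificity {100 * spec:.1f}% ({100 * spec_lo:.1f}-{100 * spec_hi:.1f})")
```

The Wilson interval is used here because it stays within 0 to 100% even when a test kit scores 100% on a panel, which the simple normal approximation does not.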

Analytical sensitivity/lot-to-lot variation
The analytical sensitivity panel consisted of 10 doubling dilutions of each of three of the sensitivity panel samples, from 1:2 to 1:1,024, prepared in human plasma negative for SARS-CoV-2 antibodies. All dilutions were tested on two reagent lots.
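The dilution scheme and the lot-to-lot comparison can be sketched as follows. This is an illustration under stated assumptions: the endpoint is taken here as the highest dilution in the unbroken run of reactive results starting at 1:2, and the names `endpoint_titre` and `lot_difference_in_steps`, as well as the example results, are hypothetical rather than taken from the study protocol.

```python
import math

# Ten doubling dilutions: 1:2, 1:4, ... 1:1,024 (stored as the denominator).
DILUTIONS = [2 ** k for k in range(1, 11)]


def endpoint_titre(reactive_by_dilution):
    """Highest dilution in the unbroken reactive run from 1:2, or None.

    Assumption: a missing dilution is treated as non-reactive.
    """
    endpoint = None
    for dilution in DILUTIONS:
        if not reactive_by_dilution.get(dilution, False):
            break
        endpoint = dilution
    return endpoint


def lot_difference_in_steps(endpoint_a, endpoint_b):
    """Number of doubling-dilution steps between two endpoints."""
    return abs(round(math.log2(endpoint_a)) - round(math.log2(endpoint_b)))


# Hypothetical results for one sample on two reagent lots.
lot_1 = {d: d <= 64 for d in DILUTIONS}
lot_2 = {d: d <= 128 for d in DILUTIONS}
print(endpoint_titre(lot_1), endpoint_titre(lot_2))
print(lot_difference_in_steps(endpoint_titre(lot_1), endpoint_titre(lot_2)))
```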

Cross-reacting and interfering substances
A total of 55 plasma or serum samples known to contain potentially cross-reacting analytes, and a further 35 serum or plasma samples containing potentially interfering substances, were tested on a single reagent/test lot. Details are presented in Table 1.

Seroconversion panels
The seroconversion panels consisted of a total of 60 plasma samples, collected in the USA, from five different SARS-CoV-2 NAT-positive individuals at regular intervals from early infection to approximately 8 weeks post-symptoms. Results of testing were used to determine the number of days post-onset of symptoms at which the test kit first detected reactivity.
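The day of first detection can be derived from a panel as in the sketch below. The data structure (a list of day and result pairs), the function name `first_reactive_day`, and the example values are assumptions for illustration only and do not reproduce any panel data from the study.

```python
def first_reactive_day(bleeds):
    """Earliest day post-onset with a reactive result, or None.

    `bleeds` is a list of (days_post_onset, is_reactive) pairs.
    """
    reactive_days = [day for day, is_reactive in bleeds if is_reactive]
    return min(reactive_days) if reactive_days else None


# Hypothetical seroconversion panel for one individual on one test kit.
panel = [(3, False), (7, False), (10, True), (14, True), (21, True)]
print(first_reactive_day(panel))
```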

Sero-reversion panels
The sero-reversion panels consisted of 47 plasma samples obtained from 10 individuals in Germany, starting from no less than 18 days to no greater than 50 days post-infection. Each individual had between three and six bleeds. The results were used to determine the ability of the test kit to detect waning IgM-specific antibodies.

Repeatability
To determine the repeatability of each test kit, a commercial quality control sample (DiaMex, Heidelberg, Germany) was tested 30 times in the same test run. The %CV was calculated.
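The %CV is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function name `percent_cv` and the replicate values are hypothetical.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical signal-to-cutoff values for 30 replicates of one control sample.
replicates = [2.10, 2.05, 2.12, 2.08, 2.11, 2.07] * 5
print(f"{percent_cv(replicates):.2f}%")
```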

Testing and reporting protocol
NRL aliquoted the panel samples into single-use vials and randomized the vials into a complete evaluation panel, which was stored at −20°C until use. Only NRL knew the code for the randomization. The complete, randomized panel of samples was provided to the testing laboratory or the test kit manufacturer for testing. The complete panels were shipped on dry ice and stored frozen at the testing laboratory until use. All testing was performed as per the manufacturer's instructions for use (IFU). Results were provided to NRL as a Microsoft Excel file and/or as a copy of the printed results sheet. The results were copied or transcribed into further Microsoft Excel spreadsheets for decoding and analysis by NRL staff. All manual and electronic transcriptions were cross-checked by a second person.
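The blinding and decoding steps described above can be illustrated in a few lines. This is a sketch only: the vial code format (`V001`, `V002`, and so on), the function names `blind_panel` and `decode_results`, and the sample identifiers are hypothetical and do not describe the procedure or software NRL actually used.

```python
import random


def blind_panel(sample_ids, seed=None):
    """Shuffle samples and assign sequential vial codes.

    Returns (vial_codes_in_testing_order, key), where `key` maps each
    vial code back to its sample identifier and is retained only by
    the coordinating laboratory.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    key = {f"V{pos:03d}": sid for pos, sid in enumerate(shuffled, start=1)}
    return list(key), key


def decode_results(results_by_vial, key):
    """Map results reported against vial codes back to sample identifiers."""
    return {key[vial]: result for vial, result in results_by_vial.items()}


# Hypothetical sample identifiers.
vials, key = blind_panel(["POS-001", "POS-002", "NEG-001", "XR-001"], seed=2020)
reported = {vial: "reactive" for vial in vials[:2]}
print(decode_results(reported, key))
```

Only the list of vial codes would be shared with the testing site; results come back keyed by vial code and are decoded with the retained key.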

TABLE 1
Cross-reacting panel comprised 55 samples containing common cross-reacting analytes and 35 samples containing potentially interfering substances

TABLE 2
Repeatability results, expressed as %CV, of automated SARS-CoV-2 serology tests reporting quantitative or signal-to-cutoff results

TABLE 3
Final list of test kits included in the WHO SARS-CoV-2 serology evaluations, including abbreviations used in the report

a CLIA, chemiluminescence immunoassay.
b CLEIA, chemiluminescent enzyme immunoassay.
c CMIA, chemiluminescent microparticle immunoassay.
d RBD, receptor-binding domain.
e ECLIA, electrochemiluminescence immunoassay.