Comparison of fifteen SARS-CoV-2 nucleic acid amplification test assays used during the Canadian Laboratory Response Network’s National SARS-CoV-2 Proficiency Program, May 2020 to June 2021

Background On March 11, 2020, the World Health Organization declared a pandemic caused by the recently emerged severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This led to increased clinical testing and the decentralization of this testing from provincial health laboratories to regional and private facilities. Leveraging the results from the Canadian Laboratory Response Network’s National SARS-CoV-2 Proficiency Test (PT) Program, this study compares multiple commercial and laboratory-developed nucleic acid amplification tests, assessing both sensitivity and specificity across multiple users. Methods Each panel consisted of six blinded, contrived-clinical samples. Panels were distributed to international, provincial and territorial laboratories and subsequently to partner facilities. Participating laboratories were asked to run these samples through their respective extraction/PCR workflows and submit results to the National Microbiology Laboratory, outlining the nucleic acid extraction platform and nucleic acid amplification test employed, as well as the viral gene target and Ct values or equivalent obtained. Data were compiled for each molecular platform and gene target used. Results The PT schemes were deployed in May 2020, November 2020 and June 2021, resulting in 683 data sets using 37 different nucleic acid amplification tests. Over the course of three PT schemes, the average score obtained by participants was 99.3%, demonstrating consistent testing between laboratories and testing platforms. Conclusion This study confirmed the rapid and successful implementation of a Canadian PT Program, provided a comparative analysis of the various emergency use authorized and laboratory-developed tests employed for the detection of SARS-CoV-2, and demonstrated an overall 99.3% test concordance nationwide.


Introduction
In late 2019, a novel respiratory virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in the Hubei province of China and subsequently caused the coronavirus disease 2019 (COVID-19) global pandemic. As case numbers rapidly grew, it became necessary to decentralize testing to support testing at the federal, provincial/territorial and municipal levels, including private laboratories, hospitals and healthcare facilities. The Canadian Laboratory Response Network (CLRN) at the National Microbiology Laboratory (NML) in Winnipeg, Canada provides high-consequence proficiency panels for biothreat agents to ensure that public health laboratories are ready to respond with high quality diagnostic testing. During the COVID-19 pandemic, the CLRN was leveraged to develop a Proficiency Test (PT) program to support facilities conducting SARS-CoV-2 clinical testing using molecular methods. Similar to other international efforts, the National SARS-CoV-2 PT Program supports the ability of public health testing facilities to establish competency and obtain or maintain accreditation to conduct SARS-CoV-2 clinical testing against a known reference standard, ensuring consistency between testing platforms and laboratories across the country and across the globe (1)(2)(3). Nucleic acid amplification tests (NAAT) have been considered the gold standard method for the detection of active SARS-CoV-2 cases. Since the emergence of SARS-CoV-2 in December 2019, a variety of NAATs have been developed, both laboratory-developed tests (LDTs) and commercial assays. This study provides a comparison of the various NAAT platforms employed within Canada over the course of three PT schemes from May 2020 to June 2021.

Materials and methods
Production, quality control and panel distribution
Irradiated viruses were diluted in a pooled, negative human nasal secretion as the background matrix at varying concentrations and immediately aliquoted into pre-labelled tubes. Each panel consisted of six blinded, contrived-clinical samples. Samples were sorted by site number, packaged appropriately for transport and stored at −80°C until distribution.
Prior to distribution, quality control measures were taken to ensure sample homogeneity and stability. In short, ten aliquots of each sample were removed from storage, nucleic acids were extracted as per the manufacturer's instructions (MagMax™ CORE Nucleic Acid Purification Kit, Applied Biosystems™, Ontario) and assayed by quantitative real-time polymerase chain reaction (qRT-PCR) (QuantiNova® Probe RT-PCR Kit, Qiagen®, Ontario) targeting the E gene of SARS-CoV-2 (4). Coefficients of variation were calculated for each set of panel samples using GraphPad® Prism's descriptive statistics. The Ct values for each sample needed a coefficient of variation of less than 10% to pass sample homogeneity quality controls. Stability testing began on day 1 post-production and continued at specified intervals for the duration of the PT scheme using the same approach outlined above. If quality controls passed for homogeneity and for stability testing on days 1 and 7 post-production, the panels were released for distribution. Panels were packed on dry ice and distributed to the international, provincial and territorial laboratories, which subsequently distributed panels to the partner facilities within their jurisdiction. The cold chain was monitored and, if it was not maintained, a new panel was shipped directly from NML.
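The homogeneity criterion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the laboratory's actual workflow: the Ct values are invented, the function names are hypothetical, and the sketch assumes that the coefficient of variation is the sample standard deviation divided by the mean, expressed as a percentage.

```python
from statistics import mean, stdev

CV_LIMIT_PERCENT = 10.0  # acceptance threshold stated in the text


def coefficient_of_variation(ct_values):
    """Return the CV (%) of a set of Ct values, using the sample standard deviation."""
    return stdev(ct_values) / mean(ct_values) * 100.0


def passes_homogeneity(ct_values, limit=CV_LIMIT_PERCENT):
    """Return True when the CV across aliquots is below the acceptance limit."""
    return coefficient_of_variation(ct_values) < limit


# Hypothetical Ct values for ten aliquots of one panel sample (not study data).
aliquot_cts = [28.1, 28.4, 27.9, 28.3, 28.0, 28.2, 28.5, 27.8, 28.1, 28.2]

print(f"CV = {coefficient_of_variation(aliquot_cts):.2f}%")
print("pass" if passes_homogeneity(aliquot_cts) else "fail")
```

The same two functions could be applied to the day 1 and day 7 stability measurements, since the text states that stability testing used the same approach.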

Participant selection and intended use
Provincial and territorial members of the Canadian Public Health Laboratory Network (CPHLN) approached the NML to assist the pandemic response by producing and administering a SARS-CoV-2 PT Program, as one was not readily available at the time. The CPHLN provincial and territorial partners provided NML with a list of participants and were responsible for distribution of the test panels within their respective jurisdictions. Participants included provincial and territorial laboratories, public health laboratories, hospitals and healthcare facilities in both urban and rural communities. Specific metadata and details on individual site licensing and accreditation for SARS-CoV-2 were not made available to NML.
The PT panel was intended to be used as an internal validation of SARS-CoV-2 molecular processes, which are performed in conjunction with a nucleic acid extraction method. This panel was not intended to be used on platforms requiring fresh swab material, or the detection of viral antigens or virus-specific antibodies.

Test result submission and analysis
Participating laboratories submitted results to NML outlining the nucleic acid extraction platform and NAAT employed, as well as the viral gene target and Ct values or equivalent obtained. Data were compiled for each molecular platform and gene target used. The coefficient of variation for each gene target within a single platform was determined using GraphPad® Prism's descriptive statistics. Probit analysis using a 95% cut-off was used to determine the limit of detection based on sample detection (5).

Results and discussion
The PT schemes were deployed in May 2020, November 2020 and June 2021, resulting in 683 data sets using 37 different NAATs (Table 1). Each PT scheme assessed assay sensitivity and specificity. The most commonly used platforms were fully automated low-throughput assays such as the DiaSorin Simplexa™ COVID-19 Direct Molecular Assay, Cepheid Xpert® Xpress SARS-CoV-2, Cepheid Xpert® Xpress SARS-CoV-2/Flu/RSV and BioFire® FilmArray RP2.1 Test Panel. These systems were employed mainly in hospital laboratories and in rural communities. Larger diagnostic centres, such as provincial laboratories and reference centres, generally employed high-throughput assays, including the Roche Cobas® SARS-CoV-2 Test (for Cobas 6800/8800), Seegene Allplex™ 2019 nCoV Assay, Thermo Fisher TaqPath™ COVID-19 Combo Kit and LDTs targeting the E gene (Table 1). Panel results obtained using commercially available NAATs that had at least three data sets in any given test scheme are presented in Figure 1. Infrequently used platforms were not assessed further.
Abbott produces two high-throughput, laboratory-based molecular assays for the detection of SARS-CoV-2: the Alinity m SARS-CoV-2 AMP Kit used with the Alinity m System, and the SARS-CoV-2 RealTime PCR assay employing the m2000 RealTime System. Both systems obtained expected results for all samples across three test schemes. All sites demonstrated consistent results from November 2020 to June 2021, with coefficients of variation less than 10% (Figure 1).
The BD SARS-CoV-2 Reagents for the BD MAX™ System targeting the N gene were utilized for the detection of SARS-CoV-2. The BD MAX System is a fully automated system, allowing the user to run up to 24 samples at a time. Over the course of 13 months, the BD SARS-CoV-2 Reagents for the BD MAX System performed with variable accuracy. During the May 2020 test scheme, samples were accurately detected in all cases, but the coefficient of variation ranged from 4.8% to 12.3%, indicating increased variation between users. Discordant results were observed during the November 2020 test scheme; 6/7 failures to detect SARS-CoV-2 were attributed to user error (Figure 1, Table 2); therefore, the data obtained for Samples G-L were skewed and the accuracy and consistency were negatively affected. Removing these data points would restore an overall 100% target accuracy for the N1 target and 99% accuracy for N2; the latter target failed to detect Sample I (Figure 1, Table 2). During the June 2021 test scheme, the BD SARS-CoV-2 Reagents for the BD MAX System performed with 100% accuracy. Ct values were consistent among all users, as denoted by coefficients of variation of less than 5% (Figure 1).
The Cepheid GeneXpert platform is widely used across Canada for the detection of SARS-CoV-2, employing the Xpert Xpress SARS-CoV-2 and Xpert Xpress SARS-CoV-2/Flu/RSV assays. The E target of the Xpert Xpress SARS-CoV-2 assay performed with accuracy (100% detection rate) and consistency (coefficient of variation less than 5%) for all samples; however, discordant results were observed using the N target, specifically for Sample H. Sample H did not contain SARS-CoV-2 but did contain a moderate amount of influenza A virions (Ct 27); there were six instances where the SARS-CoV-2 N2 target produced a Ct greater than 40, which was deemed positive for SARS-CoV-2 by the GeneXpert software (Figure 1, Table 2). Apart from Sample H, the Ct values for the N target were consistent and had a coefficient of variation less than 10% (Figure 1). The recently developed Cepheid Xpert Xpress SARS-CoV-2/Flu/RSV assay was employed during the June 2021 test scheme, and the result output for SARS-CoV-2 was combined for both E and N2 targets. The platform had 100% accuracy and produced very consistent results, with a coefficient of variation less than 2% among all users (Figure 1). The Xpert Xpress SARS-CoV-2/Flu/RSV assay also correctly identified the presence of influenza A and B in Samples O and R, respectively (data not shown).
The DiaSorin Simplexa COVID-19 Direct Molecular Assay is a low-throughput, automated system that can run up to eight samples at once. Its main distinction from other similar systems, such as the BioFire FilmArray and Cepheid GeneXpert platforms, is that it eliminates the nucleic acid extraction/purification step. Discordant results were observed for Sample G and Sample R: the ORF1a/b target failed to detect SARS-CoV-2 in 2/768 instances (0.26%), while the S target failed to detect SARS-CoV-2 in 3/768 instances (0.39%) (Figure 1, Table 2). According to the manufacturer, the S assay has a 95.8% detection rate at 500 copies/ml (2,000 copies/ml for 100% detection) and the ORF1a/b target is detected 93.8% of the time at 1,000 copies/ml (2,000 copies/ml for 100% detection) (6). Similar observations were made here: the S assay performed better than the ORF1a/b assay (Table 2). Samples G and R are approximately 1,100 and 3,500 copies/ml, respectively, which is in the range of the assay's limit of detection (LOD) for both targets and is the likely cause of the discrepant results (Table 4). Furthermore, there was an additional discordant result for each target due to a software error that reported "no result" when Ct values were obtained for both targets.
Hologic produces two SARS-CoV-2 assays that were employed during the CLRN SARS-CoV-2 PT schemes: the Panther Fusion SARS-CoV-2 assay and the Aptima SARS-CoV-2 assay. The Panther Fusion SARS-CoV-2 assay is not presented here, as only two sites employed this platform, while the Aptima SARS-CoV-2 assay was employed during the November 2020 and June 2021 test schemes with six and eight users, respectively (Table 1). This platform demonstrated 100% concordance (n=90/90 samples); however, the Ct values obtained were quite variable, with coefficients of variation ranging from 5% to 19.5% across samples (Figure 1).
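The percentages quoted above for the DiaSorin Simplexa targets and the Hologic Aptima assay follow directly from the reported counts. The short check below reproduces them; the helper name `percent` is hypothetical and the counts are those given in the text.

```python
def percent(count, total):
    """Express count/total as a percentage."""
    return 100.0 * count / total


# Missed detections reported for the DiaSorin Simplexa assay.
orf1ab_missed = percent(2, 768)
s_missed = percent(3, 768)

# Concordant samples reported for the Hologic Aptima assay.
aptima_concordance = percent(90, 90)

print(f"ORF1a/b target missed: {orf1ab_missed:.2f}%")     # 0.26%
print(f"S target missed: {s_missed:.2f}%")                # 0.39%
print(f"Aptima concordance: {aptima_concordance:.2f}%")   # 100.00%
```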
During the June 2021 CLRN PT scheme, the Quidel Lyra SARS-CoV-2 Assay targeting ORF1a/b was employed for the first time by three participants (Table 1). This assay correctly identified all test samples (n=18); however, the variability between Ct values was large, with coefficients of variation ranging from 17.9% to 27.8% (Figure 1). This variation is largely attributed to one set of test panel results, which reported substantially lower Ct values than those of the other participants, indicating differences in threshold settings between participants.
The Seegene Allplex 2019 nCoV Assay is a multiplex RT-PCR assay that detects the E, N and RdRp targets and can be automated for high-volume testing. This test performed well during the May 2020 and June 2021 PT schemes, demonstrating 100% concordance and consistent results, with coefficients of variation less than 10% (Figure 1); however, a number of discordant results were observed during the November 2020 PT scheme, reducing reproducibility and elevating coefficients of variation. Sample G was associated with n=3/19 E target failures, n=4/19 RdRp target failures and n=1/19 N target failures. While n=2/19 RdRp target failures occurred when a nucleic acid extraction platform was used, the remaining failures were associated with a divergence from the manufacturer's recommendations, in that no nucleic acid extraction step was employed. Furthermore, the reported LOD for the Seegene Allplex 2019 nCoV Assay is approximately 4,000 copies/ml, which is higher than the Sample G titer and is likely responsible for the failure to detect SARS-CoV-2 in this sample (7) ( also associated with one discordant result for each target due to the inability to acquire a valid result. These remaining failures to detect SARS-CoV-2 were all associated with off-label use, namely the omission of a nucleic acid extraction procedure, which is the likely cause of the discordant results since sample titers were all above 4,000 copies/ml. The practice of not implementing an extraction protocol was not observed in the subsequent test scheme. Overall, the E, RdRp and N targets produced discordances of 4.37%, 5.56% and 3.52%, respectively (Figure 1).
LDTs were also employed during the CLRN SARS-CoV-2 PT schemes from May 2020 to June 2021. Data sets obtained using LDTs that had at least three sets of submitted results in any given test scheme are presented (Figure 2).
In all cases, the tests detected SARS-CoV-2 effectively and accurately in the test samples provided (Figure 2). The E and RdRp targets were used in all test schemes (Table 1). The coefficients of variation ranged from 3.9% to 8.4% for the E target and from 3.2% to 10.2% for the RdRp target (Figure 2). The use of the 5' UTR target emerged during the November 2020 test scheme, and results were consistently detected with coefficients of variation less than 7% (Figure 2). Laboratories began employing the N target during the June 2021 test scheme, with coefficients of variation ranging between 6.9% and 9.6% (Figure 2). It should be noted that, apart from the targeted gene, we do not have specific details regarding the primer/probe sequences implemented by each user, and it is possible that the sequences utilized differ. In general, Ct values were similar between all the target tests, indicating similar detection affinities; however, a more detailed direct comparative analysis was not conducted, since the assays were not identical. Furthermore, shifts between gene targets are expected, as individual gene expression may differ during viral replication, but this finding could also be attributed to technical variations in the threshold/detection settings used by different laboratories. Overall, the 5' UTR target demonstrated the most consistent results, with an average coefficient of variation of 4.3%, followed by the RdRp (4.7%), E (5.2%) and N (7.9%) targets. On average, all targets performed within the designated specification of a coefficient of variation of less than 10%.
Overall, these results provide insights into test sensitivity; each test scheme included a sample containing a low concentration of virus particles, ranging from 1,100 to 1,600 copies/ml (Sample C, 1,600 copies/ml; Sample G, 1,100 copies/ml; Sample P, 1,600 copies/ml). Effective test sensitivity was observed across all presented commercial and LDT assays employed across the country. A 100% concordance rate for these low-concentration samples was observed for all SARS-CoV-2 targets, with a few exceptions. The BioFire FilmArray RP2.1 test kit missed detecting Sample G 1/414 times (Table 3); however, this error occurred due to procedural mishandling of the sample, and upon repetition for remediation purposes, the sample was detected. Therefore, this error was not included in the general assessment of sensitivity (Table 3).
The DiaSorin Simplexa COVID-19 Direct Molecular Assay missed detecting two low-concentration samples: both targets failed to detect Sample G on two occasions and the S target failed to detect Sample R (3,500 copies/ml) in one instance (Table 2); however, these discordant results did not affect the 95% limit of detection. The Seegene Allplex 2019 nCoV Assay was associated with a number of failures to detect Sample G. The majority of these failures were attributed to off-label use, where a required nucleic acid extraction process was omitted; for this reason, these results were removed from the subsequent analysis of sensitivity. However, there were two instances associated with proper use where the RdRp target failed to identify SARS-CoV-2; these were included in the analysis. These discordant results had a minor effect on test sensitivity; the 95% detection limit was determined to be 1,358 copies/ml (Table 2). With the exception of the Seegene Allplex 2019 nCoV assay, all other assays had 95% detection limits below 1,100 copies/ml. These observed results are in line with the manufacturers' reported limits of detection for their respective assays (6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16). Calculating the limit of detection for every assay was outside the intended scope of this PT scheme and was not possible owing to the lack of samples below detectable levels; further comparison of assay sensitivity was therefore not possible.
In addition to test sensitivity, specificity of the assays was also assessed during the PT schemes. More specifically, the May 2020 PT scheme focused on positive and negative agreement, the November 2020 test scheme added a component for the detection of other respiratory pathogens of significance, and the June 2021 test scheme built upon the last by including relevant SARS-CoV-2 variants of concern. All commercial and laboratory-developed tests successfully detected the variants of concern. Of note, the Thermo Fisher TaqPath COVID-19 Combo Kit had a dropout in one of its three target genes: the S gene target did not detect the B.1.1.7 variant, while the other two target genes were successfully detected. According to the manufacturer's recommendations for reporting, a positive result requires two of the three targets to have Ct values less than 37; therefore, the loss of the S gene did not impair the assay's ability to detect the presence of SARS-CoV-2 in Sample N (14). Failure of the BioFire FilmArray RP2.1 to detect SARS-CoV-2 P.1 was attributed to a technical error and not an assay failure; therefore, this test was not included in the analysis. The BioFire FilmArray RP2.1 successfully detected the P.1 variant in all other attempts (n=48).
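The two-of-three reporting rule described for the TaqPath kit can be expressed as a small function. This is a simplified sketch: it encodes only the rule as stated in the text (at least two targets with Ct values below 37) and omits the controls and inconclusive categories of the manufacturer's full interpretation scheme. The function name and the Ct values in the example are hypothetical.

```python
CT_CUTOFF = 37.0   # from the text: Ct values less than 37
MIN_TARGETS = 2    # from the text: two of the three targets


def is_positive(target_cts, cutoff=CT_CUTOFF, min_targets=MIN_TARGETS):
    """Apply the two-of-three rule.

    target_cts maps a target name to its Ct value, or to None when the
    target did not amplify.
    """
    detected = [
        name for name, ct in target_cts.items()
        if ct is not None and ct < cutoff
    ]
    return len(detected) >= min_targets


# Hypothetical result showing an S gene dropout, as described for B.1.1.7.
s_dropout_sample = {"ORF1ab": 24.8, "N": 25.3, "S": None}
print(is_positive(s_dropout_sample))  # True: two targets are below the cut-off
```

Under this rule, a sample with a single amplified target would not be reported as positive, which is why the loss of one target alone does not change the call.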
Overall, test specificity was comparable across all three PT schemes and platforms; a 99.5% negative agreement was observed.

Conclusion
Over the course of three PT schemes conducted across Canada between May 2020 and June 2021, the average score obtained by participants was 99.3%, demonstrating consistent testing between laboratories and testing platforms. Similarly high levels of agreement have been observed internationally. The American Proficiency Institute conducted a study across the United States and reported an overall score greater than 97% (3). Similarly, the Royal College of Pathologists of Australasia conducted three PT schemes within Australia and New Zealand between March 2020 and November 2020, with an initial concordance of 75% early in the pandemic that increased markedly to 95% in the two later test schemes (2). Finally, a third program from South Korea demonstrated 93% agreement (1). While each program varied in its sample composition and intended uses, it is encouraging to see that rapid deployment of SARS-CoV-2 testing resulted in consistently high degrees of agreement across the globe.
The ability to support quality assurance of testing measures through the provision of an external PT Program is essential during a novel or emerging public health threat. CLRN provides a framework to support the quality assurance required for the decentralization and increase in testing capacity within Canada. All Canadian public health laboratories follow a quality management program required by their respective jurisdictions, and on-site verification and validation schemes are essential to achieve these processes. Furthermore, the comparison of PT panel results allows for the assessment of various NAAT platforms at different locations across multiple users, providing an overall assessment of platform performance. The cumulative performance of the NAATs employed during the three CLRN SARS-CoV-2 PT schemes was 99.3% concordant. A future consideration would be to collect additional data from participants to gain a greater scope of demographics, population statistics and accreditation status. This study demonstrates the rapid and successful implementation of a Canadian PT Program and provides a comparative analysis of the various emergency use authorized and laboratory-developed tests employed for the detection of SARS-CoV-2.