Nasal Swab Performance by Collection Timing, Procedure, and Method of Transport for Patients with SARS-CoV-2

ABSTRACT The urgent need for large-scale diagnostic testing for SARS-CoV-2 has prompted interest in sample collection methods of sufficient sensitivity to replace nasopharyngeal (NP) sampling. Nasal swab samples are an attractive alternative; however, previous studies have disagreed over how nasal sampling performs relative to NP sampling. Here, we compared nasal versus NP specimens collected by health care workers in a cohort of individuals clinically suspected of COVID-19 as well as SARS-CoV-2 reverse transcription (RT)-PCR-positive outpatients undergoing follow-up. We compared subjects being seen for initial evaluation versus follow-up, two different nasal swab collection protocols, and three different transport conditions, including traditional viral transport media (VTM) and dry swabs, on 308 total study participants. We compared categorical results and viral loads to those from standard NP swabs collected at the same time from the same patients. All testing was performed by RT-PCR on the Abbott SARS-CoV-2 RealTime emergency use authorization (EUA) (limit of detection [LoD], 100 copies viral genomic RNA/ml transport medium). We found low concordance overall, with Cohen’s kappa (κ) of 0.49, with high concordance only for subjects with very high viral loads. We found medium concordance for testing at initial presentation (κ = 0.68) and very low concordance for follow-up testing (κ = 0.27). Finally, we show that previous reports of high concordance may have resulted from measurement using assays with limits of detection of ≥1,000 copies/ml. These findings suggest that nasal-swab testing be used for situations in which viral load is expected to be high, as we demonstrate that nasal swab testing is likely to miss patients with low viral loads.

Controlling the COVID-19 pandemic will require a massive expansion of testing for SARS-CoV-2. Until recently, nasopharyngeal (NP) swabs were the U.S. Centers for Disease Control and Prevention's (CDC) preferred specimen type, as these specimens were thought to provide the most robust detection of patient infection. However, there are conflicting reports as to which of several specimen types bear the highest viral load (1-3), and ultimately, the "preferred-specimen" specification was removed from interim CDC guidance on 29 April 2020 (4). Sensitivity is a complex issue, however, as detection in the upper airways (nasopharynx and oropharynx) is affected by multiple factors, including duration of illness prior to testing (5) and the limit of detection (LoD) of the reverse transcription (RT)-PCR assay used (6).
Availability of NP swabs and the resources to establish NP collection sites with specimen collection personnel have remained critical bottlenecks. To resolve these issues, health care systems have adopted multiple different strategies, including engaging industrial manufacturers to mass produce novel 3D-printed NP swabs (7), as well as evaluating different specimen types and alternative sample collection strategies (8-16). Assessment of nasal swabs is a rapidly growing area of interest, specifically because this specimen type involves a less invasive procedure than NP swabs. Accordingly, such samples can be self-collected by patients with a simple set of instructions, alleviating the need for medical personnel for specimen collection and reducing use of personal protective equipment (PPE) in short supply.
Many of the U.S. Food and Drug Administration emergency use authorization (FDA EUA) RT-PCR assays are authorized for use with nasal swabs as a specimen type, but how well nasal swabs perform compared to NP swabs remains unclear. Recommendations by the Infectious Diseases Society of America caution that levels of evidence are low. To date, nasal swab studies have shown conflicting results, with some researchers reporting similar test performance to NP swabs and others finding decreased sensitivity (8, 10, 12-16). Reconciling these differences is challenging, as these studies employed different sampling materials, collection methods, and RT-PCR assays. To address these conflicting reports, here we describe the results of a multiarm, 308-subject study comparing sampling in two different clinical scenarios (initial presentation versus follow-up), two different health care worker nasal swab collection procedures, and three different transport conditions, including in viral transport media (VTM) and dry transport. We discuss our findings in the context of prior reports to more systematically assess nasal swab test performance and its preferred role(s) in addressing diagnostic and epidemiologic needs in the COVID-19 pandemic.

MATERIALS AND METHODS
Trial design. This was a multiarm study involving initial versus follow-up presentation, three different specimen-transport conditions, and two collection procedures, using a standard NP swab as a control.
Transport conditions and swabs used. Standard nasal swabs (see immediately below) were compared for subjects presenting for their first COVID-19 test versus subjects with a previous test presenting for follow-up, collected via a shallower/shorter versus a deeper/longer collection method (see Fig. S1 in the supplemental material), under three different specimen-transport conditions: (i) a guanidine thiocyanate (GITC) transport buffer, part of the Abbott Multi-Collect specimen collection kit (catalog no. 09K12-004; Abbott Laboratories, Abbott Park, IL), (ii) dry, with no buffer, and (iii) in modified CDC viral transport medium (VTM) (Hanks' balanced salt solution containing 2% heat-inactivated fetal bovine serum [FBS], 100 µg/ml gentamicin, 0.5 µg/ml fungizone, and 10 mg/liter phenol red, produced by the Beth Israel Deaconess Medical Center [BIDMC] Clinical Microbiology Laboratories [17]). The nasal swab used was the included Abbott swab for the GITC arm and the Hologic Aptima multitest swab otherwise (catalog no. AW-14334-001-003; Hologic, Inc., Marlborough, MA), all with polyester/nylon/rayon spun material. The NP swab used was the Copan BD ESwab collection and transport system swab, with a head of flocked nylon (catalog no. 220532; Copan Diagnostics, Inc., Murrieta, CA). (Note that only the NP swab from the Copan kit was used: the transport medium for the NP swab was 3 ml VTM, not the 1 ml liquid Amies transport medium that is part of that kit; see "Swab collection protocols," below.)
Participants and collection. Participants were adults over 18 years of age tested for SARS-CoV-2 during the normal course of clinical care, based either on clinically suspected COVID-19 infection or follow-up after previous SARS-CoV-2-positive RT-PCR testing. Participants were asked to be swabbed twice, first with one of the nasal swabs under study (see below for swab collection protocols) and then with a standard NP swab. To control for potential variability related to self-swabbing, sample collection was performed by trained nurses or respiratory therapy staff ("study staff") with training and oversight from the respiratory therapy department at Beth Israel Deaconess Medical Center (BIDMC) drive-through/walk-up ("drive-through") COVID-19 testing sites. Individuals with known thrombocytopenia (<50,000 platelets/µl) were excluded from the study to avoid risk of bleeding. This study was reviewed and approved by BIDMC's institutional review board (IRB protocol no. 2020P000451).
PCR compatibility. Although all of the above swabs are routinely used for PCR testing, as a double-check, each swab type was assessed for PCR compatibility by overnight incubation in 3 ml of modified CDC VTM (allowing potential PCR inhibitors time to leach into the medium), spiking 1.5 ml of medium with 200 copies/ml of control SARS-CoV-2 amplicon target (twice the LoD of our system), vortexing, and testing using the Abbott RealTime SARS-CoV-2 assay on an Abbott m2000 RealTime system platform (18), the assay and platform used for all testing in this report, following the same protocol used for clinical testing (see below). All swabs examined in this study passed this quality-control testing for lack of RT-PCR inhibition based on observation of cycle threshold (CT) values within expected quality control limits (17).
Swab collection protocols. For the shallower/shorter collection procedure (henceforth, "shallow"), for each naris, the swab tip was inserted into the nostril, the patient was told to press a finger against the exterior of that naris, and the swab was rotated against this external pressure for 10 seconds; this procedure was repeated with the same swab on the other naris, and then the swab was placed into the collection tube for transport to the laboratory for testing (Fig. S1a). For the deeper/longer collection procedure (henceforth, "deep"), the swab was inserted into the naris until resistance was felt, and the swab was then rotated for 15 seconds without external pressure (Fig. S1b); this procedure was repeated with the same swab on the other naris, and the swab was then placed into the collection tube for transport (15). The NP swab sample was collected from a single naris by standard technique: insertion to appropriate depth, 10 rotations regardless of time, removal, and placement into a transport medium tube containing 3 ml of VTM (4). To maximize collection of material from the nares, in all cases, sampling using the nasal swab (both nares) was performed first, before the NP swab.
Sample processing and testing. Samples were sent to the BIDMC Clinical Microbiology Laboratories for testing. Dry swabs were eluted in 2 ml of Abbott mWash1, which consists of 100 mM Tris with guanidinium isothiocyanate (GITC) and detergent. Swabs transported in GITC buffer were supplemented with 1 ml of Abbott mWash1 solution at the lab in order to achieve minimum volume requirements for testing, for a final volume of 2 ml. The NP swab and final nasal swab were each transported (separately) in 3 ml VTM. The length of time between collection and processing was the same, 4 to 14 h for each pair of NP/nasal swabs from the same subject. Tests were performed with 1.5 ml of sample medium (1.5 ml/2 ml = 75% or 1.5 ml/3 ml = 50% of the total medium in the tubes) using the Abbott RealTime SARS-CoV-2 assay for EUA for use with nasopharyngeal and nasal swabs (18). This dual-target assay detects both the SARS-CoV-2 RdRp (RNA-dependent RNA polymerase) and N (nucleocapsid) genes with an in-lab-verified LoD of 100 copies/ml (17,19).
Statistical analyses. For concordance testing, RT-PCR results were considered categorically either positive, if the CT value was at or below the reporting threshold of 31.5, or negative; testing was performed using Cohen's kappa (κ) (20).
For analyses based on cycle-threshold (CT) values, for discordant samples (positive nasal swab/negative NP swab result or vice versa), the negative result was assigned a CT value of 37, the total number of cycles run. Conversion to viral load was performed as described previously (19).
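As an illustration of the categorical analysis just described, the following is a minimal sketch rather than the study's analysis code. It assumes that a missing CT value (`None`) denotes a negative result and that a result is called positive when its CT value is at or below the reporting threshold (a lower CT value means more target). The function names and the example CT values are hypothetical.

```python
def is_positive(ct, threshold=31.5):
    """Call a result positive when a CT value exists and is at or below the threshold.

    Assumption: None means no reportable amplification (a negative result).
    """
    return ct is not None and ct <= threshold


def ct_for_analysis(ct, max_cycles=37.0):
    """CT used in paired CT analyses; a negative result gets the total cycle count.

    Intended for the negative member of a discordant nasal/NP pair.
    """
    return ct if is_positive(ct) else max_cycles


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of boolean calls."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("need two non-empty lists of equal length")
    # Observed agreement: fraction of pairs with the same call.
    p_obs = sum(1 for a, b in zip(calls_a, calls_b) if a == b) / n
    # Agreement expected by chance from each method's positive rate.
    rate_a = sum(calls_a) / n
    rate_b = sum(calls_b) / n
    p_exp = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all calls fall in one category")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical paired CT values (nasal, NP) for six subjects.
nasal_ct = [20.0, 25.0, None, None, 30.0, None]
np_ct = [18.0, 24.0, 29.0, None, 28.0, None]

nasal_calls = [is_positive(ct) for ct in nasal_ct]
np_calls = [is_positive(ct) for ct in np_ct]
kappa = cohens_kappa(nasal_calls, np_calls)
print(round(kappa, 3))
```

In this toy data set the single discordant pair (nasal negative, NP positive) is enough to pull κ well below 1, which is the pattern the concordance analysis is designed to expose.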
Significance testing. We tested whether CT values for a given set of nasal swabs differed from the CT values for the paired NP swabs (the controls) using the Wilcoxon signed-rank test for paired samples. This tested the null hypothesis that values for the nasal swabs and the NP controls are drawn from the same underlying distribution. The false-discovery rate (FDR) was used to account for multiple testing, with a significance threshold of α = 0.01.
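The multiple-testing correction can be sketched as follows. This is an illustrative implementation of the Benjamini-Hochberg step-up procedure, not the study's code; the paired test itself would come from a statistics package and is not shown. The function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Return a list of booleans: True where the hypothesis is rejected at FDR alpha.

    Step-up rule: find the largest rank k (1-based, ascending P values) with
    p_(k) <= (k / m) * alpha, then reject the k smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical P values from four paired comparisons.
example_p = [0.2, 0.0001, 0.03, 0.004]
print(benjamini_hochberg(example_p, alpha=0.01))
```

Because the rule is step-up, a P value that misses its own rank threshold can still be rejected when a larger P value meets a later threshold.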
To test whether the κ for a given subgroup of size n differed from that of a larger group, we bootstrapped by randomly sampling n datapoints from the larger group, calculating κ for that randomly sampled subset, and repeating this process 10,000 times to generate a distribution (histogram) of κ values; this distribution constitutes a null model of the κ one would expect to observe by chance in a sample of n results, given the data in the larger group. Using this distribution, we then calculated the probability of observing a κ at least as high as the κ actually observed for the n datapoints in the given subgroup, to test for consistency with expectation. We again used FDR; inconsistency (P < 0.05 or P > 0.95) would reject the null hypothesis that the study arm and the larger pool are statistically indistinguishable (as measured by kappa). For completeness, we performed the same bootstrap analysis to compare procedure 1 (shallow) and procedure 2 (deep) results to all results.
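The bootstrap comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: sampling is done with replacement, draws for which κ is undefined (all calls in one category) are discarded, and the function names and example counts are hypothetical.

```python
import math
import random


def cohens_kappa(pairs):
    """Cohen's kappa for a list of (nasal_positive, np_positive) boolean pairs.

    Returns NaN when kappa is undefined (chance agreement equals 1).
    """
    n = len(pairs)
    p_obs = sum(1 for a, b in pairs if a == b) / n
    rate_a = sum(1 for a, _ in pairs if a) / n
    rate_b = sum(1 for _, b in pairs if b) / n
    p_exp = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_exp == 1.0:
        return math.nan
    return (p_obs - p_exp) / (1 - p_exp)


def bootstrap_kappa_p(all_pairs, subgroup_pairs, n_iter=10_000, seed=0):
    """Probability that a random sample of the subgroup's size drawn from
    all_pairs has a kappa at least as high as the subgroup's observed kappa."""
    rng = random.Random(seed)
    observed = cohens_kappa(subgroup_pairs)
    n = len(subgroup_pairs)
    # Null distribution: kappa of n results resampled from the larger group.
    null = [cohens_kappa(rng.choices(all_pairs, k=n)) for _ in range(n_iter)]
    null = [k for k in null if not math.isnan(k)]
    if not null:
        raise ValueError("no resample produced a defined kappa")
    p_high = sum(1 for k in null if k >= observed) / len(null)
    return observed, p_high


# Hypothetical pooled results: (nasal_positive, np_positive).
pooled = ([(True, True)] * 40 + [(False, True)] * 30
          + [(True, False)] * 5 + [(False, False)] * 100)
# Hypothetical subgroup with perfect agreement.
subgroup = [(True, True)] * 10 + [(False, False)] * 10

observed_kappa, p_value = bootstrap_kappa_p(pooled, subgroup, n_iter=2000)
print(observed_kappa, p_value)
```

A probability near 0 or near 1 indicates that the subgroup's κ is unusually high or unusually low relative to the pooled data, matching the two-sided reading of the text.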
Literature review. We searched PubMed and the preprint servers bioRxiv and medRxiv through 1 June 2020 for all literature on nasal swab sampling for SARS-CoV-2 and extracted sample sizes, collection methods, RT-PCR assay information, and 2 × 2 contingency table data comparing nasal swabs to NP swabs wherever available.
Data availability. All CT values are available upon request. Conversion from CT value to viral load for the assay and platform we used is available via the ct2vl Python 3 library, which can be downloaded/installed from PyPI or from https://github.com/ArnaoutLab/ct2vl. PHI-scrubbed data and analysis code can be found at https://github.com/rarnaout/Covid_diagnostics/tree/main/Covid_Nasal_SI.

RESULTS
Table 1 shows the numbers of patients tested in each of the six arms of our nasal versus NP swab study. Visual inspection of plots of the CT values of the nasal swab versus NP swab controls suggested worse performance for nasal swabs across all six arms, with no obvious differences between the two swab procedures or among the dry swab, VTM, or GITC collection methods (Fig. 1). Statistical testing confirmed that results for each arm were indistinguishable from the overall results, supporting the functional equivalence of all swab/transport-condition combinations (Table 1). The only exception was for comparisons involving initial testing, for which CT values were lower than for the overall data set and lower than in follow-up testing (5 to 30 days after the initial test; Fig. S2). For concordant positives (n = 41), comparison of CT values between nasal and NP swabs showed higher CT values for nasal swabs than for NP swabs, suggesting slightly but consistently lower yield from the nasal swabs (Wilcoxon P < 0.0001). Consistent with this conclusion, there was a marked increase in false negatives for nasal swabs with higher CT values (lower viral loads), resulting in low concordance overall (Cohen's kappa = 0.49) (Fig. 1).
Table 1 footnotes: Bold values indicate statistical significance (P value of 0.01 corrected for false-discovery rate using the Benjamini-Hochberg method).
c P values for difference of κs are comparing the given row to all data.
d P values for difference of κs are comparing the given row to all data/data with the same timing (separated by a slash).
e P values for difference of κs are comparing the given row to all data/data with the same procedure (again separated by a slash).
Our finding of low overall concordance was in contrast to some previous reports which found nasal swab collection to exhibit excellent sensitivity as well as CT-value concordance (13, 15), but was consistent with others (14, 21), including, for example, one recent study at a New York, USA, hospital that also noted lower nasal swab concordance for higher CT values (16). Close review of these previous reports revealed that they differed in the type of specimen and/or result they used as a reference (e.g., any test-sample positive versus using NP swabs as the gold standard) and in the parameters they used in order to describe test performance (e.g., positive percent agreement versus sensitivity). To control for at least the latter, we extracted 2 × 2 contingency-table data from these reports to facilitate comparison to each other and to our own results (Table 2; Table S1). Notably, many of these studies used a modified version of the CDC assay that did not report an LoD. Furthermore, of the studies that report the CT values of their results, none provided a viral-load conversion, which is important since different RT-PCR assays and platforms have unique conversions between CT value and viral load. Therefore, we were unable to systematically compare nasal-swab performance at low viral loads in these reports. These differences left open the possibility that inconsistent comparative performance of nasal-swab sampling might be explained largely by differences in assay LoD, and possibly also by patient viral load. Nasal-swab sampling protocols and transport medium conditions varied between studies; there was no obvious correlation between concordance and whether specimens were collected by the subjects themselves or by health care workers, or the relative timing of collection.
We therefore revisited the trend we observed of a rise in nasal swab false negatives at higher CT values (low viral loads). Recently, we demonstrated that CT values for the SARS-CoV-2 RT-PCR assay and platform used in the present study are reliable quantitative measures of viral load and introduced a conversion from CT value to viral load (on the Abbott m2000, a viral load of 100 copies/ml corresponds to a CT value of 26, and 1,000 copies/ml corresponds to a CT value of ~21.7) (19). Building on those findings, here, we asked what the concordance for our nasal versus NP data would have been had the LoD of our assay been higher than its actual 100 copies/ml. Specifically, we recalculated kappa for different LoD cutoffs and found that kappa rose steeply from ~0.5 (low concordance) to 0.8 to 0.9 (excellent concordance) as the LoD cutoff was increased from 100 copies/ml to 1,000 copies/ml and beyond (Fig. 2). This finding strongly supports the view that nasal swabs miss many if not most patients with low viral load (below ~1,000 copies/ml) but are reliable for patients with medium or high viral loads, potentially resolving disagreements among previous reports.
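The recalculation of kappa at different LoD cutoffs can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis: each result is represented as a viral load in copies/ml (0 for undetected), a result counts as positive at a given cutoff when its viral load is at or above that cutoff, and the paired values are invented solely to show the mechanics.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of boolean calls."""
    n = len(calls_a)
    p_obs = sum(1 for a, b in zip(calls_a, calls_b) if a == b) / n
    rate_a = sum(calls_a) / n
    rate_b = sum(calls_b) / n
    p_exp = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all calls fall in one category")
    return (p_obs - p_exp) / (1 - p_exp)


def kappa_at_lod(paired_loads, lod):
    """Kappa between nasal and NP calls if the assay LoD were `lod` copies/ml."""
    nasal_calls = [nasal >= lod for nasal, _ in paired_loads]
    np_calls = [np_load >= lod for _, np_load in paired_loads]
    return cohens_kappa(nasal_calls, np_calls)


# Hypothetical (nasal, NP) viral loads in copies/ml; 0 means undetected.
paired_loads = [
    (1e6, 5e6), (2e4, 1e5), (5e3, 3e4), (1.5e3, 8e3),  # high loads, both detect
    (0, 300), (0, 150), (0, 500), (0, 800),            # low loads, nasal misses
    (0, 0), (0, 0), (0, 0), (0, 0),                    # true negatives
]

for cutoff in (100, 1000):
    print(cutoff, round(kappa_at_lod(paired_loads, cutoff), 2))
```

In this invented data set, raising the cutoff from 100 to 1,000 copies/ml reclassifies the low-viral-load NP positives as negatives, so the discordant pairs disappear and κ rises, which mirrors the mechanism proposed in the text.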

DISCUSSION
Resolving the damage that the COVID-19 pandemic has wrought will require scaling up testing to unprecedented levels. For this reason, there is widespread interest in developing alternatives to NP swab sampling for COVID-19 diagnosis, such as nasal swabs. Proponents argue that the self-administration of these swabs would vastly increase testing capacity, save PPE, and ease the burden on health care workers. Independently, the ability to transport swabs to testing locations without need of transport media such as VTM would further streamline testing processes. Reflecting this interest in nasal swabs, the U.S. CDC has removed the "preference" specification for NP swabs from its interim guidance and notes that nasal swabs are an acceptable alternative specimen as of 29 April 2020 (4). However, confidence in population-scale testing strategies based on nasal swabs is complicated by conflicting reports as to how well they perform relative to gold-standard NP swabs.
FIG 2 Concordance (Cohen's κ) plotted against assay LoD for all data (thick line), only initial-testing data (thin solid line), and only follow-up-testing data (dotted line). With its LoD of 100 copies/ml (solid arrowhead), the Abbott assay detects false negatives in nasal-swab samples, resulting in low overall concordance (κ = 0.49); even lower concordance for follow-up testing (κ = 0.27), likely because viral loads in this population are lower than they are overall; and still-low concordance for initial testing (κ = 0.68), despite viral loads being higher for initial tests than overall. In contrast, an assay with an LoD of 1,000 copies/ml (open arrowhead) would have missed these false negatives, which would have yielded substantially higher observed concordances regardless of subset.
We found quite weak concordance between nasal and NP swabs, with Cohen's kappa values of 0.26 to 0.54 for the six arms and 0.49 overall (Fig. 1), in agreement with some prior studies but in stark contrast with others (Table 2; Table S1). Our results strongly suggest that concordance between nasal and NP swabs depends on the LoD of the PCR assay used to measure positivity, with concordance roughly proportional to viral load; low viral loads may go undetected, depending on the LoD of the assay used (Fig. 2) (19). We find that nasal swab samples reliably detect patients with viral loads of ≥1,000 copies/ml but miss many patients who have lower viral loads (19). Often, repeat testers present with low viral loads, which may explain the difference in concordance between the initial and follow-up arms of this study and suggests that high-sensitivity assays are necessary to detect viral material in these so-called long-haulers. One possibility is that in cases of high viral load, replicating virus may be more likely to spread to respiratory epithelium bordering and/or in the deeper portions of the anterior nares, where it can be recovered by nasal swab. Note that the expected decrease in reproducibility for viral loads near the limit of detection is insufficient to explain our findings, since if, e.g., nasal and NP sampling were equally sensitive, the decrease in reproducibility would affect them equally, with observations of NP+/nasal− and NP−/nasal+ being equally common, which was not seen.
Our findings may reconcile disagreements in prior reports, which have compared nasal swab performance only as a function of CT values (which are not comparable from study to study) rather than as a function of viral load, as we have done here. We hypothesize that the testing sites in these studies may have selected for patients early in the course of disease, when viral load is high (13, 15). For example, one study (13) that showed high concordance used an assay with a negative CT cutoff of 40, and only a few patient samples had CT values above 35. The discrepancy between cutoff and CT values suggested preferential sampling of patients with only high viral loads. (Note that a CT value of 35 can correspond to different viral loads in different assays, and the LoD of this assay was 4,167 copies/ml, over 40 times the LoD of the assay we used.) Notably, the patient population in the present study consisted of both first-time and repeat testing. Many of the latter have been observed to exhibit low-level viral load for weeks in the absence of severe symptoms, and enrichment of these patients may impact the overall performance of NP and nasal swabs in individual studies. In other studies, such differences may be obscured depending on the limit of detection.
Interestingly, we found no difference among transport medium conditions or between sampling protocols, suggesting that the lower sensitivity of nasal swab sampling is an overall limitation of the anatomical location of nasal swab specimens and that the protocols and medium conditions we tested are interchangeable. This was consistent with our review, which demonstrated no obvious correlation between concordance and whether the sample was self-collected or collected by health care workers (which we expect to be roughly bracketed by our two collection procedures). Thus, for patients above a critical threshold of 1,000 copies/ml (Fig. 1), nasal swabs collected in VTM, GITC transport medium, and as dry swabs are all likely to perform equally well in the population, providing multiple potential options for specimen acquisition.
Our results suggest several settings in which nasal swabs may and may not best be used. Peak infectiousness is likely to occur near or shortly before symptom onset (22,23), and nasopharyngeal viral load is often undetectable a week after symptom onset (2). Lower-sensitivity testing would likely miss patients with early-developing presymptomatic infections and patients presenting multiple days after symptom onset.
Notably, for those presenting later to care, a false-negative diagnosis could bear significant clinical implications, erroneously reassuring the patient and clinical team and excluding them from potentially useful and rationed therapies such as remdesivir (24) or others. Importantly, based on viral load distribution in first-time tested individuals at our institution, ~20% of newly presenting SARS-CoV-2-positive individuals would be missed if sampled solely using nasal swabs (19), highlighting the potential magnitude of this problem.
Nevertheless, nasal swabs provide considerable advantage in terms of ease of collection and potential self-collection. Based on our results, they would serve best in high-test-volume, point prevalence screens in healthy populations, for example, in businesses and universities, where identification of highly infectious individuals will be a prelude to targeted testing with the most sensitive techniques possible to quell outbreaks and forestall local spread. Conversely, nasal swabs should not be used for screening symptomatic and, especially, hospitalized patients, where the more sensitive and resource intensive nasopharyngeal sampling would be justified and help direct care and the most appropriate use of infection control resources. In summary, while nasal swabs are a welcome addition to the armamentarium of tools needed to combat COVID-19, we should be well aware of possible limitations in diagnostic sensitivity and use this resource judiciously.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 1.2 MB.