Simulation of pooled-sample analysis strategies for COVID-19 mass testing

Abstract
Objective To evaluate two pooled-sample analysis strategies (a routine high-throughput approach and a novel context-sensitive approach) for mass testing during the coronavirus disease 2019 (COVID-19) pandemic, with an emphasis on the number of tests required to screen a population.
Methods We used Monte Carlo simulations to compare the two testing strategies for different infection prevalences and pooled group sizes. With the routine high-throughput approach, heterogeneous sample pools are formed randomly for polymerase chain reaction (PCR) analysis. With the novel context-sensitive approach, PCR analysis is performed on pooled samples from homogeneous groups of similar people that have been purposively formed in the field. In both approaches, all samples contributing to pools that tested positive are subsequently analysed individually.
Findings Both pooled-sample strategies would save substantial resources compared to individual analysis during surge testing and enhanced epidemic surveillance. The context-sensitive approach offers the greatest savings: for instance, 58–89% fewer tests would be required for a pooled group size of 3 to 25 samples in a population of 150 000 with an infection prevalence of 1% or 5%. Correspondingly, the routine high-throughput strategy would require 24–80% fewer tests than individual testing.
Conclusion Pooled-sample PCR screening could save resources during COVID-19 mass testing. In particular, the novel context-sensitive approach, which uses pooled samples from homogeneous population groups, could substantially reduce the number of tests required to screen a population. Pooled-sample approaches could help countries sustain population screening over extended periods of time and thereby help contain foreseeable second-wave outbreaks.


Introduction
The incubation period of coronavirus disease 2019 (COVID-19) can be as long as 14 days and an unknown proportion of asymptomatic carriers is capable of transmitting the infection. These two factors present substantial challenges for controlling and mitigating the disease. [1][2][3][4][5][6] Although around 80% of people with COVID-19 are reported to have mild disease, 7 the remaining 20% often have severe symptoms and could potentially overwhelm health-care facilities that are already overstretched. 8 Consequently, most countries aim to avoid large surges in patients with COVID-19 and to level the demand for health care, particularly for intensive care beds for patients with respiratory failure. 9 Severely affected nations in the northern hemisphere have adopted drastic containment measures, including the complete lockdown of regions and countries.
As of March 2020, few cases have been reported in Africa or Latin America. However, researchers predicted that Africa would face importation and spread of COVID-19. 10,11 Although most countries in sub-Saharan Africa are screening targeted travellers, this has proven ineffective due to the disease's natural history, specifically the potential for spread during the incubation period. Unless swift and collective interventions are instituted, the effect of COVID-19 might be devastating for countries with fragile health systems. 11 As a consequence, the World Health Organization (WHO) recommended that countries, particularly those experiencing their first few cases of COVID-19, should perform active surveillance, including testing, isolating cases and tracing contacts. 9 It is highly unlikely that infection transmission will be eliminated in the next few months in countries with well-established outbreaks. Instead, the epidemic will predominantly be controlled, which will lead to the stepwise withdrawal of restrictions, albeit with localized flare-ups that could necessitate the return of strict containment measures. In second-wave outbreaks, comprehensive, rapid and cost-effective localized mass testing may be required to identify both symptomatic and asymptomatic cases and prevent further spread.
In both scenarios, settings with a first few cases and second-wave outbreaks, all symptomatic and asymptomatic cases of COVID-19 must be identified rapidly. Confirmation of infection, particularly in asymptomatic individuals, relies on real-time polymerase chain reaction (RT-PCR) tests for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 4 Although RT-PCR tests have been used in epidemiological studies, 12,13 they are time-consuming and costly. However, mass testing is important for a wide range of COVID-19 control strategies and evidence of its effectiveness in a local population has been reported in the small Italian town of Vo', which has around 3000 inhabitants. 14,15 After isolation of the approximately 3% of the population who tested positive, transmission ceased and only six individuals were still infected after 14 days. In larger populations, however, such comprehensive surveillance may be impracticable or too costly and the test workload may rapidly outstrip capacity and resources. 16 Many low- or middle-income countries with constrained resources will find it even more difficult to carry out extensive testing and long-term lockdowns may not be an option because economic necessity could preclude self-isolation.

Andreas Deckert, Till Bärnighausen & Nicholas NA Kyei
Strategies for COVID-19 mass testing Andreas Deckert et al.
PCR testing and highly-automated, matrix, sample pooling), 16,[18][19][20][21][22] extracts from a random number of samples from a heterogeneous population group are combined into a single tube for pooled PCR analysis. These strategies have been shown to be cost-effective during mass testing compared with individual testing. 18,19,23 Recent research on establishing the optimal pool size that maintains the testing accuracy for SARS-CoV-2 PCR assays has found that accuracy is retained in a pool size of up to 32 samples. 22,24,25 It appears that costs can be reduced substantially without sacrificing accuracy.
The aims of this study were to evaluate the performance and resource needs of two pooled-sample analysis strategies for the mass testing of SARS-CoV-2 infection during the current COVID-19 pandemic and to investigate how infection prevalence influences the optimum number of samples that can be pooled and, therefore, the number of tests required. The two strategies evaluated were: (i) routine, high-throughput, two-step, pooled-sample PCR analysis involving heterogeneous sample pools (hereafter referred to as the routine high-throughput approach); and (ii) a novel approach involving pools derived from homogeneous population groups that are purposively formed in the field (hereafter referred to as the context-sensitive approach). 22,24,25 With the routine high-throughput approach, sample pools are first composed randomly in the laboratory for analysis. Then, in a second step, all samples that contributed to any pool that tested positive for SARS-CoV-2 are analysed individually. 16,[18][19][20][21]23,26 However, during COVID-19 outbreaks, there is a high likelihood that some members of homogeneous groups (e.g. families, office colleagues or neighbours) will become infected once one individual has imported the infection into the group. Response teams carrying out contact tracing could take advantage of this situation and designate homogeneous groups in the field for subsequent pooled-sample analysis. With the context-sensitive approach, groups of similar people of a defined size are first formed and swabs from all group members undergo pooled-sample RT-PCR analysis. Again, in the second step, all members of any group whose pooled sample tested positive are investigated individually. This second approach could require an even lower number of tests than routine high-throughput testing, thereby reducing both costs and the workforce needed for population screening.

Methods
The cost-effectiveness of pooled-sample PCR screening is commonly assessed using computer simulations. [27][28][29] For our comparison of the number of tests required with two mass testing strategies, we applied Monte Carlo simulation techniques because of the wide range of uncertainty in some parameters during the current COVID-19 pandemic. 30,31

Routine high-throughput approach
We investigated the performance of the routine high-throughput approach to pooled-sample analysis in two populations of 150 000 and 15 000, respectively, for a SARS-CoV-2 infection prevalence ranging from 0.5% to 20%, in incremental steps of 0.5%. We varied the group size from 2 to 100; correspondingly, the number of groups in a population of 150 000 varied from 75 000 to 1500. To simulate the spread of the infection, we first formed the groups and then determined the number of infected individuals within each group by applying a binomial distribution (parameters: overall prevalence and group size). The total number of tests required was the sum of the number of pooled groups (in the first step, all groups were tested) and the number of groups that tested positive times the group size (in the second step, all members of groups that tested positive were tested individually). The results of the simulation are presented as a three-dimensional graph that shows how the number of tests required varies with group size and infection prevalence. As we used stochastic variables, the surface of the plot contained some small-scale ripples, which we smoothed using a spline smoothing function. All simulations were conducted in SAS 9.4 TS1M4 (SAS Institute Inc., Cary, United States of America).
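The two-step counting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS code: the function names, the seed handling, the assumption that a pool tests positive whenever it contains at least one infected sample (a perfect assay), and the handling of a smaller final group are all assumptions made for the example.

```python
import math
import random


def binomial(n, p, rng):
    """Draw one Binomial(n, p) variate as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_routine_pooling(population, prevalence, group_size, seed=0):
    """Return the total number of tests for one run of two-step pooling.

    prevalence is a fraction (0.01 means 1%).
    Assumption: a pool is positive if it holds at least one infected sample.
    """
    rng = random.Random(seed)
    n_groups = math.ceil(population / group_size)
    tests = n_groups  # step 1: one pooled test per group
    remaining = population
    for _ in range(n_groups):
        size = min(group_size, remaining)  # the last group may be smaller
        remaining -= size
        infected = binomial(size, prevalence, rng)
        if infected > 0:
            tests += size  # step 2: every member of a positive pool is retested
    return tests


if __name__ == "__main__":
    total = simulate_routine_pooling(150_000, 0.01, 10)
    saved = 100 * (1 - total / 150_000)
    print(f"{total} tests, {saved:.1f}% fewer than individual testing")
```

Repeating the call over a grid of prevalences and group sizes would give the kind of surface described in the text; a single run is noisy, which is why the published surface was smoothed.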

Novel context-sensitive approach
We repeated the simulation for the context-sensitive approach with homogeneous pooled samples. With homogeneous groups, it is reasonable to assume that the within-group variation in any characteristic is smaller than the between-group variation. Hence, if one member of a pooled group is infected with SARS-CoV-2, there is a high likelihood that other group members are also infected. In addition, we assumed that the within-group infection prevalence decreases nonlinearly with increasing group size because the composition of the group becomes more diverse as it gets larger. This relationship was assumed to be: where $p_{\text{group}}$ is the average within-group prevalence expressed as a percentage, $p_{\text{all}}$ is the percentage overall prevalence and $s_{\text{group}}$ is the size of the pooled group. For example, Fig. 1 shows the relationship between the within-group prevalence and group size for an overall prevalence of 0.5%. For a given overall prevalence, we calculated how many pooled groups would test positive for different group sizes and within-group prevalences. Furthermore, we performed a Bernoulli experiment for each group (parameter: probability that a group will test positive). Subsequently, we estimated the number of people who would test positive in each pooled group that tested positive using binomial distributions (parameters: within-group prevalence and group size). As a control measure, we calculated the overall prevalence from simulation data and found that there was a negligible difference from the initial assumed overall prevalence (available in the data repository). 32 The other steps in the simulation were the same as those for the routine high-throughput approach. For the context-sensitive approach, we also performed a sensitivity analysis by determining how the number of tests saved would be affected by altering the functional form of the relationship between the within-group prevalence and the size of the homogeneous groups.
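The sequence of steps for homogeneous groups (a Bernoulli draw per group, then a binomial draw within each positive group, then a control check on the overall prevalence) might be sketched as follows. Two parts are assumptions for illustration only. First, `illustrative_within_group_prevalence` is a hypothetical negative-exponential placeholder and is not the authors' equation. Second, the probability that a group tests positive is taken here as the overall prevalence divided by the within-group prevalence, which is one way to keep the simulated overall prevalence consistent with the assumed one.

```python
import math
import random


def illustrative_within_group_prevalence(p_all, group_size, decay=0.05):
    """HYPOTHETICAL negative-exponential form, not the authors' equation.

    Falls from 50% towards the overall prevalence as groups grow.
    All prevalences are fractions (0.01 means 1%).
    """
    return p_all + (0.5 - p_all) * math.exp(-decay * (group_size - 1))


def simulate_context_sensitive(population, p_all, group_size, seed=0,
                               within=illustrative_within_group_prevalence):
    """Return (total tests, simulated overall prevalence) for one run."""
    rng = random.Random(seed)
    n_groups = population // group_size  # assumes equal-sized groups
    p_group = within(p_all, group_size)
    # Assumption: pick the chance that a group is positive so that
    # (chance positive) x (within-group prevalence) equals p_all.
    p_positive = p_all / p_group
    tests = n_groups  # step 1: one pooled test per group
    infected_total = 0
    for _ in range(n_groups):
        if rng.random() < p_positive:  # Bernoulli experiment per group
            tests += group_size  # step 2: retest all members individually
            # Binomial draw for the number infected in a positive group.
            infected_total += sum(rng.random() < p_group
                                  for _ in range(group_size))
    simulated_prevalence = infected_total / (n_groups * group_size)
    return tests, simulated_prevalence


if __name__ == "__main__":
    tests, prev = simulate_context_sensitive(150_000, 0.01, 10)
    print(tests, f"simulated prevalence {100 * prev:.2f}%")
```

The second return value plays the role of the control measure mentioned in the text: it should stay close to the overall prevalence that was fed in. Passing a different `within` function is a simple way to mimic the sensitivity analysis on the functional form.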
In addition, to account for actual variations in group size (e.g. for households, offices in a company or seat rows in an aircraft), we investigated a scenario in which 20% of groups had two members, 30% had three members, 25% had four members, 15% had five members and 10% had six members.
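As an illustration of the mixed-size scenario, the sketch below draws group sizes from the stated distribution until the population is used up. The function name, the seed handling and the truncation of the final group are assumptions for the example; how the authors assigned sizes in their simulation is not described here.

```python
import random

# Shares of groups by size, as stated in the scenario above.
SIZE_SHARES = {2: 0.20, 3: 0.30, 4: 0.25, 5: 0.15, 6: 0.10}


def draw_group_sizes(population, shares=SIZE_SHARES, seed=0):
    """Draw group sizes at random until everyone belongs to a group."""
    rng = random.Random(seed)
    sizes, weights = zip(*shares.items())
    groups, remaining = [], population
    while remaining > 0:
        size = rng.choices(sizes, weights=weights)[0]
        size = min(size, remaining)  # the final group may be truncated
        groups.append(size)
        remaining -= size
    return groups


# Mean group size implied by the shares: 2(0.20) + 3(0.30) + ... + 6(0.10).
mean_size = sum(size * share for size, share in SIZE_SHARES.items())

if __name__ == "__main__":
    groups = draw_group_sizes(150_000)
    print(len(groups), "groups, mean size", round(mean_size, 2))
```

The implied mean group size is 3.65, so a population of 150 000 corresponds to roughly 41 000 groups in this scenario.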
For the two approaches, we calculated the percentage reduction in, and a reduction factor for, the number of tests required relative to individual sample analysis for different group sizes and for an infection prevalence of 1% and 5%. Here, we did not apply a smoothing function. The reduction factor provides another way of looking at resource savings that might be understood more intuitively than a percentage. For a population size $N_p$, the reduction factor, $RF$, was defined as:

$$RF = \frac{N_p}{N}$$

where $N$ is the number of tests required with the pooled-sample approach.
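A short worked example may help relate the two measures. The helper names below are hypothetical; the input values are the minimum test counts reported in the results for a population of 150 000 (20 388 tests for the routine high-throughput approach and 10 740 for the context-sensitive approach).

```python
def reduction_factor(population, pooled_tests):
    """Tests needed for individual testing divided by tests needed with pooling."""
    return population / pooled_tests


def percent_fewer_tests(population, pooled_tests):
    """Percentage reduction in tests relative to individual testing."""
    return 100 * (population - pooled_tests) / population


if __name__ == "__main__":
    for label, pooled in [("routine", 20_388), ("context-sensitive", 10_740)]:
        rf = reduction_factor(150_000, pooled)
        pct = percent_fewer_tests(150_000, pooled)
        print(f"{label}: RF = {rf:.1f}, {pct:.0f}% fewer tests")
```

The two measures carry the same information: the percentage reduction equals 100 × (1 − 1/RF), so a reduction factor of about 7.4 corresponds to about 86% fewer tests and a factor of about 14.0 to about 93% fewer.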

Results
Routine high-throughput approach
As expected, the analysis showed that the number of tests required increased as the prevalence of infection increased. 29 For an overall infection prevalence of 1% or 5%, the number of tests required with the routine high-throughput approach was 24–80% less than with individual sample analysis for group sizes of 3 to 25 in a population of 150 000 (Fig. 2 and data repository). 32 The corresponding reduction factors are shown in Fig. 3. Given this low prevalence, a substantial reduction in tests required can be achieved with a wide range of group sizes. For example, with a prevalence of 1%, selecting a group size between 5 and 50 implies at least 58% fewer tests. With a high prevalence of 10%, a reduction in the number of tests of around 40% can still be achieved but the selected group size must be close to 3 (data repository). 32 When the prevalence is high and the group size is large, the number of tests required slightly exceeds the number required for individual testing. Fig. 4 shows the number of tests required with the routine high-throughput approach for a wide range of prevalences and group sizes in a population of 150 000. The minimum number of tests required in this population was 20 388, which was achieved when the prevalence was 0.5% and the group size was 14. This result corresponded to 86% (129 612/150 000) fewer tests and a reduction factor of 7.4 compared with individual testing. The surface plot for a population of 15 000 was similar. 32
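The simulated figures for the routine approach can be cross-checked against the standard closed-form expectation for two-stage pooling. This is a sketch under stated assumptions (independent infections and a perfect assay) and is not part of the authors' method; the function names are hypothetical. With prevalence p and group size s, the expected number of tests per person is 1/s + 1 − (1 − p)^s.

```python
def expected_tests_routine(population, prevalence, group_size):
    """Expected total tests for two-stage pooling with random pools.

    Assumptions: infections are independent and a pool is positive
    whenever it contains at least one infected sample.
    """
    pools = population / group_size
    p_pool_positive = 1 - (1 - prevalence) ** group_size
    return pools + population * p_pool_positive


def expected_percent_fewer(prevalence, group_size):
    """Expected percentage reduction relative to individual testing."""
    per_person = expected_tests_routine(1, prevalence, group_size)
    return 100 * (1 - per_person)


if __name__ == "__main__":
    for p, s in [(0.01, 10), (0.05, 25), (0.01, 50), (0.10, 3), (0.20, 100)]:
        print(f"prevalence {p:.0%}, group size {s}: "
              f"{expected_percent_fewer(p, s):.1f}% fewer tests")
```

Under these assumptions the expectation gives about 80% fewer tests at 1% prevalence with a group size of 10 and about 24% fewer at 5% prevalence with a group size of 25, in line with the reported range. It also turns negative for a prevalence of 20% with a group size of 100, matching the observation that pooling can then require slightly more tests than individual testing.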

Novel context-sensitive approach
Our analysis of the context-sensitive approach showed that the number of tests required increased as the prevalence increased, as it did with the routine high-throughput approach.
For an overall infection prevalence of 1% or 5%, the number of tests required was 58–89% less than with individual sample analysis for group sizes of 3 to 25 in a population of 150 000 (Fig. 2 and data repository). 32 The corresponding reduction factors are shown in Fig. 3. With this low prevalence, a substantial reduction in tests required was achievable with a wide range of group sizes. For example, with a prevalence of 1%, selecting a group size between 5 and 50 implies at least 76% fewer tests. With a high prevalence of 10%, a reduction of around 65% is still achievable, though the selected group size must be close to 10 (data repository). 32 Fig. 5 shows the number of tests required with the context-sensitive approach for a wide range of prevalences and group sizes in a population of 150 000. The minimum number of tests required in this population was 10 740, which was achieved when the prevalence was 0.5% and the group size was 27. This result corresponded to 93% (139 260/150 000) fewer tests and a reduction factor of 14.0 compared with individual testing. Our sensitivity analyses confirmed that the context-sensitive approach was superior to the routine high-throughput approach for other forms of functional relationship between the within-group prevalence and the size of the pooled group (details available in the data repository). 32 Our investigation of the scenario with a predefined mix of group sizes and a SARS-CoV-2 infection prevalence of 1% in a population of 150 000 found that 67% fewer tests would be required than with individual testing. This reduction fell between the reduction of 48% for a group size of 2 and 81% for a group size of 6.

Discussion
We compared the effects of two pooled-sample analysis strategies on the overall number of tests required for population screening during a COVID-19 outbreak. Using Monte Carlo simulations, we found that both the routine high-throughput approach and the novel context-sensitive approach could save substantial resources during surge testing and enhanced epidemic surveillance. The routine high-throughput approach has already been proven to be cost-effective. 26 However, the context-sensitive approach, which involves pooling samples from homogeneous groups, has a greater potential for reducing the number of tests needed for population screening.
Our simulation reflects the conditions both at the start of a general outbreak and during a second-wave outbreak in a local area where the overall prevalence of infection is low. When the prevalence in a population of 150 000 is 0.5%, the number of tests required using the context-sensitive approach varies only slightly for a wide range of group sizes. With a group size ranging from 8 to 50, between seven and 14 times fewer tests would be required compared to individual testing. Even with a group size of 5, five times fewer tests would be required. This wide range of acceptable group sizes makes this approach well suited for outbreak investigation in real-world settings. In practice, field teams could form homogeneous groups of different sizes based on local conditions. Further, we found that even in areas with a high prevalence of around 10%, the reduction in the number of tests required would be substantial for a group size of around 10. The reduction in the number of tests required would also be large with the routine high-throughput approach across a wide range of group sizes in scenarios with a low infection prevalence, but the number of tests would be higher than with the context-sensitive approach. For example, if a pool size of 10 had been used in Vo' in Italy, 15 the estimated number of tests required with the routine high-throughput approach would have been almost twice that needed with the context-sensitive approach (i.e. 1040 versus 560), assuming the within-group prevalence declined exponentially with increasing pool size.
Effectively curbing a COVID-19 outbreak requires the prompt identification and isolation of infected individuals. 9,15 Curbing the outbreak is particularly important for low- and middle-income countries, where major outbreaks could exert extreme pressures on resource-poor health systems. Although widespread RT-PCR analysis provides the best method for detecting cases, individual testing will most likely not be affordable in these countries. Consequently, pooled-sample analysis could provide a better option, especially during surge testing and enhanced epidemic surveillance. Our analysis demonstrates that pooled testing could also save resources when used instead of individual testing during second-wave outbreaks, such as in Vo', where the entire population was tested and only those who tested positive were isolated. 15 The next step in reaping the benefits of the context-sensitive pooled-sample approach is to develop implementation strategies for real-life epidemic and health systems contexts. For example, during ongoing active surveillance, several households in a specific region could be randomly selected to monitor infection prevalence and detect flare-ups early, with the specific households selected changing over time. Although some asymptomatic infected individuals could be missed, this approach would help keep the prevalence low until herd immunity is achieved or a vaccine becomes available. The timing of testing after infection is not critical because testing is ongoing and clusters can be detected on a rolling basis.

Fig. 3. Reduction factor for the number of tests required with pooled-sample analysis relative to individual testing during a COVID-19 outbreak, by analysis strategy, pooled group size and SARS-CoV-2 infection prevalence
Axes: Reduction factor for the number of tests required; Number of samples in each pooled group
Series: Routine high-throughput approach, SARS-CoV-2 prevalence 1%; Routine high-throughput approach, SARS-CoV-2 prevalence 5%; Context-sensitive approach, SARS-CoV-2 prevalence 1%; Context-sensitive approach, SARS-CoV-2 prevalence 5%
Notes: The context-sensitive approach involved analysing pooled samples from groups of similar people of a defined size. The reduction factor was defined as the number of tests required for individual testing of a population divided by the number of tests required in the same population using a pooled-sample approach. Our simulation considered a population of 150 000. Although the graph shows a continuous variation in tests required, in the simulation group size was varied in discrete steps.
The context-sensitive approach could be implemented easily in any surveillance strategy, especially in high-income countries with civil registration systems. Individuals could be allocated to homogeneous groups for pooling before field work. It may be possible to identify all symptomatic and asymptomatic individuals if a time-limited, local lockdown is in place at the time of testing. Then, only those who test positive would have to be isolated, whereas others should continue to adhere to preventive measures, such as physical distancing and wearing facemasks in enclosed public places. This approach may enable sustainable COVID-19 control without drastic population-wide measures. Although the ability of PCR-related approaches to identify all infected individuals is limited by the technique's sensitivity and specificity, the accuracy of SARS-CoV-2 RT-PCR assays does not appear to be reduced by the use of small- or medium-sized sample pools. Moreover, recent studies on SARS-CoV-2 and other infectious agents indicate that the sensitivity and specificity of PCR assays remain high for medium-sized sample pools. 16,[22][23][24][25]33 However, additional PCR amplification cycles may be required to retain accuracy with larger sample pools. 25 In technically well-equipped countries where high-throughput pooled PCR analysis can be performed, a multistep approach could be a good option for larger communities and cities. The testing algorithm could follow a tree structure, starting with a few large groups, such as blocks of houses, and then testing sequentially smaller groups. This approach could further reduce the number of tests required.
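One simple way to picture a tree-structured algorithm is recursive halving: test a pool, and if it is positive, split it into two halves and test each half in turn until single samples remain. The sketch below counts the tests this would take. Recursive halving, the function name and the perfect-assay assumption are illustrative choices, not a scheme specified in the text, and in practice the largest pool would be limited by the pool sizes for which assay accuracy has been shown to hold.

```python
def count_tests_tree(infected):
    """Count pooled tests when each positive pool is halved and retested.

    infected is a list of booleans ordered so that neighbours (for example,
    members of the same household or block) sit next to each other.
    Assumption: a pool is positive if any member is infected.
    """
    def visit(lo, hi):
        tests = 1  # one test for the pool covering infected[lo:hi]
        if hi - lo > 1 and any(infected[lo:hi]):
            mid = (lo + hi) // 2
            tests += visit(lo, mid) + visit(mid, hi)
        return tests

    return visit(0, len(infected)) if infected else 0


if __name__ == "__main__":
    cluster = [False] * 16
    cluster[5] = True  # a single infected person among 16
    print(count_tests_tree(cluster), "tests instead of 16 individual tests")
```

With one infected person among 16, this scheme needs nine tests. When most people in a pool are infected, it needs more tests than individual testing, so a tree structure is mainly attractive when prevalence is low and infections are clustered.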
One strength of our simulation model is that it can be easily adjusted once more accurate estimates of disease prevalence in communities are available. We assumed that the overall prevalence is low at the beginning of an outbreak and that infections occur mainly in clusters. Later in an outbreak, infections will be spread more broadly throughout the entire population. Hence, at an early stage, a low overall prevalence is likely to be accompanied by a high within-group prevalence in a few infected groups. Our simulations captured this situation. In reality, within-group prevalence will most probably depend on the context, and our model can be adjusted accordingly. For instance, for homogeneous groups formed among households in high-income countries, the within-group prevalence might decrease rapidly with group size because typical households are small and the space available per person is large. In contrast, for groups formed among larger households in densely populated areas, the slope might be flatter. The functional form of the relationship between within-group prevalence and pooled group size may be different for homogeneous groups formed from people travelling on an aircraft or working together. Our sensitivity analyses showed that our assumption of a negative exponential relationship gave a conservative estimate of the benefits of the context-sensitive approach; alternative relationships yielded even more favourable results (data repository). 32 Early findings suggest that the within-group prevalence falls sharply as group size increases, though the maximum group size was limited to five in a very specific and localized high-income setting. 34 When we assumed a steeper exponential curve, we found that the context-sensitive approach was still better at preserving resources than the routine high-throughput approach.

Figure axes: Number of tests required; Prevalence of SARS-CoV-2 infection (%)
SARS-CoV-2: severe acute respiratory syndrome coronavirus 2. Notes: The routine high-throughput approach involved analysing pooled samples from heterogeneous groups of people of a defined size for real-time polymerase chain reaction testing for SARS-CoV-2. Our simulation considered a population of 150 000. Although the figure shows a continuous variation in tests required, in the simulation both prevalence and group size were varied in discrete steps.