Evaluating COVID-19 reporting data in the context of testing strategies across 31 low- and middle-income countries

Highlights
• Statistical change detection methods differentiate epidemiological changes.
• Efficient surveillance is more associated with open testing than high testing rate.
• Non-pharmaceutical interventions align with epidemiological changes across low- and middle-income countries.
• Rwanda stands out as having an efficient surveillance system for coronavirus disease 2019.
• Subnational data reveal heterogeneous epidemiological dynamics and surveillance.


Introduction
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), was first identified in Wuhan, China, in December 2019. Since then, countries have scrambled to monitor the severity and trajectory of the COVID-19 outbreak and to control its progression using non-pharmaceutical interventions (NPIs). Disease surveillance has mainly relied on case counts to inform public health policies ( WHO, 2020 ). However, there has not been a robust evaluation of case counts as a metric for epidemiological dynamics, nor of the varied surveillance approaches used to track disease trajectories.
Case-based surveillance systems have known weaknesses, including the strong influence of testing rates, which vary widely across space and time ( Haider et al., 2020 ). Case counts can be measured inconsistently, testing capacity is limited, and eligibility policies are variable. It is critical to understand the limitations of available data and to identify metrics that are robust to these challenges, particularly for low- and middle-income countries (LMICs).
There is general recognition that surveillance system performance can be a challenge in LMICs, and that understanding disease surveillance is key to system improvement and production of representative data ( Petti et al., 2006 ). Existing efforts to evaluate LMIC surveillance systems, however, are largely qualitative, country-specific or based on commentary ( Alwan, 2020 ; Farahbakhsh et al., 2020 ; Ibrahim, 2020 ). Further, most national-level studies of NPI impacts focus on high-income countries, but there is evidence that these insights cannot be readily generalized to LMIC settings ( Brauner et al., 2020 ; Chen et al., 2020 ; Dehning et al., 2020 ; Flaxman et al., 2020 ; Haider et al., 2020 ; Hsiang et al., 2020 ; Islam et al., 2020 ). This leaves an important knowledge gap in understanding how to evaluate and interpret COVID-19 epidemiological data from LMICs.
To address the gap in systematic interpretation and evaluation methods, statistical analysis techniques were leveraged to detect changes in underlying properties of COVID-19 time series surveillance data across 31 LMICs. With this information, detected change points were categorized as likely driven by epidemiological changes or non-epidemiological influences, such as noise. This provides a quantitative and automated approach to analysing epidemiological surveillance data. Imperfect information is used despite data weaknesses, deriving insights from information available in LMICs that may otherwise be overlooked. The approach is fast and highly portable, well suited to looking across countries, and has minimal data requirements. This paper presents the methods for the analysis, including the statistical model, change point categorization, and evaluation of epidemiological change co-occurrence with NPIs. Next, the paper discusses validation of the method, the usefulness of open testing, comparisons of country surveillance characteristics, and consideration of subnational dynamics. Finally, the authors elaborate on the significance of the results, broader conclusions, and relevance for public health applications.

Methods
The methods are outlined in Figure 1 for two example countries: South Africa and Bangladesh. Details about each step are presented in the following subsections.

Data
National-level case and testing data were used, as well as records on national policies for testing and NPIs ( Hale et al., 2020 ;Roser et al., 2020 ). Test positivity was calculated by dividing cases by tests. Testing policy is indicated by ordinal values: zero indicates no testing policy; one indicates testing of those with symptoms who meet specific criteria (e.g. known contact with a positive individual); two indicates testing of any symptomatic individuals; and three indicates open public testing. For South Africa, provincial-level data on COVID-19-confirmed deaths, cases, tests and excess mortality were also used ( Bradshaw et al., 2020 ;Mkhize, 2020 ;National Institute for Communicable Diseases, 2020 ;Statistics South Africa, 2020 ).
Countries were selected for analysis based on three conditions: available case data, available testing data, and human development index (HDI) score. Of those with data, the countries in the lowest third of HDI score were included, all of which are considered low- or middle-income in 2020-2021 by the World Bank. All data used in this research are public. Further details on data and definitions are given in Appendix A.1 .

Change point detection

PELT change detection
Change point detection is a set of approaches for identifying points in time where the statistical properties of a time series change ( Truong et al., 2020 ). In this study, change point detection was applied to epidemiological time series (cases, tests and positivity) and national policy time series; details are given in Appendix A.2 . Without a priori knowledge of the appropriate number of changes, the pruned exact linear time (PELT) algorithm must be assigned a penalty governing the number of changes to identify. In the absence of an established method for this parameterization when working across time series, a novel systematic approach to penalty selection was developed that enables comparison across time series and countries; details are given in Appendix A.3 .

Method validation
PELT was applied to synthetic case count data generated by the stochastic agent-based COVID-19 simulator (COVASIM) in order to test PELT as a robust method for change detection in epidemiological time series ( Kerr et al., 2021 ). The model scenario inputs include step-wise changes in contacts per person per unit time, representing NPI implementation, as well as a change in testing policy from symptomatic to asymptomatic testing. The model generates a simulated time series of cases and tests per 1000 people, from which a positivity time series was calculated. The change point detection methods described above were applied to the 7-day mean of the time series to align with the data smoothing used for the empirical time series.

Change type categorization
Change detection identifies changes that may be related to data quality, stochasticity and testing dynamics, in addition to epidemiological changes. The likely causes of changes identified by the PELT algorithm were classified based on the co-occurrence of changes from different time series. This categorization simplifies the interpretation of epidemiological surveillance, separates signal from noise, and enables broad comparison across countries and testing dynamics.
Detected change points were combined across cases, tests and positivity time series to create change point groups. The tolerance for temporal association was set at ±7 days to account for 7-day smoothing and weekly data reporting practices. These change groups were categorized as shown in Figure 2 , with details of the interpretation described in Appendix B . To capture all changes that may be epidemiological, both Categories D and E were included as epidemiological change in the analysis. These categories are heuristically defined, but they are informed by validation using the COVASIM simulations and a qualitative understanding of epidemiological surveillance dynamics.
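The grouping and categorization step can be sketched as follows. This is an illustrative implementation, not the study's code: change points are assumed to be given as day indices, the ±7-day tolerance is measured from the start of each group, and the category labels follow the Figure 2 scheme (the function name and sample change points are hypothetical).

```python
def group_and_categorize(case_cps, test_cps, pos_cps, tol=7):
    """Group change points (day indices) from the cases, tests and
    positivity series when they fall within the +/- tol day tolerance,
    then label each group with the Figure 2 categories."""
    tagged = sorted([(t, "cases") for t in case_cps] +
                    [(t, "tests") for t in test_cps] +
                    [(t, "positivity") for t in pos_cps])
    groups, current = [], [tagged[0]]
    for point in tagged[1:]:
        if point[0] - current[0][0] <= tol:     # within tolerance of group start
            current.append(point)
        else:
            groups.append(current)
            current = [point]
    groups.append(current)

    def category(group):
        members = {series for _, series in group}
        if len(members) == 1:
            return "A"                          # single-variable change (noise/data)
        if members == {"cases", "tests"}:
            return "B"                          # testing-driven, non-epidemiological
        if members == {"tests", "positivity"}:
            return "C"                          # sampling change, non-epidemiological
        if members == {"cases", "positivity"}:
            return "D"                          # likely epidemiological
        return "E"                              # all three change: possibly epidemiological

    return [(group[0][0], category(group)) for group in groups]

# Hypothetical change points (day indices) for one country's three series.
print(group_and_categorize([30, 75], [31], [76, 120]))
```

With these inputs, the cases/tests pair near day 30 is labelled Category B, the cases/positivity pair near day 75 is Category D, and the lone positivity change at day 120 is Category A.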

NPI alignment
Change points classified as epidemiological were assessed for whether they were associated with NPI changes. Timings of known NPIs in the empirical data were lagged by 9 days to account for virus incubation time and the delay from symptom onset to test-seeking ( Qin et al., 2020 ). A change point was considered to be aligned with an NPI when two conditions were met: (1) an epidemiological change co-occurs with an offset NPI; and (2) the change in NPI stringency is inverse to the concurrent change in positivity slope. The second condition included occasions when stringency increased and positivity decreased, as well as occasions when stringency decreased and positivity increased.
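The two alignment conditions above can be sketched as a small check; this is an illustrative sketch (function name, tolerance handling, and all numeric values are hypothetical) that encodes the 9-day lag and the inverse-direction requirement:

```python
def aligned_npis(epi_changes, npis, lag=9, tol=7):
    """epi_changes: (day, positivity_slope_delta) pairs for detected
    epidemiological changes; npis: (day, stringency_delta) pairs.
    An NPI is aligned when, after the 9-day lag, it co-occurs with an
    epidemiological change whose positivity slope moves inversely to
    the change in stringency."""
    aligned = []
    for npi_day, d_stringency in npis:
        effect_day = npi_day + lag                 # incubation plus test-seeking delay
        for change_day, d_slope in epi_changes:
            co_occurs = abs(change_day - effect_day) <= tol
            inverse = d_stringency * d_slope < 0   # opposite signs
            if co_occurs and inverse:
                aligned.append((npi_day, change_day))
    return aligned

# Hypothetical example: stringency rises on day 40 and the positivity
# slope falls near day 49; a second NPI on day 70 finds no match.
print(aligned_npis([(50, -0.002), (90, 0.001)], [(40, 1), (70, -1)]))
```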

Synthetic modelling validates PELT as a robust method for change detection in epidemiological time series
Applicability of the PELT method for epidemiological systems was validated before applying it to the surveillance data. PELT change detection was applied to data from the transmission model described above. The sensitivity of change point detection to parameterization is illustrated in the top two case rate time series of Figure 3 . The bottom positivity time series shows the detected change points for all time series, parameterized by the method described in Appendix A.3 . PELT successfully identifies step changes in NPI and testing policies, as well as slope changes in cases, tests and positivity. Further, the categories of change point groups are correctly identified in line with the classification scheme, labelled on the positivity time series in black and purple boxes and described in Figure 2 .

Testing rates and policies impact how surveillance measures should be interpreted
The relevance of testing rates and the influence of testing policy are illustrated using time series for Bangladesh in the context of local events ( Figure 1 ). Cases peaked in early July, an apparent epidemiological turning point if cases were considered alone. Simultaneously, however, a new policy was implemented to charge for testing, and thus there was a decline in testing ( Cousins, 2020 ). This resulted in no change in positivity, and contradicts the interpretation of the case reduction as a declining outbreak. Similarly, the dip in case rate in early August was accompanied by a dip in testing during the Eid al-Adha holiday; again, there was no change in positivity.
While this recommends positivity as a surveillance metric instead of case counts alone, further consideration of testing policy complicates the picture. Test eligibility in Bangladesh is based on symptoms rather than open testing, meaning that positivity is influenced by the prevalence of both COVID-19 and other respiratory illnesses. This limits the potential for positivity to detect epidemiological changes, and the positivity curve for Bangladesh is largely flat. An elaboration of COVID-19 surveillance considerations is given in Appendix C .
Epidemiological change detection is more influenced by testing policy than by testing rate

PELT change detection and change point categorization were applied to all 31 LMICs in the dataset. Surveillance system efficiency was quantified as the percentage of all detected change points classified as epidemiological (i.e. the epidemiological change detection rate). Linear fits of epidemiological change detection were compared by tests per 1000 people and by testing policy ( Figure 4 ). Note that binned calculations cause the maximum epidemiological change detection rate to differ between the two plots.
The results indicate that the ability to identify epidemiological change has a stronger relationship with testing policy than with tests per 1000 people. Open testing is the only testing policy bin with a mean or median epidemiological change detection rate as high as 50%, but with a wide range, indicating that open testing policy is necessary but not sufficient for quality surveillance (with outlier exceptions).
Further, LMICs have the testing capacity to measure prevalence with precision. Based on the 95th percentile of their daily testing rates, nearly all LMICs could measure down to 1% prevalence with a margin of error no larger than ±1% if random sampling were used for testing ( Figure 5 ). Only three countries hover around a margin-of-error to prevalence ratio of 1: Malawi, the Democratic Republic of Congo and Togo. It should be noted that true random sampling is difficult to achieve in any setting, but open testing policies can approximate random sampling more closely than symptomatic testing.

Rwanda stands out for efficient surveillance

The epidemiological change detection rate by country is shown in Figure 6 A, with Rwanda the highest and Ethiopia the lowest. The percentage of NPIs that are aligned with a detected epidemiological change is shown in Figure 6 B, again led by Rwanda. Rwanda performs well by these metrics regardless of change detection parameterization ( Appendix A.4 ). Nearly all countries in this analysis show at least one detected epidemiological change. Conversely, approximately half of the countries in this analysis show zero alignment of any type of NPI with an epidemiological change, although the number of NPIs implemented in these countries spans a wide range.

NPI alignment with detected epidemiological changes is bimodal and significant
The significance of NPI alignment with detected epidemiological changes was tested through comparison with alignment rates when NPIs were assigned a random date. All types of NPIs measured in this study had significant rates of alignment with epidemiological changes when the zero-alignment mode was excluded (maximum P-value = 1.38e-10). The distributions of random NPI alignment were calculated by re-assigning random dates to NPIs by type and then finding alignment rates over n = 150 bootstrap iterations. Across NPIs, the rate of random NPI alignment with epidemiological change had a mean of 11.6% and a standard deviation of 3.64% (grey violin distributions, Figure 7 ). When analysed by country, nearly all NPI alignment rates were either higher or lower than the random date distributions (cyan circles, Figure 7 ). This indicates two modes of detected NPI alignment. Excluding the mode of zero NPI alignment, mean NPI alignment ranged from 50% for restrictions on internal movement to 33% for restrictions on gatherings (black squares, Figure 7 ). Differences in alignment rates between NPI types were not significant. Potential differences in NPI alignment rates were confounded by synchronous implementation of NPIs, although there is some evidence to support the effect strength of workplace closing and stay-at-home requirements ( Appendix E ).
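The random-date comparison can be sketched as a simple randomization test. This sketch is simplified relative to the study: it checks only temporal co-occurrence after the 9-day lag (the inverse-slope condition is omitted), and the function name, change days, NPI count and horizon are all hypothetical:

```python
import random

def random_alignment_rates(epi_changes, n_npis, horizon,
                           n_boot=150, lag=9, tol=7, seed=0):
    """Null distribution of NPI alignment: re-assign each NPI a random
    date and count how often it lands, after the 9-day lag, within the
    +/- tol day tolerance of a detected epidemiological change."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_boot):
        hits = 0
        for _ in range(n_npis):
            day = rng.randrange(horizon) + lag   # random NPI date, offset by lag
            if any(abs(day - c) <= tol for c in epi_changes):
                hits += 1
        rates.append(hits / n_npis)
    return rates

# Hypothetical setting: 3 epidemiological changes in a 300-day series, 8 NPIs.
rates = random_alignment_rates([60, 150, 240], 8, 300)
print(round(sum(rates) / len(rates), 3))         # mean random alignment rate
```

Observed alignment rates would then be compared against this null distribution, as in Figure 7.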

National-level results obscure subnational heterogeneity in epidemiological dynamics and surveillance
To investigate subnational heterogeneity, the same analyses as above were conducted at the province level in South Africa. Figure 8 A shows substantial variability across provinces in both NPI alignment rate and epidemiological change detection rate. In line with results from national-level data, the epidemiological change detection rate was not correlated with mean tests per 1000 people. Due to reporting limitations, the NPIs here are national policies.
Three edge cases were selected from the scatter plot in Figure 8 A (Limpopo, Northern Cape and Western Cape) to compare time series of positivity, COVID-19-confirmed deaths and total estimated excess mortality ( Figure 8 B). The differences in the timing and trajectories of the time series illustrate strong subnational variability in underlying epidemiological dynamics that may be overlooked when time series are aggregated to the national level.
Variation among provinces in the difference in magnitude between excess mortality and COVID-19 deaths points to differences in their surveillance systems. Western Cape is the only province where the magnitude of excess deaths resembled that of COVID-19-confirmed deaths throughout the time series. In Northern Cape, the peak of excess deaths was approximately three times higher than the COVID-19-confirmed deaths, suggesting substantial underreporting.

Discussion
This study demonstrated a standardized and quantitative approach to the analysis of epidemiological surveillance time series that can be automated for improved interpretation and comparison across countries. The interpretation of epidemiological trajectories is more informative when cases are normalized by tests, highlighting the disadvantages of symptomatic testing for outbreak tracking and public health purposes. These findings align with literature emphasizing the importance of positivity and test sampling strategies ( Hilborne et al., 2020 ; Pearce et al., 2020 ). The finding of strong alignment of NPIs with epidemiological changes is consistent with existing literature on global NPI impacts ( Haug et al., 2020 ; Islam et al., 2020 ; Liu et al., 2020 ). When the analysis of change types is applied to evaluate the efficiency of national surveillance systems, Rwanda stands out as a country with a strong surveillance system, which is consistent with qualitative evaluation ( WHO Regional Office for Africa, 2020 ).
This approach substantially broadens the scope of previous analyses of COVID-19 surveillance data in LMICs. Statistical change detection methods were used on COVID-19 surveillance time series from 31 LMICs to differentiate epidemiological changes from changes related to stochasticity, data quality and non-epidemiological dynamics. This maximizes the insights gained from limited data, reduces erroneous interpretations of epidemiological time series, and enables quantitative comparisons of disease surveillance approaches. The epidemiological change detection rate was used as a proxy for surveillance system efficiency, and was shown to be not as strongly associated with testing rate as with open testing policies. Substantial variation was found in epidemiological and surveillance dynamics across countries and in the subnational analysis.
This analysis has limitations related to the data as well as the methods. At the same time, these data challenges are precisely the motivation for developing the methods: maximizing information from limited data. The data are potentially biased by unmeasured factors such as fluctuations in testing capacity and undocumented population sampling strategies over time, delays and temporal uncertainty due to reporting systems, and incentives for case-finding. Defining co-occurrence when working with imprecise time series is a challenge, partially mitigated by considering uncertainty bounds when defining change groups. Of course, co-occurrence does not establish causality. In PELT change detection, the changes detected are influenced by the choice of the penalty parameter. However, in a sensitivity analysis of the novel parameterization approach, Rwanda remained the leader in surveillance system performance, regardless of the parameterization choice.
Results from this analysis highlight that surveillance data must be used carefully to ensure proper programmatic responses. As a sufficient and less resource-intensive approximation of random sampling, open testing would enable better estimation of disease prevalence and examination of NPI impacts in geographies without reliable hospitalization data, death records or seroprevalence surveys. NPIs without epidemiological changes may indicate inefficacy of policies, but may also indicate shortfalls of surveillance systems, which undermines the ability of policy makers to make evidence-based decisions. The methods could be further developed and applied not just to COVID-19 but also to surveillance interpretation for other poorly measured diseases, enabling more informed decision-making and targeted improvements in surveillance systems.

Conflict of interest statement
None declared.

Funding
This publication is based on models and data analysis performed by the Institute for Disease Modeling at the Bill & Melinda Gates Foundation. The funder had no influence on the analysis or conclusions presented here.

Appendix A.1. Data and definitions
The case rate was defined as the number of individuals confirmed positive for the SARS-CoV-2 virus per population, regardless of symptoms. The testing rate per population was defined as the number of people tested (i.e. excluding duplicate confirmatory tests) divided by the population, regardless of the test outcome. To address the dependence of case rate on testing rate, case counts were normalized by the number of tests conducted, creating the alternate metric of test positivity rate.
For the purposes of comparing between countries and over time, 'mean testing policy' was defined as the average over time of the ordinal value representing the national testing policy. Thus, lower values represent more restricted testing over longer periods of time. Social distancing policies tracked in the dataset include the following: closing schools, closing workplaces, cancelling public events, restricting gathering sizes, closing public transport, stay-at-home requirements, restricting in-country mobility, and restricting international travel.
Weekly cases, testing and death data were interpolated using a cubic spline. All daily cases, testing and death data were smoothed using a centred 7-day rolling average. Error bars on plots show standard error.
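The centred 7-day rolling average can be sketched as a minimal NumPy implementation (the function name and sample values are hypothetical; edge windows simply shrink to the available data). The weekly-to-daily cubic-spline interpolation is omitted here; it would typically use a routine such as scipy.interpolate.CubicSpline.

```python
import numpy as np

def centred_7day_mean(daily):
    """Centred 7-day rolling average, as applied to the daily case,
    testing and death series (edge windows shrink to the data available)."""
    daily = np.asarray(daily, dtype=float)
    out = np.empty_like(daily)
    for i in range(len(daily)):
        lo, hi = max(0, i - 3), min(len(daily), i + 4)  # 3 days each side
        out[i] = daily[lo:hi].mean()
    return out

# Hypothetical noisy daily series: smoothing damps reporting artefacts.
series = np.array([10, 80, 20, 70, 30, 60, 40, 50], dtype=float)
smoothed = centred_7day_mean(series)
print(smoothed.round(1))
```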

Appendix A.2. PELT change detection
The naive approach to generating an exact solution to time series segmentation is to test all possible solutions. For an unknown number of changes, this also requires testing a sufficiently large set of possible number of changes. The PELT change detection method was used to address these computational tractability issues.
PELT minimizes the sum of costs from a criterion function across time series segments while balancing model complexity through a linear penalty function and change point pruning. At each iteration of cost minimization for a potential set of change points, time points that cannot be part of a global minimum are removed from future consideration. The PELT method, developed with applications in genetics and finance in mind, is increasingly used for climate and epidemiological applications ( Killick et al., 2012 ; Sissoko et al., 2017 ; Ouedraogo et al., 2018 ).
To detect changes in slope of the epidemiological time series, the first derivative was used as input for the PELT algorithm. For detection of discrete step changes in policy time series, the data were fed directly into the change detection algorithm without taking a derivative. For all time series, the radial basis function kernel was used for the PELT detection algorithm.
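The segmentation objective that PELT optimizes can be illustrated with a simplified exact dynamic program. This is a sketch, not the study's implementation: it is plain optimal partitioning without PELT's pruning step, and it uses a squared-error cost in place of the radial basis function kernel cost. As described above, slope changes are found by running it on the first derivative.

```python
import numpy as np

def segment(signal, penalty, min_size=3):
    """Exact penalized segmentation by dynamic programming: a simplified
    stand-in for PELT, which adds pruning for linear-time behaviour.
    A squared-error cost is used here; the study uses an RBF kernel."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    cum = np.concatenate(([0.0], np.cumsum(x)))
    cum2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def cost(a, b):
        # Sum of squared deviations from the segment mean on x[a:b].
        s, s2, m = cum[b] - cum[a], cum2[b] - cum2[a], b - a
        return s2 - s * s / m

    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    last = np.zeros(n + 1, dtype=int)
    for t in range(min_size, n + 1):
        for s in range(0, t - min_size + 1):
            c = best[s] + cost(s, t) + penalty   # one penalty per segment
            if c < best[t]:
                best[t], last[t] = c, s
    cps, t = [], n
    while t > 0:                                 # backtrack the optimal breakpoints
        cps.append(t)
        t = last[t]
    return sorted(cps)[:-1]                      # drop the series endpoint n

# Detect a slope change by segmenting the first derivative, as described above.
y = np.concatenate([np.arange(20.0), 20.0 + 5.0 * np.arange(20)])
cps = segment(np.diff(y), penalty=5.0)
print(cps)
```

On this synthetic series the derivative steps from 1 to 5 at index 20, and the program recovers that single breakpoint.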

Appendix A.3. PELT parameterization
To date, there is no established method for parameterizing the PELT change density penalty across time series when the number of changes is not known. One way to choose penalty values across time series would be to unify the number of changes detected in each time series. This, however, imposes the assumption that all time series exhibit the same general change frequency and that only the timing of each change is unknown. This paper presents a novel approach for systematic parameterization when identifying an unknown number of changes in slope over many time series, as in this case with multiple epidemiological time series across countries. To accomplish this, change detection was conducted in a sweep over parameter space. The change points detected using a given value in parameter space slice the time series into segments, each of which is input into a linear regression. The standard error for each of those linear regressions is calculated and then averaged, weighted by segment length.
The mean standard error associated with each penalty value, when plotted over parameter space, is characterized by a series of plateaus that correspond to plateaus in the number of changes found with each penalty value ( Figure 9 , top row). Descending through penalty values in the penalty parameter space, the lowest penalty associated with each plateau is selected to represent that plateau.
Each time series is thus associated with a sparse set of penalty values, ordered from largest penalty (low change point density) to smallest penalty (high change point density). The penalty values are unique to each time series, but represent the same ordered progression of plateaus. To illustrate, change detection with different ranked penalties for South Africa and Bangladesh are shown in green in Figure 9 .
Penalty values for each unique time series can then be chosen based on their order in the ranked plateau list. This enables a principled approach to parameterization that creates change density parity across time series, allowing for the likelihood that some time series are characterized by more changes than others. Among all time series and countries in this analysis, the minimum number of plateaus detected was four. Therefore, the fourth penalty value was chosen for all time series.
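The sweep-and-plateau procedure can be sketched as follows. Two parts are illustrative stand-ins rather than the study's code: `threshold_detect` is a toy detector used only so the sketch is self-contained (any penalized detector, such as PELT, would be swapped in), and all series and penalty values are hypothetical. The weighted mean standard error and the descend-and-keep-lowest-penalty plateau logic follow the description above.

```python
import numpy as np

def threshold_detect(deriv, penalty):
    """Toy stand-in for a penalized detector: flag points where the
    derivative jumps by more than `penalty`, so that higher penalties
    yield fewer change points (as with the PELT penalty)."""
    jumps = np.abs(np.diff(deriv))
    return [int(i) + 1 for i in np.flatnonzero(jumps > penalty)]

def weighted_mean_se(signal, cps):
    """Mean standard error of per-segment linear fits, weighted by
    segment length, as used in the parameterization sweep."""
    bounds = [0] + list(cps) + [len(signal)]
    ses, weights = [], []
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = np.asarray(signal[a:b], dtype=float)
        t = np.arange(len(seg))
        if len(seg) > 2:
            resid = seg - np.polyval(np.polyfit(t, seg, 1), t)
            ses.append(np.sqrt(np.sum(resid ** 2) / (len(seg) - 2)))
        else:
            ses.append(0.0)
        weights.append(len(seg))
    return float(np.average(ses, weights=weights))

def penalty_plateaus(signal, penalties, detect=threshold_detect):
    """Descend through penalty space and keep the lowest penalty of each
    plateau in the number of detected changes, ordered sparse to dense."""
    plateaus, prev_k = [], None
    for p in sorted(penalties, reverse=True):
        k = len(detect(signal, p))
        if k != prev_k:
            plateaus.append(p)        # a new plateau begins
        else:
            plateaus[-1] = p          # track the plateau's lowest penalty
        prev_k = k
    return plateaus

# Hypothetical derivative series with three slope changes.
deriv = np.array([1.0] * 10 + [5.0] * 10 + [1.0] * 10 + [9.0] * 10)
plateaus = penalty_plateaus(deriv, [10, 6, 3, 0.5])
print(plateaus)                       # ranked penalties, one per plateau
```

The returned list is the ranked plateau list; in the study, the fourth-ranked penalty was chosen for every time series.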

Appendix A.4. Parameterization sensitivity analysis
A parameterization sensitivity analysis was conducted to evaluate the influence of penalty selection on the analysis results. Results of country ranking by epidemiological change detection rate were compared for different penalty plateau selections. Skipping penalty rank one for which no changes may be detected (see examples in Figure 9 ), it was found that, regardless of the penalty rank used, Rwanda appeared at the top of the list with the highest epidemiological change detection rate ( Figure 10 ).

Appendix B.1. Heuristic interpretation
Detected change points across cases, testing and positivity time series are combined into groups by temporal co-occurrence. These groups are then categorized by their constituent time series ( Figure 2 ). Dynamical interpretation of the constituent time series aids in the characterization of each change group category, as follows.

Category A: Single variable change
As positivity is defined according to the arithmetic relationship, Positivity = Cases/Tests , a change in any one of the variables should be accompanied by a change in at least one of the other variables. A single change in only one of the variables indicates that the change arises from issues in the data or noise. These single variable changes often occur early in the time series, when the numbers of cases and tests are smaller, signal-to-noise ratios are lower, and confidence intervals are larger.

Category B: Cases and tests change
Tests and cases move up or down together. What might look like a significant change in cases is associated with a change in testing, likely not a change in epidemiology. Factors affecting testing include testing capacity, care-seeking behaviour, and testing sampling policy. With this change category, the change in testing could be a change in capacity or care-seeking, but the lack of change in positivity indicates that testing is still sampling the same population in the same way, without changes in epidemiology.

Category C: Tests and positivity change
Positivity change is driven by testing change, not a change in cases. An increase or decrease in testing does not impact absolute numbers of detected cases, which suggests a change in test sampling. Dynamics that would produce this pattern include, for example, adding population with lower prevalence in the case of open testing, or limiting testing to a higher-prevalence population in the case of symptomatic testing. It is also possible, however, that a change in testing sampling masks a simultaneous change in epidemiology. In this situation, the change in testing would have to precisely offset the change in epidemiology to observe this category of change association. Category C is thus designated to likely indicate a non-epidemiological change.

Category D: Cases and positivity change
Positivity change is driven by a change in cases without a change in testing. This suggests a change in epidemiology, but the significance may be different under random vs symptomatic testing. Under random testing, this type of change arises only with a change in SARS-CoV-2 epidemiology. Under symptomatic testing, the restriction of sampling to COVID-like illness (CLI) means that a change in the epidemiology may be confounded by a change in CLI epidemiology. Note also that symptomatic testing captures changes in symptomatic SARS-CoV-2 (i.e. cases of COVID-19) alone. Another possible explanation for this combination of changes is a change in sampling without a change in the absolute number of tests. This might occur, for example, in a switch from symptomatic to open testing. For this reason, this change combination was categorized as likely instead of certainly epidemiological.

Category E: All three variables change
With a change in cases, tests and positivity, it remains difficult to disentangle epidemiological from non-epidemiological factors. Category E can be considered a combination of Categories C and D, and the testing and case changes may or may not be independent.
To capture all changes that may be epidemiological, Categories D and E were considered as epidemiological changes, and Categories A, B, and C were categorized as non-epidemiological changes.
A principal component analysis (PCA) supporting the separability of change categories is detailed in Appendix B.2 .

Appendix B.2. PCA analysis of change categories
In addition to the dynamical interpretation of constituent time series ( Appendix B.1 ), the separability of change categories is shown with PCA. The surveillance results of different countries are  quantitatively characterized by PCA of the relative frequency with which they detect different categories of changes. PCA establishes how categories do or do not represent axes of difference across countries.
Based on the curve of explained variance ratio by PCA components ( Figure 11 A), the first three PCA components were selected to examine factor loadings ( Figure 11 B). Each component is dominated by a single category, in PCA component order: Category D (epidemiological change); Category B (testing artifacts); and Category E (confounded). Each of these PCA components is anticorrelated with Category A (noise). These relationships between the different change categories are consistent with the dynamical interpretation. Figure 12 shows the frequencies of the change categories that dominate the factor loadings for all countries in the dataset.
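The PCA step can be sketched with NumPy; the category-frequency matrix below is entirely hypothetical (four example countries rather than the study's data), and serves only to show the computation of explained-variance ratios and factor loadings from the relative frequencies of Categories A-E.

```python
import numpy as np

# Hypothetical category-frequency matrix: rows are countries, columns are
# the relative frequencies of change categories A-E (each row sums to 1).
freq = np.array([
    [0.10, 0.10, 0.10, 0.60, 0.10],   # efficient surveillance: mostly D
    [0.50, 0.20, 0.10, 0.10, 0.10],   # noisy data: mostly A
    [0.10, 0.55, 0.15, 0.10, 0.10],   # testing artefacts: mostly B
    [0.15, 0.10, 0.10, 0.15, 0.50],   # confounded changes: mostly E
])

centred = freq - freq.mean(axis=0)           # PCA on mean-centred frequencies
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)          # explained-variance ratio
loadings = Vt                                # rows: components; cols: categories A-E

print(explained.round(3))
print(loadings[0].round(2))                  # first component's factor loadings
```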

Appendix C. Surveillance considerations
The considerations for three components of SARS-CoV-2 epidemiological surveillance - population, testing and their role in surveillance metrics - are laid out in basic terms below. The testing strategy of random testing with the surveillance metric of positivity is shown to be the combination that best represents SARS-CoV-2 prevalence. Here, the terminology of SARS-CoV-2 is used to include all asymptomatic and symptomatic infections. The population is composed of people with and without SARS-CoV-2. Of those with SARS-CoV-2, some are asymptomatic and some are symptomatic. Of those without SARS-CoV-2, some have no symptoms, others have symptoms of non-CLI illness, and some have CLI symptoms.
Relevant components of testing include eligibility for testing under a given testing framework, as well as testing rate and capacity. Under random sampling, the general population is eligible for testing; symptomatic testing restricts eligibility to people with CLI symptoms. Testing rate is a measure of tests conducted per total population, while testing capacity indicates the proportion of eligible individuals who are actually tested.
Detected cases as a surveillance metric is a function of the number of tests, the eligible testing pool, and the total cases within the testing pool. Positivity is defined as detected cases per tests conducted.
Applying these formulations to surveillance metrics, one can see that detected cases under symptomatic testing is not only a function of the number of tests conducted, but also of the number of individuals exhibiting CLI symptoms. CLI, in turn, is a function of non-SARS-CoV-2 CLI and symptomatic SARS-CoV-2.
Positivity under symptomatic testing is normalized for number of tests conducted, measuring not general prevalence in the population, but the portion of CLI that is symptomatic COVID-19. Metrics derived from symptomatic testing do not account for asymptomatic SARS-CoV-2 and are confounded by non-SARS-CoV-2 CLI.
As with symptomatic testing, detected cases under random testing are a function of number of tests. The sampling, however, is taken from the general population, and thus positivity under random testing is a metric that represents prevalence.
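The contrast between the two regimes can be made explicit. Writing $C_{\mathrm{sym}}$ and $C_{\mathrm{asym}}$ for symptomatic and asymptomatic SARS-CoV-2 cases, $\mathrm{CLI}_{\neg\mathrm{cov}}$ for non-SARS-CoV-2 CLI, and $N$ for the population (notation introduced here for illustration), the expected positivity in each regime is:

```latex
\[
\mathrm{Positivity}_{\mathrm{symptomatic}} \approx \frac{C_{\mathrm{sym}}}{C_{\mathrm{sym}} + \mathrm{CLI}_{\neg\mathrm{cov}}},
\qquad
\mathrm{Positivity}_{\mathrm{random}} \approx \frac{C_{\mathrm{sym}} + C_{\mathrm{asym}}}{N} = \mathrm{Prevalence}.
\]
```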
As the testing rate approaches the total eligible population under symptomatic testing, detected cases approach the number of CLI COVID-19 cases. Note, however, that testing coverage (i.e., tests/eligible) is influenced not only by the number of tests processed, but also by the reporting rate: those who seek testing are a subset of those who would be eligible for testing.
Assuming the capacity to test all eligible individuals and perfect reporting rates, symptomatic testing would still yield only the number of symptomatic COVID-19 cases. For random testing, the testing rate is equivalent to testing coverage, and case count depends on the number of tests conducted. The random-testing positivity metric does not depend on testing rate, and captures both symptomatic and asymptomatic cases of COVID-19. The relationship shown empirically in Section 3.3, wherein increasingly open testing policies are associated with increasingly effective monitoring of epidemiological change, supports the equation-based result that random testing is better suited to epidemiological surveillance.
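A minimal numerical sketch of the contrast described above, using hypothetical prevalence figures: expected positivity under random testing recovers prevalence regardless of the number of tests, while expected positivity under symptomatic testing measures only the COVID-19 share of the CLI pool.

```python
def positivity_random(prevalence: float) -> float:
    """Expected positivity under random testing: tests sample the general
    population, so positivity equals prevalence (symptomatic plus
    asymptomatic), independent of the number of tests conducted."""
    return prevalence

def positivity_symptomatic(prev_sym_covid: float, prev_other_cli: float) -> float:
    """Expected positivity under symptomatic testing: tests are drawn from
    the CLI pool (symptomatic COVID-19 plus non-SARS-CoV-2 CLI), so
    positivity measures the COVID-19 share of CLI, not prevalence."""
    cli = prev_sym_covid + prev_other_cli
    return prev_sym_covid / cli

# Hypothetical illustration: 1% symptomatic COVID-19, 1% asymptomatic
# COVID-19, 3% non-COVID CLI (e.g. during influenza season).
p_rand = positivity_random(0.01 + 0.01)      # equals true prevalence, 0.02
p_symp = positivity_symptomatic(0.01, 0.03)  # COVID share of CLI, 0.25
```

Note that `p_symp` would rise or fall with the background of non-COVID CLI even at fixed SARS-CoV-2 prevalence, which is the confounding described above.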

Appendix D. Summary statistics
For the purposes of understanding the sensitivity of a given level of testing, the standard error for positivity (the number of positive tests per total number of tests) is defined as

\[
\mathrm{SE} = \sqrt{\frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}}
\]

where n equals the total number of tests, N equals the population, and p equals the number of positive tests per total number of tests. The corresponding margin of error equals one-half of the confidence interval and, calculated at the 95% confidence level, is

\[
\mathrm{ME}(95\%) = 1.96 \times \mathrm{SE}.
\]

Note that this formulation of the confidence interval is not reliable when the number of tests is very small, or when probabilities are very close to zero or one. Under the condition of true random testing, positivity is a direct measure of prevalence. At any given prevalence, the margin of error can be calculated from the number of tests administered and the total population. This calculation was carried out for all LMICs in the dataset, and the margin of error was then normalized by the given prevalence rate. Based on these relationships, ME(95%)/Prevalence is higher at lower prevalence; in other words, precise measurement becomes increasingly difficult as prevalence decreases.

Figure 13 illustrates aligned NPI co-occurrence. The correlation score indicates how often a given type of aligned NPI is implemented simultaneously with another aligned NPI. Although alignment of an NPI with an epidemiological change can be established, in the case of co-occurrence of two or more aligned NPIs it is challenging to separate possible effects among the NPI types. Nonetheless, a low correlation score accompanied by high frequency may be indicative of NPIs more likely to be the dominant forcing. This is the case with stay-at-home requirements and workplace closing (bottom plot of Figure 13). Conversely, high correlation associated with low frequency indicates NPIs that do not often align with epidemiological change independently of other aligned NPI types; cancelling public events and restrictions on internal movement are examples of this case.

Figure 13. Top: Correlation matrix of co-occurrence of non-pharmaceutical interventions (NPIs) aligned with epidemiological changes. Correlations are normalized along the y-axis: counts of co-occurring NPIs are divided by NPI counts on the diagonal. Bottom: Correlation score and frequency of aligned NPIs by type. Correlation score is the sum of the correlation matrix along the y-axis, normalized to 1. Normalized frequency is the count of aligned NPIs by type, normalized to 1.
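The margin-of-error calculation above can be sketched numerically. This is a minimal sketch assuming the standard finite-population-corrected standard error for a proportion (consistent with N being the total population); the prevalence values, test count, and population size below are hypothetical.

```python
import math

def positivity_margin_of_error(p: float, n: int, N: int) -> float:
    """95% margin of error for positivity p estimated from n tests in a
    population of N, using a finite-population-corrected standard error
    for a proportion. Unreliable for very small n or p near 0 or 1."""
    se = math.sqrt(p * (1 - p) / n * (N - n) / (N - 1))
    return 1.96 * se

# Hypothetical example: 10,000 tests in a population of 10 million.
# ME(95%)/prevalence grows as prevalence falls, i.e. low-prevalence
# settings need more tests to achieve the same relative precision.
for prevalence in (0.01, 0.001):
    me = positivity_margin_of_error(prevalence, 10_000, 10_000_000)
    relative = me / prevalence
```

At 1% prevalence the relative margin of error is roughly 20% under these hypothetical inputs; at 0.1% prevalence it is several times larger, illustrating why precise measurement becomes harder as prevalence decreases.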