Effectiveness assessment of non-pharmaceutical interventions: lessons learned from the COVID-19 pandemic

The effectiveness of non-pharmaceutical interventions (NPIs), such as school closures and stay-at-home orders, during the COVID-19 pandemic has been assessed in many studies. Such assessments can inform public health policies and contribute to evidence-based choices of NPIs during subsequent waves or future epidemics. However, methodological issues and a lack of standardised assessment practices have restricted the practical value of the existing evidence. Here, we present and discuss lessons learned from the COVID-19 pandemic and make recommendations for standardising and improving assessment, data collection, and modelling. These recommendations could contribute to reliable and policy-relevant assessments of the effectiveness of NPIs during future epidemics.


Introduction
During the COVID-19 pandemic, governments worldwide have implemented non-pharmaceutical interventions (NPIs), such as school closures and stay-at-home orders, to control the spread of SARS-CoV-2. Many studies have assessed the effects of these NPIs on disease dynamics and health outcomes using empirical data (figure 1). 1,3-9 Timely, reliable, and consistent results from such studies can support evidence-based health policy. However, the practical value of these studies during the COVID-19 pandemic was limited by substantial methodological variation and challenges in synthesising results across studies. Effectiveness assessments did not follow a common framework or best practices, which resulted in many statistical and epidemiological models, incomparable NPI definitions, and unsuitable measures of effectiveness. 1 These factors have made it difficult to assess the validity of individual studies or to synthesise their evidence. To improve the applicability of future studies for decision making in public health, best practices regarding data and methods used to assess interventions should be established.
In this Viewpoint, we comment on methodological aspects of data-driven effectiveness assessments for NPIs implemented during the COVID-19 pandemic. Based on a review of the methodologies 1 and learnings from our own research, 3-5,10-13 we discuss considerations regarding the assessment approach, data collection and reporting, and modelling approaches to avoid common sources of bias and to improve the comparability and policy relevance of studies (figure 2). We also highlight possible challenges for future research.

Measures of effectiveness
The primary rationale of NPIs is to reduce person-to-person transmission by altering contact rates and patterns, or probability of infection upon contact. By contrast, about one in three studies during the first year of the COVID-19 pandemic quantified the effectiveness of NPIs only in terms of absolute observed outcomes, such as the number of avoided cases or deaths. 1 Although such outcomes can be of interest to health policy, they can be highly misleading. Two NPIs can have the same effect on transmission but different effects on observed outcomes, which depend on the timing of implementation and the timeframe of their evaluation. 14,15 For example, NPIs implemented at the start of an epidemic might appear less effective because the daily number of new infections avoided through reduced transmission was small. Moreover, NPIs do not immediately influence observed outcomes such as the number of cases or deaths, but rather have a lagged effect. This lag is due to stochastic delays in disease progression and case ascertainment that can distort effectiveness estimates. 4,12 To avoid biases and misinterpretation, we argue that the effectiveness of NPIs should primarily be measured by relative changes in person-to-person transmission. On a population level, such changes can be quantified via different epidemiological parameters (eg, transmission rates or effective reproduction numbers). 3,7 This approach distinguishes the general working principle of NPIs from their context-specific implementation, and accounts for the exponential growth dynamics and relevant time lags in an epidemic.
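The dependence of absolute outcomes on timing can be illustrated with a minimal sketch. It assumes purely exponential growth, and all names and numbers (growth rates, starting level, evaluation window) are hypothetical choices for illustration rather than values from any study. The same change in the growth rate is applied early or late in the epidemic, and avoided infections are counted over a fixed window after implementation.

```python
import math

def infections_in_window(start_level, growth_rate, days):
    """Total infections over `days` days of exponential growth from `start_level`."""
    return sum(start_level * math.exp(growth_rate * t) for t in range(days))

def avoided_infections(npi_day, window=14, initial=10.0,
                       rate_before=0.20, rate_after=0.05):
    """Infections avoided in the window after an NPI introduced on `npi_day`.

    All default values are hypothetical. The NPI changes the daily growth
    rate from `rate_before` to `rate_after`, regardless of when it starts.
    """
    level_at_npi = initial * math.exp(rate_before * npi_day)
    without_npi = infections_in_window(level_at_npi, rate_before, window)
    with_npi = infections_in_window(level_at_npi, rate_after, window)
    return without_npi - with_npi

early = avoided_infections(npi_day=0)
late = avoided_infections(npi_day=20)
print(f"avoided infections, early NPI: {early:.0f}")
print(f"avoided infections, late NPI:  {late:.0f}")
```

In this sketch the effect on transmission is identical in both scenarios, yet the number of avoided infections differs by a factor of exp(0.20 × 20), which is why the absolute measure mainly reflects the epidemic stage at implementation.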
Epidemiological parameters quantifying transmission can be inferred from different epidemiological outcomes, including case and death counts, allowing direct comparison of effectiveness estimates based on different outcome data as a sensitivity check. 16 Different measures of transmission are closely related to each other (eg, growth rates can also be derived from model-based estimates of the effective reproduction number). 17 Therefore, by using transmission rates or reproduction numbers as a common effectiveness measure, results from different analyses could be compared without sacrificing their applicability to specific public health questions. For example, transmission dynamics can be translated into downstream health outcomes as part of real-time scenario modelling. 18 Outcomes of interest are calculated from expected changes in person-to-person transmission due to NPIs, by use of different assumptions about disease progression and reporting. This approach allows assessment of different implementation strategies,

Assessment of individual NPIs
In this Viewpoint, we consider NPIs as population-level, public health interventions implemented with the goal of reducing transmission via behavioural changes. Therefore, although an individual wearing a mask would not be considered an NPI, a general mandate by the government to wear masks in all public places would be considered an NPI. Evidence for the effectiveness of individual NPIs, such as school or business closures, is important as it allows policy makers to prioritise the most effective and cost-efficient NPIs, and to establish a combination of NPIs sufficient to control the spread of an epidemic. However, because multiple NPIs are often implemented on the same day or in close succession, disentangling the effectiveness of individual NPIs in a single population is difficult. Consequently, most empirical studies during the COVID-19 pandemic were restricted to assessing specific combinations of multiple NPIs (eg, a lockdown comprising school closures, business closures, and gathering bans). 1 Nevertheless, if the timing and composition of bundles of NPIs varies between populations, studies can still generate insights into the effectiveness of individual NPIs by jointly analysing multiple populations with similar epidemiological characteristics. 3-6
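Why joint analysis helps can be sketched with a simple identifiability check. The example below is a toy illustration and not the model used in the cited studies: it assumes a linear model for log transmission with one intercept per population and shared NPI effects, and the start days and coefficients are hypothetical.

```python
import numpy as np

DAYS = np.arange(10)

def npi_indicators(school_start, gathering_start):
    """Daily 0/1 indicators for two NPIs (columns: school closure, gathering ban)."""
    return np.column_stack([DAYS >= school_start, DAYS >= gathering_start]).astype(float)

def is_identifiable(design):
    """True if every column of the design matrix has its own estimable coefficient."""
    return bool(np.linalg.matrix_rank(design) == design.shape[1])

# Population A introduces both NPIs on day 5; population B staggers them.
pop_a = npi_indicators(5, 5)
pop_b = npi_indicators(3, 7)

# Single-population design: intercept plus the two NPI indicators.
single = np.column_stack([np.ones(len(DAYS)), pop_a])

# Pooled design: one intercept per population, shared NPI effects.
zeros, ones = np.zeros(len(DAYS)), np.ones(len(DAYS))
pooled = np.vstack([
    np.column_stack([ones, zeros, pop_a]),
    np.column_stack([zeros, ones, pop_b]),
])

# Noise-free check with hypothetical coefficients: the shared effects on
# log transmission are recovered up to floating-point error.
true_coef = np.array([0.5, 0.3, -0.4, -0.2])
log_transmission = pooled @ true_coef
estimated, *_ = np.linalg.lstsq(pooled, log_transmission, rcond=None)

print("single population identifiable:", is_identifiable(single))
print("pooled populations identifiable:", is_identifiable(pooled))
```

The single-population design is rank deficient because the two indicators coincide, whereas the staggered timing in the second population makes the individual effects estimable, provided the effects are assumed to be shared across populations.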

Variation in NPI effectiveness
The effectiveness of NPIs during the COVID-19 pandemic was often assessed within single populations. 1 However, the same intervention might not be equally effective across populations with different demographic and economic characteristics, 13 or the effectiveness might vary across populations depending on which other NPIs are already implemented there. 19 Empirical studies should account for these factors by analysing data from multiple populations and quantifying not only the average but also the variation in NPI effectiveness between populations. Furthermore, estimates for the effectiveness of NPIs might change between epidemic waves because of changes to the epidemiology of the pathogen (eg, shorter generation interval), human behaviour (eg, adherence to interventions), or protective measures (eg, mask wearing or air filters).
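One simple way to summarise both the average and the between-population variation is sketched below. It uses the DerSimonian-Laird method-of-moments estimator from random-effects meta-analysis, which is a stand-in chosen for brevity and not the hierarchical models used in the cited studies. The function name and the example estimates are hypothetical, and the weights depend on reported variances, so the summary is only as good as the uncertainty quantification of the inputs.

```python
import numpy as np

def random_effects_summary(estimates, variances):
    """Pooled mean and between-population SD (DerSimonian-Laird estimator).

    `estimates` are population-specific effects (eg, log relative changes in
    transmission) and `variances` are their squared standard errors.
    """
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v
    fixed_mean = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - fixed_mean) ** 2)       # heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)     # between-population variance
    w_star = 1.0 / (v + tau2)
    pooled_mean = np.sum(w_star * y) / np.sum(w_star)
    return float(pooled_mean), float(np.sqrt(tau2))

# Hypothetical log-scale effects of one NPI in four populations.
mean_effect, between_sd = random_effects_summary(
    estimates=[-0.30, -0.10, -0.45, -0.20],
    variances=[0.004, 0.004, 0.004, 0.004],
)
print(f"average effect: {mean_effect:.3f}, between-population SD: {between_sd:.3f}")
```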

Reporting of outcome data
Studies assessing the effectiveness of NPIs during the COVID-19 pandemic have primarily relied on epidemiological count data reported by public health authorities, such as confirmed cases, hospitalisations, and deaths. These population-level data from traditional surveillance will probably remain an important pillar during future epidemics, albeit complemented with data from household surveys, and from environmental, genomic, and digital surveillance. 8,9,20,21 Although these types of data all have their individual strengths and limitations, their usefulness in the context of NPI effectiveness assessment will strongly depend on the consistency of reporting between populations and over time. For example, irregularities in ascertainment over time might interfere with trends in epidemiological outcomes that would otherwise be attributed to transmission dynamics. Public health authorities should report such irregularities by sharing meta-data 22 (eg, about changes in case definitions, testing schemes, and reporting delays). Moreover, public data providers mostly aggregated confirmed case counts by date of report, 23,24 but not by date of confirmation, testing, or symptom onset, as such information was not consistently provided by public health authorities. Without such information, researchers have to account for unknown and potentially country-specific reporting delays when estimating NPI effectiveness using intermediate outcomes related to time of infection.
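The role of reporting delays can be sketched as a convolution of infections with a delay distribution. The delay probabilities and infection numbers below are hypothetical, and the sketch ignores under-ascertainment and day-of-week effects. It shows how an abrupt change in infections appears later and smoothed in counts aggregated by date of report.

```python
import numpy as np

def expected_reports(infections, delay_pmf):
    """Expected daily reports given daily infections and a delay distribution.

    `delay_pmf[d]` is the assumed probability that an infection is reported
    d days later. Reports on the first len(delay_pmf) - 1 days are incomplete
    because infections before day 0 are not included.
    """
    full = np.convolve(infections, delay_pmf)
    return full[: len(infections)]

# Hypothetical delay from infection to report (days 0 to 7), summing to 1.
DELAY_PMF = np.array([0.0, 0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1])

# Infections halve abruptly on day 20 (eg, after an NPI takes effect).
infections = np.concatenate([np.full(20, 100.0), np.full(20, 50.0)])
reports = expected_reports(infections, DELAY_PMF)

for day in (19, 20, 23, 27):
    print(f"day {day}: infections {infections[day]:.0f}, expected reports {reports[day]:.0f}")
```

If the delay distribution is unknown or differs between countries, the timing of the apparent change in reports cannot be mapped back to the time of infection without additional assumptions.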

Collection and categorisation of intervention data
Governments worldwide implemented different NPIs at varying times. Intervention data specify when, where, and which NPIs were implemented. To assess the effectiveness of a set of similar NPIs across populations, a systematic categorisation of NPIs is necessary. Several public databases have been developed for this purpose (eg, the Oxford COVID-19 Government Response Tracker 2 or the Complexity Science Hub COVID-19 Control Strategies List 25 ). However, coding often involves subjective decisions 26 (eg, how to code bans of gatherings with a limit of 10 or 50 people). 3-5 Subjective coding decisions might also explain discrepancies between multiple public databases. 27 We argue that NPI databases should collect and provide raw intervention data (ie, comprehensive textual descriptions of each intervention, such as specific regulation, scope, date of announcement, and date of enforcement), accompanied by meta-information based on a common standard (eg, a hierarchical classification of NPI data into categories of increasing granularity). 25,28 For example, to record a ban of gatherings, raw intervention data should specify the exact limit on the number of people in a gathering. By keeping data collection separated from the coding of interventions, researchers can apply different codes to the same raw data. Differently coded NPI data could be used to study the sensitivity of NPI effectiveness estimates. 5,6 Although such analyses could inform about the influence of subjective coding decisions, 26 few studies have considered differently coded NPI data. 1 We suggest several steps to ensure high quality of intervention data. First, data should be collected at the level at which decisions are made, which can be both at the national and subnational level. 1 Second, to ensure consistency in the reported dates of NPIs, the date when an NPI is announced should be distinguished from the date when it is mandated, as studies have shown that behavioural changes often preceded the mandated date of NPIs. 29,30 Third, high data quality can be obtained with measures such as independent double entry, 4,12 or by consulting local residents or native speakers in case of language ambiguities. 5 Finally, for modelling purposes, data quality can be more important than comprehensive coverage (ie, data should be collected from fewer populations when resources are scarce).
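The separation of raw intervention data from their coding can be sketched as follows. The record structure, field names, and example entries are hypothetical and do not reproduce the schema of any existing database. The sketch stores the exact gathering limit together with both dates and applies the coding threshold only at analysis time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class RawIntervention:
    """Raw record of one intervention (hypothetical schema)."""
    population: str
    description: str                        # textual description of the regulation
    announced: date                         # date of announcement
    enforced: date                          # date of enforcement
    gathering_limit: Optional[int] = None   # exact limit on people, if applicable

def code_gathering_ban(record: RawIntervention, threshold: int) -> bool:
    """Code a record as a gathering ban if its limit is at or below `threshold`."""
    return record.gathering_limit is not None and record.gathering_limit <= threshold

# Hypothetical raw records for two populations.
records = [
    RawIntervention("A", "Gatherings of more than 50 people prohibited",
                    date(2020, 3, 13), date(2020, 3, 16), gathering_limit=50),
    RawIntervention("B", "Gatherings of more than 10 people prohibited",
                    date(2020, 3, 20), date(2020, 3, 21), gathering_limit=10),
]

# Two alternative codings of the same raw data, eg, for a sensitivity analysis.
strict = {r.population: code_gathering_ban(r, threshold=10) for r in records}
lenient = {r.population: code_gathering_ban(r, threshold=50) for r in records}
print("threshold 10:", strict)
print("threshold 50:", lenient)
```

Because the threshold is an argument rather than part of the stored data, the same records support both codings without recollecting information.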

Modelling
Comparison and validation of modelling approaches
Various models have been used to assess NPI effectiveness during the COVID-19 pandemic. 1 Semi-mechanistic or mechanistic models (eg, compartmental or renewal process-based transmission models) are required to infer transmission rates or reproduction numbers. The infection and ascertainment process can be modelled at different levels of complexity, including stochastic delays, multiple compartments, and population structure. 31 The effects of NPIs on transmission can be estimated in a separate step 6 or can be integrated into the mechanistic model. 3-5 These aspects leave a wide range of modelling choices and extensions, which should be verified through model validation and comparison. Within studies, model validation could adhere to model-specific workflows 32 or general workflows for data analysis (eg, validating the model with simulated data before analysis, 1,13,33 evaluating the fitted model during analysis, 1,3,4 and assessing model predictive accuracy on hold-out data post-analysis). 4, 19 An example of this kind of workflow can be the Bayesian workflow. 34 Although model validation can ensure reliable inference, comparing models between studies can inform about the added value of new models, or the importance of specific modelling choices and extensions. 5,19,35 Such comparisons require public access to data and code, 22 but they could be further facilitated by developing software packages specifically for assessing NPI effectiveness that can be used easily by other researchers. 36
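Validation with simulated data can be sketched in a few lines. The example below is a deliberately simple toy and not one of the semi-mechanistic models discussed above: it assumes Poisson-distributed case counts under exponential growth, a hypothetical change in the growth rate on a known day, and a log-linear least-squares fit. The check is whether the fitted model recovers the parameter used in the simulation.

```python
import numpy as np

# Hypothetical simulation settings.
TRUE_RATE_BEFORE = 0.10   # daily growth rate before the NPI
TRUE_CHANGE = -0.15       # change in daily growth rate due to the NPI
NPI_DAY = 20
DAYS = np.arange(40)

def design_matrix(days, npi_day):
    """Columns: intercept, time, and time elapsed since the NPI (0 before it)."""
    since_npi = np.maximum(0, days - npi_day)
    return np.column_stack([np.ones(len(days)), days, since_npi]).astype(float)

def simulate_cases(rng):
    """Draw Poisson case counts from a log-linear growth model with one NPI."""
    coef = np.array([np.log(200.0), TRUE_RATE_BEFORE, TRUE_CHANGE])
    expected = np.exp(design_matrix(DAYS, NPI_DAY) @ coef)
    return rng.poisson(expected)

def estimate_change(cases):
    """Estimate the change in growth rate by least squares on log counts."""
    coef, *_ = np.linalg.lstsq(design_matrix(DAYS, NPI_DAY), np.log(cases), rcond=None)
    return float(coef[2])

rng = np.random.default_rng(0)
estimate = estimate_change(simulate_cases(rng))
print(f"true change: {TRUE_CHANGE:.3f}, estimated change: {estimate:.3f}")
```

In a full workflow, the same simulate-then-fit check would be repeated over many simulated datasets and parameter values, and complemented by checks of the fitted model and of predictive accuracy on hold-out data.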

Accounting for additional factors influencing outcomes
Observational studies measure changes in epidemiological outcomes after the implementation of NPIs. However, these outcomes might also be influenced by several other factors, including voluntary behavioural changes, 37,38 changes in pathogen characteristics, vaccination programmes, and additional interventions. If not accounted for, such factors can bias effectiveness estimates (eg, by confounding the relationship between NPIs and outcomes, or because unexplained changes in outcomes are wrongly attributed to NPIs). If these factors can be observed or represented through reasonable proxies, including them in the model is often useful. However, adjusting for mediating factors that are also influenced by NPIs, such as human mobility, should be done with care, because including such factors as covariates can change the interpretation of NPI effects. To avoid bias from unmodelled factors, specific methodologies have been used such as synthetic controls, 39,40 or accounting for noise in transmission dynamics. 19 To examine further potential bias, studies have done sensitivity analyses with data for which simulated but realistic NPIs are added or previously observed NPIs are hidden from the model. 4,41 Although such approaches can improve the robustness of assessments, they will always rely on assumptions about how unobserved factors influence outcomes; therefore they cannot rule out all forms of bias.

Quantification of uncertainty
The quantification and reporting of uncertainty are important but often neglected when assessing NPI effectiveness. Many analyses during the COVID-19 pandemic provided no uncertainty quantification. 1 Various sources of uncertainty can be of relevance in the context of epidemiological modelling, including uncertainty from chance events in disease transmission, progression and reporting, uncertainty in epidemiological parameters and the correct coding of NPIs, and uncertainty regarding underlying model assumptions. 42 Accounting for uncertainty in epidemiological parameters is particularly important for newly emerging pathogens, for which knowledge about transmission and disease progression is still poor. For example, mis-specification of the generation interval distribution can bias estimates of the effective reproduction number. 43 Uncertainty can be quantified directly as part of a model, or indirectly by analysing sensitivity to varying inputs and assumptions. The uncertainty and sensitivity of estimates are relevant for decision makers and should be treated as an essential part of a research report. Some modelling approaches are more suited for thoroughly assessing uncertainty than others. Multistep approaches often first estimate an intermediate outcome from observed data and then estimate the effect of interventions on a point estimate of the intermediate outcome, thus only accounting for uncertainty in the second step. By contrast, single-step approaches combine both analyses in a single model and thereby offer a more complete quantification of uncertainty. Finally, the extent to which uncertainty was quantified should also be considered when comparing results from multiple, different analyses, as a naive weighing of evidence by reported uncertainty might increase the unexplained heterogeneity and lead to incorrect conclusions.
www.thelancet.com/public-health Vol 8 April 2023 e315
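The sensitivity of the reproduction number to the assumed generation interval can be sketched with a standard relationship between the exponential growth rate and the reproduction number. The sketch assumes a gamma-distributed generation interval, for which the reproduction number implied by a growth rate r is (1 + r × scale) raised to the power of the shape parameter. The growth rate and the candidate means and standard deviation are hypothetical.

```python
def reproduction_number(growth_rate, gi_mean, gi_sd):
    """Reproduction number implied by an exponential growth rate.

    Assumes a gamma-distributed generation interval with the given mean and
    standard deviation (in days); requires 1 + growth_rate * scale > 0.
    """
    shape = (gi_mean / gi_sd) ** 2
    scale = gi_sd ** 2 / gi_mean
    return (1.0 + growth_rate * scale) ** shape

# Same observed growth rate, different assumed generation intervals.
GROWTH_RATE = 0.10  # hypothetical daily growth rate
for assumed_mean in (4.0, 5.0, 7.0):
    r_number = reproduction_number(GROWTH_RATE, gi_mean=assumed_mean, gi_sd=2.0)
    print(f"assumed mean generation interval {assumed_mean:.0f} days: R = {r_number:.2f}")
```

Repeating an analysis over a plausible range of generation interval parameters, as in this loop, is one indirect way of reporting how much an estimate depends on that assumption.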

Conclusions
The public health response during an epidemic can be improved through evidence-based choices of NPIs. These choices require research into the benefits and societal costs of different NPIs. 44 In this Viewpoint, we have outlined requirements for assessing NPI effectiveness quickly and reliably with observational data (panel). To ensure the robustness of study results, best practices must allow for methodological diversity and ensure comparability across studies at the same time. To achieve these goals, studies require consistent reporting of epidemiological outcomes, careful collection and coding of intervention data, use of accepted and robust modelling frameworks, and standardised measurement and reporting of effectiveness (figure 2). If these prerequisites are not established in advance, researchers might face a trade-off between timeliness and reliability when assessing NPI effectiveness. Therefore, promotion of common standards and development of appropriate methodologies and user-friendly software before the next public health emergency is important.
Several challenges in assessment of NPI effectiveness remain. First, subgroup-specific effectiveness of NPIs is rarely assessed, 1 despite evidence that some population subgroups might disproportionately contribute to disease spread 45 or are unequally associated with epidemiological outcomes. 46 Estimates of subgroup-specific NPI effectiveness could improve the modelling of downstream outcomes (eg, avoided deaths) and inform targeted policy, but corresponding assessments will require further methodological development and more detailed data. Second, it is important to understand how behaviour and exposure mechanisms mediate the effects of NPIs on transmission. 31 Studies about socioeconomic and individual risk factors for infection 47,48 can offer complementary evidence about NPI effectiveness, especially if combined with insights about individuals' behavioural response to NPIs. Individual-level insights remain important to understand differences in adherence and potential side-effects. For example, mask mandates could increase attendance at events, or school closures could increase remote working by parents. Studying such mechanisms and their role in curbing transmission might help to understand why and when specific NPIs are effective and how different NPIs interact with each other. Although some studies have assessed the effect of NPIs via changes in population-level mobility, 49,50 more detailed insights could be gained by the use of individual-level data collected from surveys or contact tracing apps. Third, causal interpretations of NPI effectiveness rely on assumptions that might not be satisfied. In particular, effectiveness estimates of interventions might be biased by proactive population behaviour and spillover effects, 8,30,38 and the choice and timing of implementation of NPIs might be influenced by previous evidence and experiences, 51,52 introducing potentially unmeasured confounders. Finally, many subtle issues require further investigation (eg, the holistic assessment of health policy measures such as travel restrictions, which benefit from a coordinated response across countries).
Observational studies will remain an important source of evidence for NPI effectiveness and could become a valuable component of real-time surveillance during future epidemics, or exceptional waves of endemic diseases. Early insights on the relative effectiveness of different NPIs could be shared across countries where effects are expected to be similar, for example by collating estimates from different research groups via periodic reports or publicly available dashboards (similar to the COVID-19 forecast hub from the European Centre for Disease Prevention and Control). The successful integration of NPI effectiveness assessments into surveillance during the COVID-19 pandemic and beyond will depend on close collaboration among stakeholders involved in epidemic preparedness on issues regarding data collection, epidemic modelling, and decision making.

Contributors
AL, NB, and WV conceptualised this paper. AL and NB wrote the original draft. All authors wrote, reviewed, and edited the paper.

Panel: Recommendations
• Quantify non-pharmaceutical intervention (NPI) effectiveness via relative changes in person-to-person transmission (eg, transmission rates or effective reproduction numbers)
• Exploit variation both over time and between populations to assess the effects of single NPIs rather than a combination of multiple NPIs
• Take variation in NPI effectiveness between countries into account by analysing data from multiple populations
• Strive for consistent epidemiological outcome data and provide details about the ascertainment process to account for inconsistencies between populations and over time
• Separate collection and categorisation of intervention data to allow different categorisation to be applied to the same raw data (eg, as part of a sensitivity analysis)