Global evidence for ultraviolet radiation decreasing COVID-19 growth rates

Significance
There is interest in whether COVID-19 cases respond to environmental conditions. If an effect is present, seasonal changes in local environmental conditions could alter the global spatial pattern of COVID-19 and inform local public health responses. Using a comprehensive global dataset of daily COVID-19 cases and local environmental conditions, we find that increased daily ultraviolet (UV) radiation lowers the cumulative daily growth rate of COVID-19 cases over the subsequent 2.5 weeks. Although statistically significant, the implied influence of UV seasonality is modest relative to social distancing policies. Temperature and specific humidity cumulative effects are not statistically significant, and total COVID-19 seasonality remains to be established because of uncertainty in the net effects from seasonally varying environmental variables.


A.1 SEIR model
In this section we consider a set of simulations using the standard SEIR model. The SEIR model of infectious disease consists of ordinary differential equations that relate susceptible (S), exposed (E), infectious (I), and recovered (R) compartments. We use the SEIR model to motivate and check the accuracy of our statistical approach.
We assume an SEIR model having constant rates of exposure ($\sigma$) and recovery ($\gamma$) but time-variable transmission ($\beta(t)$). The population represented in this model is compartmentalized among susceptible (S), exposed (E), infectious (I), and recovered (R). Throughout these simulations we let $\sigma = 1/4.6$ and $\gamma = 1/5$, with units of day$^{-1}$, following [1]. We let $\beta$ take an average value of 0.45 people$^{-1}$ day$^{-1}$, which corresponds to an $R_0 = \beta/\gamma$ of 2.25, consistent with previous estimates [2,1]. We consider a population of 1 million individuals over 100 days. While we do not observe E, I, or R directly in the COVID-19 data, we do observe the number of confirmed positive cases.
We model the evolution of these confirmed cases (C) as $C'(t) = \zeta I(t)$, allowing a portion $\zeta$ of the infectious population to be tested each time period. We let $\zeta = 1/14$, with units of day$^{-1}$, following ref. [3]. The dynamics of C are proportional to those of R, in that $C = (\zeta/\gamma) R$. Conceptually, this formulation allows for the possibility that a single patient tests positive multiple times, which is consistent with some reports. We define the growth rate of C as $\Delta C_t = \ln(C_t) - \ln(C_{t-1})$. This growth rate is the outcome of interest in this study because it is policy relevant, observable in the COVID-19 data, and not affected by differences in testing rates between regions. Although many factors may influence the time-variable nature of transmission, $\beta(t)$, here we focus on weather as the single cause of changes in transmission.
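For reference, the model can be written out explicitly. This is a sketch assuming the standard frequency-dependent SEIR form (division by the population size $N$ is an assumption), with the confirmed-case equation described above appended:

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta(t)\,\frac{S I}{N}, \\
\frac{dE}{dt} &= \beta(t)\,\frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \\
\frac{dC}{dt} &= \zeta I.
\end{aligned}
$$

Because $dC/dt$ and $dR/dt$ are both proportional to $I$, integrating from $C(0) = R(0) = 0$ gives $C(t) = (\zeta/\gamma)\,R(t)$.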
To study how shocks to $\beta$ influence the growth rate of C, we first simulate the evolution of the disease in the SEIR model deterministically using the semi-implicit Euler method and analyze the impact of an idealized single-day perturbation to transmission (Fig. S12A). We let transmission vary over time with linear disturbances due to changes in weather, U: $\beta(t) = \beta_0 + \beta_1 U_t$. We parameterize $U_t$ to generate a top-hat perturbation, equal to zero except on a single day, when it equals one. This leads to a proportional change in $\beta$ equal to $\beta_1$, which we set to 0.05 (Fig. S12B). Relative to a control run with constant $\beta$, the growth rate in I, $\Delta I$, is seen to rise rapidly and then undergo a quasi-exponential decay (Fig. S12C,D). The response of the growth rate in C to changes in $\beta$ is lagged and smoothed relative to that of I. This perturbation in ensuing growth rates from a day's change in weather is what we seek to estimate in our empirical model. The delay between the perturbation of weather and ensuing changes in growth rates highlights the need to model lagged effects. The integral of this curve is the cumulative effect of a single day's change in weather on the ensuing case growth rate. In the next experiment we test the ability of temporal distributed lag regression models to capture this cumulative effect.
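The pulse experiment can be sketched numerically. This is a minimal illustration, not the authors' code: it assumes the frequency-dependent incidence term $\beta S I / N$, ten initially infectious people, $C(0) = 1$ (so that $\ln C$ is defined), a step of 0.1 day, and one particular semi-implicit Euler variant (loss terms implicit, gain terms explicit). Parameter values are those quoted in the text; the function names are hypothetical.

```python
import numpy as np

# Rates quoted in the text (1/day). The beta*S*I/N incidence form and the
# initial conditions below are assumptions made for this sketch.
SIGMA, GAMMA, ZETA = 1 / 4.6, 1 / 5, 1 / 14
BETA0, BETA1 = 0.45, 0.05
N, DAYS = 1_000_000, 100


def simulate(pulse_day=None, dt=0.1, i0=10.0, c0=1.0):
    """Semi-implicit Euler integration of SEIR plus confirmed cases C.

    Loss terms are treated implicitly and gain terms explicitly, which keeps
    compartments non-negative and conserves S + E + I + R up to rounding.
    Returns end-of-day I and C arrays of length DAYS + 1 (index 0 = start).
    """
    s, e, i, r, c = N - i0, 0.0, i0, 0.0, c0
    steps = int(round(1 / dt))
    I_out, C_out = [i], [c]
    for day in range(DAYS):
        u = 1.0 if day == pulse_day else 0.0    # top-hat weather perturbation U_t
        beta = BETA0 + BETA1 * u                # beta(t) = beta_0 + beta_1 * U_t
        for _ in range(steps):
            s_new = s / (1 + dt * beta * i / N)
            infections = s - s_new              # new exposures in this step
            e_new = (e + infections) / (1 + dt * SIGMA)
            i_new = (i + dt * SIGMA * e_new) / (1 + dt * GAMMA)
            r += dt * GAMMA * i_new
            c += dt * ZETA * i_new              # C'(t) = zeta * I(t)
            s, e, i = s_new, e_new, i_new
        I_out.append(i)
        C_out.append(c)
    return np.array(I_out), np.array(C_out)


def growth_rate(x):
    """Daily growth rate: element k is ln(x[k+1]) - ln(x[k])."""
    return np.diff(np.log(x))


PULSE_DAY, WINDOW = 20, 17
I_ctrl, C_ctrl = simulate()
I_pert, C_pert = simulate(pulse_day=PULSE_DAY)
dI = growth_rate(I_pert) - growth_rate(I_ctrl)    # perturbation in Delta I
dC = growth_rate(C_pert) - growth_rate(C_ctrl)    # perturbation in Delta C
cumulative_C = dC[PULSE_DAY:PULSE_DAY + WINDOW].sum()
print(f"cumulative effect on Delta C over {WINDOW} days: {cumulative_C:.4f}")
```

Because daily log growth rates telescope, the windowed sum equals $\ln(C_\text{pert}/C_\text{ctrl})$ at the end of the window, which gives a simple consistency check on any implementation.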
We use a stochastic version of the SEIR model to examine how a temporal distributed lag regression model captures the delayed effects of weather-induced shocks to transmission on the case growth rate (Fig. S13A) [46]. In this experiment, the transmission rate is again represented as a linear function of U, but $U_t$ is now parameterized as the sum of a sinusoid in t and Gaussian noise. This allows transmission to vary over time (Fig. S13B). Similar results are found when $U_t$ is prescribed to evolve following an autoregressive moving average model with a Gaussian innovation distribution. Pooling simulated data from an ensemble of 500 runs of the stochastic SEIR model, with each 100-day run indexed by r, we estimate the effect of U on I and C (Fig. S13C) using a distributed lag regression model, $\Delta C_{r,t} = c_0 + \sum_{\ell=0}^{17} \alpha_\ell U_{r,t-\ell} + \epsilon_{r,t}$. Here, $c_0$ is an intercept and $\epsilon_{r,t}$ the error term. When estimating the regression model we use simulated data from t = 18, the earliest time point that has 17 lags of weather, to t = 45, to avoid the influence of depletion of S. The estimated lagged effects of U on C and I (Fig. S13D,E) represent the estimated effect of changing a single day's weather on the ensuing growth rate. The similarity in magnitude and structure of these statistically estimated effects to the dynamic response of C and I obtained in the idealized single-day pulse experiment using the deterministic model (Fig. S12C,D) gives confidence in our application of the lagged regression model to COVID-19 case data.
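The pooled distributed lag regression can be sketched with ordinary least squares. In this minimal sketch the outcome is synthetic, generated from known, hypothetical lag weights (`true_alpha`) plus noise rather than from stochastic SEIR runs, so that the recovered cumulative effect can be compared against a known value. Lags $\ell = 0, \dots, 17$, the 0-based time index, and all helper names are assumptions.

```python
import numpy as np

N_LAGS = 17  # lags 0..N_LAGS (the lower limit of 0 is an assumption)


def lag_matrix(u, n_lags):
    """Rows t = n_lags..T-1; column l holds U_{t-l} for l = 0..n_lags."""
    T = len(u)
    return np.column_stack([u[n_lags - l:T - l] for l in range(n_lags + 1)])


def fit_distributed_lag(runs_u, runs_y, n_lags, t_min, t_max):
    """Pooled OLS of y_{r,t} on an intercept and U_{r,t-l}, l = 0..n_lags."""
    X_parts, y_parts = [], []
    for u, y in zip(runs_u, runs_y):
        t = np.arange(n_lags, len(u))
        keep = (t >= t_min) & (t <= t_max)     # estimation window
        X_parts.append(lag_matrix(u, n_lags)[keep])
        y_parts.append(y[n_lags:][keep])
    X = np.vstack(X_parts)
    X = np.column_stack([np.ones(len(X)), X])  # intercept c_0
    y = np.concatenate(y_parts)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]


# Synthetic stand-in for pooled runs: sinusoid-plus-noise U, known lag weights.
rng = np.random.default_rng(0)
T, R = 46, 500
true_alpha = 0.01 * np.exp(-np.arange(N_LAGS + 1) / 5.0)
runs_u, runs_y = [], []
for _ in range(R):
    days = np.arange(T)
    u = np.sin(2 * np.pi * 0.1 * days + rng.uniform(0, 2 * np.pi)) + rng.normal(size=T)
    y = 0.1 + np.convolve(u, true_alpha)[:T] + rng.normal(scale=0.01, size=T)
    runs_u.append(u)
    runs_y.append(y)

c0_hat, alpha_hat = fit_distributed_lag(runs_u, runs_y, N_LAGS, t_min=N_LAGS, t_max=T - 1)
cum_hat = alpha_hat.sum()  # cumulative effect = sum of lag coefficients
print(f"estimated cumulative effect {cum_hat:.4f} vs true {true_alpha.sum():.4f}")
```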
We test the ability of the distributed lag regression model to capture the cumulative effect more formally by estimating the regression model and its associated cumulative effect on 100 different simulated datasets, each containing 500 runs of the stochastic SEIR model. We calculate the cumulative effect for each regression model by summing over the 17 lagged daily coefficients, and we calculate the cumulative effect for the deterministic model by integrating the simulated perturbation in C over the 17 days following the perturbation in $\beta$. The cumulative effect estimated by the distributed lag regression model closely corresponds to the cumulative effect simulated in the idealized deterministic model experiment. Specifically, the average of the estimated cumulative effect is within 8% of the value from the idealized experiment (Fig. S13F), with the differences possibly related to the deterministic versus stochastic integration performed in the two different simulations (see footnote 5). The ability of the distributed lag regression model to closely capture effects in simulated data motivates its use in observed COVID-19 case data.
In a final experiment we test the sensitivity of the estimated cumulative effect to different frequencies of changes in U. Running the stochastic SEIR simulation and temporal distributed lag regression model with weather-induced perturbations of $\beta$ as in Fig. S13A, but with frequencies varying from 0.02 to 0.3 day$^{-1}$, results in only small changes in the cumulative effect (Fig. S13G). This stability supports the application of our empirically determined cumulative effect, identified off of daily changes in weather and case growth rates, to simulate longer-term seasonal effects.
Footnote 5: The influence of a perturbation of $\beta$ on ensuing growth rates undergoes quasi-exponential decay. Thus, long lags are required to capture the entire cumulative effect. Using 17 lags captures 84% of the entire cumulative effect using this set of model parameters, whereas using 25 lags captures 97% of the cumulative effect. When comparing the cumulative effect estimated by the regression model to that from the idealized pulse experiment we use 17 lags in each. We note that when the estimated delay period is extended to 25 days the difference between the average cumulative effect estimated by the lag regression model and the idealized pulse experiment is reduced from 8% to 3%. The number of lags used, however, involves a trade-off between the bias and variance of the estimated cumulative effect, whereby increasing the number of lags reduces the bias but increases the variance of the estimated effect. The specification of a lag length of 17 days for the empirical model is based on existing empirical estimates of the delay interval between exposure and case confirmation (Section A.2) and is supported by our model simulations to capture the preponderance of the effect. We show robustness of the cumulative effect to changes in lag length in Fig. S7.
Though we hypothesize that the influence of weather on COVID-19 growth rates estimated in this study is due to changes in transmission, we do not estimate the effect of weather on transmission directly because doing so would require additional assumptions. Changes in transmission have a complex relationship with changes in the growth rate. Even in the simple case of the model being in equilibrium in the disease-free limit (i.e., approximately all of the population being susceptible), the populations of E, I, and C grow at an asymptotic rate [6] equal to
$$\lambda = \frac{-(\sigma + \gamma) + \sqrt{(\sigma - \gamma)^2 + 4\sigma\beta}}{2}.$$
Solving for $\beta$, differentiating, and rearranging gives
$$\frac{\partial \lambda}{\partial \beta} = \frac{1}{\eta}, \qquad \text{where } \eta = 1 + \frac{\gamma + 2\lambda}{\sigma} > 1.$$
This shows that an equilibrium change in $\beta$ causes a damped equilibrium change in $\lambda$, and that the degree of damping is a function of the model parameters.
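A quick numerical check of the damping result, using the parameter values from Section A.1 and treating all rates as per-day quantities (a sketch, not part of the original analysis): the finite-difference derivative of $\lambda$ with respect to $\beta$ should match $1/\eta$.

```python
import math

SIGMA, GAMMA = 1 / 4.6, 1 / 5


def asymptotic_growth(beta, sigma=SIGMA, gamma=GAMMA):
    """Dominant eigenvalue of the linearized E-I subsystem near the disease-free state."""
    return (-(sigma + gamma) + math.sqrt((sigma - gamma) ** 2 + 4 * sigma * beta)) / 2


def damping_factor(lam, sigma=SIGMA, gamma=GAMMA):
    """eta = 1 + (gamma + 2*lambda)/sigma, so that d(lambda)/d(beta) = 1/eta."""
    return 1 + (gamma + 2 * lam) / sigma


beta = 0.45
lam = asymptotic_growth(beta)
eta = damping_factor(lam)
h = 1e-6
numeric = (asymptotic_growth(beta + h) - asymptotic_growth(beta - h)) / (2 * h)  # central difference
print(f"lambda = {lam:.4f}/day, 1/eta = {1 / eta:.4f}, finite difference = {numeric:.4f}")
```

With these values $\eta$ is well above 1, so a unit change in $\beta$ moves the asymptotic growth rate by roughly a third of a unit.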
The dependence of $\partial\lambda/\partial\beta$ on other model parameters, which are imprecisely known, complicates estimation of the impact of weather on transmission even in this relatively simple equilibrium setting. Stochastic changes in $\beta$ over time due to changes in weather introduce further complexity to the relationship between transmission and the case growth rate. Thus, we estimate the impact of weather directly on the growth rate and leave estimation of weather effects on transmission to future work. We are unaware of an analytical solution for the growth rate of I, or C, under a time-variable $\beta$, and note that obtaining such a solution would be useful for optimizing inferences from changes in the growth rate.
Note that the shapes of the lagged responses seen in both the stochastic and deterministic models are determined by the assumptions of the SEIR model (e.g. an exponentially distributed infectious period).
Given that the dynamics of COVID-19 are unlikely to satisfy these assumptions, we should not expect lagged responses recovered from the data to match simulated responses precisely. Further, while the primary mechanism through which weather is thought to affect COVID-19 growth is changes in transmission, which motivates these simulations, it is possible that weather also affects the testing rate, recovery rate, or incubation period. Thus, the estimated impacts of weather on C should be interpreted as the combined effect of potentially multiple channels, both biological and social.

A.2 Statistical model
The SEIR simulations in Section A.1 suggest that a distributed lag regression model can capture delayed effects of weather-induced shocks to transmission on the COVID-19 growth rate. Implementing such a statistical model on actual data, however, requires additional consideration of the various potential confounding factors that can influence the true data generating process, but which were not included in the simple SEIR model we examine.
In general, there are four challenges to causal estimation in this setting. First, surface weather conditions vary systematically as one moves away from the equator towards higher latitude locations. For example, temperatures and specific humidity both decline at higher latitudes. Because similar latitude-dependent gradients exist for other potentially relevant environmental conditions like natural disaster exposure and socio-economic indicators like GDP, a cross-sectional analysis of local mean climate conditions and COVID-19 infection rates may be biased by such confounding factors. Second, for a given location, environmental conditions generally trend over the course of a calendar year. Because COVID-19 infection rates trend as well, such temporal dependence may also confound empirically estimated weather effects on COVID-19 with other gradually evolving determinants of infection. Third, many local environmental conditions are strongly correlated. These correlations would confound causal estimates if key variables are omitted from the analysis. Lastly, any convincing causal estimate must take into account the time delay between COVID-19 transmission and detection.
This study takes a quasi-experimental statistical approach that addresses such potentially confounding factors in order to isolate plausibly random variation in a set of environmental conditions: UV, temperature, humidity, and precipitation. This "reduced-form" empirical approach is agnostic regarding the mechanisms through which climate variables govern the growth rate of cases, but, by providing plausibly causal estimates of the role that each plays in the evolution of the virus, it allows one to make counterfactual simulations of case growth under alternative environmental conditions. Furthermore, such estimates provide empirical grounding for the parameters of more process-based models like the SEIR model.
Specifically, we estimate a longitudinal (i.e., panel) regression model using daily confirmed COVID-19 cases from 173 countries from January 1, 2020 to April 10, 2020. Our outcome of interest is the growth rate of cumulative COVID-19 cases in administrative (i.e., national/subnational) unit $i$ between days $t-1$ and $t$,
$$\Delta C_{it} = \ln(C_{it}) - \ln(C_{i,t-1}).$$
Because of the delay between initial COVID-19 exposure and confirmed detection (Section A.1), we model the growth rate in cumulative COVID-19 cases using the following distributed lag model:
$$\Delta C_{it} = \sum_{\ell=0}^{L}\left(\alpha_\ell^{UV}\, UV_{i,t-\ell} + \alpha_\ell^{T}\, T_{i,t-\ell} + \alpha_\ell^{H}\, H_{i,t-\ell} + \alpha_\ell^{P}\, P_{i,t-\ell}\right) + Z_{it} + \epsilon_{it}, \qquad \text{(S1)}$$
where for administrative unit $i$ and day $t$, $UV_{i,t-\ell}$, $T_{i,t-\ell}$, $H_{i,t-\ell}$, and $P_{i,t-\ell}$ are population-weighted daily average UV (in kJ/m² per hour), temperature (in degrees centigrade), specific humidity (in %), and precipitation (in mm), respectively, observed $\ell$ days ago. In robustness checks (Fig. S6 …
We are interested in quantifying the total effect of environmental exposure in a single period as it manifests over subsequent time periods. In a temporal distributed lag model like Eq. S1, this total effect is captured by the sum of lagged effects for each weather variable, or the "cumulative effect." To see this, observe that the effect of, say, $UV_{it}$ on subsequent COVID-19 growth rates is
$$\frac{\partial\, \Delta C_{i,t+\ell}}{\partial\, UV_{it}} = \alpha_\ell^{UV}, \qquad \ell = 0, \dots, L,$$
such that the total (or cumulative) effect of period-$t$ UV on subsequent COVID-19 growth rates up to 17 days later is $\sum_{\ell=0}^{L} \alpha_\ell^{UV}$. The estimated uncertainty in the cumulative effect takes into account the variances of each lagged effect as well as their covariances, specifically:
$$\operatorname{Var}\left(\sum_{\ell=0}^{L}\hat\alpha_\ell^{UV}\right) = \sum_{\ell=0}^{L}\operatorname{Var}\left(\hat\alpha_\ell^{UV}\right) + \sum_{\ell=0}^{L}\sum_{k \neq \ell}\operatorname{Cov}\left(\hat\alpha_\ell^{UV}, \hat\alpha_k^{UV}\right).$$
Importantly, in calculating the cumulative effect, we include all estimated lagged effects within the 17-day interval, including effects that are imprecisely estimated. This is because, with heterogeneity in delay intervals across individuals, one would expect population-level studies such as ours to detect population-weighted lagged effects throughout the 17-day interval. As such, our approach must include such noisy estimates, as they reflect existing uncertainties.
We calculate the cumulative effect and its standard error separately for each weather variable using the estimated coefficients and covariance matrix from the model in Eq. S1.
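The cumulative effect and its standard error can be computed directly from a coefficient vector and its covariance matrix. A minimal sketch with hypothetical numbers standing in for Eq. S1 estimates; the variance is the sum of every entry of the lag block of the covariance matrix, i.e., the lagged variances plus all pairwise covariances. The function name is illustrative.

```python
import numpy as np


def cumulative_effect(coef, vcov, lag_idx):
    """Sum of the selected lag coefficients and its standard error.

    Var(sum_l alpha_l) = sum_l Var(alpha_l) + sum_{l != k} Cov(alpha_l, alpha_k).
    """
    coef = np.asarray(coef, dtype=float)
    vcov = np.asarray(vcov, dtype=float)
    idx = np.asarray(lag_idx)
    block = vcov[np.ix_(idx, idx)]          # covariance of the selected lags
    variances = np.trace(block)             # sum of Var(alpha_l)
    covariances = block.sum() - variances   # sum over l != k of Cov(alpha_l, alpha_k)
    return coef[idx].sum(), np.sqrt(variances + covariances)


# Hypothetical estimates: three lag coefficients of one variable plus one other regressor.
coef = np.array([0.10, 0.20, 0.30, 5.0])
vcov = np.diag([0.01, 0.04, 0.09, 1.0])
vcov[0, 1] = vcov[1, 0] = 0.005
est, se = cumulative_effect(coef, vcov, [0, 1, 2])
```

Ignoring the covariance terms here would understate the variance (0.14 instead of 0.15), which is why the off-diagonal entries matter.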
To isolate plausibly random variation in weather conditions [11,21], we include $Z_{it}$, a vector of semiparametric controls. In our baseline specification, $Z_{it}$ includes a full set of national/subnational unit-specific dummies, which remove any time-invariant differences in growth rates of COVID-19 cases and environmental variables across administrative units. These spatial "fixed effects" address the concern that baseline population characteristics (e.g., economic activity, population density) may be correlated both with COVID-19 infection rates and with average weather conditions. Second, $Z_{it}$ includes day-specific dummies to remove any common global determinants of COVID-19 growth rates. These temporal fixed effects account for global daily circumstances that may influence COVID-19 growth rates, such as WHO's declaration of COVID-19 as a global pandemic. Third, to account for local trends in both COVID-19 growth rates and weather during this period, $Z_{it}$ includes country-by-week dummies, which flexibly account for country-specific temporal trends and shocks in COVID-19 growth rates and weather. Importantly, these dummy variables capture gradually occurring local trends across the globe as COVID-19 evolves. The influence of adding this suite of controls on the residual variation in UV, temperature, and COVID-19 growth rates is shown visually for two selected regions in Fig. 2A. Finally, we cluster standard errors at the administrative-unit level. This allows for heteroskedasticity and serial correlation of arbitrary form in the error term $\epsilon_{it}$ within each administrative unit.
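Absorbing several sets of dummies (unit, day, country-by-week) is commonly done by demeaning rather than by building the full dummy matrix. The sketch below uses one standard approach, alternating projections (iterated group demeaning), on a hypothetical unbalanced panel with unit and day effects; it is not necessarily the software used for this study. By the Frisch-Waugh-Lovell theorem, the slope from the demeaned data should equal the slope from the explicit dummy-variable regression.

```python
import numpy as np


def demean_by_groups(x, factors, tol=1e-12, max_iter=10_000):
    """Residualize x on several sets of fixed effects by alternating projections.

    Each pass subtracts group means for one factor at a time; iterating until
    the subtracted means vanish converges to the residual from a regression on
    all dummy sets jointly.
    """
    x = np.asarray(x, dtype=float).copy()
    codes = [np.unique(np.asarray(f), return_inverse=True)[1].ravel() for f in factors]
    for _ in range(max_iter):
        largest = 0.0
        for c in codes:
            means = np.bincount(c, weights=x) / np.bincount(c)
            x -= means[c]
            largest = max(largest, np.abs(means).max())
        if largest < tol:
            break
    return x


# Hypothetical unbalanced panel: 20 units observed on random days out of 30.
rng = np.random.default_rng(1)
n = 400
unit = rng.integers(0, 20, n)
day = rng.integers(0, 30, n)
x = rng.normal(size=n) + 0.3 * unit            # regressor correlated with unit effects
y = 2.0 * x + 0.5 * unit - 0.1 * day + rng.normal(scale=0.1, size=n)

xd = demean_by_groups(x, [unit, day])
yd = demean_by_groups(y, [unit, day])
slope_within = float(xd @ yd / (xd @ xd))

# Same slope from an explicit dummy-variable regression (Frisch-Waugh-Lovell).
D = np.column_stack([x, unit[:, None] == np.arange(20), day[:, None] == np.arange(30)]).astype(float)
slope_dummies = np.linalg.lstsq(D, y, rcond=None)[0][0]
```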
The results of several robustness checks are given in Table S1, in which alternative sets of controls are examined. These include the number of days since the initial outbreak of COVID-19 in each location (col. 1), country-specific linear trends (col. 2), country-by-week fixed effects (col. 3, and in combination with others in cols. 4-6), a placebo "lead" weather variable measuring future exposure (col. 4), timing of policies such as school closures (col. 5), stringency of the COVID-19 testing regime at the national level (col. 6), increased spatial resolution of our week-specific dummies to include subnational administrative unit-by-week fixed effects (col. 7), and country-by-day fixed effects in place of country-by-week fixed effects (col. 8). The last specification requires dropping all data from countries without subnational COVID-19 records, because in those countries daily weather variables are collinear with the country-by-day dummy variables.
To control for social distancing policies (Table S1, col. 5), which vary across space and time, we add to the regression model in Eq. S1 a dummy variable equal to 1 when any one of three policies is in place: school closures, work-from-home ordinances, or event cancellations (Section B.3). To control for changes over space and time in the degree of COVID-19 testing (Table S1, col. 6), we use national-level records from OxCGRT (Section B.3) that categorize each country's testing regime. The categories are: no testing policy (coded as 0); only testing those who have symptoms and meet specific criteria, such as being essential workers or coming into contact with a known case (coded as 1); testing of anyone showing COVID-19 symptoms (coded as 2); and open public testing, such as drive-through testing, available to asymptomatic people (coded as 3). A variable indicating which regime a country falls into on any given day is added to the regression model shown in Eq. S1. Modelling this ordinal variable as four binary variables gives nearly identical results.
Finally, we estimate a Poisson pseudo-maximum likelihood (PPML) model in place of the ordinary least squares regression shown in Eq. S1 (col. 9, and col. 5 of Fig. S5). We do so for two reasons: first, the distribution of new cases is highly skewed; second, if climate conditions operate solely through the transmission parameter (Section A.1), changes in climatic conditions cannot lead to negative growth rate effects, as transmission cannot be negative. Because the former concern is not empirically large (in Fig. S14, we show that our residuals from estimation of Eq. S1 are approximately normally distributed), and because climate variables may influence growth rates through transmission as well as other channels, such as behavior regarding testing, we include this model as a robustness check only. The estimating equation relates new cases realized between day $t-1$ and day $t$, denoted $C_{it}$, to lagged climatic exposure as follows: We control for lagged cumulative cases $C_{i,t-1}$, as new cases are proportional to the number of infected people in the population (Section A.1). All other variables are defined as in Eq. S1. While the standard Poisson model imposes that the conditional mean and variance of the outcome be equal, we address this overdispersion issue by clustering standard errors at the administrative-unit level. This adjustment relaxes the equidispersion assumption by allowing arbitrary forms of within-administrative-unit heteroskedasticity and serial correlation in the error term $\epsilon_{it}$ [13].
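A numpy-only sketch of a PPML fit by iteratively reweighted least squares, with a cluster-robust sandwich covariance of the kind described above. It is illustrative only: the data are hypothetical, the fixed effects of the actual specification are omitted, no small-sample correction is applied, and the function names are not from any particular package.

```python
import numpy as np


def ppml(X, y, max_iter=100, tol=1e-10):
    """Poisson pseudo-maximum likelihood via iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                       # GLM-style starting values
    eta = np.log(mu)
    beta = np.full(X.shape[1], np.inf)
    for _ in range(max_iter):
        z = eta + (y - mu) / mu        # working response for the log link
        XtW = X.T * mu                 # Poisson weights equal the fitted mean
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def cluster_robust_vcov(X, y, beta, clusters):
    """Sandwich covariance with score contributions summed within clusters."""
    X = np.asarray(X, dtype=float)
    mu = np.exp(X @ beta)
    bread = np.linalg.inv((X.T * mu) @ X)
    scores = X * (y - mu)[:, None]
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        s = scores[clusters == g].sum(axis=0)
        meat += np.outer(s, s)
    return bread @ meat @ bread


# Hypothetical data: 50 administrative units observed for 100 days each.
rng = np.random.default_rng(2)
clusters = np.repeat(np.arange(50), 100)
x = rng.normal(size=clusters.size)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([1.0, 0.3])
y = rng.poisson(np.exp(X @ true_beta))
beta_hat = ppml(X, y)
se_hat = np.sqrt(np.diag(cluster_robust_vcov(X, y, beta_hat, clusters)))
```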
Eq. S1 implicitly assumes a linear relationship between environmental conditions and COVID-19 growth rates. To explore potential nonlinearities in these relationships without requiring polynomial terms for every lagged weather variable in Eq. S1, we estimate an alternative model in which we impose that weather conditions have constant linear (Eq. S3) and quadratic (Eq. S4) effects throughout the 17-day delay period: There do not appear to be strong nonlinearities in the UV, temperature, or specific humidity effects.
Precipitation effects appear to exhibit nonlinearities, but this relationship is not statistically significant.
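One way to read the constrained model, stated as an assumption about the form of Eqs. S3 and S4: if every lag shares a common linear coefficient $a$, the lag sum collapses to $a$ times a rolling $(L+1)$-day sum of the weather variable, and a common quadratic coefficient likewise multiplies a rolling sum of squared values. Under that reading, the implied cumulative effect of a sustained one-unit change is $(L+1)a$. The helper name below is hypothetical.

```python
import numpy as np


def constrained_lag_regressors(u, n_lags):
    """Rolling sums of U and U^2 over lags 0..n_lags.

    If alpha_l = a for every lag, sum_l a * U_{t-l} = a * sum_l U_{t-l}; the
    quadratic analogue uses U^2. Rows without a full set of lags are NaN.
    """
    u = np.asarray(u, dtype=float)
    window = np.ones(n_lags + 1)
    lin = np.convolve(u, window)[: len(u)]         # sum_{l=0}^{L} U_{t-l}
    quad = np.convolve(u ** 2, window)[: len(u)]   # sum_{l=0}^{L} U_{t-l}^2
    lin[:n_lags] = np.nan
    quad[:n_lags] = np.nan
    return lin, quad


u = np.arange(1.0, 8.0)  # hypothetical daily weather values 1..7
lin, quad = constrained_lag_regressors(u, n_lags=2)
```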
In Fig. 3A,C, we show cumulative effects of lagged responses of COVID-19 to UV, temperature, and specific humidity. We additionally show heterogeneity in this cumulative effect across policy regimes (purple diamonds) and duration of outbreak (green squares). The former coefficients are generated by estimating a version of Eq. S1 in which each weather variable lag is interacted with the corresponding lagged value of a policy dummy variable. This dummy variable is equal to 1 when any one of three policies is in place: school closures, work-from-home ordinances, or event cancellations (Section B.3). "Pre-policy" cumulative effects are then computed using estimated lagged effects of each weather variable when the policy dummy is set to 0; in contrast, "post-policy" cumulative effects are computed using estimated coefficients when the policy dummy is set to 1. Similarly, to recover heterogeneity by duration of outbreak, we define a dummy variable equal to 1 when an observation for a given location and day occurs at least 30 days after the first recorded COVID-19 case within that population. This dummy variable is then interacted with each lagged weather variable in a regression otherwise identical to Eq. S1. Cumulative effects for the first month of outbreak are computed using estimates of lagged weather variable effects when the outbreak duration dummy is 0; "after first month" cumulative effects are similarly computed by setting the outbreak duration dummy to 1.
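Pre- and post-policy cumulative effects are linear combinations of the interacted model's coefficients, so their standard errors follow from $\sqrt{w^\top V w}$. A minimal sketch with hypothetical coefficients, assuming the coefficient vector stacks the $L+1$ main lag effects followed by the $L+1$ interaction terms; the same weights apply to the outbreak-duration interaction.

```python
import numpy as np


def pre_post_cumulative(coef, vcov, n_lags):
    """Pre- and post-policy cumulative effects from an interacted lag model.

    coef stacks [alpha_0..alpha_L, gamma_0..gamma_L], where gamma_l multiplies
    U_{t-l} * D_{t-l}. Pre-policy (D = 0): sum alpha_l.
    Post-policy (D = 1): sum (alpha_l + gamma_l).
    """
    k = n_lags + 1
    weights = {"pre": np.r_[np.ones(k), np.zeros(k)], "post": np.ones(2 * k)}
    out = {}
    for name, w in weights.items():
        out[name] = (float(w @ coef), float(np.sqrt(w @ vcov @ w)))  # (estimate, SE)
    return out


coef = np.array([0.1, 0.2, -0.05, -0.05])  # hypothetical, L = 1
vcov = 0.01 * np.eye(4)
effects = pre_post_cumulative(coef, vcov, n_lags=1)
```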

A.3 Smooth fits to estimated lags
In Eq. S1, we non-parametrically estimate a set of lagged coefficients for each weather variable, allowing for arbitrary dynamic structure in the effect of climatological conditions on subsequent COVID-19 growth rates.
While this approach is highly flexible, it is demanding on the data, leading to noisy estimates of the lag structure. Because the true lagged response of the COVID-19 growth rate to environmental factors is likely to be smooth over time (Section A.1), in Fig. 3B we show a smoothed fit to these estimated lag coefficients.
To do so, we fit a restricted cubic spline with four degrees of freedom to the lag coefficients estimated in Eq. S1 and shown for the central estimate in col. 1 of Fig
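A sketch of the smoothing step, assuming Harrell's restricted cubic spline parameterization with five knots (four basis columns besides the intercept, one reading of "four degrees of freedom") placed at the 5th, 27.5th, 50th, 72.5th, and 95th percentiles of the lag index. The knot placement and function names are assumptions for illustration.

```python
import numpy as np


def _pos_cube(v):
    """Truncated cubic (v)_+^3."""
    return np.clip(v, 0.0, None) ** 3


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's form): k knots give k - 1 columns.

    The resulting fit is linear beyond the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    span = t[k - 1] - t[k - 2]
    cols = [x]
    for j in range(k - 2):
        cols.append(
            _pos_cube(x - t[j])
            - _pos_cube(x - t[k - 2]) * (t[k - 1] - t[j]) / span
            + _pos_cube(x - t[k - 1]) * (t[k - 2] - t[j]) / span
        )
    return np.column_stack(cols)


def smooth_lag_coefficients(coefs, n_knots=5):
    """Least-squares RCS fit to lag coefficients indexed 0..len(coefs)-1."""
    lags = np.arange(len(coefs), dtype=float)
    knots = np.quantile(lags, np.linspace(0.05, 0.95, n_knots))
    X = np.column_stack([np.ones_like(lags), rcs_basis(lags, knots)])
    b, *_ = np.linalg.lstsq(X, np.asarray(coefs, dtype=float), rcond=None)
    return X @ b
```

Because the basis contains an intercept and a linear term, a lag profile that is already linear is reproduced exactly; curvature is allowed only between the knots.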

A.4 Seasonal simulations
To conduct seasonal simulations, we calculate the daily seasonal climatology of UV, temperature, and specific humidity by averaging daily data from the ERA5 reanalysis product over the years 2015 to 2019 (Section B.4).
In Fig. 4B-C, we represent the monthly effect of each climate variable on the predicted COVID-19 growth rate as the product of the cumulative effect of each variable estimated in Eq. S1 and the average hourly weather over each calendar month. To capture differential seasonality across time, we show in Fig. 4B-C the difference between predicted growth rates under the climatology of January and under the climatology of June. In Fig. S11, we show the analogous difference between June and December.
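The seasonal calculation reduces to simple arithmetic on climatologies. A minimal sketch with a hypothetical cumulative effect (0.1) and a made-up climatology equal to the month number; leap days are ignored by assuming 365-day years, and the helper names are illustrative.

```python
import datetime as dt

import numpy as np


def daily_climatology(values_by_year):
    """Average an (n_years, 365) array over years: one value per day of year."""
    return np.asarray(values_by_year, dtype=float).mean(axis=0)


def month_of_day_of_year(n_days=365, ref_year=2019):
    """Calendar month (1..12) for each day of a non-leap reference year."""
    start = dt.date(ref_year, 1, 1)
    return np.array([(start + dt.timedelta(days=d)).month for d in range(n_days)])


def monthly_growth_effect(cumulative_effect, climatology, months, month):
    """Cumulative effect times the month's mean climatology."""
    return cumulative_effect * climatology[months == month].mean()


months = month_of_day_of_year()
# Hypothetical climatology from five identical years, equal to the month number.
clim = daily_climatology(np.tile(months.astype(float), (5, 1)))
jan_minus_jun = (monthly_growth_effect(0.1, clim, months, 1)
                 - monthly_growth_effect(0.1, clim, months, 6))
```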
To compare the estimated influence of seasonality on COVID-19 growth rates to that of social distancing policies, we use estimates from ref. [16]. In this paper, the authors estimate the impact of a collection of social distancing policies on daily growth rates in confirmed COVID-19 cases (i.e., the authors use the same outcome variable as used throughout this analysis) across six countries: Iran, China, South Korea, United States, Italy, and France. We use the authors' estimates of the impact of all social distancing policies combined (Fig. 2b in ref. [16]); omitting the impact of policies in China during the first week of lockdown (which is statistically insignificant), effect sizes range from -0.2 to -0.49 across countries. These magnitudes imply that the imposition of social distancing policies lowers daily COVID-19 growth rates by 20 to 49 percentage points. In comparison, the largest influence of seasonality that we recover for each region of the …
… the period between January 1, 2020 and April 10, 2020. We omit observations from the Diamond Princess cruise ship, due to uncertain weather exposure of the passengers. Table S2 describes the characteristics of and sources for the COVID-19 case data we collected and compiled at the subnational level. Below, we provide additional detail regarding data cleaning and manipulation for each individual country.

B.1.2 Subnational COVID-19 data
In most countries, we directly obtain subnational reports of the daily number of newly confirmed COVID-19 cases. To compute cumulative case counts at the daily level, we then compute cumulative sums for each subnational unit. When only cumulative COVID-19 cases are available on a daily basis, we take first differences in the time series for each subnational unit to obtain the number of new cases detected on each day. Unless mentioned otherwise, we assume that missing values after the start of the epidemic in a given subnational unit correspond to zero new cases. Because we obtain subnational case data from ref. [16] for Iran and China, we follow their imputation method for addressing missing data in these two countries; details of this method are described by the authors. In many countries, additional data cleaning was required to accurately and consistently match new cases to the day on which they were detected, as opposed to the day on which they were reported. Harmonizing the data in this way reduces measurement error when estimating a common lagged response across the pooled sample. To do so, we track the dates and hours of the day on which new cases were released; when new cases are obtained from morning reports (before noon), we assign cases to the previous calendar day. Details on such corrections are presented below for each country. We compare our compiled subnational COVID-19 case data with case data reported at the national level by Johns Hopkins University (JHU) and by the European Centre for Disease Prevention and Control (ECDC) (Fig. S1).
Austria (1st administrative level) See Table S2 for details. Because no alternative archived reports are available for Austria, we verify our data against data stored in the GitHub public repository "covid-19eu-data," which provides time series for COVID-19 cases in European countries based on the scraping of official reports. Our figures correspond to the official afternoon reports.
Belgium (1st administrative level) See Table S2 for details. We append data from two versions of the Wikipedia article "2020 coronavirus pandemic in Belgium." The current Wikipedia page (as of April 13, 2020) provides data starting on March 1, 2020. Data from January 30 to March 1, 2020 were webscraped from an earlier version of the same article (accessed on April 6, 2020). The distribution of cumulative cases on March 1, 2020 in the current article matches that of the previously collected time series. We drop data for April 7 and April 8, 2020, as we detect a discontinuous drop in new cases and an increase in missing values. Our numbers have been verified against Sciensano data for the days covered by both sources.
Brazil (1st administrative level) See Table S2 for details. For São Paulo, we add an additional case to the cumulative case count for February 25, 2020, based on newspaper reporting that a single case was already present. We confirm that our data match the official source.
Chile (1st administrative level) See Table S2 for details. As stated in the daily reports, the information provided in official publications documents cases reported on the previous day. In order to associate new cases with the date of detection and not the date of announcement, we correct the data webscraped from Wikipedia by lagging each date by one day.
France (1st administrative level) See Table S2 for details. French overseas territories have been removed from the analysis due to the low number of cases at the time of data collection (116 cases distributed over 7 territories on March 25, 2020). On March 25, 2020, the French public health agencies stopped publishing COVID-19 case data at the regional level. Because the cumulative number of cases reported by Wikipedia after this date is systematically below official figures at the national level and because the corresponding sources are not verifiable, we retain data only until March 25, 2020. It was not possible to find archived reports for French COVID-19 cases at the regional level for data verification purposes. We thus compare our data against the time series offered on the open platform for French public data. The case counts reported in the two datasets are very similar.
Germany (1st administrative level) See Table S2 for details. While the Robert Koch Institute (RKI) publishes case data for COVID-19, these data do not exist prior to March 3, 2020. We therefore rely on webscraped data from Wikipedia, which are obtained from newspaper articles, and validate these data against those available from the RKI. Between March 4 and March 10, 2020, the Wikipedia figures do not match the RKI reports. Hence, we manually recode the series for these days, using the official data from the RKI. Due to the relative novelty of the epidemic at that time, some reports are inconsistent with the figures presented in preceding reports. In case of such inconsistencies, we consider the most recent report as the most reliable one and correct our number of cases accordingly.
On March 17, 2020, the RKI stopped updating its data manually and switched to an automated process based on the data electronically transmitted up to 11:00 pm on the previous day. After this date, we correct the date in our data by lagging records by one day, in order to retrieve the accurate day of detection. Our panel data on cumulative cases begin with 14 cases in Bavaria on February 24, 2020. As it is unlikely that these 14 cases appeared all at once, we set the initial value of our new cases series as missing.
Iran (1st administrative level) See Table S2 for details. The number of new cases for all regions on March 2 and March 3, 2020, are missing, due to an absence of reporting. These missing values have been imputed following the method implemented by ref. [16], who used and verified the same source of data.
Netherlands (1st administrative level) See Table S2 for details. The table we obtain from Wikipedia associates the number of new confirmed cases with the day on which they were first announced. As official reports for the Netherlands are published in the morning, we correct the date by lagging reported cases by one day, relative to that provided by Wikipedia.
Portugal (1st administrative level) See Table S2 for details. Official reports until March 9, 2020, were published in the late afternoon. On March 10, however, the Directorate-General of Health began to publish morning reports compiling the number of total confirmed cases up to midnight on the previous day.
As the Wikipedia article we webscrape does not consider this change, we correct the date for each day after  Table S2 for details. The time at which o cial counts have been released changes over the sample period. Until March 1, 2020, updates to case records often occur twice per day. The confirmed new cases announced in each report are those that have been detected since the last report: the new cases announced in the afternoon have thus been detected within the day, since the morning count. From January 30 to March 1, we group the morning count with the afternoon count of the previous day, to get a detection period covering 9:00am on the previous day to 9:00am on the current day. On these dates, the date of the afternoon count has been kept in order to match the day of detection. On March 1, we sum the cases obtained from the afternoon report to the new cases extracted from the evening report, released at midnight. From March 2 forward, the Korean Center for Disease Control (KCDC) publishes morning reports containing information for the previous day, from 00:01am to 11:59pm. As the Wikipedia article correctly considers this change, no change has been made on the date of new cases for March 2 onward.
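The twice-daily grouping rule for the Korean reports between January 30 and March 1, 2020, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code. The function name and the slot labels ("morning", "afternoon", "evening") are hypothetical. Reports from March 2 onward need no shift and are outside its scope.

```python
from collections import defaultdict
from datetime import date, timedelta


def group_korea_reports(reports):
    """Assign twice-daily reports (Jan 30 - Mar 1, 2020) to detection dates.

    Sketch of the rule described in the text: a morning report on day d
    counts cases detected since the afternoon report of day d-1, so it is
    pooled with that afternoon report under the afternoon's date (a
    9:00am-to-9:00am window). Afternoon reports, and the midnight "evening"
    report of March 1, keep their own date.

    `reports` is an iterable of (report_date, slot, new_cases) tuples; the
    slot labels are hypothetical names chosen for this example.
    """
    totals = defaultdict(int)
    for report_date, slot, new_cases in reports:
        if slot == "morning":
            detection_date = report_date - timedelta(days=1)
        elif slot in ("afternoon", "evening"):
            detection_date = report_date
        else:
            raise ValueError(f"unknown report slot: {slot!r}")
        totals[detection_date] += new_cases
    return dict(sorted(totals.items()))


# Toy counts, for illustration only.
toy_reports = [
    (date(2020, 2, 20), "afternoon", 2),
    (date(2020, 2, 21), "morning", 3),
    (date(2020, 2, 21), "afternoon", 1),
]
print(group_korea_reports(toy_reports))
```

Keying the output by detection date rather than report date makes the regrouping explicit. The February 21 morning count ends up under February 20, alongside that day's afternoon count.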
Spain (1st administrative level) See Table S2 for details. Spanish reports are published in the morning and contain information about the previous day. We thus correct the dates in the data obtained from a public GitHub repository by lagging case counts by one day, in order to accurately recover the day of detection.
New case counts for Ceuta and Melilla have been summed to match the spatial shapefile we use for aggregating gridded climate data.
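Several of the country corrections above come down to moving a count from its publication date to its detection date. Examples are the morning reports for Spain, the Netherlands, Sweden and the United Kingdom, and the mid-sample reporting changes in Germany and Portugal. A minimal Python sketch follows, assuming each report published on day d covers cases detected on day d - 1.

The `start` argument is hypothetical. It applies the shift only to reports dated on or after a reporting switch. When an unshifted and a shifted report land on the same day, this sketch sums their counts. That tie-breaking rule is an assumption, not one stated in the text.

```python
from collections import defaultdict
from datetime import date, timedelta


def to_detection_dates(new_cases_by_report_date, start=None, lag_days=1):
    """Move new-case counts from publication date back to detection date.

    Assumes every report published on day d (on or after `start`, or every
    report when `start` is None) covers cases detected on d - lag_days.
    Earlier reports keep their date. Counts that land on the same detection
    date are summed (an assumption of this sketch).
    """
    detected = defaultdict(int)
    for report_date, new_cases in new_cases_by_report_date.items():
        if start is None or report_date >= start:
            detection_date = report_date - timedelta(days=lag_days)
        else:
            detection_date = report_date
        detected[detection_date] += new_cases
    return dict(sorted(detected.items()))


# Every report lagged (morning reports about the previous day); toy counts.
print(to_detection_dates({date(2020, 3, 10): 5, date(2020, 3, 11): 7}))
```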
Sweden (1st administrative level) See Table S2 for details. Swedish official data on the COVID-19 epidemic are updated daily at 11:30am. To match the day of detection, we lag the case counts reported by Wikipedia by one day. The first COVID-19 case in Sweden was observed on February 3, 2020, in the Jönköping region. As no additional cases were detected during the three weeks following this first case, we drop it from our continuous series of confirmed new cases, which begins on February 25, 2020. However, we keep it when computing the number of cumulative cases.
United Kingdom (1st administrative level) See Table S2 for details. Because about 1700 cases had not been precisely located within England at the time of initial data collection (April 6, 2020), we aggregated case data to the level of England, Wales, Scotland, and Northern Ireland, instead of using National Health Service (NHS) regions. In the United Kingdom, confirmed new and cumulative COVID-19 cases are announced in the morning. We thus lag cases reported by Wikipedia by one day to accurately reflect the date of detection. Since our initial data collection, Public Health England has published complete time series at the NHS region level and at the county level. We checked our data against these series and verified that they were nearly identical.
China (2nd administrative level) See Table S2 for details. We drop 29 cities that could not be matched to the publicly available geographic shapefiles used to merge in climate data. It was not possible to check the time at which city reports were issued. As a result, we treat the date of announcement as the date of detection. Missing data have been imputed following the interpolation method of ref. [16].
Italy (2nd administrative level) See Table S2 for details. The number of cases is updated daily at the end of the afternoon. Our data are almost identical to the national-level data from JHU, although the JHU data display a break on March 12 due to a delay in JHU updates, an issue reported for several countries in this dataset. 15
United States (2nd administrative level) See Table S2 for details. For the United States, all cases are counted on the date they are first announced, and cases are located at the place where they are treated.
Although the New York Times mostly uses official counties as the unit of analysis, a few exceptions are worth mentioning: 16
• The five boroughs of New York City are grouped under the label "New York City";
• The COVID-19 cases for Cass (MO), Clay (MO), Jackson (MO), and Platte (MO) counties exclude the cases detected in Kansas City, which are reported separately. We drop observations under the label "Kansas City, Missouri," which does not correspond to any official county;
• All cases for Chicago are reported within Cook County (IL).
We download all the data from the New York Times repository, keep only the cumulative cases, and compute new cases using first differences. Because some county-level series start with strictly positive numbers (up to 37 cases on the first day), we define the first observation of each new cases series as missing, but keep this number in our cumulative cases series.
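The first-differencing step, with the first new-case value left missing, can be sketched as follows. The function name is hypothetical. The optional `negative_as_missing` flag is off by default. It mirrors the note in the Figure S2 caption that negative new-case values in Germany are dropped, and is an illustration rather than a documented step for the county data.

```python
def new_cases_from_cumulative(cumulative, negative_as_missing=False):
    """First-difference a cumulative case series.

    The first new-case value is None (missing): a series that starts above
    zero does not say on which day those initial cases appeared. The
    cumulative series itself is left untouched.
    """
    new_cases = [None]
    for previous, current in zip(cumulative, cumulative[1:]):
        diff = current - previous
        if negative_as_missing and diff < 0:
            diff = None  # treat a downward revision as missing
        new_cases.append(diff)
    return new_cases


print(new_cases_from_cumulative([37, 40, 40, 52]))
```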

B.2 Population data
Our main outcome variable is the first difference in the natural logarithm of daily cumulative cases per 1 million people. At the national level, we use the country-level population in 2018 (the most recent year available) from the World Bank's World Development Indicators. 17 No homogeneous data source has been found at the subnational level. We therefore obtain the most recent data available from each country's national statistics office. Detailed information on each source can be found in Table S3.
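A short sketch of this outcome variable follows, assuming a single constant population figure per administrative unit. The function name is hypothetical, and days with zero cumulative cases are returned as missing because the logarithm is undefined there. With a fixed population P, ln(C_t / P) - ln(C_{t-1} / P) = ln C_t - ln C_{t-1}. The per-million scaling therefore leaves the growth rate itself unchanged, and the test below checks this.

```python
import math


def daily_growth_rates(cumulative_cases, population):
    """First difference of ln(cumulative cases per 1 million people).

    Returns None where either day has zero cumulative cases (log undefined).
    """
    per_million = [cases / population * 1_000_000 for cases in cumulative_cases]
    rates = [None]
    for previous, current in zip(per_million, per_million[1:]):
        if previous > 0 and current > 0:
            rates.append(math.log(current) - math.log(previous))
        else:
            rates.append(None)
    return rates


print(daily_growth_rates([0, 100, 200], population=5_000_000))
```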

B.3 Policy and COVID-19 testing data
We collect data on the intensity of COVID-19 testing and on social distancing policies from ref. [16] and from the Oxford COVID-19 Government Response Tracker (OxCGRT) [15], described below. Briefly, from these data we obtain, for each administrative unit in our analysis, a policy variable that equals 1 if any policy that closes schools, closes workplaces, or cancels public events is implemented, and 0 otherwise. We also obtain an ordinal variable with four levels that describes the intensity of testing.
Social distancing policy data from ref. [16] The first set of variables we use has been compiled by ref. [16] in their study on the effect of large-scale anti-contagion policies on the COVID-19 pandemic.
The authors collect policy data at the subnational scale for China (2nd administrative level), France (1st administrative level), Iran (1st administrative level), Italy (2nd administrative level), and the United States (1st administrative level). We directly merge these policy data into our database for the corresponding dates.
For the United States, we match each county within a state to the state-level (i.e. 1st administrative level) data from ref. [16], as no county-level policy data are available.
16 Detailed information is provided in the COVID-19 GitHub repository of the New York Times.
17 Available at: http://data.worldbank.org/data-catalog/world-development-indicators.
We use three variables from this study, which match variables within the OxCGRT dataset. They are defined in ref. [16] as:
1. school closure: "A policy that closes school and other educational services in that area."
2. work from home: "A policy that requires people to work remotely. This policy may also include encouraging workers to take holiday/paid time off."
3. event cancel: "A policy that cancels a specific pre-scheduled large event (e.g. parade, sporting event, etc.). This is different from prohibiting all events over a certain size."
All these variables are binary. They take the value 1 starting on the day a policy is implemented, and 0 if the policy is not implemented. Only legally enforced policies are considered here; optional policies and non-binding government recommendations are not included.
The Oxford COVID-19 Government Response Tracker (OxCGRT) The second set of policy variables we use contains national-level data for over one hundred countries. The original dataset has been compiled by a group of researchers affiliated with the Blavatnik School of Government at Oxford [15]. 18 To match the data from ref. [16], we collect three variables from this database:
1. School closing: "Record closings of schools and universities".
These variables were initially coded as categorical variables taking the values: not implemented, optional, and legally enforced. 19 We recode them as binary variables to match the format of the policy data from ref. [16], setting the value to 1 if the policy is legally enforced and 0 otherwise.
Final data manipulation Both the data from ref. [16] and the OxCGRT data have been merged with our epidemiological dataset using the corresponding dates and administrative levels, giving preference to subnational data when available. From these data we create a binary policy variable that takes the value 1 if any policy closing schools, workplaces, or public events was enacted, and 0 otherwise.
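The recoding and merging steps above can be sketched as below. The lookup tables are hypothetical: one is keyed by (unit, date) for the ref. [16] data and one by (country, date) for OxCGRT, each holding (school, work, events) flags. Names in the example are placeholders. Returning None when neither source has a record is an assumption of this sketch, not a rule stated in the text.

```python
def enforced(oxcgrt_status):
    """Recode an OxCGRT status to the binary format of ref. [16]."""
    # "not implemented" and "optional" both map to 0.
    return 1 if oxcgrt_status == "legally enforced" else 0


def any_policy(school, work, events):
    """1 if any school, workplace, or event policy is in force, else 0."""
    return int(bool(school or work or events))


def policy_indicator(country, unit, day, subnational, national):
    """Any-policy indicator, preferring subnational records when present."""
    flags = subnational.get((unit, day))
    if flags is None:
        flags = national.get((country, day))
    if flags is None:
        return None  # no record at either level: treated as missing here
    return any_policy(*flags)


# Hypothetical placeholder tables.
sub = {("unit_1", "2020-03-01"): (1, 0, 0)}
nat = {("CountryA", "2020-03-01"): (0, 0, 0),
       ("CountryA", "2020-03-02"): (0, enforced("legally enforced"), 0)}
print(policy_indicator("CountryA", "unit_2", "2020-03-02", sub, nat))
```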
(in J/(m² hour)), 2-meter temperature (in degrees Celsius), total 2-meter precipitation (in mm), and 1000 hPa specific humidity (in kg/kg). The 2-meter and 1000 hPa levels roughly correspond to conditions near the Earth's surface. In robustness checks (Fig. S6), we include relative humidity, which we sample at 1000 hPa. We average UV, temperature, and specific humidity across the hours of each day to obtain daily average measures, while we sum precipitation across the hours of each day to obtain daily total precipitation. We link gridded weather data to administrative-level COVID-19 cases by aggregating grid cell information over administrative (e.g. country, province, or county) boundaries. To capture climatic conditions reflective of population exposure, we average across grid cells, weighting by the cross-sectional gridded distribution of population in 2011 from LandScan [18]. For example, administrative-level daily population-weighted average temperature is computed as $T_{it} = \sum_{g \in i} \omega_{gi} T_{gt}$, where $g$ indicates grid cell, $i$ indicates an administrative unit, $t$ indicates day, and $\omega_{gi}$ is the share of administrative unit $i$'s population that falls within grid cell $g$.
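The two aggregation steps described above can be sketched as below. The first collapses hourly values to daily values by averaging, or by summing for precipitation. The second weights grid cells by their share of the unit's population. Function and variable names are hypothetical, and the sketch assumes the per-cell population counts passed in already refer only to the part of each cell inside the administrative unit.

```python
def daily_value(hourly_values, variable):
    """Collapse hourly values to one daily value: sum for precipitation, mean otherwise."""
    if variable == "precipitation":
        return sum(hourly_values)
    return sum(hourly_values) / len(hourly_values)


def population_weighted_mean(cell_values, cell_population):
    """sum over cells g of w_g * value_g, with w_g the unit's population share in cell g."""
    total = sum(cell_population.values())
    if total <= 0:
        raise ValueError("administrative unit has no population in any grid cell")
    return sum(cell_population[g] / total * cell_values[g] for g in cell_population)


# Two hypothetical cells: 1,000 people at 10 degrees C and 3,000 people at 20 degrees C.
print(population_weighted_mean({"a": 10.0, "b": 20.0}, {"a": 1000, "b": 3000}))
```

Weighting by population rather than by area pulls the unit-level value toward conditions where people actually live. In the toy example, the more populous cell dominates the average.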
To estimate projected seasonal conditions and their influence on COVID-19 transmission, we construct daily gridded UV radiation, 2-meter temperature, and 1000 hPa specific humidity from ERA5, as described above, over the last five years (2015-2019). We use the average conditions for each calendar day over these five years as a proxy for expected seasonal variation through 2020 and into early 2021 (Fig. S4). We compute these five-year averages of daily temperatures both at the grid cell level (Fig. 4C) and aggregated to latitudinal groups (Fig. 4A), in the latter case using the same aggregation method described above.
Figure S1: Comparison of assembled subnational data to JHU-CSSE and ECDC national-level data sources. To verify the quality of our assembled subnational COVID-19 case records, we show here the daily evolution of confirmed new cases at the national level across three distinct data sources. First, we aggregate our subnational records to the national level; these time series are shown in purple. Second, we show in red national time series from the publicly available JHU Center for Systems Science and Engineering (JHU) data. Finally, in green we show national time series for European countries provided by the European Centre for Disease Prevention and Control (ECDC). In most cases, the largest difference across datasets is due to the adjustments in dates we have made to accurately associate new cases with the date of their detection (Section B).
Figure S2: COVID-19 case growth rates in regions with subnational data. Here we aggregate confirmed subnational case data to the national level and plot the national COVID-19 growth rate over time for the regions where we have subnational data; subnational data from these regions comprise the majority of the sample observations used to estimate Eq. S1.
The two missing values in Germany arise from negative new case values, which we also drop from the analysis. Note further that in the analysis we use the subnational growth rates directly, without aggregating.
Figure S3: Correlation between daily environmental variables accounting for semi-parametric controls. Correlation between daily average UV (kJ/(m² hour)), average temperature (°C), average humidity (%), and total precipitation (mm), after removing the semi-parametric controls in Eq. S1 (described in Section A.2). Linear fits are shown, with associated R² values.
Figure S6: Empirical estimates of the dynamic relationship between COVID-19 and local climatic conditions using specific and relative humidity. Each column of this figure shows the estimated cumulative effect and dynamic response of the daily growth rate in confirmed COVID-19 cases to lagged 3-day average UV (gold), temperature (maroon), specific humidity (green), relative humidity (brown), and precipitation (blue). The left column shows our baseline specification (Fig. 3A,B). The right column is identical, except that the model is estimated using relative humidity (%) instead of specific humidity (kg/kg × 100, or %). Note that these percentages represent conceptually different quantities. Relative humidity is the ratio of the partial pressure of water vapor to the equilibrium vapor pressure of water at a given temperature, multiplied by 100 (i.e. how "full" of water vapor the air is, in percent). Specific humidity, multiplied by 100, gives the percentage of an air parcel's total mass that is water vapor. The mean of relative humidity in our sample is 70% and the mean of specific humidity is 0.50%, which suggests that their estimated influences on the COVID-19 growth rate (i.e. the product of the estimated coefficients and changes in humidity) are of roughly similar magnitude.
Figure S7: Empirical estimates of the dynamic relationship between COVID-19 and local climatic conditions using different distributed lag lengths.
Each column of this figure shows the estimated cumulative effect and dynamic response of the daily growth rate in confirmed COVID-19 cases to lagged 3-day average UV (gold), temperature (maroon), specific humidity (green), and precipitation (blue) occurring up to 20 days prior. All coefficients in each column were estimated jointly in a statistical model leveraging a rich set of semi-parametric controls to isolate idiosyncratic variation in each weather variable (Section A.2). Point estimates are indicated by circles and 95% confidence intervals by vertical lines. The first row omits the 15-17 day lag, which is included in our baseline specification (Fig. 3A,B); the second row replicates our baseline specification; and the third row adds an 18-20 day lag.
Figure S9: Estimates of cumulative effects of environmental variables on daily COVID-19 growth rates are insensitive to outliers. Daily growth rates in confirmed COVID-19 cases can be highly variable (Fig. 2A). To ensure that our primary estimates are not overly influenced by individual geospatial units with high variance, we show here the results of a block jackknife sensitivity analysis in which the entire time series of each of the 3,235 geospatial units is removed from the dataset in turn and the primary estimating equation (Eq. S1) is re-estimated. Each subfigure shows a histogram of the cumulative effect (over 2.5 weeks, as reported in Fig. 3A) of UV, temperature, specific humidity, or precipitation, estimated on each of these 3,235 samples. Point estimates and corresponding 95% confidence intervals computed in the main text using the full dataset (Fig. 3A) are shown with vertical dotted lines, demonstrating that the primary estimates reported in the main text are robust to possible outliers.
Figure S12: Simulated idealized dynamic response of the growth rate of infectious and confirmed cases to perturbed transmission in a deterministic SEIR model.
We simulate the evolution of COVID-19 deterministically in an SEIR model using the semi-implicit Euler method (A, Section A.1). We let transmission vary over time with linear disturbances due to changes in weather. In this idealized case, we generate a table-top perturbation in the weather, equal to zero except for a single day equaling one. In turn, this generates a day-long increase in transmission (B), which creates lagged increases in the growth rates of infected (I) and confirmed (C) people, relative to a control run with constant transmission (C, D). This lagged response of the growth rate of C to a single-day change in weather is what we seek to capture in the COVID-19 confirmed case data using a statistical model.
Figure S13: Recovering the dynamic influence of changes in transmission on the case growth rate by applying a temporal distributed lag regression model to simulated data from a stochastic SEIR model with time-varying transmission. We simulate the evolution of COVID-19 stochastically using an SEIR model (A, Section A.1). We let the weather forcing, U, which has a linear influence on transmission, be the sum of a sinusoid in t and Gaussian noise (B). We examine how a series of weather-induced, time-varying shocks to transmission affect the growth rates of the infectious and confirmed populations, I and C (C). Pooling observations from an ensemble of 500 runs, we estimate the effect of contemporaneous and lagged weather on the growth rates of I and C using a distributed lag regression model. We recover a lagged response of these growth rates to weather-induced changes in transmission (D, E) that is similar in structure and magnitude to the idealized response from the deterministic experiment (Fig. S12C,D). Panel F shows the distribution of estimated cumulative effects for the growth rate of C, each equal to the sum of lag coefficients like those shown in E, from 100 regression models trained on synthetic data. The vertical blue line shows the mean of these estimated cumulative effects.
The red line shows the cumulative effect simulated in an idealized pulse experiment using a deterministic model (Fig. S12D). The agreement between the cumulative effect estimated by the regression model and that in the idealized pulse experiment (error < 8%) motivates our application of the temporal distributed lag regression model to COVID-19 data. Panel G shows the cumulative effect for the growth rate of C estimated on data simulated using different frequencies of weather forcing.
Figure S14: Distribution of residuals in daily growth rates. We estimate a distributed lag regression model in which the outcome variable is the growth rate in cumulative COVID-19 cases (Eq. S1). Here, we show the distribution of residuals after estimation of Eq. S1 using our baseline specification, which includes administrative unit fixed effects (i.e. dummy variables), country-by-week-of-year fixed effects, and day-of-year fixed effects (regression results shown in col. 3 of Table S1).
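The deterministic pulse experiment of Figure S12 can be sketched in outline with the Python code below. It uses the rates stated in Section A.1:

- exposure rate 1/4.6 per day and recovery rate 1/5 per day,
- mean transmission 0.45 per day and testing rate 1/14 per day,
- one million people over 100 days.

Several choices are assumptions of this sketch, not taken from the text. These are the 10 initially infectious people, the +0.1 transmission pulse on day 30, the one-day time step, and the particular semi-implicit scheme. That scheme is explicit in the nonlinear exposure term and implicit in the linear outflows of E and I, so S + E + I + R is conserved exactly.

```python
import math


def simulate_confirmed(days=100, population=1_000_000, beta0=0.45,
                       sigma=1 / 4.6, gamma=1 / 5, zeta=1 / 14,
                       pulse_day=None, pulse_size=0.0, infectious0=10.0):
    """Cumulative confirmed cases C from a deterministic SEIR model (dt = 1 day).

    Transmission is beta0, plus `pulse_size` on `pulse_day` only, mimicking
    a one-day weather perturbation.
    """
    s = population - infectious0
    e, i, r, c = 0.0, infectious0, 0.0, 0.0
    confirmed = [c]
    for t in range(days):
        beta = beta0 + (pulse_size if t == pulse_day else 0.0)
        exposures = beta * s * i / population  # explicit: S' = -beta S I / N
        s -= exposures
        e = (e + exposures) / (1 + sigma)      # implicit outflow: E' = beta S I / N - sigma E
        i = (i + sigma * e) / (1 + gamma)      # implicit outflow: I' = sigma E - gamma I (new E)
        r += gamma * i                         # R' = gamma I
        c += zeta * i                          # C' = zeta I
        confirmed.append(c)
    return confirmed


def log_growth(series):
    """ln(x_t) - ln(x_{t-1}); None where either value is not positive."""
    out = [None]
    for previous, current in zip(series, series[1:]):
        ok = previous > 0 and current > 0
        out.append(math.log(current) - math.log(previous) if ok else None)
    return out


control = log_growth(simulate_confirmed())
pulsed = log_growth(simulate_confirmed(pulse_day=30, pulse_size=0.1))
# Change in the daily growth rate of C caused by the one-day pulse.
response = [None if a is None else b - a for a, b in zip(control, pulsed)]
```

Before the pulse the two runs are identical, so `response` is exactly zero. Afterwards the extra exposures pass through E and I over several days, so the growth rate of C responds with a lag. C starts at zero and accumulates only ζI, so changing ζ rescales C without changing its growth rate. This matches the argument in Section A.1 that growth rates are not affected by differences in testing rates.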
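The distributed lag logic behind Figures S7, S9, and S13 can be sketched in pure Python. The cumulative effect is the sum of the lag coefficients, and the block jackknife drops one unit's entire series at a time. This is a deliberately reduced version of Eq. S1, with simplifications of this sketch:

- one weather variable,
- six 3-day lag bins (days 0-2 through 15-17),
- unit fixed effects removed by demeaning within each unit,
- no country-by-week or day-of-year effects.

The synthetic panel, its coefficients, and all function names are made up for illustration.

```python
import random


def lag_bins(x, n_bins=6, width=3):
    """3-day-average lags: bin k averages x[t - width*k - j] for j = 0..width-1."""
    first = n_bins * width - 1  # first index with a complete lag history
    rows = [[sum(x[t - width * k - j] for j in range(width)) / width
             for k in range(n_bins)]
            for t in range(first, len(x))]
    return first, rows


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cumulative_effect(panel, n_bins=6, width=3):
    """OLS of growth on lag bins with unit fixed effects; returns (betas, sum)."""
    xs, ys = [], []
    for weather, growth in panel.values():
        first, rows = lag_bins(weather, n_bins, width)
        y = growth[first:]
        # Demeaning within each unit absorbs the unit fixed effect.
        y_bar = sum(y) / len(y)
        x_bar = [sum(row[k] for row in rows) / len(rows) for k in range(n_bins)]
        xs.extend([v - mean for v, mean in zip(row, x_bar)] for row in rows)
        ys.extend(v - y_bar for v in y)
    xtx = [[sum(row[i] * row[j] for row in xs) for j in range(n_bins)]
           for i in range(n_bins)]
    xty = [sum(row[i] * v for row, v in zip(xs, ys)) for i in range(n_bins)]
    betas = solve(xtx, xty)
    return betas, sum(betas)


def block_jackknife(panel, **kwargs):
    """Cumulative effect re-estimated with each unit's whole series left out."""
    return {unit: cumulative_effect(
                {u: series for u, series in panel.items() if u != unit}, **kwargs)[1]
            for unit in panel}


def synthetic_panel(true_betas, n_units=20, n_days=120, noise=0.005, seed=0):
    """Toy panel: growth = unit effect + sum_k beta_k * lag_bin_k + noise."""
    rng = random.Random(seed)
    panel = {}
    for unit in range(n_units):
        weather = [rng.gauss(0.0, 1.0) for _ in range(n_days)]
        first, rows = lag_bins(weather, n_bins=len(true_betas))
        growth = [0.0] * first + [
            0.1 * unit + sum(b * v for b, v in zip(true_betas, row)) + rng.gauss(0.0, noise)
            for row in rows]
        panel[unit] = (weather, growth)
    return panel


true_betas = [-0.010, -0.020, -0.010, 0.0, 0.0, 0.0]
panel = synthetic_panel(true_betas)
betas, total = cumulative_effect(panel)
jackknife = block_jackknife(panel)
```

With these toy settings, the estimated cumulative effect should land close to the true sum of -0.04. The leave-one-unit-out estimates should cluster tightly around it, which is the pattern Figure S9 checks for on the real data.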

D Supporting Tables
Table S1: Empirical estimation of the relationship between COVID-19 and climatological variables. Columns (1)-(8) show estimates of the distributed lag regression model from Eq. (S1) using daily longitudinal data across a pooled sample of national and subnational data (Fig. 1). The outcome is the daily growth rate of cumulative confirmed cases for columns (1) through (8). In column (9), a Poisson distributed lag regression model is used (Eq. S2). All models include administrative unit (e.g. country, province, or county) and day-of-year fixed effects, and all control for distributed lags in daily precipitation (in mm) and specific humidity (in %). Columns (1)-(8) include distinct semi-parametric and other controls: (1) a "fixed effect" (dummy variable) for the number of days since the outbreak began; (2) a linear country-specific time trend; (3) country-by-week fixed effects; (4) country-by-week fixed effects, including leads of climate variables; (5) country-by-week fixed effects, including controls for temporally and spatially varying social distancing policies (Section B.3); (6) country-by-week fixed effects, including a control for the stringency of COVID-19 testing at the country level (Section B); (7) administrative unit (e.g. country, province, county) by week fixed effects; (8) country-by-day fixed effects. Standard errors clustered at the administrative unit level are in parentheses. P-values from two-sided t-tests with *** p<0.01, ** p<0.05, * p<0.…
… obtained from the GitHub repository of the Global Policy Lab: https://github.com/bolliger32/gpl-covid
6 The compiled data have been scraped from the Wikipedia article "2020 coronavirus pandemic in France", https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_France.
7 Robert Koch Institute, https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html.
The data have been scraped from the Wikipedia article "2020 coronavirus pandemic in the United Kingdom", available at https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_the_United_Kingdom.