How do weather and climate change impact the COVID-19 pandemic? Evidence from the Chinese mainland

The COVID-19 pandemic continues to expand, while the relationship between weather conditions and the spread of the virus remains largely debatable. In this paper, we attempt to examine this question by employing a flexible econometric model coupled with fine-scaled hourly temperature variations and a rich set of covariates for 291 cities in the Chinese mainland. More importantly, we combine the baseline estimates with climate-change projections from 21 global climate models to understand the pandemic in different scenarios. We found a significant negative relationship between temperatures and caseload. An additional hour at temperatures of 25 °C to 28 °C tends to reduce daily cases by 15.1%, relative to an additional hour at −2 °C to 1 °C. Our results also suggest an inverted U-shaped nonlinear relationship between relative humidity and confirmed cases. Despite the negative effects of heat, we found that rising temperatures induced by climate change are unlikely to contain a hypothesized pandemic in the future. In contrast, cases would tend to increase by 10.9% from 2040 to 2059 with a representative concentration pathway (RCP) of 4.5 and by 7.5% at an RCP of 8.5, relative to 2020, though reductions of 1.8% and 18.9% were projected for 2080–2099 for the same RCPs, respectively. These findings raise concerns that the pandemic could worsen under the climate-change framework.


Introduction
The world has been badly hit by the COVID-19 pandemic, with more than 42.7 million confirmed cases and over 1.1 million deaths as of October 25, 2020 (Johns Hopkins Coronavirus Resource Center 2020), and the number of cases is still growing. Numerous mitigation efforts have proven to be effective at curbing the transmission of the virus, such as travel restrictions, early identification, and isolation of cases (Hellewell et al 2020, Lai et al 2020, Prem et al 2020). On the other hand, the role of weather conditions (temperature and humidity) amid the ongoing pandemic remains unclear, yet has gained enormous attention.
Existing results on this topic are mixed, not only in terms of the limited lab experiments but also in fast-growing empirical studies. For instance, from the lab-experiment perspective, Chin et al (2020) found that the virus was highly stable at 4 °C, but sensitive to heat. Omer et al (2020) claimed that infectivity lasts for a shorter time at temperatures greater than 30 °C. However, Kratzel et al (2020) challenged the above lab-based results and found that higher temperatures (up to 30 °C) do not necessarily inactivate the virus. From an empirical perspective, Li et al (2020a) stated that higher temperatures reduced caseload. In contrast, Yao et al (2020) argued that there was no association between COVID-19 transmission and temperature. Baker et al (2020), on the other hand, concluded that the climate only drives modest changes to the pandemic's size. These results are informative and promising, but are subject to inherent drawbacks. For instance, controlled experimental conditions usually fail to adequately mimic those in the real world, whereas empirical studies may be plagued by confounding effects, omitted variable biases, and other internal and external validity threats.
In the work described in this paper, we constructed a rigorous econometric model and relied on small-scale variations in hourly temperatures and a rich set of covariates to identify the impacts of weather conditions on the COVID-19 pandemic in 291 cities in the Chinese mainland from January 24 to February 29, 2020. Specifically, we contribute to the existing literature from several perspectives.
First, we differentiate our work from the existing literature (Islam et al 2020, Ma et al 2020, Xie and Zhu 2020, Li et al 2020a, Wang et al 2020b) by adopting hourly temperatures. These small-scale temperatures avoid the potential misidentifications that can happen in models with aggregated temperatures (Schlenker and Roberts 2009, Auffhammer et al 2013, Auffhammer 2018). Second, unlike the literature that focuses on simulating transmission dynamics by epidemiological models (Giordano et al 2020, Prem et al 2020), we develop a more flexible semiparametric Poisson model with fixed effects that is capable of taking full advantage of the hourly temperature variation. Besides, the fixed effects are designed to account for any time-invariant unobservable characteristics (e.g. population mix, proximity to the epicenter, inherent hospital capacities, and different lifestyles) that might confound the estimates. Third, we took account of city lockdowns, virus incubation periods, outdoor population movement, and changes in case-diagnosis criteria in our model. Existing studies either considered only a few of these potential factors or completely omitted them from their models, which may have caused biased estimation results.
More importantly, we combined our baseline estimates with climate change projections from 21 state-of-the-art global climate models under Representative Concentration Pathways (RCP) 4.5 and RCP8.5, respectively. We aim to answer the question of how the confirmed cases would change for a hypothetical pandemic between 2040 and 2059 (middle of the century) and between 2080 and 2099 (end of the century), relative to 2020. Studies have suggested that climate change might contribute to the emergence and spread of various infectious diseases (Semenza and Menne 2009, Lindgren et al 2012, Altizer et al 2013). Besides, COVID-19 may resurge and spread in places where the pandemic was thought to be well under control. It is therefore insightful to examine the severity of a hypothetical pandemic in the framework of future climate change.

Data collection
Our data set consists of daily new confirmed cases, weather data, and population movement data in 291 cities in the Chinese mainland from January 24 to February 29, 2020. We chose this time window based on two considerations. First, Wuhan, the epicenter in China, was locked down on January 23. As we aimed to identify the relationship between weather conditions and the spread of the virus, migrations from Wuhan could have been a major threat to our identification. Thus, we focused on the period right after lockdown. We also performed corrections on the daily caseloads to further eliminate the impacts of travelers that had already traveled to those 291 cities before January 23 (see Methodology). Second, the period under study generally coincides with the two stages mainly experienced during the outbreak in China: a rapid increase phase and a stationary phase, as shown in figure S1 (available online at stacks.iop.org/ERL/16/014026/mmedia). From March onwards, new cases were mostly imported. For instance, total daily new domestic cases in those 291 cities remained below five for the whole of March.
It should also be noted that we excluded cities in Hubei province from our sample. There are several reasons for this restriction. First, as the epicenter in China, the healthcare system in Hubei was overwhelmed by the sheer number of patients who needed testing. Infected patients may have recovered on their own or died before even being officially tested (Fang et al 2020). Therefore, the reported caseloads in Hubei may have missed a considerable number of cases. By contrast, cities outside Hubei had far fewer cases, and we argue that their healthcare systems were unlikely to miss a large proportion of them. Second, from a statistical point of view, the sheer number of cases in Hubei would highly skew the overall case distribution in our sample, and thus reduce the statistical power of our regressions and make the estimations less reliable.

Confirmed case data.
The original city-level confirmed case data were collected from the provincial health commissions and prepared in a ready-to-use format by the Wind Economic database. We directly downloaded the case data from the database (Wind 2020; see footnote 9). It is noteworthy that the Wind Economic database did not perform any additional manipulation on the original case data. See supplementary figure S1 for the development of cases over time and figure S2 for the spatial distribution of cases.

Weather data.
Weather data, including temperature (in °C) and relative humidity (in %), were obtained from the China Meteorological Data Service Center (National Meteorological Information Center 2020). The original data were observations from over 300 monitoring stations distributed across the Chinese mainland. To get the hourly data, we interpolated between the daily minimum and maximum temperatures based on a location-dependent sine curve; see Luedeling (2020) for the technical details. It is worth noting that the interpolation was performed at the monitoring-station level, which preserved temperature variations between stations (Deschênes and Greenstone 2011).
Following that, we converted the station-level hourly temperatures to the city level. For those cities with no stations, we adopted the temperatures from the closest station (see footnote 10). For those cities with multiple stations, we simply took the average. Finally, we constructed 18 temperature bins with an interval of 3 °C from the small-scale hourly temperatures, as shown in figure 1(a). Moreover, figure 1(b) presents significant spatial variations in temperature that greatly contributed to our model identification.
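The interpolation and binning steps can be illustrated with a minimal sketch. It is not the location-dependent procedure of Luedeling (2020): as a simplifying assumption, it uses a single sine wave that peaks at a fixed hour (`hour_of_max`, a hypothetical parameter) and ignores sunrise and sunset times. The bin edges (below −20 °C, sixteen 3 °C bins, and 28 °C or above) follow the 18-bin layout described above; treating each bin as closed on the left is also an assumption.

```python
import math

def hourly_temperatures(t_min, t_max, hour_of_max=15):
    """Approximate 24 hourly temperatures from a daily minimum and maximum.

    Assumption: one sine wave per day peaking at `hour_of_max`; the
    location-dependent curve used in the paper is more elaborate.
    """
    mean = (t_max + t_min) / 2.0
    amplitude = (t_max - t_min) / 2.0
    return [
        mean + amplitude * math.cos(2 * math.pi * (h - hour_of_max) / 24.0)
        for h in range(24)
    ]

# Edges -20, -17, ..., 28 define 16 interior bins plus two open-ended bins.
BIN_EDGES = list(range(-20, 29, 3))

def bin_index(temp):
    """Return the bin number (0..17) for a temperature in degrees Celsius."""
    if temp < BIN_EDGES[0]:
        return 0                      # below -20
    if temp >= BIN_EDGES[-1]:
        return len(BIN_EDGES)         # 28 or above
    return int((temp - BIN_EDGES[0]) // 3) + 1

def hours_per_bin(hourly):
    """Count how many of the hourly temperatures fall in each of the 18 bins."""
    counts = [0] * (len(BIN_EDGES) + 1)
    for temp in hourly:
        counts[bin_index(temp)] += 1
    return counts
```

For a given city and day, `hours_per_bin(hourly_temperatures(t_min, t_max))` returns 18 counts that sum to 24, which is the kind of per-bin hour count the regression uses as temperature regressors.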

Population movement data.
Population movement data are represented by a mobility index, which is a normalized ratio of movement within 24 h by the city's residential population. This index is consistent across cities and time. We obtained the index from Baidu Migration, offered by Baidu, the largest Chinese search engine (Fang et al 2020). The data is based on real-time location records for every smartphone using the company's mapping app and thus ideally reflects population movement.
The company provides both intra-city and inter-city mobility indices. We include the former as a covariate in our model to address the concern that the estimated temperature effects on cases could be confounded by intra-city population movement. On the other hand, we use the inter-city movement to correct confirmed case data that were possibly 'contaminated' by travelers from Wuhan (the capital of Hubei province and the epicenter in China), which we describe in detail in the modeling section below.
9 The Wind Economic database is a highly reputable commercial data provider. A commercial account is required to download data from this source.
10 There were 39 cities (13%) that did not have monitoring stations. For these cities, we adopted temperatures from the closest station to the city's centroid. In most cases, stations were available within a relatively short distance. The shortest distance from a centroid to its nearest station was 5.5 km, the longest was 63 km, and the mean was 37.5 km.

Climate change projection data.
We obtained the output of 21 global climate models from the Center for Climate Simulation of NASA (the National Aeronautics and Space Administration of the United States) (NASA 2020). Each of the models provides future projections of the daily minimum and maximum temperatures under RCP4.5 and RCP8.5, respectively. We extracted projections for January 24 to February 29 for the years 2040 to 2059 and 2080 to 2099 to match our research time window in 2020.
Note that the projection data are for a grid with a spatial resolution of 0.25° by 0.25° (approximately 25 km by 25 km). We converted the grid-based data to city-level data by averaging over the grid cells that overlapped each city, weighting them by the area of the grid cell falling inside the city. Finally, the projected daily temperatures were interpolated to an hourly basis following the same process as that used for the weather data.
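The grid-to-city conversion is an area-weighted mean. The sketch below assumes the overlap areas between each grid cell and the city polygon have already been computed (for example with a GIS tool); the function name and arguments are illustrative and do not come from the paper.

```python
def area_weighted_mean(cell_values, overlap_areas):
    """Average grid-cell values, weighting each by its overlap area with the city.

    cell_values   -- projected temperature for each overlapping grid cell
    overlap_areas -- area of each cell that falls inside the city boundary
    """
    total_area = sum(overlap_areas)
    if total_area <= 0:
        raise ValueError("the city does not overlap any grid cell")
    weighted_sum = sum(v * a for v, a in zip(cell_values, overlap_areas))
    return weighted_sum / total_area
```

A cell that only touches the city boundary contributes in proportion to its small overlap, so large cells at the edge do not dominate the city-level value.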

The baseline regression model
We primarily employ a Poisson model with fixed effects to accommodate the daily confirmed case counts. The log transformation of our Poisson model takes the form below:

$$\ln(Case_{i,t}) = \sum_{j} \beta_j \, Temp_{ij,t-6} + \theta_1 Rh_{i,t-6} + \theta_2 Rh2_{i,t-6} + \gamma \, Travel_{i,t-6} + \delta \, Test_t + w_i + \xi_{it} \qquad (1)$$

where i indexes the 291 cities in our sample and t indexes the reporting dates between January 24 and February 29. We account for an incubation period of 6 d (see footnote 11), as indicated by t − 6 on the right-hand side of equation (1). That is, we match the number of cases to the weather conditions on the date of infection instead of the date of reporting. The city-level fixed effects, $w_i$, capture any unobservable, time-invariant city-level characteristics that might confound the estimates (Deschênes and Greenstone 2011, Davis and Gertler 2015, Li et al 2019). $Case_{i,t}$ represents city-level daily confirmed case counts. It is worth noting that we corrected the case counts reported in the first week of our study period (January 24-January 31) to eliminate the potential confounding effects of cases imported from Wuhan. This concern arises because several epidemiological studies found that the transmission of COVID-19 in cities outside Hubei province was highly related to travelers from Wuhan until late in January (Li et al 2020b), though Wuhan was locked down from January 23. To correct this, we ran an auxiliary regression as shown in equation (2).

$$Case_{it} = \alpha_0 + \alpha_1 MirIn_{it} + \zeta_{it} \qquad (2)$$

$MirIn_{it}$ denotes the summed travelers from Wuhan to city i during the last 6 d. $\zeta_{it}$ is an error term that is independent of $MirIn_{it}$ by construction. We therefore replaced the daily confirmed case counts reported from January 24 to January 31 with the residual $\hat{\zeta}_{it}$ from regression (2).
11 The estimated incubation period ranged from 2 d up to 24 d (Bai et al 2020, Guan et al 2020, Hoehl et al 2020, Li et al 2020b, Mcmichael et al 2020, Xiong et al 2020). As more and more confirmed cases were analyzed, the estimated average incubation period converged to 5-7 d. Thus, we chose 6 d as our baseline incubation period and performed robustness checks with 5 and 7 d, respectively. The results are presented in supplementary figure S5.
Figure 1. In (a), the national histogram of hourly temperature bins is shown for January 24 to February 29, 2020. First, we drew the temperature bin histogram for each of the 291 cities. Second, we obtained the national histogram (figure 1(a)) by averaging the temperature bins across cities with their populations in 2019 acting as weights (the sum of the weights was normalized to one). In this manner, the total number of hours in the histogram (figure 1(a)) remained at 888 (24 h by 37 d). In (b), the spatial distribution of mean temperatures is shown from January 24 to February 29 across the 291 cities in the Chinese mainland.
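The residual-based correction can be sketched as follows, assuming the auxiliary regression is a simple linear regression of case counts on inflows from Wuhan with an intercept (the exact specification is an assumption here, and `ols_residuals` is a hypothetical helper). Ordinary least squares residuals sum to zero and are uncorrelated with the regressor in the sample, which is what "by construction" refers to.

```python
def ols_residuals(y, x):
    """Residuals from regressing y on x with an intercept (simple OLS).

    y -- reported daily case counts (one value per city-day)
    x -- summed inflow of travelers from Wuhan over the previous 6 days
    """
    n = len(y)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is not identified")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # The residual is the part of y that the inflow variable does not explain.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```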
$Temp_{ij,t-6}$ denotes the number of hours in temperature bin j in city i on day t − 6. Here, we dropped the [−2,1] bin to avoid perfect collinearity, so the coefficients on the remaining bins denote impacts on cases relative to the omitted bin. There is no golden rule for picking a reference bin in the literature; what matters is that the results are interpreted relative to the reference, so the choice of reference bin should not be a major concern.
Nonetheless, in practice, if a U-shaped (or inverted U-shaped) curve is expected (Davis and Gertler 2015, Li et al 2019), the point at the bottom (top) of the curve is usually set as the reference point. Following this convention, we ran an auxiliary Poisson regression with the daily temperature and its quadratic term and chose the point at the top of the curve (roughly 0 °C, within the [−2,1] temperature bin) as our reference point. $Rh_{i,t-6}$ and $Rh2_{i,t-6}$ indicate the linear and quadratic terms for the relative humidity.
Another concern is that the weather variables may have multicollinearity problems, for example, if the temperatures were related to humidity. However, this may not be a major threat to our estimations, given the robustness of our results (Wooldridge 2010). Nonetheless, we performed a multicollinearity test to check the degree to which weather variables were related to each other. The results are presented in supplementary table S1. The mean variance-inflation factor was 1.81, far below the threshold of 10 (Wooldridge 2013); thus, multicollinearity is not a primary issue.
$Travel_{i,t-6}$ represents the intra-city population movement in city i on day t − 6. $Test_t$ is a dummy variable, which is 0 before January 29 and 1 otherwise. We set up this dummy variable to account for the impacts of a significant change in the diagnostic criteria. On January 28, 2020, the Chinese government released the third edition of 'The Novel Coronavirus Pneumonia Prevention and Control Protocol,' in which the diagnostic criteria were relaxed by additionally including mild cases and asymptomatic infections in confirmed-case reporting and management, regardless of epidemiological history (The State Council of the People's Republic of China 2020).
$\xi_{it}$ is the stochastic error term. We clustered standard errors at the city level to allow for serial correlation and to address the variance restriction in Poisson models (the assumption that the variance equals the mean). Figure 2 shows the estimated coefficients and the corresponding 95% confidence intervals of the temperature bins. The tabulated regression results are provided in supplementary table S2.
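As a reading aid, the bin coefficients can be translated into percentage effects. The relation below is a standard property of log-link count models rather than a formula stated in the paper; $\beta_j$ denotes the coefficient on temperature bin $j$, and the paper does not specify whether its reported percentages use the exact or the approximate form.

$$
\frac{E[Case \mid \text{one more hour in bin } j]}{E[Case \mid \text{one more hour in the reference bin}]} = e^{\beta_j},
\qquad
\%\Delta Case = 100\,(e^{\beta_j} - 1) \approx 100\,\beta_j \ \text{ for small } |\beta_j|.
$$

Because a day always has 24 h, adding an hour to bin $j$ removes an hour from some other bin, which is why every coefficient is read relative to the omitted [−2,1] bin.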

Results of the baseline estimates
In figure 2, we observe an approximately linear negative relationship between temperatures and caseload. A similar linear relationship was found in studies that related temperature bins to suicide rates (Burke et al 2018) and cognitive performance (Zivin et al 2020). In general, we found that lower temperatures tend to increase cases, whereas hot temperatures reduce them. The positive effects of cold on caseload gradually fade out as temperatures increase. Starting from the 4 °C-7 °C temperature bin, the estimated coefficients become negative and the negative effects strengthen rapidly with rising temperatures. For instance, an additional hour at 4 °C-10 °C decreases the daily confirmed cases by roughly 3%, relative to −2 °C to 1 °C, whereas an additional hour at 25 °C-28 °C would reduce cases by 15.1%.
Our findings on the approximately linear relationship between temperature and caseload are consistent with those of Wang et al (2020a), and different from those of Wang et al (2020b), in which the authors detected a nonlinear relationship, though all three studies used data from China. Nevertheless, they all show that high temperatures mitigate the pandemic. Importantly, the temperature bin setup in this paper allows us to further examine the heterogeneities, i.e. extreme temperatures (<−20 °C or >28 °C) have substantial effects (positive or negative) on daily confirmed cases, relative to the moderate impacts for milder temperatures.
The estimated coefficients of relative humidity and its quadratic term suggest an inverted U-shaped nonlinear relationship with daily confirmed cases (figure 3). The turning point was estimated to be 86%, which is a fairly high level of humidity, given that the mean and median humidities in our sample are 69% and 71%, respectively. Only 17.9% of the days had a relative humidity level higher than 86% during our research period (see the bar chart in figure 3). Nevertheless, our estimates are consistent with Ficetola and Rubolini (2020), who also suggested a turning point at a fairly high humidity level.
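The turning point follows directly from the two humidity terms. A short derivation, writing $\theta_1$ and $\theta_2$ (symbols assumed here for illustration) for the coefficients on relative humidity and its square:

$$
\frac{\partial \ln E[Case]}{\partial Rh} = \theta_1 + 2\,\theta_2\,Rh = 0
\quad\Longrightarrow\quad
Rh^{*} = -\frac{\theta_1}{2\,\theta_2}.
$$

An inverted U shape requires $\theta_1 > 0$ and $\theta_2 < 0$; the estimates reported in the paper place $Rh^{*}$ at about 86%, so cases rise with humidity below that level and fall above it.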
In addition to the impacts of weather conditions, we also observed statistically significant positive effects of intra-city population movement on caseload. Specifically, a one-unit increase in the mobility index tended to contribute to a growth of 14.5% in daily confirmed cases on average (see supplementary table S2). This finding is in line with epidemiological studies, which have shown evidence that travel restrictions can effectively mitigate the transmission of the virus (Hellewell et al 2020, Prem et al 2020). Finally, the change in the diagnostic criteria played an important role as well. The number of reported cases increased by an average of 80% after the introduction of the new diagnostic standard (see supplementary table S2). That meant more people were able to be screened and more cases identified. Consequently, isolation and contact tracing could be conducted earlier to contain the spread of the virus.

Impacts of climate change
In this section, we combined our baseline estimates with climate-change projections from 21 global climate models to examine the changes of caseload for a hypothetical pandemic in the middle of the century (2040-2059) and at the end of the century (2080-2099) relative to 2020, under the RCP4.5 and RCP8.5 scenarios, respectively. Figure 4 depicts the changes in the temperature bins in the future under different scenarios. In line with the global warming expectation, we observed more frequent hot temperatures (over 10 °C) and fewer cold temperatures (<7 °C) in the future. While changes in temperatures lower than −8 °C and higher than 25 °C are moderate, changes in temperatures between −8 °C and 25 °C (except for temperatures in 7 °C-10 °C) are more significant. One possible explanation lies in how climate change shifts the temperature distribution. Specifically, in the current climate, bin [7,10] has the highest probability in the distribution in figure 1(a). Climate change in the future is expected to change the distribution so that the incidence of temperatures above the [7,10] bin increases, and the incidence of low temperatures decreases, due to global warming (Xu et al 2018); however, [7,10] may still have the highest incidence, due to the subtlety of the changes.
The changes in caseload caused by climate change in the future were derived in two steps. We first multiplied the changes in the temperature bins by the associated coefficients from the baseline estimates (Deschênes and Greenstone 2011). The formula takes the form below:

$$PctChange_{it} = \sum_{j} \hat{\beta}_j \, \Delta Temp_{ij,t-6}$$

where $PctChange_{it}$ indicates the projected percentage change in caseload in city i on day t from 24 January to 29 February. The estimated coefficients, $\hat{\beta}_j$, are associated with the j temperature bins from equation (1). $\Delta Temp_{ij,t-6}$ denotes the changes in the number of hours in the temperature bins. It should be noted that we assume the other factors, such as relative humidity and population movement, remain at their 2020 levels because the projected data on these variables are either not available (i.e. intra-city population movement) or less reliable (i.e. humidity) than the projections of temperature. In the second step, the national-level percentage change in caseload is calculated as the weighted mean of changes in all cities, where the value of a city was weighted by the average population from 2017 to 2019 in that city (Wind 2020).
It is striking that we observed increases in confirmed case counts in the future despite the negative effects of rising temperatures induced by climate change (figure 5). Specifically, for a hypothetical pandemic that occurs in the middle of the century (2040-2059), increases in the frequency of hot temperatures are unlikely to sufficiently contain the spread of the pandemic. Instead, the confirmed case counts tend to increase by 10.9% for RCP4.5 and by 7.5% for RCP8.5 from 2040 to 2059, relative to 2020. This is, in part, due to adjustments we made to keep the projections realistic. To be specific, for those cities with projected percentage changes <−100%, we adjusted the change rate to −100% (the confirmed cases cannot be negative). Similarly, for those cities with zero confirmed cases but projected negative change rates, we adjusted the change rate to zero.
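The projection steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: coefficients are taken to be expressed as proportional changes per additional hour (so they are multiplied by 100 to give percentages), and all function names are hypothetical.

```python
def projected_pct_change(beta, delta_hours):
    """Percentage change in cases: sum over bins of coefficient x change in hours.

    beta        -- estimated bin coefficients (proportional effect per hour)
    delta_hours -- projected change in hours for each bin (future minus 2020)
    """
    return 100.0 * sum(b * d for b, d in zip(beta, delta_hours))

def adjust_for_feasibility(pct_change, baseline_cases):
    """Apply the two adjustments described in the text."""
    if baseline_cases == 0 and pct_change < 0:
        return 0.0                    # zero cases cannot fall any further
    return max(pct_change, -100.0)    # cases cannot drop by more than 100%

def national_pct_change(city_changes, populations):
    """Population-weighted mean of the city-level percentage changes."""
    total_population = sum(populations)
    weighted = sum(c * p for c, p in zip(city_changes, populations))
    return weighted / total_population
```

The floor at −100% truncates large projected decreases while leaving projected increases untouched, which is one reason the weighted national mean can be positive even when many bin coefficients are negative.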
On the other hand, projections for a pandemic at the end of the century tell a different story: the confirmed cases decrease by 1.8% and 18.9% in 2080-2099 for RCP4.5 and RCP8.5, respectively. Figure 6 depicts the spatial distribution of the percentage changes in the confirmed cases in China and table 1 summarizes the heterogeneity across cities. Our findings suggest a worse pandemic in 2040-2059, whereby 85% (243/287) and 77% (222/287) of cities are expected to experience increases in the confirmed cases for the RCP4.5 and RCP8.5 scenarios, respectively (table 1). In 2080-2099, by contrast, fewer cities are projected to face increases in confirmed cases, and the majority of cities would have a change in cases of between −20% and 0% (table 1). In terms of geographical distribution, cities in the northern part of China tend to have higher rates of increase, relative to cities in the south. However, those cities with higher increase rates do not show a specific spatial pattern, which poses challenges for the development of more accurate spatial mitigation strategies.
Finally, since the pandemic is closely related to population, which is projected to drop in the future (United Nations 2015), we feel it is insightful to picture the pandemic in a population-reduction scenario. Although this is speculative, it remains possible that the population reduction would not fully offset the increases in caseload induced by climate change. For instance, temporary human mobility is expected to increase with infrastructure and economic development, which could significantly facilitate the spread of the virus. Moreover, the number of permanent environmental migrants, i.e. those driven by adaptation to climate change, is expected to reach between 25 million and 1 billion by 2050 (Anon 2019, Boas et al 2019). Although the number is uncertain and debated, the sheer size of such a migration could trigger severe clusters of cases.

Discussion and conclusions
To further examine the accuracy of our preferred Poisson model, we performed cross-validation against alternative model specifications using the root-mean-squared error (RMS) of out-of-sample predictions (Schlenker and Roberts 2009). Each model was estimated 1000 times, each time on a randomly chosen 85% of the sample. These estimates were then used to predict the daily confirmed case counts in the remaining 15% of the sample, and the RMS was calculated from the predictions. The results are shown in supplementary figure S3. Our baseline model is preferable to linear models when fixed effects are included, with an accuracy improvement (reduction in the RMS) of 150% to 205%. Additionally, our baseline model performs better than the Poisson model with average temperature (an accuracy improvement of 27.6%).
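The cross-validation procedure can be sketched generically. The `fit`, `predict`, and `target` callables are placeholders for whichever model specification is being compared, and averaging the RMS over rounds is an assumption, since the paper does not state how the 1000 values were summarized.

```python
import math
import random

def rms_error(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def repeated_holdout_rms(rows, fit, predict, target,
                         n_rounds=1000, train_share=0.85, seed=0):
    """Mean out-of-sample RMS over repeated random train/test splits.

    rows    -- list of observations (any structure)
    fit     -- function(train_rows) -> fitted model
    predict -- function(model, row) -> predicted case count
    target  -- function(row) -> observed case count
    """
    rng = random.Random(seed)
    n_train = int(round(train_share * len(rows)))
    scores = []
    for _ in range(n_rounds):
        shuffled = rows[:]            # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        model = fit(train)
        actual = [target(r) for r in test]
        predicted = [predict(model, r) for r in test]
        scores.append(rms_error(actual, predicted))
    return sum(scores) / len(scores)
```

Running the same function with different `fit` and `predict` pairs on identical data gives directly comparable RMS values across specifications.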
We also performed a wide variety of robustness checks. For instance, to address the concern that human activities drop sharply after midnight, we performed a robustness check with temperature bins constructed from 06:00 to 22:00, and the estimated results were consistent with our baseline estimates (see supplementary figure S4 and table S2). We also performed robustness checks with incubation periods of 5 and 7 d, and the estimates were again robust and consistent (see supplementary figure S5). Moreover, our results were robust to the use of absolute humidity (see footnote 13 and supplementary table S3) and the estimates for the temperature bins barely changed (see supplementary figure S6). Finally, to address the concern that a specific day's confirmed cases could have largely depended on cases reported previously, we performed a robustness check by introducing cumulative case counts for the previous six days to our baseline model, and the estimates were consistent and robust (see supplementary figure S7).
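The absolute-humidity check requires converting temperature and relative humidity into water-vapor density. A minimal sketch is given below; it assumes the Bolton (Magnus-type) closed-form approximation for saturation vapor pressure, which is a common approximation to the integrated Clausius-Clapeyron relation and may differ in detail from the formula the authors used.

```python
import math

R_V = 461.5  # specific gas constant of water vapor, J/(kg K)

def saturation_vapor_pressure_pa(temp_c):
    """Saturation vapor pressure in Pa (Bolton approximation, assumed here)."""
    return 611.2 * math.exp(17.67 * temp_c / (temp_c + 243.5))

def absolute_humidity_g_m3(temp_c, rh_pct):
    """Absolute humidity in g/m^3 from temperature (deg C) and relative humidity (%)."""
    vapor_pressure = rh_pct / 100.0 * saturation_vapor_pressure_pa(temp_c)
    # Ideal-gas law for water vapor: density = e / (R_v * T), converted to g/m^3.
    return 1000.0 * vapor_pressure / (R_V * (temp_c + 273.15))
```

At a fixed relative humidity, absolute humidity rises steeply with temperature, which is why the two humidity measures can behave differently in a regression that also includes temperature.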
Similarly to those studies that focused on the impacts of climate change on mortality but obtained heterogeneous results in terms of the magnitudes for different regions (Deschênes and Greenstone 2011, Yu et al 2019), our results may differ from those for other countries, although they can still be used as baseline or reference evidence. Notably, we have made an important contribution to the theoretical and empirical studies regarding COVID-19 and weather conditions. Through rigorous and elaborate modeling, we have depicted the response functions between hourly temperature and COVID-19. More importantly, our work guides the handling of such a pandemic in the future, when climate change will largely be inevitable if current policies continue. The proposed modeling and projection framework can be extended to other regions to obtain more empirical evidence, especially in countries where current outbreaks are still severe (e.g. the United States).
13 Note that the weather data source does not provide absolute humidity. We used the daily mean temperature and relative humidity to derive the absolute humidity based on the Clausius-Clapeyron formula (Shaman and Kohn 2009).
In conclusion, our findings on weather conditions and the COVID-19 pandemic suggest that rising temperatures do mitigate the expansion of the pandemic to some degree. An additional hour at temperatures over 28 °C tends to reduce the daily confirmed cases by 23.6%, relative to an additional hour at −2 °C to 1 °C. On normal days, when the relative humidity is below 86%, an increase in the humidity would drive growth in the daily confirmed cases. Our study, in agreement with numerous epidemiological studies (Hellewell et al 2020, Prem et al 2020, Lai et al 2020) but from a different angle, confirmed that travel restrictions and wide screening and testing play critical roles in curbing the pandemic. We emphasize that the negative impacts of rising temperature cannot be translated to a moderate pandemic. Climate change could worsen a hypothetical pandemic occurring in the future to some extent, causing increasing confirmed cases. For instance, our findings suggest that 85% (243/287) and 77% (222/287) of cities could be expected to experience increases in caseloads in 2040-2059 for RCP4.5 and RCP8.5 scenarios, respectively, of which 24% (70/287) and 17% (49/287) of cities could be expected to face an increase of over 20%, although the majority of cities are projected to encounter reductions in cases from 2080 to 2099.
Last but not least, defeating the COVID-19 pandemic is tough. A portfolio of strategies should be developed, instead of relying on variations in weather conditions. Finally, it is not too late to sound the alarm for a potentially more harmful pandemic in the future, given the climate-change framework.

Data availability statement
Data supporting the findings of this study are available upon reasonable request from the authors.