Spatiotemporal dynamics of global population and heat exposure (2020–2100): based on improved SSP-consistent population projections

To address future environmental change and the consequent social vulnerability, a better understanding of future population (FPOP) dynamics is critical. Notable progress has been made in producing FPOP projections that are consistent with the Shared Socioeconomic Pathways (SSPs) at low resolutions for the globe and high resolutions for specific regions. Building on these efforts, we contribute a new set of 1 km SSP-consistent global population projections (FPOP for short for the dataset) under a machine learning framework. Our approach incorporates a recently available SSP-consistent global built-up land dataset under the Coupled Model Intercomparison Project 6, with the aim of addressing the misestimation of future built-up land dynamics underlying existing datasets of future global population projections. We show that the overall accuracy of FPOP exceeds that of five existing datasets at multiple scales, especially in densely populated areas (e.g. cities and towns). FPOP-based assessments of future global population dynamics then suggest a similar trend across population-density classes and a spatial Matthew effect of regional population centralization. Furthermore, FPOP-based estimates of global heat exposure are around 300 billion person-days in 2020 under four SSP-Representative Concentration Pathway (RCP) scenarios, which by 2100 could increase to as low as 516 billion person-days under SSP5-RCP4.5 and as high as 1626 billion person-days under SSP3-RCP8.5, with Asia and Africa contributing 64%–68% and 21%–25%, respectively. While our results shed light on proactive policy interventions for addressing future global heat hazards, FPOP will enable future-oriented assessments of a wide range of environmental hazards, e.g. hurricanes, droughts, and flooding.


Introduction
In recent years, five Shared Socioeconomic Pathways (SSPs) were proposed by the Intergovernmental Panel on Climate Change for navigating the uncertainties in addressing future climate change (Riahi et al 2017) and advancing our common journey toward sustainability (Szetey et al 2021). Essentially, SSPs 1-5 depict five plausible future scenarios of socioeconomic development (Kriegler et al 2014), which can be used to derive greenhouse gas emissions scenarios with different climate policies. SSPs 1-5 correspond respectively to sustainability (SSP1), middle of the road (SSP2), regional rivalry (SSP3), inequality (SSP4), and fossil-fueled development (SSP5), and have been gaining popularity within the global change and sustainability research community (Maury et al 2017, Van Vuuren et al 2017). These SSPs are distinguished by a few socioeconomic variables (e.g. population, GDP, urbanization, and education level). Among these fundamental variables, population is arguably the most critical. Population data are often the basis for addressing a wide range of social concerns, e.g. epidemics (Coccia 2020), heat waves (Huang et al 2019), floods (Gu 2019), droughts (Liu and Chen 2021), and sea-level rise (Kulp and Strauss 2019), and for monitoring 73 indicators of the UN SDGs (Freire et al 2018) that require population data as an input (Dahmm 2021). In this vein, there is an urgent need to downscale the SSP projections of future global population over the 21st century to spatially explicit projections, i.e. to produce SSP-consistent gridded population data.
Notable progress has been made in producing such SSP-consistent gridded population projections at the global and regional scales. For global projections, Jones and O'Neill (2016) at the National Center for Atmospheric Research (NCAR) developed a widely used global population dataset under SSPs 1-5. They predicted grid-level population under the SSPs' constraints of population and urbanization. Yet approximately 30%-43% of the dataset's estimated population in 2050 was found to fall in uninhabited areas with cropland, forest, or pasture (Chen et al 2020c), and the dataset is limited by its relatively low resolution (i.e. 0.125 degree). Subsequently, Gao (2020) downscaled the NCAR dataset from 0.125 degree to 1 km. There are other global forecasting efforts, but they similarly suffer from low resolution, poor accuracy for certain areas, and/or coverage of only some of the SSPs (Murakami and Yamagata 2019). For regional projections, high-resolution gridded population datasets under the SSPs have been developed for Africa (1 km) (Merkens et al 2016) and the Mediterranean coastal zone (1 km) (Reimann et al 2018). However, because of their different input data and methods, combining these regional population datasets for global applications would introduce unknown uncertainties.
As of now, existing efforts include the improved NCAR dataset by Gao (2020), the share-of-growth model (McKee et al 2015), and gravity-type models (Grübler et al 2007, Jones and O'Neill 2016). Note that population often presents nonlinear relationships with the various predictive variables (Rohat 2018, Leyk et al 2019), such as historical population distribution (Xu et al 2021), biophysical or environmental factors (Stevens et al 2015, Yao et al 2017), and particularly built-up land pattern, a decisive factor that shapes population distribution (Nieves et al 2017, Reed et al 2018). Yet built-up land pattern remains insufficiently incorporated in the existing studies. Here we incorporate a newly available fine-resolution future built-up land dataset to train a nonlinear machine learning (ML) model, creating a new dataset of 1 km resolution global population projections under SSPs 1-5 throughout the 21st century for every ten years (the future population dataset, FPOP for short; available at www.geosimulation.cn/FPOP.html). Specifically, we take advantage of an SSP-consistent 1 km resolution global built-up land dataset for 2020-2100 by Chen et al (2020c), which itself is not informed by FPOP distribution. Further, we adopt the random forest (RF) algorithm to better capture the nonlinear relationships between population and various predictive variables. The remainder is organized as follows: section 2 presents the methodology and raw data. Section 3 validates and evaluates FPOP's accuracy, after which we assess the spatiotemporal dynamics of future global population under SSPs 1-5 and the associated heat exposure dynamics under contrasting SSP-RCP scenarios. Section 4 discusses some key findings that deserve research and policy attention.

Population projection using machine learning
The core of our methodology is recursive projections for every ten years from 2010 to 2100 by using an ML framework (i.e. RF) to capture the potentially nonlinear relationships between a range of predictive factors and gridded population (figure 1). The accuracy of our projections, as detailed below, results from a combination of three methodological features. First, we adopted a recursive approach, a conventional practice in existing population projection studies (Chen et al 2020a, 2020b), to extend gridded projections at an earlier time T_i onto the next time point T_{i+10}, i.e. ten years later. The underlying assumption is that population distribution is path-dependent, influenced by a legacy effect. Second, in addition to a few commonly used predictive factors (i.e. slope, distance to city center, distance to roads, and distance to water, as highlighted in previous studies (Stevens et al 2015, Nieves et al 2017)), we further incorporated the SSP-consistent 1 km future global built-up land dataset by Chen et al (2020c) to improve prediction (see figure S1 for a comparative illustration justifying our use of the 1 km projection by Chen et al (2020c) instead of the 0.125 degree data by Gao and O'Neill (2020)). The underlying assumption is that population distribution is closely related to built-up land. Third, similar to the idea of Cellular Automaton, we used data derived from the population and built-up land grids by applying 3 × 3, 5 × 5, and 7 × 7 moving windows, following Chen et al (2020b), to account for multi-scale neighborhood effects. The underlying assumption is Tobler's First Law of Geography that '[e]verything is related to everything else, but near things are more related than distant things.'

[Figure 1 (caption partially recovered): ... (2) projection. All input data were processed to train the RF model and then perform the recursive projections based on the trained model.]
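The multi-scale neighborhood layers described above can be sketched in a few lines. The text does not state which statistic is taken inside each moving window, so the window mean used here is an assumption, as are the function names and the edge-replication padding at the grid border.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def neighborhood_mean(grid, size):
    """Mean over each cell's size x size neighborhood.

    Assumption: the window statistic is a mean and borders are padded
    by replicating edge values; the source does not specify either.
    """
    pad = size // 2
    padded = np.pad(np.asarray(grid, dtype=float), pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


def multiscale_features(grid, sizes=(3, 5, 7)):
    """Stack one neighborhood layer per window size along the last axis."""
    return np.stack([neighborhood_mean(grid, s) for s in sizes], axis=-1)


# Toy 5 x 5 "population" grid (hypothetical values).
pop = np.arange(25, dtype=float).reshape(5, 5)
feats = multiscale_features(pop)
print(feats.shape)
```

Each output layer has the same shape as the input grid, so the layers can be flattened cell by cell and passed to a regressor alongside the other predictive factors.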
For each projection, the SSP-consistent global population count and ten gridded layers (figure 1; table 1) were used as inputs of the predictive RF algorithm, which was trained and tested at the year 2010 (see supplementary material for critical methodological details). The prediction performance of the trained RF algorithm was evaluated at multiple scales against the adjusted WorldPop 2020 data (version: unconstrained individual countries 2000-2020 UN adjusted, 1 km resolution) and further compared with five existing gridded FPOP datasets that are SSP-consistent, including NCAR, CoastalZone, AFRICA, China_CAI, and China_CHEN (table 2). The accuracy assessment and comparison were based on the percent root relative squared error (%RRSE) (Raji and Vinod Chandra 2016, Khan et al 2020, Kumar and Susan 2020), which allows comparison across scales and regions. It is calculated as follows:

\[
\%\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(\mathrm{Pred}(i)-\mathrm{Act}(i)\right)^{2}}{\sum_{i=1}^{n}\left(\mathrm{Act}(i)-\overline{\mathrm{Act}}\right)^{2}}}\times 100\% \qquad (1)
\]

where n is the total number of grids, Pred(i) denotes the predicted population at the ith grid, Act(i) denotes the corresponding adjusted WorldPop population in 2020, and \overline{Act} denotes the mean of Act(i) (i.e. the gridded adjusted WorldPop population in 2020 averaged over the total of n grids). In addition, seven representative urban regions (Cairo, Egypt; Melbourne, Australia; New York, USA; Paris, France; Sao Paulo, Brazil; Tokyo, Japan; and the Yangtze River Delta, China) were mapped in terms of their 2020 WorldPop populations and the corresponding projections of the five existing datasets and our study, so that the projection accuracy can be visually contrasted.
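As an illustration of the accuracy measure, the following minimal sketch computes %RRSE from its verbal definition above, assuming the standard root relative squared error form (squared prediction error normalized by the squared deviation of the reference values from their mean). The function name and the toy numbers are hypothetical.

```python
import numpy as np


def percent_rrse(pred, act):
    """Percent root relative squared error of pred against act.

    Assumes the standard RRSE form; 100% corresponds to a predictor
    that always returns the mean of the reference values.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    act = np.asarray(act, dtype=float).ravel()
    numerator = np.sum((pred - act) ** 2)
    denominator = np.sum((act - act.mean()) ** 2)
    if denominator == 0:
        raise ValueError("reference values are constant; %RRSE is undefined")
    return 100.0 * np.sqrt(numerator / denominator)


# Toy example with four grid cells (hypothetical values).
act = [10, 20, 30, 40]
pred = [12, 18, 33, 39]
score = percent_rrse(pred, act)
print(round(score, 2))
```

Because the error is normalized by the spread of the reference data, values below 100% indicate a projection that beats the mean-only baseline, which is what makes the measure comparable across regions of very different population size.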

Extreme heat exposure assessment
Accurate population data are critical for a wide range of sustainability issues, of which exposure to extreme heat is a typical concern. Following the method adopted by Liu et al (2017), Jones et al (2018) and Chen et al (2020b), we calculated extreme heat exposure as the product of population and the yearly frequency of extreme heat days (unit: person-days), as follows:

\[
E_{T(i),\mathrm{pop}} = P_{T(i)} \times H_{T(i)} \qquad (2)
\]

where E_{T(i),pop} and P_{T(i)} denote respectively the extreme heat exposure and population at the ith grid in year T, and H_{T(i)} is the frequency of extreme heat days at the ith grid during year T. Here, an extreme heat day is defined as a day whose highest daily temperature is no less than 35 °C (see supplementary material for methodological details).
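The exposure calculation can be sketched as follows, using the definitions in the paragraph above: count the days in a year whose daily maximum temperature is at least 35 °C, then multiply by the population of the same grid cell. The array layout (days, rows, columns) and the function names are assumptions made for illustration.

```python
import numpy as np

HEAT_THRESHOLD_C = 35.0  # extreme heat day: daily maximum >= 35 degrees C


def heat_day_frequency(tmax_daily):
    """Number of extreme heat days per grid cell.

    tmax_daily: array of shape (days, rows, cols) with daily maximum
    temperature in degrees C (layout is an assumption of this sketch).
    """
    return (np.asarray(tmax_daily, dtype=float) >= HEAT_THRESHOLD_C).sum(axis=0)


def heat_exposure(population, tmax_daily):
    """Exposure in person-days: population times heat-day frequency."""
    return np.asarray(population, dtype=float) * heat_day_frequency(tmax_daily)


# Toy example: 3 days on a 1 x 2 grid (hypothetical values).
tmax = np.array([[[34.0, 36.0]],
                 [[35.0, 37.0]],
                 [[30.0, 34.9]]])
pop = np.array([[100.0, 50.0]])
exposure = heat_exposure(pop, tmax)
print(exposure, exposure.sum())
```

Summing the resulting grid over a continent or the globe gives the aggregate person-days figures discussed in the results.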
Following the scenario setting rationale in Jones et al (2018) and Chen et al (2020b), population dynamics under SSP3 and SSP5 were considered for estimating future extreme heat exposure; SSP3 represents a world with rapid population growth in most regions and low urbanization, while SSP5 depicts low population growth and high urbanization (see table 2 in O'Neill et al 2017 for detail). In addition, to quantify the impact of population dynamics on extreme heat exposure, the Theil-Sen slope (Sen 1968) was calculated to measure the rate of change in extreme heat exposure for each continent (except Antarctica) from 2020 to 2100. The same method was applied to study the population dynamics from 2020 to 2100 in regions of different population densities.
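For readers unfamiliar with the estimator, the Theil-Sen slope is the median of the slopes between all pairs of points, which makes it robust to a single anomalous decade. The sketch below is a plain NumPy version with hypothetical exposure values; it is not the authors' implementation.

```python
from itertools import combinations

import numpy as np


def theil_sen_slope(x, y):
    """Median of pairwise slopes (Sen 1968); pairs with equal x are skipped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    return float(np.median(slopes))


# Decadal series 2020-2100 with a linear trend of 5 units per year
# and one outlier in 2060 (hypothetical values).
years = np.arange(2020, 2101, 10)
series = 300.0 + 5.0 * (years - 2020)
series[4] += 200.0
slope = theil_sen_slope(years, series)
print(slope)
```

In this toy series the outlier shifts only 8 of the 36 pairwise slopes, so the median stays at the underlying trend, whereas an ordinary least-squares fit would be pulled by the anomalous value.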

Accuracy comparison between FPOP and existing datasets
We compared the projection accuracy of FPOP and five existing gridded population datasets, i.e. NCAR, CoastalZone, AFRICA, China_CAI, and China_CHEN (table 2). As noted in section 2.1, the %RRSE was used as the accuracy measure of the projections against the adjusted WorldPop population in 2020 for the globe, major continents and regions (see figure 2(a) for the mean of grid-level accuracy and table S1 for the standard deviation). FPOP has a %RRSE of 34.10% at the global scale, compared with 84.24% for NCAR and 139.42% for CoastalZone. The improved accuracy of our population projections also applies to the six populated continents. Our accuracy outperforms the existing datasets by up to 54% in Asia, 97% in Africa, 57% in Europe, 53% in North America, 47% in Oceania, and 47% in South America. For China in particular, the relative accuracy improvement of our projections is up to 112%. To quantify our accuracy at the country scale in the absolute sense, we further regressed our national projections against the population counts of 227 countries in 2020 based on globally comparable data of the World Population Prospects 2019 by United Nations (2019), which shows a near-perfect R-square of 0.99 (figure 2(b)). The multi-scale accuracy assessments suggest that our data provide the best available population projections at the global, continental, and national scales.
The accuracy improvement of FPOP over the existing datasets also seems to hold across various populated metropolitan areas (figure 3). We made a visual comparison based on the spatial distribution of adjusted WorldPop and projected population densities in 2020 in seven typical metropolitan areas at the same spatial resolution of 1 km. The comparison consistently illustrates the qualitatively better accuracy of FPOP. Quantitatively, FPOP again has the lowest %RRSE for all seven metropolitan areas as compared with the five existing datasets. For example, our %RRSE for Cairo, Egypt is 32.28%, while those of NCAR, CoastalZone, and AFRICA are 87.24%, 90.04%, and 72.36%, respectively. Similarly, our %RRSE for the Yangtze River Delta, China is 25.45%, while those of NCAR, CoastalZone, China_CAI, and China_CHEN are 88.83%, 82.22%, 71.31%, and 134.60%, respectively. The comparison for the other five metropolitan areas (i.e. Melbourne, Australia; New York, United States; Paris, France; Sao Paulo, Brazil; and Tokyo, Japan) also confirms that FPOP offers a substantial quantitative accuracy improvement over the existing datasets.

Projected future population dynamics by SSP and population density
Based on the above accuracy assessment of our population projection algorithm, we then simulated future global population under SSPs 1-5 at a 1 km resolution for 2000-2100 at ten-year intervals, with the projections starting from 2010 (see section 2 for details). The projected populations in high-, medium-, and low-density areas show a similar trend under all SSPs, except for SSP3, under which the global population, particularly the population of high-density areas, was projected to increase all the way to 2100 (figures 4(a)-(e)). In contrast, the trend under the other SSPs is consistent: the population would first increase until some point during the second half of the 21st century and then decrease until 2100. Relatedly, the spatial pattern of the global population is also similar, except that the populous (high- and medium-density) areas in places such as India, China, and parts of central Africa expand remarkably under SSP3 compared with the other SSPs (figures 4(A)-(E)). Generally speaking, the whole nation of India and the southeast half of China would be the most populous across the globe, followed by west and central Europe, southeast Asia, west and central Africa, and lastly, the fragmentedly distributed built-up areas in east America and Latin America. The spatial pattern and temporal trend of the global population under SSP3 show a stark contrast to those under the other four SSPs, indicating the critical need to enhance proactive research and strategic policy interventions for an increasingly likely future in which regional rivalry and competition divide the global community (i.e. SSP3), as signaled by trade wars and travel bans in recent years.
The trends of population change, like the trends of population per se, seem to show the strongest Matthew effect under SSP3, followed by SSP2 and SSP4, and lastly SSP5 and SSP1 (figure 5). To be specific, high- and medium-density areas (e.g. India, southeast China, southeast Asia, west and central Europe, central Africa, and east America) appear to experience more drastic population growth during 2020-2100, while low-density or population-sparse areas (e.g. southwest China, west Sub-Saharan Africa, and central America) are more inclined to have population decline (figures 5(A)-(E)). However, this qualitative observation of spatial variations is not in line with the contrast of decadal population changes across the density categories. In most cases, the decadal population change rate of the density categories ranks as follows: population-sparse, low-density, medium-density, high-density (figures 5(a)-(e)). In other words, the Matthew effect, whereby densely populated grids seem to have more population growth (or less population decline), holds only at the grid level and not at the category level. This suggests notable variations within each population-density category, and may indicate that FPOP would become regionally polarized/centralized.

Projected dynamics of future heat exposure by SSP-RCP and continent
The SSP-consistent, spatially explicit future global population projections make it possible to do prognostic assessments of alternative socioeconomic development and climate policies. As an example of global concern, future global heat exposure and its dynamics are estimated here under different scenario settings of SSP and RCP (see section 2 for details) by combining FPOP and future daily maximum temperature projections (figure S3). Different SSP and RCP settings would remarkably affect future global heat exposure (figures 6(a)-(d)). The worst situation would occur under SSP3-RCP8.5, a future scenario characterized by low urbanization, exponential population growth, and high greenhouse gas emission. Under this scenario, the global heat exposure would increase almost exponentially, i.e. from 315 billion person-days in 2020 to over 1626 billion person-days in 2100, an increase of 416% (figure 6(c)). Under the contrary scenario, SSP5-RCP4.5, which is characterized by high urbanization, slow population growth, and low emission, the global heat exposure would increase by only 62%, from 318 billion person-days in 2020 to 516 billion person-days in 2100. From a spatial perspective, the future heat exposure shows a similar global pattern under different SSP-RCPs, i.e. typical areas with high exposure are the Ganges River Basin, Indus valley, eastern China, southeast Asia, and Sub-Saharan Africa, while those with much lower exposure include South America, North America, Europe, and Oceania (figures 6(A)-(D)). In general, Asia and Africa account respectively for 64%-68% and 21%-25% of the global exposure, while the remaining areas together contribute 8%-15%.
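The two percentage increases quoted above follow directly from the 2020 and 2100 totals. A quick arithmetic check, using a hypothetical helper name and the rounded values from the text:

```python
def percent_increase(start, end):
    """Relative increase from start to end, in percent."""
    return (end - start) / start * 100.0


# Global heat exposure in billion person-days, as quoted in the text.
ssp3_rcp85 = percent_increase(315, 1626)
ssp5_rcp45 = percent_increase(318, 516)
print(round(ssp3_rcp85), round(ssp5_rcp45))
```

The rounded results match the 416% and 62% increases reported for SSP3-RCP8.5 and SSP5-RCP4.5, respectively.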
Like the patterns of population change under different SSP-RCPs, the global patterns of heat exposure change during 2020-2100 also show the Matthew effect: areas that are more heat-stricken are also projected to suffer larger increments of heat exposure (figures 7(A)-(D)). The least heat-stricken regions largely overlap with the global population-sparse areas, e.g. deserts in Africa, Oceania, and northwest China as well as the Arctic. From a temporal perspective, the projected exposure would increase in each region during 2020-2100 (figures 7(a), (c) and (d)), except for the last two decades of the 21st century under SSP5-RCP4.5 (figure 7(b)). Generally, the decadal increase rate of heat exposure would decrease over time during 2020-2100. Under RCP4.5, the decrease is approximately 35% in North America, 20% in Oceania, 12% in Europe and Africa, and 10% in Asia. Under RCP8.5, the decrease is about 36% in North America, 25% in Oceania, 35% in Europe, and 15%-20% in Africa and Asia. It should be noted that the exposure change rate of South America shows dramatic fluctuations after 2070 under RCP4.5 and after 2050 under RCP8.5. Notably, under the same RCPs, different SSP settings that reflect opposite population growth and urbanization trends could have a similar trend of heat exposure change. This observation indicates that urbanization and population growth contribute to a lesser degree than climate policies at the continent scale.

Discussion and conclusions
The significance of long-term trends and global patterns of population dynamics has been made clear in the existing literature (e.g. public health and environmental protection (Kii 2021), floods (Tate et al 2021), droughts (Liu and Chen 2021)). In this vein, our demonstration study, which applies the improved FPOP (see supplementary material for methodological discussion) to assess future global heat exposure during 2020-2100, makes at least three contributions. First, from a spatial perspective, FPOP projects a consistent global pattern of future heat exposure under SSP3-RCP4.5, SSP3-RCP8.5, SSP5-RCP4.5, and SSP5-RCP8.5 (figures 6(A)-(E)), confirming the findings by Liu et al (2017, figure 1 therein) and Jones et al (2018, figure 1 therein). Our results also show that future increments of high-exposure areas would most likely occur in the tropical and sub-tropical regions (figures 7(A)-(E)), and that the worst scenario is SSP3-RCP8.5 (figures 6(a)-(e)). Our study additionally provides the continental shares of the global exposure, highlighting the primary contribution of Asia (64%-68%) and the secondary role of Africa (21%-25%). Our study also hints at the Matthew effect of future exposure dynamics at the grid scale (figures 7(A)-(E)), which has important policy implications deserving further investigation.
Second, the contrast of global/continental exposures under different SSP-RCPs suggests that climate policies (the RCP setting regarding greenhouse gas emission) seem slightly more influential than socioeconomic policies (the SSP setting regarding population and urbanization) (figures 6(a)-(e)). Our preliminary conclusion is consistent with Jones et al (2015) in the context of the U.S. and the observation by Jones et al (2018) in their global-scale study. However, there are contradictory studies. For example, Liu et al (2017) conclude that the contributions of future climate change and global population dynamics would respectively account for 28% and 9%, with the main contribution coming from their interaction (66%). Huang et al (2018) show that in the context of China the contributions of climate change and population dynamics are 60%-70% and 20%-40%, respectively. In addition, a recent historical study by Tuholske et al (2021) suggests that the impact of population dynamics on global urban population exposure to extreme heat is about two times that of warming. Despite these inconclusive findings, it seems that climate policies would be relatively more effective than population policies in addressing future global heat exposure.
Last, our estimates of future global heat exposure under the various SSP-RCPs are substantially larger than those of existing findings, though consistent in order of magnitude. FPOP-based global heat exposure under the four SSP-RCPs is around 300 billion person-days in 2020, and by the end of the 21st century could increase to as high as 1626 billion person-days under SSP3-RCP8.5 and as low as 516 billion person-days under SSP5-RCP4.5. Contrastingly, Liu et al (2017) Tuholske et al (2021) report an average annual increase rate of 2.1 billion person-days for the historical global urban heat exposure during 1983-2016, and a total of 119 billion person-days in 2016. The differences notwithstanding, our estimates and the noted values are not directly comparable in at least two respects. One is that they are based on different population and temperature datasets, and the other is that the noted values are averages or medians over multiple years. With regard to the latter, moreover, our study illustrates the generally increasing trend of FPOP heat exposure under the examined SSP-RCPs (figures 6(a)-(e)) and reveals the generally declining trend of the exposure increase rate (figures 7(a)-(e)). In light of these two findings, it is possible that snapshot-style assessments that ignore temporal dynamics may lead to severe underestimation or even misleading conclusions.
To be rigorous, we made supplementary analyses to disentangle the difference/sensitivity in estimating heat exposure due to the variety of population datasets (figures S4-S7). The estimates of global heat exposure in 2020 are 306.80-326.36, 313.68-337.81, and 312.04-335.44 billion person-days based respectively on FPOP, NCAR, and CoastalZone, and for 2100 are 516.17-1626.21, 518.36-1975.14, and 504.50-1935.97 billion person-days under the four SSP-RCPs (table S2). Against the FPOP-based results, NCAR underestimates the exposure in densely populated areas (e.g. India, southeast China, southeast Asia, west and central Europe, South Africa, east and central America) by up to 3.77%, and overestimates it in sparsely populated areas (e.g. southwest China, North and central Africa, west America) by as much as 67.42% (figure S6); CoastalZone overestimates the exposure in densely populated areas (e.g. India, China coast, southeast Asia coast, east Europe, and east America) by as much as 52.89%, and underestimates it in inland areas (e.g. central China, central America, central Europe, central southeast Asia and Sub-Saharan Africa) by as much as 89.02% (figure S7). The misestimation due to inaccurate population projections, and its spatial distribution, are far from trivial. A recent study by Burkart and colleagues (Burkart et al 2021) reports that 356 000 deaths worldwide in 2019 were linked to heat extremes. This death toll may not seem alarming, yet with the projected increase in exposure by 2100 and the misestimation due to inaccurate population projections, it could translate into far more deaths and devastating losses for the families involved.
As none of the existing datasets, including FPOP, is perfect, continuing efforts should be invested to keep improving gridded future global population projections and to reexamine future exposures and vulnerabilities associated with other hazards of wide concern, such as flooding (Kirezci et al 2020), extreme cold (Batibeniz et al 2020, Broadbent et al 2020), and typhoons (Yin et al 2021).

Data availability statement
Any data that support the findings of this study are included within the article (and any supplementary files). The historical global population maps are available at www.worldpop.org/. The United Nations Population can be obtained at https://population.un.org/wpp/Download/Standard/Population/. The historical global built-up land maps can be retrieved from