Global migration is driven by the complex interplay between environmental and social factors

Migration is an important response and adaptation measure to changes in environmental and socioeconomic conditions. At a time when environmental stressors and risks are increasing at an unprecedented rate, understanding the interplay between the underlying factors driving migration is of high importance. While the relationships between environmental and socioeconomic drivers have been identified conceptually, the comprehensive global-scale spatial quantification of their interactions is in its infancy. Here, we performed a geospatial analysis of gridded global net migration from 1990 to 2000 using a novel machine learning approach which analyses the interplay between a set of societal and environmental factors simultaneously at the places of origin (areas of net-negative migration) and destination (areas of net-positive migration). We diagnosed the importance of eight environmental and societal factors in explaining migration for each country, globally. Nearly half of global in- and out-migration took place in areas characterized by low adaptive capacity and high environmental stress. Regardless of a country's income level, income was the key factor in explaining net-migration in half of the countries. Slow-onset environmental factors, drought and water risk, were found to be the dominant environmental variables globally. Our study highlights that factors representing human capacity need to be incorporated into the quantitative diagnosis of environmental migration more rigorously.


Introduction
Recent events such as migrant caravans from Central America to the United States in 2019, the Venezuelan migrant and refugee crisis in 2019-20 and the 2015 crisis of large refugee flows from the Middle East and North Africa to Europe have been frequently linked with preceding severe drought episodes in the country of origin (Chemnick 2019, Gustin and Henninger 2019, Markham 2019, Podesta 2019). Indeed, a stereotypical view that environmental change would induce mass-migration fluxes towards the 'Global North' has been repeated in both research and policymaking for decades (Boas et al 2019). The empirical evidence supporting such claims, however, is inconsistent (Selby et al 2017, Abel et al 2019). Accordingly, investigating the fundamental, manifold role of environmental stress (ES) as a trigger and driver of migration has gained substantial scholarly and public attention. Not only do various environmental factors influence migration in different directions and magnitudes (see e.g. Gray and Mueller 2012, Cattaneo and Peri 2016, Kubik and Maurel 2016), but other societal factors and their interactions also play an important role. The understanding of human migration therefore needs to account for complex interactions between different drivers of migration at the micro, meso and macro levels (Abel et al 2019, Boas et al 2019, Borderon et al 2019, Hoffmann et al 2021). A traditional gravity-based 'push-pull' model has often been used to identify the macro-level factors underlying migration decisions by analyzing spatial disparities between the place of origin (pushing factors) and destination (presumably more attractive conditions, i.e. pulling factors) (Lee 1966, de Haas 2011). Despite its conceptual clarity, the push-pull model has been criticized for its simple assumption of a linear relationship between environmental change and migration dynamics (Jónsson 2010).
The literature is dominated by the assumption that environmental changes are the primary pushing factors that linearly lead to migration, whereas in reality individuals and households employ diverse responses to environmental shocks based on their social, economic, demographic and political capital (Nelson et al 2007). ES may thus influence migration by affecting other migration drivers, for example by exacerbating conflict, reducing agricultural production and changing income (Beine and Parsons 2015, Abel et al 2019). On the other hand, migration is a costly process and people with few social and economic resources generally have lower capacity to move; thus the majority of migration is internal or between low- and middle-income countries (Hoffmann et al 2020). This non-linear pattern follows the prediction of the migration hump theory, which holds that migration has an inverted U-shaped relationship with socioeconomic development (Martin and Taylor 1996). International migration is hence low in low-income and the least developed countries because their populations cannot afford to emigrate.
Establishing the relationship between environmental change and migration response requires a comprehensive account of all other factors and contextual effects which could determine the migration-environment association (Borderon et al 2019). One commonly used approach for coupling the societal and environmental dimensions in studying migration on a conceptual level was introduced by Black et al (2011b) and the Foresight report on Migration and Global Environmental Change (2011). Their approach depicts migration through a relationship between dimensions of human capacity and vulnerability to environmental change (figure 1) and thus combines objective circumstances with subjective perceptions influencing migration. In addition to addressing vulnerability to environmental change, their widely used conceptual framework incorporates a diversity of psychosocial and socioeconomic factors (e.g. education, income, individuals' intentions and cultural identity) that influence people's mobility decisions and capacities to move. Failing to account for socioeconomic drivers and their interplay with other factors in influencing migration can provide a biased estimate of the role of environmental change and stressors.
There are, however, only a few studies that provide quantitative global assessments of the interplay between societal and environmental factors underlying human migration. Marotzke et al (2020) and Lilleør and van den Broeck (2011) explored the poverty-climate-migration nexus in a laboratory setting considering only economic factors in less developed countries. de Sherbinin et al (2012) and Neumann et al (2015) studied global spatial patterns of environmentally induced migration but excluded socio-economic drivers from their analysis. Studies which include both environmental change and socioeconomic factors are mainly regional ones (see e.g. Wiederkehr et al (2018) on Sub-Saharan Africa and Kluger et al (2020) for Peru). Furthermore, studies on environmentally induced migration typically focus on the places of origin and their characteristics, while much less attention is paid to conditions in the destination areas (Findlay 2011, Ayeb-Karlsson et al 2020), despite the fact that societal and environmental factors also reflect the ability of the destination area to absorb (or attract) migrants (Niva et al 2019). For policy planning, it is highly relevant to identify where environmentally induced migrants may move to, as well as to understand the characteristics of both the origins and destinations in order to assess migrants' vulnerability at both ends of migration. Moreover, quantitative global assessments of migration can be directly incorporated into other modeling frameworks such as the Integrated Assessment Models, which are designed to describe key interactions between physical and social systems. Changes in drivers of migration would influence migration patterns and consequently population size, income distribution and emissions (Liang et al 2020, Benveniste et al 2021).
The quantitative assessment of environmental and socioeconomic drivers of global migration can thus substantially improve our understanding of future socioeconomic development, which can have considerable implications for the global climate system. We address these gaps by providing a global quantitative assessment of (a) the interplay of environmental-societal characteristics in both sending (negative net-migration) and receiving (positive net-migration) areas globally, and (b) the importance of different environmental and socio-economic indicators underlying net-negative and net-positive migration by utilizing a machine learning method (random forests). This paper thus contributes to current migration research by studying both out- and in-migration locations simultaneously, utilizing spatially explicit global data sets covering a range of relevant environmental, socio-economic and demographic indicators (see table 1) as well as gridded net-migration data (de Sherbinin et al 2012). Furthermore, the use of random forests to quantitatively define the nexus between environmental change, socioeconomic factors and migration on a global scale is novel in the field. The number of international and internal migrants is constantly growing as the environment changes rapidly around the globe (Xu et al 2020). It is thus of prime scholarly and policy importance to understand the characteristics and interplay of both environmental and societal factors behind human migration.

Materials and methods
All analyses were conducted globally at the grid cell level at five arcminute resolution (figure 2, table 1). For the random forest analysis, individual models for net-negative and net-positive migration were created for 178 countries in total, i.e. each model is based on the grid cells of the country in question (n varies from 1, in very small countries such as Vatican City or Gibraltar, to 34 35 160 cells in Russia, global median 4447 cells). Models were used to study the importance of each variable in explaining net-positive and net-negative migration, i.e. which variable had the highest explanatory power on the response variable. Feature importance distributions of each variable are illustrated for 12 groups based on the United Nations (UN) geoscheme (Statistics Division of the United Nations Secretariat 2021). The country classification is presented in the supplementary materials (table S2) (available online at stacks.iop.org/ERL/16/114019/mmedia).

Indicators of ES and societal factors
Our indicator approach for analyzing the interplay of environmental and societal characteristics behind human migration has been extended from Varis et al (2019b), who studied the resilience of human-natural systems by considering both adaptive capacity (AC) and environmental vulnerability. This approach allows a geospatial analysis of ES factors in parallel with factors indicating societal AC to cope with environmental and other stress factors. For the purposes of this study, some of the indicators were modified. We defined four societal factors as components of AC: governance effectiveness, level of income, health and education. The last three are also the components of the Human Development Index, which was used as a composite index in Varis et al (2019b). Income was downscaled to grid level based on night lights and agricultural land use, using the multiple linear regression model from Kummu et al (2018).
For ES, we selected four variables representing a diversity of environmental risks and stressors: drought and water risk (WR) were considered to be proxies for slow-onset environmental change, while natural hazards represent a more sudden change or shift in the environment. FPS was selected as a proxy of local food insecurity (see the complete list of indicator sources and their measurement in tables 1 and S1 in the supplements). Spatial distributions of the indicators used are illustrated in figures S2-S4.
A temporal average over 1990-2000 was used for all indicators which are available for the whole time period (except for food production, which was measured in 2000, and drinking water and sanitation coverage, which were measured in 2015, due to data availability). Drought risk (DR) was composed from the standardized precipitation-evapotranspiration index (SPEI) (Vicente-Serrano et al 2010) by computing a cumulative sum of negative index values (drier years than average) over the study period. WR was calculated based on a quantitative risk factor, baseline water stress, and a qualitative risk factor, the level of improved sanitation and drinking water, from aqueduct water risk data (Hofste et al 2019). FPS is the ratio between crop production and population (kcal/capita/day), scaled between 0 and 1 based on the kcal per capita level (FPS ⩽ 500 kcal: high scarcity = 1; FPS ⩾ 5000 kcal: no scarcity = 0). Finally, all indicators (except for FPS) were scaled between 0 and 1 with min-max normalization where the smallest and highest 5% were assigned values 0 and 1, respectively. Societal and environmental factors were then combined into two composite indices of AC and ES, as the mean over their four components. The data were tested for cross-correlations: variables within the AC index were strongly correlated, while the correlation between AC and ES variables was weak (see figure S1).
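The scaling steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, the linear interpolation of FPS between 500 and 5000 kcal is an assumption (the text only fixes the two end points), the 5% tails are assumed to be cut at linearly interpolated 5th and 95th percentiles, and the drought index is returned as a positive magnitude by assumption.

```python
def drought_risk(spei_series):
    """Cumulative sum of negative SPEI values (drier than average years).
    Returned as a positive magnitude; the sign convention is an assumption."""
    return -sum(v for v in spei_series if v < 0)


def fps_scarcity(kcal_per_capita_day):
    """Scale food production per capita to [0, 1], where 1 = high scarcity."""
    low, high = 500.0, 5000.0
    if kcal_per_capita_day <= low:
        return 1.0
    if kcal_per_capita_day >= high:
        return 0.0
    # Linear interpolation between the two end points (assumed).
    return (high - kcal_per_capita_day) / (high - low)


def quantile(sorted_values, q):
    """Quantile of an already sorted list with linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def minmax_with_tails(values, tail=0.05):
    """Min-max normalization; the lowest and highest `tail` share map to 0 and 1."""
    ordered = sorted(values)
    lo = quantile(ordered, tail)
    hi = quantile(ordered, 1.0 - tail)
    if hi == lo:
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]


def composite_index(components):
    """Composite AC or ES index as the mean over its scaled components."""
    return sum(components) / len(components)
```

For a single grid cell, a composite index would then be obtained with, for example, `composite_index([governance, income, health, education])`, where each argument is the already scaled value of that indicator.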

Net-migration and population data
In the acquired dataset, decadal net-migration was defined as NM = total population change − (births − deaths), in each grid cell (de Sherbinin et al 2015). Net-negative migration denotes areas with more emigrants than immigrants, and net-positive migration areas with more immigrants than emigrants over the time period. The NM data were aggregated from 30 arcsec to 5 arcmin resolution to match the other datasets, which were not available at higher resolution. Furthermore, the de Sherbinin et al (2015) data were not modeled from original input data at 30 arcsec resolution. It is thus justified to aggregate the data to 5 arcmin resolution without losing much information (see figure S4 for the coefficient of variation in the aggregated data). The data were aggregated by summing over a 10 × 10 window by using the aggregate tool in the Raster package in R (Hijmans 2019). For the random forest analysis, the net-migration data were then normalized with the respective population count in the initial timestep (1990) in each grid cell in order to account for the effect of population on the net-migration count. Here it is important to note that net-migration accounts for all types of mobility and does not distinguish between voluntary and forced migration, for instance.
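As an illustration of the definition and the aggregation step, the following Python sketch computes net-migration as population change minus natural increase, sums a fine grid over square windows (a factor of 10 corresponds to going from 30 arcsec to 5 arcmin), and normalizes by the 1990 population. The function names are hypothetical, grids are plain nested lists, and the handling of unpopulated cells (returned as `None`) is an assumption; the study itself used the Raster package in R.

```python
def net_migration(pop_start, pop_end, births, deaths):
    """NM = total population change - (births - deaths)."""
    return (pop_end - pop_start) - (births - deaths)


def aggregate_sum(grid, factor=10):
    """Sum a 2D grid over non-overlapping factor x factor windows."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            window = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            ]
            out_row.append(sum(window))
        out.append(out_row)
    return out


def per_capita(nm_grid, pop_grid):
    """Net-migration relative to the initial population of each cell.
    Cells without population are returned as None (an assumption)."""
    return [
        [nm / pop if pop > 0 else None for nm, pop in zip(nm_row, pop_row)]
        for nm_row, pop_row in zip(nm_grid, pop_grid)
    ]
```

For example, a cell whose population grew from 1000 to 1100 with 50 births and 20 deaths has a net-migration of 70, i.e. it is a net-positive cell.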

Interplay and importance of environmental and societal factors
We extend the conceptual typology introduced in figure 1 to a quantitative tool by using the composite indicators of AC and ES (Varis et al (2019b); see above) to describe the relationship of environmental and societal factors driving migration (figure 1). Accordingly, we created a four-by-four classification matrix representing the interplay at net-negative and net-positive migration locations (figure 3), with four levels (low, medium-low, medium-high and high) of AC and ES defined by the breaks 0, 0.25, 0.5, 0.75 and 1. This framework was applied to both origins (net-negative migration) and destinations (net-positive migration) in order to define the interplay between AC and ES as the underlying conditions of migration at both ends. The matrix was used to calculate the sum of net-negative and net-positive migration in each class (e.g. total net-negative migration in class 1 would be the sum over all net-negative grid cells within that class). Then the share of each class was calculated as the ratio to the total (global) net-negative/positive migration (sum of all net-negative/positive grid cells globally). Calculations were done by using the zonal tool in the Raster package in R (Hijmans 2019).
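The classification and the class shares can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code: the function names are hypothetical, values exactly on a break are assigned to the upper level by assumption, and the class numbering (c1 to c16, rows running from high to low ES and columns from low to high AC) is inferred from the class descriptions in the text, not taken from figure 3.

```python
LEVELS = 4  # low, medium-low, medium-high, high


def level(value):
    """Map an index in [0, 1] to a level 0..3 using breaks 0.25, 0.5, 0.75."""
    return min(int(value / 0.25), LEVELS - 1)


def class_id(ac, es):
    """Class number 1..16 in the four-by-four AC x ES matrix.
    Numbering is an assumption: c13 = lowest ES and lowest AC,
    c4 = highest ES and highest AC."""
    return (LEVELS - 1 - level(es)) * LEVELS + level(ac) + 1


def class_shares(cells):
    """Share of total migration per class.
    `cells` is an iterable of (ac, es, migration) tuples, all of the same
    sign (either the net-negative or the net-positive grid cells)."""
    totals = {}
    for ac, es, migration in cells:
        cid = class_id(ac, es)
        totals[cid] = totals.get(cid, 0.0) + migration
    grand_total = sum(totals.values())
    return {cid: total / grand_total for cid, total in totals.items()}


# Three hypothetical net-negative cells (AC, ES, net-migration count).
example = [(0.3, 0.6, -30.0), (0.6, 0.6, -10.0), (0.3, 0.3, -60.0)]
shares = class_shares(example)
```

Under the assumed numbering, the three example cells fall into c6, c7 and c10, and their shares sum to one.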
Random forest regression was utilized to quantify the independent importance of each variable (table 1) in explaining both net-negative and net-positive migration. Random forest regression is a machine learning algorithm that uses an ensemble of multiple bootstrap sample predictions (decision trees) to produce a consensus regression fit (Breiman 2001). This technique is suitable for identifying and ranking endogenous explanatory factors underlying migration decisions (Schutte et al 2021). It is also applicable to data with collinear explanatory variables and unique probability distributions, as the method randomly splits or bags the data into multiple samples (and out-of-bag samples, i.e. the data left out of each sample), each containing only a subset of variables, i.e. potentially correlated variables are not represented in all decision trees (Cutler et al 2007). The importance of each variable describes the increase in prediction error (MSE from the out-of-bag sample) when the values of that variable are randomly permuted. High importance denotes high explanatory power in that specific model, while negative importance indicates that the variable weakens the model's prediction power. Ultimately, relative feature importance (RI) is used to illustrate and rank how well a given feature predicts migration in relation to the best feature, which has RI = 1.
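The permutation-based importance and its conversion to RI and ranks can be illustrated with a small Python sketch. It deliberately does not fit a random forest: a fixed stand-in predictor replaces the fitted model, and the error is evaluated on the supplied sample instead of the out-of-bag sample, so it only demonstrates the permutation principle. All names are hypothetical, and returning zeros when no feature has positive importance is an assumption.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, seed=0):
    """Increase in MSE when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(y, [predict(r) for r in permuted]) - baseline)
    return importances


def relative_importance(importances):
    """Scale importances so that the best feature has RI = 1."""
    best = max(importances)
    if best <= 0:
        return [0.0 for _ in importances]  # no informative feature (assumed handling)
    return [imp / best for imp in importances]


def rank_features(importances):
    """Rank 1 = most important feature, rank n = least important."""
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    ranks = [0] * len(importances)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Toy data: the response depends on feature 0 only.
rows = [[float(i), float((7 * i) % 10)] for i in range(10)]
y = [2.0 * r[0] for r in rows]


def predict(row):
    """Stand-in for a fitted model (not a random forest)."""
    return 2.0 * row[0]


imp = permutation_importance(predict, rows, y)
ri = relative_importance(imp)
ranks = rank_features(imp)
```

In this toy case, permuting the unused second feature leaves the error unchanged, so its importance is zero, whereas the first feature receives RI = 1 and rank 1.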
Country-specific regression models were created for relative net-negative (per population; 178 countries) and net-positive migration (per population; 178 countries) observations (response variables) and respective individual variables of AC and ES (explanatory variables) with the Ranger-package in R (Wright and Ziegler 2017). Regression was conducted for each country individually, as it represents a highly relevant scale for policy making. Grid cell values for both response and explanatory variables within each country were extracted and then used as individual observations for each model.

Interplay of AC and ES
Our analysis shows that in 1990-2000, the majority of net-negative and net-positive migration occurred in areas characterized by high ES. Globally, 58% of the total net-negative migration took place in areas with medium-low to medium-high AC and ES. Further, 32% of global net-negative migration originated in just one class (c6), with medium-high to high ES but medium-low AC (figure 4(a)), while the neighboring classes c7 (with higher AC) and c10 (with lower ES) together accounted for 27% of global net-negative migration.
Although the majority of global net-negative migration was concentrated in densely populated areas (35% of the world's population lived in c6, c7 and c10 in 1990), the migration-to-population ratio shows a slightly different pattern. For instance, the net-negative migration-to-population ratio (total net-negative migration per population per class) in the above-mentioned c6 was very low, around 69 emigrants per 1000 inhabitants, compared to the highest net-negative ratio of 5860 emigrants per 1000 inhabitants in c13, with the globally lowest ES and AC (figure 4(b)); however, the populated areas in c13 represent a very small share of global land and population, as they include only a handful of cells, e.g. in rural Kenya and Afghanistan (see figure 3).
The classes accommodating the majority of global net-positive and net-negative migration were characterized by similar profiles (figure 4). A total of 80% of global net-positive migration took place in five classes, of which c7 alone accommodated 22% of global net-positive migration (figure 4(c)). Yet, the median net-positive migration-to-population ratio across all observations in c7 was only 96 immigrants per 1000 inhabitants. The highest net-positive migration-to-population ratio was found in c3, with 147 immigrants per 1000 inhabitants (figure 4(d)).

Relative importance of explanatory variables
The analysis of the variables' importance and explanatory power highlights the following three points. Firstly, Ethiopia, Georgia, Jordan, Bangladesh, Democratic Republic of Congo and Papua New Guinea stood out with the strongest explanatory power for net-negative migration (R² = 0.63, 0.61, 0.58, 0.52, 0.51 and 0.5, respectively), compared to moderate global predictions (global median of R² = 0.17) (figure 5). In terms of net-positive migration, explanatory power was moderately strong (R² > 0.50) in ten countries (e.g. R² = 0.72 in Tanzania, 0.67 in Eritrea, 0.66 in Guyana, 0.58 in Mali), while the global median remained very low (global median R² = 0.14). Notably, the selected variables could not explain any of the net-negative migration in 14% of all countries, or any of the net-positive migration in 28% of the countries (R² = 0). See figure S5 for the overall out-of-bag prediction error for each model.
Secondly, income level was the key determinant for both net-negative (figure 6(a)) and net-positive migration (figure 7(a)), illustrating a globally shared feature importance even when other societal and environmental factors were included in the models. Given that the income data were downscaled with night-lights data, this also indicates a strong effect of urbanization. In other words, income was the best variable in describing the internal variation of both net-positive and net-negative migration across the low to high income gradients in more than half of the countries (58% and 60% of the countries for net-positive and net-negative migration, respectively).
Notably, education and health were the second most important societal features, ranking highest in 8% and 6% of the countries in terms of net-negative migration, respectively (figures 6(b) and (c)). Importantly, the global median RI of education (global median RI = 0.41) and health (global median RI = 0.39) in explaining net-negative migration was about 40% of that of the most important factor, income level (global median RI = 1.00), and higher than the global median RI of any of the environmental variables (figures 8(a) and S6). To mention a few, education was the most important feature in Kyrgyz Republic (absolute feature importance AFI = 736; [...]
[Figure caption, net-negative migration (N = 178): Importance of each feature on net-negative migration is ranked so that the most and least important variables in each country's model are assigned values 1 and 8, respectively. The higher the importance, the better the variable is in explaining net-negative migration in each country.]
In terms of net-positive migration, health was the most important determinant after income, ranking highest in 8% of the countries, while education ranked highest in only 4% of the countries (figures 7(b) and (c)). Yet, the global median RI of education and health was around a third (RI = 0.34 and 0.32, respectively) of that of income level (RI = 1.00) (figures 8(b) and S7). To mention a few, health was the best variable in Madagascar (AFI = 67; R² = 0.28; MSE = 47), India (AFI = 16; R² = 0.39; MSE = 1.7) and Lao (AFI = 12; R² = 0.36; MSE = 4.9) for net-positive migration.
[Figure caption, net-positive migration (N = 178): Importance of each feature on net-positive migration is ranked so that the most and least important variables in each country's model are assigned values 1 and 8, respectively. The higher the importance, the better the variable is in explaining net-positive migration in each country.]
As expected, governance ranked the lowest in explaining both net-negative and net-positive migration; the governance data were at the country level and thus do not explain variation within a country well. See figure S9 for country-specific results regarding absolute feature importance.
Thirdly, another common feature is that slow-onset environmental stressors and natural hazards were globally the dominant environmental variables in explaining net-negative and net-positive migration in almost all country groups (figures 6(b), 7(b), and 8). DR and natural hazards ranked the highest in explaining net-negative migration in 7% of the countries each (figures 6(g) and (h)). DR was the best feature in Iraq (AFI = 6278, R² = 0.33, MSE = 3977) and Libya (AFI = 0.008, R² = 0.37, MSE = 0.01), while natural hazards ranked the highest in Georgia (AFI = 248, R² = 0.61, MSE = 111) and Mali (AFI = 15, R² = 0.30, MSE = 11), to mention a few (see figure S8 for country-specific results). Yet, the global median RI of DR and natural hazards was less than 30% (global median RI = 0.28 and 0.21, respectively) of that of the most important variable, income (RI = 1.0) (figure 8(a)), indicating that their importance in relation to the most important variable was relatively low in the countries where the variables did not rank the highest (figures 6(g) and (h)). The importance of WR and food production was lower, as each was the best variable in only 6% and 4% of the countries, respectively.
In terms of net-positive migration, WR was the best variable in 9% of the countries, with the global median RI being about one third (RI = 0.3) of that of income (RI = 1.0). Notably, the global median relative importance of DR was higher, 37% of the best feature, indicating that it had moderate importance even when not ranking as the best feature (figures 7(e) and 8(b)). Natural hazards ranked highest in 8% of the countries, including Libya (AFI = 36, R² = 0.20, MSE = 53), Kenya (AFI = 1.2, R² = 0.22, MSE = 2.6) and Lesotho (AFI = 0.36, R² = 0.41, MSE = 0.28) but also Norway (AFI = 7.3, R² = 0.19, MSE = 5.3), where the conditions regarding the risk of natural hazards as well as AC range from low to high (see figure S9 for country-specific results). FPS ranked highest in 5% of the countries, with the global median RI being 0.1.

Importance of societal factors on environmental migration
The majority of global migration in our study period occurred in areas with a risky combination of high ES and low to medium AC. Income level was the key factor in explaining net-migration, interestingly across the global income groups from low to high. Slow-onset environmental variables, drought and WR, had the highest importance amongst ES variables for both net-positive and net-negative migration, especially in dry regions like South and East Asia and North Africa. Here net-positive refers to situations where in-migration exceeds out-migration, while net-negative refers to situations where out-migration exceeds in-migration. Our global synthesis with 16 classes successfully illustrated the spatial heterogeneity of the different factors underlying migration and their interplay. While the global prediction power with the selected factors was moderate, we were able to identify geographical heterogeneities of migration patterns.
A clear majority of global net-negative migration originates from environmentally stressed and hazardous areas (in agreement with de Sherbinin et al (2012)) with medium-low to medium-high ES and a medium level of AC. This aligns with the previous literature showing that environmental migration is more common among middle-income countries, not among the poorest or the richest (Cattaneo and Peri 2016, Hoffmann et al 2020). Our results indicate that income level, followed by DR and education, has a primary importance in explaining net-negative migration in areas with high ES (figures 6 and 8(a)). In fact, in line with our finding, Neumann and Hermans (2017) observed economic and social aspects to be the predominant reasons for out-migration, whereas environmental factors, such as droughts, were found to drive migration indirectly through 'economic deterioration' in areas like the Sahel. Our results suggest that environmental pressures alone are unlikely to cause migration through simple linear linkages, despite the fact that the presence of environmental pressures in the sending areas of migration is evident (Black et al 2011a, 2011b, de Sherbinin et al 2012, Neumann et al 2015, Abel et al 2019). The role of the environment in driving migration should thus be investigated critically (Murphy 2015, Betts and Pilath 2017, Boas et al 2019), and socioeconomic variables should be factored into attempts to quantify environmental migration. We found that the majority of global net-positive migration was characterized by high ES and a medium level of AC (figure 4(c)). This finding is in line with the empirical evidence that both voluntary and forced migration tend to occur between neighboring countries or within the same region (Sander 2014, Abel et al 2019). African migrants, for instance, predominantly move within Africa, so the high ES observed in the destinations may reflect the fact that most migration is short-distance.
The characteristics of the destination areas, on the other hand, have received less attention in the environmental migration nexus literature (Cattaneo and Peri 2016, Hoffmann et al 2020). A combination of high ES and low-to-medium capacity potentially exposes migrants to a twofold risk at both origin and destination: firstly, they are also exposed to numerous social and ecological vulnerabilities in the destination (de Sherbinin et al 2012, Adri and Simon 2018), and secondly, such conditions might prevent people with low capabilities from moving to a more desired location or relocating back to their origin (Ayeb-Karlsson et al 2020). Environmental hazards combined with numerous inadequacies in terms of human development, economy and governance may trap incoming migrants with increasing vulnerabilities (Ayeb-Karlsson et al 2020) and thus hamper the positive gains from migration.
Although our global analysis does not distinguish between rural and urban areas in terms of origins and destinations of migration, our income data capture the importance of regional disparities in producing migration. These data were downscaled from sub-national income data to 5 arcmin (ca 10 km at the equator) resolution by using night lights and agricultural land use data, and thus illustrate the difference in income levels between rural and urban areas within a country. Considering the importance of income in explaining both net-negative and net-positive migration, it is likely that it is the difference between the income levels of the origin and destination areas that explains migration, instead of income itself. This finding aligns well with the classic gravity-model theories of migration (Lee 1966, de Haas 2011).
In the coming decades, African countries, in particular, are expected to experience fast urbanization resulting from a combination of natural population growth and in-migration driven by the disparities between rural and urban areas (Awumbila 2017, Farrell 2018). Rapidly expanding urban areas with low capacity in terms of income level, governance and basic services, in particular, tend to generate informal settlements that often function as 'waiting rooms' for incoming migrants with low capabilities (Tacoli et al 2015, Niva et al 2019, Andrews 2020). Meanwhile, the population living under water stress is expected to grow by half or even to double in the coming decades due to climate change (Munia et al 2020). In fact, there is already some evidence showing that some urban agglomerations are facing a dual risk from both droughts and floods (Cai et al 2018). Notably, our results show that drought and WR had the highest or second highest importance in explaining net-positive migration in numerous areas with low-to-medium AC and high ES, reflecting the evidence from other studies as well as pointing to further research needs; future studies should pay greater attention to the conditions of the places people move to (Findlay 2011, Ayeb-Karlsson et al 2020), especially in urban destinations.

Limitations of this study
This work has analysis- and data-related limitations commonly faced in global analyses. Firstly, the results are prone to uncertainty, because the migration data obtained from de Sherbinin et al (2015) are themselves a product of modeling: the original migration dataset contained a minor built-in error of around −400 000 migrants (ca 0.1% of global net-migration). The same issue applies to the environmental data, many of which are originally modeled (water stress, SPEI index and natural hazards) and may thus contain and result in inaccuracies, especially in remote locations.
Secondly, while our global analysis was conducted on a high-resolution grid, it should be noted that the net-migration data used here represent the world in the past. Here, the dataset from de Sherbinin et al (2012) at 10 km spatial resolution was selected over a recent net-migration dataset by Alessandrini et al (2020). While the Alessandrini et al (2020) data have a fine temporal resolution, they use only gridded national values at a coarse spatial resolution (25 km) instead of downscaled sub-national values, as done in de Sherbinin et al (2015). Notably, although we utilized the best available data for building our indicators, water stress and FPS were composed of data from varying years.
Thirdly, the explanatory variables explained at most 60% of the variance in any of the models, and notably, income systematically outperformed all other variables across the globe. While this aligns with many studies highlighting the role of income as a primary driver of migration, the results may be biased. The income data were downscaled to grid level using a proxy for the rural-urban division (see supplement), and may thus override other variables that were gridded from sub-national data. Moreover, some of the indicators used here (NH, WR, FPS) comprise multiple indices and thus do not provide information on the importance of their individual components for migration.
It should also be noted that studying a complex phenomenon such as migration with quantitative indices is prone to uncertainty, as global indicators and data cannot capture decision-making processes at the individual level or in very small countries. Although the population living in countries with 20 or fewer cells is only 0.1% of the global population, it can be presumed that the data do not fully capture migration dynamics in microstates such as Liechtenstein or Andorra. Moreover, our data only illustrate net-migration and thus do not separate voluntary from forced migration. While it is not entirely possible to make a clear-cut distinction between forced and voluntary migration, since migration decisions always involve a certain degree of volition (Erdal and Oeppen 2018), different types of migrants are protected by different bodies of international law as well as by non-legally binding best practices and principles (Martin 2018). Therefore, in practice, migration policy and regulations need to distinguish between types of migration, which unfortunately is not possible with the net-migration data used here.
Nevertheless, our analysis does tap into various indicators such as governance, education and health that have previously been identified as fundamental in reducing vulnerability and enhancing AC (Lutz et al 2014, Andrijevic et al 2020). The novel machine learning approach, which helps identify the importance of each variable in explaining migration, thus allows for pinpointing which societal factors are highly relevant and can be used as empirical grounds in policy-making processes. Furthermore, our analysis provides useful insights into the relationships between the variables used as well as into the variation of relative feature importance in terms of migration globally, by country groups, and by similarity classes. That the variables showed very different levels of explanatory power between neighboring countries indicates that the selection of variables for future studies is sensitive to location.
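The comparison of relative feature importance across countries described above can be illustrated with a minimal sketch. It assumes that each national-level model yields one importance score per variable; the country labels, variable subset, numbers and function names below are hypothetical and are not results or code from this study. Each country's scores are scaled by that country's most important variable, after which the median across countries is taken.

```python
from collections import Counter
from statistics import median

# Hypothetical per-country feature importances (illustrative values only,
# not results from the study).
IMPORTANCES = {
    "A": {"income": 0.50, "education": 0.20, "health": 0.15, "drought": 0.15},
    "B": {"income": 0.30, "education": 0.35, "health": 0.20, "drought": 0.15},
    "C": {"income": 0.40, "education": 0.10, "health": 0.30, "drought": 0.20},
}


def relative_importance(scores):
    """Scale one country's scores so that its top variable equals 1."""
    top = max(scores.values())
    return {name: value / top for name, value in scores.items()}


def median_relative_importance(by_country):
    """Median of the scaled importance of each variable across countries."""
    scaled = [relative_importance(s) for s in by_country.values()]
    variables = list(scaled[0])
    return {v: median(country[v] for country in scaled) for v in variables}


def top_variable_counts(by_country):
    """Count in how many countries each variable ranks first."""
    return Counter(max(s, key=s.get) for s in by_country.values())


if __name__ == "__main__":
    print(median_relative_importance(IMPORTANCES))
    print(top_variable_counts(IMPORTANCES))
```

Scaling by the top variable within each country makes the scores comparable between national models whose absolute importance values differ, which is one way to read statements such as a variable having a given share of the importance of the leading factor.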

Ways forward
Our results and limitations partly reflect the availability, accuracy and development needs of migration and socioeconomic indicator data. Demand for high-resolution spatiotemporal data on detailed subnational net-migration is urgent. To our knowledge, there are only two gridded datasets of global net-migration, both of which compromise on either temporal or spatial scale and on the scale of the input data (national vs sub-national) (see section 4.2). This significantly hinders the production of accurate and comparable spatiotemporal estimates of migration. For instance, the simplistic narratives of mass-migration fluxes and the portrayal of migration as a security hazard have been repeated in both research and policy-making for decades (Boas et al 2019), but data for investigating these recent developments lag behind.
Notably, identifying the local characteristics underlying migration is equally difficult. Globally comparable fine-scale socioeconomic data are scarce, and sub-national data typically require downscaling if a more refined scale is desired. For instance, education, governance and health were outperformed by the downscaled and spatially more detailed income data in explaining net-negative and net-positive migration. We thus call for high-resolution spatiotemporal data for producing consistent and up-to-date predictions of human migration and its conditions globally.

Conclusions
We provided a global assessment of the interplay of environmental and societal characteristics underlying migration in sending (negative net-migration) and receiving (positive net-migration) areas by creating a novel classification matrix. Furthermore, we assessed the importance of eight environmental and socioeconomic indicators for net-negative and net-positive migration at national scale using a machine learning method. Our findings extend the current knowledge on three fronts:
• Within the study period 1990-2000, the majority of global net-negative and net-positive migration was concentrated in areas with rather similar profiles: a combination of low-to-medium adaptive human capacity, medium-to-high ES, and a low migration-to-population ratio.
• Income outperformed all other variables in circa half of both sending and receiving areas. Education and health were also significant local factors in explaining migration, especially net-negative migration, with a global median importance of around 40% of that of the most important factor, income. DR and WR had the highest importance among environmental variables globally.
• The combination of the novel matrix approach, an ensemble of national-level models, and machine learning methods allowed us to identify new global patterns in both net-positive and net-negative migration, thus significantly improving the knowledge on important drivers of in- and out-migration.
Finally, we highlight the urgency of adopting integrative approaches more rigorously in the quantitative analysis of the environment-migration nexus. A phenomenon that is ultimately based on individual human decision-making simply cannot and should not be studied without including the societal dimension: human capacity and agency. To study the complex causalities between migration and its underlying conditions further in both research and policy-making, it is urgent to produce detailed and timely spatiotemporal data on migration and its drivers. At a time when environmental vulnerabilities are surging, it is indeed fundamental to understand how human populations respond and adapt to them.
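The classification-matrix idea summarized in these conclusions can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: AC and ES are assumed to be indices scaled to the range 0 to 1, the cut-offs at 1/3 and 2/3 are hypothetical, and the grid cells are invented. Each cell is assigned to an (AC level, ES level) class, and the share of net-negative and net-positive migration falling in each class is computed.

```python
from collections import defaultdict


def level(value, low=1 / 3, high=2 / 3):
    """Map an index value in [0, 1] to a class (hypothetical cut-offs)."""
    if value < low:
        return "low"
    if value < high:
        return "medium"
    return "high"


def migration_shares(cells):
    """Share of net-negative and net-positive migration per (AC, ES) class.

    `cells` is an iterable of (ac, es, net_migration) tuples, one per grid cell.
    """
    totals = {"negative": defaultdict(float), "positive": defaultdict(float)}
    for ac, es, net in cells:
        if net == 0:
            continue  # cells without net migration belong to neither group
        direction = "positive" if net > 0 else "negative"
        totals[direction][(level(ac), level(es))] += abs(net)

    shares = {}
    for direction, by_class in totals.items():
        total = sum(by_class.values())
        shares[direction] = (
            {cls: amount / total for cls, amount in by_class.items()}
            if total
            else {}
        )
    return shares


# Invented example cells: (adaptive capacity, environmental stress, net migration)
EXAMPLE_CELLS = [
    (0.2, 0.8, -300),
    (0.5, 0.5, -100),
    (0.9, 0.1, -100),
    (0.2, 0.9, 200),
    (0.8, 0.2, 600),
]

if __name__ == "__main__":
    for direction, by_class in migration_shares(EXAMPLE_CELLS).items():
        print(direction, dict(by_class))
```

Treating sending and receiving cells separately mirrors the distinction between net-negative and net-positive migration made throughout the analysis, so that the shares in each direction sum to one.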

Data availability statement
The data that support the findings of this study are available in Zenodo: 10.5281/zenodo.5562038. The code needed to replicate the analysis is available at: https://github.com/VenlaN/global-migrationinterplay.