Modeling and predicting evacuation flows during hurricane Irma

Evacuations are a common practice to mitigate the potential risks and damages made by natural disasters. However, without proper coordination and management, evacuations can be inefficient and cause negative impact. Local governments and organizations need to have a better understanding of how the population responds to disasters and evacuation recommendations so as to enhance their disaster management processes. Previous studies mostly examine responses to evacuations at the individual or household level by using survey methods. However, population flows during disasters are not just the aggregation of individuals’ decisions, but a result of complex interactions with other individuals and the environment. We propose a method to model evacuation flows and reveal the patterns of evacuation flows at different spatial scales. Specifically, we gathered large-scale geotagged tweets during Hurricane Irma to conduct an empirical study. First, we present a method to characterize evacuation flows at different geographic scales: the state level, considering evacuation flows across southern states affected by Irma; the urban/rural area level, and the county level. Then we demonstrate results on the predictability of evacuation flows in the most affected state, Florida, by using the following environmental factors: the destructive force of the hurricane, the socioeconomic context, and the evacuation policy issued for counties. Feature analyses show that distance is a dominant predictive factor with counties that are geographically closer generally having larger evacuation flows. Socioeconomic levels are positively related to evacuation flows, with popular destinations associated to higher socioeconomic levels. The results presented in this paper can help decision makers to better understand population evacuation behaviors given certain environmental features, which in turn will aid in the design of efficient and informed preparedness and response strategies.


Introduction
potentially under the impact of disasters may choose to evacuate voluntarily or under the guidance of evacuation policies issued by local governments, so as to avoid life-threatening risk, reduce personal injuries and damages to property [3]. Nevertheless, due to the limited understanding of population evacuation behaviors, local governments may fail to provide proper guidance or management strategies, which can lead to inefficient evacuations or even cause negative consequences to surrounding areas. For example, the evacuated population might crowd to certain routes and cause severe traffic jams that hinder the evacuation process [4]; or evacuations might fail to provide sufficient supplies for the daily necessities of evacuees [5].
It is critical to have a comprehensive understanding of the evacuation behaviors for governments and organizations to plan, support, and respond to the evacuations. Local governments would benefit from a better understanding of how residents respond to evacuations so as to adjust the policies that aim to support vulnerable communities [6]. A better understanding of the possible evacuation destinations lead to more efficient logistic supply and effective humanitarian relief operations [7]. Traffic management also benefit from an accurate foresight of the evacuation flows during disasters [8].
Evacuation behaviors have been studied from different perspectives, however few have examined the relation between evacuation behaviors and geographic variances in the context, i.e. what meteorological or socioeconomic features affect evacuees' decision making. Survey studies have tried to explain evacuees' decision making processes at the individual or household level using their personal characteristics [3,9,10]. For example, Perry showed that evacuation destinations are related to the geographic distribution of one's social network, i.e. people tend to move to places where their friends or family are [3]. Simulation methods usually take these behavioral findings to model evacuation flows, so as to find optimal choice of evacuation routes and shelters [11,12]. Few of these models incorporate mobility information from actual evacuations. Large-scale spatiotemporal data such as GPS, cell phone data and geotagged social media data enable researchers to model population evacuation behaviors at societal scales [13,14]. Most of recent studies aim to reveal the common patterns of evacuation trajectories, for example the distance and regularity of evacuees' mobility trajectories [14,15]. However, these studies fail to provide detailed evacuation behaviors with specific spatial information, such as evacuation destinations, which is necessary to improve evacuation management. In addition, few studies have looked into the geographic factors that might affect evacuation behaviors, i.e., the choice of an evacuation destination can be related to the environmental characteristics of the destination. The geographic variances may explain whether people evacuate and where they evacuate to.
In this paper, we aim to model and predict evacuation flows across different geographic areas, and to examine environmental indicators that contribute to variances in the volumes of evacuation flows. To do so, we will use geotagged tweets, which have been used in the past to successfully model large-scale displacement and dispersion across intra-city, metropolitan, and inter-city movements [16]. With continuous geotagged tweets, we can approximate the spatiotemporal trajectories of users, therefore inferring their movements across geographic areas during a disaster. Although our findings will mostly reveal disaster behaviors across the Twitter population, it is important to highlight that surveys during disasters also suffer from biases in the population that responds to them [17]. We expect our findings to be analyzed in combination with survey results, rather than separately, in an attempt to build a comprehensive picture of the actual evacuation events for different population groups.
Specifically, we look into the case of Hurricane Irma, a catastrophic hurricane that reached category 5 intensity in 2017. Starting from August 30th, the National Hurricane Center observed the formation of Hurricane Irma and issued notice. With the forewarning, areas that were potentially threatened by the hurricane, including large areas of Florida, and partial areas of Georgia and South Carolina issued evacuation orders or emergency declarations as coping strategies [18]. It is estimated that more than 6 million people evacuated from their homes [19]. We collected geotagged tweets of the southern states of US for months that include different phases of Irma. We model the evacuation flows at three geographic levels: across states, across urban and rural areas of states, and across counties of Florida. Then, we build models to predict evacuation flows across counties using environmental features that include the hurricane's destructive force, the socioeconomic context, and the government evacuation policies. Lastly, we identify the important factors in evacuations by examining how these factors affect the prediction results.
The contributions of this paper are: • A method to model the evacuation flows across different geographic areas with geotagged tweets, which might be indicative of population evacuation patterns during Hurricane Irma. • A model to predict the evacuation flows based on environmental features including meteorological indicators, socioeconomic status, and evacuation policies. Results show that our model can largely explain the variances in evacuation flows. • An analysis of the impact of features in the predictive model, which shows how the variances in certain factors are related to the volumes of evacuation flows. It provides a way to quantify the impact of factors on predicted evacuation flows, as opposed to qualitative analysis based on stated information gathered by survey methods. In the following sections, firstly we summarize the previous studies on evacuation behaviors and show that the research question we work on has rarely been studied. Then we present the background information on Hurricane Irma and describe the dataset and methods for modeling evacuation flows at different geographic scales. In the last section, we present our results and discuss the implications as well as the limitations of this study.

Related work
Evacuation behaviors have been studied from three perspectives: (i) survey-based methods, to understand the decisions made by individuals and households during a evacuation; (ii) simulations models, to optimize the evacuation routes or resource strategies with respect to road networks, supplies or shelters; and (iii) data-driven models, to understand the patterns of human mobility behaviors during evacuations.
Evacuations involve several consecutive decisions, for example, whether to evacuate or stay, or when and where to evacuate. Previous studies have looked into the factors that relate to these decision-making processes using surveys. Perry built a social-psychological framework on individuals' decisions to evacuate, which suggests three major variables: the presence of an adaptive plan, the individual's belief in the perceived threat, and the level of perceived risk [3]. Following this work, researchers investigated how these factors influence decisions to evacuate [20], the choice of evacuation departure time [21,22], and evacuation destinations [23]. Individual and household characteristics are usually analyzed in the decision-making models, including the demographics, socioeconomic levels, evacuation notices received, and social relations. Survey-based studies provide a comprehensive understanding of how individuals or households make decisions on evacuations during disasters. Nevertheless, the population evacuation results, i.e., evacuation flows, are not results of the aggregated effect of individuals' or households' decision-making, but rather results of a combination of complex individual and community-level interactions [2]. In this study, informed by the survey-findings, we look into factors that relate to the population evacuation behaviors at societal scale.
Simulation models have been widely used to characterize population evacuation behaviors for logistics coordination and disaster relief. These models are generally built on assumptions about how people choose evacuation routes, for example, the optimal routes are the ones incur least cost [11]. These assumptions are used to find the optimal locations of shelters [12] or to plan for traffic management [24]. However, simulation models usually simplify the computation of evacuation flows and fail to incorporate data of actual evacuations. We propose a method to characterize evacuation flows based on spatiotemporal data released by people who evacuated in a disaster. The purpose is to have a better understanding of actual evacuation flows and be able to predict the evacuation flows based on geographic variances. Such models can be potentially applied to improve the accuracy of simulation models.
Large-scale digital traces as GPS traces, cell phone records, and geotagged tweets, have become critical sources of information to study human mobility behaviors during natural disasters. Lu et al. used cell phone records from 1.9 million residents during the Haiti earthquake to characterize population mobility trajectories. They showed that despite large variances in travel distances, the trajectories and movement patterns are predictable with social bonds [7]. Song et al. used GPS data to model population mobility patterns after a disaster [13]. Song et al. found that there are basic laws that can explain the variances in mobility distance and evacuation destinations [25]. Previous studies have shown geotagged tweets can be reliable sources to model movement within and across cities [16,26]. Wang and Taylor used geotagged tweets to quantify human mobility distances before, during, and after disasters of hurricanes and typhoons. They found that trips follow the Levy-Walk model where short-distance trips take the majority [27,28]. Most of the studies aim to characterize mobility patterns in terms of the distribution of mobility distances or the perturbations in mobility directions. Few have used large-scale spatiotemporal data to characterize evacuation flows at different geographical scales. Hara and Kuwahara used GPS data to model the traffic flows of a large city in Japan after disaster [4]. However, GPS data is not always available due to privacy or company regulations. We uses Twitter data to model the evacuation flows in a large geographic area across states and to incorporate contextual features to examine potential factors on the prediction of population evacuation behaviors.

Phases of hurricane Irma
To model evacuation behaviors, we identify the following phases of Hurricane Irma based on its tracking positions [19].
Pre-hurricane phase, which is the period of two weeks before the warning of the hurricane. It is assumed that most of the people stay in a permanent location during this period (possibly their residence).
Hurricane preparation phase, which lasts from August 30th to September 9th. On August 30th, the National Hurricane Center observed the formation of Hurricane Irma in the Atlantic Ocean and issued warnings about the potential incursion into the US. We extend the time range to one day before the hurricane makes landfall in the US.
Hurricane landfall phase, which is the time when Hurricane Irma lands on US territory and decreases to tropical storms in the state of Georgia. It lasts from September 10th to September 11th. We extend this phase with one day since related work has shown that evacuees can stay a minimum of one night in their evacuation (landfall) zones prior to re-entry [29].
Hurricane recovery phase, which is two weeks after the hurricane's landfall. After the hurricane leaves, the evacuees start to return to their pre-hurricane locations. Since reentry time varies across people, part of the evacuees are probably still traveling in this phase.
Post-hurricane phase, which is two weeks after the hurricane recovery. People are assumed to have returned back to their pre-hurricane locations according to previous studies on evacuation behaviors [17,27]. Figure 1 shows the tracking positions of hurricane Irma collected by the US National Oceanic and Atmospheric Administration (NOAA) [19]. The timeline below shows how we define the phases of Irma in US territories.

Data collection
We aim to use geotagged tweets from six US southern states to model the evacuation flows during Irma. For that purpose, we purchased all historical tweets through the Twitter Enterprise Search API which allows to acquire tweets within specific geographical boundaries and temporal ranges. Next we provide further details about the tweets acquired. Spatial Bounds. Figure 1 shows that Florida (FL) was the most affected state during Irma. However, due to the uncertainty in the hurricane's moving path, the states of Georgia (GA) and South Carolina (SC) were also issued notice of warning, which could have lead to evacuation behaviors. Therefore, we aimed to observe the evacuation behaviors of residents in these states. Previous studies [14,15] show that evacuation mobility follows the Levy-Walk model, where short distance trips are the majority. Thus, we add the states that share a border with FL, GA and SC; and purchased all tweets with spatial bounds set to cover the states of North Carolina (NC), Tennessee (TN), Southern Carolina (SC), Georgia (GA), Alabama (AL), and Florida (FL).
Temporal Bounds. In Sect. 3 we defined the different phases of Irma that start from August 15th to October 12th. As a result, we collected all geotagged tweets in the specified spatial bounds during this time range.
With the spatial and temporal bounds, we retrieved 9,263,399 tweets, which include tweets that are tagged with latitude and longitude (geotagged), as well as tweets that are tagged with geographic entities such as counties or neighborhoods. Since there are no common standards to regulate the entity tags people use, it is difficult to accurately map the entity tags to geolocations. Thus, we only use the 1,013,313 geotagged tweets, which were posted by 127,181 Twitter users. Figure 2 shows the number of geotagged tweets and observed Twitter users in each state. Since the geographic bounding box cannot perfectly align with the boundaries of states, there might be a few tweets from other states. In addition, a user may move across states during Irma. As a result, the total number of users shown in the figure can be larger than the number of unique users in the dataset.
Finally, we also carried out a tweet cleaning process to eliminate tweets that might not represent individuals, but rather news platforms or other social entities. For that purpose, we used the traditional approach based on ratios between followers and followees to eliminate users with extremely large numbers of followers that could represent nonpersons [30]. Specifically, we eliminated users whose number of followers and followees were two standard deviations larger than the average values. Of the 127,181 Twitter users, only 116 were detected as non-persons. This is probably due to the fact that related work has shown that news platforms and other official accounts that belong to government ad-ministrations and that report important information during disasters (weather, road or evacuation information) do not usually post geotagged tweets [31].

Methodology
Geotagged tweets can be used to estimate the spatiotemporal trajectories of users; that is, the changes of locations in different phases of Irma. We present a method to estimate the evacuation flows across different geographic areas at state and county levels. Then we build machine learning models to predict county-level evacuation flows in Florida based on environmental features that have geographic variances, and identify the key factors related to accurate predictions of evacuation flows.

Identification of evacuation flows
To understand the evacuation and re-entry behaviors of evacuees, we measure the number of individuals whose stay location changes during three Irma phases: pre-hurricane, landfall, and post-hurricane. Changes in stay locations from pre-hurricane to landfall will reflect individuals evacuating, while changes from landfall to post-hurricane will mostly reveal individuals going back to their pre-hurricane locations. Although one could argue that landfall locations might not be end destinations for some evacuees who might still be traveling, Wong et al. showed that 90% of the population in Florida areas affected by Irma went back to their homes one week after the hurricane's landfall [17]. Thus, we argue, and our results in fact will show, that changes in stay locations from landfall to post-hurricane reflect actual re-entry behaviors.
We define stay location as the place where an individual has a permanent (pre-hurricane) or temporary home (landfall and post-hurricane). To identify stay locations for all the individuals in our dataset, we require users to have geotagged tweets in at least two days within each phase, which enables us to differentiate permanent/temporary stay locations (multiple tweets from a given location during a phase) from users passing by (one tweet in a given phase). Using all the geolocated tweets within a given phase, we compute the stay location for an individual as the center of gravity of all the locations visited in that phase. The center of gravity, computed as the geographical average location of all points (see Fig. 3), has been used in many studies using location and trajectory data and is a well tested approach [33][34][35][36][37][38].
To compute the center of gravity, we first project the geographic points with latitude and longitude into the NAD83 coordination system and then compute the stay location (center of gravity) as ( . . , n) are the geotagged coordinates of a user at a given phase. The stay location is then associated to the county whose geographic boundary contains the center of gravity. We consider that the stay location has changed between two phases if it is at least 1 mile away from the previous stay location. An individual will be defined as an evacuee if her stay location changes from prehurricane to landfall. Evacuees will be either in-county evacuees, when the changes in stay locations happen within the county, or out-of-county or out-of-state evacuees when the changes take place between counties or states. We selected 1 mile as the most stringent distance threshold that will reveal the upper bound for the number of evacuees. Higher threshold values will decrease the total number of evacuees identified. Nevertheless, the lack of ground truth prevents us from selecting what could be the optimal threshold, thus reporting the upper bound instead. Upper bounds are extremely valuable for emergency preparedness and response to prepare for the worst case scenario.

Figure 3 Identification of evacuation flows based on the geographic tweets of users at different phases
We model evacuation flows by aggregating the number of evacuees from pre-hurricane to landfall phases, and re-entry behaviors by modeling the number of evacuees between the landfall and post-hurricane phases. Evacuation flows are modelled at the county level with stratification at both urban and rural levels. A discussion of the results is presented in Sect. 6.

Relaxing two-day constraint
Applying the two-day constraint to each phase significantly decreases the number of individuals in our dataset to 2339 individuals. This is specially significant for the landfall phase as it only lasted three days. In this section, we explore the impact that relaxing that constraint for the landfall phase would have on the evacuation flows identified. Specifically, we maintain the two day constraint for the pre-hurricane and post-hurricane phases, while only requiring tweets on at least one of the three days in the landfall phase to compute the stay location. Although this approach obviously increases the number of individuals considered (up to 6726), one may argue that it might include individuals who were passing by a given location rather than reflecting a stay location during the landfall. Therefore, we next present an experiment to compare the differences in the volumes of evacuation and re-entry flows when using different constraints; and we will show that relaxing that constraint has a minor impact on the final volumes.
We first apply the one-day and two-day constraints during the landfall thus getting two sets of individuals with their stay locations. Next, we compute their evacuation flows based on their stay locations during the pre-hurricane and landfall phases. We separately generate two evacuation matrices with these individuals. Each row or column of the evacuation matrix represents a county, while the values in the matrix represent the number of people evacuating from origin counties to destination counties.
We use Pearson's correlation to test whether these two evacuation matrices are linearly correlated. The results show a high correlation between these two matrices, with a coefficient of 0.9880 and p < 0.05. In addition, we compute the ranks of origin-destination pairs and apply the Wilcoxon rank test [39]. There is no significant shift in the ranks with p = 0.6326. The tests show that when we soften the two-day constraint in the landfall phase, we manage to increase the number of individuals considered while maintaining similar patterns of evacuation flows across counties. Similar tests were computed for the re-entry flows computed using landfall and post-hurricane stay locations. The two sets of individuals with the one-and two-day constraints show correlations of 0.9898 between the two re-entry matrices with p < 0.05. The rank test also showed no significant shift in the rank of pairs with p = 0.3444.
This experiment shows that both the evacuation and re-entry flow patterns are maintained if we soften the two-day constraint in the landfall phase. Since the one-day constraint enables us to have a larger number of individuals to model, we report results and conduct prediction models based on the set of 6726 individuals with a one-day constraint in the landfall phase, and two-day constraints in the pre-hurricane and post-hurricane phases. It is also important to clarify that although we require a minimum of two days of data, stay locations for over 47% of users have been computed with at least 5 days of data, and over 32% of users had data for at least 8 different days which confirms the robustness of our approach.

Prediction of evacuation flows
We propose to predict the evacuation flows across different geographic areas based on contextual characteristics of these areas. Specifically, we take into account features including: (i) physical destructive force, which is measured using meteorological indicators; (ii) socioeconomic indicators, which are obtained from census records; and (iii) evacuation policy in each county. In Sect. 6 we will present the results of the prediction models for county-level evacuation flows in Florida since that was the state with the most damage.

Features
We introduce the features and data sources for constructing these features.
Meteorological Indicators. The meteorological indicators are collected from the Global Historical Climatology Network (GHCN), which is an integrated database that summarizes climate measurements from land surface weather stations across the globe [40]. We extracted the features that depict wind intensity, including the average daily wind speed (AWND), peak fastest 2-minute wind speed (WSF2), fastest 5-second wind speed (WSF5) and precipitation (PRCP) during Irma's landfall. Since the records of WSF2 and WSF5 have a high correlation of 0.95 across weather stations in FL, we only maintain AWND, WSF2, and PRCP in our analysis. Although one could argue that other meteorological factors such as storm surge or flooding might also impact evacuation behaviors, these are factors that cannot be observed prior to the evacuation. Thus, we have identified a set of meteorological factors that can be measured pre-evacuation and that are natural precursors of storm surge or flooding, among others [40].
Census Data. Meteorological indicators and flooded areas mainly measure the variances in the physical environment. Beyond that, we also collect census data to characterize the socioeconomic status of counties. The data is collected from the United States Department of Agriculture Economic Research Service [41]. Specifically, we extracted countylevel population, poverty rate, and unemployment rate.
Evacuation Policy. According to data from Florida's Department of Emergency Management (DEM) [42], there were 54 out of 67 counties that had been issued evacuation orders in response to Irma. Forty-two counties have mandatory evacuation orders, while the remaining ones were only issued voluntary evacuation orders. We employ the evacuation policy as a categorical feature with three values: no evacuation, voluntary evacuation, and forced evacuation. This feature is taken into account to examine whether the evacuation policy has an impact on the evacuation flows. We acknowledge that several evacuation orders for hurricane Irma in Florida focused on specific zones or types of populations across counties, rather than county-wide. Nevertheless, we decided to work with countylevel orders instead to account for the potential of shadow evacuations, a phenomenon that has been widely observed during disasters whereby residents living near mandatory evacuation zones also evacuate, despite not being forced to do so [43,44].

Machine learning models
We formalize the prediction of evacuation flows using a machine learning model whereby given a pair of counties, features characterizing each individual county and their interactions, we can predict the number of people that will evacuate from one county to the other. Our machine learning model is inspired by gravity models that approximate flows between two regions as being proportional to the regional population and inversely proportional to the distance between them [45]. The machine learning model proposed in this paper predicts the evacuation flow between two counties using not only traditional gravity model features per county, but also all the features described in Sect. 5.3.1 namely meteorological indicators, census data and evacuation orders. In addition, inspired by intervening opportunities models [45], the machine learning model is also enhanced with features that characterize the interaction effect between any two given origin and destination counties, modeled using all the features mentioned before. Next we describe the model in detail.
Given a county U i , we represent it as a set of features characterizing that county i.e., where i indexes the counties it i.e., i = 1, 2, . . . , 67 since there are 67 counties in Florida; and N indexes the features in Sect. 5.3.1. In addition, for each pair of counties, we compute the joint features w k , as shown in (1). Each joint feature is computed using a gravity-inspired transformation of the feature values from origin and destination counties [46]. Specifically, we take the feature values from the origin county and the destination county, multiply the values and divide them by the square of the distance between the two counties, where distance is computed as the separation between the two county centroids (see Formula (1)).
The idea behind this approach is that the number of people evacuating from one county to another is not only dependent on the contextual features of these two counties, but also depends on the interactions between the two counties and on the distance between them.
where k indexes the features, i.e., k = 1, 2, . . . , N . The model takes features from the origin county and the destination county, and the joint features of these two to predict the number of people in each evacuation route, defined as any pair of counties in Florida. After the construction of features, we use Support Vector Machines (SVM), Random Forest (RF), and Extreme Gradient Boosting regression (XGBoost) to train models. The data is randomly split, with 75% samples as training and the rest 25% samples as testing. We use root mean square error (RMSE), R-squared, and mean absolute error (MAE) to evaluate model performance.

Two sets of features
Previous studies have shown that the mobility flow between two geographic areas is highly related to the population, as well as to the distance between these two areas [45]. If that holds for the evacuation case, features that contain population variances might be highly indicative of the evacuation flows. To assess the role that other features, besides population, might play in the evacuation behaviors, we consider a second feature representation U i that removes the population feature from U i . In the following experiments, we will compare prediction performance on these two sets of features and analyze the impact of each feature in the prediction models.

Impact of features on prediction
We propose to use SHAP values to understand how the meteorological, socioeconomic and policy factors lead to variances in the evacuation flows. SHAP values are usually used to evaluate the impact of features on the final model output, especially when applied in complex models that are hard to interpret. The method "connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions" [32]. In other words, we can analyze the dependency relations between features and predictions in terms of which feature is more helpful in achieving high predictive accuracy, and also in terms of the direction of that relation i.e., a positive relation signals that increments in a feature correspond to increments in the prediction value, while a negative relation points to decrements in prediction values when features increase. In the following experiments, we will evaluate the overall importance of features in the model and identify features that are most indicative of the predicted variances in evacuation flows. We will also compute the effect of the interaction of features on the model output.

Results
To understand the evacuation flows across different geographic areas, we use geotagged tweets from six US southern states. We first identify the stay locations of individuals during the pre-hurricane, landfall and post-hurricane phases as explained in Sect. 5.1. Next, evacuation flows will be computed as flows from the pre-hurricane stay location to the landfall stay location; while re-entry flows will be computed as flows from stay locations during the landfall phase to stay locations during post-hurricane phase. Flows are extracted counting the number of people for each pair of origin-destination locations for evacuations and re-entry behaviors. Flows will be computed and discussed at three levels: state, urban/rural areas at state level, and county level.

State level evacuation flows
First, we show the results of evacuation and re-entry flows across states. This analysis will enable preparedness and response decision-makers to have a comprehensive understanding of large-scale evacuation behaviors during disasters. We do not report results for Tennesse (TN) since the resident sample was too small (only 21 residents were detected).
Mobility across five states. We present evacuation flows across five states: Florida, Georgia, Alabama, South Carolina, and North Carolina, which in the figures are shown as FL, GA, AL, SC, and NC. Figure 4 shows the evacuation and re-entry flows across the five states for the 5307 evacuees identified in our dataset out of a total of 6726 individuals with stay locations across the three hurricane phases, as explained in Sect. 5.1. We focus on presenting the numbers of evacuation flows and use the re-entry flows, from the landfall to the post-hurricane phase, to check if there are reverse mobility trends after evacuations.
First, we observe that 85.42% of the sampled evacuees (4533 individuals) across all 5 states do not evacuate to other states i.e., they are within-state evacuees and could either be in-county or out-of-county evacuees (next section will analyze these behaviors). Similar trends were reported for an online survey by Wong et al. specifically for the state of Florida. In fact, they found that the majority of evacuees in Florida during hurricane Irma stayed within state [17], although their percentages were lower than ours: 51.4% vs. 86.5% (1747 out of 2020 evacuees in Florida). We posit that such difference might be due to selection bias since the population in Wong et al. had 81.9% of female respondents, and generally higher income and education levels when compared to the population at large in Florida e.g., only 6.5% of the survey participants have a high school degree or below compared to 42% of the total Florida population. In addition, the survey was only distributed online thus leaving out individuals without internet access or smartphone access (according to Pew Research, in 2018 only 68% of US homes had access to broadband internet service). On the other hand, Twitter population has been found to be biased towards male, younger and certain ethnic groups [47]. As a result, these demographic and socio-economic differences make it hard for the two papers to be compared, and in fact, the differences in evacuation rates could be simply due to the fact that diverse populations are being analyzed. Nevertheless, trends albeit not percentages, are comparable.
Second, of the five states, Florida and South Carolina have more people going out than coming in during the evacuation. Using the number of individuals sampled in each state during the pre-hurricane phase as Twitter residents, 13.5% of Florida Twitter residents leave to other states (with 2.4% coming in), while 32.2% of South Carolina Twitter residents leave to other states (with 12% coming in from other states). Wong et al. showed larger percentages for Florida, with 48.6% of evacuees leaving to other states [17]. We posit that these differences are highly probably related to diverse selection biases both in their online survey and in our Twitter population [47].
Alternatively, Georgia and North Carolina have more people coming in from other states. For people who evacuated from Florida, 6.3% of them have their resettlement locations in Georgia, which corresponds to 46.52% of the total number of individuals evacuating to other states. North Carolina is the second state that Florida Twitter residents chose to move to during Irma's landfall, with 1.63% of Florida Twitter residents. The re-entry flows are reversed. Especially, we observe that the most popular re-entry flows are from Georgia to Florida, and from North Carolina to South Carolina, which respectively take 6.1% of Florida residents, and 14.58% of South Carolina residents. Wong et al. also found that the most visited state for Floridians that evacuated out of state was Georgia [17].
Although Alabama is more inland and has less risk of being hit by a hurricane, there is no significant number of people going to Alabama. That state has a relatively balanced number of people going out and coming in, which respectively takes 8.7% and 9.5% of Alabama Twitter resident population.
The state-level analysis enables us to have a comprehensive understanding of the preferred states that are chosen by evacuees. The major evacuation directions are from Florida to Georgia (as reported in Wong et al. [17]) and from South Carolina to North Carolina. Generally, the direction is from South to North. Although geographically close, few people choose Tennessee and Alabama as destinations during the landfall phase.
Mobility across rural/urban areas of states. Based on the state-level analysis, the majority of people stay in the same state during the landfall phase. However, these people may move within their states, which cannot be captured by the analyses presented in the previous section. In this section, we are interested in the analysis of evacuation and re-entry flows across the urban and rural areas of states.
We employ the Census Bureau definition of urban and rural areas. Urban areas are defined as the geographical territories that have at least 1000 people per square mile in the core census blocks and 500 people per square mile for the surrounding census blocks [48]. Urban areas are usually associated with facilities to support public services [49]. We refer to the geographic boundary file from the US Census Bureau to define the urban and rural boundaries [50]. Figure 5 shows the evacuation and re-entry flows across urban and rural areas for the five states. There is a common pattern across in-state evacuations: people tend to move away from rural areas and stay in urban areas during the landfall. Three states have the highest number of people moving: (i) There are 28.7% of Florida twitter rural residents staying in rural areas, while 71.3% evacuate to urban areas. Comparatively, there are only 16.3% of urban twitter residents who move from urban to rural areas, and 83.7% of them stay in urban areas. The majority of Florida twitter residents choose to stay in urban areas in the landfall phase; (ii) 66.8% of North Carolina rural residents move to urban areas, while only 17.6% of urban residents who move to rural areas; and (iii) in Georgia, 71.8%

Characterization of evacuation flows
State-level analyses have shown that most of the residents stay in the same state, and that within states, residents mostly evacuate from rural areas to urban areas. In this section, we focus on county-level evacuation flows to model population evacuation behaviors at a finer spatial granularity. Specifically, we focus on county evacuations for the state of Florida for two reasons. First, Florida has 2486 observable Twitter residents-the largest number in our sample, of which 2020 are evacuees with 1747 in-state evacuees. Second, Florida is the most affected state since Irma's landfall took place in the Southern part of Florida. Figure 6 shows a heatmap of the county-level evacuation (a) and re-entry (b) flows in Florida. Only counties that have incoming or outgoing mobility flows are shown in the figure. Each grid corresponds to the number of people moving from their origin county to a destination county. At the county level, we find that the majority of Florida twitter residents that evacuate (62%) are in-county evacuees i.e., they stay within the same county. Results reported by Wong et al. found that only 17.1% of individuals were in-county evacuees [17]. As explained earlier, we posit that these results might differ due to the selection bias present in both their online survey and our twitter population sample.  [51] and showed that evacuees have a preference towards metropolitan areas.

Prediction of evacuation flows
In this section, we present the results of the machine learning models for the prediction of evacuation flows across counties in Florida. As our analyses have shown, the re-entry flows show reverse patterns of the evacuation flows. As a result, if we can predict evacuation flows, re-entry numbers can be directly inferred.
We employ two sets of features. Set-2 is composed of U i that removes the population feature and normalizes all other features by its corresponding population, while set-1 is composed of U i . As explained in Sect. 5.3.3, we build models with set-2 to identify how much variance in evacuation flows can be explained by factors other than population and distance.
The evacuation flows across Florida counties are highly imbalanced, with a lot of origindestination county pairs having no observed values. To mitigate this problem, we use a sub-sampling method in the model training by randomly taking out pairs of counties that do not have any observed evacuation flows. A parameter k determines the proportion of samples with observed values and samples with 0 in training. If there are n pairs of counties with observed evacuation flows, we randomly take n * k samples that do not have evacuation flows to be included in the training data. The value of k that leads to the best prediction results is chosen. K is selected from a range of 0 to 5. When k is 0, it means we only use pairs that have evacuation flows.  Figure 7 shows the averaged R-squared values across repetitive experiments for models built on two sets of features. SVM-1, RF-1, and XgbTree-1 are built on set-1 that includes population factors. SVM-2, RF-2 and XbgTree-2 are built on set-2 that removes population factors. It shows that models built on set-1 have much better performance than models built on set-2. This result implies that the population feature highly contributes to the variances in evacuation flows.
Models built on set-1 have the best performance when k is 0.5. Specifically, the model trained by SVM-1 has the best performance in terms of the average R-squared value, which is 0.7192. This result reflects that, on average, SVM-1 with features that include population factors can explain 71.92% of the variance in the evacuation flows. Table 1 shows the results of the models on set-1 for k = 0.5. Consistently, SVM-1 has better performance evaluated by RMSE and MAE.
Models built on set-2 have the best performance when k is 2. The highest averaged R-squared is achieved by XbgTree-2, which is 0.5511 and 13.24% smaller than that of XgbTree-1. These numbers show that environmental factors other than population are also predictive of evacuation flows, and that by adding population factors, the prediction performance can be increased.
For SVM-1 that has the best prediction performance, we plot the distribution of the predicted values against the real evacuation numbers across all train-test pairs (Fig. 8). This plot enables us to examine biases in the SVM-1 model. The plot shows that for large evacuation flows, the predicted values are high, while for small evacuation flows, the predicted values are correspondingly smaller. However, the predicted values for some evacuation flows tend to have larger variances across experiments, which might be due to sample bias during training.

Interpretation of the environmental features
R-squared values show that environmental factors combined with population are highly predictive of the evacuation flows across counties in Florida. In addition to that, we are interested in examining what are the most important factors for the prediction of evacuation flows other than population. We employ the SHAP indicators to evaluate the impact of features on the best model output [32].
Overall Feature Importance. Fig. 9 shows the SHAP values for features that have relatively higher importance in the prediction model. The features are ordered according to the sum of the impact over all test samples. Each SHAP value is computed per prediction task and represented as a point in the Figure. Positive SHAP values indicate that the feature tends to push the predicted value high, while negative SHAP values indicate the opposite impact.
We observe that the distance between the origin and destination counties is the most important predictor. Specifically, when the distance is short, the impact on the output value is high and positive i.e., county closeness is related to increases in predicted evacuation flows, while long distances between counties tend to have negative impact on the predicted evacuation flows. A similar relation was reported in a survey conducted by Cheng et al. post-hurricane Floyd in South Carolina in 1999 that showed that evacuees preferred to evacuate to places that they perceived as being safe and closer to home than traveling longer distances [51].
The second most important features are wind features (AWND.O, AWND.D). The wind intensity of the origin county has a positive impact, which reflects that high wind intensity is related to increases in the predicted evacuation flows. Similar results have been reported by researchers using household surveys to assess the impact of weather features on evacuation decisions [52]. Wind intensity and precipitation at the destination county (AWND.D, PRCP.D) also have positive impact in the prediction model i.e., higher values are associated to larger predicted evacuation flows. Ignoring the weather conditions at evacuation destinations might appear to be counter-intuitive. We posit that since we are combining socioeconomic features and meteorological features together, it could be that the model is using environmental features exclusively because they happen to be highly correlated to distance or socioeconomic features making the feature a useful confounding factor with no direct relation with the evacuation flows. In fact, our model also shows that the poverty rate at destination plays a role in the prediction of evacuation flows as we discuss next.
Other important features are the socioeconomic indicators of origin and destination counties. According to the SHAP values, when the poverty rate of the origin county is low, the SHAP values tend to be positive. The effect of unemployment rate is similar. Together, these features show that high socioeconomic status is associated to higher predicted evacuation flows. The evacuation behaviors are also related to the socioeconomic status of destination counties. The SHAP values associated to PovertyRate2016.D are negative when feature values are high, which indicates that high poverty rates at destination are related to lower predicted evacuation flows. Similar relations has been shown in the literature studying household-level behaviors during evacuations. For example, the poverty rate of evacuation destinations has been shown to have a negative effect on evacuation rates [23]; high-income households have been reported to be more prone to evacuate [10,21]; and income has been found to have a small positive effect on evacuation numbers [29].
The least significant features for prediction are the evacuation policy in origin and destination counties. As the feature has categorical values, we created dummy variables for each category to be able to carry out the SHAP analysis i.e., no evacuation, voluntary evacuation and forced evacuation of origin and destination counties. These three types have small SHAP feature importance values, with voluntary evacuation being the most important. This result reflects that evacuation policy does not have much impact on the accuracy of the predicted evacuation flows for twitter users. It is important to clarify that the SHAP analysis does not imply that evacuation policy impacts lightly on population evacuation, which would be contrary to the related literature, but rather that the feature itself, and when compared to other meteorological and census features, is not as helpful in the prediction model i.e., does not provide as much information in enhancing the accuracy of the prediction. Nevertheless, our more nuanced interaction analysis in the next section will show that, for small origin to destination county distances, voluntary evacuation orders have a positive effect on the prediction i.e., the presence of a voluntary evacuation order is associated to larger predicted evacuation flows when the two counties are close to each other. This result is consistent with related work that showed that evacuation orders, either mandatory or voluntary, have a positive effect on residents' decisions to evacuate [10,21]. Examining the Interaction of Features on Prediction Output. In the previous section, we showed that distance is the most predictive factor for evacuation flows. Next, we will evaluate the interactive effects between distance and the other meteorological and socioeconomic factors at origin and destination counties. This analysis will provide more nuanced insights into the effect of specific factors on predicted evacuation flows when mediated by distance. Figure 10 shows the SHAP values that reflect the interactive effects of paired features for wind variables and poverty rates mediated by distance. The color of the dots represents the distance value in each prediction task. For each dot, the x-axis value corresponds to the normalized feature value and the y-axis value corresponds to SHAP value. In the plot, we can observe that distance plays a dominant role in almost all prediction tasks. The red dots usually locate close to the y = 0 line, which indicates that the interactive effect on the predicted evacuation flows is small when the distance between counties is large. In comparison, blue dots usually deviate further away from the y = 0 line, reflecting that when distance is small, the interactive effect between distance and environmental features is larger. In other words, the effects described in the previous section mostly take place at small between-county distances. Figure 10 shows that the effect of wind speed in origin counties is generally positive when the distance between origin and destination counties is small (see WSF2.O Figure  with blue dots associated to positive SHAP and positive wind speed values). This result reflects that high wind speeds are associated to larger predicted evacuation flows when counties are close to each other. For origin counties with relatively low wind speed, the predicted evacuation flows are also lower when the distance between origin-destination counties is small (see the blue dots for negative SHAP and negative wind values in the same figure). Similarly, wind speed at destination county (WSF2.D) has a positive effect on the predicted evacuation flows when the origin-destination county distance values are small. This result reflects that high wind speed at destination is related to an increase in the predicted evacuation flows when the evacuation distance traveled is small. Related work via household surveys has shown that weather features impact evacuation decisions [52] which together with results showing that individuals prefer to evacuate to places closer to home [51] could explain these findings.
On the other hand, Fig. 10 shows that when the poverty rate is high in the origin or destination county, the effect is always negative (see blue dots for high poverty rates and low SHAP values in Figures PovertyRate2016.O and PovertyRate2016.D). In other words, the predicted number of evacuees decreases when socioeconomic levels are low at origin or destination counties that are close to each other. In addition, we also observe that such relationship holds as well for cases where the destination county is located at long distances (see red dots in PovertyRate2016.D). These results can be interpreted as a confirmation of previous work at the individual household level that showed an individual preference towards evacuating to high income counties and to counties that are closer to home [23,51].
Next, we discuss the interaction between distance and evacuation policy. We focus on voluntary evacuation policy in origin counties since, as discussed earlier, it is the one with the largest SHAP feature importance. Figure 11 shows the SHAP values for the interaction between travelled evacuation distance and voluntary evacuation policy (where 1 means voluntary). We can observe that for counties that issued voluntary evacuations, the SHAP values tend to be positive at small distances i.e., voluntary evacuations to counties close to the origin are associated to larger predicted evacuation flows. On the other hand, if the origin counties do not have evacuation orders, the SHAP values tend to be negative, which means that for counties with no evacuation orders,the predicted evacuation flows to counties nearby are smaller. Survey-based studies have shown a similar relation between evacuation orders and distance to home [51].
To summarize the findings, distance between origin and destination counties is a dominant factor that appears to highly impact predicted evacuation behaviors, especially when the distances are short. Socioeconomic factors are positively correlated to the predicted evacuation flows as well. Evacuation policy has a very small effect on the prediction of evacuation flows. However, we still see the positive impact of evacuation policy with coun-ties with voluntary evacuation associated to higher predicted evacuation flows, especially for counties close to evacuees' homes. We posit that these findings can be potentially helpful for local governments and organizations to coordinate or issue policies on evacuations, specially with respect to destination coordination.

Discussion
Using geotagged tweets to model evacuation flows has certain limitations. Previous studies show that less than 1% of tweets are geotagged [53]. Twitter users who frequently use geotagged tweets tend to be younger, belong to certain ethnic groups and have higher socioeconomic status [47]. Although the Twitter population we examined is not fully representative of the real population, we can at least observe evacuation flows across counties with different socioeconomic levels. Compared to other demographic characteristics, socioeconomic status is the dominant factor in individual's decision-making during evacuations [3,21].
Second, we might introduce bias when we determine stay locations using geotagged tweets. Twitter users tend to post tweets tagged with locations in metropolitan areas [54], which means mobility in urban areas can be better captured with geotagged tweets. It can be seen from our results that the observable Twitter population is much larger in urban areas, or in urban counties. We partially correct for that by comparing Twitter population proportions rather than absolute numbers. In addition, we employ method to compute SHAP values for features so as to disentangle the environmental factors on the relative large or small values of output. The analysis is less affected by the bias in the modeled evacuation flows.
Third, the stay location threshold has been defined as the most stringent distance (one mile) due to the lack of ground truth data that prevents us from selecting what could be the optimal threshold. As a result, we can only report with certainty the upper boundary for the number of evacuees. This number is still extremely valuable for emergency preparedness and response to prepare for the worst case scenario. In future work, we aim to explore how survey-based and Twitter-based results could be used in combination to balance biases and hypothesize about potential ground truth values to better validate quantitatively the approach presented in this paper. In addition, we believe that qualitative validations via mapping tools is also a promising research avenue.
Finally, we use Hurricane Irma as a case study for the initial exploration of modeling large-scale evacuation behaviors with geotagged tweets. For future work, we will incorporate data from other types of disasters to compare the effect of environmental factors on evacuations. Here we present a study that partially validates findings from previous studies [55], and also reveals findings that could potentially be interesting for researchers working on evacuation route simulations or on the determinants for the decision-making during evacuations.

Conclusions
Understanding evacuation flows across different geographic areas can potentially enhance decision-making processes put in place by governments and organizations for disaster preparedness and response. However, the existing work mostly studies evacuation behaviors at the individual or household level. We argue that population evacuations are not just the aggregation of individual evacuee's behaviors, but involve complex interactions with people and the environment. We have proposed a method to model evacuation flows with large-scale geotagged tweet datasets. In addition to that, we have built machine learning models to examine the relations between evacuations and geographic variances, which take into account the physical destructive force, socioeconomic indicators and evacuation policy. To the best of our knowledge, this is the first study aiming at understanding evacuation behaviors across large geographical areas and revealing the effect of multiple dimensions of environmental factors on the evacuation flows.
Our results provide nuanced perspectives on evacuation flows at different geographic levels: i) state-level, which characterizes the evacuation flows across states; ii) urban/rural areas within states, which reveal the preferences to urban/rural areas when people choose to stay in state; and iii) county-level, which depicts the evacuation flows across counties in Florida. We find that many Twitter residents did not evacuate (stayed in their original county) during Irma. People who evacuated but stayed in their own state have a preference for urban areas during the landfall. The county-level evacuation flows in Florida are highly imbalanced and a few origin-destination county pairs have the majority of evacuees. We also show that the combined environmental factors of origin and destination counties can predict county-level evacuation flows with relatively high accuracy. The population of origin and destination counties plays an important role in predicting the number of evacuees. However, if we control for population factors, the variances in evacuation flows can be largely explained by geographic variances. Specifically, the distance between origin and destination locations is the most important feature for prediction. The dependency relation shows that lower socioeconomic status at origin is associated to lower predicted evacuation flows, and that higher socioeconomic status at destination is related to higher predicted evacuation flows.
Decision-makers working on disaster preparedness and response might benefit from this paper on different perspectives. First, the study depicts evacuation flows at different geographic levels, which could provide information to design better traffic management and logistics when disasters happen in the similar regions. Second, it reveals the pattern between evacuation flows and geographic variances, which could potentially be used to infer evacuation flows in other cases.