Defining distance thresholds for migration research

There exists a large body of research focused on migration distance, where migration is either the outcome of interest or used as an input variable to model something else. However, there is little consistency in the distance thresholds used: these are often arbitrary, based on administrative boundaries or constrained by definitions available in the dataset. This causes problems with comparison across studies, and in some cases where migration distance is poorly defined could lead to issues with interpretation of results. Using Binary Logistic Regression and drawing on data from the 2011 Census Sample of Anonymised Records for England and Wales, we demonstrate that the odds of migrating vary when considering a range of population characteristics across 16 distance thresholds. We argue that the choice of distance cut-offs needs to be population and context specific and that decisions about these cut-offs should be made carefully as part of the study design.


| INTRODUCTION
In publishing his seven laws of migration, Ravenstein (1885) set out a series of empirical generalisations about why people move, which groups are more mobile and how the distance over which migrants travel varies widely. Subsequently, a wealth of empirical research has expanded and built upon Ravenstein's laws (Rees & Lomax, 2019), with a focus on who moves where, for what reason, and over what distance. That the motives for migration vary over distance is widely acknowledged. However, Niedomysl (2011) argues that the precise nature of this relationship between motive and distance has, to date, been under-researched due to inadequate data availability and the use of surveys with fixed response options. The result is a field of study which largely defines short-distance migration as motivated by housing considerations and long-distance migration as motivated by employment considerations. Yet Clark and Withers (2007) argue that this is an oversimplification given how complex family formation processes are.
A further distinction is often made between shorter distance moves being thought of as residential mobility and longer distance moves being considered as internal migration (Coulter et al., 2016).
This distinction between residential mobility and internal migration is problematic due to considerable ambiguity in what parameters are used to define what constitutes either a short or long distance move (Niedomysl et al., 2017). Assessing if a person has moved at all is further complicated as much of the data available only report a move if that person crosses an administrative boundary. In most contexts these boundaries are of uneven size and shape. This was identified as an issue by Ravenstein (1885), and with reference to contemporary research which relies on these datasets Niedomysl and Fransson (2014, p. 358) argue that 'migration scholars have had little choice but to hope that these problems are not too serious'.
Thus, many of the studies which consider migration distance tend to use an arbitrary cut-off to define the threshold for short versus long distance moves, or residential mobility versus internal migration.
The distance thresholds used are not uniform across studies, and are usually dependent on the definitions available in the dataset being used by researchers at the time (White & Mueser, 1988). Our aim is to demonstrate how migration propensity differs depending on the choice of distance threshold used when broken down by a range of population characteristics. This work will provide additional guidance to researchers who are interested in migration distance and looking for justification for choosing a threshold for analysis.
We first review literature from a large field which is focused on migration distance to demonstrate both the breadth of applications and the range of ways in which distance thresholds are defined. We then go on to provide evidence for variations in migration propensity over distance in England and Wales by assessing odds ratios across a range of population attribute categories for 16 distance thresholds reported in microdata from the 2011 Census Sample of Anonymised Records (SARs). Although our work focuses on the situation in England and Wales, our approach will be of relevance in other countries. However, some of the specifics relating to housing tenure, ethnicities, employment, etc. and indeed what may be perceived by people to be a 'short' or a 'long' distance will inevitably have different meanings in different places.
The rest of the paper is structured as follows: Section 2 provides a review of the literature where distance thresholds are used to measure migration; Section 3 provides an overview of data from the SARs and the logistic regression methods used; results are reported in Section 4, and discussion and conclusions offered in Section 5.

| REVIEW: DISTANCE AS AN OUTCOME OR AS AN INPUT
Distance features prominently in much of the research on assessing patterns, determinants or outcomes of migration, whether as the outcome of interest or as an input variable into a model designed to measure something else. It can be estimated, with a sizeable body of work evaluating the validity of different approaches (e.g., Niedomysl et al., 2017; Stillwell & Thomas, 2016), inferred based on moves within or across contiguous and non-contiguous administrative boundaries (e.g., Bernard et al., 2016; Foster, 2017) or based on measured distance between addresses (e.g., McCollum et al., 2020; Thomas et al., 2015). The heterogeneity in approach to modelling or defining distance is matched in the variety of thresholds used to distinguish 'short' and 'long' distance moves. Where administrative boundaries are used to determine distance, 'short' distance moves are those within or to contiguous areas, such as local labour market areas (Clark & Withers, 2007; Pelikh & Kulu, 2018), whereas 'long' distance moves are those which cross a boundary. Where actual distance is recorded (or estimated), specific thresholds vary with anything from two (Boyle et al., 2002) to eight or more categories used (Niedomysl & Fransson, 2014).
Across this literature, two themes emerge. The first is concerned with how factors relating to a move vary by distance (Thomas et al., 2019). The second considers the differences in the characteristics of people and households who move across different distances (Finney & Simpson, 2008). These themes emerge because the selectivity of migration is such that different sub-groups of the population are differently mobile, at different times in their lifecourse, for different reasons, across different distances.
We summarise the extensive work across these themes in two tables outlining how migration is defined/measured, the thresholds used for different distance cut-offs, and how the migration/distance variable is used. In terms of research framework, Table 1 contains literature which treats migration distance as an outcome, whereas Table 2 contains literature which treats migration distance as an input.

Entries from Tables 1 and 2, in the order given (study; how migration is defined/measured; distance thresholds; how the migration/distance variable is used):

[...] Less than 10 km, 10-50 km, 51 km+. Assessment of multiple moves. Logistic regression of migration distance by individual characteristics.
Clark and Huang (2004). Change of address reported between waves in the British Household Panel Survey, using centroid-to-centroid distance of old and new local authority. 0-49 km = short distance; 50 km or over = long distance. To assess sequencing of long distance followed by short distance move.
Finney and Simpson (2008). One-year transition between addresses, straight-line distance. 0-4 km; 5-9 km; 10-49 km; 50-199 km; 200+ km. To assess distance migrated by ethnic groups.
Niedomysl and Fransson (2014). Residential re-registration with distance between small area centroids. To investigate the relationship between actual migration distances and migration-defining boundaries.
Thomas et al. (2015). Postcode-to-postcode straight-line distance over a period of up to 3 years. Distance moved (continuous) as outcome in models. To assess neighbourhood and city region variations in origins and destinations.
Halás et al. (2016). Registration in a different municipality, measured as continuous distance. Various cut-offs used for summaries with consideration of various choices. Used to define functional regions.
[...] Modelled from inter-zone distances. Continuous estimates analysed. To assess variation in mean migration distance between countries. To evaluate regression-based estimates of intra-zonal moves.
Niedomysl et al. (2017). Residential re-registration with distance between small area centroids. <100 km; 100 km+. To investigate the relationship between actual migration distances and moving distances inferred from either population-weighted or area centroids.
Pelikh and Kulu (2018). Within or between local labour market (LLM) areas. To investigate how education, employment, and family life shape spatial mobility.
Champion and Shuttleworth (2017b). Patient re-registrations in different region. Move across a region boundary. To assess whether rates of long-distance migration are declining.
Champion and Shuttleworth (2017a). 10-year interval small area centroid/postcode straight-line distance. [...]
[...] To assess how migration motives change over distance.
Sander and Bell (2016). Five-year transitions. Moves between area types. To assess inter-cohort differences in the intensity and pattern of migration.
Boyle et al. (2002). One-year transition postcode-to-postcode straight-line distance. [...]
Biagi et al. (2011). Registration data of inter-provincial moves using linear distance in kilometres between the province centroids. Short: moves between provinces within the same region; long: moves between provinces belonging to non-adjacent macro-regions. To define three models: all migration, short-distance and long-distance migration.
Foster (2017). Survey respondent report of move since previous year. Short: intra-county moves; medium: intra-state moves; long: inter-state moves. To define three models investigating the compositional impact of population on migration.
Wilding et al. (2018). One-year transition postcode-to-postcode straight-line distance. 10 km, 20 km, 50 km each used as cut-off. Exploration of health and distance cut-off relationship.
Thomas et al. (2019). Various: moves between panel waves (UK: straight-line distance between postcodes; Australia: great-circle distance between two addresses) and Swedish register data (actual distance moved). Continuous. To assess how migration motives change over distance.
Andersson and Drefahl (2017). Swedish register data documenting moves from North to South Sweden, and return moves. Long: North to South Sweden. To assess mortality of long-distance movers within Sweden.

[...] the US there has been a decline in overall migration rates, not just those across shorter distances (Cooke, 2011, 2013) [...]
Short distance moves, often referred to as residential mobility rather than migration, are widely considered not to involve a significant change in the social or economic situation of an individual or household (Pol & Thomas, 2001), or movement away from the community or context of origin (Castro & Rogers, 1979). Understanding how odds of moving across different distances varies according to individual or household characteristics, as presented in some of the studies in Table 1, is one way to differentiate types of move. This may also be indicative of variations in possible outcomes following a migration event, as well as variations in motives for a move. Yet such research is more commonly conducted where migration distance features as an input into models, rather than as the outcome itself. A selection of studies which deal with migration as an input are summarised in Table 2.
Many have urged caution about overly simplistic distinctions between short and long distance moves, made according to theoretical assumptions as to the differences in what motivates either type of move (Clark & Huang, 2004; Clark & Withers, 2007). Others have explicitly tested how motivations for a move vary across distance (Niedomysl, 2011). More recently, Thomas [...]

Using measured distance, rather than administrative boundaries, Boyle et al. (2002) consider the relationship between poor health, material deprivation and migrant status, differentiating between stayers, short-distance (less than 10 km) and long-distance (10 km or more) moves. Wilding et al. (2018) and Andersson and Drefahl (2017) look at the health-migration-distance relationship in more depth. The former evaluate, for working-age adults, the distance at which 'long' distance migrants become more likely to be healthier than those who do not migrate and than those migrating over shorter distances, using three distance thresholds to define possible 'long' distance moves (10, 20 and 50 km). Andersson and Drefahl (2017) consider the relationship between mortality and long distance moves between the North and South of Sweden. Others have focussed on short distance 'residential mobility' rather than long-distance moves alone (Coulter et al., 2016). In the context of tied migration, Boyle et al. (2003) examine the influence of children on the relationship between long-distance migration (50 km is the cut-off distinguishing short and long distance moves) and labour market status for women. Similar to studies cited in Table 1, Foster (2017) examined drivers of declining migration within America in terms of the demographic, social and economic characteristics of migrants. Distance is defined according to movement within or between administrative boundaries.
Beyond papers interested in the impact of migration across different distances on individual-level experiences or population composition, migration flows can be estimated for different distances accounting for the influence of socio-economic factors and migration behaviour (Biagi et al., 2011). Elsewhere, migration distance has also been used to evaluate changes in the pattern of migration (Bernard et al., 2016; Sander & Bell, 2016). Common across these studies is the (sometimes explicit) recognition of the importance of context when evaluating differences in distance moved. What constitutes 'long' distance in Sweden will be very different from the distance covered in Australia, even setting aside the varying construction and geographies of labour, housing and education markets. It is then critical to exercise caution in how distance thresholds are used to distinguish short or long-distance moves, particularly when drawing upon existing empirical studies to inform research design.
Distance moved is the outcome of interest in our analysis discussed in the next section. We provide a robust analysis of the relationships between the pertinent characteristics of people who move over a wide range of different distance thresholds, rather than limited groupings or arbitrarily defined distance thresholds or the crossing of an administrative boundary. Investigation of variations in distance moved as an 'explanatory' variable is outside the scope of our analysis. To undertake such work, there would need to be the relevant additional explanatory variables (which might well be different across a range of outcomes) and these are not necessarily available in the data used here.

| DATA AND METHODS
We use microdata from the 2011 Census Samples of Anonymised Records (SARs). The SARs are a 5% nationally representative sample of the enumerated England and Wales population and provide a rich multivariate dataset of individual characteristics. The SARs contain a distance moved variable which is calculated using straight line distance between postcode of origin and destination (where a postcode typically identifies a street or group of properties) and is released with the underlying continuous data grouped into distance categories.
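As a rough illustration of the straight-line measure described above, the sketch below computes the Euclidean distance between two postcode locations. It assumes, purely for illustration, that each postcode has been geocoded to projected coordinates in metres (for example, eastings and northings on a national grid). The function name and the coordinates are hypothetical, and the SARs release only the grouped distance categories, not coordinates.

```python
import math


def straight_line_km(origin, destination):
    """Euclidean distance in km between two (easting, northing) pairs in metres."""
    dx = destination[0] - origin[0]
    dy = destination[1] - origin[1]
    return math.hypot(dx, dy) / 1000.0


# Hypothetical geocoded postcodes (metres on a projected grid), not real data.
origin = (430000.0, 433000.0)
destination = (433000.0, 437000.0)  # 3000 m east and 4000 m north of the origin
print(straight_line_km(origin, destination))
```

A measure of this kind ignores the road or rail network, so it gives a lower bound on the distance actually travelled between the two addresses.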
These distance moved categories are summarised in Figure 1, which reveals that a large proportion of moves occur over relatively short distances: 37% of all moves occur below 3 km, 13% between 3 and 4 km and 21% between 5 and 14 km. Cumulatively, half of all moves occur under 5 km and 79% of moves occur at 29 km or less. Migration reported in the SARs is a transition: a person's usual address on the census enumeration date of 27 March 2011 is compared to where they were usually resident 12 months before, and if these are different a migration is recorded. Other, interim moves that a person might make during that 12-month period are not captured in the census data.

[...] retirement) which are often triggers for migration (Bernard et al., 2014) and these groups demonstrate different migration intensities (Kalogirou, 2005). Grouping of ages is necessary to provide sufficient sample sizes when cross-tabulating with other variables. People aged under 16 and over 75 are excluded because of incomplete socioeconomic data for the oldest and youngest age groups. Sex is an important discriminator of migration propensity, especially when coupled with age (Rogers & Castro, 1981).
Marital status reflects key life-transitions which are known triggers of migration (Champion, 2005;Mulder & Wagner, 1993) defined here as single, married, divorced/separated or widowed. Housing tenure is a key differentiator of migration propensity (Boyle, 1993;Hamnett, 1991) and we define four groups, owner occupied, privately rented, socially rented, and people living in communal establishments.
Higher levels of educational attainment are often associated with higher rates of migration (Finney & Simpson, 2008), and we define two groups, people educated to below degree level and people educated to degree level and above. Differentiation between those born in the UK and those born outside the UK is included because of iden[...]

Health is measured using a binary definition of Limiting Long Term Illness (LLTI) which combines the responses 'limited a little' and 'limited a lot' to a single affirmative response. Evidence suggests that, in general, good health enables migration but poor health motivates moves across shorter distances and is often a trigger for migration of older migrants (Boyle et al., 2002, 2004). Migration has been found to vary by social class (Catney & Simpson, 2010; Smith & Higley, 2012), which is defined here using the Registrar General's scheme distinguishing between I (Professional); II (Managerial and Technical); IIIN (Skilled non-manual); IIIM (Skilled manual); IV (Partly skilled); V (Unskilled); and a residual 'unclassified' (U) category for all those not assigned to a class. Finally, mobility has been found to vary by ethnicity (Finney, 2011; Lomax & Rees, 2015; Raymer et al., 2011) [...]

Figure 2 shows the odds ratios for moves over all distance thresholds by age group and by sex. Females are significantly less likely to move across greater distances than males. This is fairly consistent at around 0.9 at each distance threshold.
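To make the reading of an odds ratio such as the 0.9 reported for females more concrete, the sketch below computes an odds ratio and an approximate 95% confidence interval from a two-by-two table. This is a deliberate simplification: the odds ratios in the paper come from binary logistic regression with several covariates, whereas this example is unadjusted and uses the Woolf (log-scale) interval. The function name and all counts are hypothetical and are not taken from the SARs.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: group of interest, moved at or above the threshold
    b: group of interest, did not
    c: reference group, moved at or above the threshold
    d: reference group, did not
    All four counts must be greater than zero.
    """
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts chosen only to give an odds ratio near 0.9.
ratio, lower, upper = odds_ratio_ci(a=900, b=9100, c=1000, d=9000)
significant = lower > 1 or upper < 1  # the interval excludes 1
print(round(ratio, 3), round(lower, 3), round(upper, 3), significant)
```

An odds ratio below 1 with an interval that excludes 1 corresponds to the 'significantly less likely' wording used in the results; an interval that spans 1 corresponds to 'no significant difference'.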
Ages 30 to 44 are generally less mobile across greater distances than the reference 16 to 29 group. Conversely, ages 65-74 are consistently more mobile across greater distances than the reference group, with relative differences increasing with increasing distance.
The odds ratio for this age group is greater than 1.5 from 20 km. Up until a threshold of 80 km and then again for the 250 km threshold [...]

[...] privately rented and socially rented tenants. Communal establishment residents are nearly five times more likely to move than home owners at or above 3 km. The odds ratios continue to rise to the 20 km threshold where communal establishment residents are more than seven times more likely to move at or above the distance threshold than homeowners. There is a steady decline in odds ratios to the 250+ km distance where the ORs are around 3.5.
To show the results for the privately rented and socially rented groups more clearly, Figure 4b shows these groups with communal establishment residents removed. Some clear patterns emerge when focussing on these groups. Those in socially rented accommodation are consistently less likely to move greater distances, with the odds ratio declining from 0.74 at 3 km and over to 0.54 at 60 km and over. From the 5-km mark differences between these two groups begin to increase as distance increases.
Odds ratios increase for privately rented and at the 150-km mark, this group becomes significantly more likely to move over and above the distance threshold than the reference home owner group.
Figure 5 reveals that those educated to below degree level are consistently significantly less likely to move greater distances defined by the distance thresholds, relative to those who are educated to degree level and above. Odds ratios fall from around 0.75 at the 3-km threshold to just above 0.5 from the 15-km threshold where they remain fairly constant at higher distances.
Those with a limiting long-term illness are significantly less likely to move at or above the distance thresholds from the 5-km mark.
There is a general pattern of declining odds ratios to the 100-km cutoff. Those who are foreign born are significantly less likely to move than those born in the UK at or above the distance threshold to 80 km. From 100 km, there is no differentiation in distance moved between UK or foreign born groups. Though, as distance increases, the odds ratio does increase for foreign born across all distance thresholds. Figure 6 demonstrates that all social classes (II to V) are significantly less likely to move at or above the defined thresholds than the Professional (I) social class across most distances. Second most mobile at distance thresholds 3 to 15 km are those in Managerial and Technical roles (II). Consistently, social class V (unskilled) are the least mobile.
There is a U shape to the odds ratios for the skilled manual (IIIM) group where relative mobility declines to the 30-km threshold before increasing again. A similar but shallower U shape pattern emerges for group IIIN (skilled non-manual), with declines in odds ratios to 20 km before gradual increase as distance increases. Odds ratios for group IV (partly skilled) are relatively stable until 30 km, after which they increase. As distance increases, the unclassified (U) group odds ratios increase.
Results by ethnic group are split across Figure 7a, b to aid interpretability. The largest variability across all ethnic groups can be seen at the shortest distances reported. The Chinese group are significantly more mobile than the reference White British at and above the 3-, 5-, 7- and 10-km thresholds; the Bangladeshi group are significantly less mobile at the 5- and 7-km thresholds; and the Black African group are significantly more mobile at the 3-, 5- and 7-km thresholds. Differences across many groups are not apparent from the 10 km cut-off.
A notable trend is seen for the White Other group, who are consistently less likely to move at or above the distance threshold relative to the reference White British but exhibit declining odds ratios (i.e., are relatively less mobile) as distance increases. At 3 km or more, the Black Caribbean group are significantly more likely to move at or above this threshold relative to the reference group, but odds ratios decline as the distance threshold increases. From 10 km onwards, this group is significantly less likely to move greater distances than the White British reference group.

| DISCUSSION AND CONCLUSION
Our results demonstrate the wide differences that exist in the odds of [...]

There are some clear inflexion points for certain variables where analysis using different cut-offs would conclude different things, so here we are able to offer guidance which should help researchers interested in studying migration distance. Relative to those who are single, those who are divorced or separated become significantly less likely to migrate over greater distances at the 15-km threshold. This is also the distance at which those who are married become significantly more likely to move greater distances. Using our methods, a cut-off of 3, 5, 7 or 10 km would lead to the conclusion that there is no significant difference. In fact, from 15 km onwards the odds of migrating for divorced/separated continue to decline across all thresholds to 60 km and over, whereas it is around the 40-km threshold where the odds of migrating for married people stabilise. One would conclude that the divorced/separated group are more likely to migrate over all distance cut-offs from 15 km onwards and that those who are married are significantly less likely to migrate, but the odds ratios steadily decrease and increase respectively. So while 15 km looks to be a useful threshold in terms of differentiation by marital status, the magnitude of the difference would be interpreted differently depending on the distance cut-off chosen.
Although the odds of migrating for communal establishment residents are significantly higher than for homeowners at all distance thresholds, they rise rapidly from nearly five times more likely at or over 3 km to a peak of around seven times more likely at 20 km. The 20 km cut-off is insightful for this particular group given the rapid rise in odds ratios from 3 to 20 km. For those in privately rented accommodation, using a cut-off of between 3 and 20 km would lead to conclusions that this group are less likely to migrate than the reference homeowner group; however, they become significantly more likely to migrate at thresholds of 120 km and over. For tenure type then, a 20-km threshold might be the most useful for establishing a large and significant difference, but it is useful to know that differences are apparent but less defined at other thresholds.
Analysis of those with a LLTI at the 3-km threshold would lead to conclusions that there is no significant difference compared to those without a LLTI. It is at the 5-km threshold that those with a LLTI [...]

[...] reference. For ages 30 to 44, who are generally less mobile than the youngest 16-29 reference group, a 10-km cut-off reveals an odds ratio of around 0.8; however, this declines at all thresholds to 60 km, where it then starts to increase. For ages 65 to 74, the odds ratios increase steadily to 60 km, where they then begin to decline.
Often the decision-making process when choosing distance cutoffs is guided or constrained by the availability of data. This is also true of our study since we are able to assess differences across 16 thresholds but are constrained by the data which can be extracted from the SARs and information is lost in the available categorisations.
The only way to get at the full information would be to use a continuous [...]

[...] information about migration motives from survey data (e.g., as has been done by Thomas et al., 2019, and Shuttleworth, Stevenson, et al., 2020b) would be a fruitful avenue for further research. We hope that by identifying that differences exist across a wide range of thresholds our paper will contribute to ongoing efforts to better understand and quantify variations in migration behaviour and propensity.
In developing the various distance threshold models, our experiences lead us to hypothesise that other variables categorised through different (arbitrary) cut-offs may also produce different results. We defined four age-groups which broadly relate to different life-stages but have found that different groupings lead to different influences of other variables. Geographers are well-acquainted with the 'modifiable areal unit problem'. There seems to be a 'modifiable categorical unit problem' whereby differently specified cut-offs of continuous/ordinal data may generate different conclusions being drawn.
In conclusion, our review has demonstrated that in the broad field of migration research, distance moved is used to answer a wide range of research questions, but that data availability or decisions about cut-offs result in substantial diversity in how distance is measured or categorised. Our empirical findings demonstrate that migration propensity varies across a number of distance thresholds, and that these variations differ in magnitude and direction depending on the migrant attributes being studied. Together, the review and analysis demonstrate that decisions about distance thresholds and cut-offs need to be carefully thought through, and are also very context specific. We hope that this work serves to highlight that the choice of distance threshold should be given prominence in the study design, and that, if there is any uncertainty and the data allow, experiments over different thresholds should be carried out and results compared. Certainly in our analysis we find that using different distance cut-offs would result in different interpretations about relative mobility for different population sub-groups. Our work will be of use to researchers looking for guidance, justification or elements on which to reflect around the use of different migration distance cut-offs.
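The kind of experiment over different thresholds suggested above can be sketched as a simple loop that recomputes an odds ratio at each candidate cut-off. The example below is a minimal, unadjusted illustration rather than the regression models used in this paper. The records, group labels and thresholds are hypothetical and were chosen so that the direction of the odds ratio changes with the cut-off, and treating non-movers (distance 0) as part of the comparison group is an assumption.

```python
def odds_ratio(records, group, reference, threshold_km):
    """Unadjusted odds ratio of moving at least threshold_km, group vs reference.

    records: iterable of (group_label, distance_km) pairs, with distance_km
    set to 0 for people who did not move (an assumption of this sketch).
    Raises ZeroDivisionError if a required cell of the 2x2 table is empty.
    """
    counts = {group: [0, 0], reference: [0, 0]}  # [at or above, below]
    for label, distance_km in records:
        if label in counts:
            index = 0 if distance_km >= threshold_km else 1
            counts[label][index] += 1
    (a, b), (c, d) = counts[group], counts[reference]
    return (a * d) / (b * c)


# Hypothetical records: 100 private renters and 100 owners, not SARs data.
records = (
    [("private_rent", 0)] * 50 + [("private_rent", 4)] * 30
    + [("private_rent", 12)] * 10 + [("private_rent", 150)] * 10
    + [("owner", 0)] * 50 + [("owner", 4)] * 20
    + [("owner", 12)] * 25 + [("owner", 150)] * 5
)

# Hypothetical cut-offs; compare the direction of the odds ratio across them.
for threshold_km in (3, 10, 100):
    value = odds_ratio(records, "private_rent", "owner", threshold_km)
    print(threshold_km, round(value, 2))
```

With these made-up counts the two groups look identical at the shortest cut-off, the renters look less mobile at the middle one and more mobile at the longest, which is the type of threshold-dependent conclusion the analysis warns about.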

ACKNOWLEDGEMENTS
This work was supported by the Economic and Social Research Council, Grant/Award Number: 1223130.

CONFLICT OF INTEREST
The authors have no conflict of interest to declare.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the UK Data Service. Restrictions apply to the availability of these data,