Population movements based on mobile phone location data: the Czech Republic

ABSTRACT The paper presents new approaches to the visualisation of origin–destination flows, in which all three basic parameters of flows between pairs of geographic objects are expressed cartographically in a simple and clear way: the length of flows, their intensity, and the proportional distribution of both directions between pairs of objects (polarisation of flows). The input data are population movements derived from mobile phone location data collected across the whole territory of the Czech Republic. Apart from the visualisation of origin–destination flows, the paper addresses the transformation of these data through the application of two different interaction measures. The transformed flows are also visualised cartographically, with the functional regions based on the respective interaction measures used as base maps.


Introduction
Human geographers and urban and regional planners have traditionally been concerned with various aspects of origin-destination flows and study them for a range of reasons. The regular daily movement of people is one of the basic indicators of human mobility, and as such it is usually the primary source of input information in the analysis of settlement and regional systems. In the past, population censuses were the main source of information on daily travel-to-work and travel-to-school flows. The development of information and communication technologies has brought new possibilities for the analysis of human mobility.
The main objectives of the paper are to identify spatial patterns in the daily flows of people based on their mobile phone location and to visualise them cartographically, using the Czech Republic as an example. The paper presents a new form of cartographic visualisation which, to the best of our knowledge, has not been used before. Apart from regular daily flows, we also present weekend flows and compare the two. We further aim to outline ways of transforming origin-destination flows through selected interaction measures and to contextualise the spatial distribution of regular daily flows of people. To this end, transformed daily population flows are depicted, with the respective functional regional systems used as base maps.
The paper consists of an introduction, a discussion of the theoretical background, a description of the data and methods, and a discussion and conclusion. It is not strictly monothematic; it is centred on several issues directly concerned with the cartographic visualisation of origin-destination flows. The discussion of the theoretical background is divided into four thematic subsections.

Population flows based on mobile phone location data
Mobile phone location data have brought numerous opportunities for research into population movements at various hierarchical levels, with a range of periodicities, and with regard to changes in these movements over time. Changes in the location of individuals can be recorded within seconds, so these data can generally be regarded as big data.
Many studies using mobile phone location data are set in large cities, agglomerations and wider urban systems (Fan et al., 2018; Galpern et al., 2018; Yin et al., 2021). Such data can be used to identify changes in the location of an urban population during the working day, to plan urban transport systems, etc. Knowledge of the basic patterns of spatial mobility is also very important in relation to the COVID-19 pandemic (e.g. Hara & Yamaguchi, 2021).
The determination of so-called anchor points (Ahas et al., 2010) is important when mobile phone location data are used to study the daily movements of people. For the economically active population, typical anchor points are home and workplace; for students, they are home and school. On this basis it is possible to define city hinterlands (Šveda & Barlík, 2018) and functional regions (Novák et al., 2013). Yet most studies are concerned with urban and suburban spaces and with attractive tourist locations. Given the character and extent of the data, nationwide studies are exceptional and have mostly been carried out in smaller countries such as Estonia (Novák et al., 2013), where the intramax regionalisation procedure was applied.

Origin-destination flows and interaction measures
Scalar information is very easily transformed into relative values: it is related in a standard way to the whole and expressed as percentages, which makes it possible to compare areal units of different sizes. Origin-destination flow information cannot be transformed into relative values in such an easy and unambiguous manner. In the analysis of spatial population movements, interaction measures can be seen as a way of normalising flow data. Interaction measures relate each flow $T_{ij}$, i.e. the flow from origin $i$ to destination $j$, to the total outflows from the origin, $\sum_k T_{ik}$, and to the total inflows to the destination, $\sum_k T_{kj}$. Apart from this transformation, interaction measures also symmetrise the flows, because the normalised flows from $i$ to $j$ are added to the normalised flows from $j$ to $i$.
An overview of the most frequently used interaction measures can be found in Casado Izquierdo and Propín Frejomil (2008). They present 11 interaction measures, not all of which can be seen as a correct normalisation of flow information (this was not the aim of their article, however; they only summarise previously proposed interaction measures). For the purposes of regional taxonomy, two basic interaction measures are most frequently used: Smart's interaction measure and the additive interaction measure. Smart's interaction measure was proposed, but not used, by Smart (1974):
$$\frac{T_{ij}^2}{\sum_k T_{ik} \sum_k T_{kj}} + \frac{T_{ji}^2}{\sum_k T_{jk} \sum_k T_{ki}}$$
Smart's interaction measure is currently the most widely used; it was used in all variants of the CURDS (Centre for Urban and Regional Development Studies) regionalisation algorithms (Coombes et al., 1986; Coombes & Bond, 2008) and in most of the alternatives to these variants used in other countries (see the subsection on functional regions below). The strength of the measure lies in the fact that every flow $T_{ij}$ is normalised by $\sum_k T_{ik}$ and by $\sum_k T_{kj}$ simultaneously.
The alternative is the additive interaction measure, used by Coombes et al. (1982) in the first variant of the CURDS regionalisation algorithm:
$$\frac{T_{ij}}{\sum_k T_{ik}} + \frac{T_{ij}}{\sum_k T_{kj}} + \frac{T_{ji}}{\sum_k T_{jk}} + \frac{T_{ji}}{\sum_k T_{ki}}$$
Unlike Smart's measure, the additive interaction measure does not normalise the flows $T_{ij}$ by $\sum_k T_{ik}$ and $\sum_k T_{kj}$ simultaneously, but term by term. It was also used in a study of the temporal development of labour commuting in the Czech Republic (Tonev et al., 2018). Comparisons of both measures in various territories are presented by Klapka et al. (2014) and Halás et al. (2019).
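A minimal Python sketch of the two measures for a pair of units i and j may help to show the difference between simultaneous and term-by-term normalisation. It assumes the standard forms of both measures, with flows held in a square list of lists `T` (rows are origins, columns are destinations, the diagonal holds inner flows); the function names and the toy matrix are illustrative and not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # T[i][j] = flow from unit i to unit j; T[i][i] = inner flow


def totals(T: Matrix) -> Tuple[List[float], List[float]]:
    """Return (total outflows per origin, total inflows per destination), inner flows included."""
    n = len(T)
    out_tot = [sum(row) for row in T]                            # sum_k T[i][k]
    in_tot = [sum(T[k][j] for k in range(n)) for j in range(n)]  # sum_k T[k][j]
    return out_tot, in_tot


def _ratio(num: float, den: float) -> float:
    """Safe division: a term with a zero total contributes nothing."""
    return num / den if den else 0.0


def smart(T: Matrix, i: int, j: int) -> float:
    """Smart's measure: each flow is normalised by origin and destination totals at once."""
    out_tot, in_tot = totals(T)
    return (_ratio(T[i][j] ** 2, out_tot[i] * in_tot[j])
            + _ratio(T[j][i] ** 2, out_tot[j] * in_tot[i]))


def additive(T: Matrix, i: int, j: int) -> float:
    """Additive measure: each flow is normalised separately by origin and by destination totals."""
    out_tot, in_tot = totals(T)
    return (_ratio(T[i][j], out_tot[i]) + _ratio(T[i][j], in_tot[j])
            + _ratio(T[j][i], out_tot[j]) + _ratio(T[j][i], in_tot[i]))


# Hypothetical 3-unit example (not data from the paper).
T = [[50, 30, 0],
     [10, 80, 5],
     [0, 20, 40]]
print(round(smart(T, 0, 1), 4))     # 0.1041
print(round(additive(T, 0, 1), 4))  # 0.8777
```

Both functions return the same value for (i, j) and (j, i), which reflects the symmetrisation of flows; recomputing the totals on every call keeps the sketch short, whereas a real implementation over 1451 units would compute them once.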

Visualisation of origin-destination flows
The computer-based cartographic and graphic visualisation of origin-destination flows of migration, commuting, commodity movements, modelled intensities of contacts, etc., has quite a long tradition (see e.g. Kern & Rushton, 1969; Tobler, 1975, 1987). Nevertheless, it remains a challenging research topic (Andrienko et al., 2008; Andrienko & Andrienko, 2013; Rae, 2011). Several recent lines of inquiry in this field can be identified. Here, we concentrate on the issues concerning the volume of interaction data and the related problem of their visualisation.
Relatively recently, large data sets (so-called big data) on various movements have become available, such as the mobile phone data used in this paper (see e.g. Andrienko et al., 2016). Visualising very large numbers of flows is not easy, and there are at least three ways to mitigate the visual cluttering problem (Guo & Zhu, 2014): flow rerouting, surface generation and spatial aggregation. While the first solution is 'technical' in nature (see e.g. Jenny et al., 2017, 2018), the other two can cause information loss. Nevertheless, leaving aside surface generation, aggregating flows and working with aggregated flows (e.g. Andrienko & Andrienko, 2011; Guo & Zhu, 2014) is sometimes unavoidable, as is the case with the map presented in this paper. Aggregation often raises the problem of flow data transformation (see the preceding subsection), which is used to avoid the bias caused by the different sizes of the spatial units used for aggregation. Another frequent issue is the amount of information, such as length, intensity and direction, that (carto)graphic flow visualisations can show (see e.g. Hennemann, 2013; Koylu & Guo, 2017; Scheepens et al., 2016).

Origin-destination flows and functional regions
Formal and functional regions can generally be distinguished by their structure. Formal regions are constructed from scalar (spatial) information, e.g. regions based on the ethnic structure of the population, while functional regions are constructed from spatial vector (flow) information, such as migration and travel to work, school, retail and services. In a strict sense, a functional region is internally coherent and externally self-contained with regard to incident flows. In practice, functional regions are defined by procedures which seek to maximise flow frequencies within regions and to minimise flows crossing regional borders.
The definition of functional regions is reviewed in detail by Casado-Díaz and Coombes (2011) and Klapka and Halás (2016). The CURDS regionalisation procedure, including its variants, is probably the most frequently used method; it has been applied, for instance, in Spain (Casado-Díaz, 2000), New Zealand (Newell & Perry, 2005; Papps & Newell, 2002), Ireland (Meredith et al., 2007) and Belgium (Persyn & Torfs, 2011). Owing to their flexibility, universality and quality, which have been verified in numerous studies, these procedures are also used in this paper to define the functional regions shown in the maps.

Data
We use data on personal mobile phone locations provided by the largest operator, which covers approximately 40% of the Czech market. Signalling data from the operator were used, in which each recorded event is assigned to a network cell. Each cell is linked to a Base Transceiver Station (BTS), and the territory of the Czech Republic is covered by 32,000 BTSs. The data were collected at the peak of mobility just before the outbreak of the COVID-19 pandemic. In order to identify the places (basic spatial units) of origin (residence) and of destination, a four-week period was used (16 September–13 October 2019). The actual tracking of spatial movements between origins and destinations took place during the week beginning 7 October 2019. In the paper we primarily analyse and visualise regular daily flows; one map is constructed from weekend flows.
Regular daily flows were identified over the four-week period. A flow between basic spatial units is considered regular when it occurred at least 12 times during the four-week period and the user always spent more than three hours at the destination. Based on this definition, it can be assumed that labour and school commuting predominate, because other journeys (to retail or services) do not usually have daily periodicity. For the given period, regular daily flows crossing the borders of elementary units (see the next subsection) were made by 663,000 users, and flows within elementary units by 2,493,000 users. Weekend flows are based on one weekend at the end of the four-week period. This particular weekend was sunny and without precipitation, i.e. ideal conditions for weekend mobility. A weekend flow is recorded when a person spends the whole of Saturday (24 h), part of Friday and part of Sunday within one basic spatial unit, either the unit of the person's place of residence (inner flow) or another unit. Weekend flows represent tourism and recreational travel and the movements of students and weekday workers to their places of permanent residence. Weekend flows crossing the borders of elementary units were made by 125,000 users.
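The regular-flow rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the paper does not describe how stays are extracted from the signalling data, so the input (a hypothetical mapping from user, origin and destination to the durations of that user's stays at the destination) and all names are illustrative, and "always spends more than three hours" is read as applying to every counted occurrence.

```python
from collections import Counter
from typing import Dict, List, Tuple

MIN_OCCURRENCES = 12  # at least 12 occurrences in the four-week window
MIN_HOURS = 3.0       # every stay at the destination must exceed three hours


def is_regular(stay_hours: List[float]) -> bool:
    """True if a user's stays at one destination meet the regular-daily-flow rule."""
    return len(stay_hours) >= MIN_OCCURRENCES and all(h > MIN_HOURS for h in stay_hours)


def regular_flows(stays: Dict[Tuple[str, str, str], List[float]]) -> Dict[Tuple[str, str], int]:
    """Count users with a regular flow per (origin unit, destination unit) pair.

    `stays` (hypothetical structure) maps (user_id, origin, destination) to the
    durations, in hours, of that user's stays at the destination.
    Pairs with origin == destination are flows within an elementary unit.
    """
    counts: Counter = Counter()
    for (_user, origin, dest), hours in stays.items():
        if is_regular(hours):
            counts[(origin, dest)] += 1
    return dict(counts)


# Hypothetical users (not data from the paper).
stays = {
    ("u1", "A", "B"): [8.5] * 15,
    ("u2", "A", "B"): [8.0] * 12,
    ("u3", "A", "B"): [8.0] * 11,          # too few occurrences
    ("u4", "C", "B"): [4.0] * 13 + [2.5],  # one stay of three hours or less
    ("u5", "C", "C"): [9.0] * 20,          # flow within an elementary unit
}
print(regular_flows(stays))  # {('A', 'B'): 2, ('C', 'C'): 1}
```

Counting users rather than trips matches the way the paper reports flows (e.g. 663,000 users with regular flows crossing unit borders).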
The data have the following advantages: they cover the whole territory of the Czech Republic proportionally, they are firmly anchored to places of permanent residence and places of regular daily movement, they cover the period of maximum mobility, and they capture the variability of different movements (weekdays vs. weekend, etc.). On the other hand, the data do not make it possible to identify the motivations for movement or the modes of transport; both can only be inferred.

Basic spatial units: elementary units
The basic spatial units (BSUs) to which movements are localised are based on the units used by Musil and Müller (2008). These are groups of municipalities connected through basic nodal relations to a central municipality, which typically has a school, a post office and a health centre, and occasionally a registry office and a building authority. We defined 1451 BSUs using the labour and school commuting flows from the last census. Cities with more than 75,000 inhabitants (i.e. approximately the regional capitals) form individual basic spatial units. Thus BSUs also have an upper population limit, above which a municipality is not amalgamated with other municipalities. This is important for subsequent regional analyses, for the relationships between urban and suburban areas, etc. In the paper, BSUs are also called elementary units.
These elementary units are used because Czech municipalities are very small; they are among the smallest in Europe (Klobučník & Bačík, 2016). If municipalities were used, some of them would have no records of daily population movements based on mobile phone location data, while others would have so few flows that the information could not be obtained owing to Czech legislation on personal data protection.
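Grouping flows into elementary units can be sketched as a simple aggregation, assuming that each origin and destination record carries a finer location code (for example a municipality) and that a lookup table assigning each code to its BSU is available; the paper does not describe this step, so the function name, codes and table are hypothetical. The sketch also shows why flows between two municipalities of the same unit become inner flows of that unit.

```python
from collections import defaultdict
from typing import Dict, Tuple


def aggregate_flows(flows: Dict[Tuple[str, str], int],
                    unit_of: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Sum flows between fine locations into flows between elementary units.

    `unit_of` is a hypothetical lookup from a location code to its BSU.
    Flows whose origin and destination fall in the same unit become the
    inner flow (unit, unit) of that unit.
    """
    agg: Dict[Tuple[str, str], int] = defaultdict(int)
    for (origin, dest), n in flows.items():
        agg[(unit_of[origin], unit_of[dest])] += n
    return dict(agg)


# Hypothetical municipalities m1, m2 (one unit) and m3 (a separate unit, e.g. a large city).
flows = {("m1", "m2"): 5, ("m2", "m1"): 3, ("m1", "m3"): 7, ("m3", "m3"): 4}
unit_of = {"m1": "U1", "m2": "U1", "m3": "U2"}
print(aggregate_flows(flows, unit_of))  # {('U1', 'U1'): 8, ('U1', 'U2'): 7, ('U2', 'U2'): 4}
```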

Method
The main aim is to depict the origin-destination flows of people in one map and to express clearly all three basic parameters of movements between two places: length, intensity and the proportional distribution of both flow directions (polarisation of flows). Length can only be depicted as the Euclidean (straight-line) distance between two places, because the input data do not include the actual route or the mode of transport. The intensity of flows is depicted by the width of the line. The proportional distribution of both flow directions is depicted through an original procedure: a continuous colour gradient from red to yellow hues along the flow lines connecting elementary units (origins and destinations). In the relationship between a pair of points (A, B), the red hue indicates 100% of flows outgoing from A and 0% of flows incoming to A; conversely, the yellow hue indicates 100% of flows incoming to A and 0% of flows outgoing from A. In these extreme cases, the whole colour gradient from red to yellow is shown along the line connecting A and B. For other ratios between the two directions, a 'shorter' colour gradient, i.e. a narrower part of the red-to-yellow range, is used along the line. If both directions are exactly balanced (50% of flows incoming to A, 50% of flows outgoing from A), the line connecting A and B is uniformly orange, the orange hue lying precisely midway between red and yellow. This general approach is detailed further below. The same method was used to visualise weekend flows and regular daily flows transformed by Smart's and the additive interaction measures.
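The paper does not give the exact mapping from flow shares to colours. A minimal sketch, assuming that the colour at each end of a line is interpolated linearly from the share of the pair's flows leaving that end, reproduces the behaviour described above: full polarisation yields the whole red-to-yellow gradient, balanced flows yield uniform orange, and intermediate ratios yield a shorter gradient. The function names and 8-bit RGB values are illustrative.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def end_colour(share_out: float) -> RGB:
    """Colour at one end of a flow line given that end's share of outgoing flows.

    share_out = 1 -> red (255, 0, 0); 0 -> yellow (255, 255, 0); 0.5 -> orange.
    Varying only the green channel between red and yellow is equivalent to a
    linear hue interpolation from 0 to 60 degrees at full saturation and value.
    """
    if not 0.0 <= share_out <= 1.0:
        raise ValueError("share_out must lie in [0, 1]")
    return (255, round(255 * (1.0 - share_out)), 0)


def gradient_endpoints(t_ab: float, t_ba: float) -> Tuple[RGB, RGB]:
    """Return (colour at A, colour at B) for flows A->B (t_ab) and B->A (t_ba)."""
    total = t_ab + t_ba
    if total <= 0:
        raise ValueError("no flow between A and B")
    p = t_ab / total  # share of the pair's flows that leave A
    return end_colour(p), end_colour(1.0 - p)


print(gradient_endpoints(100, 0))  # ((255, 0, 0), (255, 255, 0)): full gradient
print(gradient_endpoints(50, 50))  # ((255, 128, 0), (255, 128, 0)): uniform orange
print(gradient_endpoints(30, 10))  # ((255, 64, 0), (255, 191, 0)): a 'shorter' gradient
```

The two returned colours would serve as the stops of a two-colour gradient fill along the buffered flow line.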
Transformed origin-destination flows based on Smart's interaction measure are depicted on the map of functional regions defined by this measure; similarly, transformed flows based on the additive measure are depicted on the map of functional regions defined by that measure. Functional regions were defined using the iterative regionalisation algorithm detailed by Coombes and Bond (2008) together with the continuous constraint function (see Halás et al., 2015, 2019). Elementary units are successively amalgamated into proto-regions and regions according to their interaction intensity under the particular interaction measure. Proto-regions are dissolved when they do not meet the minimum parameters. The constraint function makes it possible to estimate the minimum size and self-containment parameters (in fact, the trade-off between size and self-containment) from the character of the regional system rather than setting them normatively. We seek to maximise the difference (gap) between 'successful' functional regions and the proto-regions dissolved during the algorithm run. In our case, the smallest functional region has a population of 22,000 and the least self-contained regions have a self-containment (see Note 1) of around 0.68. For each functional region r, the ratio of inter-regional inflows to inter-regional outflows was calculated. This can be described as an index of the orientation of inter-regional flows:
$$O_r = \frac{\sum_{k \neq r} T_{kr}}{\sum_{k \neq r} T_{rk}}$$
The resulting values of $O_r$ are depicted for each functional region in the choropleth maps.
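Once elementary units have been assigned to functional regions, the self-containment from Note 1 and the orientation index O_r can be computed from unit-level flows. The sketch below assumes that O_r is the ratio of inter-regional inflows to inter-regional outflows, which is consistent with values above 1 for regions with a surplus of jobs; it does not reproduce the regionalisation algorithm or the continuous constraint function, and the names and toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Flows = Dict[Tuple[str, str], float]  # (origin unit, destination unit) -> flow


def region_indices(flows: Flows, region_of: Dict[str, str]):
    """Return (self-containment, orientation index) per region.

    SC_r = T_rr / (sum_k T_rk + sum_k T_kr - T_rr), which equals
           inner / (inner + inter-regional outflows + inter-regional inflows).
    O_r  = inter-regional inflows / inter-regional outflows (assumed form).
    """
    inner: Dict[str, float] = defaultdict(float)
    out_inter: Dict[str, float] = defaultdict(float)
    in_inter: Dict[str, float] = defaultdict(float)
    for (i, j), f in flows.items():
        ri, rj = region_of[i], region_of[j]
        if ri == rj:
            inner[ri] += f          # includes inner flows of units and flows between units of r
        else:
            out_inter[ri] += f
            in_inter[rj] += f
    sc, orientation = {}, {}
    for r in set(region_of.values()):
        denom = inner[r] + out_inter[r] + in_inter[r]
        sc[r] = inner[r] / denom if denom else math.nan
        if out_inter[r]:
            orientation[r] = in_inter[r] / out_inter[r]
        else:  # no outflows: unbounded if there are inflows, otherwise undefined
            orientation[r] = math.inf if in_inter[r] else math.nan
    return sc, orientation


# Hypothetical units a, b (region R1) and c (region R2).
flows = {("a", "a"): 100, ("a", "b"): 20, ("b", "a"): 10, ("b", "b"): 50,
         ("a", "c"): 15, ("b", "c"): 5, ("c", "a"): 5, ("c", "c"): 80}
region_of = {"a": "R1", "b": "R1", "c": "R2"}
sc, o = region_indices(flows, region_of)
print(round(sc["R1"], 3), o["R1"])  # 0.878 0.25
print(round(sc["R2"], 3), o["R2"])  # 0.762 4.0
```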

Discussion and conclusion
The map of 'Regular daily movements' shows that the spatial distribution of regular daily flows based on mobile phone location data corresponds very closely to the distribution of daily travel-to-work flows recorded in the last census (Erlebach et al., 2019). The Pearson correlation coefficient between regular daily flows based on mobile phone location data and daily travel-to-work flows from the last census is 0.989 for inflows and 0.960 for outflows. The largest regional centres (particularly Prague, and partly also Brno, Ostrava, Plzeň and others) have a distinctly nodal character, and the extent of their hinterlands directly reflects the position of these centres in the hierarchy of the Czech regional system. Inflows to these centres clearly predominate (see Note 2). The map of weekend movements shows that weekend flows have spatial patterns different from those of weekday flows. Outflows from the largest regional centres prevail over inflows. Flows are considerably longer (they are not limited by daily rhythms), and the predominance of the largest centres, particularly Prague, is even more pronounced. This predominance almost totally obscures the flows between smaller regional and local centres, which are barely visible in the cartographic representation.
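A correlation coefficient of this kind can be computed as in the sketch below. It assumes that the comparison pairs per-unit totals of inflows (or outflows) from the two sources, aligned by elementary unit; the paper does not state the exact unit of comparison, and the numbers are hypothetical. On Python 3.10 or later, `statistics.correlation` gives the same coefficient.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inflow totals for five elementary units (not data from the paper),
# listed in the same unit order for both sources.
mobile_inflows = [1200, 340, 95, 2100, 410]
census_inflows = [1100, 360, 80, 2300, 380]
print(round(pearson(mobile_inflows, census_inflows), 3))
```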
The daily origin-destination flows transformed by Smart's interaction measure have a distinctive spatial and intensity distribution compared with the absolute flows (see the map 'Functional regions based on Smart's interaction measure'). After transformation, the inflows to Prague appear insignificant: their values are considerably reduced by the very high totals $\sum_k T_{ki}$ and $\sum_k T_{ik}$ (with $i$ denoting Prague), both of which include Prague's large inner flow $T_{ii}$. In contrast, Smart's interaction measure takes higher values in the hinterlands of smaller regional centres with a large number of employment opportunities (e.g. Mladá Boleslav, the home of the Škoda car factory). Smart's interaction measure can reach high values for pairs of elementary units with relatively low absolute flows between them; this is the case when these flows are of crucial importance to both units. Consequently, the functional regions defined with Smart's interaction measure are levelled in size: the regions of the dominant centres are comparable in area to the regions of smaller centres.
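A worked example with hypothetical numbers (not data from the paper) illustrates why Smart's measure, $S_{ij} = T_{ij}^2/(\sum_k T_{ik}\sum_k T_{kj}) + T_{ji}^2/(\sum_k T_{jk}\sum_k T_{ki})$, can rank a small pair of units above a large flow into a dominant centre. Let unit $i$ send $T_{ij} = 1000$ commuters to a centre $j$ and receive $T_{ji} = 100$, with $\sum_k T_{ik} = 2000$ and $\sum_k T_{ki} = 1500$, while the centre's large inner flow gives $\sum_k T_{kj} = 100\,000$ and $\sum_k T_{jk} = 90\,000$. For two small units $m$ and $n$, let $T_{mn} = 200$, $T_{nm} = 50$, $\sum_k T_{mk} = 1000$, $\sum_k T_{kn} = 2000$, $\sum_k T_{nk} = 1500$ and $\sum_k T_{km} = 800$:

```latex
\begin{aligned}
S_{ij} &= \frac{1000^2}{2000 \cdot 100\,000} + \frac{100^2}{90\,000 \cdot 1500}
        = 0.005 + 0.000074 \approx 0.0051,\\
S_{mn} &= \frac{200^2}{1000 \cdot 2000} + \frac{50^2}{1500 \cdot 800}
        = 0.02 + 0.0021 \approx 0.0221.
\end{aligned}
```

Although the flow from $m$ to $n$ is five times smaller than the flow from $i$ to $j$, its Smart's value is roughly four times larger, because both flows form substantial shares of their units' totals.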
The additive interaction measure reduces the differences between regular daily flows less than Smart's measure does. Its spatial distribution more closely resembles that of the absolute flows (see the map 'Functional regions based on the additive interaction measure'); only long-distance flows between regional capitals are eliminated. The sizes of the resulting functional regions are not levelled and differ more. Both sets of functional regions are defined using the particular interaction measure by which the elementary units are iteratively amalgamated; therefore, transformed flows should cross regional borders only minimally. Where they do cross them, there are two reasons. First, only transformed flows between elementary units are depicted, whereas the algorithm assesses relationships according to the interaction measures not only between pairs of elementary units but also with proto-regions and the resulting functional regions. Second, the algorithm amalgamates elementary units and proto-regions at various stages of iteration, so the transformed intensity of the links incident to an elementary unit can change during the iteration.
The indices of orientation of inter-regional flows $O_r$ were visualised for the functional regions based on Smart's interaction measure and for those based on the additive interaction measure (i.e. the two maps showing regular daily movements based on Smart's and the additive interaction measure, respectively). Both cases share the same basic features: values greater than 1 occur in economically central regions that have a surplus of jobs and of places at secondary schools and universities. Values smaller than 1 occur in functional regions bordering metropolitan regions (particularly Prague) and in economically peripheral regions with a shortage of job opportunities and low capacity at their secondary schools and universities. Some residents of these regions travel to work and school outside their own functional region.
Apart from the practical depiction of origin-destination flows, the paper makes a methodological contribution. A simple and clear visualisation of the length, intensity and direction of regular flows in a single cartographic object can be seen as a basic and very important factor when results are presented to the scientific community and the general public. The depiction of the proportional distribution of both flow directions through variable colour transitions is original; the typology or polarisation of flows has previously been depicted through colour transitions, but in different contexts and in different ways (Hennemann, 2013; Koylu & Guo, 2017). Research into and use of interaction measures are important for spatial analysis and regional taxonomy. The recognition that origin-destination flow information can be transformed or normalised can stimulate further geographical research. This methodological development can also be used in practical applications. For instance, it can inform the definition of administrative regions, which should be at least partly comparable in size according to the principle of spatial equity. This can be achieved by transforming origin-destination flows in order to define relevant functional regions.

Software
The visualisation of flows and transformed flows between elementary units was constructed in ArcGIS Pro 2.7 using the 'XY to Line' tool. Because current GIS software does not allow a two-colour gradient fill to be applied to lines, the lines had to be converted to polygons using the Buffer tool. The 'width' (i.e. flow intensity) is given by the total interaction in both directions for the particular relationship (absolute flows) and by the value of the interaction measure (transformed flows). The colour gradient fill of the polygons was created in QGIS 3.16 Hannover, because ArcGIS does not support a gradient fill that would also express the proportion of flows in the relationship. The origins and destinations are the centroids of the built-up areas of the central municipalities of the elementary units. In the relationship between a pair of elementary units, we take the origin to be the elementary unit with prevailing outflows and the destination to be the elementary unit with prevailing inflows.

Notes
1. The self-containment of region r is expressed as $SC_r = \frac{T_{rr}}{\sum_k T_{rk} + \sum_k T_{kr} - T_{rr}}$.
2. We consider a place of residence to be an origin; of course, flows are bidirectional, since people return to their places of residence.
Acknowledgements
The authors would like to thank the reviewers and the editor for valuable comments regarding the manuscript.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by Czech Science Foundation: [grant number GA20-21360S].