Modelling cyclists’ route choice using Strava and OSMnx: A case study of the City of Glasgow

Previous research has demonstrated the influence of street layout on travel behaviour; however, little research has been undertaken to explore these connections using detailed and robust street network analysis or cycling data. In this study, we harness state‐of‐the‐art datasets to model cyclists’ route choice based on a case study of the City of Glasgow, Scotland. First, the social fitness network Strava was used to obtain datasets containing the number of cycling trips on each street intersection for the years 2017 and 2018. Second, we employed a Python toolkit to acquire and analyse the street networks. OSMnx was subsequently employed to quantify several commonly used centrality indices (degree, eigenvector, betweenness and closeness) to measure street layout. Due to the presence of spatial dependence, a spatial error model was used to model route choices. Model results demonstrate that: (1) cyclists’ movement models were consistent for the years 2017 and 2018; (2) the presence of a spillover effect suggests that cyclists tend to cycle in proximity to each other; and (3) cyclists avoid streets with high degree centrality values and prefer streets with high eigenvector centrality, betweenness centrality and closeness centrality. These findings reveal cyclists’ desired street layouts and can be taken into consideration for future interventions.


Introduction
Active travel (AT) comprises travel modes that incorporate physical activity for all or part of a journey (e.g., walking and cycling). AT is known to reduce many urbanization externalities, including traffic congestion, noise pollution, climate change, health inequality, and social exclusion, and is able to boost life quality by improving physical and mental well-being (Avila-Palencia et al., 2018; Grabow et al., 2012; Hamer and Chida, 2008; O'Dea, 2003; Rissel, 2009). In many places, cycling has the potential to substitute for the private car since a large share of trips made by car are shorter than 5 miles (Department for Transport, 2019; Hong et al., 2020).
In order to foster cycling, several studies have explored the role that street networks (including streets, paths, bridges, and cycle routes) play in shaping travel behaviour, travel modes and, more specifically, AT (Boeing, 2019b; Braçe, 2016; Emmanouilidis, 2013). A street network may comprise planned components or unplanned components that emerge through accretion (Hanson, 1989); it can be summarised by network centrality indices (explained in 2.2) and shapes the movement of road users. Furthermore, street networks facilitate a range of socio-economic activities, such as protests, public speeches, retail establishment locations and, ultimately, the movement of people (Enström and Netzell, 2008; Mohamed et al., 2015). Street network centrality indices quantify the way in which each street segment relates to other streets within the network through various metrics (Emo et al., 2012). Natural movement theory, proposed by Hillier et al. (1993), states that street layout is a primary generator of an individual's movement. Numerous studies have been able to predict human movement through street layout. In particular, Hillier et al. (1993), Penn et al. (1998), and Hillier and Iida (2005) predicted 55-75% of human movement in London, while Read (1999) predicted between 60% and 70% in Amsterdam. Therefore, it is believed that cyclists' movement and street layout are also intertwined.

Background
Previous studies that have attempted to model cyclists' route choice have suffered from several limitations related to: (1) insufficiently detailed cycling data; (2) quantifying street network centralities; and

Study area
Glasgow is the largest and most populous city in Scotland. The majority of journeys (71%) are under 5 km; however, the cycle mode share for work and study is only 1.6% (Cycling Scotland, 2019). Glasgow has been selected as the area of interest in the current study (Fig. 1) due to: (a) the availability of rich data and literature; (b) the existence of AT communities; and (c) the significance of the city for Scotland, particularly the current challenges that can be mitigated through AT, and its ongoing endeavours to promote AT.

Aims, objectives and outline
This study aims to increase the understanding of cyclists' route choices within street configurations. Such an understanding will reveal favourable street characteristics that can be utilized in conjunction with other factors to foster cycling and consequently increase cycling ridership levels. The objective of this study is to model the number of cycling trips on each street intersection as a function of street network centrality indices (degree centrality [DC], eigenvector centrality [EC], betweenness centrality [BC], and closeness centrality [CC]).
The remainder of the paper is structured as follows: Section 2 reviews the key literature on AT data sources and street layout. Section 3 explains the data acquisition, preparation and analysis methods. The descriptive analysis and specifications of the empirical models used in the study are presented in Section 4. The results of the selected empirical models are interpreted and discussed in Section 5, followed by concluding remarks in Section 6.

Traditional and emerging AT data sources
Interest in AT has led to the application of numerous data collection methods. AT data sources can be partitioned into: (1) traditional data sources, in which various conventional sensors are used to collect data - Table 1 summarizes traditional AT data advantages and disadvantages; and (2) emerging data sources that can be attributed to the proliferation of GPS-enabled devices such as smart phones and tablets (Lee and Sener, 2017).
Furthermore, emerging methods can be grouped into two categories, depending on the level of subject (AT user) interaction required. The collection of passive data requires minimal interaction between the AT user and the device (e.g. standalone GPS devices), while active data measurements (e.g. social fitness networks, in-house developed apps, and public participation geographic information systems) require more extensive interaction (willingness to participate) (Lee and Sener, 2017). Many places around the world are developing in-house active data sources such as CycleTracks (San Francisco), CycleAtlanta (City of Atlanta), and CycleLane (Eugene, Oregon). For more in-depth reviews of various traditional and emerging data sources, readers are referred to Pritchard (2018). Strava Metro datasets are available commercially, with the price of the dataset depending on the number of users and type of data. In Queensland, Australia, the data service has been used to identify potential conflicts between road users as well as to prioritize cycling infrastructure and signage investment (Department of Transport and Main Roads, 2017). Moreover, Strava datasets are able to outperform data obtained from many traditional methods. For example, consider the collection of cycling data using a manual count procedure via a clicker; such a method requires pre-training for accurate results, and is also labour intensive, time-consuming, tiresome, and subject to the vagaries of weather. Strava datasets overcome such limitations (Day et al., 2016; Jestico et al., 2016). Table 2 lists selected studies that have used Strava.

Route choice modelling
Crowdsourced data have previously been used in multiple cycling-related applications, such as identifying cycling patterns (Musakwa and Selala, 2016), infrastructure evaluation (Hong et al., 2018), and air pollution exposure (Sun et al., 2017a, 2017b). The scope of this study is cyclists' route choice, which refers to investigating factors influencing the likelihood of choosing a given street segment along a cycling journey. Route choice models shed light on cyclists' preferences in order to aid transport planners in making routes more appealing for cyclists and identifying optimal locations to situate bicycle facilities (Lu et al., 2018). Table 3 reviews selected studies that integrated crowdsourced data into modelling route choices.

Table 2 (excerpt). Selected studies that have used Strava.
Glasgow, Scotland: Explored the association between cycling purposes (commuting and non-commuting) and air pollution exposure for both PM10 and PM2.5.
Hong et al. (Glasgow, Scotland): Examined cycling volumes before and after the installation of four cycling infrastructure projects around the time an international multi-sport event took place.
Venter et al. (Oslo, Norway): Examined the trends of outdoor recreational activities (cycling, walking, running and hiking) during the COVID-19 partial lockdown.

Table 1 note (Lee and Sener, 2017). a: For example, magnetometers detect changes in magnetic fields in the proximity of the sensor created by ferrous metal objects; this sensor is therefore not suitable for non-ferrous objects (e.g. carbon-fibre bicycles and pedestrians).

Open geospatial data
In the context of ubiquitous ICT and the proliferation of open data, open geospatial data is becoming increasingly popular. Numerous platforms have been developed to collect geospatial data from users, known as volunteered geographic information (VGI). Such platforms include OSM, Flickr, Twitter and Foursquare, with VGI platforms dedicated to cyclists including BikeMaps, BikeLaneUpRising and Safe-Lanes. Open geospatial data facilitate research transparency, reproducibility and scalability, and many governmental bodies have adopted this concept. For example, New York City's map portal (http://maps.nyc.gov/doitt/nycitymap/) provides a single access point for official geospatial datasets, including transportation, airport locations, bicycle parking, and off-street parking (Mobasheri, 2020).
OSM has a noticeable presence in urban studies, as it provides up-to-date data that are typically hard or expensive to obtain from other sources. For example, OSM includes informal paths and passageways that are not included in official data. This presence can be witnessed in many studies. Graser et al. (2015) determined a close agreement when assessing the quality of OSM street networks for vehicle routing in Vienna by using official data as a reference, otherwise known as an extrinsic assessment. Mobasheri et al. (2017) evaluated the suitability of OSM for wheelchair users via both extrinsic and intrinsic assessments (i.e., the procedure of checking the completeness level of attributes such as sidewalk width and incline within OSM data itself) in various German cities, determining a high level of acceptability of the data. Similarly for cycling, Hochmair et al. (2015) assessed the completeness of OSM cycling data (lanes and trails) in selected US urbanized areas using governmental datasets, satellite imagery and Google Maps as references. In some locations, OSM data were found to be even more accurate than the reference data due to the greater number of mappers, though other regions were found to have some missing data. Rousell and Zipf (2017) prototyped a pedestrian-based navigation system with landmark extraction using only OSM data. Using raw data from OSM to model street networks is cumbersome, as it does not provide network topology (connections and configurations). The development of the OSMnx toolkit (described in 4.2) has extended the concept of open geospatial data by allowing for downloading and modelling of street networks (i.e. walkable, bikeable, and drivable), in addition to building footprints, points of interest, and elevation data. Boeing (2020) made all street networks in the world openly available on the Harvard Dataverse and calculated various transportation-related indicators such as street grade and intersection counts. OSMnx also has applications in transportation studies. For example, Boeing (2019a) explored transportation efficiency using network circuity (the ratio of street network distances to straight-line distances) for walkable and drivable networks in 40 US cities. Yen et al. (2019) extended this investigation to include bikeable networks and applied it to Phnom Penh, Cambodia.

Table 4 Definition and illustrations of centrality indices.
Index: Definition
Degree Centrality [DC]: Returns the fraction of streets that it is connected to (Hagberg et al., 2018; Newman, 2006).
Eigenvector Centrality [EC]: Computes the extent of centrality of a given street based on the connectivity of the streets to which it has ties (Hagberg et al., 2018; Newman, 2006).
Betweenness Centrality [BC]: Computes how often a given street appears on the shortest path between other streets in the network (Hagberg et al., 2018; Newman, 2006).
Closeness Centrality [CC]: The reciprocal of the average shortest path from any given street segment to all other street segments in the network (Hagberg et al., 2018; Newman, 2006).
Note: In the illustrations, hot (cold) colours represent high (low) values of the corresponding index.

Street layout
Street layout has a primary role in AT mobility, safety, and wayfinding (Hillier et al., 1993). Rifaat et al. (2012) reported that loop and lollipop street patterns are associated more with pedestrian crash severity compared to gridiron street patterns. The impact of street angularity has been investigated by Dalton (2001), who concluded that people attempt to conserve street linearity in their journeys by avoiding unnecessary turns. In the current study, street layout is represented by four centrality indices that measure how each street segment relates to other streets within the network, namely the DC, EC, BC and CC indices, presented in Table 4. DC refers to the number of street segments that are immediately connected to any given street. Well-connected streets offer maximal accessibility to other streets and alternative routes, while also maximising the number of intersections (Pucher and Buehler, 2012: 121). EC is a more advanced version of DC, which also includes the influence of a street in a street network. More specifically, a higher score is assigned to streets which are connected to multiple streets with high DC values (Newman, 2006).
BC measures the likelihood for any given street segment to be passed through as the shortest route from one street segment to all other street segments (Freeman, 1977;Hillier et al., 1987;Hillier and Iida, 2005). This index quantifies how well any given street functions as a unique bridge between other streets (Joss, 2016). BC is assumed to be associated with travel time minimisation, as it increases the likelihood that a given street segment serves as a shortest path within the street network, and thus acts as a trip option (McCahill and Garrick, 2008;Samson, 2017).
CC represents the number of turns that must be traversed in order to reach all street segments from any street segment of origin within a street network (Hillier, 2012). Compared to streets with a low CC value, those with high CC values can be reached with fewer turns, as they are closer to all other streets within the network (Koohsari et al., 2016). Kim and Sohn (2002) and Al-Shaheen (2012) found that streets with high CC often exhibit mixed land use (residential, commercial, and work areas), a well-known variable that is correlated with AT (Kerr et al., 2007).
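The definitions above can be made concrete on a toy network. The following is a minimal pure-Python sketch, assuming an undirected, connected graph stored as an adjacency dictionary. The graph, node labels and function names are illustrative and are not taken from the study, which relied on OSMnx. DC, BC and CC follow the Table 4 definitions with the normalisations commonly used in NetworkX (an assumption here); EC is omitted for brevity.

```python
from collections import deque

# Hypothetical toy network: C is a hub, A sits at the end of a tail.
TOY_NETWORK = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}


def bfs_distances(adj, source):
    """Topological (hop-count) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def closeness_centrality(adj):
    """Reciprocal of the average shortest-path distance to all other nodes."""
    n = len(adj)
    result = {}
    for u in adj:
        total = sum(bfs_distances(adj, u).values())
        result[u] = (n - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Share of shortest paths between other node pairs that pass through a node
    (Brandes' accumulation scheme for unweighted, undirected graphs)."""
    n = len(adj)
    score = {u: 0.0 for u in adj}
    for s in adj:
        sigma = {u: 0 for u in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {u: [] for u in adj}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {u: 0.0 for u in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    pairs = (n - 1) * (n - 2) / 2     # node pairs that exclude the node itself
    # Each undirected pair was visited from both endpoints, hence the division by 2.
    return {u: score[u] / 2 / pairs for u in adj}


if __name__ == "__main__":
    for name, func in [("DC", degree_centrality),
                       ("CC", closeness_centrality),
                       ("BC", betweenness_centrality)]:
        print(name, func(TOY_NETWORK))
```

In this toy network the hub C scores highest on all three indices, whereas the tail node A has a zero BC because no shortest path between other nodes passes through it. On a real street network the indices can diverge, which is why they are treated as separate explanatory variables.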

Methods
In this study, cyclists' route choices are modelled using street layout. Variables related to bike ridership were derived from Strava Metro, whereas OSMnx was used to calculate street network centrality indices. QGIS and GeoDa on macOS Catalina were used for data preparation, visualisation and analysis. Strava's 2017 and 2018 cyclist datasets for Glasgow were obtained from the University of Glasgow's Urban Big Data Centre. Each dataset has three products: Street, Origin/Destination, and Nodes. The Nodes product (point Shapefile) represents street intersections extracted from OSM and was used to model cyclists' movement. The Street product was employed for the purpose of visualization (shown in 5.1.1) and to extract temporal patterns. Core data files were joined to the shapefile to provide a summary for the entire year (known as yearly roll-up) for each point feature. The attribute ACTCNT, which denotes the count of trips through the intersection (hereafter CCT), was used to model cyclists' route choice. As for the temporal patterns, Edges - Hourly files were used to plot the Activity_Count variable, which denotes the number of trips on the street/trail segment at hourly and daily levels in the direction of the line digitization.
It should be noted that all counts must meet a minimum of 3 users before they are shown, and are rounded up to the nearest multiple of 5 (i.e. counts of 4 are shown as 5, counts of 8 are shown as 10 and so on). In addition to the aforementioned data, demographic information and other relevant data was acquired for the timeframe of the products.
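The suppression and rounding rule can be illustrated with a short sketch. It assumes that the rule works exactly as described in the text (a minimum of 3 users, then rounding up to the nearest multiple of 5); the function name and parameters are hypothetical and do not describe Strava's internal implementation.

```python
def displayed_count(raw_count, n_users, min_users=3, bin_size=5):
    """Return the count as it would appear in the released data.

    Counts backed by fewer than `min_users` users are suppressed (None);
    all other counts are rounded up to the next multiple of `bin_size`.
    """
    if n_users < min_users:
        return None
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-raw_count // bin_size) * bin_size


if __name__ == "__main__":
    for raw, users in [(4, 3), (8, 4), (10, 6), (4, 2)]:
        print(raw, users, displayed_count(raw, users))
```

One consequence of this rule is that released counts never understate the raw counts, and low-volume intersections are either missing or reported with a coarser relative resolution than busy ones.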
In order to determine the extent to which the Strava cycling data represents actual cycle ridership in Glasgow, the 2017 and 2018 datasets were correlated with corresponding cordon cycle counts, which took place over two successive days between 6:00 am and 8:00 pm in September 2017 and 2018 at 36 locations (Fig. 2). This is a common practice to examine the representativeness of Strava (Conrow et al., 2018; Jestico et al., 2016; Roy et al., 2019). Note that the cordon cycle count was manually digitized using data obtained from Glasgow City Council (2018). Since the cycle count data reflects the number of cyclists going to and from the city, the Strava Street product was used as it contains TACTCNT, which denotes the total count of trips on the street/trail segment, regardless of the direction of travel. Using linear regression analysis, the coefficient of determination (R²) was found to be 0.69 and 0.67 for 2017 and 2018, respectively. This suggests that both Strava datasets adequately represent cycle ridership in Glasgow.
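The representativeness check amounts to a simple linear regression of cordon counts on Strava counts. The sketch below shows how R² is obtained from such a fit; the function names are illustrative and the numbers in the demonstration are invented, not the Glasgow data.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Coefficient of determination of the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical values: Strava trip counts and manual cordon counts per site.
    strava_counts = [120, 340, 90, 510, 260]
    cordon_counts = [800, 1900, 650, 3100, 1500]
    print(round(r_squared(strava_counts, cordon_counts), 3))
```

An R² of 0.69 would mean that about 69% of the variance in the cordon counts across the count locations is accounted for by a linear function of the Strava counts.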

OSMnx
OSMnx, a Python-based toolkit, was used to analyse the street network as it overcomes the limitations of traditional methods (e.g. small sample size and excessive network simplification). First, one can query whether any street network exists on OSM using the location name or a predefined polygon. The toolkit automatically accounts for nonplanarity, meaning that it represents street networks using a three-dimensional model for grade separation, bridges, and tunnels (Boeing, 2017). The toolkit then corrects and simplifies the street network, retaining all spatial geometry as the original OSM nodes are positioned on intersections, dead-ends, and along a single curved street segment. In the current study, the street network was simplified using the strict mode, which retains a node if it constitutes a dead-end, a self-loop, or a true intersection of multiple street segments in which at least one of the streets continues through the intersection. Note that for the latter, the intersection of two streets, where both end at the intersection and thus create an elbow, does not constitute a node. Thus, as illustrated in Fig. 3, this procedure will eliminate excess street fragmentation, as is the case of Space Syntax, and yield more representative results (Boeing, 2017).
After acquiring the desired street network, OSMnx can calculate a wide range of indicators for both edges (street segments) and nodes (intersections). In this work, we focus on the centrality indices mentioned in Table 4. For the purpose of visualization, the street segments are illustrated in 5.1.2. The nodes were included in the cyclists' movement model.

Data preparation
The cycling trip counts for both years (derived from the Strava node product) were merged with the centrality indices (derived from the OSMnx node product) using the NNJoin plugin in QGIS (version 3.4.14-Madeira). This plugin joins two shapefiles based on the nearest neighbour. The input and join layers were set as the centrality index and Strava layers, respectively. This results in two shapefiles, each with 35,561 point features that carry both the cycling trip counts derived from Strava and the centrality indices derived from the street network analysis. The nearest-neighbour distance tolerance was set to <10 m so that only pairs of points lying less than 10 m apart were retained, resulting in datasets with 31,374 point features.
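The logic of the join can be sketched in a few lines. This is a brute-force illustration of a nearest-neighbour join with a distance tolerance, not the NNJoin plugin itself; it assumes coordinates in a projected system measured in metres, and the identifiers and function name are hypothetical.

```python
import math


def nearest_neighbour_join(input_points, join_points, max_distance=10.0):
    """Match each input point to its nearest join point.

    Both arguments map an identifier to an (x, y) tuple in metres.
    Pairs whose distance is not below `max_distance` are discarded,
    mirroring the <10 m tolerance described in the text.
    """
    matches = {}
    for in_id, (x, y) in input_points.items():
        best_id, best_dist = None, math.inf
        for join_id, (jx, jy) in join_points.items():
            d = math.hypot(x - jx, y - jy)
            if d < best_dist:
                best_id, best_dist = join_id, d
        if best_dist < max_distance:
            matches[in_id] = (best_id, best_dist)
    return matches


if __name__ == "__main__":
    # Hypothetical centrality nodes (input layer) and Strava nodes (join layer).
    centrality_nodes = {"n1": (0.0, 0.0), "n2": (100.0, 100.0)}
    strava_nodes = {"s1": (3.0, 4.0), "s2": (50.0, 50.0)}
    print(nearest_neighbour_join(centrality_nodes, strava_nodes))
```

A brute-force search is quadratic in the number of points; for layers of the size used here (tens of thousands of points) a spatial index would normally be used, which is what GIS plugins typically do internally.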
GeoDa (version 1.14.0) was then deployed for the logarithmic transformation of all variables in order to reduce any skewness in the data. Following this, the weight matrix was created using queen contiguity to define neighbour points based on border sharing. The

Results
The results of both the descriptive and inferential analysis are reported in this section. The descriptive analysis generally consists of the spatial visualisation of the data obtained from Strava and OSMnx along with a descriptive summary of demographic information. The inferential analysis results report the output of the spatial regression model. The final section presents the procedures used to assess the adopted method.

Strava
Figs. 5 and 6 map Glasgow Strava trip volumes for 2018 and 2017, respectively. Spatial cycling trends are similar for both years in the city centre. In particular, the distinctive gridiron street pattern in the city centre, as well as its arterial roads, allows for a high volume of cycling trips. The locator maps in both figures depict the cycle ridership in the city centre for both years, clearly demonstrating the high number of cycling trips along the River Clyde and over the bridges across the River Clyde.
The temporal distribution of the cycling trips of each street segment in Glasgow is plotted in Fig. 7. The daily cycling trips plotted in Fig. 7A and B exhibit seasonal variability, where the number of trips plateaus in the non-winter seasons (around day 97 to 265). In 2018, the number of cycling trips peaks at day 252 (September 9th) and 220 (August 8th) in conjunction with major cycling events, namely, Pedal for Scotland and the Glasgow 2018 European Championships Cycling Time Trial, respectively. Similarly, for 2017, day 253 (September 10th) peaks in conjunction with Pedal for Scotland. The total hourly cycling trips plotted in Fig. 7C and D exhibited the same trends. The number of cycling trips peaks at around 8 AM and 5 PM, coinciding with commuting peaks, with a slight plateau at around 12 PM that may be attributed to recreational trips.
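The day-of-year values quoted above can be converted to calendar dates with the standard library. This is a small convenience sketch; the function name is illustrative.

```python
from datetime import date, timedelta


def day_of_year_to_date(year, day_number):
    """Convert a 1-based day-of-year index into a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_number - 1)


if __name__ == "__main__":
    for year, day in [(2018, 252), (2018, 220), (2017, 253)]:
        print(year, day, day_of_year_to_date(year, day).isoformat())
```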

OSMnx
The analysis of the street layout characteristics in the city demonstrates the scattering of the DC values across the city (Fig. 8), with low DC values located on arterial roads and to the west of the city centre. Note that DC is affected not just by the intersections, but also by the street direction, which is evident in the western region of the city centre. In particular, an intersection of two bidirectional streets will have a DC value of 8, while intersections of two one-way streets will have a DC value of 4. Thus, intersections of bidirectional and one-way streets will vary accordingly. However, the illustrated values are normalised. In comparison, as can be seen in Fig. 9, EC values exhibit an almost symmetric diffused pattern stemming from Glasgow Green, located in the east end of the city centre. Fig. 10 presents the BC values for the City of Glasgow's street network. It can be seen that streets that function as bridges between sets of other streets (e.g. arterial roads, bridges and tunnels) exhibit high BC values. The lowest BC values correspond to streets located in the periphery of the city.
The CC captures the extent of the topological closeness between a street segment and all other street segments. It can be seen from Fig. 11 that the arterial road (A8) that passes through central Glasgow, together with its extensions from the south west to the north east, exhibits the highest CC values, which gradually decrease outwards. This reflects the hierarchy and location of the roads, whereby arterial roads exhibit the highest values, and local roads (the lowest level in the hierarchy) located in the outskirts of the city hold the lowest values.
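The DC values of 8 and 4 follow from counting directed edges. The sketch below assumes that the raw (non-normalised) DC of an intersection is its in-degree plus its out-degree in a directed graph, where a bidirectional street contributes one edge in each direction; the node labels and helper names are hypothetical.

```python
from collections import Counter


def total_degree(directed_edges):
    """In-degree plus out-degree of every node in a list of (from, to) edges."""
    degree = Counter()
    for u, v in directed_edges:
        degree[u] += 1  # outgoing edge at u
        degree[v] += 1  # incoming edge at v
    return degree


def two_way(u, v):
    """A bidirectional street segment is stored as two opposing directed edges."""
    return [(u, v), (v, u)]


# Intersection X with arms to the west, east, south and north.
bidirectional_cross = (two_way("W", "X") + two_way("X", "E")
                       + two_way("S", "X") + two_way("X", "N"))
one_way_cross = [("W", "X"), ("X", "E"), ("S", "X"), ("X", "N")]
mixed_cross = two_way("W", "X") + two_way("X", "E") + [("S", "X"), ("X", "N")]

if __name__ == "__main__":
    for label, edges in [("two bidirectional streets", bidirectional_cross),
                         ("two one-way streets", one_way_cross),
                         ("one bidirectional, one one-way", mixed_cross)]:
        print(label, total_degree(edges)["X"])
```

Under the same counting assumption, a bidirectional street crossed by a one-way street gives an intermediate value, which is one reading of the statement that mixed intersections vary accordingly.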

Spatial dependence
We adopted ordinary least squares (OLS) regression to obtain the residuals for CCT 2018 and 2017 and subsequently assessed the multicollinearity via the variance inflation factor (VIF). The VIF value was found to be <10 (mean VIF = 1.06), implying the absence of multicollinearity between the explanatory variables (street network centralities). Spatial dependence analysis was performed to determine the appropriateness of spatial regression models. Spatial dependence occurs when a value at a given location is associated with those of neighbouring locations. The presence of spatial dependence of the residual implies a violation of the basic OLS assumption of spatially independent residuals. Thus, spatial regression models can improve model estimations (Anselin, 2013). Table 5 reports the diagnostics for spatial dependence. Both Moran's I and Lagrange Multiplier (LM) spatial dependence tests were found to be significant at the 1% level, indicating that the data was able to fit both the spatial error model and the spatial lag model, respectively. The spatial error model accounts for the spatial dependence of residuals (ε) in the neighbouring regions (spatial heteroskedasticity), while the spatial lag model accounts for the spatial dependence of the dependent variable (y) in the neighbouring regions (Anselin, 2013;Burkey, 2018). Local indicator of spatial autocorrelation (LISA) analysis was carried out with significance assessed by 9999 Monte Carlo permutations to identify spatial clusters (composed of High-High and Low-Low) and outliers (composed of High-Low and Low-High) from the OLS residuals for 2018 and 2017 (Figs. 12 and 13, respectively) (Anselin, 1995). 
High-High refers to features with high residual values that are surrounded by neighbouring features with high residual values; Low-Low refers to features with low residual values that are surrounded by neighbouring features with low residual values; High-Low refers to features with high residual values that are surrounded by neighbouring features with low residual values; and Low-High refers to features with low residual values that are surrounded by neighbouring features with high residual values.
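As an illustration of what the global spatial dependence test measures, the following sketch computes Moran's I for a small set of values. It assumes a binary, non-standardised weight matrix and does not reproduce the permutation-based inference or the LISA statistics produced by GeoDa; the data are invented.

```python
def morans_i(values, weights):
    """Global Moran's I for values under a square spatial weight matrix.

    weights[i][j] > 0 marks j as a neighbour of i; the diagonal should be zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in dev))


# Four locations on a line; each is a neighbour of the adjacent ones.
CHAIN_WEIGHTS = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    print(morans_i([1, 2, 3, 4], CHAIN_WEIGHTS))    # similar neighbours
    print(morans_i([1, -1, 1, -1], CHAIN_WEIGHTS))  # dissimilar neighbours
```

Positive values indicate that neighbouring locations carry similar values (clustering), and negative values indicate that neighbours tend to differ. Applied to OLS residuals, a significantly positive statistic signals that the independence assumption is violated.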

Model selection criteria
Both the spatial lag and spatial error models are strong prediction models. Thus, in order to select the most appropriate spatial regression model, four selection criteria (reported in Table 6) were used to determine the optimal model: goodness of fit (R²), log likelihood (LogL), Akaike info criterion (AIC), and Schwarz criterion (SC). Higher R² and LogL, and lower AIC and SC values imply a better regression fit (Chuai et al., 2012). The spatial error models for both years exhibit higher R² and LogL, and lower AIC and SC values (Table 6). This suggests the presence of statistical evidence of spatial heteroskedasticity, which occurs when the error variance is non-constant across observations (Brennan and Carroll, 1987: 443). Thus, the spatial error model fits the data more efficiently compared to the spatial lag model (see Table 7). Table 6 reports the results of the spatial error models for CCT for the years 2018 and 2017. All parameters were significant at the 1% level. In particular, the lag coefficient λ denotes the spillover effect of neighbouring intersections, such that an increase of 1% in the number of trips in an intersection leads to a 0.55% and 0.62% increase in adjacent intersections for the 2018 and 2017 models, respectively. All centrality metrics were found to have a positive impact on trip counts, with the exception of DC, which exhibited a negative impact.
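For reference, the two candidate specifications and the information criteria are usually written as follows. These are the standard textbook forms rather than equations taken from the paper: W denotes the spatial weight matrix, X the matrix of log-transformed centrality indices, β the coefficient vector, ρ the spatial lag parameter, u the spatially correlated disturbance, k the number of estimated parameters and n the number of observations. All of this notation is assumed here, except y, ε and λ, which appear in the text.

```latex
\begin{align}
  \text{Spatial lag model:}   \quad & y = \rho W y + X\beta + \varepsilon \\
  \text{Spatial error model:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
  \text{Information criteria:} \quad & \mathrm{AIC} = 2k - 2\,\mathrm{LogL}, \qquad
                                       \mathrm{SC} = k \ln n - 2\,\mathrm{LogL}
\end{align}
```

Because both criteria penalise the number of parameters and reward a higher log likelihood, lower AIC and SC values indicate a better trade-off between fit and complexity.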

Spatial error model
In 2018, CC was observed to have the greatest coefficient of correlation: a 1% increase in CC yielded a 1.15% increase in trips. This was followed by BC, where a 1% increase led to a 0.28% rise in trip numbers. EC demonstrated the smallest coefficient of correlation, implying that a large amount of the variation cannot be explained by the model, resulting in a dispersion of the prediction. In particular, a 1% increase in EC resulted in a 0.02% rise in the number of trips. DC was found to have a negative impact on the trip number, whereby a 1% rise in DC led to a 0.15% decrease in the number of trips. A similar trend was observed for the 2017 model.
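The percentage interpretation follows from the logarithmic transformation of all variables: in a log-log specification a coefficient acts as an elasticity. The sketch below, which assumes this log-log form and ignores the spatial error term, converts the 2018 coefficients quoted above into implied percentage changes in trips; the function and dictionary names are illustrative.

```python
def percent_change_in_trips(coefficient, percent_change_in_index):
    """Implied % change in trips for a given % change in a centrality index,
    assuming ln(trips) = ... + coefficient * ln(index)."""
    ratio = 1 + percent_change_in_index / 100
    return (ratio ** coefficient - 1) * 100


# 2018 coefficients as reported in the text.
COEFFICIENTS_2018 = {"CC": 1.15, "BC": 0.28, "EC": 0.02, "DC": -0.15}

if __name__ == "__main__":
    for index, beta in COEFFICIENTS_2018.items():
        print(index,
              round(percent_change_in_trips(beta, 1), 3),
              round(percent_change_in_trips(beta, 10), 3))
```

For a 1% change the implied effect is almost identical to the coefficient itself, which is the approximation used in the text; for larger changes the exact power form should be used instead.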

Method assessment
Strava datasets for both years exhibited an adequate representation of cycling in Glasgow, with a linear correlation observed with cordon counts in the city centre. A visual inspection of the spatial distribution of cyclists also implies a plausible representation: high CCT values can be identified in the city centre and along the river, with low CCT values on highways. Furthermore, the CCT for both years were found to be sensitive to seasonality and to both commuting and non-commuting peaks.
The results demonstrate that the adoption of OSM in the current study is suitable. Haklay (2010) found a fairly accurate agreement between OSM and the national mapping agency for Great Britain, the Ordnance Survey, for London and England. Given the proliferation of GPS-enabled devices, this is believed to also be the case in Glasgow. Moreover, OSM is used as a background map for Strava, making it the optimal option for spatial matching.
The presence of spatial dependence and the model selection criteria in Table 6 justify nominating the spatial error model and discarding OLS. The replication of the analysis for both years, the large number of observations, and the similar coefficient trends lend robustness to the results.

Discussion
Results demonstrated the ability of the Strava datasets to model cyclists' movement. First, Strava was found to be strongly associated with the cordon cycle counts. Second, total daily trips on all street segments were identified as sensitive to seasonal variability. Third, total hourly trips on all street segments clearly depicted commute and recreational patterns. Fourth, gender disparity reflects the prevalence of male cyclists. These trends are in accordance with findings from McPherson (2017), who reported similar temporal trends (both daily and hourly) in Glasgow's public cycle hire scheme, and McPherson (2017) and Motherwell (2018) who found gender disparity towards males.
The count of cycling trips (CCT) for the year 2018 was modelled and the results were validated by repeating the analysis for CCT 2017. A spatial error model was used to account for the spatial dependence in the residuals. The models' predictive powers were found to be moderate (0.42 and 0.53, respectively) with similar explanatory variable coefficient trends. Nordström and Manum (2015) found that betweenness centrality (BC), a proxy for the likelihood of a given street to act as a shortest path between origins and destinations, is insufficient for cyclist route predictions, as less direct routes are used as a result of the poor safety of the shortest-path routes. However, our findings suggest that cyclists in the City of Glasgow flow along the shortest path, and the observed spillover effect may reinforce the safety-in-numbers theory, which states that 'the greater the cyclists' volume, the safer the path' (Jacobsen, 2015). Both basic and advanced measures of street connectivity were adopted in this study, namely, degree centrality (DC) and eigenvector centrality (EC), respectively. DC has a negative influence on trip counts, suggesting that cyclists prefer less connected streets and avoid intersections. Similar findings have been reported in Stinson and Bhat (2003), Menghini et al. (2010) and Snizek et al. (2013). Furthermore, previous research has indicated that intersections are perceived as potential areas for accidents due to: (i) the difficulty in anticipating how vehicles will move; (ii) noiseless and quiet vehicles which hinder localization; and (iii) the unpredictability of cyclists due to the lack of stop and indicator light usage (ECMT, 2000; Strauss et al., 2013). In addition, intersections can result in cyclists losing their momentum, which leads to a slow re-start as considerable energy is required to regain the momentum (McLean, 2017).
As a countermeasure, the "Green Wave" system, first introduced in Copenhagen, offers a seamless flow for cyclists via synchronized consecutive green traffic light signals (Pucher and Buehler, 2012).
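To make the distinction between BC and DC discussed above more concrete, the following minimal Python sketch computes both indices on a hypothetical seven-node toy network (two triangles joined through a single link node). The network, the node names and the brute-force pair-counting approach are illustrative assumptions; this is not the OSMnx workflow used in this study, and distances are counted in hops rather than street length.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (not Glasgow data): two triangles, (a, b, c)
# and (d, e, f), joined through a single link node m.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "m"),
         ("m", "d"), ("d", "e"), ("d", "f"), ("e", "f")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def bfs_counts(adj, source):
    """Return hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # u precedes v on a shortest path from source
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Unnormalised betweenness: for every unordered pair (s, t), add the
    fraction of shortest s-t paths that pass through each other node."""
    info = {n: bfs_counts(adj, n) for n in adj}
    bc = {n: 0.0 for n in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                bc[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return bc


degree = {n: len(adj[n]) for n in adj}
bc = betweenness(adj)

for n in sorted(adj):
    print(n, "degree:", degree[n], "betweenness:", bc[n])
```

In this toy network the link node m has a degree of only 2, yet it carries the highest betweenness because every shortest path between the two triangles passes through it. This illustrates that the two indices measure different properties and need not move together.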
Conceptually, EC is a more advanced variant of DC that measures the popularity of street segments. DC captures the number of immediate connections to other streets (which are likely to be intersections), whereas EC also captures indirect connections and does not necessarily reflect intersections. Following the reasoning of Hong et al. (2016), who developed a model of pedestrian exposure, EC estimates the extent to which a street segment is connected to other well-connected streets. Thus, cyclists on high-EC streets are able to disperse onto other desirable streets, such as those with less congestion and less complex traffic situations. EC was observed to be high within Glasgow Green (see Fig. 10) and decreased gradually outwards. The high number of cycling trips at this location may be a function of its proximity to the city centre, several landmarks and the River Clyde, and of the presence of Glasgow's most popular bike rental location, operated by NextBike (McPherson, 2017). Glasgow Green, Scotland's oldest public park, contains several landmarks, such as the largest terracotta fountain in the world (Doulton Fountain), McLennan Arch, and a historical museum and glasshouse (the People's Palace and Winter Gardens) (Ashurst, 2008), which may also make it attractive to cyclists.
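The difference between DC and EC described above can be illustrated with a small Python sketch. It assumes a hypothetical seven-node network and estimates EC by shifted power iteration (repeatedly adding each node's neighbours' scores to its own score and renormalising); the network, the function name and the iteration settings are illustrative assumptions rather than the procedure used in this study.

```python
import math

# Hypothetical toy network (not Glasgow data): nodes 1-4 form a fully
# connected core; node 5 hangs off node 1 and leads to two dead ends (6, 7).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (1, 5), (5, 6), (5, 7)]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Estimate eigenvector centrality by power iteration on (A + I).

    Adding the identity leaves the leading eigenvector unchanged and
    helps the iteration settle. Scores are scaled to unit Euclidean norm.
    """
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        new = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {n: val / norm for n, val in new.items()}
        if max(abs(new[n] - x[n]) for n in adj) < tol:
            return new
        x = new
    return x


degree = {n: len(adj[n]) for n in adj}
ec = eigenvector_centrality(adj)

for n in sorted(adj):
    print(n, "degree:", degree[n], "eigenvector centrality:", round(ec[n], 3))
```

Nodes 2 and 5 both have a degree of 3, but node 2 is linked to other well-connected nodes whereas node 5 is linked mainly to dead ends, so node 2 receives the higher EC score.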
Closeness centrality (CC) values were found to be highest on major roads (e.g. the A8 and A77) and their surrounding areas, and gradually decreased towards the periphery of the city. These major roads accommodate cyclists through the presence of bridges, artwork and feature lighting (Olsen et al., 2016). The correlation between CCT and CC indicates that cyclists are more likely to use accessible streets (i.e. streets that are topologically close to other streets within the network) rather than deep streets. In contrast, streets associated with low CC values are not easily accessible by many cyclists (Hillier and Sahbaz, 2008). This suggests that the infrastructure and bicycle facilities situated in the core of the city, including the city centre, will be more effective than those on the outskirts of the city, which is in agreement with McArthur and Hong (2019).
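As an illustration of what "topologically close" means here, the following Python sketch computes a hop-based closeness score, (n - 1) divided by the sum of shortest-path distances to all other nodes, on a hypothetical five-junction street. The toy network and the use of hop counts instead of street lengths are assumptions made for brevity; the indices in this study were obtained through OSMnx.

```python
from collections import deque

# Hypothetical toy street (not Glasgow data): five junctions in a line,
# a - b - c - d - e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def hop_distances(adj, source):
    """Breadth-first search returning hop distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """Closeness = (n - 1) / sum of hop distances to all other nodes.

    Assumes a connected network, which holds for this toy example.
    """
    n = len(adj)
    return {node: (n - 1) / sum(hop_distances(adj, node).values())
            for node in adj}


cc = closeness(adj)

for node in sorted(adj):
    print(node, "closeness:", round(cc[node], 3))
```

The central junction c is on average fewer steps from every other junction than the end junctions a and e, so it receives the highest score; the end junctions correspond to the "deep" streets referred to above.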

Conclusion
The current study proposes the use of a spatial model of CCT in the City of Glasgow to better understand cyclists' route choices and to identify optimal locations for cycling facilities. The study overcomes limitations of previous research by: (i) using a crowdsourced cycling dataset obtained from Strava; (ii) deploying OSMnx to calculate street network centrality indices; and (iii) accounting for spatial dependence by using spatial error models. CCT was found to be significantly and positively correlated with CC, EC and BC values, and negatively correlated with DC values. These findings can be applied to the planning of future street network configurations. For example, Green Wave systems could be applied to increase ridership on grid-pattern streets. Further, our results indicate that streets with high CC values should be prioritized for cycling infrastructure, since CC exhibits the strongest association with CCT in our proposed model.

Limitations
This study, however, is not without limitations. Strava datasets may be subject to self-selection bias, which occurs when participants select themselves into a group, resulting in a nonprobability sample. This may explain the overwhelming predominance of two segments in the Strava dataset, by age (users aged between 25 and 54 years) and by gender (males), supporting the claim that females are under-represented while fitness-focused users are over-represented (Hochmair et al., 2019; Sun, 2017). Our findings therefore reflect the demographic composition of the Strava datasets, which is to a large extent in line with the demographic composition seen in previous studies (McPherson, 2017; Motherwell, 2018).

Directions for future research
Future research might consider the following directions: (1) extending the present study by incorporating different street network centrality indices, such as angular indices, through the Urban-FormPy Python package (Filomena et al., 2019); (2) modelling pedestrians' route choice using crowdsourced data, an approach that is currently in its infancy (Griffin et al., 2014); or (3) considering additional mechanisms for gathering Strava-equivalent data from the under-represented groups identified above.