Exploring influencing factors on transit ridership from a local perspective

Purpose – Exploring the influencing factors on urban rail transit (URT) ridership is vital for travel demand estimation and urban resources planning. Among the various existing ridership modeling methods, the direct demand model, with ordinary least squares (OLS) multiple regression as a representative, has considerable advantages over the traditional four-step model. Nevertheless, OLS multiple regression neglects the spatial instability and spatial heterogeneity in the magnitude of the coefficients across the urban area. This paper aims to focus on modeling and analyzing the factors influencing metro ridership at the station level.
Design/methodology/approach – This paper constructs two novel direct demand models based on geographically weighted regression (GWR) for modeling influencing factors on metro ridership from a local perspective. One is GWR with globally implemented LASSO for feature selection, and the other is the geographically weighted LASSO (GWL) model, which is GWR with locally implemented LASSO for feature selection.
Findings – The results of a real-world case study of Shenzhen Metro show that the two local models presented perform better than the traditional global model (OLS) in terms of estimation error of ridership and goodness-of-fit. Additionally, the GWL model results in a better fit than the GWR with global LASSO model, indicating that the locally implemented LASSO is more effective than the global LASSO for the accurate estimation of Shenzhen Metro ridership. Moreover, the information provided by both local models regarding the spatially varied elasticities demonstrates the strong spatial interpretability of the models and their potential in transport planning.
Originality/value


Introduction
Urban rail transit (URT) plays a critical role in maintaining effective passenger mobility nowadays. URT ridership at the station level is known to be influenced by the interaction among multiple factors (e.g. land use, socio-economics, intermodal traffic accessibility and metro network structure). Exploring the influence of these factors is vital to accurately estimate travel demand and to effectively design urban systems, including the identification of which public infrastructures, services and resources need to be built and deployed. Modeling URT ridership at the station level can help not only to estimate and forecast ridership but also to analyze the factors influencing it.
Given the need to understand the effects of multiple factors on URT ridership, a growing number of recent studies have sought to model transit ridership. As one of the best-known models, the four-step (generation, distribution, mode choice and assignment) model has been widely used since the 1950s. However, its weaknesses are also obvious, such as low model accuracy, low data precision, insensitivity to land use, institutional barriers and high expense (Gutiérrez et al., 2011). As an alternative to the four-step model, direct demand models have become popular in ridership estimation in recent decades. Direct demand models estimate ridership as a function of influencing factors within the pedestrian catchment areas (PCA) via regression analysis, which enables the identification of factors that contribute to higher transit ridership (Gutiérrez et al., 2011; Choi et al., 2012; Cervero, 2006; Kuby et al., 2004; Chu, 2004). In these models, a PCA is the geographic area from which a station attracts passengers. The size and shape of a catchment area depend on how accessible a station is and how far it is from alternative stations. One can use buffers to create circular catchment areas of a specific distance or use Thiessen polygons to delineate the area most accessible to each station. Direct demand models have distinct advantages in travel analysis, such as simplicity of use, easy interpretation of results, immediate response and low cost. As one kind of direct demand model, ordinary least squares (OLS) multiple regression, which assumes parametric stability, is generally used (Gutiérrez et al., 2011; Kuby et al., 2004; He et al., 2018; Sohn and Shim, 2010; Loo et al., 2010; Sung and Oh, 2011; Thompson et al., 2012; Zhao et al., 2013; Chan and Miranda-Moreno, 2013; Singhal et al., 2014; Liu et al., 2014). In other words, OLS assumes that the estimated coefficients do not differ significantly across space. With the development of spatial modeling, direct demand models could increase their spatial explanatory power by using geographically weighted regression (GWR), which is designed to model spatial parametric non-stationarity and variance heterogeneity. In recent years, Cardozo et al. (2012) compared the performance of OLS and GWR in modeling transit ridership and its influencing factors, and GWR showed better goodness-of-fit than OLS for forecasting station-level ridership. Furthermore, GWR with penalized forms (e.g. the $\ell_1$ norm) was studied by Wheeler (2009), who introduced the least absolute shrinkage and selection operator (LASSO) into the GWR framework. The resulting method, called geographically weighted LASSO (GWL), simultaneously conducts coefficient regularization and local model selection, and it can reduce the prediction and estimation errors for the response variable in GWR.
In light of the deficiencies of the widely used global direct demand models (OLS multiple regression), and considering the advantages of spatial models such as GWR and GWL for modeling potentially spatially varying relationships, we apply two local direct demand models, GWR with globally implemented LASSO and GWL, to modeling the influencing factors on metro ridership. For the former, we select features by implementing LASSO globally for all calibration locations before running GWR; for GWL, we select features for each station by implementing LASSO locally. The ridership of Shenzhen Metro in 2013 and data on its potential influencing factors are used to illustrate the two models. In addition, we conduct a comparative analysis of the results generated by the two models.

Our main contributions are threefold. First, the approach taken is based on spatial models that consider the spatial autocorrelation of variables, and these models outperform the traditional global regression model (OLS) in terms of model fitting and spatial explanatory power. Second, GWR with global feature selection using LASSO and GWL are compared through a real-world case study on Shenzhen Metro; that is, the difference between global feature selection and local feature selection is discussed. Third, network structure factors are quantified with measures from the field of complex networks.
The remainder of this paper is organized as follows. In Section 2, we outline the study area and describe the data. In Section 3, we describe the methodology we used. Section 4 presents the analysis of results and the comparison between the GWR with global LASSO and GWL models. Finally, Section 5 contains concluding remarks.

Area of study and data
Our study focuses on the Shenzhen Metro network, which consisted of five lines and 118 stations in 2013 (Figure 1) [1]. Figure 1 shows the spatial distribution of those stations. Shenzhen Metro ridership data at the station level were aggregated from the data collected through the automatic fare collection (AFC) system of Shenzhen Metro Corporation in China. The data set includes all entry-exit smart card records. The data used in the research cover a time span of seven days from October 14 (Monday) to 20 (Sunday) in 2013.
We summed boarding and alighting ridership amounts and then calculated the average daily ridership of the whole week.The explanatory variables represent factors hypothesized to influence station ridership.
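The aggregation described above can be sketched as follows. This is a minimal illustration only: the function name and the per-day counts are hypothetical and are not taken from the Shenzhen Metro AFC data set.

```python
def average_daily_ridership(boardings, alightings):
    """Average of the daily (boarding + alighting) totals over the observed days."""
    if len(boardings) != len(alightings) or not boardings:
        raise ValueError("need one boarding and one alighting count per day")
    daily_totals = [b + a for b, a in zip(boardings, alightings)]
    return sum(daily_totals) / len(daily_totals)


# Hypothetical counts for one station, Monday to Sunday (not real data).
boardings = [1200, 1150, 1180, 1210, 1300, 900, 850]
alightings = [1100, 1120, 1090, 1150, 1250, 950, 800]
ridership = average_daily_ridership(boardings, alightings)
```

The resulting value per station would then play the role of the response variable in the regression models.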

Response variable
This paper aims to identify and analyze multiple factors influencing station ridership. We conduct a preliminary statistical analysis using the metro AFC data of October 14. Figure 2(a) shows the spatial distribution of AFC data records in one day. It shows that the records are most densely distributed at Grand Theatre station and Laojie station, closely followed by Huaqiang Road station and Luohu station, whereas the records of other stations are relatively sparse. Figure 2(b) shows the temporal distribution of AFC data records, in which the number of records peaks at both 8:00 and 18:00 on both weekdays and weekends. Additionally, the temporal distributions of records on weekdays and weekends are quite similar, which suggests that metro travel patterns in Shenzhen are similar on weekdays and weekends, with morning and evening peaks. Therefore, the models are built with the average daily ridership of the whole week (the operating hours of Shenzhen Metro are 6:30-23:00) as the response variable, with the aim of finding the factors influencing station-level ridership.

Explanatory variables
The explanatory variables represent factors hypothesized to influence station ridership (Table I). The variables can be classified into four categories: (1) land use; (2) social economics; (3) intermodal traffic access variables; and (4) network structure.
As the average friendly walking distance is generally assumed to be 500 m in large and middle-sized cities according to Dovey et al. (2017), we also define the PCA distance of each Shenzhen Metro station as 500 m. In our work, we use a buffer to create a circular PCA with a radius of 500 m. Based on this buffer, the population, all of the land use-related data and the number of bus stations were subsequently collected.

Land use variables.
All of the land use-related data within a PCA were collected from Baidu Map through its API. The land use variables cover the residences, entertainment, services, business, education and offices close to the station. Specifically, the information covers the numbers of residences, restaurants, schools, working buildings, hospitals, banks, shopping places and hotels within the 500-m PCA.

2.2.2 Socio-economic variables.
Socio-economic variables consist of the population distribution of Shenzhen in 2013 and the number of operation days since the metro stations opened. The number of days since the metro lines and stations opened was collected from a website named "UrbanRail" [2]. A higher residential population is hypothesized to be positively associated with ridership. We obtained the population distribution of the whole city of Shenzhen in 2013 from the Worldpop website [3]. During data preprocessing, the population within each buffer was obtained by summing the values of the grid cells that fall within the metro station buffer using ArcGIS 10.2. Figure 3 shows the population distribution of the whole city of Shenzhen in 2013 and the 500-m buffers of metro stations.
Through the preliminary visualization in Figure 3, it is noted that population is densely distributed near the metro region.The influence of population density within each station buffer on ridership is pending for analysis in the model.
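The buffer-based aggregation described above (carried out in ArcGIS 10.2 in this study) can be sketched in a few lines. This is an illustrative approximation, not the procedure actually used: it assumes that each population grid cell is represented by its centroid in a projected coordinate system measured in metres, and the function name and sample cells are hypothetical.

```python
import math


def population_in_buffer(station_xy, cells, radius_m=500.0):
    """Sum the population of grid cells whose centroid lies inside the buffer.

    station_xy: (x, y) of the station in projected metres (assumption).
    cells: iterable of (x, y, population) centroids in the same coordinates.
    """
    sx, sy = station_xy
    total = 0.0
    for x, y, pop in cells:
        if math.hypot(x - sx, y - sy) <= radius_m:
            total += pop
    return total


# Hypothetical grid cells around a station placed at the origin.
cells = [(0, 0, 120.0), (300, 300, 80.0), (400, 400, 60.0), (1000, 0, 200.0)]
pop_500 = population_in_buffer((0.0, 0.0), cells)
```

The same pattern would apply to counting points of interest or bus stations within the 500-m PCA, with a count of 1 per point in place of the cell population.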

Intermodal traffic access variables.
As for intermodal traffic access, we considered the feeder bus system. The number of bus stations near a metro station, which was also collected from Baidu Map, was hypothesized to be positively related to station ridership.
2.2.4 Network structure variables.
In this paper, network structure variables comprise the degree centrality and betweenness centrality of the metro network nodes and the distance to the city center. In the field of complex networks, the degree is a simple centrality measure that counts how many neighbors a node has, and the betweenness centrality of a node refers to the number of shortest paths that pass through the node (Erciyes, 2014). Thus, they reflect whether a station is a transfer or terminal station and how important a station is in controlling the flows passing between other stations of the metro network. As for the distance $Dist_i$ of each station to the city center, which is the Shenzhen Municipal People's Government, located in Futian District, we calculate it by equation (1), which accounts for the radius of the earth, where $R$ is the radius of the earth, and $(Lat_0, Lon_0)$ and $(Lat_i, Lon_i)$ are the latitude and longitude of the city center and station $i$, respectively. The related geographical data were collected from Google Maps.
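A distance of this kind, computed from two latitude-longitude pairs and the radius of the earth, can be sketched with the haversine great-circle formula. Note that this is an assumption made for illustration: the haversine form, the value used for $R$ and the function name are not taken from the text, and equation (1) may use a different but equivalent spherical formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius; an assumed value for R


def great_circle_km(lat0, lon0, lat_i, lon_i, radius_km=EARTH_RADIUS_KM):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    phi0, phi_i = math.radians(lat0), math.radians(lat_i)
    d_phi = phi_i - phi0
    d_lam = math.radians(lon_i - lon0)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi0) * math.cos(phi_i) * math.sin(d_lam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical coordinates: one degree of longitude apart on the equator.
one_degree_km = great_circle_km(0.0, 0.0, 0.0, 1.0)
```

In use, `(lat0, lon0)` would hold the coordinates of the city center and `(lat_i, lon_i)` those of station $i$.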

GWR with global LASSO
The first method first implements LASSO on the variables of all stations to perform variable selection, and then feeds the selected explanatory variables into the GWR model to understand the spatially varied effects of those selected factors on metro station ridership.
3.1.1 Geographically weighted regression.
In this study, we use geographically weighted regression (GWR) models to estimate station-level ridership. The GWR model is an extension of ordinary least squares (OLS), or linear least squares, regression, which is written as follows:

$$y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i$$

Geographical location factors are introduced into the regression parameters to allow local parameter estimation, and the extended GWR model is as follows:

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$

where $y_i$ and $x_{i1}, x_{i2}, \ldots, x_{ip}$ are the observed values of the response variable $y$ and the explanatory variables $x_1, x_2, \ldots, x_p$ at the location $(u_i, v_i)$, the geospatial coordinates of observation point $i$ ($i = 1, 2, \ldots, n$), and $\varepsilon_i$ is the normally distributed error term (with expected value 0 and constant variance). $\beta_k(u_i, v_i)$ ($k = 1, 2, \ldots, p$) refers to $p$ unknown functions of the spatial position. The GWR model weights the observations according to their distance from each observation point $(u_i, v_i)$, and the weight is generally a kind of distance decay function (Fotheringham and O'Kelly, 1989). The bandwidth directly affects the weight function and also the precision of the model, so its determination is crucial.
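The local estimation idea can be sketched for the simplest case of one explanatory variable plus an intercept, where the weighted least squares solution has a closed form. The Gaussian kernel, the fixed bandwidth, the function names and the toy data below are all assumptions made for illustration; they are not the kernel or bandwidth selection used in this study.

```python
import math


def gaussian_weight(d, bandwidth):
    """Distance-decay weight; the Gaussian form is an assumption of this sketch."""
    return math.exp(-0.5 * (d / bandwidth) ** 2)


def gwr_local_fit(coords, x, y, i, bandwidth):
    """Weighted least squares at location i for y = b0 + b1 * x."""
    ui, vi = coords[i]
    w = [gaussian_weight(math.hypot(u - ui, v - vi), bandwidth) for u, v in coords]
    sw = sum(w)
    x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Toy data (hypothetical): the slope is 2 in the west and 5 in the east.
coords = [(float(j), 0.0) for j in range(10)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1 + 2 * xj for xj in x[:5]] + [1 + 5 * xj for xj in x[5:]]
west = gwr_local_fit(coords, x, y, i=0, bandwidth=1.0)
east = gwr_local_fit(coords, x, y, i=9, bandwidth=1.0)
```

With a small bandwidth, the local slopes at the two ends of the toy line recover the two different regimes, which a single global OLS fit would average out. A larger bandwidth would pull both local estimates toward the global one.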
3.1.2 Least absolute shrinkage and selection operator.
The structured data have 14 explanatory variables (shown in Table I) with a limited number of observations, which may cause multicollinearity and overfitting. Redundant variables should be removed to make the modeling process more efficient. Therefore, before fitting the regression model, it is necessary to select features from the original candidate variables. As a kind of shrinkage method, LASSO tends not only to reduce the variability of the estimates, thus improving the model's stability, but also to set some of the coefficients to zero, enabling variable selection. LASSO makes use of the $\ell_1$ norm. $\ell_1$ penalties are convex, and the assumed sparsity can lead to significant computational advantages. LASSO is defined as follows:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{k=1}^{p} \beta_k x_{ik} \Big)^2$$

subject to:

$$\sum_{k=1}^{p} |\beta_k| \le s$$

where $s$ is a parameter that controls the degree of coefficient shrinkage. Tibshirani (1996) proved that the LASSO constraint $\sum_k |\beta_k| \le s$ is equivalent to adding the penalty term $\lambda \sum_k |\beta_k|$ to the residual sum of squares (RSS). Thus, there is a direct relationship between $s$ and $\lambda \ge 0$, a complexity parameter that controls the degree of shrinkage of the coefficients.
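Hence, the LASSO coefficients can also be expressed as:

$$\hat{\beta} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{k=1}^{p} \beta_k x_{ik} \Big)^2 + \lambda \sum_{k=1}^{p} |\beta_k| \Big\}$$

The commonly used methods for solving LASSO are standard convex optimizers (Gauraha, 2018) and least angle regression (LARS) (Efron et al., 2004). In this study, we adopted LARS to solve LASSO.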
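To illustrate how the penalty sets coefficients exactly to zero, the penalized form can be solved on a toy problem. The sketch below deliberately swaps in cyclic coordinate descent instead of the LARS solver adopted in this study, because it fits in a few lines of plain Python. It also assumes centred variables (so no intercept is fitted), and the function names and data are hypothetical.

```python
def soft_threshold(rho, t):
    """Soft-thresholding operator: shrink rho toward zero by t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise RSS + lam * sum(|beta_k|) by cyclic coordinate descent.

    X is a list of rows; X and y are assumed centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            rho = z = 0.0
            for i in range(n):
                # Residual with the contribution of variable k left out.
                partial = y[i] - sum(X[i][j] * beta[j] for j in range(p) if j != k)
                rho += X[i][k] * partial
                z += X[i][k] ** 2
            # The threshold is lam / 2 because the RSS term carries no 1/2 factor.
            beta[k] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return beta


# Toy orthogonal design (hypothetical): y = 3 * x1 + 0.1 * x2.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.1, 2.9, -2.9, -3.1]
beta_unpenalised = lasso_cd(X, y, lam=0.0)
beta_lasso = lasso_cd(X, y, lam=2.0)
```

With `lam=0.0` the sketch reduces to least squares and keeps both variables, whereas with `lam=2.0` the weak second coefficient is shrunk exactly to zero and the first is shrunk toward zero, which is the variable selection behavior described above.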

Geographically weighted LASSO model
The second method is based on the GWL framework developed by Wheeler (2009). This method performs local model selection by implementing LASSO for each station, so that one can understand which factors influence which stations and how strong the effects are.
The algorithm to estimate the GWL solutions is as follows:
Step 1: estimate the local scaling GWL parameters (the shrinkage parameter $s_i$ at each location $i$ and the bandwidth $b$) by minimizing the leave-one-out cross-validation (LOOCV) root mean square error (RMSE). Here the bandwidth $b$ is chosen by a binary search for the minimum LOOCV RMSE.
(1) Calculate the $n \times n$ inter-point distance matrix $D$ from the coordinates $(u_i, v_i)$ of each station $i$.
(2) Calculate the $n \times n$ weights matrix $W$ using the distance matrix $D$ and the initial $b$. The diagonal elements of the weights matrix are defined by the kernel weight function of the distances in $D$ and the bandwidth $b$.
(3) For each station $i$, $i = 1, \ldots, n$:
    (a) Set the square root of $W(i)$ as $W^{1/2}(i)$ and set $W^{1/2}(i)_{ii} = 0$, i.e. set the $(i, i)$ element of the square root of the diagonal weights matrix to 0 to delete observation point $i$.
    (b) Calculate $X_w = W^{1/2}(i) X$ and $y_w = W^{1/2}(i) y$ with $W^{1/2}(i)$ at station $i$.
    (c) Call lars($X_w$, $y_w$), seek the lasso solution that minimizes the error for $y_i$, and save it.
(4) Stop when the estimated $b$ changes only slightly. Save the estimated $b$ and the indicator vector $z$ of which variable coefficients are shrunk to zero.
Step 2: estimate the final local scaling GWL solutions using the shrinkage parameter $s_i$ and the kernel bandwidth parameter $b$ estimated in Step 1.
(1) Calculate the weights matrix $W$ using the distance matrix $D$ and the $b$ estimated in Step 1.
(2) Call lars($X_w$, $y_w$) and save the series of lasso solutions.
(3) Choose the lasso solution that matches the LOOCV solution on the basis of the shrinkage parameter $s_i$ and the indicator vector $z$.
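The two steps above can be sketched in simplified form. This is a rough illustration under several assumptions that differ from the procedure described: the bandwidth is fixed (the binary search is omitted), the kernel is Gaussian, a small grid of penalties `lams` stands in for the shrinkage parameter $s_i$, coordinate descent replaces the lars call, and no intercept is fitted. All names and data are hypothetical.

```python
import math


def soft_threshold(rho, t):
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise RSS + lam * sum(|beta_k|) by coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            rho = z = 0.0
            for i in range(n):
                partial = y[i] - sum(X[i][j] * beta[j] for j in range(p) if j != k)
                rho += X[i][k] * partial
                z += X[i][k] ** 2
            beta[k] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return beta


def root_weights(coords, i, bandwidth, drop_i):
    """Square roots of Gaussian kernel weights relative to station i."""
    ui, vi = coords[i]
    roots = []
    for j, (u, v) in enumerate(coords):
        w = math.exp(-0.5 * (math.hypot(u - ui, v - vi) / bandwidth) ** 2)
        roots.append(0.0 if (drop_i and j == i) else math.sqrt(w))
    return roots


def weighted_lasso(X, y, roots, lam):
    """LASSO on X_w = W^(1/2) X and y_w = W^(1/2) y."""
    Xw = [[r * xk for xk in row] for r, row in zip(roots, X)]
    yw = [r * yi for r, yi in zip(roots, y)]
    return lasso_cd(Xw, yw, lam)


def gwl_fit(coords, X, y, bandwidth, lams):
    """Return one (lam_i, beta_i) pair per station for a fixed bandwidth."""
    fits = []
    for i in range(len(X)):
        # Step 1: with station i held out, pick the penalty that best predicts y_i.
        held_out = root_weights(coords, i, bandwidth, drop_i=True)
        best_lam, best_err = None, float("inf")
        for lam in lams:
            beta = weighted_lasso(X, y, held_out, lam)
            err = (y[i] - sum(xk * bk for xk, bk in zip(X[i], beta))) ** 2
            if err < best_err:
                best_lam, best_err = lam, err
        # Step 2: refit with station i included, using the chosen penalty.
        full = root_weights(coords, i, bandwidth, drop_i=False)
        fits.append((best_lam, weighted_lasso(X, y, full, best_lam)))
    return fits


# Toy data (hypothetical): only the first variable drives the response.
coords = [(float(j), 0.0) for j in range(8)]
X = [[1, 1], [-1, 1], [1, -1], [-1, -1], [1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.0 * row[0] for row in X]
fits = gwl_fit(coords, X, y, bandwidth=3.0, lams=[0.0, 0.5, 2.0])
shrunk = gwl_fit(coords, X, y, bandwidth=3.0, lams=[5.0])
```

Each station receives its own penalty and its own coefficient vector, so a variable can be dropped at one station and kept at another. In the noise-free toy data the held-out error is smallest without a penalty, while forcing a large penalty (`shrunk`) sets the irrelevant second coefficient to zero at every station.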

Spatial autocorrelation test to variables
Before building the GWR and GWL models, an analysis is performed to determine whether the candidate variables are spatially autocorrelated. The test of spatial autocorrelation detects how strong the spatial correlation of variables is, which provides a theoretical basis for the feasibility of applying a GWR model. Moran's I is a measure of spatial autocorrelation developed by Moran (1950). The Moran scatter plot reflects the spatial autocorrelation intuitively. The scatter plot has four quadrants. If the observed values fall in the first and third quadrants, there is a strong positive spatial correlation. If they fall in the second and fourth quadrants, there is a strong negative spatial correlation. Figure 4 shows the Moran scatter plots of several variables. According to these plots, the Moran's I values of all the variables shown are not equal to 0, which indicates that these variables are not randomly distributed in space, and the observations mostly fall in the first and third quadrants. This shows that each variable is positively spatially correlated to some degree. In particular, three explanatory variables, namely, population, distance to the city center and days since opening, have strong spatial correlations, as their Moran's I values are all greater than 0.3 (Cressie, 1992). The result also lays the foundation for the feasibility of the follow-up study.
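For reference, global Moran's I for a variable $x$ with spatial weights $w_{ij}$ is $I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$. A minimal sketch of the computation follows. The binary neighbor weights and the toy values are assumptions made for illustration, and the spatial weights actually used for Figure 4 are not specified here.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and an n x n spatial weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Four locations on a line; neighbours are adjacent locations (hypothetical).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 2, 3, 4], W)      # similar values sit next to each other
alternating = morans_i([1, -1, 1, -1], W)  # neighbours always differ
```

Smoothly varying values give a positive statistic, while values that alternate between neighbors give a negative one, matching the quadrant interpretation of the Moran scatter plot.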

Results of Model 1 (GWR with global LASSO)
Since strong spatial correlation has been found for the variables in this research, it is reasonable to build GWR models to analyze the influencing factors on station ridership of Shenzhen Metro. Through variable selection based on LASSO, the explanatory variables selected from the candidate variables listed in Table I are pop, Between, Days_open, Shopping and Dis_to_cent. This indicates that population, betweenness centrality, days since stations opened, number of shopping places within the PCA and distance of stations to the city center are important features influencing ridership for most metro stations in Shenzhen.
Next, a GWR model of the average daily ridership of the whole week and its influencing factors is built. The results of the GWR model are compared in Table II with those of an OLS model that includes the same variables selected by LASSO.
First, according to Table II, the AICc value of the GWR model is less than that of the corresponding global regression (OLS) model. According to the evaluation criterion of Brunsdon et al. (1996), if the AICc value of the GWR model is at least 3 less than that of the OLS model, the GWR model can be considered to fit better than the OLS model even after accounting for the complexity of the GWR model. What is more, the adjusted $R^2$ value of the GWR model is clearly greater than that of the corresponding OLS model, which shows that the GWR model has strong explanatory power even when model complexity is taken into account. Likewise, the parameter (Sigma) indicating the model error of the GWR model is also lower, and the residual sum of squares of the GWR model is smaller than that of the OLS model. Generally speaking, the goodness-of-fit indicators of the GWR model are better than those of the OLS model. Additionally, the ANOVA tests shown in Table II are carried out to find out whether the global (OLS) regression model and the GWR model have the same statistical performance (the same size of error variance). The results of the ANOVA test suggest that there is a significant improvement when GWR is adopted.
The GWR model for the average daily ridership of the whole week performs well in terms of $R^2$. Knowing only the population distribution, betweenness centrality, days since stations opened, number of shopping places within the PCA and distance of stations to the city center, the GWR model explains 81 per cent of the variance of the response variable, average daily ridership. Meanwhile, the data for these explanatory variables are quite easy to collect.
According to the Voronoi algorithm (Fu et al., 2006), the Shenzhen Metro coverage area can be divided into several Thiessen polygons according to the locations of stations. In this context, the spatial distribution of local coefficients is visualized by Thiessen polygons. Through [...] 3284.89 per day. However, these elasticities are distributed unevenly in space. More trips per capita were expected in the center and mid-north, where commerce, administration and education are concentrated, while elasticity values were lower in the west and east. Moreover, the t-values map on the right shows that the effect of population is more significant in the middle area at the 0.05 level (absolute t-values larger than 1.96) [Figure 5(b)]. In general, GWR has strong spatial explanatory power based on the local analysis of the variation of each coefficient across space (elasticities).
The accuracy of the estimated responses is measured by calculating the RMSE. The RMSE is the square root of the mean of the squared deviations of the estimates from the true values and should be small for accurate estimators. $R^2$ is a statistical measure that represents the proportion of the variance of a response variable that is explained by the explanatory variables.
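The two evaluation measures defined above can be computed as in the following minimal sketch. The function names and the sample numbers are hypothetical and are not results from this study.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared deviation of estimates from true values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Proportion of the variance of the response explained by the model."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and estimated ridership values (not real data).
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 6.0]
error = rmse(observed, estimated)
fit = r_squared(observed, estimated)
```

A model with a lower RMSE and a higher $R^2$ on the same stations is preferred in the comparison.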
In general, the performance ranking of the five methods is "GWL>GWR with global LASSO>GWR>OLS>OLS with LASSO".
First, we can see the superiority of the three local models (GWL, GWR with global LASSO, GWR) over the global models (OLS, OLS with LASSO) in terms of the estimation error of the response variable and goodness-of-fit. Second, it should be noted that the local models with feature selection, GWR with global LASSO and GWL, perform better than the original GWR model, which indicates the importance of feature selection. Third, GWL performs better than GWR with global LASSO, which indicates that implementing LASSO locally for each station during the GWR procedure performs better than implementing LASSO globally for feature selection before GWR. Fourth, GWL for the metro network performs substantially better than the other four models at estimating the response variable. Therefore, we can conclude that the GWL model, which incorporates locally implemented LASSO for the metro network, is able to estimate Shenzhen Metro ridership more accurately.
To investigate which stations a given variable affects most, the distribution of the local regression coefficients of each variable over all stations in the GWL model is plotted in bubble plots. The spatial distribution of local coefficients is shown in Figure 7. By examining the spatial distribution of local coefficients (elasticities), both the relations that vary across space (the estimated coefficients) and the variable selection at each station can be revealed. In Figure 7, a bubble with a bold outline indicates that the coefficient of the variable at that station equals 0, and the colors of the other bubbles identify the range of the coefficients: a lighter color means a larger coefficient, and vice versa. Take the population factor as an example: stations with large positive coefficients are mainly distributed in the center, indicating that more trips per capita were expected in the central south area of the metro network, where commerce, administration and education are concentrated. Besides, for the factors of degree and hospital, there are numerous stations with zero coefficients, so these factors are not important for the ridership of most Shenzhen Metro stations.
Comparing the interpretation of the coefficients of the GWR with global LASSO and GWL models, we find that the coefficients of both models vary spatially, which helps us gain local insights into the influencing factors of Shenzhen Metro ridership. In addition, for Model 1, the explanatory variables are selected uniformly for all stations before conducting GWR, whereas for Model 2, the variables of each station are selected individually during the GWR procedure. Therefore, the coefficients of the two models differ in two respects. First, the coefficients of Model 2 initially include all potential candidate variables, whereas Model 1 selects several important factors at the beginning. Second, for some stations, the coefficients of certain variables in Model 2 may be shrunk to zero, whereas the coefficients of the variables in Model 1 cannot be zero. This means that for Model 2, different stations may have different influencing factors and the degree of impact can also vary, whereas for Model 1, we can only discuss the spatially varied impacts of the common important factors on the metro ridership of stations in different locations. Moreover, in Model 2, the factors whose coefficients are zero at numerous stations coincide with the factors that are not selected in Model 1, such as hospital and degree. In other words, the GWL model pays more attention to the spatial difference in the influence of factors on metro ridership at each station than the GWR model with global LASSO does. Generally, both models provide local perspectives to some degree when interpreting coefficients.

Conclusion
In summary, this paper builds two spatial models to analyze the influencing factors of Shenzhen Metro ridership at the station level from a local perspective. One model is the GWR model with global LASSO for variable selection, and the other is the GWL model, which implements LASSO for each calibration location during the GWR procedure, i.e. GWR with local LASSO. We demonstrate the applicability of these two models through the spatial autocorrelation test and their superiority over global models through a real-world case study of the Shenzhen Metro system. Meanwhile, we not only analyze the influencing factors of Shenzhen Metro station-level ridership from a local perspective but also conduct a comparative analysis of the two models. Additionally, unlike previous work, we borrow concepts from complex network theory, including degree centrality and betweenness centrality, to better quantify the network structure factors related to the practical significance of metro networks; these measures carry more comprehensive information than dummy variables.
The results of the case study show that the local models, including the GWL model, GWR without feature selection and the GWR model with global LASSO, perform better than the global models, including OLS and OLS with LASSO, in terms of estimation error and goodness-of-fit. Besides, the estimation error of GWL is lower than that of GWR with global LASSO, which indicates that implementing LASSO locally for each station during the GWR procedure performs better than implementing LASSO globally for feature selection before GWR. With regard to the interpretation of the coefficients of the two models, the coefficients of the GWL model initially include all potential candidate variables, whereas the GWR with global LASSO model selects several important factors at the beginning. Additionally, for the GWL model, different stations may have different influencing factors and the degree of impact can also vary, whereas for GWR with global LASSO, we can only discuss the spatially varied impacts of the common important factors on the metro ridership of stations in different locations. To sum up, the GWL model pays more attention to the spatial difference in the influence of factors on metro ridership at each station than the GWR model with global LASSO does.
In general, the two local models presented in this paper not only improve on the performance of traditional OLS multiple regression in modeling metro ridership and its influencing factors, in terms of goodness-of-fit and estimation error, but can also inform metro planning, passenger flow management and periphery development from a local perspective.

Figure 2. Spatial and temporal distribution of AFC data records
Figure 3. Population distribution and 500-m buffers of metro stations
Figure 4. Moran scatterplot of variables

4.3 Results of Model 2 (GWL) and comparative analysis of two models
The GWL model (GWR with locally implemented LASSO, enabling simultaneous coefficient penalization and model selection) is applied to the Shenzhen Metro data set. Figure 6 compares the results of the GWL model with those of the OLS model, OLS with LASSO for feature selection, GWR and GWR with global LASSO for feature selection in estimating the average daily ridership of the whole week.
Figure 6. Comparison of regression performance of models for ridership estimation
Figure 7. Spatial distribution of local coefficients (elasticities) of GWL model

Table II. Results