Identification of Enablers and Barriers for Public Bike Share System Adoption using Social Media and Statistical Models

Public bike share (PBS) systems are meant to be a sustainable urban mobility solution in areas where a range of travel options and the practice of active transport modes can reduce dependence on private vehicles and decrease greenhouse gas emissions. Although PBS systems have been included in transportation plans over the last decades and have experienced considerable development and growth, it is crucial to know the main enablers and barriers that PBS systems face in reaching their goals. In this paper, sentiment analysis techniques are first applied to user generated content (UGC) in social media comments (Facebook, Twitter and TripAdvisor) to identify these enablers and barriers. This analysis provides a set of explanatory variables that are combined with data from official statistics and the PBS observatory in Spain. As a result, a statistical model is developed that assesses the connection between PBS use and certain characteristics of the PBS systems, using sociodemographic, climate, and positive and negative opinion data extracted from social media. The outcomes of the research show that the main enablers and barriers of PBS systems can be effectively identified by following the research method and tools presented in the paper. The findings can help transportation planners uncover the main factors related to the adoption and use of PBS systems by taking advantage of publicly available data sources.


Introduction
Urban mobility is a major problem for EU citizens, as evidenced by a Eurobarometer study conducted in July 2007 [1], in which 90% of Europeans thought that the traffic situation in their area should be improved. The bicycle, as a sustainable transport mode, may play a key role in addressing this problem. According to the EU Transport Council in 2001, "a sustainable transport system is one that allows individuals and societies to meet their needs for access to areas of activity with total safety, in a manner consistent with human and ecosystem health, and that is also balanced equally between different generations" [2].
In this context, bicycles help to reduce pollution, as up to 70% of pollution in large cities is caused by traffic and private vehicles [3]. This gives an idea of the potential of bicycle transport to support a sustainable transport system.
In the last decade, public bicycle programs or public bike share (PBS) systems have undergone rapid growth, due to the development of better bicycle tracking methods with technological

Literature Review
Public bicycle systems have been studied in the literature from different perspectives in recent years.
Shaheen et al. [8] studied PBS systems as a sustainable transportation alternative, showing the evolution of three generations of PBS systems in Europe, America, and Asia from 1965 to 2010. The figures reported in that paper (100 bike sharing systems operating in 125 cities with more than 139,300 bicycles) grew considerably in the following years. At the end of 2016, the number of public use bicycles worldwide was 2,294,600, almost twice as many as the previous year (1,270,000), and the number of bike sharing systems had grown to 1188 [9]. In 2019, the total number of PBS systems was 2785, of which 143 had been implemented in Spain [10] (see Table A1 in Appendix A).
There is also growing interest in the research community in understanding how public bicycle systems have been implemented and which difficulties influence travel behaviour. According to Fishman et al. [11], the worldwide increase of public bicycle share systems has driven the growth of related academic literature. Consequently, a number of features related to the use of PBS systems have been identified.
The system's infrastructure and operating characteristics, such as accessible sign-up procedures, 24/7 opening hours, or sign-up incentives, are positively related to the use of bike sharing. Buck and Buehler [12] and El-Assi et al. [13] concluded that the existence of bicycle lanes is a significant characteristic. Nevertheless, the location of docking stations has a critical impact on PBS users [14]. Positive aspects are related to the closeness of docking stations to residential housing [15,16], to retail outlets and transit stations [13,17,18], and to other share stations [13,19]. On the other hand, stations located away from the center of the PBS system [17] or near major roads [20] tend to reduce ridership. Bhuyan et al. [21], for their part, introduce a new equity-based planning methodology that minimizes segregation and marginalization in planning practices. The proposed geographical methodology includes a modified bike equity density population index and a traffic stress index level to prioritize bicycle-sharing infrastructure.
Socio-demographic features have also been considered by researchers: population and employment density, mixed uses, retail density, and the education level of riders are associated with higher public bicycle use [13,19,22,23]. In addition, economic savings have been found to stimulate use among those on a low income. Regarding gender, men use bike share more than women do; however, the disproportion is not as pronounced as in personal cycling [16]. In addition, regarding gender and the origin and destination stations of trips, Nickkar et al. [24] concluded that, compared to male riders, females tended to make more recreational trips and to start and end their trips at the same station.
Weather conditions have also been studied, showing that rain, high humidity levels, and cold temperatures negatively affect bike share use [25]. Moreover, lower amounts of ground snow, lower humidity levels, and higher temperatures were positively correlated with bike ridership [13].
The optimization of the service in order to boost usage has attracted clear interest from the scientific community. In this context, a number of spatiotemporal bike mobility models based on historical PBS data have been developed to predict station-level hourly demand in a large-scale bike-sharing network [26][27][28]. Moreover, Hu and Ji et al. [29] devise a trip advisor that recommends bike check-in and check-out stations with joint consideration of service quality and bicycle utilization. Nickkar et al. [23] conclude that there are distinctly different patterns in bike share use on weekends and weekdays.
The digital footprint of users has been an important data source for closing the gap between bike sharing demand and supply. In this context, some studies make use of bike share system smart cards [30,31] to study demand by modelling travel time and trip chains by gender and day of the week. Bordagaray et al. [31] contribute to the knowledge about cycling behaviour in PBS systems and, in addition, provide a key instrument that benefits both decision makers and operators by supporting demand analysis for redesigning and optimizing the service. Social media provides a publicly accessible source of digital footprints, and current research includes the evaluation of urban transport through the analysis of social media content. Das et al. [32] examine Twitter channels to extract patterns for understanding the factors that influence people towards biking. Serna, Gerrikagoitia, Bernabe, and Ruiz [33] have analyzed the bike-sharing systems in Spain through natural language processing (NLP) techniques applied to social media. Collins et al. [34] have also used sentiment analysis of Twitter data to evaluate transit riders' satisfaction in the city of Chicago (USA). Rahim Taleqani et al. [35] have examined Twitter posts to evaluate public opinion on dockless PBS systems. They use sentiment analysis to determine the polarity (positive or negative) of tweets, the tweets' underlying issues, and their extent of commitment and influence in decision-making.
Dockless bicycle sharing systems are a current trend, and recent studies assess the promotion of bicycle use to change travel modes in metropolitan areas, and the critical factors that further that aim [36][37][38][39].
The impact of online social media together with statistical data that provide variables such as demographics and climate data (population, temperature, precipitation, number of docking stations, etc.) on the use of PBS systems has been ignored so far in the literature. This study tries to fill this gap, and contributes to a new stream of research on complementing traditional data with information extracted from online social media to explain travel behaviour and develop predictive models of travel modes' use.

Research Methodology and Processes
The proposed methodological approach comprises nine main steps, in two different phases, shown in

First, phase 1 deals with sentiment analysis based on social media (TripAdvisor, Facebook, Twitter); its result is the identification of the main concepts commented on by users and their polarity. Then, phase 2 creates a panel model that combines statistical data (PBS websites, Spanish bike observatory and official statistics) with the sentiment analysis data provided in phase 1.

Study Area
The study area covers public bike share systems in Spain. Data from social media, the Spanish bike share observatory, PBS websites, and Spanish official statistics have been used. Regarding social media data, the main data sources are the related Twitter and Facebook pages. In addition, data from TripAdvisor have also been analyzed due to the relevant presence of related UGC. In total, 12,316 comments from the three aforementioned social media networks, written in different languages (English, Spanish, and Catalan), have been analyzed. The UGC data sample includes data from 2013 and 2014. The 2013-2014 timeframe has been selected because the last published PBS statistics are from that period.
In addition, Spanish bike share observatory, PBS websites, and Spanish official statistical data from 2013 to 2014 have been used to obtain the statistical model on PBS use. Regarding these data, 32 PBS systems with available data in the 2013-2014 period have been included in our investigation.

Social Media Data Analysis Methodology
Social media analysis follows a quantitative and qualitative approach [40]. Because the aim of using sentiment analysis techniques is to obtain an overall impression of the sentiment expressed, large quantities of data were examined [41,42]. The methodology comprises six main steps.

Social Media Sentiment Analysis Process
Step 1: Source Identification
In this step, a semantic discovery program has been implemented to find sources of significant data on the internet to be examined. This program is a meaning-based search engine, so it provides search results based on meaning matches rather than on the popularity of search terms. This program has uncovered a variety of sources of interest, such as Facebook and Twitter channels, the Spanish bike share observatory, and the "Bike Sharing World map" [10], as well as specific sections of TripAdvisor under the "Transportation" classification.
Considering that each online social network usually has a specific user profile, we decided to collect social media data from three different sources: Twitter, Facebook and TripAdvisor.
The user profile of TripAdvisor is very different from those of Twitter and Facebook. TripAdvisor contains traveller experiences, and it may be used for distinct motives: leisure, work, etc. Twitter and Facebook mainly contain daily user experiences and opinions about bike agencies. In this way, the probability of having biased information is lower.
Facebook: For 32 PBS systems in Spain, there is a Facebook page. Spanish, English, and Catalan posts of these pages have been monitored.
TripAdvisor: The compilation of information was carried out with comments that contain information about mobility in "Transport" and "Outdoor activities" sections (Spanish and English).

Step 2: Social Media Source Acquisition
This procedure consists of the treatment of the unstructured data, including data collection, normalization, and cleaning. A scraper program has been developed to capture data from TripAdvisor. Data collection from Twitter has been done by selecting bike-sharing channels using the twitter4j Java API (application programming interface). Data extraction from bike-sharing pages on Facebook has been implemented using the RestFB API, a simple and flexible Facebook Graph API client.

Step 3: Data Preparation for the Analysis
This step consists of two distinct parts: morphosyntactic analysis and modelling. The initial phase loads the comments one by one and detects their language using the Shuyo language detector [43]. After that, Freeling, an open-source suite of language analyzers [44], with the corresponding WordNet lexicons [45], and the Aspell spell checker [46] are configured for these languages. In the case of Aspell, localisms and abbreviations are added. Then, the spell checker is applied to correct the texts. The normalization of the comments is a critical process that includes the treatment of abbreviations as well as emoticons. Next, ad-hoc software and the Freeling analyzer (with WordNet embedded) are applied; as a result of this process, each word is morphosyntactically annotated within a transport category.
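The normalization of abbreviations and emoticons described in this step can be sketched as follows. This is a minimal illustration, not the authors' software: the lookup tables, their entries, and the function name `normalize_comment` are hypothetical, and the choice of mapping emoticons to the words "good" and "bad" is an assumption made only for the example.

```python
import re

# Hypothetical lookup tables; the paper does not list its actual entries.
ABBREVIATIONS = {"pls": "please", "thx": "thanks"}
EMOTICONS = {":)": "good", ":-)": "good", ":(": "bad", ":-(": "bad"}


def normalize_comment(text: str) -> str:
    """Lower-case a comment, strip URLs, and expand emoticons and abbreviations."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    # Replace emoticons before tokenizing, because they consist of punctuation.
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, f" {word} ")
    # Keep alphabetic tokens (any script, optional apostrophe) and numbers.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?|\d+", text)
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)
```

In a pipeline such as the one described, a step like this would presumably run after language detection and before spell checking, so that the dictionaries can be language-specific.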

Step 4: Sentiment Analysis
In this step, the nouns bicycle and bike are identified together with their associated adjectives and common nouns, which are ranked by number of occurrences to obtain detailed information about them. Moreover, sentiment analysis is performed on a scale from −1 to 1.
The polarity is calculated with the SentiWordNet [47,48] polarity lexicon. In SentiWordNet, a 'synset' denotes a term with a particular meaning and part-of-speech tag. Consequently, a word can have several synsets, that is, distinct meanings depending on the context, and their scores can therefore be completely different: positive, negative, or neutral. To select the correct synset, the UKB Word Sense Disambiguation program [49] has been used. Then, a random 10% of the comments classified as positive or negative is manually reviewed in order to improve accuracy by adapting to the domain. With these two classifications, the algorithm is trained with a supervised learning method using the previously analyzed data model.
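A minimal sketch of how a polarity in the range −1 to 1 can be derived from synset-level scores is given below. It assumes, as in SentiWordNet, that each synset carries a positive and a negative score between 0 and 1, and it takes the polarity of a synset as their difference. The toy lexicon, its values, the key layout, and the averaging rule are illustrative assumptions; the paper does not specify how term scores are combined, and the input is assumed to be already disambiguated (the role UKB plays in the text).

```python
from statistics import mean

# Toy stand-in for SentiWordNet: (lemma, part of speech, sense) -> (pos, neg).
# The scores are illustrative values, not real SentiWordNet entries.
TOY_LEXICON = {
    ("cheap", "a", 1): (0.25, 0.0),   # sense: inexpensive
    ("cheap", "a", 2): (0.0, 0.75),   # sense: of poor quality
    ("broken", "a", 1): (0.0, 0.5),
    ("friendly", "a", 1): (0.75, 0.0),
}


def synset_polarity(key):
    """Polarity of one disambiguated synset: positive score minus negative score."""
    pos, neg = TOY_LEXICON.get(key, (0.0, 0.0))
    return pos - neg


def comment_polarity(disambiguated_terms):
    """Average polarity over the terms of a comment that appear in the lexicon."""
    scores = [synset_polarity(key) for key in disambiguated_terms if key in TOY_LEXICON]
    return mean(scores) if scores else 0.0
```

The two senses of "cheap" show why disambiguation matters: the same word contributes 0.25 in one context and −0.75 in another.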

Step 5: Repository
This procedure is essential for the scalable storage and management of the data. The downloaded comments are homogenized to a common structure (XML) and stored in the Apache Solr search engine [50].
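As an illustration of homogenizing comments into a common XML structure, the sketch below serializes comment records in the `<add><doc><field name="...">` layout that Solr accepts for XML updates. The field names (`id`, `source`, `lang`, `text`, `polarity`) are hypothetical, since the paper does not describe its schema.

```python
import xml.etree.ElementTree as ET


def to_solr_xml(comments):
    """Serialize comment dicts as a Solr XML update message (<add><doc>...)."""
    add = ET.Element("add")
    for comment in comments:
        doc = ET.SubElement(add, "doc")
        for name, value in comment.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical record combining the three sources into one structure.
example = [{"id": "tw-1", "source": "twitter", "lang": "es",
            "text": "bici rota", "polarity": -0.5}]
xml_payload = to_solr_xml(example)
```

ElementTree escapes special characters in field text, which matters for user generated content containing `<` or `&`.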

Step 6: Dashboard
A software platform implements the processes described in the previous steps. This platform has a visual interface (dashboard) for managing and interpreting the results easily. The dashboard (shown in Figure 2) is based on the open-source Apache Solr and has been created with a rich and flexible user interface, customizable pie charts, time series, etc. The analyzed data have been indexed to provide fast query responses.

Spanish Official Statistics Data
This section uses the results of the sentiment analysis presented above to investigate the features influencing bike share use. The objective is to assess the usefulness of sentiment analysis for extracting information from social media data to be used as explanatory variables in travel behaviour models.
Step 7: Bike Share Use Model
Because PBS use data are discrete and nonnegative, our approach is to model them as count data, assuming that the total number of uses follows a Poisson distribution. The Poisson model is defined for a discrete random variable Y with observed frequencies of bike share use y_i, i = 1, ..., n, where y_i is a nonnegative integer count, and regressors x_i (Equation (1)). This model has a common limitation: the Poisson distribution restricts the mean and variance of y_i to be equal (λ_i). To avoid imposing this restriction a priori, we use the negative binomial (NB) model, which nests the Poisson distribution as a special case and better accommodates overdispersed data. The NB model arises as a modification of the Poisson model in which the mean is µ_i (Equation (2)), where exp(ε_i) has a gamma distribution with mean 1.0 and variance α.
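For reference, the textbook forms of the two models described above can be written as follows. This is a sketch consistent with the definitions in the text (λ_i, µ_i, ε_i, α), not a reproduction of the paper's own Equations (1) and (2); the coefficient vector β and the log-linear link are the usual assumptions for count regressions.

```latex
% Poisson regression for the count y_i given regressors x_i
P(Y = y_i \mid x_i) = \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!},
\qquad \ln \lambda_i = \beta' x_i,
\qquad E[y_i \mid x_i] = \operatorname{Var}[y_i \mid x_i] = \lambda_i

% Negative binomial: Poisson mean with multiplicative gamma heterogeneity
\ln \mu_i = \beta' x_i + \varepsilon_i,
\qquad E[y_i \mid x_i] = \lambda_i,
\qquad \operatorname{Var}[y_i \mid x_i] = \lambda_i \,(1 + \alpha \lambda_i)
```

Under these forms, the variance exceeds the mean whenever α > 0, and the Poisson model is recovered as α approaches zero, which is the nesting property mentioned in the text.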
To make the most of the PBS use data available from two years, we use a random effects negative binomial (RENB) panel model [51], whose formulas are shown in Equation (3) and Equation (4), where N is the number of PBS systems and T_i is the number of observations in the i-th bike share system (1 or 2, since data are not available for two years for every system). As noted, we use the normal distribution for the random effect instead of the gamma, which is a simpler alternative.

Step 8: Public Bike Share Use Data Description
To develop an appropriate statistical model of the connection between PBS use and the characteristics of the systems, sociodemographic and climate data (see Table A2 in Appendix A) and positive and negative opinion data extracted from online social media were used for 32 bike share systems in Spain. Daily average PBS use data for 2013 and 2014 are available, although not for every system. Therefore, pooling the two years yields a total of 51 cases.
For each observation, 15 possible explanatory variables were considered. PBS use data were provided by the Spanish bike share observatory. The characteristics of the systems were collected from their public websites. Demographic and climate data were collected from official statistics. Positive and negative opinion variables are the average of all factors measured by the sentiment analysis of Twitter, Facebook and TripAdvisor data in 2013 and 2014. This aggregation is needed because data are not available for all opinion variables and cities. Table 1 provides summary statistics of the variables.
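The aggregation of topic-level polarities into one positive and one negative opinion variable per system can be sketched as below. The record layout, the function name `average_opinions`, and the rule of splitting scores by sign before averaging are assumptions made for illustration; the paper states only that the variables are averages of all factors measured by the sentiment analysis.

```python
from collections import defaultdict
from statistics import mean


def average_opinions(records):
    """Average positive and negative topic polarities per PBS system.

    records: iterable of (system, topic, polarity), with polarity in [-1, 1].
    Returns {system: (avg_positive, avg_negative)}; a side without data is None.
    """
    positive = defaultdict(list)
    negative = defaultdict(list)
    for system, _topic, polarity in records:
        if polarity > 0:
            positive[system].append(polarity)
        elif polarity < 0:
            negative[system].append(polarity)
    systems = set(positive) | set(negative)
    return {
        system: (
            mean(positive[system]) if positive[system] else None,
            mean(negative[system]) if negative[system] else None,
        )
        for system in systems
    }
```

Averaging over whatever topics are available is one way to cope with the sparsity mentioned in the text, where not every topic has data in every city.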

Research Results and Discussions
The outcomes of the sentiment analysis reveal that the price of the service, bike, bike stations, dockings, service, experience, maintenance and schedule are the most mentioned topics.
For each topic, the polarity of its positive and negative attributes (P column, from −1 to 1) and the total number of mentions across the 32 cities (# column) were estimated (see Table 2). Sentiment analysis reveals that the worst-rated features are related to the availability of stations, docks and bikes, as well as maintenance, condition and schedule. Maintenance problems have five times as many references as the other features. Negative opinion has also been detected in relation to service management, such as incidents that are not handled properly or the computer system used to manage the service. The demand for overnight services is also rising.
The worst-rated aspects are the conservation, quantity and disrepair of the stations and bike docks, the number and appearance of the bicycles, and the schedule.
An inadequate schedule also emerges as one of the principal obstacles to the use of PBS systems for some people. PBS managers should take this into account to meet the requirements of potential customers. With 8.5%, the density of bike stations is a problem for further adoption. Often the drive to extend coverage leads some systems to widen the distance between stations, so users may have to leave the bicycle too far away, or search for another station when a station is full or empty. It is a technical problem, and users may consequently be less aware of it. Sentiment analysis shows that the number of stations and the coverage of metropolitan areas are problematic.
Among positive aspects, price and biking experience are the most valued elements. Only 14 of a total of 845 comments report negative experiences. Systems that provide electric bikes, such as Urbanbike and Elecmove (Bilbao and Seville, respectively), have remarkably positive results.
The perspectives of end users and of organizations or companies can be differentiated on Facebook and Twitter, and a separate analysis has been performed. Price is positively rated by both groups. For companies, it has been described as competitive, super promotional, interesting, incredible, very good bargain or very special. For users, price is described as: very good price, appropriate, very good value, cheap, very good, good price, inexpensive, very permissible value, unbeatable quality/price is ideal, right, unbeatable value for money, very reasonable, recommended, reasonable, great value, perfect, great, super affordable, cheap, super, quality correct price. The comfort of the bicycle (4.4%) and price (2.2%) seem to be less critical issues for using the service more often. According to the sentiment analysis, the best-evaluated attributes are price and cycling experience. These results confirm that cost is one of the variables that favour PBS use.
Comments from TripAdvisor are mostly positive because the experience generally depends on the type of client, i.e., tourists or daily users. Tourists mainly comment that leisure experiences are usually well organized, aided by a tourist guide and other amenities that help the customer appreciate the place. On TripAdvisor, comments about the service are very favourable, not just good as on Facebook or Twitter, with terms such as: excellent service (20), professional (22), fantastic (23), perfect (27), nice (40), helpful (48), amazing service (89), excellent (112), friendly (138), good (184), and great (195 mentions).
To develop PBS systems as a popular mode of transport for a broader share of inhabitants, the electric bike and bike paths play a key role. It is noteworthy that the electric bike experience is well rated by users; combined with proper bike paths, bike usage will increase as physical condition and cycling skills lose importance.

Step 9: RENB Panel Model Results
The RENB panel model was estimated using the Butler and Moffitt method with Gauss-Hermite integration. Limdep v10 [52] was used for this purpose.
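The idea behind this estimation approach, integrating a normally distributed random effect out of each system's likelihood by Gauss-Hermite quadrature, can be sketched as follows. This is an illustrative toy, not the Limdep implementation: it uses a three-point rule (practical software uses more nodes), a standard NB2 parameterization with dispersion α, and hypothetical function names. The linear predictors are assumed to be precomputed values of β'x.

```python
import math

# Three-point Gauss-Hermite rule for integrals of exp(-x**2) * f(x).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6)


def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y with mean mu, dispersion alpha."""
    theta = 1.0 / alpha
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def system_likelihood(counts, linear_predictors, alpha, sigma):
    """Marginal likelihood of one system's counts, integrating out u ~ N(0, sigma^2)."""
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        u = math.sqrt(2.0) * sigma * node  # change of variables u = sqrt(2)*sigma*x
        joint = 1.0
        for y, xb in zip(counts, linear_predictors):
            joint *= nb_pmf(y, math.exp(xb + u), alpha)
        total += weight * joint
    return total / math.sqrt(math.pi)
```

A full estimator would sum the logarithm of `system_likelihood` over all systems and maximize it over the coefficients, α and σ. With σ = 0 the random effect vanishes and the expression reduces to the pooled NB likelihood.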
To avoid multicollinearity between variables, which may bias the standard errors of the coefficients and hence result in wrong signs or implausible magnitudes, only the most significant variables are included in the model. Table 3 shows that log-likelihood ratios increase as more explanatory variables are included in the model. Similarly, the Akaike information criterion (AIC) and AIC/N decrease, which indicates that the model is not overfitted. The dispersion parameter alpha is nearly equal to zero, although with low significance in the final model.
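The information criteria mentioned above follow the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and ln L the maximized log-likelihood; AIC/N divides by the number of observations. A minimal sketch, with hypothetical function names and made-up input values rather than figures from Table 3:

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def aic_per_observation(log_likelihood, n_params, n_obs):
    """AIC normalized by the number of observations (AIC/N)."""
    return aic(log_likelihood, n_params) / n_obs


# Illustrative comparison of two nested specifications on the same 51 cases.
smaller_model = aic_per_observation(log_likelihood=-420.0, n_params=4, n_obs=51)
larger_model = aic_per_observation(log_likelihood=-400.0, n_params=7, n_obs=51)
```

Because each added parameter costs 2 points of AIC, a larger specification is preferred only when its log-likelihood gain outweighs that penalty, as in the made-up comparison above.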
In the RENB panel model, all variables considered are highly significant. An analysis of Table 3 shows that three variables have a favourable effect, while two variables have an adverse effect on bike share use. As expected, high average precipitation is associated with lower use of bike share systems. This is in line with the findings of [24]. On the other hand, high average temperatures are positively related to bike share use, as also found in [13]. It is also worth emphasizing that some important explanatory variables are not inherent features of PBS systems (e.g., average annual temperature), and the providers of bike share services have no influence over them.
The number of docking stations is also associated with high bike share use, supporting the hypothesis that it is important to increase accessibility to the system [20]. The number of docking stations is statistically significant in the current model, which prevents any scaling problem related to the different sizes of the cities.
The most important finding of the model results is the significance of the variables representing positive and negative opinions of the PBS system in social media. As expected, positive opinions are related to high PBS use, and negative opinions are associated with low PBS use. The significance of the positive opinion variable is especially high (z = 22.79), which indicates the importance of this variable in explaining the use of PBS systems. This reasonable result supports the hypothesis that social media can complement travel data collection and act as a reliable source of data of adequate quality for transport planners, operators and policy makers.

Conclusions
This paper presents a non-traditional methodological approach that has been applied to uncover the main enablers and barriers for the adoption of PBS systems, taking advantage of publicly available data sources. It has been demonstrated that sentiment analysis is a useful technique for extracting information on people's perceptions of and attitudes toward PBS systems from social media comments. Moreover, the results of the sentiment analysis are complemented with demographics, climate information and the features of the PBS systems to build a statistical model that explains PBS use. The presented data, methodology, and results contribute to understanding how the use of PBS can be optimized so that it becomes an efficient and competitive means of transportation.
The findings of the sentiment analysis show that the success of PBS depends on a good balance of the following five key components: station density (having more bike docks than bicycles is vital to guarantee that parking space is available for a bicycle in several places, the distance between bike stations, etc.); bicycles per person (the ratio of bicycles to population should be large enough to satisfy demand, taking into account the bicycles accessible during heavy demand periods); coverage area (an area in which it is easy and convenient to cycle and park); quality of bikes (conservation and appearance); and simple station procedures (a simple check-out method, a friendly payment user interface, a responsive web site design, etc.).
A random effects negative binomial panel model has been used to study PBS demand in Spain. Despite the limited sample size, it was possible to obtain a model specification in which all explanatory variables are significant and their estimated coefficients have acceptable significance levels. Weather, system features, and two variables related to the outcomes of the sentiment analysis (average positive and negative valuations of each system) are statistically significant. This result emphasizes the importance of considering social media data to complement traditional data for transportation planning applications.
The research findings may be helpful for decision-making on sustainable mobility in urban transport. This research concludes that social media sentiment analysis can be applied to improve the traditional techniques, by studying travel behaviour and obtaining an overall and enhanced view of urban transport planning. Continuous information from the social media enables updating prediction models of demand and controlling the quality of PBS services. In that way, investments in PBS services might be more effective. Moreover, it is important to remark that the methodology can be applied to several travel modes beyond PBS in Spain, overcoming the shortcomings of information collection using surveys. Additionally, in real time and over long periods, information is accessible to enable dynamic analysis to be carried out.
Next, some general conclusions of the PBS systems in Spain are presented as results of analysing general parameters of this mode of transport such as evolution, number of stations, age and distribution.
Regarding the evolution, there have been 143 PBS systems from 2007 to 2019, and 79 of them are currently in operation (55%). In relation to the age of the active systems and the duration of those that closed, it has been observed that at the end of 2014 most of the PBS systems in operation were between three and eight years old (5.7 on average). The effects of the economic crisis together with the end of public programs could be the cause of the lower number of inaugurations in recent years, resulting in a shortage of "young" systems (less than three years old). In 2018, 52% of active PBS systems were eight years old or older, and 49% of closed PBS systems had lasted two years or less. This may show a certain lack of planning or of real long-term support for the public bicycle, as it is well known that the public bicycle is not cheap because it requires long depreciation periods and, like every mode of transport, needs time to create a stable modal shift.
Since 2010 the total number of stations of all public bicycle systems has increased. Regarding the size of the systems, they have grown almost continuously, from about 5,000 bicycles and 800 stations in 2008 to about 25,000 bicycles and 2,000 stations in 2014. Although many systems closed, the survivors increased their size considerably and the new ones were much larger than those that closed. The closed systems had an average of six stations, while the still active systems have 32. Therefore, it can be said that in Spain a public bicycle landscape dominated by small systems has evolved into one represented by medium-sized ones. In 2018, a large part of the PBS systems that had been implemented were rather small (63% had fewer than 10 stations). However, only 16% of systems with fewer than five stations and 19% of those with five to 10 have survived. In contrast, all systems with more than 30 stations were still running.
Concerning distribution, it can be concluded that a higher percentage of PBS systems has been closed in municipalities with a smaller population than in those with a large population. Probably, the European crisis together with a lower budget of these municipalities have made the project economically unfeasible in many cases. The municipalities with between 20,000 and 50,000 inhabitants have hosted the largest number of PBS systems in Spain. However, the smaller the municipality, the lower the probability of its survival. In municipalities of less than 20,000 inhabitants about three quarters of the systems have closed. On the contrary, all the systems that have been installed in cities with more than 500,000 inhabitants still remain active. Currently, there are barely any PBS systems left in small municipalities. Possibly, it has also been influenced by the fact that public subsidies are targeting municipalities between 50,000 and 300,000 inhabitants (for example, the savings and energy efficiency subsidies of the Generalitat of Catalonia of 2010). In 2018, only 8% of the systems implemented in municipalities with between 20,000 and 50,000 inhabitants have survived, while that percentage rises to 100% in municipalities with more than 500,000 inhabitants.
Regarding limitations, the available statistical data from the public bike observatory correspond to the 2013-2014 period, so more recent social media data have not been used to obtain the statistical model of PBS use. Nevertheless, since the limited sample provides statistically significant results, the authors are optimistic about extending the approach to PBS systems in other countries by combining social media with PBS statistics.
For future work, the Global map, a new data source from the EUNOIA (Evolutive User-centric Networks for Intraurban Accessibility) project [53], will complete the research. EUNOIA is committed to taking advantage of smart city technologies and complex system science to develop new models and tools for the design of sustainability policies for city governments and their citizens. As of 2019, this data source provides online information about bike sharing in 468 cities around the world, 15 of them in Spain: Albacete, Barcelona, Bilbao, Castellón, Gandía, Gibraltar, Girona, Leon, Palma, Santander, Zaragoza, Seville, Tres Cantos, Valencia, and Valladolid. Data are available on bike use by bike station and hour for the last 24 hours. This level of granularity provides data for predictive analytics and enhanced demand models.