Data-driven methods for effective micromobility parking

In this work, we propose a data-driven method to use proven clustering algorithms for establishing shared electric scooter (SES) parking locations and assessing their anticipated utilization. We first address the problem of finding locations for a given number of parking facilities, based purely on demand, that maximize the number of trips that would likely be parked at these facilities. We then formulate an enhanced version of the SES parking facility problem in which exogenous environmental factors are considered, such as sidewalk width. Parking SESs on narrow sidewalks raises accessibility concerns for other users of this infrastructure and capturing these trips in dedicated parking facilities is a valid priority to trade off with pure demand maximization. These methods are demonstrated in two case studies, which use a large SES dataset from Nashville, Tennessee, USA. We provide empirical results on how many facilities are needed to serve demand of SESs and necessary capacity allocation of the facilities. When the methodology considers sidewalk width in facility placement, the refined parking locations can address 300% more problematic trips parked along narrow sidewalks, with only a nominal sacrifice, around 13%, in the overall number of trips served.


Motivation
The launch of shared electric scooters (SESs) within cities has added a new dimension to the transportation sector, amassing 38.5 million trips in 2018 [1]. The rapid deployment and user adoption of the dockless devices has quickly created management challenges for cities, especially related to infrastructure. Since the dockless devices can in principle be parked anywhere, users must be convinced to sacrifice some convenience for orderliness. Policy approaches such as geofencing, parking fines, parking guidelines, and signage are all common elements of the strategy [2]. A study during the San Francisco scooter pilot program found that encouraging riders to park in designated areas decreased scooter parking citations by 88.9% and raised civilian approval of micromobility as a whole [3]. The determination of the best locations for these parking facilities is vital, especially when the use of parking facilities is voluntary.
Poorly parked scooters are not only a source of potential injuries, but could also promote inequity. Ill-parked scooters can become obstructions that prevent people with disabilities from moving freely on the sidewalks [4]. This is of particular concern when scooters parked on sidewalks cause noncompliance with the Americans with Disabilities Act (ADA) [5] standards for sidewalk clearance. For example, Figure 1 shows the approximate width of sidewalks in Nashville, TN. Sidewalks colored orange and red (i.e., less than 7 feet wide) simply do not have adequate width to accommodate even a carefully parked scooter while still maintaining adequate width for accessibility.
The competition for scarce sidewalk space is demonstrating a need for dedicated parking infrastructure in key locations to reduce these concerns. Determining the locations for dedicated parking is based on trip demand and, potentially, the state of existing infrastructure such as sidewalk width. The high volume of scooter trips, combined with the complex nature of the location problem, makes determining where to place these facilities a challenging process.

Problem Statement
The first concern of this work is identifying parking infrastructure locations for dockless e-scooters.
In this context, parking areas would be marked on small areas of roads or sidewalks for e-scooters, and their use may be either encouraged or mandatory. The infrastructure should improve user parking behavior as broadly as possible, and should therefore be placed in the most effective locations.
A secondary, but still vital concern for SES infrastructure is the issue of poorly parked scooters becoming obstructions along sidewalks. Scooters parked along narrow sidewalks have a larger possibility of obstructing the pathway and creating issues with ADA compliance. Therefore this work provides methods to prioritize infrastructure in areas that can mitigate this problem, while maintaining allocation of facilities to high-demand areas.

Contributions
To our knowledge, this is the first study in the United States to propose a data-driven solution for locating and prioritizing SES parking facilities in continuous space based on multiple factors, while also evaluating the efficiency and anticipated utilization of parking facilities. It also provides empirical results on how many facilities are needed to serve SES demand in a city setting and the capacity allocation of such parking facilities. In addition to demand, our study is unique in considering additional factors for facility prioritization, such as influencing characteristics of the built environment. This could assist cities in using their own micromobility data to place SES parking facilities in the most effective areas.
The contributions of this work are summarized as follows:
1. We propose a data-driven solution for locating and prioritizing SES parking facilities in continuous space using straightforward clustering methods.
2. We extend our proposed parking methodology to incorporate external infrastructure for the parking placement problem through weighted clustering. As an example, we focus on sidewalk width for ADA compliance and find that a 300% increase in capture of problematic narrow-sidewalk parking can be gained with only a 13% trade-off in overall trip capture.
3. We provide empirical results on how many facilities are needed to serve SES demand and the capacity allocation of these facilities. Empirical results are presented from two case studies, one focused on the Vanderbilt University area and one on Nashville, TN.
The remainder of this article is organized as follows. In Section 2, we summarize recent literature related to our research. Section 3 explains in detail the data-driven methodology for facility placement and evaluation. In Section 4, we discuss the case study dataset from Nashville, TN. Section 5 shows results from the first case study on determining efficient demand-based parking locations on Vanderbilt's campus. Section 6 contains the second case study that also considers factors related to existing infrastructure in the placement of parking facilities for the Nashville metro area. In Section 7, we discuss practical application of our findings. In Section 8, the article is concluded along with an overview of possibilities for future areas of research.

Literature review
In this section, we present a review of relevant literature pertaining to facility location planning using optimization and data-driven methods such as clustering.
Optimization approaches are quite common in literature when determining optimal facility placements. These methods often follow the approach: maximize or minimize an objective function quantifying the desirability of a facility placement, given a set of constraints. The objectives and constraints for facility location problems vary across the available literature and include: maximizing coverage of the network [6], minimizing burden on users to find a parking facility [7], and maximizing user satisfaction [8], to name a few. Constraints may take into account a limited budget [8], availability of devices in the network [6], or the capacity of each parking location [9], for example. Researchers have also explored the reallocation of dock capacity [10] and the rebalancing of bicycles in bike sharing facilities [11] from an optimization standpoint.
A study on dockless bikes in Shanghai, China presents an optimization framework that is used for the planning of parking infrastructure utilizing the clustering algorithm DBSCAN; parking locations were geofenced parking zones with no physical boundaries [12]. Another study in Ningbo City, China, used k-means to analyze the spatiotemporal behavior with the goal of grouping stations according to similar characteristics [13].
Other studies have used clustering to determine resource allocation in bikeshare systems, grouping stations on similarities of usage patterns [8]. A micromobility study performed in Madrid, Spain used a GIS approach for docked bike share station placement [7] which minimized cost between demand points and maximized station coverage. They also evaluated practicality of the locations in regard to popular destinations.

Methodology
In this section we describe our approach to placement and sizing of parking facilities for shared electric scooters (SESs). First, we conceptually describe our methodology and present a flow chart as a visual aid. We then provide a mathematical representation of our methods. The clustering algorithms and the process by which their parameters are tuned are then presented. Next, we discuss the placement of the actual facility within a cluster and how it is assessed. Finally, we explain how to incorporate external features by utilizing weighted clustering.

Conceptual overview of methodology
The purpose of this analysis methodology is to find a set of point locations for micromobility parking facilities that maximizes the total number of unique trips that could use them. The input is a list of trip endpoints from historical data, that is, where each trip was parked. Clustering algorithms are used to determine high-density clusters of points where parking facilities should be located. The methodology can also take as optional input built environment factors that correspond to each trip endpoint. The output is a list of parking facilities, prioritized according to the number of trips that will potentially use the facility. Figure 2 shows the steps of the methodology on a small area of real data in which a parking location is placed. In step 1, we collect raw scooter trip data, consisting of each trip's end locations.
We preprocess data, apply temporal and geographical filters, and separate data into training and testing sets, which is discussed in Section 4. Only the training dataset is shown. In step 2, we apply clustering algorithms and perform hyperparameter tuning, explained in Section 3.3, on training data to generate clusters of trip ending points. Additional factors for facility placement may be considered within step 2 by modifying the weight of each point during clustering, which is discussed in Section 3.5. A parking facility is placed within each defined cluster of points, and is assumed to be located at a single point with a circular area of influence centered at this point. In step 3, we determine exact parking facility locations based on distributions of trip ending points within clusters, which is discussed in Section 3.4. In step 4, we assess parking locations by counting the number of trip endpoints falling within a capture radius. We assume that users ending their trips within this distance from the facility will park at the facility instead of the surrounding area. A capture radius of 100 feet is used for analysis and represents a reasonable distance for a micromobility user to walk from a parking facility to their actual destination [14].
The performance of a given output (a set of facility locations) is determined by the total number of trips that fall within the capture radius from any of the parking locations. This number of trips is referred to as trip capture. The objective is to place a desired number of parking facilities so as to maximize trip capture. However, because this approach does not inherently produce a global optimum, a desired number of clusters can be realized in many ways, with varying performance. This fact requires a process of hyperparameter tuning for clustering algorithms to find the highest-performing output, which we discuss in detail in Section 3.3.
We use a training dataset of trip endpoints to establish parking locations with the clustering algorithms. The performance of the parking locations is calculated and reported using a testing dataset, which does not intersect with the training dataset in order to eliminate potential bias. The relationship between the number of parking facilities and total trip capture works both ways. A desired level of trip capture (e.g., 25% of trips) can also be used to estimate the number of parking facilities required. The number of trips captured per facility, which we refer to as capture efficiency, is also an informative metric for decision making. It can help quantify utility or cost-benefit of the infrastructure investment. The following summarize the performance metrics for parking locations:
• number of trips captured: the number of trip endpoints located a distance less than or equal to a given capture radius from the parking location coordinates.
• percentage of trips captured: the proportion or percentage of trip endpoints captured; applicable for a single location or totalled across a set of locations.
• capture efficiency: the number of trips (or mean number of trips) captured per parking location, given a set of multiple locations.
• capacity: the size of a parking facility in terms of the number of scooters it can accommodate.
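As an illustration, the first three metrics can be computed directly from a list of trip endpoints and a list of facility locations. The following is a minimal sketch, not the implementation used in this work: it assumes planar coordinates expressed in feet (so that Euclidean distance is meaningful), and the function names `captured_mask` and `capture_metrics` are hypothetical.

```python
import math

def captured_mask(endpoints, facilities, radius_ft=100.0):
    """Flag each endpoint lying within radius_ft of at least one facility.

    An endpoint in range of several facilities is flagged once, so it is
    counted only once towards total trip capture.
    """
    return [
        any(math.dist(p, f) <= radius_ft for f in facilities)
        for p in endpoints
    ]

def capture_metrics(endpoints, facilities, radius_ft=100.0):
    """Return trips captured, percentage captured, and capture efficiency."""
    n_captured = sum(captured_mask(endpoints, facilities, radius_ft))
    return {
        "trips_captured": n_captured,
        "pct_captured": 100.0 * n_captured / len(endpoints) if endpoints else 0.0,
        "capture_efficiency": n_captured / len(facilities) if facilities else 0.0,
    }

# Toy data: planar coordinates in feet (an assumption for illustration only).
endpoints = [(0, 0), (50, 0), (150, 0), (400, 0), (1000, 1000)]
facilities = [(0, 0), (120, 0)]
metrics = capture_metrics(endpoints, facilities)
print(metrics)
```

In the toy data, the endpoint at (50, 0) lies within 100 feet of both facilities but contributes a single captured trip.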

Mathematical description of methodology
We now describe in mathematical notation the methodology of this work. This notation maps onto the conceptual overview and will be referenced later in further descriptions of the methods.
Let $D$ be the input set of trip endpoints, each defined by longitude ($x$) and latitude ($y$). The endpoints, $D$, may be divided into mutually exclusive training data and testing data, denoted $D_{tr}$ and $D_{ts}$, respectively. Let $P$ be a set of $k$ parking facility locations (longitude, latitude), such that $P = \{(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)\}$. A clustering algorithm combined with a facility placement procedure, represented loosely by the function $f$, produces high-demand clusters of trip endpoints and places parking facilities within each cluster.
It uses training data to generate facility placement so that the testing dataset may be used to assess the performance of the facilities. The clustering algorithm is parameterized according to relevant hyperparameters $\theta$. The function can therefore be written $f(D_{tr}, \theta) = P$.
Finding a set of parking facilities, $P$, that maximizes trip capture depends on finding an ideal hyperparameter set, $\theta$, because most clustering algorithms can generate a range of possible outputs for a desired number of clusters (in this case, parking facilities). Let the trip capture for a set of parking locations, $P$, be based on the testing dataset of trip endpoints and be denoted by the function $g(D_{ts}, P)$. The maximum trip capture for a given number of facilities, $k$, can therefore be written:
$$\max_{\theta} \; g\big(D_{ts}, f(D_{tr}, \theta)\big) \quad \text{subject to} \quad |f(D_{tr}, \theta)| = k \qquad (1)$$
The value of $\theta$ that maximizes this equation is denoted $\theta^*$. Note that in this work, $\theta^*$ is determined by sampling the space of $\theta$ and may not necessarily constitute the true global optimum.

Clustering algorithms
In this work, we assess the performance of three different clustering algorithms (k-means, DBSCAN, and HDBSCAN) against our objective of finding the highest performing parking facility locations. The intention of this comparison is not an exhaustive survey of possible algorithms, but rather to identify a method that has been used for these types of problems previously and select one for continued use in the case studies of this article. A factor in choosing these algorithms was their straightforward understanding and implementation.
k-means is a classical clustering algorithm [15] that has been used successfully in facility placement in other domains [16,17,18]. It is initialized with a specific number of clusters into which all of the training data is partitioned. DBSCAN is a density-based clustering algorithm that can discover clusters of arbitrary shape conforming to two hyperparameters and will exclude points in the dataset that are not part of a defining cluster [19]. HDBSCAN was introduced [20] as an extension [21] to the existing DBSCAN algorithm and can find clusters of varying densities using two hyperparameters [22].
The process of hyperparameter tuning in this work is an essential step to determine $\theta^*$, the hyperparameter set for an algorithm that maximizes trip capture for a given number of parking facilities, $k$. Note that $\theta^*$ is selected by the sampling process described here, and may not be the true global optimum.
k-means does not require hyperparameter tuning, but due to its random initialization of centroids, requires numerous runs to determine its best performance. We performed 10 runs of k-means for each value of $k$ and selected the result maximizing trip capture.
A two-stage grid search was used for DBSCAN and HDBSCAN, both of which have two parameters that can be changed. The first stage searches a wide range of values and the second searches more narrowly in an area of interest identified during stage one. From the explored hyperparameter combinations for each algorithm, a best combination, $\theta^*$, was selected for each value of $k$ (number of clusters).
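The two-stage search can be sketched generically as a coarse sweep followed by a narrower sweep around the best coarse setting, keeping only settings that yield the desired number of facilities. The sketch below is illustrative only: `best_hyperparameters` and `refine` are hypothetical names, and `toy_place` and `toy_capture` are deliberately simple one-dimensional stand-ins for a real clustering-plus-placement procedure and for trip capture. The grid values are arbitrary.

```python
from itertools import product
from statistics import median

def best_hyperparameters(place, capture, train, test, grid, k):
    """Return (theta, score) with the highest capture among settings giving k facilities."""
    best_theta, best_score = None, -1
    for theta in grid:
        facilities = place(train, theta)
        if len(facilities) != k:   # keep only realizations with k facilities
            continue
        score = capture(test, facilities)
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta, best_score

def refine(center, step, n=2):
    """Stage-two values: a narrow range around a promising stage-one value."""
    return [center + i * step for i in range(-n, n + 1)]

def toy_place(train, theta):
    """Stand-in placement: split sorted 1-D points at gaps > eps, drop small groups."""
    eps, min_pts = theta
    groups, current = [], []
    for x in sorted(train):
        if current and x - current[-1] > eps:
            groups.append(current)
            current = []
        current.append(x)
    if current:
        groups.append(current)
    return [median(g) for g in groups if len(g) >= min_pts]

def toy_capture(test, facilities, radius=1.0):
    """Stand-in capture: number of test points within radius of any facility."""
    return sum(any(abs(x - f) <= radius for f in facilities) for x in test)

train = [0.0, 0.2, 0.4, 5.0, 5.1, 5.3, 5.4, 9.0]
test = [0.1, 0.3, 5.2, 5.5, 9.1, 12.0]
k = 2

# Stage one: wide, coarse grid over both hyperparameters.
stage_one = list(product([0.5, 2.0, 5.0], [2, 4]))
theta_1, score_1 = best_hyperparameters(toy_place, toy_capture, train, test, stage_one, k)

# Stage two: finer grid around the best stage-one distance parameter.
stage_two = list(product(refine(theta_1[0], 0.1), [2, 3]))
theta_2, score_2 = best_hyperparameters(toy_place, toy_capture, train, test, stage_two, k)
print(theta_1, score_1, theta_2, score_2)
```

Because only a finite grid is sampled, the selected setting is the best among those explored rather than a guaranteed global optimum.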

Parking facility placement and assessment
We now discuss how to take a cluster of points, representing a high-demand parking area, and choose a single point for the parking facility. Clusters can have non-uniform shape and density distribution, so choosing a single point within the cluster can influence trip capture. We desire a low-complexity method of determining facility placement within clusters, because in practice the placement will be dictated by factors on the ground (e.g., on street parking, sidewalk area, etc.).
In case study 1 at Vanderbilt University, we place each parking location at the latitudinal and longitudinal median coordinates of each cluster. This choice was made based on the observed shape of clusters in this case study, which was predominantly circular or ovular. In case study 2 for the broader city of Nashville, more elongated and non-convex clusters emerge, which require a different method. We evaluate the trip capture of at least 100 candidate locations by sampling a grid space within the bounds of each cluster and take the location maximizing trip capture.
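Both placement rules are simple to express in code. The sketch below is a hedged illustration rather than the implementation used in the case studies: it assumes planar coordinates in feet, samples candidates on a regular 10 x 10 grid over the cluster's bounding box (100 candidates), and uses the hypothetical helper names `median_location`, `count_within`, and `grid_search_location`.

```python
import math
from statistics import median

def median_location(cluster):
    """Facility at the coordinate-wise median of the cluster's points."""
    xs, ys = zip(*cluster)
    return (median(xs), median(ys))

def count_within(location, points, radius_ft=100.0):
    """Number of points within radius_ft of the location."""
    return sum(math.dist(location, p) <= radius_ft for p in points)

def grid_search_location(cluster, radius_ft=100.0, n_per_side=10):
    """Evaluate n_per_side**2 grid candidates over the bounding box; keep the best."""
    xs, ys = zip(*cluster)
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    best, best_count = None, -1
    for i in range(n_per_side):
        for j in range(n_per_side):
            cand = (x0 + (x1 - x0) * i / (n_per_side - 1),
                    y0 + (y1 - y0) * j / (n_per_side - 1))
            count = count_within(cand, cluster, radius_ft)
            if count > best_count:
                best, best_count = cand, count
    return best, best_count

# Toy elongated cluster: a sparse horizontal arm and a denser vertical arm.
cluster = [(0, 0), (300, 0), (600, 0),
           (900, 0), (900, 50), (900, 100), (900, 150), (900, 200)]

med = median_location(cluster)
med_count = count_within(med, cluster)
grid_loc, grid_count = grid_search_location(cluster)
print(med, med_count, grid_loc, grid_count)
```

On this toy cluster the grid search finds a location that captures more points than the median location, which is the motivation for using it on elongated or non-convex clusters.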
More complex methods for this level of facility placement exist but, again, are perhaps secondary to the built environment considerations at this scale. It must be noted that the placement by these methods is performed without consideration of physical constraints; for example, a parking location may occur within a building or in the middle of a street.

Weighted clustering
In this section we show how to incorporate features external to a micromobility trip dataset by performing weighted clustering. In the second case study of this work, we demonstrate this process by incorporating the sidewalk width at each location where a scooter trip was parked. This factor is of particular interest in cities where high demand is being experienced in areas where sidewalks are narrow and scooters present a potential obstruction. It may be strategic to reallocate some parking facilities from higher demand areas in order to address sidewalk constraints. Increased weighting of narrow-sidewalk trips allows us to evaluate this tradeoff, which we explore in Section 6.
By assigning each training data point $i$ in the training set $D_{tr}$ a weight, $w_i$, we can emphasize the importance of some trip endpoints over others with respect to identified parking clusters. Consider a point $i$ that has double the weight of another point $j$, such that $w_i = 2 w_j$. In effect, this amounts to two points at the location of $i$ relative to one point at the location of $j$. Weighted clustering is intuitively supported in DBSCAN via the minimum-points hyperparameter, $minPts$: the weight of points in a cluster must sum to greater than or equal to $minPts$. The default weight of each point (the "unweighted clustering" case) is, therefore, equal to 1.
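The equivalence between doubling a point's weight and duplicating the point can be checked on the density test that DBSCAN-style algorithms apply to a neighborhood. The sketch below is a simplified illustration, not a full DBSCAN implementation: the names `neighborhood_weight`, `is_core`, `eps`, and `min_pts` are chosen here for illustration, and the coordinates are arbitrary planar values.

```python
import math

def neighborhood_weight(center, points, weights, eps):
    """Total weight of all points within distance eps of center."""
    return sum(w for p, w in zip(points, weights) if math.dist(center, p) <= eps)

def is_core(center, points, weights, eps, min_pts):
    """Density test: the neighborhood's summed weight must reach min_pts."""
    return neighborhood_weight(center, points, weights, eps) >= min_pts

points = [(0, 0), (10, 0), (0, 10)]
unit_weights = [1, 1, 1]       # the unweighted case
double_first = [2, 1, 1]       # first point carries double weight

# The same situation expressed by duplicating the first point instead.
duplicated = [(0, 0), (0, 0), (10, 0), (0, 10)]

dense_unweighted = is_core((0, 0), points, unit_weights, eps=15, min_pts=4)
dense_weighted = is_core((0, 0), points, double_first, eps=15, min_pts=4)
dense_duplicated = is_core((0, 0), duplicated, [1, 1, 1, 1], eps=15, min_pts=4)
print(dense_unweighted, dense_weighted, dense_duplicated)
```

With unit weights the neighborhood falls short of the threshold, while the doubled weight and the duplicated point both reach it.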
For weighting trips according to the width of the sidewalk along which they are parked, we utilize a step function, given in equation (2), that assigns higher weights to trips ending along sidewalks with width, $s_i$, less than a threshold, $s_{thresh}$:
$$w_i = \begin{cases} w_{ADA}, & s_i < s_{thresh} \\ 1, & \text{otherwise} \end{cases} \qquad (2)$$
The increased weight is denoted $w_{ADA}$ and the threshold corresponds to a width where ADA requirements [5] could potentially be infringed by a parked scooter: $s_{thresh} = 5$ feet.
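The step function amounts to a one-line rule per trip. The following minimal sketch uses the hypothetical function name `sidewalk_weight`; the 5-foot threshold comes from the text, and the weight of 36 is one of the values explored in the second case study.

```python
def sidewalk_weight(width_ft, w_ada, thresh_ft=5.0):
    """Step function: elevated weight below the threshold, unit weight otherwise."""
    return w_ada if width_ft < thresh_ft else 1.0

# Sidewalk widths (feet) at four hypothetical trip endpoints.
widths = [3.5, 4.9, 5.0, 8.0]
weights = [sidewalk_weight(s, w_ada=36.0) for s in widths]
print(weights)
```

A trip parked on a sidewalk exactly at the threshold width keeps unit weight, since the comparison is strict.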

Micromobility dataset
The subset of data used in case study 1 includes only trips that begin or end within Vanderbilt University's campus boundary, while case study 2 uses the full Nashville dataset. Summary statistics for these two datasets can be found in Table 1.

Case study 1: Determination of efficient parking locations
In this section, we demonstrate the results of applying the parking facility placement methodology on the Vanderbilt subset of scooter trips. We also demonstrate facility capacity allocation and results. The Vanderbilt data subset is used in case study 1 because the distribution of demand on the campus is very localized around buildings and this provides a straightforward interpretation of location and ranking results.

Facility placement and capture potential
We explore three clustering algorithms for the placement of SES parking facilities with the primary goal of evaluating representative candidates that are straightforward to implement and interpret.
We sort the Vanderbilt subset of the scooter trip dataset chronologically and take the first 75% of trips as training data and the latter 25% as testing data. The facility locations generated by each algorithm on the training data are evaluated with respect to the trip capture using the testing dataset and a capture radius of 100 feet. Trip endpoints that are located within the capture radius of two parking facilities are counted only once towards the total trips captured by all facilities.
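A chronological split of this kind can be sketched as follows. The record layout (a dictionary with an `end_time` field) and the function name `chronological_split` are assumptions made for illustration; the actual dataset schema is not specified here.

```python
def chronological_split(trips, train_fraction=0.75, time_key="end_time"):
    """Sort trips by time; the earliest fraction trains, the remainder tests."""
    ordered = sorted(trips, key=lambda t: t[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Toy trips with integer timestamps in arbitrary order.
trips = [{"id": i, "end_time": t}
         for i, t in enumerate([50, 10, 80, 30, 70, 20, 60, 40])]
train, test = chronological_split(trips)
print(len(train), len(test))
```

Every training trip ends no later than any testing trip, so the facilities are evaluated on demand that occurred after the data used to place them.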
The trip capture results for k-means, DBSCAN, and HDBSCAN algorithms are shown in Figure 3 in terms of the total number of trip endpoints that were captured out of the testing dataset. DBSCAN performed better than HDBSCAN and k-means across the range of cluster counts. Some noise is seen in the performance of all three algorithms such that, at times, increasing the number of clusters decreased the percentage of trips captured. This is due to the variability of the non-exhaustive hyperparameter tuning process. The performance gap between algorithms stayed relatively consistent across cluster counts.
At low numbers of clusters, k-means finds large, sparsely distributed clusters and facility placement by the median-based approach is not well-suited to these malformed clusters. For this reason we find that density-based methods such as DBSCAN and HDBSCAN provide more consistent results regardless of the number of clusters. The cluster assignment of trip endpoints is shown by the color of points (unassigned points are in grey). The selected facility location for each cluster is denoted by a red X and a 100-foot capture radius is overlaid on each. A subset of parking locations, representing unique areas of interest on campus described in Table 2, is selected and marked with letters A-E on the map in Figure 5.
While there appear to be areas of concentrated but un-clustered points, these areas fail to meet the mathematical cluster requirements for this run of DBSCAN. Indeed, it is difficult to discern visually from a map of tens of thousands of points where the most significant concentrations lie. This is an advantage of our methodology when using DBSCAN: clear delineation between clusters meeting criteria and all other points.

Parking facility capacity
In order to evaluate the expected capacity requirements for parking locations, a derivative dataset is constructed that consists of periods of time and locations where devices are idle. By matching trips made on the same device, we determine the dwell period and location of devices between trips.
We again examine the five parking facilities marked A-E in the 19-cluster efficient realization. For each location, we compute the number of devices dwelling there during every minute covered by the training dataset. For the capacity evaluation only, we use a capture radius of 300 feet, as opposed to the 100-foot capture radius for parking location determination. We intentionally make a higher estimate of trip capture in order to err on the side of placing facilities that can fulfill a larger demand and accommodate possible changes in user behavior in the future.
Times at which no scooters are present at the location are removed and the remaining minutes are exhibits the same mid-day peak, but is heavily weighted towards weekend activity. Location A (Rec. Center) has a unique evening peak demand, stable throughout the week.
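The capacity evaluation can be sketched in three steps: derive dwell periods by pairing consecutive trips on the same device, count dwelling devices near a facility for each minute, and read capacity percentiles from the occupied minutes. The sketch below is an assumption-laden illustration: the record fields (`device`, `start_min`, `end_min`, `end_xy`), the planar coordinates in feet, integer minute timestamps, and the percentile convention in `capacity_percentile` are all choices made here, not details taken from the study.

```python
import math
from collections import defaultdict

def dwell_periods(trips):
    """Pair consecutive trips on the same device; the gap between them is a dwell."""
    by_device = defaultdict(list)
    for t in trips:
        by_device[t["device"]].append(t)
    dwells = []
    for device_trips in by_device.values():
        device_trips.sort(key=lambda t: t["start_min"])
        for prev, nxt in zip(device_trips, device_trips[1:]):
            dwells.append((prev["end_xy"], prev["end_min"], nxt["start_min"]))
    return dwells

def occupancy_by_minute(dwells, facility, radius_ft=300.0):
    """Devices dwelling within radius_ft of the facility, for each occupied minute."""
    counts = defaultdict(int)
    for xy, t0, t1 in dwells:
        if math.dist(xy, facility) <= radius_ft:
            for minute in range(t0, t1):
                counts[minute] += 1
    return counts  # minutes with no scooters present never appear

def capacity_percentile(counts, pct):
    """Smallest capacity that covers at least pct percent of occupied minutes."""
    levels = sorted(counts.values())
    idx = max(0, math.ceil(pct / 100 * len(levels)) - 1)
    return levels[idx]

trips = [
    {"device": "A", "start_min": 0, "end_min": 10, "end_xy": (0, 0)},
    {"device": "A", "start_min": 40, "end_min": 50, "end_xy": (2000, 0)},
    {"device": "B", "start_min": 5, "end_min": 20, "end_xy": (100, 0)},
    {"device": "B", "start_min": 30, "end_min": 45, "end_xy": (0, 50)},
    {"device": "B", "start_min": 100, "end_min": 110, "end_xy": (5000, 0)},
    {"device": "C", "start_min": 0, "end_min": 25, "end_xy": (50, 50)},
    {"device": "C", "start_min": 35, "end_min": 60, "end_xy": (9000, 0)},
]
occupancy = occupancy_by_minute(dwell_periods(trips), facility=(0, 0))
capacities = {p: capacity_percentile(occupancy, p) for p in (75, 90, 98)}
print(len(occupancy), capacities)
```

In this toy example a single-scooter facility would suffice for 75% of occupied minutes, while covering 98% of occupied minutes requires space for three scooters.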

Case study 2: Parking considering sidewalk infrastructure
In this section, we present a case study on applying factors secondary to user demand for parking facility placement. The secondary factor considered is sidewalk width nearest the parked scooter, which is important in evaluating whether the sidewalk is at risk of being blocked by parked scooters.
Even if a sidewalk meets width design standards, it can still be blocked by a device that was not parked carefully. We begin by describing how sidewalk width is calculated for each parked trip and incorporated into the clustering algorithm. We then analyze the effects different weights have on the formation of clusters and capture efficiency of the systems.

Sidewalk width and data preprocessing
The width of the sidewalk associated with each trip is calculated specifically for its parked location, because sidewalk width can vary significantly even along the length of a city block. We begin with the dataset of scooter trip endpoints, and spatial layers of sidewalk geometry.

Figure 6: Distribution of the number of parked scooters based on duration of time that occupancy level is present (x-axis: number of scooters; panel shown: parking location E). This is calculated only for times when one or more devices are present. The 75th, 80th, 85th, 90th, 95th, and 98th percentile capacity values are shown for each. Note that some percentile lines are overlaid on top of each other due to discrete capacity values. Also note that the x and y axes have been standardized across the subplots to 14 scooters and 12,000 minutes.
with associated sidewalk width less than 3 feet. Sidewalks narrower than 3 feet are not compliant with the 2010 ADA Standards for Accessible Design [5], so we assume that this was an anomaly in sidewalk width processing since no sidewalk should be built at this width.
We separate the training and testing datasets for case study 2 by stratified random sampling (as opposed to a temporal split in case study 1). Stratification is based on sidewalk width for trips, in order to ensure that the rarer narrow-sidewalk trips are proportionally represented in training and testing data. We take 10% of trips as training data (approximately 95,000 trips) and the remaining 90% as testing data. The comparatively low volume of training data is due to computational constraints imposed by running clustering on such a large number of points.
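Stratified random sampling can be sketched as shuffling within each stratum and taking the same fraction from each. In the sketch below, the two strata (narrow versus wider sidewalks, split at 5 feet) are an assumption for illustration, since the exact binning used for stratification is not stated; `stratified_split` and the `width_ft` field are hypothetical names.

```python
import random
from collections import defaultdict

def stratified_split(trips, stratum_of, train_fraction=0.10, seed=0):
    """Draw the same fraction of training trips from every stratum."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    strata = defaultdict(list)
    for t in trips:
        strata[stratum_of(t)].append(t)
    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Toy data: 100 narrow-sidewalk trips and 900 wider-sidewalk trips.
trips = [{"id": i, "width_ft": 4.0 if i < 100 else 8.0} for i in range(1000)]
train, test = stratified_split(trips, lambda t: t["width_ft"] < 5.0)
narrow_in_train = sum(t["width_ft"] < 5.0 for t in train)
print(len(train), len(test), narrow_in_train)
```

The narrow-sidewalk share is 10% in both the training and the testing data, matching its share in the full toy dataset.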

Assessment of weighted clustering
In order to evaluate the effects of increasing the weight of narrow-sidewalk trips, we select three representative hyperparameter combinations, listed in Table 3, that generate varying numbers of parking locations in the baseline case. These three hyperparameter combinations are then used to train DBSCAN on the various degrees of sidewalk width weighting.
Recall from Section 3.5 that $w_i$ represents the raw weight of trip $i$ and that the weights are normalized across all trips to sum to $N$, the total number of trips. The normalized weight for trip $i$ is $\hat{w}_i = N w_i / \sum_{j=1}^{N} w_j$. For non-weighted clustering, which we refer to as the baseline case, all values are equal: $w_i = 1$. Sidewalk width for each trip is denoted $s_i$ and we increase the weight of narrow-sidewalk trips relative to wider sidewalk trips.
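A small numeric example makes the normalization concrete. The sketch below assumes the raw weights have already been assigned; `normalize_weights` is a hypothetical helper name, and the five-trip dataset is invented for illustration.

```python
def normalize_weights(raw):
    """Rescale raw weights so that they sum to the number of trips."""
    total = sum(raw)
    n = len(raw)
    return [w * n / total for w in raw]

# Five trips: one narrow-sidewalk trip with raw weight 36, four with raw weight 1.
raw = [36.0, 1.0, 1.0, 1.0, 1.0]
normalized = normalize_weights(raw)
print(normalized)
```

Normalization preserves the relative emphasis between trips (the narrow-sidewalk trip still counts 36 times as much as each other trip) while keeping the total weight equal to the number of trips, so that density thresholds remain comparable with the baseline case.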
We use these representative hyperparameter combinations to explore the effects of assigning a higher raw (pre-normalized) weight $w_i$ to narrow-sidewalk trips. A step function is applied to trip sidewalk widths, as described in Section 3.5: trips with sidewalk width less than $s_{thresh} = 5$ feet are problematic trips and are given a larger weight, which we denote $w_{ADA}$; trips at or above the threshold are given $w_i = 1$.
Total capture efficiency decreases almost immediately as the value of $w_{ADA}$ increases because this shifts the emphasis from areas with sheer volume of trips towards those with narrow-sidewalk trips, and new clusters start to form in these areas. This effect of shifting parking location emphasis can be seen in Figures 9a and 9b, which compare locations for the baseline case with the case where $w_{ADA} = 36$. With a higher value of $w_{ADA}$, clusters make a significant shift from Downtown Nashville to Midtown Nashville, where trip demand is lower but sidewalks are far narrower. Generally, more clusters start to form as we increase the value of $w_{ADA}$: 59 clusters for the baseline case and 100 clusters when $w_{ADA} = 36$, in this example. Figure 9c is a heatmap of all scooter trips in the city, which are heavily concentrated in Downtown Nashville. Figure 9d is a heatmap of only the problematic trips, which are less localized and more frequent in areas outside of Downtown Nashville.
Cities indicate by their legislation that the need for parking facilities is as much about corralling the trips as it is about alleviating the pressure on the sidewalks from problematic trips. Cities that want to corral a high number of trips while also alleviating the pressure that problematic scooters put on sidewalks should choose a value of $w_{ADA}$ that balances both objectives. Figure 10 shows the ratio between problematic scooter trips captured and non-problematic trips captured.
As is expected, larger values of $w_{ADA}$ shift this ratio towards problematic trips. All three hyperparameter combinations begin in the baseline case by capturing only one problematic trip for approximately every 60 non-problematic trips (1:60). Some variation is seen between the hyperparameter combinations at large values of $w_{ADA}$, but around $w_{ADA} = 30$ we see all three results capturing problematic trips at a ratio of approximately 1:4.

Discussion
Many practical considerations arise with respect to the real-world placement of parking facilities within cities. We consider one such influencing factor, sidewalk width or ADA compliance, in this work and how taking it into account will ultimately influence the number or percentage of trips in the area that are captured within parking facilities. The degree to which a city is willing to sacrifice trip capture for other factors will vary and could be evaluated on a case-by-case basis.
Other practical considerations for facility placement could include distance between facilities, prohibited parking areas, and density of parking facilities within an area. There are valid reasons why a city may elect to establish a dense set of parking facilities in one area, as opposed to a wider range of facilities that may capture more trips. Reaching a critical density of parking facilities may condition users to expect their presence and use them regularly. Maintaining a smaller distance between facilities will make it less likely that a user ignores parking facilities because they are too far from his or her destination. Additionally, placing parking locations close to prohibited parking areas may also increase compliance and decrease frustration with "geofencing". We have observed, qualitatively, that the actual allocation of parking facilities in Nashville has followed a broader approach. Parking areas have been allocated in a variety of neighborhoods rather than focusing strictly on downtown, which has by far the highest demand on a neighborhood level.
These considerations may be made to fit into the framework of weighted clustering under a similar mechanism that we used for sidewalk width.

Conclusions
This article proposes the data-driven placement of dedicated SES parking facilities, so as to maximize their impact in terms of potential capture of trips. These parking facilities are sorely needed in highly congested areas that are prone to clutter of SESs and other devices on sidewalks.
Data-driven placement increases the likelihood that, under an optional usage model, they are effective at convincing users to park devices because of proximity to their destinations. The proposed methods can be customized using weighted clustering to encourage particular characteristics of facility placement. We explore the emphasis of scooter trips ending near narrow sidewalks as a second case study, which is relevant for reducing conflicts between parked scooters and pedestrians or ADA sidewalk requirements. The case study results show that 300% more problematic narrow-sidewalk trips could potentially be captured with only a 13% reduction in overall trip capture.
Future work includes accounting for irregularly-shaped (non-circular) clusters of trip endpoints that should probably be served by multiple parking facilities when the capture radius does not sufficiently cover the entire area of dense points. Location determination should consider truly feasible facility locations that are free from obstructions such as buildings and roadways. Capture zones that are non-circular in shape due to occlusion by buildings could also be assessed. We could also compare the clustering methods employed in this work to methods of continuous density approximation, such as kernel density estimation or Gaussian mixture models. Furthermore, a model could be created for the likelihood of SES users to park in optional dedicated facilities, parameterized by the distance from the facility to their destinations, the area in which the facility is located, and other factors.