Portraying ride-hailing mobility using multi-day trip order data: A case study of Beijing, China

As a newly-emerging travel mode in the era of mobile internet, ride-hailing, which connects passengers with private-car drivers via an online platform, has become very popular all over the world. Although it has attracted much attention in both practice and theory, the understanding of ride-hailing is still very limited, largely because of the lack of related data. For the first time, this paper introduces ride-hailing drivers' multi-day trip order data and portrays ride-hailing mobility in Beijing, China, from the regional and driver's perspectives. The analyses from the regional perspective help understand the spatiotemporal flow of ride-hailing demand, and those from the driver's perspective characterize ride-hailing drivers' preferences in providing ride-hailing services. A series of findings are obtained, such as the spatiotemporal rhythm of a city in using ride-hailing services and two categories of ride-hailing drivers in terms of the correlation between activity space and working time. These findings contribute to the understanding of ride-hailing activities, the prediction of ride-hailing demand, the modeling of ride-hailing drivers' preferences, and the management of ride-hailing services.


Introduction
In the era of mobile internet, ride-hailing, which allows a passenger to hail a private car or a taxi through a mobile application, is a newly-emerging travel mode that is attracting much attention and many users all over the world (Furuhata et al., 2013; Jin et al., 2018; Tirachini, 2019). In ride-hailing, a transportation network company (TNC) provides a mobile internet-based platform on which a passenger's origin and destination are matched with a driver (of a private vehicle or a taxi) who is willing to give a ride in order to earn money or save travel costs (Wang and Yang, 2019).
Ever since its emergence, a variety of studies have been carried out to understand the ride-hailing services provided by full- or part-time drivers as well as their usage by passengers.
Three kinds of data sources, namely, questionnaire survey data, self-collected operation data, and TNC-released operation data, are widely used in the existing studies and largely determine what those studies can find.
Questionnaire surveys are one of the most important research tools in the existing ride-hailing studies and unveil the characteristics of ride-hailing, in particular from the microscopic perspective of personal choice. For example, Anderson (2014) conducted ethnographic interviews and identified three types of driving strategies for providing ride-hailing services, namely incidental, part-time, and full-time driving. However, as acknowledged in that study, the proportions of the three types remained unknown because of the lack of data reflecting the overall population. Rayle et al. (2016) compared their intercept survey results with taxi trip data in San Francisco, United States, and showed that taxis and ride-hailing differed in user characteristics, wait times, etc. They also found that at least half of ride-hailing trips replaced travel modes other than taxis, such as public transit and private cars. Tang et al. (2019) designed a questionnaire for frequent ride-hailing users and conducted an app-based survey through the DiDi Chuxing platform; a total of 9,762 responses were obtained and used to investigate how ride-hailing changed travelers' behavior. Taking Santiago de Chile as an example, Tirachini and del Río (2019) examined the characteristics of people who choose ride-hailing services and the effects of ride-hailing on travel behavior. Through a household travel survey conducted in Toronto, Canada, Young and Farber (2019) answered questions regarding ride-hailing usage, such as who uses ride-hailing, when, and why. From the perspective of ride-hailing users, Alemi et al. (2018, 2019) collected 1,975 samples through an online survey and unveiled the factors that affect the adoption and usage frequency of ride-hailing services in California, United States.
More recently, Tirachini and Gomez-Lobo (2020) found, through a Monte Carlo simulation based on online survey data, that ride-hailing services usually increase vehicle kilometers traveled. Vij et al. (2020) surveyed 3,985 Australians on their attitudes and opinions towards on-demand transportation services and found that such services have the potential to increase public transportation usage, although the current market is still limited.
Although questionnaire surveys can reveal many latent details, such as the intention behind a choice, distributing questionnaires is costly, and respondents' answers may not be fully consistent with their daily behavior. To avoid these shortcomings, researchers have developed various ways to collect ride-hailing data and investigated ride-hailing activities using observed behavior data. For example, Cooper et al. (2018) used computer programs to repeatedly send synthetic requests to the ride-hailing platform from 200 locations across San Francisco. The responses of nearby ride-hailing vehicles were recorded and then used to estimate the spatiotemporal characteristics of ride-hailing services in San Francisco. Using the same data, Erhardt et al. (2019) conducted a before-and-after assessment and found that ride-hailing services were the biggest contributor to the growing traffic congestion in San Francisco. One of the authors of Henao and Marshall (2019a,b) drove a ride-hailing vehicle himself to collect trip data and passengers' feedback in Denver, United States. Using these data, they not only found that ride-hailing resulted in 83% more vehicle kilometers traveled than would have occurred without ride-hailing (Henao and Marshall, 2019a), but also investigated the actual earnings of a ride-hailing driver when accounting for factors such as time spent without passengers and the driver's residential location (Henao and Marshall, 2019b). Qian et al. (2020) developed a web crawler for the Uber mobile platform to collect ride-hailing data in New York, United States, and then characterized a variety of aspects of ride-hailing services, such as the market share and the distributions of origins and destinations.
TNCs are generally unwilling to share their data with researchers or the public (Costain et al., 2012; Li et al., 2019; Henao and Marshall, 2019a,b); as a result, little is known about the aggregated characteristics of ride-hailing-related urban mobility, even though almost ten years have passed since ride-hailing first appeared. Recently, some TNCs have conditionally released part of their data, which has changed the status quo to some extent. Leveraging such large-sample or even whole-population data, ride-hailing behavior and the related human mobility can be understood more comprehensively. Based on ride-hailing trip data provided by a TNC in Austin, United States, Yu and Peng (2019a,b) found a strong relationship between ride-hailing demand and the built environment using geographically weighted Poisson regression and a structural equation model. Sun and Ding (2019) proposed a two-level growth model and investigated the spatiotemporal evolution of ride-hailing markets under a new restriction policy in Shanghai, China, using 20,000 randomly sampled ride-hailing orders and 33,500 taxi orders from Shanghai. Zhang et al. (2020a) identified regions with high travel intensity and explored the correlation between travel intensity and points of interest using 209,423 ride-hailing order records from one day in Chengdu, China. From the labor (driver) side, Dong et al. (2018) analyzed the activities of 6,471 ride-sharing drivers over a month and identified two kinds of ride-sharing drivers, i.e., daily home-work commuting providers and no-constant-origin-destination providers, of which the daily home-work commuting providers accounted for only a small part of all drivers. Moreover, they found that ride-sharing drivers tended to make longer trips than taxi drivers. By analyzing hourly earning data for Uber drivers, Chen et al. (2017) found that ride-hailing drivers could benefit significantly from flexibility, which is deemed one of the attractions of ride-hailing. Moreover, Hall and Krueger (2018) explored ride-hailing drivers' preferences and showed that ride-hailing drivers tended to work substantially fewer hours than taxi drivers.
From questionnaire survey data to ride-hailing operation data, our knowledge of ride-hailing has gradually deepened and broadened. However, the data available for analysis are still very limited, which slows the understanding of this special travel mode in the spotlight and its impact on society. To further enrich this knowledge, this paper analyzes multi-day ride-hailing driver activity data for an entire city. Such data contain not only the spatiotemporal dynamics of ride-hailing demand in a city but also the characteristics of ride-hailing drivers' service-provision behavior. The uniqueness of the data makes this paper, to the best of our knowledge, the first to examine ride-hailing from a multi-day perspective using TNC-provided ride-hailing trip data. More specifically, this paper portrays ride-hailing activities from the perspectives of regional mobility and drivers' multi-day behaviors, respectively. Many details, such as the temporal variation and spatial flow of ride-hailing trips and the spatiotemporal characteristics of ride-hailing drivers' behaviors, are explicitly investigated. These findings, obtained by directly analyzing the data, contribute to the understanding of ride-hailing activities, the prediction of ride-hailing demand, the modeling of ride-hailing drivers' preferences, and the management of ride-hailing services.
The rest of the paper is organized as follows. Section 2 introduces the notations, the multi-day trip order data used here, and the two perspectives of analysis in the paper, i.e., the regional and ride-hailing driver's perspectives. In Section 3, the flow dynamics of ride-hailing demand are examined from a regional perspective, and in Section 4, ride-hailing drivers' preferences regarding this newly-emerging job are characterized from a driver's perspective. Finally, Section 5 concludes the study with discussions.

Notations
The notations regarding ride-hailing trips are defined as follows. Let $i = 1, 2, \ldots, I$ denote drivers, where $I$ is the total number of drivers. Let $TR_{ij} = \{O_{ij}, D_{ij}\}$ be driver $i$'s trip $j = 1, 2, \ldots, J$, where $J$ is the total number of driver $i$'s trips, and $O_{ij}$ and $D_{ij}$ contain the information regarding the origin and destination points of the trip, respectively. To save space, denote by $\Xi$ either $O$ or $D$. Then, the information regarding the origin or destination point of a trip is

$$\Xi_{ij} = \{t^{\Xi}_{ij}, lon^{\Xi}_{ij}, lat^{\Xi}_{ij}\},$$

where $t^{\Xi}_{ij}$ is the time when the trip started (i.e., a passenger was picked up) or ended (i.e., a passenger was dropped off), and $lon^{\Xi}_{ij}$ and $lat^{\Xi}_{ij}$ are the longitude and latitude of the location where the passenger was picked up or dropped off, respectively. For each trip, we can calculate its duration and displacement, denoted by $T_{ij}$ and $L_{ij}$, respectively, as follows:

$$T_{ij} = t^{D}_{ij} - t^{O}_{ij}, \qquad L_{ij} = dis\big((lon^{O}_{ij}, lat^{O}_{ij}), (lon^{D}_{ij}, lat^{D}_{ij})\big), \qquad (1)$$
where $dis(\cdot)$ is the function that calculates the distance between two points on the Earth's surface. Note that, unlike vehicle kilometers traveled, the displacement reflects only the straight-line distance between the origin and destination points of a trip; it is nonetheless widely used in studying human mobility (Liang et al., 2012; He, 2020).
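As a minimal sketch of Equation 1, the snippet below computes $T_{ij}$ and $L_{ij}$ for one trip. It assumes a spherical Earth and uses the haversine great-circle formula for $dis(\cdot)$; the text does not state which distance formula was used, and the names `TripPoint`, `dis`, and `duration_and_displacement` are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical-Earth assumption


@dataclass
class TripPoint:
    """An origin or destination point: time, longitude and latitude (degrees)."""
    t: datetime
    lon: float
    lat: float


def dis(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance in km between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def duration_and_displacement(o, d):
    """Return (T_ij in seconds, L_ij in km) for one trip."""
    T = (d.t - o.t).total_seconds()
    L = dis(o.lon, o.lat, d.lon, d.lat)
    return T, L


# Example: a 25-minute trip heading due north by 0.1 degree of latitude
o = TripPoint(datetime(2018, 1, 1, 8, 0), 116.40, 39.90)
d = TripPoint(datetime(2018, 1, 1, 8, 25), 116.40, 40.00)
T, L = duration_and_displacement(o, d)  # T = 1500.0 s, L is about 11.12 km
```

The displacement is a straight-line (great-circle) quantity, so it is always shorter than or equal to the distance actually driven.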

Data description
Beijing, the capital of China, is one of the largest cities in the world, with a total population of approximately 21.5 million by 2018. Its central area is enclosed by four urban freeways, namely, Rings 2∼5 (Figure 1), and Ring 5 is commonly treated as the boundary between the central and suburban areas.

Perspectives of analyses
A ride-hailing system, which involves users, drivers, and an online platform, usually operates as follows. A user who is looking for a ride-hailing service first sends a request to the platform. The platform immediately matches the request with the drivers who are close to the origin of the demand, and then assigns the request to one of the qualified drivers according to some mechanism of selection. Based on the data we obtained and our daily experience, the standby ride-hailing vehicles are usually sufficient in Beijing, China, where ride-hailing has been very popular (Nie, 2017). Therefore, a user can usually get served immediately after sending a request out.
Looking at the ride-hailing mobility from a regional perspective, a travel demand could be met by one of the nearby drivers. Therefore, analyzing the ride-hailing data at a regional level, i.e., the inter-region transfer of passengers, could uncover the dynamics of travel demand.
From a ride-hailing driver's perspective, after a travel request is sent, the TNC dispatches it to one of the nearby ride-hailing drivers. Although ride-hailing drivers cannot directly determine which requests they are assigned, they can make limited selections by pre-defining the range in which they prefer to provide services or by canceling an assignment (Yang et al., 2018). These decisions are determined by their preferences regarding ride-hailing, such as the time when they would like to provide services and the places they would like to go. Although TNCs have their own mechanisms for selecting and assigning transactions, the free-market essence of ride-hailing gives drivers a certain freedom of selection. If a temporal and spatial activity pattern exhibited by a driver appears on only a single day, we may say that it depends on both the demand (passenger) and supply (driver) sides. However, if a driver's activity pattern repeats over several days, we can reasonably deem that the recurrent pattern is mainly determined by the preferences of the driver who provides the service. Therefore, analyzing ride-hailing drivers' multi-day trip data can better characterize ride-hailing drivers and help understand the labor market on the supply side.

Temporal varying of ride-hailing trips
We are interested in the temporal variation in the generation of ride-hailing trips. Therefore, we present the daily and hourly numbers of trips in the studied dataset in Figure 2, from which we make the following observations.
[1] The total ride-hailing demands on Friday and Saturday are slightly larger than those on the other days (Figure 2(a)).
[2] Once the number of ride-hailing trips reaches high values of around 40,000 in the morning (Figure 2(b)), it generally stays at high values until midnight (except on Wednesday and Thursday; see footnote 2). It turns out that (i) there is no clear off-peak period at noon and (ii) the trip number does not drop until midnight, rather than in the late evening. These observations differ from the distinct morning and evening peaks exhibited by daily traffic and taxi demand (He et al., , 2019).

Spatial flowing of ride-hailing trips
We attempt to unveil how ride-hailing trips spatially flow in a city. To this end, we take the area presented in Figure 1 as the study area, which is 52 km long from west (longitude = 116.11) to east (longitude = 116.72) and 58 km long from south (latitude = 39.69) to north (latitude = 40.21). The total numbers of origin and destination points in the area are 4,763,115 and 4,743,235, respectively, which are approximately 94% of the total contained in the dataset used here.

Footnote 2: We carefully checked the data and did not find any evidence that the fluctuations result from data corruption. Currently, it is still difficult to clearly explain the "abnormal" pattern on Wednesday and Thursday, partially because of the lack of other data sources for cross-validation. We report it here and leave a better interpretation to ongoing work.
We divide the above area into equal-size square grids with a side length of $L$, which corresponds to $L_{lon}$ degrees of longitude and $L_{lat}$ degrees of latitude. Therefore, for an area of $XL \times YL$ (i.e., $XL_{lon} \times YL_{lat}$), we have $X \times Y$ grids in total. Here, we set $L$ = 1 km, and thus $X = 52$ and $Y = 58$.
Then, we map the origin and destination points (i.e., $\Xi_{ij}$) into the grids using basic arithmetic operations as follows:

$$x = \left\lceil \frac{lon^{\Xi}_{ij} - A_{left}}{L_{lon}} \right\rceil, \qquad y = \left\lceil \frac{lat^{\Xi}_{ij} - A_{bottom}}{L_{lat}} \right\rceil, \qquad (2)$$

where $(x, y)$ indicates the grid that an origin or destination point belongs to, $A_{left}$ and $A_{bottom}$ are the left and bottom edges of the selected area, and $\lceil \cdot \rceil$ rounds a number up to an integer. As a result, we obtain the number of origin or destination points in grid $(x, y)$ within time interval $k$ of day $g$, denoted by $N^{\Xi}_{x,y}(g, k)$. In this study, we set $k \in K = \{0, 1, 2, \ldots, 23\}$, i.e., the 24 hours of a day. In addition, we define the grids satisfying $N^{\Xi}_{x,y}(g, k) > N^*$ as dense origin or destination grids, where $N^*$ is a threshold.
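The mapping of points to grids by rounding up and the resulting hourly counts $N^{\Xi}_{x,y}(g, k)$ can be sketched as below. Assumptions not stated in the text: $L_{lon}$ and $L_{lat}$ are obtained by dividing the study-area extent evenly by $X$ and $Y$; hours are local clock hours 0 to 23; points falling outside $1 \le x \le X$, $1 \le y \le Y$ (including points lying exactly on the left or bottom edge) are discarded. The function names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime

# Study-area bounds from the text; cell sizes in degrees are an assumption
A_LEFT, A_RIGHT = 116.11, 116.72
A_BOTTOM, A_TOP = 39.69, 40.21
X, Y = 52, 58
L_LON = (A_RIGHT - A_LEFT) / X   # about 0.0117 degrees, roughly 1 km at Beijing
L_LAT = (A_TOP - A_BOTTOM) / Y   # about 0.0090 degrees, roughly 1 km
DAY_LABELS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def to_grid(lon, lat):
    """Map a point to its 1-based grid (x, y) by rounding up; None if outside."""
    x = math.ceil((lon - A_LEFT) / L_LON)
    y = math.ceil((lat - A_BOTTOM) / L_LAT)
    if 1 <= x <= X and 1 <= y <= Y:
        return x, y
    return None


def count_points(points):
    """Count N(g, k) per grid from (datetime, lon, lat) tuples.

    Returns a Counter keyed by (x, y, g, k): g is the day label, k the hour."""
    counts = Counter()
    for t, lon, lat in points:
        cell = to_grid(lon, lat)
        if cell is not None:
            counts[(cell[0], cell[1], DAY_LABELS[t.weekday()], t.hour)] += 1
    return counts


def dense_grids(counts, g, k, n_star):
    """Grids whose count in hour k of day g exceeds the threshold N*."""
    return {(x, y) for (x, y, gg, kk), n in counts.items()
            if gg == g and kk == k and n > n_star}
```

Running `count_points` separately on origin points and destination points gives the two families of counts compared in Figure 3.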
From the resulting $N^{\Xi}_{x,y}(g, k)$, we initially identify three distinct time periods, i.e., morning, evening, and midnight, and plot them separately in Figure 3. By comparing the heat maps of origins (Figure 3(a)(d)(g)) with those of destinations (Figure 3(b)(e)(h)), the directional flow of trips can be observed as follows. In the morning, the number of dense destination grids (Figure 3(b)) is larger than that of dense origin grids (Figure 3(a)), even though the total numbers of origin and destination points are approximately equal. Figure 3(c) compares more directly the frequencies of grids with different numbers of origin and destination points. It shows that, for example, the number of grids containing more than 100 destination points is clearly larger than the number of grids containing more than 100 origin points.
Likewise, in the evening, the numbers of dense origin and destination grids ( Figure 4 reflect the spatiotemporal rhythm of the city in using ride-hailing services, which can be interpreted as follows. Note that the total numbers of the origin and destination points in an hour in the studied area are approximately equal.
[3] The number of dense origin grids peaks twice during a day: once at approximately 15:00 and once at approximately midnight, with the second peak higher than the first. This indicates that, during a day, the origins of trips usually shrink spatially twice. The first, at approximately 15:00, may be associated with back-home-from-work trips, and the second, at approximately midnight, is probably related to back-home-from-overtime and back-home-from-entertainment trips. The second is more intense, implying that ride-hailing services are more needed at night.
[4] The number of dense destination grids has only one peak during a day, which occurs at noon. The reason cannot be inferred directly from the data and is worth studying more carefully in the future.
[5] Comparing the rise and fall of the two time series, ride-hailing demand is scattered across the city in the morning, while the destinations are more concentrated, largely because of to-work activities; at night, the trend is reversed. This observation implies that places of residence in the city may be more scattered than places of work.

3.3. Multi-day repeatability of regional origin or destination numbers

We are interested in the multi-day repeatability of the regional intensity (i.e., number) of origin and destination points, since it is closely related to the prediction of ride-hailing activities. First, we measure the coefficient of variation of the origin or destination points that fall into grid $(x, y)$ during time interval $k$ on all weekdays:

$$cv^{\Xi}_{x,y}(k) = \frac{\sigma^{\Xi}_{x,y}(k)}{\mu^{\Xi}_{x,y}(k)}, \qquad (3)$$
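where $\mu^{\Xi}_{x,y}(k)$ and $\sigma^{\Xi}_{x,y}(k)$ are the mean and standard deviation of the number of origin or destination points that fall into grid $(x, y)$ during time interval $k$ on all weekdays, written as

$$\mu^{\Xi}_{x,y}(k) = \frac{1}{|G|}\sum_{g \in G} N^{\Xi}_{x,y}(g, k), \qquad \sigma^{\Xi}_{x,y}(k) = \sqrt{\frac{1}{|G|}\sum_{g \in G}\big(N^{\Xi}_{x,y}(g, k) - \mu^{\Xi}_{x,y}(k)\big)^2}, \qquad (4)$$

where $G = \{Mon, Tue, Wed, Thu, Fri\}$. Only trips that occurred on weekdays are considered here because travel demand usually differs between weekdays and weekends.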
It is reasonable to assume that grids with very few origin and destination points serve few real-world functions. Therefore, we only take into account the grids whose $\mu^{\Xi}_{x,y}(k) \geq 10$. After this filtering, 1,954 grids (64.8% of all grids) remain for origins and 2,025 (67.1%) for destinations. The numbers of origin and destination points remaining in those grids are 3,322,246 (99.6%) and 3,306,820 (99.5%), respectively, meaning that almost all trips are retained. Figure 5 presents the numbers of grids with different values of $cv^{\Xi}_{x,y}(k)$, from which we make the following observations.
[6] For most hours, the values of $cv^{\Xi}_{x,y}(k)$ are between 0.15 and 0.25, indicating that the number of origin or destination points in a region does not change dramatically at the same time on different weekdays. Therefore, an advanced prediction method can be considered effective only if its prediction deviations are smaller than this day-to-day variation.
[7] For several time intervals (origin: 8:00, 9:00, 18:00, and 19:00; destination: 9:00, 18:00, and 19:00), the values of $cv^{\Xi}_{x,y}(k)$ are relatively large, ranging between 0.3 and 0.4, which indicates relatively low day-to-day repeatability of regional origin and destination numbers. Predictions that rely on multi-day repeatability are therefore more difficult for these time intervals.

Subsequently, we calculate the following average of $cv^{\Xi}_{x,y}(k)$ to measure the multi-day repeatability of a region.

$$cv^{\Xi}_{x,y} = \frac{1}{|K|}\sum_{k \in K} cv^{\Xi}_{x,y}(k) \qquad (5)$$

Figure 6 presents the result, from which the following can be found.
[8] The multi-day repeatability of regional origin and destination numbers, measured by $cv^{\Xi}_{x,y}$ in Equation 5, mostly lies within [0.2, 0.4]. The distribution peaks at 0.25, and few grids have values larger than 0.5.
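A minimal sketch of the coefficient of variation, its weekday mean and standard deviation, the average in Equation 5, and the filter on weekday means of at least 10 is given below. It assumes the population form of the standard deviation (dividing by $|G|$) and treats a missing (grid, day, hour) key as zero points. Because filtered hours leave undefined ratios, it averages the coefficient of variation only over each grid's retained hours, whereas Equation 5 divides by $|K| = 24$; this last step is an assumption. The input is a mapping keyed by (x, y, g, k), and the function names are hypothetical.

```python
import statistics

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri")  # the set G
HOURS = range(24)                               # the set K
MIN_MEAN = 10                                   # keep (grid, hour) pairs with mean >= 10


def cv_table(counts):
    """Return {(x, y): {k: cv}} for (grid, hour) pairs with weekday mean >= MIN_MEAN.

    counts maps (x, y, g, k) -> N(g, k); absent keys mean zero points.
    The standard deviation divides by |G| (population form), an assumption."""
    grids = {(x, y) for (x, y, _, _) in counts}
    table = {}
    for x, y in grids:
        for k in HOURS:
            series = [counts.get((x, y, g, k), 0) for g in WEEKDAYS]
            mu = statistics.fmean(series)
            if mu >= MIN_MEAN:
                table.setdefault((x, y), {})[k] = statistics.pstdev(series) / mu
    return table


def mean_cv(table):
    """Hourly average of cv per grid, over the retained hours only (assumption)."""
    return {cell: statistics.fmean(by_hour.values()) for cell, by_hour in table.items()}
```

Histograms of the values in `cv_table` and `mean_cv` correspond in spirit to Figures 5 and 6.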

Classification of regions
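In this subsection, we classify city regions (i.e., grids here) based on the time at which the number of origin or destination points peaks. First, we identify the peaks of a time series as follows. We employ a time window of width $\varphi$ ($\varphi$ must be odd), denoted by $w_1, w_2, \ldots, w_{\varphi}$, and move the window from the beginning to the end of the time series. The value at the window center, $\mu^{\Xi}_{x,y}(w_{\lceil \varphi/2 \rceil})$, is identified as a peak when the following two conditions are satisfied:

$$\mu^{\Xi}_{x,y}(w_{\lceil \varphi/2 \rceil}) = \max_{1 \le r \le \varphi} \mu^{\Xi}_{x,y}(w_r), \qquad \mu^{\Xi}_{x,y}(w_{\lceil \varphi/2 \rceil}) > \theta\, \overline{\mu}^{\Xi}_{x,y},$$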
where $\overline{\mu}^{\Xi}_{x,y}$ is the mean of $\mu^{\Xi}_{x,y}(k)$ over $k \in \{0, 1, \ldots, 23\}$, and $\theta > 1$ is a coefficient requiring the peak to exceed $\theta$ times the 24-hour average. Based on our observations of the data, we set $\varphi = 5$ and $\theta = 2$. Then, we define the following five types of grids according to the time period in which the peak appears.
• Type-I: No clear peak.
• Type-II.A: A morning peak appears between 6:00 and 11:00.
• Type-II.B: A noon-and-afternoon peak appears between 11:00 and 16:00.
• Type-II.C: An evening peak appears between 16:00 and 21:00.
• Type-II.D: A night peak appears between 21:00 and 5:00.

As in Section 3.3, we remove the grids in which the average number of origin or destination points over all weekdays is less than 10. The classification results are plotted in Figure 7, and we have the following observations.
[9] Most grids have no clear peak (i.e., Type-I). Considering that we have removed the grids without sufficient origin or destination points, we infer that a large proportion (more than 50%) of the grids have relatively stable demand within a weekday.
[10] Only 4%∼5% of the grids show clear noon-and-afternoon or evening peaks (Type-II.B and Type-II.C), indicating that few grids experience a sharp rise and fall in trip demand around noon and afternoon or in the evening.
[11] The second largest proportion of the grids falls into Type-II.A (29% for origins and 20% for destinations), i.e., grids with a morning peak. These grids are active in generating or receiving travel demand in the morning.
[12] There are fewer Type-II.D grids for origins than for destinations, i.e., around midnight more places become active as travel destinations than as origins. This is consistent with the fact that more trips around midnight are back-home trips.
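The peak rule and the grid types described above can be sketched as follows, with $\varphi = 5$ and $\theta = 2$ as in the text. Several details are assumptions not stated in the text: each hour index k stands for the interval [k, k+1); the window does not wrap around midnight, so hours 0, 1, 22, and 23 can never be window centers; a grid with peaks in several periods receives several Type-II labels; and a peak at 5:00, which falls in none of the listed ranges, produces no Type-II label. The function names are hypothetical.

```python
from statistics import fmean

PHI, THETA = 5, 2.0  # odd window width and peak multiplier used in the text


def find_peaks(mu, phi=PHI, theta=THETA):
    """Return hours k whose mean mu[k] is the maximum of a centred window of
    width phi and exceeds theta times the 24-hour average (no wrap-around)."""
    avg = fmean(mu)
    half = phi // 2
    peaks = []
    for c in range(half, len(mu) - half):
        window = mu[c - half:c + half + 1]
        if mu[c] == max(window) and mu[c] > theta * avg:
            peaks.append(c)
    return peaks


def classify(mu):
    """Map detected peak hours to Type-II.A to II.D; no labelled peak means Type-I."""
    types = set()
    for k in find_peaks(mu):
        if 6 <= k < 11:
            types.add("II.A")    # morning
        elif 11 <= k < 16:
            types.add("II.B")    # noon and afternoon
        elif 16 <= k < 21:
            types.add("II.C")    # evening
        elif k >= 21 or k < 5:
            types.add("II.D")    # night
    return sorted(types) or ["I"]
```

Here `mu` is the list of 24 hourly weekday means $\mu^{\Xi}_{x,y}(k)$ of one grid, computed separately for origins and destinations.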

Driver's perspective: characterization of ride-hailing drivers
In this section, we turn our lens to individual ride-hailing drivers and their multi-day activities. First, for a quick overview, we randomly select 9 drivers and plot their trips in a week in Figure 8. Temporally, some drivers work almost every day of the week (Drivers 1, 2, 4, 6, and 9), while others work only on certain days (Drivers 3 (weekends), 5, 7 (weekdays), and 8).

Spatial and temporal distributions of ride-hailing trips
To estimate the proportions of part- and full-time ride-hailing drivers, we first measure the empirical cumulative distributions of all drivers' weekly trip numbers and trip durations. Figure 9 presents the results, from which we have the following observations.
[13] A large proportion of ride-hailing drivers are part-time drivers: 59,884 (43.4%) drivers take fewer than 25 trips in a week (Figure 9(a)), and 69,138 (50.1%) drivers' total trip durations are less than 10 hours in a week (Figure 9(b)). Although there is no universal criterion for distinguishing part- and full-time ride-hailing drivers, these results suggest that at least half of the ride-hailing drivers are part-time drivers.
These observations are consistent with the findings of existing studies (e.g., Hall and Krueger, 2018). Moreover, we obtain the exact percentage based on the overall population for the first time.
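The proportions in [13] are points on empirical cumulative distributions. A minimal sketch of that computation follows, assuming one record per trip of the form (driver_id, duration in seconds) for one week; `weekly_totals` and `share_below` are hypothetical helper names.

```python
from bisect import bisect_left
from collections import defaultdict


def weekly_totals(trips):
    """Aggregate (driver_id, duration_seconds) records into weekly trip counts
    and weekly in-service hours per driver."""
    n_trips, hours = defaultdict(int), defaultdict(float)
    for driver, seconds in trips:
        n_trips[driver] += 1
        hours[driver] += seconds / 3600.0
    return dict(n_trips), dict(hours)


def share_below(values, threshold):
    """Empirical CDF just below threshold: fraction of values strictly < threshold."""
    ordered = sorted(values)
    return bisect_left(ordered, threshold) / len(ordered)


# Usage with the thresholds of [13]:
# n_trips, hours = weekly_totals(records)
# share_below(n_trips.values(), 25)  -> share of drivers with fewer than 25 trips
# share_below(hours.values(), 10)    -> share of drivers with under 10 in-service hours
```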
In addition to the empirical cumulative distributions, we calculate the durations ($T_{ij}$) and displacements ($L_{ij}$) of all trips on different days following Equation 1. The results are presented as log-log plots in Figure 10, and we have the following observations.
[15] The shapes of the distributions on different days are similar, indicating that the distributions of ride-hailing trip duration and displacement are highly repeatable across days.
[16] The durations of most trips are between 300 s and 3,000 s (i.e., between 5 min and 50 min), and the displacements of most trips are less than 20 km. These values reflect (or depend on) the size of Beijing.
[17] The tails of both the trip duration and displacement distributions follow power laws with similar exponents of -4.27 and -4.52, respectively (Table 1), indicating the existence of occasional long-distance trips. This power-law behavior differs markedly from the exponential distribution exhibited by taxi trips (Liang et al., 2012). The approximately equal exponents of the two distributions are consistent with a positive correlation between trip duration and displacement.
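The text does not say how the tail exponents in Table 1 were fitted. One common option, shown here only as an assumption, is the continuous maximum-likelihood (Hill-type) estimator $\hat{\alpha} = 1 + n / \sum_{i} \ln(x_i / x_{\min})$ for a density $p(x) \propto x^{-\alpha}$ with $x \ge x_{\min}$; under this convention a reported exponent of -4.27 corresponds to $\alpha = 4.27$. Choosing $x_{\min}$ is a separate step that is not shown, and `powerlaw_alpha` is a hypothetical name.

```python
import math
import random


def powerlaw_alpha(samples, x_min):
    """Continuous MLE of alpha for p(x) ~ x**(-alpha), x >= x_min.

    Returns (alpha_hat, n_tail); raises ValueError if the tail is degenerate."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need tail samples strictly above x_min")
    return 1.0 + len(tail) / log_sum, len(tail)


# Sanity check on synthetic Pareto data with alpha = 4.5 (inverse-transform sampling)
rng = random.Random(42)
xs = [1000.0 * (1.0 - rng.random()) ** (-1.0 / 3.5) for _ in range(20000)]
alpha_hat, n_tail = powerlaw_alpha(xs, 1000.0)  # alpha_hat should be close to 4.5
```

Compared with a least-squares fit of a log-log histogram, the maximum-likelihood estimate does not depend on the choice of bins.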

Temporal characterization
This subsection characterizes a ride-hailing driver from the temporal perspective, i.e., which time periods a driver prefers for providing ride-hailing services. First, we define a "work" of a ride-hailing driver as follows: a work consists of a series of successive trips in which the time interval between any two successive trips is short (below a threshold).
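When a driver is working, the driver provides services intensively within a time period. Let $\varepsilon$ be the threshold; we set $\varepsilon$ = 3,600 s, assuming that a working driver will take another ride-hailing order within one hour. Mathematically, driver $i$'s trips during work $r$ can be written as

$$\{TR_{ij}, TR_{i,j+1}, \ldots, TR_{i,j+m}\} \quad \text{with } t^{O}_{i,j'+1} - t^{D}_{i,j'} \le \varepsilon \ \text{for } j \le j' < j+m,$$

where the gaps before $TR_{ij}$ and after $TR_{i,j+m}$ exceed $\varepsilon$.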
where $WT_{ir}$ is the time span of driver $i$'s work $r$, and $l = 1, 2, 3, 4$ indicates the four periods. The total length of each period over the five weekdays is $|P^1_l| = 6 \times 5 = 30$ h, and that over the two weekend days is $|P^2_l| = 6 \times 2 = 12$ h. Then, we say that driver $i$ usually works in period $P^{\kappa}_l$ if $\delta_{il} > \delta^*$, where $\delta^*$ is a threshold. Figure 11 presents the resulting temporal characterization in terms of working periods with $\delta^* = 40\%$, and we have the following findings.
[20] Only for $P^{\kappa}_1$ = [0:00, 6:00] are there more working drivers on weekends than on weekdays, indicating that more drivers choose to work continuously on weekend nights to serve people's night-life travel demand.

Moreover, we characterize working-time preferences from the individual perspective, i.e., drivers' preferred working periods. Table 2 presents the results, and we have the following observations.
[21] Most drivers who work frequently do so within only one time period of the day (i.e., 6 hours; see Ranks 1 to 7, 25,884 drivers in total, i.e., 18.7%), implying that ride-hailing drivers usually do not work as long as people in fixed-time jobs (e.g., office work), although they also frequently work in a fixed time period.
[23] Similar to [20], few drivers choose to work between 0:00 and 6:00 on weekdays, while more (2,529) drivers work in the same period on weekends.
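A sketch of the work segmentation with $\varepsilon$ = 3,600 s and of the period shares follows. The definition of $\delta_{il}$ is not reproduced in this section, so the code assumes it is the number of hours of period $P^{\kappa}_l$ covered by the driver's work spans $WT_{ir}$ over the week, divided by $|P^{\kappa}_l|$ (30 h on weekdays, 12 h on weekends). It also assumes the four periods are [0:00, 6:00), [6:00, 12:00), [12:00, 18:00), and [18:00, 24:00), with κ = 1 for weekdays and κ = 2 for weekends. Function names are hypothetical.

```python
from datetime import datetime, timedelta

EPSILON = timedelta(seconds=3600)  # max idle gap between successive trips in one work
PERIOD_HOURS = {1: 30.0, 2: 12.0}   # |P_l^1| (5 weekdays x 6 h), |P_l^2| (2 days x 6 h)


def split_works(trips, eps=EPSILON):
    """Group one driver's (pickup, dropoff) datetimes into work spans (start, end)."""
    works = []
    for start, end in sorted(trips):
        if works and start - works[-1][1] <= eps:
            works[-1] = (works[-1][0], max(works[-1][1], end))  # extend current work
        else:
            works.append((start, end))                          # open a new work
    return works


def period_shares(works):
    """delta_il for kappa in {1 (weekday), 2 (weekend)} and l in {1, 2, 3, 4}."""
    covered = {(kappa, l): 0.0 for kappa in (1, 2) for l in (1, 2, 3, 4)}
    for start, end in works:
        day = datetime(start.year, start.month, start.day)
        while day < end:  # a work may cross midnight
            kappa = 1 if day.weekday() < 5 else 2
            for l in (1, 2, 3, 4):
                p0 = day + timedelta(hours=6 * (l - 1))
                p1 = p0 + timedelta(hours=6)
                overlap = (min(end, p1) - max(start, p0)).total_seconds()
                if overlap > 0:
                    covered[(kappa, l)] += overlap / 3600.0
            day += timedelta(days=1)
    return {key: hours / PERIOD_HOURS[key[0]] for key, hours in covered.items()}


def usual_periods(shares, delta_star=0.40):
    """Periods in which the driver 'usually works': delta_il > delta*."""
    return sorted(key for key, share in shares.items() if share > delta_star)
```

With $\delta^*$ = 40%, a driver needs more than 12 h of work in a weekday period across the five weekdays, or more than 4.8 h in a weekend period across the two weekend days, to be counted.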

Spatial characterization
This subsection characterizes a driver from the perspective of his/her activity space. To that end, we employ hierarchical clustering to group all origin and destination points of a driver's trips according to their spatial positions.
Hierarchical clustering is a cluster analysis method that first builds a cluster tree (a dendrogram) to represent the data and then cuts the tree to group the data. When building the cluster tree here, the input is all origin and destination points of a driver in the week, and the distance between two points is measured on the Earth's surface. The linkage criterion between point sets (i.e., the cluster distance) is the shortest distance between points in the two sets, i.e., single linkage. In cutting the tree, a threshold $\beta$ is employed, so that the distance between any two final clusters is larger than $\beta$. Then, we enclose the maximum cluster and calculate its area using the Delaunay triangulation.
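A minimal, dependency-free sketch of this step is given below. It relies on the fact that cutting a single-linkage dendrogram at height $\beta$ yields the connected components of the graph that links every pair of points at most $\beta$ apart, so a union-find over pairwise haversine distances replaces an explicit dendrogram (quadratic in the number of points, which is acceptable for one driver's weekly points). Because the Delaunay triangulation of a point set covers exactly its convex hull, the total triangle area equals the hull area, so the sketch computes the hull area directly after a local equirectangular projection to kilometers. The projection, "maximum cluster" meaning the cluster with the most points, and all function names are assumptions.

```python
import math

R_KM = 6371.0088  # mean Earth radius (spherical assumption)


def haversine_km(p, q):
    """Great-circle distance in km between (lon, lat) points given in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R_KM * math.asin(math.sqrt(a))


def single_linkage_clusters(points, beta_km):
    """Flat clusters of a single-linkage tree cut at beta (via union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= beta_km:
                parent[find(i)] = find(j)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def hull_area_km2(points):
    """Convex-hull area in km^2 (= total Delaunay-triangle area) via a local projection."""
    lat0 = math.radians(sum(lat for _, lat in points) / len(points))
    xy = sorted({(R_KM * math.radians(lon) * math.cos(lat0), R_KM * math.radians(lat))
                 for lon, lat in points})
    if len(xy) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in xy:  # Andrew's monotone chain
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(xy):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    twice_area = sum(hull[i - 1][0] * hull[i][1] - hull[i][0] * hull[i - 1][1]
                     for i in range(len(hull)))  # shoelace formula
    return abs(twice_area) / 2.0


def max_cluster_area(points, beta_km=5.0):
    """Largest cluster (by number of points) and its enclosed area in km^2."""
    largest = max(single_linkage_clusters(points, beta_km), key=len)
    return largest, hull_area_km2(largest)
```

With SciPy available, `scipy.cluster.hierarchy.linkage(..., method="single")` followed by `fcluster(..., criterion="distance")` would give the same flat clusters from an explicit dendrogram.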
To visualize ride-hailing drivers' activity spaces directly, we first plot the working spaces of 9 randomly selected drivers in Figure 12. Some drivers provide services over a large spatial range, even covering the whole central area of the city (such as Drivers 10, 11, 12, 16, 17, and 18), while others work only in a small area (such as Drivers 13, 14, and 15).
Figure 12: Clustering of the spatial distribution of the origin and destination points of 9 randomly selected drivers' trips in a week (β = 5 km). Grey polygon: the maximum cluster.

Naturally, we are interested in the impact of the cluster distance on the clustering. Under single-linkage hierarchical clustering, every data point in a cluster lies within the cluster distance of at least one other point in the same cluster, and thus the cluster distance roughly indicates the density of a driver's activities in a cluster, i.e., the preference for accepting long- or short-distance trips. The larger the cluster distance is, the lower the activity density in a cluster will be, i.e., the more the driver prefers short-distance trip demand. Focusing on the maximum cluster, we calculate the relationship between the cluster distance and the percentage of drivers with at least γ% of all their origin and destination points within the maximum cluster. The results are presented in Figure 13, and we have the following findings.
[24] A cluster distance of 10 km makes the maximum cluster contain most of a driver's activities (in Figure 13(a), the percentage of drivers is 85% when γ% = 90% and 96% when γ% = 60%), implying that most drivers are unwilling to go to places more than 10 km away from their regular cruising region (i.e., the maximum cluster).
[25] Regarding involving more drivers working in their maximum cluster (i.e., moving up along the y-axis of Figure 13 […]): after the spatially intensely-working drivers are quickly involved, the pace slows for the spatially loosely-working drivers. In particular, the peak of the marginal increase (Figure 13(b)) for γ% = 90% occurs at a cluster distance of 5 km, which is greater than the cluster distances at the peaks for smaller γ.
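The curves in Figure 13 can be outlined as follows. The sketch assumes each driver's points are already projected to planar kilometer coordinates and uses the equivalence between cutting a single-linkage tree at $\beta$ and taking connected components of points within distance $\beta$. For each cluster distance it reports the percentage of drivers with at least a fraction gamma (γ% expressed as a fraction) of their points in the maximum cluster, plus the increment between successive cluster distances (cf. Figure 13(b)). All names are hypothetical.

```python
import math


def max_cluster_share(xy, beta):
    """Fraction of a driver's points in the largest single-linkage cluster cut at beta.

    xy: list of (x_km, y_km) planar coordinates (projection assumed done beforehand)."""
    parent = list(range(len(xy)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(xy)):
        for j in range(i + 1, len(xy)):
            if math.dist(xy[i], xy[j]) <= beta:
                parent[find(i)] = find(j)
    sizes = {}
    for i in range(len(xy)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(xy)


def coverage_curve(drivers, betas, gamma):
    """Percentage of drivers with at least a fraction gamma of points in the max cluster."""
    curve = []
    for beta in betas:
        hits = sum(max_cluster_share(xy, beta) >= gamma for xy in drivers)
        curve.append(100.0 * hits / len(drivers))
    return curve


def marginal_increase(curve):
    """Change in percentage between successive cluster distances."""
    return [b - a for a, b in zip(curve, curve[1:])]
```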
Next, we examine the distribution of the areas of ride-hailing drivers' daily activity spaces, presented in Figure 14. Let $A_2$, $A_3$, $A_4$, and $A_5$ be the areas inside Rings 2 to 5 of Beijing, with $A_2 = 62$ km², $A_3 = 159$ km², $A_4 = 302$ km², and $A_5 = 667$ km². Taking these areas as references, we make the following observations.
[26] The percentages of drivers whose activity-space areas are smaller than $A_2$, between $A_2$ and $A_3$, between $A_3$ and $A_4$, between $A_4$ and $A_5$, and larger than $A_5$ are 19%, 16%, 15%, 28%, and 8%, respectively.
[27] Among these groups, the largest (28%) consists of drivers who prefer to work in spaces with areas between $A_4$ and $A_5$. Such activity spaces are approximately as large as the central area of Beijing, since the area inside Ring 5 is commonly regarded as the city's central area.
[28] The second largest group (19%) consists of drivers who prefer to work in relatively small spaces, with areas smaller than $A_2$. The smallest group (8%) consists of drivers who usually work in rather large spaces, with areas larger than $A_5$.
Figure 13: Percentage of drivers whose γ% of all activity data (origin and destination points) lie in the maximum cluster, for γ% = 60%, 70%, 80%, and 90%; (b) marginal increase of the percentage.
Figure 14: Distribution of ride-hailing drivers' largest activity-space areas (cluster distance set to 5 km). Regions smaller than 5 km² are excluded from the statistics, since a very small area may result from only a few ride-hailing activities.
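The ring-based grouping in findings [26] to [28] can be sketched as below. This section does not state how an activity-space area is measured, so the sketch assumes, hypothetically, that it is the area of the convex hull of the points in a driver's maximum cluster (computed with Andrew's monotone chain and the shoelace formula on planar km coordinates). The thresholds $A_2$ to $A_5$ and the 5 km² cut-off come from the text; the function names and inputs are illustrative, and the percentages are computed over the drivers that remain after the cut-off.

```python
# A2..A5: areas inside Rings 2 to 5 of Beijing, as given in the text (km²).
RING_AREAS_KM2 = (62.0, 159.0, 302.0, 667.0)
BIN_LABELS = ("< A2", "A2-A3", "A3-A4", "A4-A5", "> A5")
MIN_AREA_KM2 = 5.0  # smaller regions are excluded, as in Figure 14

Point = tuple[float, float]  # hypothetical planar (x, y) coordinates in km


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list[Point] = []
    upper: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: list[Point]) -> float:
    """Shoelace formula; hulls with fewer than 3 vertices have zero area."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def ring_bin_percentages(areas_km2: list[float]) -> dict[str, float]:
    """Share of drivers (in %) per ring-based area bin, after dropping tiny regions."""
    kept = [a for a in areas_km2 if a >= MIN_AREA_KM2]
    counts = [0] * len(BIN_LABELS)
    for a in kept:
        # The number of ring thresholds an area reaches is the index of its bin.
        counts[sum(a >= bound for bound in RING_AREAS_KM2)] += 1
    return {label: 100.0 * c / len(kept) for label, c in zip(BIN_LABELS, counts)}


square = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)]  # interior point is ignored
print(polygon_area(convex_hull(square)))  # 100.0, i.e., between A2 and A3
print(ring_bin_percentages([3, 40, 100, 200, 400, 700]))  # 3 km² is dropped
```

An area exactly equal to a threshold is placed in the higher bin here; the paper does not say how boundary cases were handled.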

Correlation between temporal and spatial characterizations
This subsection combines the temporal and spatial characterizations to examine whether they are correlated. To that end, we jointly plot drivers' total working time in a week against the areas of their activity spaces in Figure 15. The observations are as follows.
[29] The correlation plot splits into two branches when the working time exceeds 22 h (A and B in Figure 15), indicating two categories of ride-hailing drivers. Most drivers belong to Category A, whose activity space expands as working time grows. A minority (Category B), whose activity spaces do not expand as working time grows, is observed here for the first time. Unlike those in Category A, drivers in Category B prefer to work only within a small space.
[30] The correlation between working time and activity space is approximately linear and positive, i.e., longer working time is accompanied by a larger activity space. This is particularly evident for Category A.
This observation confirms that ride-hailing drivers exercise selection and have preferences in providing service, as discussed in Section 2. This is a major difference between ride-hailing and taxi drivers.³
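The paper reads findings [29] and [30] off the scatter plot in Figure 15. One way to quantify "approximately linear and positive" is the Pearson correlation coefficient together with a least-squares slope; the sketch below is an assumption on our part, not the authors' method, and uses synthetic numbers. It also shows why separating the two branches matters: adding a small, flat Category B branch to a perfectly linear Category A lowers the pooled correlation sharply.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of y on x (here: km² of activity space per weekly working hour)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic drivers: weekly working hours and activity-space areas (km²).
hours_a, area_a = [10, 20, 30, 40], [50, 100, 150, 200]  # Category A-like branch
hours_b, area_b = [30, 40], [20, 20]                      # Category B-like branch

r_a = pearson_r(hours_a, area_a)
r_all = pearson_r(hours_a + hours_b, area_a + area_b)
print(r_a, ols_slope(hours_a, area_a), r_all)
```

With real data, the branch assignment itself would need a rule (for example, a cut on area per working hour above 22 h), which the paper does not specify.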

Discussion and conclusion
Using multi-day ride-hailing driver activity data covering an entire city, this paper characterizes ride-hailing activities from the regional and the driver's perspectives. Findings [1] to [30] in Sections 3 and 4 are summarized as follows.
Spatiotemporal flowing of ride-hailing trips: A regional perspective
Only for several intervals (usually during the morning and evening peaks) is the repeatability low, as indicated by coefficients of variation between 0.3 and 0.4. On average, the coefficients of variation are between 0.2 and 0.4. These results imply that, given the high repeatability of ride-hailing demand, a newly proposed prediction method can be regarded as truly effective only when its prediction accuracy exceeds the accuracy implied by the demand's natural repeatability.
It is found that most drivers in Beijing are unwilling to travel to places more than 10 km away from their regularly-cruising region. 28% of ride-hailing drivers provide ride-hailing services in a city-wide space, 19% of all drivers are active in a relatively small space, and only 8% of all drivers provide service in a space larger than the central area of the city.
• Observations [29]-[30]. Combining the temporal and spatial characterizations of ride-hailing drivers reveals two categories of drivers. One (the majority) consists of drivers whose activity spaces are positively and approximately linearly correlated with working time; the other consists of drivers who prefer to work only within a limited space. This observation confirms that ride-hailing drivers exercise selection and have preferences in providing service, which is a major difference between ride-hailing and taxi drivers.
From these findings, we can see that ride-hailing mobility has its own characteristics, such as the shrinking and expanding processes, the power-law distributions of trip duration and displacement, and the two categories of drivers. Many of them differ considerably from what is known about other travel modes such as taxis (Liang et al., 2012; Cai et al., 2016; He et al., 2017; Dong et al., 2018; He et al., 2019, 2020). Understanding these characteristics benefits not only TNCs but also traffic managers. For example, TNCs could design more targeted dynamic incentive systems to adjust the supply of ride-hailing vehicles. Traffic managers could consider treating full-time ride-hailing drivers as professional drivers and regulating them as taxi drivers are regulated. The debate over whether TNCs increase traffic congestion could also be addressed at a finer spatiotemporal resolution and with higher precision. For researchers, the region-dependent repeatability provides a baseline for ride-hailing demand prediction: a capable method should yield prediction errors smaller than the fluctuation of the demand itself (Ke et al., 2017; Zhang et al., 2020b). The ride-hailing drivers' limited selection behavior within a small spatiotemporal scope also deserves comprehensive analysis and modeling.
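The repeatability baseline mentioned above can be made concrete with a small sketch. It assumes, as our own choice, that the coefficient of variation of one grid cell and time interval is the population standard deviation of its daily demand divided by the mean, and that prediction error is measured as RMSE relative to mean demand. Under those assumptions, always predicting the multi-day mean has a relative RMSE exactly equal to the CV, so a method that is genuinely useful should do better than the CV. The demand counts and the "model" predictions are synthetic, and the comparison is in-sample; a fair evaluation would use held-out days.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(counts: list[float]) -> float:
    """CV of demand in one cell and time interval across days (population std / mean)."""
    return pstdev(counts) / mean(counts)


def relative_rmse(actual: list[float], predicted: list[float]) -> float:
    """Root-mean-square error divided by the mean of the actual values."""
    mse = mean((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(mse) / mean(actual)


demand = [100, 110, 90, 100]  # synthetic requests in one cell/interval on four days
cv = coefficient_of_variation(demand)
baseline = relative_rmse(demand, [mean(demand)] * len(demand))  # repeatability baseline
model = relative_rmse(demand, [101, 108, 92, 99])               # hypothetical predictor
print(cv, baseline, model)
```

If a sample standard deviation were used for the CV instead, the equality with the mean predictor's relative RMSE would hold only approximately.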
Like most existing studies,⁴ this paper focuses only on ride-hailing activities in Beijing, China. Nevertheless, we have tried to avoid relying on specific locations, and we believe the above findings not only directly benefit transportation researchers and managers in Beijing but also contribute to the general understanding of ride-hailing activities.
The grid size used for the regional analysis in Section 3 is 1 km. Different sizes would change the specific numerical results, but we do not expect them to radically alter the observed patterns, because the analysis is conducted from a macroscopic perspective (He, 2020). The choice of grid size relates to the modifiable areal unit problem, and any chosen size causes some information loss and result bias (Fotheringham and Wong, 1991; Clark and Scott, 2014; Nelson and Brewer, 2017; Zhou and Yeh, 2020). Therefore, it would be worthwhile to evaluate the sensitivity of the analysis results to the grid size.
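A grid-size sensitivity check of the kind suggested above could start from something as simple as the following sketch. It assumes trip points are already projected to planar coordinates in km with the grid anchored at the origin (the grid origin is itself a source of areal-unit effects), and it reports, for each cell size, the number of occupied cells and the largest single-cell share of points. The function names, the chosen summary statistics, and the points are all hypothetical.

```python
import math
from collections import Counter


def grid_counts(points: list[tuple[float, float]], cell_km: float) -> Counter:
    """Count points per square grid cell of side cell_km, keyed by (column, row)."""
    return Counter((math.floor(x / cell_km), math.floor(y / cell_km)) for x, y in points)


def grid_summary(points: list[tuple[float, float]], sizes: list[float]) -> dict[float, tuple[int, float]]:
    """For each cell size: (number of occupied cells, largest single-cell share)."""
    out = {}
    for s in sizes:
        counts = grid_counts(points, s)
        out[s] = (len(counts), max(counts.values()) / len(points))
    return out


# Synthetic trip origins in km (not data from the paper).
points = [(0.2, 0.2), (0.7, 0.3), (1.4, 0.1), (1.6, 1.8), (3.5, 3.5)]
for size, (occupied, top_share) in grid_summary(points, [0.5, 1.0, 2.0]).items():
    print(f"{size} km cells: {occupied} occupied, top cell holds {top_share:.0%} of points")
```

The same loop could wrap any of the regional statistics, such as per-cell coefficients of variation, to see how strongly a pattern depends on the 1 km choice.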
Thanks to the uniqueness of the data used here, we obtained many new observations using only simple and direct analysis methods. In the future, it would be meaningful to mine the data with more advanced (e.g., machine learning) methods to understand the latent factors dominating ride-hailing mobility. In particular, theoretical modeling of ride-hailing mobility and drivers' choices is important. Combining more data sources, such as point-of-interest, traffic flow, and built-environment data, will also enrich our understanding of ride-hailing and its associated factors. Another research direction is to compare ride-hailing activities across cities or with other travel modes. Although the data requirements are considerably higher, such comparative studies across regions and modes are significant for both practice and theory.