Empirical study of long-range connections in a road network offers new ingredient for navigation optimization models

The navigation problem in lattices with long-range connections has been widely studied to understand the design principles for optimal transport networks; however, the travel cost of long-range connections was not considered in previous models. We define a long-range connection in a road network as the shortest path between a pair of nodes through highways and empirically analyze the travel cost properties of long-range connections. Based on the maximum speed allowed in each road segment, we observe that the time needed to travel through a long-range connection has a characteristic time $T_h \sim 29$ min, while the time required when using the alternative arterial road path has two different characteristic times $T_a \sim 13$ and 41 min and follows a power law for times larger than 50 min. Using daily commuting origin–destination matrix data, we additionally find that the use of long-range connections helps people to save about half of the travel time in their daily commute. Based on the empirical results, we assign a more realistic travel cost to long-range connections in two-dimensional square lattices, observing dramatically different minimum average shortest path $\langle l \rangle$ but similar optimal navigation conditions.

The Bay Area road network is provided by NAVTEQ, a commercial provider of geographical information systems data. The data encapsulate the attributes of roads, such as length and speed limit. In this road network, each link represents a road segment (24 408 in total) and each node represents an intersection (11 309 in total). To gain a preliminary understanding of the network properties, we first measure the length $l$ and the free travel time $t$ (length divided by speed limit) of each road segment. We observe that most road segments are densely located in the cities and have a small length $l$, while a few long road segments are sparsely distributed in rural areas and have a length $l > 10$ miles (figures 1(d) and (e)). The longest arterial road segment and the longest highway road segment are roughly 15 miles and 6 miles, respectively. However, given the arterial roads' lower speed limit, the maximal free travel time of the arterial road segments is over 35 min, which is four times larger than that of highway road segments (figures 1(f) and (g)). The distributions of length $l$ and free travel time $t$ are also plotted in log-log graphs (insets of figures 1(d)-(g)). The length $l$ and free travel time $t$ of road segments follow power-law distributions over wide ranges, showing topological features similar to those of many practical networks, from the airline transportation network [21] to human communication networks [22].
The long-range connection in the road network is not as obvious as that in a square lattice. Some highway road segments do not share road intersections with arterial road segments and thus fail to define shortcuts. We explore the long-range connections in a road network by first finding connecting nodes, which are the intersections connecting both arterial roads and highways. We then define a long-range connection as the shortest path (measured in travel time) in the highway layer between a pair of connecting nodes. Similarly, we define the long-range connection's alternative arterial road path as the shortest path between the same pair of connecting nodes through the arterial layer. The shortest paths are calculated by the Dijkstra algorithm [23]. The times needed to travel through a long-range connection and its alternative arterial road path are denoted as $T_h$ and $T_a$, where $T_a$ is a measurement similar to the Manhattan distance $r_{ij}$ in a two-dimensional square lattice [10]. A long-range connection or its alternative arterial road path consists of one or several road segments of the same kind. As shown in figure 2(a), intersections A and B are two connecting nodes that connect both arterial roads and highways; the long-range connection from A to B is highlighted by the thick purple line (highway road segments h1, h2, h3, h4, h5) and its alternative arterial road path is highlighted by the thick blue line (arterial road segments a1, a2, a3, a4).
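The definition above can be illustrated with a minimal sketch. It assumes a toy network with undirected segments and made-up travel times. The node names, the `segments` list and the helper functions are hypothetical and are not taken from the NAVTEQ data. The sketch finds the connecting nodes, then runs Dijkstra separately on the highway layer and on the arterial layer to obtain $T_h$ and $T_a$ for each pair.

```python
import heapq

def dijkstra(adj, source):
    """Shortest travel times from source over an adjacency dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def build_layer(segments, kind):
    """Undirected adjacency list restricted to one road kind (assumption)."""
    adj = {}
    for u, v, minutes, k in segments:
        if k == kind:
            adj.setdefault(u, []).append((v, minutes))
            adj.setdefault(v, []).append((u, minutes))
    return adj

# Hypothetical toy network: (node, node, free travel time in min, kind)
segments = [
    ("A", "H1", 4.0, "highway"), ("H1", "B", 5.0, "highway"),
    ("A", "X", 6.0, "arterial"), ("X", "Y", 7.0, "arterial"),
    ("Y", "B", 8.0, "arterial"),
]
highway = build_layer(segments, "highway")
arterial = build_layer(segments, "arterial")

# Connecting nodes touch both a highway and an arterial segment.
connecting = sorted(set(highway) & set(arterial))

pairs = {}
for i, a in enumerate(connecting):
    t_h = dijkstra(highway, a)
    t_a = dijkstra(arterial, a)
    for b in connecting[i + 1:]:
        if b in t_h:  # a long-range connection exists between a and b
            # T_a is None when the highway path is the only path
            pairs[(a, b)] = (t_h[b], t_a.get(b))
```

In this toy case A and B are the only connecting nodes, and the pair (A, B) gets $T_h = 9$ min and $T_a = 21$ min. A real road network would likely need directed segments for one-way roads.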
Measuring the travel times $T_h$ and $T_a$ between each pair of connecting nodes, we find that 92% of the long-range connections have alternative arterial road paths (the remaining 8% serve as the only path). An important distinction is that in previous works on a square lattice, all long-range connections have the same travel cost regardless of the Manhattan distance $r_{ij}$ between their two endpoints [9][10][11][12][13][14]. However, in the studied road network the average travel times $T_h$ and $T_a$ are 31.36 and 54.15 min, respectively, implying that on average the time cost of using a long-range connection is about 58% of the cost of using its alternative arterial road path. Interestingly, not all long-range connections have shorter travel times than their alternative arterial road paths ($T_h < T_a$); we observe that 16% of the long-range connections have $T_h > T_a$. This could result from the highways' limited spatial coverage (see figure 1(b)), which generates time-consuming detours (figure 2(b)).
We next analyze the probability density functions (PDFs) of $T_h$ and $T_a$. As figures 2(c) and (d) show, the travel time $T_h$ follows a Gaussian distribution (fit 1) with a characteristic time $T_h \sim 29$ min, while the travel time $T_a$ has two different characteristic times $T_a \sim 13$ min and 41 min and can be approximated by two different fitting functions for large and small $T_a$ (dashed lines are plotted to guide the eye).
[Figure 2. (a) Orange links and gray links represent highway road segments and arterial road segments, respectively. The purple line (formed by highway road segments h1, h2, h3, h4, h5) is a long-range connection as defined in this paper. The blue line is the alternative arterial road path (formed by arterial road segments a1, a2, a3, a4) between A and B. (b) The times needed to travel through a long-range connection and its alternative arterial road path are denoted as $T_h$ and $T_a$; for 84% of the long-range connections, $T_h < T_a$. (c), (d) The PDFs of $T_h$ and $T_a$.]
From these empirical results, we first conclude that using $T_h$ to quantify a long-range connection's travel cost is more realistic than assuming a unit travel cost for all shortcuts. Next, we find that the distribution of the travel time $T_a$ decays much more slowly than fit 2 predicts, which could be caused by the time-consuming detours in the alternative arterial road paths.

The usage patterns of the long-range connections
To quantify the effect of long-range connections on actual road usage, we use the Bay Area daily home-work commuting origin-destination (OD) data. The OD data are provided by the US Census Bureau (see www.census.gov/geo/www/tiger/tgrshp2010/tgrshp2010.html) and record the number of trips from residents' home locations to work locations at a street-block level. This highly refined spatial resolution creates too many zones, so we group street blocks into the census tracts (1398 in total) in which they are located and generate the OD at census tract resolution. As figure 3(a) shows, the number of trips $n$ between an OD pair follows a power-law distribution $P(n) \sim n^{-2.88}$, implying that trips are heterogeneously distributed between origins and destinations.
In daily commuting, people use different transportation modes, which include car (drive alone), carpool, public transportation, bicycle and walking. Based on the mode split data [24], we calculate the vehicle using rate (VUR) in a census tract as follows: $\mathrm{VUR}(i) = P_{\text{drive alone}}(i) + P_{\text{carpool}}(i)/S$, where $P_{\text{drive alone}}(i)$ and $P_{\text{carpool}}(i)$ are the probabilities that residents in census tract $i$ drive alone or share a car (the average carpool size is $S = 2.25$ in California). We randomly assign the transportation mode (vehicle or non-vehicle) to the residents living in each census tract according to the calculated VUR. We then filter out the trips that are not made using vehicles.
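As a rough illustration of the VUR calculation and the random mode assignment, the sketch below keeps each trip with probability equal to the VUR of its home (origin) tract. The mode-split shares, tract names and the per-trip Bernoulli draw are assumptions made for the example. The source specifies only that modes are assigned randomly according to the VUR.

```python
import random

CARPOOL_SIZE = 2.25  # average carpool size S quoted in the text

def vehicle_using_rate(p_drive_alone, p_carpool, s=CARPOOL_SIZE):
    """VUR(i) = P_drive_alone(i) + P_carpool(i) / S."""
    return p_drive_alone + p_carpool / s

def filter_vehicle_trips(trips, mode_split, rng):
    """Keep each trip with probability VUR of its origin (home) tract."""
    kept = []
    for origin, destination in trips:
        p_alone, p_pool = mode_split[origin]
        if rng.random() < vehicle_using_rate(p_alone, p_pool):
            kept.append((origin, destination))
    return kept

# Hypothetical mode-split shares: tract -> (P_drive_alone, P_carpool)
mode_split = {"tract_1": (0.70, 0.09), "tract_2": (0.0, 0.0)}
trips = [("tract_1", "tract_2")] * 1000 + [("tract_2", "tract_1")] * 1000

vehicle_trips = filter_vehicle_trips(trips, mode_split, random.Random(0))
```

For the hypothetical `tract_1`, VUR = 0.70 + 0.09/2.25 = 0.74, so roughly three quarters of its trips are kept. No trip from `tract_2` survives because its VUR is zero.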
To assign trips to the road network, we map each OD pair from the census tract-based OD to the intersection-based OD. We find the road intersections within a census tract and randomly select one intersection to be the origin or destination in the intersection-based OD. When no intersection is found in a census tract, we assign a trip's origin or destination to a randomly chosen intersection in the nearest neighboring census tract. With the intersection-based OD calculated, we use the Dijkstra algorithm [23] to find the path with the shortest travel time $T(\mathrm{all})$ between the origin and destination of each trip and calculate the traffic flow in each road segment. In figures 1(a) and 3(b), we show the estimated traffic flow $V$, which follows a power-law distribution $P(V) \sim V^{-1.48}$.
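The assignment procedure can be sketched as follows. This is an illustrative toy example, not the authors' implementation. The adjacency list, the `tract_nodes` and `nearest_tract` tables and the choice to draw a new intersection for every individual trip are all assumptions. Each trip is routed along its shortest-travel-time path and adds one unit of flow to every segment on that path.

```python
import heapq
import random
from collections import Counter

def shortest_path(adj, source, target):
    """Dijkstra with predecessors; returns (time, node list) or None."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical inputs: travel times in minutes, tracts and their intersections
adj = {
    1: [(2, 2.0), (3, 5.0)],
    2: [(1, 2.0), (3, 2.0)],
    3: [(1, 5.0), (2, 2.0)],
}
tract_nodes = {"T1": [1], "T2": [3], "T3": []}  # T3 has no intersection
nearest_tract = {"T3": "T2"}                    # fallback for empty tracts
tract_od = [("T1", "T2", 10), ("T3", "T1", 4)]  # (origin, destination, trips)

rng = random.Random(0)

def pick_intersection(tract):
    """Random intersection in the tract, or in its nearest neighbor if empty."""
    nodes = tract_nodes[tract] or tract_nodes[nearest_tract[tract]]
    return rng.choice(nodes)

flow = Counter()
for o_tract, d_tract, n_trips in tract_od:
    for _ in range(n_trips):
        result = shortest_path(adj, pick_intersection(o_tract),
                               pick_intersection(d_tract))
        if result is None:
            continue  # no route between the chosen intersections
        _, path = result
        for u, v in zip(path, path[1:]):
            flow[frozenset((u, v))] += 1  # undirected segment flow
```

In this toy network every trip uses the two-segment route through node 2 (4 min) instead of the direct 5 min segment, so segments {1, 2} and {2, 3} each carry 14 trips and segment {1, 3} carries none.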
To better understand the function of the long-range connections in people's daily commute, we try to find the shortest path in the arterial layer for each OD pair and compare its travel time $T(\mathrm{arterial})$ with the shortest travel time using the whole network, $T(\mathrm{all})$. For 51% of the trips, we fail to find paths composed only of arterial roads, indicating the vital role that long-range connections play in people's daily commute. For the other 49% of the trips, paths in the arterial layer exist and the ratio of $T(\mathrm{all})$ to $T(\mathrm{arterial})$ is found to peak at 0.5, suggesting that the use of long-range connections can help people save about half of their travel time in the daily commute (figure 3(c)). For the shortest path of each trip, we further analyze the fraction of highway use measured in length and in travel time. As figure 4(a) shows, for 16% of the trips, people use arterial roads only. We also observe that a driver is unlikely to use arterial roads intensively and highways only occasionally in a single trip. In other words, a driver who uses highways normally uses them to complete a large fraction of the trip. For the highway use measured in travel time, we obtain similar results (figure 4(b)). As figures 4(c) and (d) illustrate, the fraction of highway use increases sharply with travel length (travel time) when the trip distance is small and gradually saturates to a value near one as the trip distance keeps increasing. The average fraction of highway use has already reached 65% when the travel distance is only 5 miles; note that highways represent only 25% of the road segments in the Bay Area road network. This indicates that the paths of moderate- and long-distance trips are dominated by highways, while arterial roads are heavily used in very short trips. This result is consistent with the usage patterns of the infinite incipient percolation cluster (superhighways) in Erdős-Rényi networks, scale-free networks and square lattices [5].
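The two quantities discussed above, the fraction of highway use and the ratio of $T(\mathrm{all})$ to $T(\mathrm{arterial})$, can be computed per trip as in the sketch below. The path, its segment attributes and the arterial-only travel time are hypothetical values chosen for illustration.

```python
def highway_fraction(path_segments):
    """Return (fraction by length, fraction by travel time) of highway use."""
    total_len = sum(s["miles"] for s in path_segments)
    total_time = sum(s["minutes"] for s in path_segments)
    hw_len = sum(s["miles"] for s in path_segments if s["kind"] == "highway")
    hw_time = sum(s["minutes"] for s in path_segments if s["kind"] == "highway")
    return hw_len / total_len, hw_time / total_time

# Hypothetical shortest path of one trip over the whole network
path = [
    {"miles": 1.0, "minutes": 2.0, "kind": "arterial"},
    {"miles": 6.0, "minutes": 6.0, "kind": "highway"},
    {"miles": 1.0, "minutes": 2.0, "kind": "arterial"},
]
frac_len, frac_time = highway_fraction(path)

t_all = sum(s["minutes"] for s in path)  # shortest time on the whole network
t_arterial = 20.0                        # hypothetical arterial-only shortest time
ratio = t_all / t_arterial               # T(all) / T(arterial)
```

For this made-up trip, highways account for 75% of the length but only 60% of the travel time, and the ratio of 0.5 corresponds to saving half of the arterial-only travel time.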

Optimal navigation condition using more realistic travel cost information
In former models dedicated to the navigation problem in lattices, the travel cost of a long-range connection is equal to one regardless of the spatial locations of the nodes it connects, which greatly overestimates the shortcuts' ability to reduce travel length (cost). Yet in the studied road network the ratio of the travel times using highways and arterial roads peaks at $T_h/T_a \sim 0.5$, indicating that a long-range connection typically saves about half of the travel time compared to its alternative arterial path (figure 2(b)). Indeed, the long-range connections that connect distant nodes in many transport networks are not as 'short' as previously modeled. It is therefore necessary to explore the optimal navigation conditions and calculate the average path length under more realistic travel cost scenarios. We generate a regular two-dimensional square lattice with $N = 1\,000\,000$ nodes. Pairs of nodes $i$ and $j$ are then randomly selected to receive long-range connections with probability proportional to $r_{ij}^{-\alpha}$ (figure 5(a)), where $r_{ij}$ is the Manhattan distance between the two nodes and $\alpha$ is the variable exponent controlling the number and the length of long-range connections. The addition of long-range connections stops when the total length (cost) $\sum r_{ij}$ reaches $N$. Unlike in the model presented in [10], each long-range connection in our model is assigned a travel cost. We make the reasonable assumption that the travel length (cost) $l$ of a long-range connection scales linearly with the Manhattan distance between the two nodes it connects, which is denoted by $l = \beta r_{ij}$. In a road network the scaling exponent $\beta$ quantifies the fraction of travel time saved by using highways. As illustrated in figure 5(a), when the Manhattan distance between nodes $i$ and $j$ is six and the scaling exponent is $\beta = 0.5$, the travel cost of the shortcut is three.
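A minimal sketch of the shortcut-generation step is given below for a small lattice. Several details are assumptions not stated in the source: open (non-periodic) boundaries, exclusion of nearest-neighbor pairs, no duplicate shortcuts, and stopping at the first draw that would push the total length beyond $N$. Enumerating all node pairs is feasible only for small lattices. A lattice with $N = 1\,000\,000$ nodes would need a more efficient sampling scheme.

```python
import random
from itertools import combinations

def manhattan(a, b):
    """Manhattan distance r_ij between lattice nodes a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def add_shortcuts(side, alpha, beta, rng):
    """Return ({(i, j): cost}, total length used) on a side x side lattice."""
    nodes = [(x, y) for x in range(side) for y in range(side)]
    budget = side * side  # total shortcut length is limited to N
    # Candidate pairs exclude nearest neighbors (r = 1), already lattice links.
    pairs = [(a, b) for a, b in combinations(nodes, 2) if manhattan(a, b) > 1]
    weights = [manhattan(a, b) ** (-alpha) for a, b in pairs]  # P ~ r^(-alpha)
    shortcuts, used = {}, 0
    while True:
        a, b = rng.choices(pairs, weights=weights, k=1)[0]
        r = manhattan(a, b)
        if used + r > budget:
            break  # assumption: stop at the first draw exceeding the budget
        if (a, b) not in shortcuts:
            shortcuts[(a, b)] = beta * r  # travel cost l = beta * r_ij
            used += r
    return shortcuts, used

shortcuts, used = add_shortcuts(side=8, alpha=3.0, beta=0.5, rng=random.Random(1))
```

With $\alpha = 3$ most of the sampled shortcuts are short, whereas $\alpha = 0$ makes every candidate pair equally likely, so the same length budget is spent on fewer and longer connections.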
The optimal conditions were discovered at $\alpha = 0$ and 2 for navigation using global or local information if no total cost constraint exists in adding connections [9]. The optimal navigation condition was found at $\alpha = 3$ for a system subject to reconstruction cost [10], implying that more short (low-cost) connections are preferred when one has limited resources. Similar to former modeling frameworks, we use the average shortest path $\langle l \rangle$ as the navigation variable to be optimized. Three scenarios, $\beta = 0.5$, 0.2 and 0.8, are studied, which correspond to the cases in which long-range connections have moderate, low and high travel cost, respectively. Given that links have different travel costs in our model, the shortest path between a pair of nodes is calculated by the Dijkstra algorithm [23].
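The sketch below shows one way to evaluate the average shortest path $\langle l \rangle$ on a small lattice with weighted shortcuts. It assumes unit-cost lattice links, open boundaries and an average over all ordered pairs of distinct nodes. The function name and the example shortcut are hypothetical.

```python
import heapq

def average_shortest_path(side, shortcuts, beta):
    """Mean weighted shortest-path length over all ordered pairs of nodes."""
    nodes = [(x, y) for x in range(side) for y in range(side)]
    adj = {n: [] for n in nodes}
    for x, y in nodes:  # unit-cost links of the square lattice
        for nx, ny in ((x + 1, y), (x, y + 1)):
            if nx < side and ny < side:
                adj[(x, y)].append(((nx, ny), 1.0))
                adj[(nx, ny)].append(((x, y), 1.0))
    for a, b in shortcuts:  # long-range connections with cost beta * r_ij
        cost = beta * (abs(a[0] - b[0]) + abs(a[1] - b[1]))
        adj[a].append((b, cost))
        adj[b].append((a, cost))
    total = 0.0
    for source in nodes:  # Dijkstra from every node
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in adj[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        total += sum(dist.values())
    n = len(nodes)
    return total / (n * (n - 1))

corner_link = [((0, 0), (2, 2))]  # hypothetical shortcut across a 3 x 3 lattice
l_plain = average_shortest_path(3, [], beta=0.5)
l_half = average_shortest_path(3, corner_link, beta=0.5)
l_one = average_shortest_path(3, corner_link, beta=1.0)
```

On the 3 x 3 lattice without shortcuts the average is exactly 2. Adding the corner shortcut with $\beta = 0.5$ lowers it, while with $\beta = 1$ the shortcut costs as much as the lattice route and the average is unchanged.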
Although different minima of $\langle l \rangle$ are found for the three scenarios due to the different travel costs of long-range connections, similar optimal navigation conditions are found at $\alpha \sim 3$ (figure 5(b)). Compared with the minimum average shortest path $\langle l \rangle$ found by assuming $l = 1$ for all connections, the minimum $\langle l \rangle$ is much larger for the moderate travel cost scenario, again confirming that the ability of long-range connections to reduce travel cost was overestimated in previous models. Finally, as the scaling exponent $\beta$ increases, the differences between the average shortest paths $\langle l \rangle$ at different values of the variable exponent $\alpha$ decrease. When the scaling exponent $\beta$ reaches one, long-range connections do not improve the navigation efficiency at all. In conclusion, the optimal navigation condition does not change dramatically when realistic travel cost is added to long-range connections, demonstrating the generality of the classic model proposed in [10]. However, adding realistic travel cost to long-range connections greatly improves the accuracy of the estimation of $\langle l \rangle$, indicating that travel cost is an important parameter to consider when a long-range connection's transport efficiency is comparable to that of the underlying lattice.
Various technological and natural networks, from transportation networks [8,20] to social networks [22] and epidemic spreading networks [25,26], are characterized by two-layer structures. For many of them, the transport efficiency of a long-range connection is comparable with that of a short-range connection (e.g. the travel time by plane is comparable with the travel time by car if the travel distance is small). Therefore, it is necessary to empirically estimate the actual transport efficiency of long-range connections, and to build models that incorporate this important information to understand how the optimal transport condition is affected under different travel cost scenarios. In this study, we find that the travel time using highways ($T_h$) is about half of that using their alternative arterial paths ($T_a$) in this real-world transportation network, which gives us a reasonable justification for assigning travel cost to long-range connections. Moreover, the empirical results on the distribution of $T_h/T_a$ and the heterogeneously distributed travel demand allow more detailed information to be encapsulated in future models. The empirical investigations of the properties and usage patterns of long-range connections in practical networks offer a way to introduce more realistic link properties and guidance for generating practical models dedicated to navigation optimization.

Conclusions
The optimization of a transport network's navigation efficiency has great impact not only in traffic engineering, but also in computer science and information spreading. We define long-range connections in a road network and analyze the time needed to travel through them and the time needed to travel through their alternative arterial road paths, which we believe can enrich our understanding of the road network structure and provide useful information for the optimal design of transport networks. We investigate the navigation problem by building a new model that encapsulates more realistic travel cost information. We find that the new optimal transport networks have similar optimal navigation conditions but a different average shortest path compared to the scenario in which all connections have equal unit travel cost. Due to the different populations in traffic zones and the different distances between traffic zones, travel demands are not always homogeneously distributed in an urban area [20]. In future models, not only network properties but also travel demands are necessary ingredients to consider when evaluating or improving a transport network.
The studied road network possesses topological features similar to those of many practical networks, such as the airline transportation network [21] and human communication networks [22], where the lengths of links also follow power-law distributions. Therefore, the empirical findings of this work could be generalized to this broader set of networks. The travel (transport) cost of long-range connections is also ubiquitous in different kinds of networks. Our model employs a general scaling exponent $\beta$ to incorporate the adjustable travel (transport) cost of long-range connections into the classic optimal navigation models, which we believe can provide a general modeling framework for navigation optimization in diverse problems related to network flows in science and engineering.