Local network connectivity optimization: an evaluation of heuristics applied to complex spatial networks, a transportation case study, and a spatial social network

Optimizing global connectivity in spatial networks, either through rewiring or adding edges, can increase the flow of information and increase the resilience of the network to failures. Yet, rewiring is not feasible for systems with fixed edges, and optimizing global connectivity may not result in optimal local connectivity in systems where local connectivity is the goal. We describe the local network connectivity optimization problem, where costly edges are added to a system with an established and fixed edge network to increase connectivity to a specific location, such as in transportation and telecommunication systems. Solutions to this problem maximize the number of nodes within a given distance to a focal node in the network while minimizing the number and length of additional connections. We compare several heuristics applied to random networks, including two novel planar random networks that are useful for spatial network simulation research, a real-world transportation case study, and a set of real-world social network data. Across network types, significant variation between nodal characteristics and the optimal connections was observed. These characteristics, along with the computational costs of the search for optimal solutions, highlight the need for prescribing effective heuristics. We offer a novel formulation of the genetic algorithm, which outperforms existing techniques. We describe how this heuristic can be applied to other combinatorial and dynamic problems.


INTRODUCTION
Spatial networks have become more popular as interest in networks has spread into more fields and as spatial data, and the computational power and methods to analyze them, have become more accessible. In terms of analysis, spatial network optimization has been at the forefront and has focused on increasing network connectivity and information flow (Schrijver, 2002; Wu et al., 2004). Heuristics have been developed to rearrange existing networks or create new ones that optimize the topology of the network for synchronizability (Khafa and Jalili, 2019). Several effective methods have also been developed to add new edges to a network that minimize the average shortest path distance (Meyerson and Tagiku, 2009), minimize the network diameter (Demaine and Zadimoghaddam, 2010), or maximize the network's centrality (Jiang et al., 2011) or connectivity (Alenazi et al., 2014).

While the optimization of spatial networks' global characteristics has received this attention, optimizing an existing network's local connectivity around a specific node or location through the introduction of costly new edges has not been explored, yet it is important in several domains. For example, increasing an existing network's connectivity around a focal node while minimizing the costs associated with the number and lengths of additional connections is essential in network layout planning for telecommunications and computer [...] planners can optimize thoroughfare connectivity around schools to foster student walking and biking while reducing busing costs (Auerbach et al., 2021) and improve accessibility and reduce patient travel time to health care facilities (Branas et al., 2005).

The search for new edges that maximize connectivity to a focal node and minimize the costs of these new edges is not well understood, and this search for optimal solutions can become costly when networks are large and complex. To fill this knowledge gap, we compare a set of heuristics to optimize local network connectivity applied to real-world networks and randomly generated ones. These heuristics are drawn from the review of combinatorial heuristics by Mladenović et al. (2007) and from location models that include a spatial component (Brimberg and Hodgson, 2011). We also offer a genetic algorithm with a novel chromosome formulation where the genes are not properties of a specific variable but weights for the probability to move in a given dimension across the solution space.

These optimization heuristics are then applied to randomly generated networks that vary in complexity and size to evaluate their efficacy in finding the optimal new connections that maximize local connectivity. Included in this set of random graphs are two novel formulations of random planar networks based on the Voronoi diagram and the Delaunay triangulation. To complement the random network analysis, the network connectivity optimization methods are also applied to two real-world case studies, one from urban transportation planning and another from social network analysis. In this study, we show that optimization heuristics are preferred for analysis and practice due to the nonlinearity of the solution space and the optimal solution's dependence on nodal characteristics, such as distance to the focal node. The novel genetic algorithm outperformed the other heuristics as it was able to move from suboptimal solutions and explore distant solutions quicker. This is important as researchers and engineers are working with networks of growing complexity and size.

The organization of this paper is as follows. The next section describes the formulation of the connectivity problem in more detail, the local search methodology, and the optimization heuristics (see Appendix A for the specific pseudocode of the optimization algorithms). This is followed by a section that details the data used for the study, including descriptions of the random networks, the transportation case study street networks, and the social network data. Results of the heuristics applied to the random networks and the case studies are then presented. The paper concludes with a detailed discussion of these heuristic results, the further implications of these techniques for urban transportation planning, and future work for this avenue of research.

For the description of the optimization methodology the following nomenclature will be used (see Table 1). In connectivity optimization, network nodes are first segmented and assigned to 'close' and 'distant' sets by a chosen threshold distance D from the network's focal node F. The number of nodes ν in a network is N, and nodes are separated into two sets based on their shortest network path distances to the focal node, d(ν, F). The nodes that are within this distance are assigned to the 'close' set, $N^{C} = \{\nu : d(\nu, F) \le D\}$ (Figure 1 (A)). The nodes that are outside the threshold shortest network path distance to the focal node, D, are assigned to the 'distant' set, $N^{D} = \{\nu : d(\nu, F) > D\}$.

When a new connection is added to the network, the shortest path distance from each distant node to the focal node is recalculated. Any distant nodes that are now within the threshold distance to the focal node are assigned to the new set $N^{C}_{i,j}$. For example, if a new connection is established between distant node i and close node j, then $N^{C}_{i,j}$ contains every distant node whose recalculated shortest path distance to F is at most D (Figure 1 (C)). The benefit of this new connection is B(i, j) and the cost associated with the new connection is C(i, j). The optimal solution is the solution with the greatest benefit, or number of new nodes now within the distance to the focal node, which can be expressed as the bi-objective function

$$\max_{i,j} B(i, j), \qquad \min_{i,j} C(i, j),$$

where benefits dominate costs. For example, if B(i, j) = B(m, n) and C(i, j) < C(m, n), then the optimal additional edge is between i and j. For nondominated solutions, we select the solution that minimizes the distance to the so-called ideal point. The ideal point represents the solution that simultaneously maximizes the benefit and minimizes the cost. For the analysis in this paper the formulation of the objective function is as follows. For a new edge between i and j, the number of nodes in the set $N^{C}_{i,j}$ is the benefit of this new connection, $B(i, j) = |N^{C}_{i,j}|$, and the cost of the new connection is the length of the edge, C(i, j) = d(i, j).
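The benefit and cost of a candidate edge can be illustrated with a minimal sketch. It assumes a weighted adjacency dictionary, a hand-written Dijkstra routine, and a hypothetical toy path network; the function names (`shortest_dists`, `evaluate_edge`, `best_edge`) and the candidate edge lengths are illustrative and are not taken from the paper's pseudocode. The selection rule implements only the "benefits dominate costs" ordering, not the ideal-point distance.

```python
import heapq

def shortest_dists(adj, source):
    """Dijkstra: shortest network path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def evaluate_edge(adj, focal, D, i, j, length):
    """Return (B, C): number of distant nodes brought within D, and the edge length."""
    inf = float("inf")
    before = shortest_dists(adj, focal)
    trial = {u: dict(nbrs) for u, nbrs in adj.items()}  # copy, then add edge i-j
    trial[i][j] = trial[j][i] = length
    after = shortest_dists(trial, focal)
    gained = [v for v in adj if before.get(v, inf) > D and after.get(v, inf) <= D]
    return len(gained), length

def best_edge(adj, focal, D, candidates):
    """Benefits dominate costs: largest B first, shortest edge breaks ties."""
    scored = [(evaluate_edge(adj, focal, D, i, j, c), (i, j)) for i, j, c in candidates]
    (benefit, cost), edge = max(scored, key=lambda s: (s[0][0], -s[0][1]))
    return edge, benefit, cost

# Hypothetical toy network: a path F-a-b-c-d-e with unit-length edges.
path = ["F", "a", "b", "c", "d", "e"]
adj = {v: {} for v in path}
for u, v in zip(path, path[1:]):
    adj[u][v] = adj[v][u] = 1.0

# Candidate new edges (distant node, close node, assumed length), threshold D = 2.
candidates = [("c", "a", 1.0), ("d", "a", 1.0), ("d", "F", 1.0), ("e", "F", 2.0)]
edge, benefit, cost = best_edge(adj, "F", 2.0, candidates)
print(edge, benefit, cost)  # edge d-F brings c, d and e within D: B = 3, C = 1.0
```

Recomputing all shortest paths for every candidate, as done here, is what makes an exhaustive search expensive on large networks.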
Therefore, heuristics may be employed to identify (nearly) optimal solutions quicker than an exhaustive search as networks get larger. These optimization algorithms require a search space to explore, and using nodal characteristics we create such a multidimensional solution space (see Table 2). These nodal characteristics [...]

Figure 1. Diagram of the sequence of the network connectivity optimization problem. The close nodes that are within a threshold network distance (orange dashed circle) from the focal node (black square) are colored green, distant nodes that could be within the threshold network distance with additional edges are colored red, and the gray distant nodes are outside the threshold distance regardless of any additional connections. Panel (A) is an example graph, (B) shows the same graph with the optimal new connection that maximizes the number of additional nodes within the threshold network distance and minimizes the length of the new connection, and the inset (C) highlights this optimal connection, between nodes i and j.
[...] hill climbing (Greiner, 1992); hill climbing with a variable neighborhood search (Mladenović and Hansen, [...]

Parameter selection was simplified for easy comparison of the methods (see the Supplemental Information for the algorithms). To ensure that the heuristics did not converge on suboptimal solutions due to the initial starting values, random restart was used: initial nodes were selected at random to avoid local optima, and the routine was run until the optimal solution was found.
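The random restart procedure can be sketched as a wrapper around any local search. This is a minimal illustration under stated assumptions: the optimum is taken to be known in advance (for instance from an exhaustive search), and the one-dimensional `landscape`, the greedy `climb` routine, and the `max_restarts` safeguard are all hypothetical stand-ins rather than the paper's algorithms.

```python
import random

def random_restart(local_search, starts, optimum, rng, max_restarts=1000):
    """Rerun local_search from random starts until the known optimum is reached."""
    for restart in range(1, max_restarts + 1):
        start = rng.choice(starts)
        solution, value = local_search(start)
        if value >= optimum:
            return solution, value, restart
    return None, None, max_restarts  # safeguard: optimum never reached

# Hypothetical hilly solution space with local optima at indices 1 and 4.
landscape = [1, 3, 2, 2, 5, 4, 0]

def climb(x):
    """Greedy nearest-neighbor climb that stops at the first local optimum."""
    while True:
        options = [y for y in (x - 1, x + 1) if 0 <= y < len(landscape)]
        nxt = max(options, key=lambda y: landscape[y])
        if landscape[nxt] <= landscape[x]:
            return x, landscape[x]
        x = nxt

rng = random.Random(0)
solution, value, restarts = random_restart(
    climb, list(range(len(landscape))), max(landscape), rng
)
print(solution, value, restarts)
```

The number of restarts needed gives a rough, seed-dependent indication of how often a given search is trapped by its starting point.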

Manuscript to be reviewed
Computer Science

[...] evaluate all solutions are used to benchmark the other heuristics.

Hill climbing (HC). The solution space was observed to be hilly from the exhaustive search results, so several modifications were introduced to the hill climbing technique to avoid getting stuck in suboptimal solutions (Algorithm 2 in the Supplemental Information). A stochastic hill climbing (HCS) routine, an advanced search method based on HC, is also explored, in which the nodes for the next iteration are randomly picked with [...] and which terminates when an improved solution is no longer found (Algorithm 3 in the Supplemental Information). A hill climbing algorithm is also coupled with a variable neighborhood (HCVN), where the size of the neighborhood starts with the nearest neighbors (η = 1) and is updated as follows: [...] and the HCVN method terminates after n_max is reached (Algorithm 4 in the Supplemental Information).
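The variable neighborhood idea can be sketched as follows. Because the neighborhood update rule is not recoverable from the text, this sketch assumes a common variable neighborhood scheme: η is reset to 1 after an improvement and increased by one otherwise, and the search stops once η exceeds a limit `eta_max` (standing in for n_max). The one-dimensional `landscape` and the `neighbors` function are hypothetical; in the paper, neighbors are defined through nodal characteristics.

```python
def hill_climb_vn(score, start, neighbors, eta_max):
    """Hill climbing with a neighborhood that widens when no improvement is found."""
    current, best = start, score(start)
    eta = 1  # start with the nearest neighbors
    while eta <= eta_max:
        candidates = neighbors(current, eta)
        challenger = max(candidates, key=score) if candidates else None
        if challenger is not None and score(challenger) > best:
            current, best = challenger, score(challenger)
            eta = 1   # ASSUMED rule: shrink back after an improvement
        else:
            eta += 1  # ASSUMED rule: widen when no neighbor improves
    return current, best

# Hypothetical hilly solution space: local optimum at index 1, global at index 4.
landscape = [1, 3, 2, 2, 5, 4, 0]

def score(x):
    return landscape[x]

def neighbors(x, eta):
    """All valid positions within eta steps of x, excluding x itself."""
    return [y for y in range(x - eta, x + eta + 1)
            if y != x and 0 <= y < len(landscape)]

print(hill_climb_vn(score, 0, neighbors, eta_max=1))  # (1, 3): stuck on the first hill
print(hill_climb_vn(score, 0, neighbors, eta_max=3))  # (4, 5): wider search escapes it
```

With `eta_max=1` the routine reduces to plain nearest-neighbor hill climbing, which makes the effect of the wider neighborhood easy to compare on the same landscape.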

Simulated annealing (SA). As a meta-heuristic approach, the simulated annealing method randomly selects an initial solution from the solution space to avoid entrapment in a local optimum. At each iteration, the heuristic evaluates the neighboring solutions and, if it does not find an improved solution, it moves to a new solution with the following probability: [...] The distance of the move decreases with the number of iterations until a better solution is no longer found (Algorithm 5 in the Supplemental Information).
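A generic simulated annealing loop is sketched below for orientation. The acceptance probability used in the paper is not recoverable from the text, so this sketch swaps in the standard Metropolis rule, exp(Δ/T) for a worsening move of size Δ < 0, with geometric cooling of the temperature T. The paper instead describes shrinking the move distance over iterations. All names and parameter values (`t0`, `cooling`, `steps`) are assumptions.

```python
import math
import random

def accept_prob(delta, temperature):
    """Metropolis rule (ASSUMED): always accept improvements, sometimes accept losses."""
    if delta >= 0:
        return 1.0
    return math.exp(delta / temperature)

def simulated_annealing(score, start, neighbors, t0=1.0, cooling=0.9, steps=200, seed=0):
    """Maximize score; returns the best solution visited and its score."""
    rng = random.Random(seed)
    current = best = start
    temperature = t0
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        delta = score(candidate) - score(current)
        if rng.random() < accept_prob(delta, temperature):
            current = candidate
        if score(current) > score(best):
            best = current
        temperature *= cooling  # ASSUMED geometric cooling schedule
    return best, score(best)

# Hypothetical hilly solution space.
landscape = [1, 3, 2, 2, 5, 4, 0]

def sa_score(x):
    return landscape[x]

def sa_neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(landscape)]

print(simulated_annealing(sa_score, 0, sa_neighbors))
```

Early on, the high temperature lets the search step downhill and leave a local optimum; as the temperature falls, the loop behaves more and more like plain hill climbing.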

Genetic algorithm (GA). The genetic algorithm begins with a population of P randomly selected solutions, each with a set of chromosomes composed of genes that represent the weights of selecting a neighbor and are all initialized to unity (Algorithm 6 in the Supplemental Information). During each iteration of the method, solution scores (fitnesses) are computed by [...] and a new generation of solutions is selected based on the following probability condition [...] where s is the selection coefficient. Weak selection, s ≪ 1, is used to ensure that random mutations impact solution frequency. Crossover is conducted by alternating the weights for the offspring from each parent, also known as cycle crossover (Oliver et al., 1987). Mutations are introduced at a low rate µ ≪ 1 for each gene and increase the nodal characteristic selection weight by one. The probability that characteristic m is used to find a neighbor for node i is given by [...] where K is the total number of nodal characteristics. This formulation ensures that the nodal characteristics that improve the solution increase in weight, which results in a greater probability they will be selected [...]

To test the efficacy of these optimization heuristics in finding the optimal new network connections, they were applied to randomly generated networks that vary in complexity and size. Several types of random graph networks were generated to analyze the efficacy of the optimization heuristics for systems with different topologies, which are generally representative of naturally occurring and built systems: (1) Erdős-Rényi networks, (2) Watts-Strogatz networks, (3) Barabási [...] networks.

We also introduce two novel types of random planar networks, versions of the Voronoi diagram and the Delaunay triangulation (Supplemental Information Figure S.1 (E) and (F)).
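Returning to the genetic algorithm's weight genes, a minimal sketch of the chromosome operations follows. The selection probability for a characteristic is assumed here to be its weight divided by the sum of all K weights; this is one natural reading of the description, not a formula recovered from the paper. The fitness and generation-selection rules are omitted because their equations are not available, and all function names are hypothetical.

```python
import random

def selection_probs(weights):
    """ASSUMED form: probability of characteristic m is w_m / sum of all K weights."""
    total = sum(weights)
    return [w / total for w in weights]

def pick_characteristic(weights, rng):
    """Draw the index of the nodal characteristic used to find the next neighbor."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def crossover(parent_a, parent_b):
    """Alternate genes: even positions from parent_a, odd positions from parent_b."""
    return [a if k % 2 == 0 else b
            for k, (a, b) in enumerate(zip(parent_a, parent_b))]

def mutate(weights, mu, rng):
    """With probability mu per gene, increase that selection weight by one."""
    return [w + 1 if rng.random() < mu else w for w in weights]

rng = random.Random(0)
K = 4                       # hypothetical number of nodal characteristics
chromosome = [1] * K        # all weights initialized to unity
print(selection_probs(chromosome))             # uniform: [0.25, 0.25, 0.25, 0.25]
print(crossover([1, 2, 3, 4], [5, 6, 7, 8]))   # [1, 6, 3, 8]
print(mutate(chromosome, mu=0.05, rng=rng))
print(pick_characteristic([3, 1, 1, 1], rng))
```

Under this assumed form, a characteristic whose weight has been incremented several times is drawn proportionally more often, which matches the stated intent that useful characteristics gain selection probability.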
Planarity is particularly important in many fields, and networks generated from Voronoi diagrams and Delaunay triangulations have been used in spatial health epidemiology (Johnson, 2007), transportation flow problems (Steffen and Seyfried, 2010; Pablo-Martì and Sánchez, 2017), terrain surface modeling (Floriani et al., 1985), telecommunications (Meguerdichian et al., 2001), computer network design (Liebeherr and Nahas, 2001), and hazard avoidance systems in autonomous vehicles (Anderson et al., 2012). Delaunay triangulation maximizes the minimum angle of the triangles formed between nodes to generate planar graphs with consistent network characteristics, while Voronoi diagrams, the dual of a Delaunay triangulation, are composed of points and cells such that every location in a cell is closer to its point than to any other point (Delaunay, 1934). To modify these networks, edges are removed randomly based on their distance from the focal node with probability [...] where p_R is the removal probability, weighted by the normalized edge distance from the focal node.
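The distance-weighted edge removal can be sketched as below. Since the removal probability equation is not recoverable, the sketch assumes that an edge is removed with probability p_R multiplied by its distance from the focal node divided by the largest such distance, with edge distance measured at the edge midpoint. The small edge list is a hypothetical stand-in for a Delaunay or Voronoi network, and the sketch does not check that the thinned network stays connected.

```python
import math
import random

def removal_probs(edges, coords, focal, p_r):
    """ASSUMED form: p_R times the edge's normalized midpoint distance to the focal node."""
    fx, fy = coords[focal]

    def midpoint_dist(u, v):
        mx = (coords[u][0] + coords[v][0]) / 2
        my = (coords[u][1] + coords[v][1]) / 2
        return math.hypot(mx - fx, my - fy)

    dists = {e: midpoint_dist(*e) for e in edges}
    d_max = max(dists.values())
    return {e: p_r * d / d_max for e, d in dists.items()}

def thin_edges(edges, coords, focal, p_r, seed=0):
    """Keep each edge unless a uniform draw falls below its removal probability."""
    rng = random.Random(seed)
    probs = removal_probs(edges, coords, focal, p_r)
    return [e for e in edges if rng.random() >= probs[e]]

# Hypothetical planar network: a unit square split into two triangles.
coords = {"F": (0.0, 0.0), "a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 1.0)}
edges = [("F", "a"), ("F", "b"), ("a", "b"), ("a", "c"), ("b", "c")]

probs = removal_probs(edges, coords, "F", p_r=0.5)
print(probs)
print(thin_edges(edges, coords, "F", p_r=0.5))
```

Under this assumption, edges far from the focal node are thinned more heavily than edges near it; the opposite weighting would only require replacing `d / d_max` with `1 - d / d_max`.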

When edges are randomly removed from the connected Delaunay network or Voronoi network, with weights given by node distance from a focal node, these networks display some of the properties found in the networks mentioned above, such as complexity and randomness. [...]

Results of the transportation case study used for the analysis. A network of streets and residences around a school is shown in (A) and with the optimal new walking connection in (B). The red nodes represent the distant residences, i.e., the residences within the 1-mile Euclidean walking distance to the school but not the 1-mile street network walking distance, the green nodes are the close residences within the street network school walking distance, and the black square represents the school. The orange line is the optimal new walking connection that maximizes the number of additional residences (orange nodes) and minimizes the length of the new connection.

The topology of a network and the management of its system can improve information flow and have [...]

Central Terminal in New York City (NY). Grand Central Terminal was selected as the location of the simulated events as it is a major transportation hub located in the center of the city that serves over a million commuters and visitors daily. New York City is also a major metropolis with tens of thousands of Gowalla users present in the data set, and the city has a history of incidents, such as terror attacks. Ten dates were selected at random, and for each date ten times were randomly selected between 12:00 and 18:00 local time to simulate a crisis event (see Figure 3).

The social network problem was formulated such that an event occurred at the location (Grand Central

Several findings are worth noting regarding the performance of the heuristic algorithms used in the analysis.

First, there were consistent nonlinear relationships between the nodal characteristics and the quality of the solutions for each type of random network and the school networks (see Figure 4). There was also significant variation in which nodal characteristics were correlated with the quality of the solution across networks (see Table 3). Among those, the distance between the close node and the focal node and the distance between the distant node and the close node were most often highly correlated with the quality of the solution across networks. The centrality measures were inconsistently related to the solution quality for the random networks yet were related to the optimal solutions for the social networks.

The results of the termination times and the optimal solutions' deviations from the optimization [...] priori. When the exhaustive search routine was applied to the random networks and the real-world networks, the optimal solutions were found to be related to nodal characteristics, which entails great complexity in finding optimal solutions. Therefore, the heuristics employed to reduce the computational costs utilized [...] and attempts to incorporate such a feature resulted in unrealistic computational times.

ACKNOWLEDGMENTS
We would like to thank Alex Zendel (GIS Analyst at the Knoxville-Knox County Metropolitan Planning Commission) for providing the street networks and residential data around the schools.

Local search selection criteria
distance from focal node: d(i, F)
degree centrality: C^D_i = ∑_j A(i, j)
closeness centrality
betweenness centrality
eigenvector centrality
pagerank centrality

Table 3. Mean correlation coefficients for the nodal characteristics and the solution benefits for the experimental networks. The three coefficients with the largest magnitude are highlighted in bold for each network type. (*) There was no variation in clustering coefficients as triplets were not common in the street networks.