Finding the Shortest Path with Vertex Constraint over Large Graphs

Graphs are an important network model for describing relationships among entities in real applications, including knowledge graphs, social networks, and traffic networks. The shortest path query is a fundamental problem over graphs and has been well studied. This paper studies a special case of the shortest path problem: finding the shortest path that passes through a set of vertices specified by the user, which is NP-hard. Most existing methods calculate all permutations of the given vertices and then find the shortest one among these permutations. However, the computational cost is extremely high when the graph or the given set of vertices is large. In this paper, we first propose a novel exact heuristic algorithm based on best-first search and then give two optimizing techniques to improve its efficiency. Moreover, we propose an approximate heuristic algorithm that runs in polynomial time for this problem over large graphs. We prove that the ratio bound of our approximate algorithm is 3. We confirm the efficiency of our algorithms by extensive experiments on real-life datasets. The experimental results show that our algorithms outperform the existing methods even when the graph or the given set of vertices is large.


Introduction
Graphs are an important network model for describing relationships among entities in real applications, including knowledge graphs, RDF graphs, linked data, social networks, biological networks, and traffic networks [1][2][3][4]. The shortest path query is a basic problem on the graph model. For example, in knowledge graphs, it finds the closest connection between two entities or concepts; in social networks, it finds the closest relationship, such as friendship, between two individuals; in traffic networks, it computes the shortest route between two locations.
Shortest path routing is an important problem in location-based services (LBS) and has been well studied in the past decades [5][6][7]. However, a special kind of shortest path query, one with a vertex constraint, is becoming increasingly important in real life. For instance, in knowledge graphs, a data miner may be interested in the closest relationship between two entities that is connected through some specified entities or concepts. In traffic networks, carpooling has become a common business with the rapid development of the sharing economy. A driver may carry some colleagues on the way home from work, and the colleagues get off at distinct locations. Thus a critical problem is how to find a route with the minimum length passing through these locations. In the above examples, both the knowledge graph and the traffic network can be modeled as a large graph $G(V, E)$. The query of the shortest path with vertex constraint can be defined as follows: given a starting vertex $v_s$, an ending vertex $v_e$, and a subset $V_s \subseteq V$, find a path with the minimum length among all the paths from $v_s$ to $v_e$ that pass through every $v_i \in V_s$. The subset $V_s$ is called the vertex constraint; that is, the shortest path must pass through every vertex in $V_s$.
The above problem is a special case of the Generalized Traveling Salesman Path (GTSP) problem [8], which is known to be NP-hard. In the GTSP problem, all the vertices in $V$ are partitioned into several categories. The objective is to find a path that visits at least one vertex of every category specified by the user. For example, a tourist plans to travel through three kinds of locations, e.g., a coffee shop, a gas station, and a bank.

Because he/she may have several choices for every location category, it is necessary to find an optimal route for him/her. The basic idea of most existing works on the GTSP problem is as follows: they first compute all permutations of the given categories. Each permutation represents a class of paths that visit the categories in the same order. Next, for every permutation, these methods enumerate all possible paths from source to destination by concatenating the subpaths between vertices in two successive categories. Finally, they select the optimal one among these paths. In our problem, every vertex in $V_s$ represents a category of its own. Thus these methods need to calculate all the permutations of the vertices to be visited, which incurs a very heavy computational cost. However, most of these permutations are unnecessary for computing the shortest path. Therefore, the main challenge is how to avoid computing unnecessary permutations when finding the shortest path with vertex constraint. In this paper, we propose a novel efficient algorithm based on best-first search to compute the shortest path with vertex constraint. The main idea of our method is to discard unnecessary permutations as early as possible. We also propose an approximate algorithm that runs in polynomial time and is more efficient for large graphs. The contributions of this paper are summarized below.
(i) We propose a novel and efficient exact heuristic algorithm with two optimizing techniques to find the shortest path with vertex constraint.
(ii) We also propose an approximate algorithm that runs in polynomial time for our problem over large graphs. We prove that the ratio bound of our approximate algorithm is 3.
(iii) We conduct extensive experiments on several real-life datasets. We compare our algorithms with the state-of-the-art methods. The experimental results validate the efficiency and effectiveness of our algorithms.
The rest of this paper is organized as follows. Section 2 gives the problem statement. Section 3 introduces the CH technique for preprocessing graphs. Section 4 proposes the best-first searching algorithm with two optimizing techniques. Section 5 proposes the approximate algorithm and analyzes the ratio bound. The experimental results are presented in Section 6. The related work is discussed in Section 7. Finally, we conclude this paper in Section 8.

Problem Statement
An undirected weighted graph is denoted as $G(V, E, W)$ (or $G$ for short), where $V = \{v_i\}$ is the set of vertices and $E \subseteq V \times V$ is the set of edges in $G$. $W$ is a function that assigns a nonnegative weight $w_{i,j}$ to every edge $(v_i, v_j) \in E$, with $w_{i,j} = w_{j,i}$ because $G$ is an undirected graph. The number of vertices (or edges) in $G$ is denoted as $|V|$ (or $|E|$). A path $p$ in $G$ is a sequence of vertices; i.e., $p = (v_1, v_2, \ldots, v_l)$, where every $(v_i, v_{i+1})$ is an edge in $G$ for $1 \le i \le l - 1$. The weight of path $p$, denoted as $w(p)$, is the sum of the weights of all the edges in $p$; i.e., $w(p) = \sum_{1 \le i \le l-1} w_{i,i+1}$. We say a path $p$ is simple if and only if there is no repeated vertex in $p$. The shortest path between $v_i$ and $v_j$ is a path with the minimum $w(p)$ among all the paths between $v_i$ and $v_j$. For simplicity, in the following, we use $w^*_{i,j}$ to denote the weight of the shortest path between $v_i$ and $v_j$ in $G$.
In this paper, we study the problem of finding the shortest path with vertex constraint. Table 1 summarizes the symbols in this paper. We first give the definition below.
Definition 1 (shortest path with vertex constraint). Given a graph $G$, a vertex subset $V_s \subseteq V$, a starting vertex $v_s$, and an ending vertex $v_e$ in $G$, a path is called the shortest path between $v_s$ and $v_e$ with vertex constraint of $V_s$, denoted as $p^*_{s,e}$, if it satisfies the following two conditions: (1) $p^*_{s,e}$ travels through all the vertices in $V_s$; i.e., $v \in p^*_{s,e}$ for every vertex $v \in V_s$ and (2) $p^*_{s,e}$ has the minimum weight among all the paths satisfying condition (1). Figure 1 illustrates an example of the shortest path with vertex constraint. In this example, $V_s$ is $\{v_3, v_4, v_5, v_6\}$ and these vertices are colored yellow in Figure 1(b). The two gray vertices, $v_1$ and $v_8$, are the starting vertex and the ending vertex, respectively. The shortest path between $v_1$ and $v_8$ with vertex constraint of $V_s$ is shown as the green path in Figure 1(b).
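Definition 1 can be made concrete with a brute-force baseline of the kind the introduction attributes to existing methods: try every visiting order of the constraint vertices and keep the cheapest. The following is a minimal sketch under stated assumptions, not the implementation evaluated in this paper. The function names, the adjacency-dictionary representation, and the toy graph are hypothetical (the graph is not the one in Figure 1), and plain Dijkstra stands in for the CH-based distance computation.

```python
import heapq
from itertools import permutations

INF = float("inf")

def dijkstra(adj, src):
    """Single-source shortest-path weights on a graph with nonnegative weights."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def brute_force_constrained(adj, s, e, constraint):
    """Enumerate all k! orders of the constraint vertices; return (weight, order)."""
    dist = {t: dijkstra(adj, t) for t in {s, *constraint}}
    best_weight, best_order = INF, None
    for order in permutations(constraint):
        stops = (s, *order, e)
        total = sum(dist[a].get(b, INF) for a, b in zip(stops, stops[1:]))
        if total < best_weight:
            best_weight, best_order = total, order
    return best_weight, best_order

# Hypothetical toy graph: adj[u][v] is the weight of the undirected edge (u, v).
adj = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5, "f": 2},
    "c": {"a": 4, "b": 2, "d": 1},
    "d": {"b": 5, "c": 1, "e": 3},
    "e": {"d": 3},
    "f": {"b": 2},
}
print(brute_force_constrained(adj, "a", "e", ["f", "d"]))  # (11, ('f', 'd'))
```

The loop over `permutations` is what makes this baseline factorial in the size of the vertex constraint, which is the cost the algorithms in Sections 4 and 5 aim to avoid.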
The Hamilton path problem is a special case of our problem, so the following theorem follows directly.

Theorem 2. The problem of finding the shortest path with vertex constraint over graphs is NP-hard.
Proof. We prove it by reduction from the Hamilton path problem, which is NP-complete. Given an undirected graph $G = (V, E, W)$, let $v_s$ and $v_e$ denote the starting vertex and the ending vertex, respectively. The weight of every edge in $G$ is set to one. The vertex subset $V_s \subseteq V$ is set as $V_s = V \setminus \{v_s, v_e\}$. Then there exists a Hamilton path from $v_s$ to $v_e$ in $G$ if and only if the shortest path from $v_s$ to $v_e$ with vertex constraint of $V_s$ has length $|V| - 1$: such a path must visit all $|V|$ vertices, so it uses at least $|V| - 1$ edges, with equality exactly when no vertex is repeated. This reduction can be done in polynomial time. Therefore, the problem of finding the shortest path with vertex constraint over graphs is NP-hard.
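The reduction in the proof can be illustrated on small instances. The sketch below is only an illustration under stated assumptions: it uses unit edge weights, sets the vertex constraint to all vertices other than the two endpoints, and decides the Hamilton path question by comparing the constrained shortest length with $|V| - 1$. The function names are hypothetical, the starting and ending vertices are assumed to be distinct, and the constrained length is computed by exhaustive enumeration, so the sketch is exponential and only suitable for tiny graphs.

```python
from collections import deque
from itertools import permutations

INF = float("inf")

def bfs_dist(adj, src):
    """Hop distances from src; every edge has weight one, as in the reduction."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def constrained_length(adj, s, e, constraint):
    """Length of the shortest s-to-e walk visiting every constraint vertex."""
    dist = {v: bfs_dist(adj, v) for v in adj}
    best = INF
    for order in permutations(constraint):
        stops = (s, *order, e)
        best = min(best, sum(dist[a].get(b, INF) for a, b in zip(stops, stops[1:])))
    return best

def has_hamilton_path(adj, s, e):
    """Apply the reduction: V_s = V minus {s, e}, then test for length |V| - 1."""
    constraint = [v for v in adj if v not in (s, e)]
    return constrained_length(adj, s, e, constraint) == len(adj) - 1

# Hypothetical instance: the path graph 1 - 2 - 3 - 4.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(has_hamilton_path(path_graph, 1, 4))  # True
print(has_hamilton_path(path_graph, 1, 2))  # False
```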

CH Technique for Preprocessing Graphs
Contraction Hierarchies (CH), proposed in [9], is a well-known technique for speeding up the traditional shortest path query. It essentially builds an index by maintaining the shortest paths for some pairs of vertices. In this paper, we use the CH technique to preprocess graphs and make our method more efficient.
Given a graph $G(V, E, W)$, CH first sorts all vertices in ascending order and then contracts the vertices one by one in this order. Contraction of a vertex $v_k$ can be described as removing $v_k$ from the graph while adding new edges that represent the shortest paths between pairs of vertices adjacent to $v_k$. Such edges are called shortcut edges. Specifically, for each pair of an incoming edge $(v_i, v_k)$ and an outgoing edge $(v_k, v_j)$ of $v_k$, if $(v_i, v_k, v_j)$ is the unique shortest path between $v_i$ and $v_j$, then a new shortcut edge $(v_i, v_j)$ is added with weight $w_{i,k} + w_{k,j}$ to obtain a new graph $G'$.
We use the example in Figure 2 to illustrate the process of vertex contraction. Figure 2(a) shows a graph before the contraction of $v_1$. Note that there are two shortest paths between $v_4$ and $v_5$. Thus it is unnecessary to add an edge from $v_4$ to $v_5$ when removing $v_1$. We also note that there is only one shortest path from $v_3$ to $v_4$. Because this path goes through $v_1$, a new edge from $v_3$ to $v_4$ is constructed when removing $v_1$. Similarly, a new edge from $v_3$ to $v_5$ is constructed. The weights of these two new edges are both 2. The resulting graph after the contraction of $v_1$ is shown in Figure 2(b).
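The contraction step described above can be sketched for an undirected graph as follows. This is a minimal illustration, not the CH implementation of [9]: the function names and the toy graph are hypothetical (the graph only mimics the situation described for Figure 2 and is not that figure), and the uniqueness test is done with a simple witness search, i.e., a Dijkstra search between the two neighbors that is not allowed to pass through the contracted vertex.

```python
import heapq
from itertools import combinations

INF = float("inf")

def shortest_dist_avoiding(adj, src, dst, banned):
    """Witness search: shortest src-dst weight without passing through banned."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if v != banned and d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return INF

def contract(adj, v):
    """Remove v; add a shortcut (a, b) only if a-v-b is the unique shortest path."""
    shortcuts = {}
    for a, b in combinations(sorted(adj[v]), 2):
        via = adj[v][a] + adj[v][b]
        # A witness of equal weight means a-v-b is not unique, so no shortcut.
        if shortest_dist_avoiding(adj, a, b, v) > via:
            shortcuts[(a, b)] = via
    new_adj = {u: {x: w for x, w in nbrs.items() if x != v}
               for u, nbrs in adj.items() if u != v}
    for (a, b), w in shortcuts.items():
        new_adj[a][b] = w
        new_adj[b][a] = w
    return new_adj, shortcuts

# Hypothetical toy graph: v1 is adjacent to v3, v4, v5; v4 and v5 are also
# joined directly, so there are two shortest paths between them.
graph = {
    "v1": {"v3": 1, "v4": 1, "v5": 1},
    "v3": {"v1": 1},
    "v4": {"v1": 1, "v5": 2},
    "v5": {"v1": 1, "v4": 2},
}
contracted, added = contract(graph, "v1")
print(added)  # {('v3', 'v4'): 2, ('v3', 'v5'): 2}
```

In this toy instance the pair ($v_4$, $v_5$) gets no shortcut because the direct edge is a witness of the same weight, while the two pairs involving $v_3$ each receive a shortcut of weight 2.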
After contracting vertices, CH divides $G'$ into an upward graph $G_\uparrow$ and a downward graph $G_\downarrow$. The shortest paths can be calculated on $G_\uparrow$ and $G_\downarrow$. Given a starting vertex $v_s$ and an ending vertex $v_e$, a forward Dijkstra [10] search from $v_s$ and a backward Dijkstra search from $v_e$ are executed on $G_\uparrow$ and $G_\downarrow$, respectively. More details about the CH technique are given in [9].

Permutation-Expanding Algorithm
In this section, we propose an algorithm to find the shortest path with vertex constraint. We first introduce the definition of permutation expanding, which is the basis of our algorithm, and then we explain the algorithm Permutation-Expanding. Two optimizing techniques are proposed in Section 4.3 and we analyze the time and space complexity of our algorithm in Section 4.4.
A permutation $\pi = v_1 v_2 \cdots v_k$ is a sequence of all vertices in $V_s$, where every $v_i \in V_s$ and $v_i \ne v_j$ for $1 \le i, j \le k$, $i \ne j$. Obviously, there are $k!$ permutations for a given $V_s$. We use $v_i \prec v_j$ to denote that $v_i$ is before $v_j$ in $\pi$; a permutation is essentially an order of the vertices in $V_s$. We say a path $p$ is under a permutation $\pi$, denoted as $p|_\pi$, if it satisfies the following two conditions: (1) $v_i \in p$ for every $v_i \in V_s$ and (2) there exists a subpath $v_i \rightsquigarrow v_{i+1}$ of $p$ for every $0 \le i \le k$, where $v_0$ and $v_{k+1}$ are the starting vertex and the ending vertex of $p$, respectively. Each $v_i \rightsquigarrow v_{i+1}$ ($0 \le i \le k$) is called a "segment" of $p$. We use $S_{p|\pi}$ to denote the set of all the segments of $p$.
A path is called the shortest path between $v_s$ and $v_e$ under permutation $\pi$ if it has the minimum weight among all the paths between $v_s$ and $v_e$ under $\pi$. Given a permutation $\pi$, $\pi' = \pi \oplus v$ is an expanded permutation with one vertex $v \in V_s$ that does not appear in $\pi$, where $\oplus$ is the concatenation operator appending $v$ at the end of $\pi$. Obviously, $\pi \subseteq \pi'$ and $|\pi'| = |\pi| + 1$. This process is called permutation expanding.
For the example in Figure 1, given a permutation $\pi = v_3 v_4$, $\pi' = v_3 v_4 v_5$ and $\pi'' = v_3 v_4 v_6$ are two expanded permutations with one vertex $v_5$ and $v_6$, respectively.

Main Algorithm.
We propose an algorithm, Permutation-Expanding, to find the shortest path with vertex constraint by expanding permutations incrementally. The main idea of the algorithm is a best-first search over the shortest paths under 1-permutations to $k$-permutations of $V_s$, which stops as soon as the optimal one has been found.
The pseudocode of Permutation-Expanding is shown in Algorithm 1. Algorithm 1 utilizes a min-priority queue $Q$ to maintain a set of tuples $(\pi, w(\pi))$ (line 1). $\pi$ is a subpermutation of $V_s$. $w(\pi)$ is the weight of the shortest path under $\pi$ from $v_s$ to the last vertex of $\pi$; that is, for $\pi = v_1 v_2 \cdots v_i$ and $v_0 = v_s$, $w(\pi) = \sum_{0 \le j \le i-1} w(p^*_{j,j+1})$. Here $p^*_{j,j+1}$ represents the shortest path between $v_j$ and $v_{j+1}$ without vertex constraint, and $w(p^*_{j,j+1})$ can be easily calculated by the CH technique as discussed in Section 3.
Initially, $Q$ only contains all the 1-permutations $\pi$ of $V_s$ with their $w(\pi)$ (lines 2-3). Algorithm 1 dequeues $(\pi, w(\pi))$ iteratively according to $w(\pi)$. In each iteration, the tuple $(\pi, w(\pi))$ with the minimum $w(\pi)$ is dequeued from $Q$ (line 11). Let $V_\pi$ be the vertex set of $\pi$. If $V_\pi \ne V_s$, the algorithm generates every permutation $\pi'$ by appending every vertex $v \in V_s - V_\pi$ at the end of $\pi$ and enqueues $(\pi', w(\pi'))$ into $Q$. Otherwise, $\pi$ is a permutation of $V_s$; Algorithm 1 generates $\pi \oplus v_e$ and enqueues it into $Q$ (lines 6-10). Algorithm 1 terminates when a permutation $\pi \oplus v_e$ is dequeued for the first time, where $\pi$ is a permutation of $V_s$ (line 5). At this moment, $w(\pi \oplus v_e)$ is the weight of the shortest path $p^*_{s,e}$ with vertex constraint of $V_s$ and we can obtain $p^*_{s,e}$ by the CH technique (line 12). There is a special case in which no path exists between $v_s$ (or $v_e$) and some $v_i \in V_s$. Algorithm 1 can detect this case when computing the shortest path between two vertices. In this case, we report that the problem has no solution.
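The best-first search described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' Algorithm 1: the function names and the toy graph are hypothetical, plain Dijkstra replaces the CH-based distance computation, neither optimizing technique of Section 4.3 is applied, and vertex labels are assumed to be mutually comparable so that queue entries with equal weight can be ordered. The sketch returns the weight and the visiting order; the path itself would be recovered by concatenating the shortest paths between consecutive vertices.

```python
import heapq

INF = float("inf")

def dijkstra(adj, src):
    """Single-source shortest-path weights (stand-in for the CH queries)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def permutation_expanding(adj, s, e, constraint):
    """Best-first search over growing permutations; returns (weight, order) or None."""
    constraint = frozenset(constraint)
    dist = {t: dijkstra(adj, t) for t in constraint | {s}}
    # Queue entries are (w(pi), pi, closed); closed marks pi extended by v_e.
    if constraint:
        queue = [(dist[s].get(v, INF), (v,), False) for v in sorted(constraint)]
    else:
        queue = [(dist[s].get(e, INF), (), True)]
    heapq.heapify(queue)
    while queue:
        w, perm, closed = heapq.heappop(queue)
        if w == INF:
            return None  # some required vertex is unreachable: no solution
        if closed:
            return w, perm  # first closed entry dequeued is optimal
        last = perm[-1]
        remaining = constraint - set(perm)
        if remaining:
            for v in sorted(remaining):
                heapq.heappush(queue, (w + dist[last].get(v, INF), perm + (v,), False))
        else:
            heapq.heappush(queue, (w + dist[last].get(e, INF), perm, True))
    return None

# Hypothetical toy graph (not the graph of Figure 1).
adj = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5, "f": 2},
    "c": {"a": 4, "b": 2, "d": 1},
    "d": {"b": 5, "c": 1, "e": 3},
    "e": {"d": 3},
    "f": {"b": 2},
}
print(permutation_expanding(adj, "a", "e", {"f", "d"}))  # (11, ('f', 'd'))
```

Because edge weights are nonnegative, $w(\pi)$ never decreases when a permutation is expanded, which is why the first dequeued entry ending at $v_e$ can be returned without examining the remaining queue.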
Example 4. Given a graph $G$ shown in Figure 1 and [...] $(v_6, 7)$ into $Q$ and then dequeues the first entry [...] is dequeued from $Q$. Because the last vertex of $\pi$ is the ending vertex $v_8$, [...] as the shortest path with vertex constraint of $V_s$.

Optimizing Techniques.
We give two optimizing techniques to improve the efficiency of the Permutation-Expanding algorithm.
Cache Mechanism. Given two different permutations $\pi$ and $\pi'$, the shortest paths under $\pi$ and $\pi'$ may have overlapping segments. The weights of these overlapping shortest subpaths do not need to be calculated repeatedly during permutation expanding. The Cache Mechanism is utilized to maintain these values. For the example in Figure 1(a), $v_1$ and $v_8$ are the starting and ending vertices, respectively; consider two permutations $\pi$ and $\pi'$ of $V_s$ that both contain the segment from $v_4$ to $v_5$. When the shortest path between $v_4$ and $v_5$ is calculated for the first time, the distance between $v_4$ and $v_5$ is stored, so it is calculated only once even though both $\pi$ and $\pi'$ are expanded in Permutation-Expanding. The experimental results validate that the Cache Mechanism avoids redundant calculation effectively.
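The Cache Mechanism amounts to memoizing pairwise shortest-path weights. A minimal sketch, assuming a hypothetical `CachedDistance` helper with point-to-point Dijkstra in place of CH queries, is given below; the class name, its `computations` counter, and the toy graph are illustrative and do not come from the paper.

```python
import heapq

INF = float("inf")

class CachedDistance:
    """Memoized point-to-point shortest-path weights on an undirected graph."""

    def __init__(self, adj):
        self.adj = adj
        self.cache = {}
        self.computations = 0  # number of searches actually executed

    def __call__(self, u, v):
        key = frozenset((u, v))  # the graph is undirected, so (u, v) equals (v, u)
        if key not in self.cache:
            self.computations += 1
            self.cache[key] = self._search(u, v)
        return self.cache[key]

    def _search(self, src, dst):
        dist = {src: 0}
        heap = [(0, src)]
        while heap:
            d, x = heapq.heappop(heap)
            if x == dst:
                return d
            if d > dist[x]:
                continue  # stale queue entry
            for y, w in self.adj[x].items():
                if d + w < dist.get(y, INF):
                    dist[y] = d + w
                    heapq.heappush(heap, (d + w, y))
        return INF

# Hypothetical toy graph (not the graph of Figure 1).
adj = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5, "f": 2},
    "c": {"a": 4, "b": 2, "d": 1},
    "d": {"b": 5, "c": 1, "e": 3},
    "e": {"d": 3},
    "f": {"b": 2},
}
distance = CachedDistance(adj)
# Two permutations that share the segment between b and d.
totals = [sum(distance(x, y) for x, y in zip(perm, perm[1:]))
          for perm in [("b", "d", "f"), ("f", "b", "d")]]
print(totals, distance.computations)  # [8, 5] 3
```

Four segment lookups trigger only three searches here, because the segment between `b` and `d` is shared by both permutations.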
Permutation Filtering. When a permutation $\pi$ is dequeued from $Q$ in an iteration, Permutation-Expanding generates all expanded permutations $\pi' = \pi \oplus v$ by appending every vertex $v \in V_s - V_\pi$ at the end of $\pi$. Note that it is unnecessary to enqueue every $\pi'$ into $Q$ in this iteration. For two expanded permutations $\pi'$ and $\pi''$ [...], permutation $\pi''$ can be filtered and does not need to be enqueued into $Q$. The following theorem guarantees the correctness of permutation filtering.

Theorem 5. For two expanded permutations $\pi'$ [...]
[...] and $v_i^-$ and $v_i^+$ represent the precursor and successor of $v_i$ in the subpath [...]. A new path can be obtained by utilizing the shortest path from [...]. We concatenate the resulting subpaths to get a path $p'$ from $v_s$ to $v_e$. Obviously, $p'$ is a path under a permutation $\pi'$ of $V_s$ and we have $w(p') \le w(p^*)$. Theorem 5 has been proved.
The conclusion of Theorem 5 is intuitive. For the example in Figure 1, [...] does not need to be enqueued into $Q$ in the iteration when $\pi = v_1$ is dequeued from $Q$. The reason is that no path under a permutation expanded from $v_1 v_6$ can be the shortest path with vertex constraint.

Complexity Analysis.
In this section, we analyze the complexity of Algorithm 1. We first analyze the time complexity and then analyze the space complexity.
Time Complexity. Because Algorithm 1 may calculate the shortest path for every two vertices in $V_s$ in the worst case, it needs at most $(k + 1)(k + 2)$ shortest path calculations, where $k = |V_s|$. For each shortest path calculation, CH runs in $O(n \log n + m)$ time, where $n = |V|$ and $m = |E|$. In addition, at most $k!$ permutations of $V_s$ may be created, and every permutation is maintained as a tuple, which can be done in $O(1)$ time. Therefore, Algorithm 1 runs in $O(k^2 (n \log n + m) + k!)$ time. It is worth noting that $k$ is always far less than $n$ in real applications.
Space Complexity. Algorithm 1 mainly needs to maintain the expanded permutations, and it expands at most $k!$ permutations. Therefore, the space complexity of Algorithm 1 is $O(k!)$.

Approximate-Path Algorithm
In this section, we propose an approximate algorithm, Approximate-Path, to find the shortest path with vertex constraint in polynomial time. In the following, we first define the query graph and then explain our approximate algorithm in detail. Next, we prove that the ratio bound of our approximate algorithm is 3. Finally, we analyze the time and space complexity of Approximate-Path.
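Given a graph $G$, a vertex subset $V_s \subseteq V$, a starting vertex $v_s$, and an ending vertex $v_e$ in $G$, a query graph $G_q(V_q, E_q)$ is a complete graph on $V_q$, where $V_q = V_s \cup \{v_s, v_e\}$ and the weight of every edge $(v_i, v_j) \in E_q$ is the weight of the shortest path between $v_i$ and $v_j$ in $G$.
Algorithm 2: Approximate-Path (lines 15-18)
15: Let $\pi$ be a permutation corresponding to the order of vertices in preorder traversal on $T$;
16: Move the ending vertex $v_e$ to the end of $\pi$ to get $\pi' = v_{\pi_1} v_{\pi_2} \cdots v_{\pi_k}$;
17: Generate the shortest path $p$ between $v_s$ and $v_e$ under permutation $\pi'$;
18: return $p$;
Theorem 6. The weight of the shortest path between $v_s$ and $v_e$ with vertex constraint of $V_s$ is identical in $G$ and $G_q$.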
The main idea of Approximate-Path is as follows. We first compute the minimum spanning tree $T$ of $G_q$ and then "adjust" some edges in $T$ such that $T$ is converted into a path $p$ satisfying the vertex constraint. The pseudocode of Approximate-Path is shown in Algorithm 2. In Algorithm 2, the minimum spanning tree $T$ of $G_q$ is first generated in a way similar to Prim's algorithm [11] (lines 1-14). Next, Algorithm 2 executes a preorder traversal on $T$ and obtains a permutation $\pi$ corresponding to the order of vertices in this preorder traversal (line 15). Note that the ending vertex $v_e$ may not be the last one in $\pi$. In this case, $v_e$ is moved to the end of $\pi$ and we get a new permutation $\pi'$ (line 16). Finally, Algorithm 2 returns the shortest path under permutation $\pi'$ as the result (lines 17-18), which is an approximate solution for our problem.
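The steps of Approximate-Path can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 2: the function names and the toy graph are hypothetical, plain Dijkstra replaces the CH-based distance computation, the tree is rooted at the starting vertex, children are visited in the order in which Prim's algorithm attached them, and the starting and ending vertices are assumed to be distinct (an STE query).

```python
import heapq

INF = float("inf")

def dijkstra(adj, src):
    """Single-source shortest-path weights (stand-in for the CH queries)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def approximate_path(adj, s, e, constraint):
    """MST of the query graph, preorder traversal, then move e to the end."""
    nodes = [s] + sorted(set(constraint) - {s, e}) + [e]
    dist = {u: dijkstra(adj, u) for u in nodes}  # edge weights of the query graph
    # Prim's algorithm on the complete query graph, rooted at s.
    children = {u: [] for u in nodes}
    best = {u: (dist[s].get(u, INF), s) for u in nodes if u != s}
    while best:
        u = min(best, key=lambda x: best[x][0])
        w, parent = best.pop(u)
        if w == INF:
            return None  # some required vertex is unreachable
        children[parent].append(u)
        for v in best:
            if dist[u].get(v, INF) < best[v][0]:
                best[v] = (dist[u][v], u)
    # Iterative preorder traversal of the tree.
    order, stack = [], [s]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    order.remove(e)
    order.append(e)  # the ending vertex must come last
    weight = sum(dist[a][b] for a, b in zip(order, order[1:]))
    return weight, order

# Hypothetical toy graph (not the graph of Figure 1 or Figure 3).
adj = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5, "f": 2},
    "c": {"a": 4, "b": 2, "d": 1},
    "d": {"b": 5, "c": 1, "e": 3},
    "e": {"d": 3},
    "f": {"b": 2},
}
print(approximate_path(adj, "a", "e", ["f", "d"]))  # (11, ['a', 'f', 'd', 'e'])
```

On this toy instance the returned weight coincides with the optimum, but in general the result is only guaranteed to be within the ratio bound proved in this section.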
In this example, $\pi$ is a permutation corresponding to the preorder traversal on $T$ shown in Figure 3(c). Then Approximate-Path moves the ending vertex $v_8$ to the end of $\pi$ to get $\pi'$. The shortest path under $\pi'$ is shown in Figure 3(d), and its weight is 12. The shortest path with vertex constraint $V_s$ for the input graph is shown in Figure 1(b) and its weight is 10. Next, we prove that Approximate-Path is a 3-approximation algorithm for the shortest path problem with vertex constraint.

Theorem 8. Approximate-Path is a 3-approximation algorithm for finding the shortest path with vertex constraint.
Proof. Let $p^*_{s,e}$ denote a shortest path with vertex constraint of $V_s$ in $G_q$. Obviously, $p^*_{s,e}$ is a spanning tree of $G_q$. Therefore, the weight of the minimum spanning tree $T$ of $G_q$, computed by Approximate-Path, provides a lower bound on the weight of $p^*_{s,e}$:
$$w(T) \le w(p^*_{s,e}). \quad (1)$$
The preorder traversal $\pi$ of $T$ is essentially a vertex permutation of $V_q$. We use $p|_{\pi,T}$ to denote a path on $T$ under permutation $\pi$. Note that $p|_{\pi,T}$ may not be a simple path and every edge in $p|_{\pi,T}$ appears at most twice. For the example in Figure 3, [...]. Here, the edge $(v_3, v_4)$ (or $(v_4, v_3)$) appears twice in $p|_{\pi,T}$. Because $p|_{\pi,T}$ travels through every edge in $T$ at most twice, we have
$$w(p|_{\pi,T}) \le 2\,w(T). \quad (2)$$
Based on (1) and (2), we have
$$w(p|_{\pi,T}) \le 2\,w(p^*_{s,e}). \quad (3)$$
Because $G_q$ is a complete graph, we can generate a simple path $p|_{\pi,G_q}$ on $G_q$ under permutation $\pi$. Additionally, the weight of every edge $(v_i, v_j)$ in $G_q$ is equal to the weight of the shortest path between $v_i$ and $v_j$ in $G$; thus, the weight of edge $(v_i, v_j)$ cannot be larger than the weight of the subpath between $v_i$ and $v_j$ in $p|_{\pi,T}$. It means
$$w(p|_{\pi,G_q}) \le w(p|_{\pi,T}). \quad (4)$$
Given the permutation $\pi$ of the preorder traversal of $T$, Algorithm 2 obtains another permutation $\pi'$ by moving the ending vertex $v_e$ to the end of $\pi$. Consider the last two vertices $v_{|V_s|}$ and $v_e$ of $\pi'$. If $(v_{|V_s|}, v_e)$ is an edge in $T$, its weight cannot exceed the weight of $T$. Otherwise, there must exist a simple path between $v_{|V_s|}$ and $v_e$ in $T$; its weight is at most $w(T)$ and cannot be less than the shortest distance between $v_{|V_s|}$ and $v_e$. Therefore, in both cases, $w^*_{|V_s|,e} \le w(T)$, and then we have
$$w(p|_{\pi',G_q}) \le w(p|_{\pi,G_q}) + w^*_{|V_s|,e} \le 2\,w(p^*_{s,e}) + w(T) \le 3\,w(p^*_{s,e}). \quad (5)$$
Because $w(p|_{\pi',G_q})$ is exactly the weight of the approximate shortest path returned by Algorithm 2, the proof is completed.
Complexity Analysis. We first analyze the time complexity of Algorithm 2. In order to construct the minimum spanning tree of $G_q$, we utilize the CH technique to calculate the weight of the shortest path between any two vertices in $V_q$. This needs $O(k^2 (n \log n + m))$ time, where $n = |V|$, $m = |E|$, and $k = |V_s|$, so the time complexity of Algorithm 2 is $O(k^2 (n \log n + m))$. In order to construct the minimum spanning tree, Algorithm 2 needs to maintain the weight of the shortest path for any two vertices in $V_q$, so the space complexity of Algorithm 2 is $O(k^2)$.

Experiments
This section experimentally evaluates our algorithms against the current state-of-the-art methods. Section 6.1 explains the experimental settings. Section 6.2 presents the performance of the algorithms.
6.1. Experimental Settings. All methods are implemented in C++ and tested on a Linux machine with an Intel(R) Core(TM) i7-4770K and 32GB RAM. We repeat each experiment 100 times and report the average result. If a method requires more than 24 hours or more than 32GB RAM to preprocess a dataset $D$, we omit the method from the experiments on $D$.
Datasets. We test 4 real road networks from the 9th DIMACS Implementation Challenge (http://www.dis.uniroma1.it/challenge9/index.shtml) and an email network (http://snap.stanford.edu/data/), as shown in Table 2. For each road network, each vertex represents a road junction and each edge represents a road segment. Table 2 describes the properties of the datasets, where $|V|$, $|E|$, and $d$ are the number of vertices, the number of edges, and the average vertex degree, respectively. The full name of each road network is given in the description.
Query Set. In this paper, we investigate the query efficiency by varying the size of the vertex constraint, i.e., the number of vertices in $V_s$. We test 15 query sets, Q1 to Q15, where every query set is a set of queries with a fixed size of $V_s$. For each query set, we test 100 random queries and report the average querying time and space consumption as the results for that query set. Specifically, the sizes of $V_s$ for Q1-Q5 are 4, 5, 6, 7, and 8, respectively, and the sizes of $V_s$ for Q6-Q10 are 12, 14, 16, 18, and 20, respectively. The starting and ending vertices for every query are also selected at random. Q11-Q15 are generated as follows. We first randomly select 500 pairs of a starting vertex $v_s$ and an ending vertex $v_e$ and then calculate the distance for every pair. We sort these distances in ascending order and generate Q11-Q15 by dividing the pairs into five query sets. For example, Q11 contains the queries for the pairs of $v_s$ and $v_e$ whose distances are in the top 100, and so on. For each query, we randomly select six vertices as $V_s$; that is, the size of $V_s$ is 6.
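The construction of Q11-Q15 can be sketched as follows. This is only an illustration under stated assumptions: the helper name `split_by_distance` is hypothetical, the distances are synthetic random numbers rather than shortest-path distances on the datasets of Table 2, and "top 100" is read as the first 100 pairs in the ascending order described above.

```python
import random

def split_by_distance(pairs, num_sets=5):
    """pairs: list of ((v_s, v_e), distance). Returns num_sets equal-size groups,
    ordered from the smallest distances to the largest."""
    ordered = sorted(pairs, key=lambda item: item[1])
    size = len(ordered) // num_sets
    return [ordered[i * size:(i + 1) * size] for i in range(num_sets)]

# Synthetic stand-in for 500 random (v_s, v_e) pairs and their distances.
rng = random.Random(0)
pairs = [((i, i + 1000), rng.uniform(1.0, 100.0)) for i in range(500)]
query_sets = split_by_distance(pairs)  # plays the role of Q11, ..., Q15
print([len(q) for q in query_sets])  # [100, 100, 100, 100, 100]
```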
For a query, if the starting vertex and ending vertex are the same, we call this starting-to-starting query (STS query); otherwise, we call this starting-to-ending query (STE query).In this paper, we present the experimental results of our algorithms for both STS query and STE query.
Compared Methods. For each experiment, we compare Permutation-Expanding (PE) and Approximate-Path (AP) against three algorithms: unidirectional Dijkstra Search (U.Dijkstra) [8], Level-Sweeping Search (LESS) [8], and Nearest Neighbor Algorithm (ANN) [12]. We use the CH technique to preprocess the input graphs. The first two are exact algorithms and ANN is an approximate algorithm. Two other methods are not included for the following reasons: (1) INC [13] computes a simple path, which does not contain repeated vertices, whereas we do not require a simple path in this problem, and (2) P-LESS [8] is an optimization of LESS that mainly reduces the size of the search space, which typically grows in proportion to the density of the categories. When each category contains only one vertex, P-LESS is equivalent to LESS.

Experimental Results
Exp-1. Query Efficiency. We investigate the impact of the size of $V_s$ and show the experimental results of STE query in Figure 4(a). On each dataset, U.Dijkstra has the largest querying time for every query. PE outperforms LESS by large margins that depend on the size of $V_s$, and their maximum difference is close to two orders of magnitude. The reason is that LESS calculates all the permutations of $V_s$. In contrast, PE finds the shortest path with vertex constraint by expanding permutations incrementally, which avoids calculating unnecessary permutations as early as possible. PE begins to degrade as the size of the graph increases. Despite this degradation, it requires no more than 3 seconds in the worst case (for Q5 on FLA).
For each dataset, AP has the lowest time cost among all algorithms on every query. Specifically, AP outperforms ANN by one order of magnitude. When the size of $V_s$ is small, our exact algorithm PE takes less time than the approximate algorithm ANN, and AP answers these queries in subsecond time. The querying times of ANN and AP are insensitive to the size of $V_s$ in Figure 4(a).
As shown in Figure 4(b), the query efficiency of STS query is similar to that of STE query. PE is better than the other exact algorithms and AP has the lowest time cost among all algorithms on every query. For the same size of $V_s$ and the same dataset, the querying time of STE query is less than that of STS query. The reason is that, given a starting vertex, PE performs best-first search on the shortest paths under 1-permutations to $k$-permutations of $V_s$ until the optimal one has been found. PE gradually expands the path, and each vertex in $V_s$ tends to be arranged according to its shortest distance from the starting vertex. However, an STS query eventually returns to the starting vertex, so it generates more permutations than an STE query, which increases the running time of the algorithm.
When the size of $V_s$ becomes large, as in Q6-Q10, the runtime of the exact algorithms is too long, so here we only compare the query efficiency of the approximate algorithms. Figure 5 shows the results of these queries. AP is again faster than ANN by an order of magnitude, and the querying time of AP does not exceed 2 seconds in the worst case for both STE query and STS query.
Q11-Q15 have the same size of $V_s$, and the query time is shown in Figure 6. As the distance between the starting vertex and the ending vertex increases, the time required for the query does not increase. This shows that the query time is not related to the distance between the starting vertex and the ending vertex but only to the size of $V_s$ and the scale of the graph. PE finds the shortest path with vertex constraint by expanding permutations incrementally, which avoids calculating unnecessary permutations as early as possible. Moreover, AP can quickly give a solution to the problem by using the query graph. Therefore, AP and PE are more efficient than the other algorithms.
Figure 7 shows the space consumption of the algorithms on Q1-Q5. The space consumptions of STE query and STS query are nearly the same on every dataset. For every dataset, U.Dijkstra has the largest space consumption. PE has the smallest space consumption among all the exact algorithms and ANN has the smallest space consumption among all algorithms. Because ANN only needs to calculate $|V_s| + 1$ shortest subpaths and does not save any intermediate calculation results, it has less space consumption than AP. Note that our approximate algorithm AP has the least space consumption of all algorithms except ANN.
Exp-2. Effectiveness of Optimizing Techniques. For PE, we design two optimizing techniques. Their effectiveness is shown in Figure 8. The speedup ratio is the ratio of the query time without the optimizing techniques to the query time with them. We can see that the optimizing techniques greatly reduce the query time. Figure 8(a) shows the effectiveness of the optimizing techniques on STE query. The results show that the optimizing techniques make PE several times faster, depending on the size of $V_s$ for each dataset. Except for COL, the speedup ratio increases with the size of $V_s$. COL has a larger diameter but a narrower width, which means that the traffic network has a strip shape, so PE performs well even without any optimizing technique. Consider the extreme case in which the network degenerates into a line: PE then achieves its best performance without any optimizing technique. Of course, this kind of network is very rare in real life. Figure 8(b) shows the speedup ratio on STS query. Since an STS query needs to calculate more permutations than an STE query, the speedup ratio on STS query is relatively small.
Exp-3. Relative Error. The relative error is $(w_a - w_o)/w_o$, where $w_a$ and $w_o$ are the weights of the approximate solution and the optimal solution, respectively. For every query in this group of experiments, we first use PE to calculate the optimal result and then use ANN and AP to calculate the approximate results. Figure 9 shows the relative errors of these two approximate algorithms on the different datasets. For STE query, the relative errors on the two datasets NY and FLA are not much different. For datasets BAY and COL, the relative errors of ANN are lower than those of AP. As the size of $V_s$ increases, the relative errors of both algorithms gradually increase. On all datasets, the relative errors of AP do not exceed 25%. For STS query, however, the relative error is smaller than for STE query, and the relative errors of AP do not exceed 15%. For dataset FLA, the relative errors of AP are lower than those of ANN.
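As a small worked instance of the relative error, the sketch below applies the formula to the weights quoted for the Approximate-Path example in Section 5, where the approximate path has weight 12 and the optimal path has weight 10. The function name is hypothetical.

```python
def relative_error(approx_weight, optimal_weight):
    """Relative error (w_a - w_o) / w_o of an approximate solution."""
    if optimal_weight <= 0:
        raise ValueError("the optimal weight must be positive")
    return (approx_weight - optimal_weight) / optimal_weight

print(relative_error(12, 10))  # 0.2
```

A relative error of 0.2 corresponds to an approximation ratio of 1.2, well inside the proved ratio bound of 3.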

Related Work
In this section, we introduce existing works and categorize them as follows.
Traveling Salesman Problem (TSP). The traveling salesman problem is a classic graph theory problem. So far, there are many algorithms to solve this problem, including exact and approximate algorithms [14]. TSP can be transformed into a linear programming problem and solved by methods for linear programming [15][16][17]. Dorigo [18] solves the TSP problem using an ant colony algorithm. In this work, ants of the artificial colony deposit pheromones on the edges of the graph. As the pheromone accumulates, the path formed by the pheromone trail produces a shorter feasible solution of TSP. As time progresses, the amount of pheromone on the shorter path gradually increases. The shorter the path, the more pheromone is deposited on it. There are also some approximate algorithms that can quickly give a good solution to the TSP problem [19][20][21]. However, TSP is a special case of the problem we study in this paper. None of the methods for TSP can solve our problem when $V_s \ne V$. Additionally, these methods cannot be used for large graphs.

Generalized Traveling Salesman Problem (GTSP).
The Generalized Traveling Salesman Problem is a variant of the classical Traveling Salesman Problem. It was first introduced in the late 1960s [22]. There are some exact algorithms to solve the GTSP [23][24][25]. Specifically, a salesman travels among $n$ cities (each city can be visited only once) and has to eventually return to the starting city. When the distances between the $n$ cities are given and the traveling route must meet certain constraints (for example, before visiting city 1, the salesman must have visited city 2 and city 3), finding an optimal traveling route is known as the Traveling Salesman Problem with Precedence Constraint (TSPPC). Ascheuer et al. [26] propose a branch-and-cut algorithm to solve the asymmetric traveling salesman problem with constraints. Moon et al. [27] and Wang et al. [28] solve the traveling salesman problem with constraints by a genetic algorithm and integer programming, respectively. The Hamiltonian path problem with precedence constraints is also known as the sequential ordering problem, which can be described as finding the shortest path between a specified starting point and a specified ending point that passes through every point once and satisfies the sequence constraints. Karan et al. [29] propose a branch-and-bound algorithm to solve the sequential ordering problem. The existing algorithms for solving GTSP essentially enumerate every possible path and cannot be applied to large graphs. Our algorithm can be applied to large graphs.
Trip Planning Query (TPQ). All vertices in a graph are divided into groups, each representing a category. A Trip Planning Query finds a minimum-cost route that contains at least one vertex from each given category. Li et al. [12] introduce four algorithms for answering TPQ; these algorithms achieve various approximation ratios with respect to the number of categories and the maximum cardinality of any category. Our algorithm is a 3-approximation algorithm, and its ratio bound is lower than that of the algorithms in [12]. Rice et al. [8] present two exact algorithms to solve this problem. These algorithms search for the optimal path exhaustively, which adds many unnecessary calculations and greatly increases their running time. Hars et al. [13] propose a heuristic algorithm that follows the divide-and-conquer approach to compute a simple path passing through all vertices specified by the user. The original problem is divided into two subproblems, and the algorithm consists of two main steps: (1) for a given set of must-visit vertices and the corresponding visiting order, treat each pair of consecutive vertices as a subpath of the entire end-to-end path, and calculate all candidate subpaths; (2) concatenate candidate subpaths, one for each pair of consecutive vertices, to establish a simple path from the starting vertex to the ending vertex. Since the path we are looking for is not required to be simple, this algorithm does not apply to our problem. Cao et al. [30] introduce algorithms for solving Keyword-aware Optimal Route (KOR) queries. A KOR query adds a cost constraint to the category constraint; that is, the optimal path returned should satisfy the user-specified cost budget. Shang et al. [31] propose and study a novel problem of dynamically monitoring the shortest path in a spatial network, with the aim of accelerating shortest path computation in a dynamic spatial network. Shang et al. [32] design an exact algorithm and an approximation algorithm to solve the Collective Travel Planning query problem. The query finds the lowest-cost route connecting multiple sources and a destination with up to a given number of meeting points.

Conclusion
To find the shortest path with vertex constraint, we propose an exact algorithm named Permutation-Expanding and give two optimizing techniques to improve its efficiency. Moreover, we propose a polynomial-time approximate algorithm named Approximate-Path for this problem over large graphs. We conduct extensive experiments on real-life datasets and compare our algorithms with the state-of-the-art methods. The experimental results validate that our algorithms outperform the existing methods even when the graph or the given set of vertices is large. In future work, we will study index techniques to facilitate the queries so that our algorithms are more time and space efficient on larger graphs.

Figure 1: An example of the shortest path with vertex constraint.

Example 7. Figures 3(a) and 3(b) show the query graph and its minimum spanning tree, respectively.

Figure 3: An example of an approximate path.

Table 1: List of notations.
$G$: an undirected weighted graph
$v_s$, $v_e$, $V_s$: starting vertex, ending vertex, vertex constraint
$w_{i,j}$, $w(p)$: weight of edge $(v_i, v_j)$, weight of path $p$

[...] $p^*_{s,e}|\pi$, if every segment $v_i v_j \in p^*_{s,e}|\pi$ is a shortest path. Then we have the following theorem.

Given an undirected graph $G$, a vertex subset $V_s$, a starting vertex $v_s$, and an ending vertex $v_e$ in $G$, the shortest path $p^*_{s,e}$ between $v_s$ and $v_e$ with vertex constraint of $V_s$ is exactly the $p^*_{s,e}|\pi$ with the minimum weight among all the permutations of $V_s$; i.e., $p^*_{s,e} = \min\{p^*_{s,e}|\pi \mid \pi \in \Pi(V_s)\}$, where $\Pi(V_s)$ is the set of all permutations of $V_s$.

Proof. Assume that $p_{s,e}|\pi'$ is a path under a permutation $\pi'$ from $v_s$ to $v_e$ and that the weight of $p_{s,e}|\pi'$ is less than that of $p^*_{s,e}|\pi$. Then there are the following four situations.
(1) If $\pi'$ and $\pi$ are the same permutation and every segment $v_i v_j \in p_{s,e}|\pi'$ is a shortest path, obviously $p_{s,e}|\pi'$ and $p^*_{s,e}|\pi$ have the same weight. This contradicts the assumption.
(2) If $\pi'$ and $\pi$ are the same permutation and not all of the segments $v_i v_j \in p_{s,e}|\pi'$ are shortest paths, obviously the weight of $p_{s,e}|\pi'$ is greater than that of $p^*_{s,e}|\pi$. This contradicts the assumption.
(3) If $\pi'$ and $\pi$ are different permutations and every segment $v_i v_j \in p_{s,e}|\pi'$ is a shortest path, then because $p^*_{s,e}|\pi$ is the path with the minimum weight among all the permutations, the weight of $p^*_{s,e}|\pi$ is not greater than that of $p_{s,e}|\pi'$. This contradicts the assumption.
(4) If $\pi'$ and $\pi$ are different permutations and not all of the segments $v_i v_j \in p_{s,e}|\pi'$ are shortest paths, obviously the weight of $p_{s,e}|\pi'$ is not smaller than that of $p^*_{s,e}|\pi$. This contradicts the assumption.
To sum up, $p^*_{s,e}|\pi$ is the shortest path between $v_s$ and $v_e$ with vertex constraint of $V_s$.

For two vertex subsets $V_s$ and $V'_s$ on $G$, if $V_s \subseteq V'_s$, for every permutation $\pi$ of $V_s$, there must exist a permutation $\pi'$ of $V'_s$, such that [...]
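The theorem above reduces the constrained query to choosing the best visiting order of the constrained vertices, with consecutive vertices joined by ordinary shortest paths. The following is a minimal Python sketch of that permutation-enumeration baseline, not the paper's Permutation-Expanding algorithm. The adjacency-dictionary graph format and the names `dijkstra` and `constrained_shortest_weight` are assumptions made for illustration, and the graph is assumed to be undirected with non-negative weights.

```python
import heapq
from itertools import permutations


def dijkstra(graph, source):
    """Shortest distances from source; graph maps vertex -> {neighbor: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def constrained_shortest_weight(graph, start, end, must_visit):
    """Try every permutation of must_visit and join consecutive vertices by
    shortest paths (hypothetical helper; exponential in len(must_visit))."""
    terminals = {start, end, *must_visit}
    dist = {t: dijkstra(graph, t) for t in terminals}
    best_weight, best_order = float("inf"), None
    for perm in permutations(must_visit):
        order = (start, *perm, end)
        weight = sum(dist[a].get(b, float("inf"))
                     for a, b in zip(order, order[1:]))
        if weight < best_weight:
            best_weight, best_order = weight, order
    return best_weight, best_order
```

The number of permutations grows factorially with the size of the vertex constraint, which is the cost that the exact and approximate algorithms in this paper aim to avoid.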
Algorithm 1: Permutation-Expanding ($G$, $V_s$, $v_s$, $v_e$).
// Input: [...] $v_s$, $v_e$: starting vertex and ending vertex respectively
// Output: $p^*_{s,e}$: the shortest path between $v_s$ and $v_e$ with vertex constraint of $V_s$
1: Let $Q$ be a min priority queue with entries in the form $(\pi, w(\pi))$, sorted in ascending order of $w(\pi)$;
2: for each $v_i \in V_s$ do
3: Enqueue an entry $(v_i, w(p^*_{s,i}))$ into $Q$;
4: Dequeue the first entry $(\pi, w(\pi))$ from $Q$ and let $v_i$ be the last vertex of $\pi$;
[...]
11: Dequeue the first entry $(\pi, w(\pi))$ from $Q$ and let $v_i$ be the last vertex of $\pi$;
12: Generate the shortest path $p^*_{s,e}$ between $v_s$ and $v_e$ under a permutation $\pi$;
13: return $p^*_{s,e}$

[...] denote the shortest path from $v_s$ to $v_i$. [...] the shortest path $p^*_{s,i}$ under [...] from $v_s$ to $v_i$ is a subpath of $p^*_{s,e}$, then for any permutation [...] there exists a permutation [...] such that the weight of the shortest path under [...] from $v_s$ to [...] must not be less than the weight of the shortest path under [...] from $v_s$ to [...].
Proof. Given a shortest path $p^*$ under [...] from $v_s$ to $v_e$, $p^*_{s,i}$ is obviously a prefix subpath of $p^*$. Let [...], where $w(p)$ represents the weight of path $p$. Next, we consider the subpath [...]

Algorithm 2: Approximate-Path ($G$, $V_s$, $v_s$, $v_e$).
// Input: $G$: an undirected weighted graph
// $V_s$: a vertex subset of $G$
// $v_s$, $v_e$: starting vertex and ending vertex respectively
// Output: $p$: the approximate shortest path between $v_s$ and $v_e$ with vertex constraint of $V_s$
1: Let $Q$ be a min priority queue with the entries in the form $\langle v_i, v_j, d^*_{i,j} \rangle$, sorted in the ascending order of $d^*_{i,j}$, where $d^*_{i,j}$ is the shortest distance between $v_i$ and $v_j$;
2: $V_q \leftarrow V_s \cup \{v_s, v_e\}$, $n \leftarrow |V_q|$;
3: for each $v_i \in V_q - \{v_s\}$ do
4: Enqueue an entry $\langle v_s, v_i, d^*_{s,i} \rangle$ into $Q$;
5: $E_T \leftarrow \emptyset$, $V_T \leftarrow \{v_s\}$;
6: while $V_T \neq V_q$ do
7: Dequeue the first entry $\langle v_i, v_j, d^*_{i,j} \rangle$ from $Q$;
8: if $v_j \in V_T$ then
[...]
$V_T \leftarrow V_T \cup \{v_j\}$, $E_T \leftarrow E_T \cup \{(v_i, v_j)\}$;
12: for each $v_k \in V_q - V_T$ do
13: Enqueue an entry $\langle v_j, v_k, d^*_{j,k} \rangle$ into $Q$;
14: $T \leftarrow (V_T, E_T)$;
15: Traverse $T$ by preorder and let [...]

In the query graph $G_q$, the weight $w_{i,j}$ of every edge $(v_i, v_j)$ is the shortest distance $d^*_{i,j}$ between $v_i$ and $v_j$ in $G$. Here, $d^*_{i,j}$ is the weight of the shortest path between $v_i$ and $v_j$ without vertex constraint in $G$. The following theorem indicates that we only need to find the shortest path with vertex constraint over $G_q$.
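The steps that survive in the approximate algorithm (a query graph weighted by shortest distances, a minimum spanning tree grown from the starting vertex with a priority queue, and a preorder traversal) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names and adjacency-dictionary format are hypothetical, the graph is assumed connected with non-negative weights, the starting and ending vertices are assumed distinct, and, because the text does not show how the ending vertex is handled after the traversal, the sketch simply moves it to the end of the visiting order.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph maps vertex -> {neighbor: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def approximate_path_order(graph, start, end, must_visit):
    """Return (visiting order, weight) from an MST preorder over the query graph."""
    # Query graph vertices: constrained vertices plus the two endpoints.
    terminals = list(dict.fromkeys([start, *must_visit, end]))
    dist = {t: dijkstra(graph, t) for t in terminals}

    # Prim-style growth of the spanning tree from the starting vertex.
    in_tree = {start}
    children = {t: [] for t in terminals}
    heap = [(dist[start][t], start, t) for t in terminals if t != start]
    heapq.heapify(heap)
    while len(in_tree) < len(terminals):
        _, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # v was already attached through a cheaper edge
        in_tree.add(v)
        children[u].append(v)
        for x in terminals:
            if x not in in_tree:
                heapq.heappush(heap, (dist[v][x], v, x))

    # Preorder traversal of the tree rooted at the starting vertex.
    order, stack = [], [start]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))

    # Assumption of this sketch: force the ending vertex to be visited last.
    order.remove(end)
    order.append(end)
    weight = sum(dist[a][b] for a, b in zip(order, order[1:]))
    return order, weight
```

Each consecutive pair in the returned order stands for a shortest path in the original graph, so the full route is obtained by expanding those pairs; the sketch returns only the order and its total weight.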