Dynamic thresholding search for the feedback vertex set problem

Given a directed graph G = (V, E), a feedback vertex set is a vertex subset C whose removal makes the graph acyclic. The feedback vertex set problem is to find such a subset C* of minimum cardinality. As a general model, this problem has a variety of applications. However, the problem is known to be NP-hard, and thus computationally challenging. To solve this difficult problem, this article develops an iterated dynamic thresholding search algorithm, which features a combination of local optimization, dynamic thresholding search, and perturbation. Computational experiments on 101 benchmark graphs from various sources demonstrate the advantage of the algorithm over state-of-the-art algorithms: it reports record-breaking best solutions for 24 graphs, equally best results for 75 graphs, and worse best results for only two graphs. We also study how the key components of the algorithm affect its performance.


INTRODUCTION
Given a directed graph G = (V, E), where V denotes the set of vertices and E the set of edges, a feedback vertex set (FVS) is a vertex subset C ⊆ V whose removal leads to an acyclic graph. The feedback vertex set problem (FVSP) aims to identify a FVS of minimum cardinality. In other words, we want to remove the fewest vertices to make the graph acyclic.
The decision version of the FVSP is one of the 21 nondeterministic polynomial-time complete (NP-complete) problems, which were first proved in the early 1970s (Cook, 1971; Karp, 1972). Its broad applications include very large scale integration circuit design (Festa, Pardalos & Resende, 1999), deadlock detection (Leung & Lai, 1979; Wang, Lloyd & Soffa, 1985), program verification (Seymour, 1995), Bayesian inference (Bar-Yehuda et al., 1998), operating systems (Silberschatz, Galvin & Gagne, 2006) and complex network systems (Liu, Slotine & Barabási, 2011). A typical application of the FVSP is to control the state of a complex network system, making the system change from any given state to a desired state by controlling a minimal subset of vertices from the outside. For instance, Mochizuki et al. (2013), Fiedler et al. (2013) and Zhao et al. (2020) investigated FVS-based control mechanisms. This FVS approach proves to be suitable when only the network formation is known, while the functional forms of the governing dynamic equations are unknown (Zhao et al., 2020). Studies also showed that this approach needs to remove fewer vertices than other structure-based methods in many cases (e.g., Zañudo, Yang & Albert, 2017). Figure 1A shows a directed graph G with five vertices {a, b, c, d, e}. Figure 1B displays an arbitrary FVS {a, b, d} with a cardinality of 3, while Fig. 1C presents an optimal FVS {b, c} with a minimum cardinality of 2.
Some approximation algorithms were proposed for the FVSP to provide solutions of provable quality. Erdős & Pósa (1962) presented an algorithm with an approximation ratio of 2 log n (n = |V|). Later, Monien & Schulz (1981) improved the approximation ratio to √(log n). Even et al. (1998) achieved an approximation factor of O(log s · log log s) on directed graphs, where s is the size of a minimum FVS of the input graph. Other polynomial time approximation algorithms for the FVSP in tournament graphs include those presented by Cai, Deng & Zang (2001), Mnich, Williams & Végh (2015) and Lokshtanov et al. (2021).
From the perspective of solution methods for the FVSP applied to the ISCAS89 benchmark instances (up to 1,728 vertices), several exact algorithms combined with graph reduction have been proposed. Specifically, Levy & Low (1988) presented an exact reduction based on the graph structure and proved its equivalence to the original graph. Based on this exact reduction, Orenstein, Kohavi & Pomeranz (1995) proposed graph partitioning methods with new reduction operations, which achieved optimal results on all the ISCAS89 benchmark instances within 2 CPU hours on a Sun-4 station. Lin & Jou (1999) investigated a branch-and-bound algorithm based on the exact reduction of Orenstein, Kohavi & Pomeranz (1995), which found the optimal results for the ISCAS89 benchmarks in less than 3 s on a SUN-UltraII workstation. Many vertices of the ISCAS89 benchmark instances have in-degrees or out-degrees of 0 or 1. The average reduction ratio (the number of deleted vertices divided by the number of vertices of a given graph) of these reduction approaches is 72.48%, implying that these benchmark instances are easy for modern FVSP algorithms. Hence, we report in this work computational results not only on these ISCAS89 instances, but also on more challenging benchmark instances. Some theoretical exact algorithms were reported without experimental validation. For example, Razgon presented a backtrack algorithm that solves the FVSP in O(1.8899^n) time (Razgon, 2006) and a branch-and-prune algorithm requiring O(1.9977^n) time (Razgon, 2007). Fomin, Gaspers & Pyatkin (2006) developed a branching algorithm with a time complexity of O(1.7548^n).
Experiments are performed on 101 benchmark instances from various sources to assess the IDTS algorithm. For the 70 instances with unknown optima, IDTS improves 24 best-known solutions and attains the best-known results for 44 other instances. Only for two instances does IDTS report a worse result. Moreover, IDTS easily attains the known optimal results for all 31 ISCAS89 benchmark instances.
The remainder of this article is arranged as follows. "Basic Notations and Fitness Function" introduces the basic notations and the fitness function of the FVSP. "Preliminaries" presents preliminary notions. "Iterated Dynamic Thresholding Algorithm for the FVSP" explains the components of the IDTS algorithm. "Experimental Results and Comparisons" evaluates the algorithm with computational results. "Analysis" studies critical components of the proposed algorithm, and "Conclusions" provides conclusions.

BASIC NOTATIONS AND FITNESS FUNCTION
This section introduces the relevant basic definitions, the solution representation and the fitness function, which are necessary for presenting the proposed algorithm.

Basic definitions
Given a directed graph G = (V, E), the basic definitions that are useful for describing the proposed IDTS algorithm are presented below.
Definition 1: a critical vertex of G is a vertex that belongs to a FVS. We use C to denote the set of critical vertices that have been detected. C is a FVS only when all vertices of the FVS are detected.
Definition 2: an uncritical vertex is a vertex that does not belong to a FVS. We use U to denote the set of uncritical vertices, where V = C ∪ U and C ∩ U = ∅.
Definition 3: a redundant vertex refers to a vertex that is recognized as critical or uncritical according to the exact rules proposed by Levy & Low (1988). We use V_r to denote the set of redundant vertices that have been detected, C_r the set of critical vertices of V_r, and U_r the set of uncritical vertices of V_r, where V_r = C_r ∪ U_r and C_r ∩ U_r = ∅.
Definition 4: V_0 refers to the set of residual vertices after applying the exact reduction algorithm proposed by Levy & Low (1988). C_0 denotes the set of feedback vertices of V_0 (that is, all vertices of a FVS of the reduced graph belong to C_0), and U_0 denotes the set of non-feedback vertices of V_0. Levy & Low (1988) proved that the FVS of the reduced graph plus the feedback vertices removed in the reduction process composes a FVS of the original graph. Let C_r be the set of critical vertices removed in the reduction process and C*_0 be a minimum FVS of the reduced graph; then C*_0 ∪ C_r is a minimum FVS of the given graph G = (V, E). In this case, only the feedback vertices of the reduced graph need to be found.
In summary, the vertex set V of G consists of two disjoint sets {V_0, V_r} or four disjoint sets {C_0, U_0, C_r, U_r}.
Let G_U = (U, E_U) denote the subgraph of G induced by the uncritical vertex set U. If G_U is a directed acyclic graph (DAG), then V \ U is a FVS; the vertices of a DAG admit a topological ordering in which the starting point of every directed edge is ahead of its terminal point (Galinier, Lemamou & Bouzidi, 2013). Hence, the objective of the FVSP is to find the set U of maximum cardinality that makes G_U = (U, E_U) acyclic. Figure 2 presents an example illustrating these basic definitions. For the given graph G = (V, E), let {a, b, d, h} be the current FVS. The set of critical vertices C is the current FVS {a, b, d, h}, and the set of remaining vertices is the set of uncritical vertices U, i.e., {c, e, f, g, i, j}. According to the rules in "Reduction Procedure", h can be recognized as a critical vertex (C_r = {h}, purple vertex) and f, g, i, j as uncritical vertices (U_r = {f, g, i, j}, dark blue vertices). Thus the set of redundant vertices V_r is {f, g, h, i, j}, and the set of residual vertices V_0 is {a, b, c, d, e}. V_0 can be divided into C_0 = {a, b, d} (orange vertices) and U_0 = {c, e} (blue vertices). Clearly, the graph induced by the vertices in U = {c, e, f, g, i, j} is a DAG without directed cycles.
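The defining check above, namely that removing C must leave a DAG, can be sketched with a topological sort (Kahn's algorithm). The example graph below is our own assumption, chosen to contain two overlapping cycles; the helper names are not from the article:

```python
from collections import deque

def is_acyclic(vertices, edges):
    """Return True if the directed subgraph induced by `vertices` is a DAG."""
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, v in edges:
        if u in indeg and v in indeg:   # keep only induced edges
            succ[u].append(v)
            indeg[v] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == len(vertices)        # every vertex sorted => no cycle

def is_fvs(all_vertices, edges, C):
    """C is a feedback vertex set iff the graph induced by V \\ C is acyclic."""
    return is_acyclic(set(all_vertices) - set(C), edges)

# Assumed example: cycles a->b->c->a and c->d->e->c share vertex c.
V = {"a", "b", "c", "d", "e"}
E = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e"), ("e", "c")]
print(is_fvs(V, E, {"c"}))    # removing c breaks both cycles -> True
print(is_fvs(V, E, set()))    # the graph itself contains cycles -> False
```

Since both cycles pass through c, the singleton {c} is a minimum FVS of this toy graph.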

Solution representation and fitness function
The solution representation and fitness function of the FVSP are given as follows.
Solution representation: the constraint of the FVSP is that no cycle remains in U_0 after removing the set of redundant vertices C_r ∪ U_r and the set of critical vertices C_0. To quickly assess the number of cycles in U_0 after each neighborhood operation, the number of conflicts (see "Preliminaries") is taken as the number of cycles (Galinier, Lemamou & Bouzidi, 2013). Let π be an assignment of the vertices of U_0 to the positions {1, 2, ..., |U_0|}; the permutation π denotes the candidate solution (Galinier, Lemamou & Bouzidi, 2013).
Fitness function: to evaluate the quality of the FVS C, the evaluation or fitness function counts the number of vertices in C. Recall that U_r is the set of uncritical vertices among the redundant vertices and π is the corresponding permutation solution. The fitness function f_0 (to be minimized) is given by

f_0(C) = |C| = |V| − (|π| + |U_r|).

Thus, the minimization of f_0 is equivalent to the maximization of the fitness function f, which is expressed as:

f(π) = |π| + |U_r|.

PRELIMINARIES
In this section, we introduce two notions used by the proposed algorithm: the number of conflicts and the INSERT operator positions.
Number of conflicts: u and v are a pair of conflicting vertices if v is ahead of u in the permutation π and there is a directed edge from u to v. The number of conflicting vertex pairs of π is the number of conflicts. Let edge(u, v) = 1 if there is a directed edge from u to v, and edge(u, v) = 0 otherwise. We use c(u, v), where u, v ∈ π, u ≠ v and edge(u, v) = 1, to indicate whether u and v form a conflicting vertex pair:

c(u, v) = 1 if π_v < π_u, and c(u, v) = 0 otherwise,

where π_v represents the position of vertex v in the permutation π. Then, the number of conflicts g(π) is given by

g(π) = Σ_{u,v∈π} c(u, v).

Thus, for a conflict-free solution π, g(π) = 0 holds. The time complexity to compute g(π) is O(d_max), where d_max denotes the largest degree of a vertex in the graph. Note that the number of conflicts is never less than the actual number of cycles for the same solution. Figure 3 displays two cases relating the number of conflicts to the number of cycles. The left part of Fig. 3A shows the current remaining vertex set U_0 of the given G, and the right part its corresponding permutation π = {a, b, d}. In this permutation, no directed edge from b to a (or from d to b or a) exists; thus, there is no conflicting pair, and the number of cycles is 0. Figure 3B shows a situation where the number of conflicts is more than the number of cycles. Vertex b is behind vertex d in the solution permutation π = {d, a, b} on the right side, and there is a directed edge from b to d. Accordingly, vertices b and d conflict, and the number of conflicts is 1, which exceeds the number of cycles (0).
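The conflict count g(π) can be computed directly from the vertex positions. A minimal sketch (the function name is ours, and the edge set is an assumption echoing the Fig. 3 discussion):

```python
def conflicts(pi, edges):
    """g(pi): count edges (u, v) whose head v sits before its tail u in pi."""
    pos = {v: i for i, v in enumerate(pi)}   # position of each vertex in pi
    return sum(1 for u, v in edges
               if u in pos and v in pos and pos[v] < pos[u])

E = [("a", "b"), ("b", "d")]          # assumed edges echoing Fig. 3
print(conflicts(["a", "b", "d"], E))  # no backward edge -> 0 conflicts
print(conflicts(["d", "a", "b"], E))  # b->d points backward -> 1 conflict
```

The second call reproduces the situation of Fig. 3B: one conflicting pair even though the induced subgraph contains no cycle.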
INSERT operator position: the INSERT operator inserts a vertex v from the critical vertex set C_0 into one of two possible positions in π (Galinier, Lemamou & Bouzidi, 2013): one just behind its last in-coming neighbor already in π (named i⁻(v)), and the other just ahead of its first out-going neighbor in π (named i⁺(v)). The number of conflicting pairs after the insertion at each of these two positions is calculated, and the position with fewer conflicting pairs is selected. We use ⟨v, i⁻(v), i⁺(v)⟩ to represent such a move, and π ⊕ ⟨v, i⁻(v), i⁺(v)⟩ to stand for the neighboring solution generated by applying the INSERT move to π. Moreover, g(π ⊕ ⟨v, i⁻(v), i⁺(v)⟩) refers to the number of conflicts after inserting the vertex v ∈ C_0. N_I(π) contains the vertices that satisfy the condition g(π ⊕ ⟨v, i⁻(v), i⁺(v)⟩) = 0; that is, a vertex v can be inserted only if it causes no conflict after being inserted into π. N_I(π) can be expressed as

N_I(π) = {v ∈ C_0 : g(π ⊕ ⟨v, i⁻(v), i⁺(v)⟩) = 0}.

ITERATED DYNAMIC THRESHOLDING ALGORITHM FOR THE FVSP
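The two candidate positions i⁻(v) and i⁺(v) and the choice between them can be sketched as follows; all names are ours, and the toy edge set is an assumption:

```python
def conflicts(pi, edges):
    """Count edges (u, v) whose head v sits before its tail u in pi."""
    pos = {v: i for i, v in enumerate(pi)}
    return sum(1 for u, v in edges
               if u in pos and v in pos and pos[v] < pos[u])

def insert_positions(pi, v, edges):
    """i_minus: just behind v's last in-neighbour already in pi;
    i_plus: just ahead of v's first out-neighbour in pi."""
    pos = {u: i for i, u in enumerate(pi)}
    in_pos = [pos[u] for u, w in edges if w == v and u in pos]
    out_pos = [pos[w] for u, w in edges if u == v and w in pos]
    i_minus = max(in_pos) + 1 if in_pos else 0
    i_plus = min(out_pos) if out_pos else len(pi)
    return i_minus, i_plus

def insert_move(pi, v, edges):
    """Evaluate both candidate positions and keep the less conflicting one."""
    i_minus, i_plus = insert_positions(pi, v, edges)
    cands = [pi[:i] + [v] + pi[i:] for i in {i_minus, i_plus}]
    return min(cands, key=lambda p: conflicts(p, edges))

# Inserting b into ["a", "c"] given edges a->b and b->c (our toy example):
pi = insert_move(["a", "c"], "b", [("a", "b"), ("b", "c")])
print(pi)  # -> ['a', 'b', 'c']
```

Here both candidate positions coincide (behind a, ahead of c), so b lands between its neighbors and no conflict is created.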

Basic steps
This section introduces the iterated dynamic thresholding algorithm for solving the FVSP, which is composed of five main procedures, as shown in Algorithm 1.
Reduction procedure: IDTS adopts a set of conventional reduction rules (Levy & Low, 1988) to simplify the given graph G. Firstly, a set of redundant vertices V_r (made up of the set of critical vertices C_r and the set of uncritical vertices U_r) is determined according to those rules. Then, the redundant vertices and related edges (whose starting or ending vertex is a redundant vertex) are deleted, reducing the input graph G = (V, E) to the reduced graph G = (V_0, E_0) (see "Reduction Procedure").
Initialization procedure: this procedure greedily chooses a vertex (Cai, Huang & Jian, 2006) such that its insertion into π does not increase the number of cycles. This process continues until no such vertex can be inserted (see "Greedy Initialization").
Local search procedure: this procedure consists of two complementary search stages: a dynamic thresholding search stage (diversification) to extend the search to unexplored regions, and a descent search stage (intensification) to find new local optimal solutions with improved quality. These two stages alternate until the best-found solution cannot be further improved for x continuous local search rounds (see "Local Search").
Perturbation procedure: when the search is considered trapped in a deep local optimum, the perturbation procedure is initiated to move some specifically identified vertices between U_0 and C_0 to release the search from the trap. The perturbed solution is then adopted to start the next round of the local search procedure (see "Perturbation Procedure").
Recovery procedure: if the best solution ever found cannot be improved after c continuous local search rounds and the perturbation phase, the search terminates and the recovery procedure starts. The best solution (minimum feedback vertex set) found during the search is recorded as C*_0 and returned as the input of the recovery procedure. The current reduced graph G = (V_0, E_0) is recovered to the original graph G = (V, E), and the FVS C*_0 for G = (V_0, E_0) is correspondingly projected back to C*_0 ∪ C_r for G = (V, E) (see "Recovery Procedure").

Reduction procedure
The graph reduction procedure traverses all vertices of a given graph G = (V, E) and processes those that satisfy any of the three rules proposed by Levy & Low (1988).
The three reduction rules are as follows.
Rule 1: if the in-degree (out-degree) of a vertex v is 0, then v is an uncritical vertex, and v and all its edges can be deleted without losing any optimal feedback vertex of G. Such vertices are added to the redundant uncritical set U_r (U_r = U_r ∪ {v}). For example, as shown in Fig. 4B, vertex g, whose in-degree is 0, and vertex j, whose out-degree is 0, can be deleted together with their edges.
Rule 2: if the in-degree (out-degree) of a vertex v is 1, and v has no self-loop, then v can be merged with its unique precursor (successor) vertex without losing any optimal feedback vertex of G. The merging redirects all edges connected to v to the unique precursor (successor) vertex of v. Such vertices are added to the redundant uncritical set U_r (U_r = U_r ∪ {v}). Figure 4C shows that vertex f, with an in-degree of 1, can be merged with its precursor vertex a, and vertex i, with an out-degree of 1, can be merged with its successor vertex b.
Rule 3: if a self-loop exists for a vertex v, then v and all its edges can be deleted, and v is later recovered as a part of the feedback vertex set without losing any optimal feedback vertex of G. Such vertices are added to the redundant critical set C_r (C_r = C_r ∪ {v}). As shown in Fig. 4D, the self-loop vertex h and its connected edges can be deleted.
After deleting the sets U_r and C_r, the remaining vertex set is V_0 = V \ (U_r ∪ C_r), and the reduced subgraph is G = (V_0, E_0) with E_0 = (V_0 × V_0) ∩ E. After obtaining the reduced G, the greedy initialization is used to generate an initial solution for it.
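The rules can be prototyped as a fixpoint loop. The sketch below implements Rules 1 and 3 only (Rule 2's vertex merging is omitted for brevity, so this reduction is weaker than the full procedure), and the example edge set is our own, echoing Fig. 4:

```python
def reduce_graph(vertices, edges):
    """Apply Rules 1 and 3 until no rule fires. Returns the residual
    vertices V_0 and edges E_0, plus the redundant sets U_r and C_r."""
    V0, E0 = set(vertices), set(edges)
    U_r, C_r = set(), set()
    changed = True
    while changed:
        changed = False
        for v in list(V0):
            indeg = sum(1 for u, w in E0 if w == v)
            outdeg = sum(1 for u, w in E0 if u == v)
            if (v, v) in E0:                 # Rule 3: self-loop -> critical
                C_r.add(v)
            elif indeg == 0 or outdeg == 0:  # Rule 1: source/sink -> uncritical
                U_r.add(v)
            else:
                continue
            V0.remove(v)                     # delete v and its edges
            E0 = {(u, w) for u, w in E0 if u != v and w != v}
            changed = True
    return V0, E0, U_r, C_r

# Assumed example: g is a source, j is a sink, h has a self-loop.
V = {"a", "b", "g", "h", "j"}
E = {("g", "a"), ("a", "j"), ("h", "h"), ("a", "b"), ("b", "a"), ("h", "a")}
V0, E0, U_r, C_r = reduce_graph(V, E)
print(sorted(V0))  # -> ['a', 'b']
```

The two-cycle a ↔ b survives because neither Rule 1 nor Rule 3 applies to it; the full procedure's Rule 2 would reduce it further.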

Greedy initialization
Given the reduced subgraph G = (V_0, E_0), its critical vertex set is defined as C_0 and its uncritical vertex set as U_0. Recall that π is an assignment of the vertices of U_0 to the positions {1, 2, ..., |U_0|}. We initialize C_0 = V_0 and π = ∅. Then, π is iteratively extended by the greedy procedure, which repeats the following three steps until no vertex can be inserted.
(1) Calculate N_I(π): the set of vertices of C_0 that cause no conflict when inserted into π is computed according to Eq. (5).
(2) Choose and insert a vertex: a vertex v ∈ N_I(π) with the minimum score is inserted into π, where the score is computed as

score(v) = deg⁻(v) + deg⁺(v) − λ · |deg⁻(v) − deg⁺(v)|,

where deg⁻(v) and deg⁺(v) are the in-degree and the out-degree of v respectively, and λ is a parameter (λ = 0.3 according to Cai, Huang & Jian (2006)).
(3) Update N_I(π): after inserting the vertex v into π, we only have to recalculate the number of conflicts g(π ⊕ ⟨u, i⁻(u), i⁺(u)⟩) of its neighbors u ∈ N_I(π) according to Eq. (4). Any vertex satisfying g(π ⊕ ⟨u, i⁻(u), i⁺(u)⟩) ≠ 0 is eliminated from N_I(π). The complexity of this updating step is O(d²_max), where d_max is the largest degree of a vertex in the graph. This process continues until no vertex can be inserted into π, i.e., N_I(π) = ∅.
In this initialization, a legal conflict-free permutation π of a certain quality is obtained, which is further improved in the dynamic thresholding search stage of the algorithm.
To explain this process, we consider the reduced graph of Fig. 5, which shows how the greedy procedure works. First we calculate N_I(π) = {a, b, c, d, e}. Then, according to Eq. (6), score(a) = 4.7, score(b) = 4, score(c) = 4, score(d) = 4.7 and score(e) = 4. Finally, we select a vertex from N_I(π) with the minimum score and insert it into π. Suppose we select vertex e for insertion; N_I(π) is then updated to {a, c, d}. As shown in Fig. 5A, the solution after the first greedy insertion is the permutation π = {e}. By repeating the above steps until N_I(π) = ∅, we obtain the local optimal permutation π = {e, c}.
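The greedy loop can be sketched as follows on a small assumed graph (a directed 3-cycle). The score formula is our reading of Eq. (6), so treat it as an assumption, and the helper names are ours:

```python
def conflicts(pi, edges):
    """Count edges (u, v) whose head v sits before its tail u in pi."""
    pos = {v: i for i, v in enumerate(pi)}
    return sum(1 for u, v in edges
               if u in pos and v in pos and pos[v] < pos[u])

def try_insert(pi, v, edges):
    """Insert v at the better of the two INSERT positions i-(v) / i+(v)."""
    pos = {u: i for i, u in enumerate(pi)}
    in_pos = [pos[u] for u, w in edges if w == v and u in pos]
    out_pos = [pos[w] for u, w in edges if u == v and w in pos]
    i_minus = max(in_pos) + 1 if in_pos else 0
    i_plus = min(out_pos) if out_pos else len(pi)
    return min((pi[:i] + [v] + pi[i:] for i in {i_minus, i_plus}),
               key=lambda p: conflicts(p, edges))

def greedy_init(V0, edges, lam=0.3):
    """Repeatedly insert the conflict-free vertex with the smallest score;
    score(v) = deg-(v) + deg+(v) - lam*|deg-(v) - deg+(v)| is our
    reconstruction of Eq. (6), not a confirmed formula."""
    indeg = {v: sum(1 for u, w in edges if w == v) for v in V0}
    outdeg = {v: sum(1 for u, w in edges if u == v) for v in V0}
    score = lambda v: indeg[v] + outdeg[v] - lam * abs(indeg[v] - outdeg[v])
    pi, C0 = [], set(V0)
    while True:
        N_I = [v for v in sorted(C0)
               if conflicts(try_insert(pi, v, edges), edges) == 0]
        if not N_I:                      # no conflict-free insertion remains
            return pi, C0
        v = min(N_I, key=score)
        pi = try_insert(pi, v, edges)
        C0.remove(v)

E = [("a", "b"), ("b", "c"), ("c", "a")]     # assumed 3-cycle
pi, C0 = greedy_init({"a", "b", "c"}, E)
print(pi)  # -> ['a', 'b']
```

On the 3-cycle, any two vertices can be ordered conflict-free, and the remaining vertex forms a (minimum) FVS of size 1.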

Local search
The local optimization aims to improve the initial permutation provided by the greedy initialization, and it consists of two stages. The first stage (dynamic thresholding search) brings diversity as it accepts equivalent or worse solutions (line 4 of Algorithm 3), and the second stage applies a descent search that accepts only better solutions (line 5 of Algorithm 3) to guarantee a concentrated and directed search. These two stages alternate until the best found solution cannot be further improved for x local search rounds.

The dynamic thresholding search stage
There are many successful applications of thresholding-based search (Dueck & Scheuer, 1990; Moscato & Fontanari, 1990), e.g., to the frequency assignment problem (Diane & Nelson, 1996). In this work, three basic move operators (DROP, INSERT and SWAP) are adopted in the thresholding search stage, which accepts both equivalent and better solutions. DROP deletes a vertex from the current permutation π and moves it to C_0; INSERT extends the current permutation π by introducing a new vertex; SWAP deletes a vertex v from the current permutation π and inserts a vertex u into π.
Based on these three move operators, the dynamic thresholding search (DTS) adopts both a vertex-based strategy and a prohibition mechanism to balance the exploration and exploitation of the search space. In each search round of this stage, the algorithm first visits all vertices of V_0 one by one in random order. For each vertex v considered, a set of candidate move operators is examined depending on whether the vertex is in or out of the permutation π. If the objective value of the resulting solution is not worse than that of the best solution found so far by more than a given threshold, the move operation on the vertex is executed; otherwise, the operation is rejected. Each time a move is taken, the concerned vertex is marked as tabu and forbidden to be moved again during the next tt iterations (tt is the tabu tenure). This process continues until all vertices of V_0 are traversed.
(1) If v is outside π (i.e., v ∈ C_0), the candidate move operator set consists of INSERT and SWAP. INSERT is applied first, as it improves the solution quality; then SWAP is applied, which keeps the solution quality unchanged. If neither operator can be applied, DTS simply skips v. INSERT can be applied if the number of conflicts of v in π is 0 (i.e., g(π ⊕ ⟨v, i⁻(v), i⁺(v)⟩) = 0). SWAP is applied if v meets two conditions simultaneously: (1) v is not involved in any SWAP operation already taken in the current round; (2) v conflicts with exactly one vertex u in π (i.e., g(π ⊕ ⟨v, i⁻(v), i⁺(v)⟩) = 1), and can only be swapped with u.
Similar to the INSERT operation in the greedy initialization, we adopt g(π ⊕ ⟨v, i⁻(v), i⁺(v)⟩) for quick computation during the DTS stage.
(2) If v belongs to π, DROP and SWAP are the two candidate operators. SWAP is applied before DROP, as SWAP does not degrade the solution quality while DROP does. If neither operation can be applied, the algorithm simply skips v. SWAP can be applied only if v satisfies two conditions simultaneously: (1) v was not involved in any SWAP operation already taken in the current round; (2) the set NM(v) ⊆ C_0 of v is non-empty, which is defined as

NM(v) = {u ∈ C_0 : g(π \ {v} ⊕ ⟨u, i⁻(u), i⁺(u)⟩) = 0, c(u, v) = 1}.

The vertex u to be swapped with v is a random vertex of NM(v). DROP can be applied only if the number of vertices in π stays above the threshold f(π*) − δ after the DROP operation, where π* is the best recorded solution and δ (a small positive integer) is a parameter. For the DROP operation, we need to update the number of conflicts of v and of all vertices neighboring v that are not in the solution π; the time complexity of DROP is O(d²_max). Figure 6 shows an example of the dynamic thresholding search stage, taking the solution of Fig. 5B as the input solution. Suppose that the vertices in V_0 are randomly shuffled into {a, e, d, b, c}. As shown in Fig. 6A, since the first vertex a is outside π, INSERT and SWAP are the two candidate operators, and INSERT is considered before SWAP.
The INSERT operator cannot be used since the number of conflicts of a is not 0. However, SWAP can be applied since a conflicts with only one vertex, c, and a is not in the tabu list; thus a is swapped with c. As shown in Fig. 6B, for the second vertex e ∈ π, SWAP and DROP are the two candidate operators to be considered, and SWAP is applied before DROP. SWAP can be applied since in this case NM(e) = {b, d} and e is not forbidden by the tabu list. Thus the second vertex e is swapped with a random vertex of NM(e), say d. After that, the remaining vertices in V_0 are evaluated in the same way, but no operator can be applied to them. As a result, the improved solution is π = {d, a}.
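A deliberately simplified round of the DTS stage can be sketched as follows. SWAP and several bookkeeping details of the full algorithm are omitted, all helper names are ours, and the toy graph is an assumption; the sketch only illustrates the random visiting order, the tabu marking, and the DROP threshold f(π*) − δ:

```python
import random

def conflicts(pi, edges):
    pos = {v: i for i, v in enumerate(pi)}
    return sum(1 for u, v in edges
               if u in pos and v in pos and pos[v] < pos[u])

def try_insert(pi, v, edges):
    pos = {u: i for i, u in enumerate(pi)}
    in_pos = [pos[u] for u, w in edges if w == v and u in pos]
    out_pos = [pos[w] for u, w in edges if u == v and w in pos]
    i_minus = max(in_pos) + 1 if in_pos else 0
    i_plus = min(out_pos) if out_pos else len(pi)
    return min((pi[:i] + [v] + pi[i:] for i in {i_minus, i_plus}),
               key=lambda p: conflicts(p, edges))

def dts_round(pi, C0, best_size, delta, edges, tabu, it, tt, rng):
    """One simplified DTS round: visit every vertex once in random order,
    INSERT conflict-free outside vertices, DROP inside vertices while
    |pi| stays within delta of the best recorded size."""
    order = sorted(set(pi) | C0)
    rng.shuffle(order)
    for v in order:
        if tabu.get(v, -1) >= it:                  # vertex is still tabu
            continue
        if v in C0:                                # outside pi: try INSERT
            cand = try_insert(pi, v, edges)
            if conflicts(cand, edges) == 0:
                pi = cand
                C0.remove(v)
                tabu[v] = it + tt
        elif len(pi) - 1 >= best_size - delta:     # inside pi: try DROP
            pi = [u for u in pi if u != v]
            C0.add(v)
            tabu[v] = it + tt
    return pi, C0

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]   # assumed graph
pi, C0 = dts_round(["a", "b", "d"], {"c"}, best_size=3, delta=1,
                   edges=edges, tabu={}, it=0, tt=3, rng=random.Random(1))
```

Whatever the random order, the round preserves the two invariants that matter: π stays conflict-free, and its size never falls below the dynamic threshold.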

The descent search stage
To complement the DTS stage, where both equivalent and worse solutions are accepted, the descent search stage is subsequently applied to perform a more intensified examination of candidate solutions. Basically, this stage iteratively selects a conflict-free vertex and inserts it into the solution until no such vertex exists anymore. Figure 7 shows an example of the descent search stage, taking the solution of Fig. 6B as the input solution, where C_0 = {c, b, e}, U_0 = π = {d, a} and V_0 = {a, b, c, d, e}. Through computation, N_I(π) = {e}. As shown in Fig. 7, the vertex e from N_I(π) is directly inserted into π, and the best solution π* is updated to {d, e, a}.
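The descent stage can be sketched directly on top of the INSERT machinery; every accepted move grows π by one vertex, so it strictly improves f(π). Helper names and the toy graph are ours:

```python
def conflicts(pi, edges):
    pos = {v: i for i, v in enumerate(pi)}
    return sum(1 for u, v in edges
               if u in pos and v in pos and pos[v] < pos[u])

def try_insert(pi, v, edges):
    pos = {u: i for i, u in enumerate(pi)}
    in_pos = [pos[u] for u, w in edges if w == v and u in pos]
    out_pos = [pos[w] for u, w in edges if u == v and w in pos]
    i_minus = max(in_pos) + 1 if in_pos else 0
    i_plus = min(out_pos) if out_pos else len(pi)
    return min((pi[:i] + [v] + pi[i:] for i in {i_minus, i_plus}),
               key=lambda p: conflicts(p, edges))

def descent(pi, C0, edges):
    """Keep inserting conflict-free vertices until N_I(pi) is empty."""
    while True:
        N_I = [v for v in sorted(C0)
               if conflicts(try_insert(pi, v, edges), edges) == 0]
        if not N_I:
            return pi, C0
        v = N_I[0]                       # any conflict-free vertex works
        pi = try_insert(pi, v, edges)
        C0.remove(v)

pi, C0 = descent(["a"], {"b", "c"}, [("a", "b"), ("b", "c")])
print(pi)  # -> ['a', 'b', 'c']
```

On this acyclic toy graph, the descent empties C_0 entirely, which is the best possible outcome (an empty FVS for the reduced graph).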

Perturbation procedure
As described in "The Dynamic Thresholding Search Stage", the thresholding search accepts worse solutions that are within a certain quality threshold of the current solution, which helps release the search from local optimum traps. However, this strategy may fail. Therefore, we introduce a perturbation strategy that comes into effect when the search falls into deep stagnation (i.e., the best solution does not change after x consecutive local search rounds). The perturbation strategy incorporates a learning mechanism that gathers move frequency information from the local search, which is then used to guide the perturbation. Algorithm 6 displays the perturbation procedure, which is decomposed into two steps:
Step 1: choose and sort L vertices of π. Choose the L vertices of π with the highest move frequencies and sort them in non-increasing order of frequency into a list A (lines 1-3, Algorithm 6). The move frequency of a vertex v is the number of times v has been moved during the local search; it is initially set to 0 and increases by 1 each time v is moved from one set to another.
Step 2: drop and insert the to-be-perturbed vertices. Each vertex v ∈ A is dropped, and its order j in A is recorded (line 6, Algorithm 6). If j > ⌈β₁ × (|π| + 1)⌉ and N_I(π) ≠ ∅, a vertex u is randomly selected from N_I(π) and inserted into π (lines 8-9, Algorithm 6). Recall that N_I(π) represents the set of vertices of C_0 satisfying the condition g(π ⊕ ⟨v, i⁻(v), i⁺(v)⟩) = 0. Figure 8 shows an example of the learning-based perturbation applied to the local optimal solution of Fig. 7, where C_0 = {c, b} and U_0 = π = {d, e, a}. Suppose L = 2 and the chosen vertices are sorted as A = {e, a} according to the move frequencies. The first vertex e is dropped, which leads to an intermediate perturbed solution π = {d, a}. Then, the second vertex a is also dropped. Since the order of a is 2, which is greater than ⌈β₁ × (|π| + 1)⌉ = 1, and N_I(π) ≠ ∅, the vertex b is selected randomly from N_I(π) and inserted into π, giving the perturbed solution π = {b, d}.
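The two steps can be sketched as follows on an arbitrary toy graph. All helper names are ours, and the exact moment at which |π| is read for the cutoff ⌈β₁ × (|π| + 1)⌉ is our assumption (here it is fixed once, before any drop):

```python
import math
import random

def conflicts(pi, edges):
    pos = {v: i for i, v in enumerate(pi)}
    return sum(1 for u, v in edges
               if u in pos and v in pos and pos[v] < pos[u])

def try_insert(pi, v, edges):
    pos = {u: i for i, u in enumerate(pi)}
    in_pos = [pos[u] for u, w in edges if w == v and u in pos]
    out_pos = [pos[w] for u, w in edges if u == v and w in pos]
    i_minus = max(in_pos) + 1 if in_pos else 0
    i_plus = min(out_pos) if out_pos else len(pi)
    return min((pi[:i] + [v] + pi[i:] for i in {i_minus, i_plus}),
               key=lambda p: conflicts(p, edges))

def perturb(pi, C0, freq, L, beta1, edges, rng):
    """Drop the L most frequently moved vertices of pi; past the first
    ceil(beta1 * (|pi| + 1)) drops, follow each drop by re-inserting a
    random conflict-free vertex if one exists."""
    A = sorted(pi, key=lambda v: freq[v], reverse=True)[:L]
    cutoff = math.ceil(beta1 * (len(pi) + 1))
    for j, v in enumerate(A, start=1):
        pi = [u for u in pi if u != v]           # DROP v
        C0.add(v)
        if j > cutoff:
            N_I = [u for u in sorted(C0)
                   if conflicts(try_insert(pi, u, edges), edges) == 0]
            if N_I:
                u = rng.choice(N_I)              # random re-insertion
                pi = try_insert(pi, u, edges)
                C0.remove(u)
    return pi, C0

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]   # assumed graph
freq = {"a": 5, "b": 3, "d": 1, "c": 0}                    # assumed counts
pi, C0 = perturb(["a", "b", "d"], {"c"}, freq, L=3, beta1=0.3,
                 edges=edges, rng=random.Random(0))
```

The perturbed solution remains conflict-free but lands in a different region of the search space, which is exactly what the next local search round needs.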

Recovery procedure
This is the reverse of the reduction procedure. It restores the original graph G = (V, E) from the reduced graph G = (V_0, E_0) by adding back the removed uncritical vertices U_r and the critical vertices C_r. Levy & Low (1988) showed that a minimum FVS of the original G is the union of a minimum FVS of the reduced graph and C_r. Figure 9 depicts an example showing how the minimum FVS is determined. In the reduction procedure, C_r = {h} and U_r = {f, g, i, j}. After the search stage, the best solution found is C*_0 = {b, c}. After the recovery procedure, the FVS of the original G is C = C*_0 ∪ C_r = {b, c, h}.

Computational complexity and discussion
We consider first the greedy initialization procedure, which consists of two stages. The first stage initializes the array N_I(π), which can be realized in O(|V_0|); the complexity of updating N_I(π) is O(d²_max). The second stage constructs the initial solution π, which is bounded by O(|π| × d²_max), where |π| is the size of π. Therefore, the time complexity of the greedy initialization procedure is O(|V_0| + |π| × d²_max). Next, the local search and perturbation procedures in the main loop of the IDTS algorithm are considered. In each iteration of the local search, the dynamic thresholding search and the descent search stages are performed alternately; the former is realized in O(|π| × d_max + |V_0|) and the latter in O(|V_0|). Thus, the complexity of the local search procedure is O(K_1 × (|π| × d_max + |V_0|)), where K_1 is the number of iterations of the local search. The perturbation procedure can be achieved in O(|π| × (β₁ + β₂ × d²_max)), which is much smaller than that of the local search. Therefore, the complexity of one iteration of the main loop of the IDTS algorithm is O(K_1 × (|π| × d_max + |V_0|)), while that of SA (the simulated annealing algorithm of Galinier, Lemamou & Bouzidi, 2013) is O(K_2 × (d²_max + |V_0|)), where K_2 is the number of iterations during each temperature period. It can be seen that the two complexities are of the same order of magnitude.

EXPERIMENTAL RESULTS AND COMPARISONS
We test the proposed IDTS algorithm for the FVSP on 71 commonly-used benchmark instances from the literature and 30 large instances generated by this work ("Benchmark Instances") and compare its results with the state-of-the-art algorithms in "Comparison with State-of-the-Art Results". In addition to these directed instances, we also present comparative results on undirected graphs obtained by a slightly adapted version of the IDTS algorithm ("Comparative Results on Undirected Graphs"). Below, we first present the 101 directed graphs as well as the experiment settings.

Benchmark instances
We use 101 benchmark instances, which are classified into five categories. No optimal solutions are known for the instances of the first to fourth categories, while optimal solutions are known for the instances of the fifth category.
1. The first category consists of 40 instances that are randomly generated by Pardalos, Qian & Resende (1998) using the FORTRAN random graph generator mkdigraph.f (http://mauricio.resende.info/data/index.html). The names of these instances have the form P|V|-|E|*, where |V| ∈ {50, 100, 500, 1,000} is the number of vertices of the graph and |E| ∈ [100, 30,000] is the number of edges. Given the numbers of vertices and edges, a graph is built by randomly selecting |E| pairs of vertices as the two endpoints of directed edges. These instances are widely tested in the literature on the FVSP (Galinier, Lemamou & Bouzidi, 2013; Zhou, 2016).
2. The second category is composed of 10 random directed graphs, which are generated in the same way as the first category except that the in-degree and out-degree of each vertex are no more than 10. These instances have names of the form R|V|-|E|*. The number of vertices |V| is in the interval [100, 3,000], and the number of edges |E| in [500, 15,000].
3. The third category contains 10 artificially generated scale-free instances. These instances are generated by this work through the "powerlaw_cluster_graph" function of the "NetworkX" package, which is based on the algorithm proposed by Holme & Kim (2002). These instances are named as S|V | − |E|*, where |V | is in the interval [500, 3,000] and |E| is in [4,900, 29,900].
4. The fourth category is composed of 10 real-world instances from the Stanford large network dataset collection (http://snap.stanford.edu/data/). Nine of these instances are snapshots of the Gnutella peer-to-peer file sharing network. The remaining instance is a temporal network representing Wikipedia users editing each other's Talk pages. The number of vertices |V| is in the interval [6,301, 1,140,149], and the number of edges |E| in [20,777, 7,833,140].
5. The fifth category is composed of the 31 classical (easy) ISCAS89 benchmark instances derived from digital sequential circuits (Brglez, Bryan & Kozminski, 1989). These instances have s* in their names, with the number of vertices in the range [3, 1,728] and the number of edges in the range [4, 32,774]. These instances, whose optima are known, are widely tested in the literature on the FVSP (Levy & Low, 1988; Lin & Jou, 1999; Orenstein, Kohavi & Pomeranz, 1995).
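As an illustration, the uniform generation scheme described for the first category (repeatedly select a random ordered pair of distinct vertices until |E| distinct directed edges exist) can be sketched as follows. The function name and seed handling are our own; this is not the original mkdigraph.f generator.

```python
import random

def random_digraph(n_vertices, n_edges, seed=0):
    """Build a random simple directed graph by sampling ordered pairs of
    distinct vertices until n_edges distinct edges have been collected."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        u = rng.randrange(n_vertices)
        v = rng.randrange(n_vertices)
        if u != v:             # forbid self-loops
            edges.add((u, v))  # set membership forbids duplicate edges
    return sorted(edges)

# A graph of the same size as the smallest P-instances (50 vertices, 100 edges).
edges = random_digraph(50, 100)
```

The second-category instances follow the same scheme with an additional rejection step that caps every in-degree and out-degree at 10.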

Experiment settings
The IDTS algorithm is programmed in C++ and compiled by GNU g++ 4.1.2 with the -O3 flag. Experiments are carried out on a computer with an Intel(R) Core(TM)2 Duo CPU T7700 2.4 GHz processor and 2 GB RAM, running CentOS Linux release 7.9.2009 (Core).

Parameters
The IDTS algorithm requires five parameters: the maximum non-improving iteration depth ω of local search, the tabu tenure tt, the first perturbation strength coefficient β1, the second perturbation strength coefficient β2, and the thresholding coefficient δ. To tune these parameters, the "IRACE" package (López-Ibáñez et al., 2016) was adopted to automatically identify a group of appropriate values on eight representative instances (with 50-30,000 vertices); its budget was set to 200 runs under the cutoff times described in "Stopping Conditions". Table 1 presents both the candidate values and the final tuned values of these parameters.
These parameter values constitute the default setting of the IDTS algorithm and were used consistently in all our experiments to ensure a meaningful comparative study. Better results could likely be obtained by fine-tuning some parameters on an instance-by-instance basis.

Reference algorithms
Three state-of-the-art FVSP algorithms are adopted as reference methods to evaluate the IDTS algorithm for directed graphs.
Among them, the code of BPD was kindly provided by its author and was run by us under the same experimental conditions as the IDTS algorithm for a fair comparison. We also carefully re-implemented the Red+SA algorithm (Galinier, Lemamou & Bouzidi, 2013), since its code is unavailable. We used the re-implemented Red+SA algorithm (Re-Red+SA) to solve the instances of categories two to five and cited the results of Galinier, Lemamou & Bouzidi (2013) for the first category. Galinier, Lemamou & Bouzidi (2013) used a computer with an Intel(R) Core(TM)2 CPU T8300 at 2.4 GHz and 2 GB of RAM, which is comparable to our Intel computer running at 2.40 GHz.

Stopping conditions
Cutoff time of each run. The reference algorithms BPD (Zhou, 2016) and SA (Galinier, Lemamou & Bouzidi, 2013) have different stopping conditions; we therefore adopted their average computation times as the cutoff times for our IDTS algorithm for fairness. Following Galinier, Lemamou & Bouzidi (2013), for the instances of the first category, the cutoff time is set to 0.03 to 0.07 s for n = 50, 0.06 to 0.34 s for n = 100, 1.8 to 5.2 s for n = 500, and 11 to 25.5 s for n = 1,000. For the second and third categories, the cutoff time is set to 1,200 s. For the fourth category, it is set to 6,000 s for all compared algorithms. For the easy fifth category, it is set to 15 s.
Relaxed test. The SA algorithm (i.e., the Red+SA algorithm without the reduction procedure) (Galinier, Lemamou & Bouzidi, 2013) was run 1,000 times on each instance; under this condition, it reported the currently best objective values for the benchmark instances. Following Galinier, Lemamou & Bouzidi (2013), we also ran IDTS 1,000 times on each instance of the first category under the same stopping conditions.

Comparison with state-of-the-art results
Comparison of the results on the first-category instances
Table 2 displays the results of the Red+SA, BPD, and IDTS algorithms on the 40 commonly used instances of the first category. The first three columns give the name, the number of vertices, and the number of edges of each instance. Columns 4-7 provide the results of Red+SA on each instance: the best objective value (Best) over 30 independent runs, the worst result (Worst), the average result (Avg), and the cutoff time (in seconds). Columns 8-15 report the results of the BPD and IDTS algorithms: the best, worst, and average objective values, and the average computation time (in seconds) to obtain the best result (t(s)). The last two columns (Δ1 and Δ2) indicate the difference between our best results (Best) and those of Red+SA and BPD (a negative value indicates an improved result). The row "p-value" verifies the statistical significance of the comparison between IDTS and the reference algorithms; it comes from the non-parametric Friedman test applied to the best, worst, and average values of IDTS and the reference algorithms. A p-value less than 0.05 indicates a statistically significant difference. Moreover, the rows #Better, #Equal, and #Worse indicate the number of instances for which Red+SA and BPD obtained a better, equal, and worse result than the IDTS algorithm for each performance indicator. The bold entries highlight the dominating results between the compared algorithms in terms of the Best, Worst, and Avg values.
We observe from Table 2 that IDTS performs satisfactorily and dominates the Red+SA algorithm by obtaining better best results for 10 instances (see the negative entries in column Δ1) and equally good results for the remaining 30 instances. IDTS also obtains better worst and average results. As for BPD, in terms of the best results, IDTS obtains 16 better (see the negative entries in column Δ2) and 24 equal values; in terms of the worst and average results, IDTS obtains better values for all instances. The small p-values (<0.05) confirm the statistical significance of the reported differences between IDTS and the reference algorithms.
In Galinier, Lemamou & Bouzidi (2013), SA (i.e., the Red+SA algorithm without the reduction procedure) reported several improved results over 1,000 runs compared to the results of Red+SA in Table 2. Similarly, the IDTS algorithm was run 1,000 times, and the comparative results of SA and IDTS are shown in Table 3, where the last column (Δ) shows the difference between the best results of IDTS (Best) and those of SA (a negative value indicates a better result). It reveals that IDTS further improves on the results of SA and discovers six record-breaking results (indicated in bold) for the instances P500-2000, P500-2500, P1000-3000, P1000-3500, P1000-4000 and P1000-5000.

Comparison of the results on the second-category instances
Table 4 shows the comparative results between IDTS and the reference algorithms on the 10 instances of the second category. In terms of the best results, IDTS dominates Re-Red+SA by obtaining better values for all instances, and BPD by obtaining 7 better and 3 equal results. On the other hand, IDTS significantly outperforms Re-Red+SA and BPD in terms of the worst and average results by obtaining better or equal results for all instances (except for R3000-15000). The small p-values (<0.05) indicate that there are significant differences between our best results and those of the two reference algorithms Re-Red+SA (p-value = 1.60E−03) and BPD (p-value = 8.20E−03). Furthermore, Fig. 10 compares the performance of the IDTS algorithm with that of the Re-Red+SA and BPD algorithms on these instances. Figure 10A presents the relationship between the number of vertices and the best FVS size (the best objective value over 30 runs), while Fig. 10B shows the relationship between the number of vertices and the average computation time. One observes that the FVS size increases linearly while the average computation time increases exponentially with the number of vertices.
Comparison of the results on the third-category instances
Table 5 presents the comparative results of IDTS with the reference algorithms Re-Red+SA and BPD on the instances of the third category. As shown in Table 5, IDTS outperforms Re-Red+SA by obtaining better results for all instances in terms of the best, worst, and average results. Compared with BPD, IDTS obtains seven better, two equal, and one worse value in terms of the best results; seven better, one equal, and two worse values in terms of the worst results; and six better, one equal, and three worse values in terms of the average results. Finally, the p-values smaller than 0.05 indicate that IDTS significantly dominates each reference algorithm in terms of the best results.

Comparison of the results on the fourth-category instances
The comparative results of IDTS and the reference algorithms Re-Red+SA and BPD on the fourth category are summarized in Table 6. It can be seen that IDTS outperforms the reference algorithms on these instances. Compared with Re-Red+SA, IDTS obtains nine better and one equal result in terms of the best results, and better worst and average values for all instances. Compared with BPD, IDTS obtains five better, four equal, and one worse value in terms of the best results, and four better, one equal, and five worse values in terms of the worst and average results. The p-value of 2.70E−03 between IDTS and Re-Red+SA in terms of the best results indicates that there are significant differences between their results.
Results on the ISCAS89 benchmark instances
Table 7 shows the results of IDTS on the classical ISCAS89 benchmark instances. The instances with known optimal values were solved exactly by the branch-and-bound algorithm (H8WR) combined with eight reduction operations (Lin & Jou, 1999) and are indicated by asterisks (*). It can be observed that IDTS easily reaches the optimal solutions for these instances, while Re-Red+SA and BPD miss the two optimal solutions indicated in boldface.
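For reference, whether a reported vertex set is indeed an FVS can be verified independently of any solver: remove the set and run a topological sort (Kahn's algorithm) on the surviving subgraph, which succeeds exactly when that subgraph is acyclic. The following is a minimal sketch with names of our own choosing, not code from the IDTS implementation:

```python
from collections import deque

def is_feedback_vertex_set(n, edges, fvs):
    """Return True iff removing the vertices in fvs leaves the directed
    graph (vertices 0..n-1, edge list `edges`) acyclic, checked by
    running Kahn's topological sort on the surviving subgraph."""
    removed = set(fvs)
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        if u not in removed and v not in removed:
            succ[u].append(v)
            indeg[v] += 1
    queue = deque(v for v in range(n) if v not in removed and indeg[v] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Every surviving vertex was topologically ordered => no cycle remains.
    return seen == n - len(removed)
```

For a 3-cycle 0→1→2→0, any single vertex is a valid FVS, while the empty set is not.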

Comparative results on undirected graphs
To make our IDTS algorithm applicable to undirected graphs, we modified the neighborhood condition that the number of conflicts equals 0 (as described in "Preliminaries") to the constraint that for any vertex v ∈ π, at most one neighbor u ∈ π of v appears before v. For our comparative study, we carefully re-implemented the SALS algorithm (Qin & Zhou, 2014), as its code is unavailable. We regenerated 20 instances with the same characteristics using the generation method of Qin & Zhou (2014). These instances have ER* or RR* in their names, where the number of vertices is 100,000 and the number of edges is in the range [100,000, 1,000,000]. The cutoff time is set to 6,000 s and both algorithms were run 30 times per instance. Table 8 displays the results of the SALS and IDTS algorithms on the 20 regenerated instances. Columns 1-3 show the name, the number of vertices, and the number of edges of each instance. Columns 4-11 provide the results of SALS and IDTS on each instance: the best objective value (Best) over 30 independent runs, the worst result (Worst), the average result (Avg), and the average computation time (in seconds) to obtain the best result (t(s)). The last column (Δ) indicates the difference between our best results (Best) and those of SALS (a negative value indicates an improved result). The row "p-value" verifies the statistical significance of the comparison between IDTS and the reference algorithm; it comes from the non-parametric Friedman test applied to the best, worst, and average values of the two compared algorithms.
Moreover, the rows #Better, #Equal, and #Worse indicate the number of instances for which SALS obtained a better, equal, and worse result compared with the IDTS algorithm for each performance indicator. The bold entries highlight the dominating results between the compared algorithms in terms of the Best, Worst and Avg values.
The results indicate that our algorithm dominates the SALS algorithm (Qin & Zhou, 2014) by obtaining 19 better and one equal value in terms of the best, worst, and average results. The small p-values (<0.05) indicate that there are significant differences between our results and those of the reference algorithm SALS. This experiment demonstrates that the proposed algorithm is not only competitive for directed graphs, but also performs very well on the undirected variant of the problem.
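The modified condition for undirected graphs (each vertex of the ordering has at most one neighbor placed before it) can be checked directly: since every kept edge is then charged to its later endpoint at most once, the kept subgraph is a forest and hence acyclic. The checker below is our own illustrative sketch, not part of the IDTS code:

```python
def valid_undirected_order(order, adj):
    """Check that every vertex in `order` has at most one neighbor placed
    before it; any cycle would give its last vertex (in the ordering) two
    earlier neighbors, so passing this check guarantees acyclicity."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = sum(1 for u in adj.get(v, ()) if u in pos and pos[u] < pos[v])
        if earlier > 1:
            return False
    return True

# Triangle a-b-c: keeping all three vertices violates the condition
# (the last vertex has two earlier neighbors); dropping any one vertex
# into the FVS satisfies it.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
```

Vertices absent from `order` are implicitly in the feedback vertex set.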

ANALYSIS
This section conducts extra tests to analyze the advantages of two important components of the proposed IDTS algorithm: the thresholding coefficient and the perturbation strategy.
Effects of the thresholding coefficient
IDTS adopts the thresholding strategy described in "The Dynamic Thresholding Search Stage" to search both equivalent and better solutions. The oscillation between equivalent and better zones follows the increase/decrease of the thresholding coefficient δ (≥ 1) by an adjustment value. We therefore analyze the effects of the thresholding coefficient by testing five candidate adjustment values: 1, 2, 3, 4, 5 for the instances with |V| ≤ 1,000, and 5, 10, 15, 20, 25 for the instances with |V| > 1,000 (the higher the value, the larger the oscillation between equivalent and better solutions). Figure 11 shows box-and-whisker plots of the results on eight representative instances with different numbers of vertices, where the X-axis refers to the tested adjustment values and the Y-axis stands for the best objective values obtained. As a complement, we also calculate the p-values for each tested instance. Results are from 20 independent runs per instance, each with a cutoff time as described in "Stopping Conditions". We observe that the adjustment value greatly affects the performance of the IDTS algorithm for most instances, except two (P500-7000 and p2p-Gnutella25). Moreover, the IDTS algorithm performs best with the adjustment value 1 on instances with 50 ≤ |V| ≤ 100, with the adjustment value 4 on instances with 500 ≤ |V| ≤ 1,000, with the adjustment value 10 on instances with 1,000 < |V| ≤ 3,000, and with the adjustment value 20 on instances with |V| > 3,000. Finally, the results of this experiment are consistent with the intuitive understanding that the higher the adjustment value, the more frequent the oscillation of the search between the current configuration and new configurations.
That is, large instances require large adjustment values to explore more new areas, while small instances require small adjustment values to fully explore each search area.
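Under one simple reading of such a thresholding scheme, a candidate solution is accepted whenever its objective value does not exceed a threshold tied to δ times the best value found so far. The sketch below is an illustrative approximation of threshold accepting in this style, not the exact IDTS rule (whose schedule for updating δ is described in the paper's algorithm section):

```python
def accepts(f_candidate, f_best, delta):
    """Illustrative threshold acceptance rule: keep a candidate whose
    objective does not exceed delta * f_best. With delta = 1 only
    equivalent-or-better solutions pass; a larger delta admits worse
    solutions, enlarging the zone the search may traverse."""
    return f_candidate <= delta * f_best
```

Oscillating δ above and back down to 1 thus alternates between an exploration phase (worse solutions tolerated) and a consolidation phase (only equivalent or improving solutions kept), which matches the equivalent/better zone oscillation described above.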

Effects of the perturbation operation
To evaluate the perturbation strategy of the proposed algorithm, we create two algorithmic variants (IDTS1 and IDTS2) in which the perturbation strategy visits only feasible solutions. For IDTS, the perturbation first drops β1 × |π| vertices with the highest move frequency, and then applies both DROP and INSERT moves to the next β2 × |π| most frequently displaced vertices. For IDTS1, the perturbation strategy is disabled (i.e., by removing line 7 in Algorithm 1). For IDTS2, the perturbation strategy only adopts the DROP move (by disabling lines 8-10 in Algorithm 6). A total of 20 relatively difficult instances are selected according to the results in Tables 2 and 3, that is, instances whose best results could not be attained by all algorithms. We ran IDTS, IDTS1, and IDTS2 10 times on each selected instance under the same stopping conditions as before. Table 9 displays the experimental results. The rows #Better, #Equal, and #Worse show the number of instances for which IDTS1 and IDTS2 achieved a better, equal, or worse result than the IDTS algorithm for each performance indicator.
Even though IDTS and IDTS1 obtain 10 equal results, the former achieves 10 better results (against 0 for IDTS1). The small p-values (<0.05) in terms of Best and Avg confirm that the reported differences between IDTS and IDTS1 are statistically significant. This experiment shows that the adopted perturbation strategy is an important diversification mechanism that enables the algorithm to better explore the search space. IDTS and IDTS2 obtain 11 equal results, while the former achieves nine better results than the latter. The small p-value (<0.05) indicates that IDTS is better than IDTS2. These observations indicate that adopting both DROP and INSERT operations in the perturbation procedure enables the algorithm to reach better performance.
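The frequency-guided target selection of the full perturbation (drop the β1 × |π| most frequently moved vertices, then apply DROP and INSERT moves to the next β2 × |π|) can be sketched as follows. This reproduces only the selection step, with names of our own; the move operators themselves are defined in the paper's algorithms:

```python
def perturbation_targets(pi, move_freq, beta1, beta2):
    """Rank the vertices of the current solution pi by descending move
    frequency and return two groups: the first beta1*|pi| vertices
    (to be dropped outright) and the next beta2*|pi| vertices (to
    receive both DROP and INSERT moves)."""
    ranked = sorted(pi, key=lambda v: move_freq.get(v, 0), reverse=True)
    k1 = int(beta1 * len(pi))
    k2 = int(beta2 * len(pi))
    return ranked[:k1], ranked[k1:k1 + k2]

# Toy example: 10 vertices, where vertex 9 has been moved most often.
pi = list(range(10))
freq = {v: v for v in pi}
drop, mixed = perturbation_targets(pi, freq, 0.2, 0.3)
```

Targeting the most frequently displaced vertices concentrates the perturbation on the parts of the solution where the search has been oscillating, which is the short-term learning aspect discussed in the conclusions.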

CONCLUSIONS
An efficient stochastic local search algorithm, IDTS, was proposed to find a minimum feedback vertex set in graphs. It begins with a low-complexity greedy initialization procedure, and alternates between a thresholding search stage and a descent stage. The IDTS algorithm has two innovative components: the solution-accepting strategy used in the thresholding search stage and the frequency-guided strategy in its perturbation procedure. The thresholding search stage involves an adjustable thresholding parameter δ that controls the search behavior and algorithm performance. Since fine-tuning this parameter for a given problem instance can yield better solutions, it will be meaningful to study self-adaptive mechanisms that automatically adjust this parameter during the search.
Experimental evaluations on 101 diverse graphs demonstrated the advantage of IDTS over the state-of-the-art SA (Galinier, Lemamou & Bouzidi, 2013) and BPD (Zhou, 2016) algorithms. In particular, it discovered 24 new best-known results (improved upper bounds) and reached the best-known or known optimal results on 75 other graphs. We also applied our algorithm to the undirected variant of the problem and showed its competitiveness against the SALS algorithm (Qin & Zhou, 2014). Besides, we conducted experiments to understand how each ingredient of IDTS (the thresholding strategy and the short-term, learning-based perturbation) contributes to the algorithm's performance.
Finally, it will be of interest to study the proposed framework for other critical vertex problems, such as the critical node detection (Béczi & Gaskó, 2021) and finding the nodes with the highest betweenness-centrality scores (Mirakyan, 2021).