The follower optimality cuts for mixed integer linear bilevel programming problems

We study linear bilevel programming problems, where (some of) the leader and the follower variables are restricted to be integer. A discussion on the relationships between the optimistic and the pessimistic setting is presented, providing necessary and sufficient conditions for them to be equivalent. A new class of inequalities, the follower optimality cuts, is introduced. They are used to derive a single-level non-compact reformulation of a bilevel problem, both for the optimistic and for the pessimistic case. The same is done for a family of known inequalities, the no-good cuts, and a polyhedral comparison of the related formulations is carried out. Finally, for both the optimistic and the pessimistic approach, we present a branch-and-cut algorithm and discuss computational results.


Introduction
Sara Mattia (sara.mattia@iasi.cnr.it), Consiglio Nazionale delle Ricerche, Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti", Via dei Taurini 19, 00185 Rome, Italy

The bilevel programming framework corresponds to a hierarchical decision-making process involving two players: an upper level player, or leader, and a lower level one, or follower. Each player controls some of the variables, has some constraints to satisfy and an objective function to optimize. The follower decisions depend on the choices made by the leader (and vice versa), resulting in an optimization problem with a nested structure. This hierarchical framework naturally applies to many real-life problems, where several stakeholders, with possibly conflicting interests, are commonly involved in the decision process. Bilevel aspects are also hidden in single-level programming: the Benders decomposition procedure (Benders 1962) derives from a bilevel interpretation of a single-level problem; some separation problems can be formulated as bilevel problems (Lodi et al. 2014; Mattia 2012, 2013); and the idea of worst-case realization in robust optimization (Ben-Tal et al. 2004; Bertsimas and Sim 2003) corresponds to a nested optimization problem, as in bilevel programming. The mixed integer linear bilevel programming problem LBP that we consider in this paper is reported below.

$$
\text{LBP}\qquad \min_{x,\,\zeta}\; c^T x + d^T \zeta \quad \text{s.t.}\quad Ax + L\zeta \ge b, \qquad \zeta \in Y(x), \qquad x_i \text{ integer},\ i \in N
$$
Vector $x \in \mathbb{R}^{n_x}$ contains the leader variables and vector $y \in \mathbb{R}^{n_y}$ the follower ones. Sets $N$ and $F$ include the indices of the leader and the follower variables, respectively, that are restricted to take integer values. Matrices $A \in \mathbb{R}^{m_l \times n_x}$, $L \in \mathbb{R}^{m_l \times n_y}$ and vector $b \in \mathbb{R}^{m_l}$ define the leader constraints. Matrices $G \in \mathbb{R}^{m_f \times n_x}$, $H \in \mathbb{R}^{m_f \times n_y}$ and vector $r \in \mathbb{R}^{m_f}$ correspond to the follower ones. Both the leader and the follower variables can appear in both the leader and the follower constraints. Vector $\zeta \in \mathbb{R}^{n_y}$ represents, for a given $x$ chosen by the leader, the corresponding follower solution, if one exists. This solution must be optimal for the lower level problem for $x$, or follower problem, below.
$$
\text{FOLL}(x)\qquad \min_{y}\; f^T y \quad \text{s.t.}\quad Hy \ge r - Gx, \qquad y_i \text{ integer},\ i \in F
$$
The right-hand sides of the follower problem depend on the current values of the leader variables, which are constants as far as the follower is concerned. If FOLL($x$) admits an optimal solution, $\mathrm{opt}(x)$ is the corresponding optimal value and $Z(x)$ is the set of the optimal solutions, to which the selected $\zeta$ must belong. Otherwise, if FOLL($x$) is infeasible or unbounded, that is, $Z(x) = \emptyset$, leader solution $x$ cannot be completed by a suitable follower solution and must be discarded. If $Z(x)$ contains more than one solution, either the leader or the follower must make a selection, based on some criteria or policies, which define the set $Y(x)$. That is, the bilevel problem actually becomes a three-level problem, with a leader level, a policy level and a follower level (Zeng 2020).
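As a concrete illustration of $\mathrm{opt}(x)$ and $Z(x)$, the sketch below solves FOLL($x$) by brute-force enumeration, assuming a tiny, purely integer follower problem with finite bounds. The helper `solve_follower` and the toy data are hypothetical and only meant for illustration; realistic instances require a MIP solver.

```python
from itertools import product


def solve_follower(x, f, H, G, r, y_bounds):
    """Brute-force FOLL(x): min f^T y  s.t.  H y >= r - G x,  y integer within bounds.

    Hypothetical helper for tiny, purely integer follower problems.
    Returns (opt(x), Z(x)); returns (None, []) when FOLL(x) is infeasible.
    """
    # Right-hand side r - G x: the leader decision x is a constant for the follower.
    rhs = [r_i - sum(g * x_j for g, x_j in zip(G_i, x)) for r_i, G_i in zip(r, G)]
    best, Z = None, []
    for y in product(*(range(lo, hi + 1) for lo, hi in y_bounds)):
        feasible = all(sum(h * y_j for h, y_j in zip(H_i, y)) >= rhs_i
                       for H_i, rhs_i in zip(H, rhs))
        if not feasible:
            continue
        value = sum(f_j * y_j for f_j, y_j in zip(f, y))
        if best is None or value < best:
            best, Z = value, [y]      # strictly better: restart the optimal set
        elif value == best:
            Z.append(y)               # tie: another optimal follower response
    return best, Z


# Toy instance: min y1 + y2  s.t.  y1 + y2 >= x1,  y in {0,1,2}^2
f, H, G, r = [1, 1], [[1, 1]], [[-1]], [0]
print(solve_follower([1], f, H, G, r, [(0, 2), (0, 2)]))  # (1, [(0, 1), (1, 0)])
```

For $x_1 = 1$ the follower has two optimal responses, so $|Z(x)| > 1$ and a tie-breaking policy is needed; for $x_1 = 5$ FOLL($x$) is infeasible and $Z(x) = \emptyset$.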
Two policies are considered for breaking ties when $|Z(x)| > 1$ for some $x$. In an optimistic setting, it is assumed that ties are broken in favor of the leader, that is, $\zeta$ must be chosen among the solutions $Y(x) = Y^o(x)$ that minimize $d^T y$ and have the property that $(x, y)$ satisfies the leader constraints. In a pessimistic setting, it is supposed that ties are broken against the leader, that is, $\zeta$ must have the property that $(x, \zeta)$ does not satisfy the leader constraints or, if no such $\zeta$ exists, it must be chosen among the solutions $Y(x) = Y^p(x)$ that maximize $d^T y$ (Lozano and Smith 2017). Hence, the pairs $(x, y)$ with $y \in Z(x)$ that do not satisfy the leader constraints have consequences that depend on the policy. In an optimistic setting, such a pair allows the leader to force the follower to discard $y$; in a pessimistic setting, it allows the follower to forbid leader solution $x$. In general, when $|Z(x)| > 1$ for some $x$, the optimistic and the pessimistic policy may lead to different solutions, even when $f = 0$ (see Sect. 2.4). However, we derive necessary and sufficient conditions for the optimistic and the pessimistic setting to be equivalent (see Sect. 2.3). These conditions are satisfied by a well-known class of bilevel programming problems, the interdiction ones (Mattia 2021; Smith and Song 2020; Tang et al. 2016). The same happens for the bilevel models arising from the single-level problems in Ben-Tal et al. (2004), Benders (1962), Lodi et al. (2014) and Mattia (2012, 2013) mentioned above. Another aspect to be considered is who breaks the ties, that is, whether the policy level is managed by the leader or by the follower. One can assume that the follower has complete information on the leader objective function and constraints and that ties are broken directly by the follower (Lozano and Smith 2017).
If the follower has no such information, ties must be broken by the leader (Bard and Moore 1990). In the optimistic setting, ties can be broken in two different ways: ex ante, using a feasibility argument, that is, eliminating the solutions not respecting the policy ($Y(x) = Y^o(x)$); or ex post, using an optimality argument, that is, without eliminating in advance the solutions not respecting the policy ($Y(x) = Z(x)$), because all solutions not belonging to $Y^o(x)$ are guaranteed to be either infeasible or suboptimal anyway (see Sect. 2.2). If optimistic LBP admits an optimal solution, the final outcome is independent of when ties are broken; otherwise, the problem may turn out to be infeasible or unbounded, depending on the choice that has been made for breaking ties (see Sect. 5.1). Ties cannot be broken ex post if the pessimistic policy is considered.
Below, we review some literature (Sect. 1.1) and discuss the contribution of the present manuscript (Sect. 1.2).

Literature review
Bilevel programming problems are very hard, both theoretically and computationally. Even when the leader and the follower variables are continuous ($N = F = \emptyset$), LBP is hard to solve in both the optimistic (Jeroslow 1985) and the pessimistic setting (Wiesemann et al. 2013). If the problem has continuous upper-level variables and integer lower-level ones ($N = \emptyset$, $|F| = n_y$), the optimum may not even be attained (Bard and Moore 1990; Köppe et al. 2010). In the optimistic case, if the lower level problem is a linear programming one ($F = \emptyset$), linear duality can be used to reduce LBP to a compact single-level (quadratically constrained or mixed integer) problem (Wen and Yang 1990). This is not possible, in general, for follower problems with some integer variables ($|F| > 0$), where linear duality cannot be used, and for pessimistic problems, which are significantly more complicated than their optimistic counterparts (Xu and Wang 2014). For general results on bilevel programming and additional references, we refer the reader to surveys (Colson et al. 2007; Dempe 2003; Liu et al. 2018; Vicente and Calamai 1994) and books (Bard 1998; Dempe 2002). Due to the nature of LBP, it is difficult to devise a general solution algorithm, and several different approaches have been proposed. The common idea behind these approaches is to use a branch-and-cut algorithm, which starts from a suitable single-level relaxation of the bilevel problem and eliminates the integer solutions that are not bilevel feasible by user-defined cutting planes or branching rules.
For single-level problems, the branch-and-cut approach (Padberg and Rinaldi 1991) is a well-known solution framework, which performs a branch-and-bound-type exploration of the solution space (Wolsey 1998) and possibly adds valid inequalities at each node of the branch-and-bound tree. The linear relaxation provides a valid bound on the solution of the integer problem. The purpose of the inequalities is to cut solutions of the linear relaxation of the current problem that do not belong to the integer hull. It may happen that a class of cuts is not able to separate a fractional vertex from the integer polyhedron, whereas another class is able to do that. There is a rich literature that investigates the relative strength of different valid inequalities and compares alternative formulations for the same integer problem.
In a bilevel context, most of these notions cannot be applied. For LBP, the problem obtained by relaxing the integrality restrictions on the variables does not provide a lower bound on the optimal value (Bard and Moore 1990). The only known single-level relaxation of LBP is the high point one (Bard and Moore 1990), which is obtained by removing the requirement that the follower solution must be optimal for FOLL($x$). An unbounded high point relaxation does not necessarily imply that the bilevel problem is unbounded as well (see Sect. 5.1). Bilevel cuts are used only on integer solutions; their purpose is not to strengthen the current formulation by eliminating fractional vertices, but to remove integer solutions that are not feasible for the bilevel problem. If an integer solution is infeasible, any class of bilevel cuts is supposed to eliminate it. Algorithms for bilevel problems either branch on fractional solutions or rely on standard single-level cuts for fractional vertices not belonging to the integer hull of the high point relaxation (DeNegre and Ralphs 2009; Fischetti et al. 2017). So far, no theoretical comparison of bilevel cuts and of the corresponding formulations has been given in the literature.
For the optimistic setting, the first approach for integer variables is outlined in Bard and Moore (1990). In DeNegre and Ralphs (2009), a branch-and-cut algorithm using no-good cuts to enforce bilevel feasibility in purely integer problems is proposed. In Xu and Wang (2014), the authors solve problems where $G$ is integer and develop an algorithm where bilevel infeasible integer solutions are eliminated by suitable branching rules. An enhanced branching rule is discussed in Liu et al. (2021). In Lozano and Smith (2017), a branch-and-cut approach based on the value function is presented; it requires indicator variables and the integrality of $Gx$ to enforce the bilevel feasibility of the produced solutions. In Wang and Xu (2017), the authors introduce an algorithm for problems with integer parameters and variables; it is based on the notion of scoop and uses a dedicated branching approach to eliminate bilevel infeasible points. In Fischetti et al. (2017), the authors define bilevel free sets, obtained by solving scoop problems, and use them within an algorithm based on intersection cuts, where some results require the integrality of $Gx + Hy - r$. Different cut categories are presented in Tahernejad et al. (2020). See Kleinert et al. (2021) for a survey. By contrast, very few algorithms have been implemented for the pessimistic setting, which has received less attention than the optimistic policy. Moreover, most papers focus on either one setting or the other, while the contributions that consider both policies and investigate the relationship between the two settings are very limited. No conditions for the two policies to be equivalent when $|Z(x)| > 1$ for some $x$ have been proposed so far in the literature. For theoretical studies on the pessimistic problem, see Dempe et al. (2014) and other papers by the same authors. While there exists a natural way to impose that an optimistic solution of a bilevel problem is selected (see Sect. 2.2), the same result does not apply to the pessimistic setting. This makes the pessimistic version of the problem harder to solve than the optimistic one (Xu and Wang 2014). Apart from Lozano and Smith (2017), none of the algorithms mentioned above considers the pessimistic case. An alternative approach for pessimistic LBP is proposed in Zeng (2020).

The contribution
The aim of the paper is to study bilevel mixed integer programming problems, both in the optimistic and in the pessimistic version. We consider problems where integer variables can appear in both the leader and the follower problem and, hence, the problem cannot be reformulated as a compact single-level one by using linear duality. The relationship between the optimistic and the pessimistic setting is investigated, and necessary and sufficient conditions identifying the cases where the two policies are equivalent, for problems with $|Z(x)| > 1$ for some $x$, are derived (see Sect. 2.3). We also show that, although possibly counterintuitive, having $f = 0$ ensures neither that the policy can be neglected nor that the bilevel problem can be transformed into a single-level problem for both policies (see Sect. 2.4). As far as we know, this is the first example of such an analysis in the literature.
For both policies, a non-compact single-level reformulation of LBP based on a new family of inequalities, the follower optimality cuts, is presented (see Sects. 3.2 and 4.2). These inequalities are bigM constraints obtained by binarizing the general integer leader variables (Roy 2007), if any (see Sect. 3.1). Binarization has known disadvantages, due to the increase in the number of variables. However, it has been proved that it may lead to stronger theoretical properties and to significantly smaller search trees than integer approaches (Bonami and Margot 2015; Dash et al. 2018). Both the no-good inequalities and the follower optimality cuts define single-level reformulations of the bilevel problem in the original space, if the variables are binary. A theoretical comparison of the corresponding formulations is presented, thus investigating the relative strength of the cuts. We prove that, in general, the follower optimality cuts do not dominate the no-good cuts, nor vice versa, but the no-good cuts are dominated by the cover inequalities of the follower optimality cuts (see Sect. 3.4). To our knowledge, this is the first theoretical comparison of bilevel cuts and formulations for integer problems in the bilevel literature. The cuts used in other approaches do not define a single-level reformulation of a bilevel problem in the original space. The inequalities in Lozano and Smith (2017) require indicator variables, while a known issue of the intersection cuts in Fischetti et al. (2017) is that they are not able to eliminate integer infeasible points on the frontier of the bilevel free set, unless some additional requirements are satisfied.
Finally, we describe a branch-and-cut algorithm to solve the problem (see Sect. 5). Since the cuts we use define single-level reformulations of the bilevel problem, we can adopt a standard branch-and-cut scheme that requires neither dedicated branching rules (Wang and Xu 2017; Xu and Wang 2014), nor branching on integer solutions (Bard and Moore 1990; Fischetti et al. 2017), nor access to the current basis (DeNegre and Ralphs 2009; Fischetti et al. 2017), nor any assumptions on the integrality of the coefficients (DeNegre and Ralphs 2009; Lozano and Smith 2017; Fischetti et al. 2017). We also show that, under some assumptions, the follower optimality cuts can be used to separate some fractional solutions as well, thus providing the first example of bilevel cuts able to do so (see Sect. 5.2). Computational tests are conducted to evaluate the proposed algorithm on instances from the literature. The results show that the algorithm based on the follower optimality cuts outperforms the other approaches on most of the tested sets of instances (see Sects. 6.4 and 6.5).
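As a rough illustration of how bilevel cuts remove integer solutions that are not bilevel feasible, the sketch below runs a toy row-generation loop. It is a simplified stand-in, not the branch-and-cut algorithm of Sect. 5: brute-force enumeration replaces the MIP solver (there is no LP relaxation and no branching), and each cut is stored as the condition "$x \neq s$ or $f^T y \le \mathrm{opt}(s)$", which is what a follower optimality cut enforces on integer points. The instance and the function names are hypothetical.

```python
from itertools import product

# Hypothetical toy instance: x in {0,1,2}, y in {0,1,2}^2.
X = range(3)
Y = list(product(range(3), repeat=2))
f = lambda y: y[0] + y[1]                  # follower objective f^T y
foll_ok = lambda x, y: y[0] + y[1] >= x    # follower constraint Hy >= r - Gx
lead_ok = lambda x, y: y[0] <= 1           # leader constraint Ax + Ly >= b
F = lambda x, y: -y[1]                     # leader objective c^T x + d^T y


def opt(x):
    """Optimal follower value opt(x), by enumeration."""
    return min(f(y) for y in Y if foll_ok(x, y))


def row_generation():
    cuts = []  # leader vectors s whose condition "x != s or f(y) <= opt(s)" is active
    while True:
        # Master problem: HPR plus the cuts generated so far, solved by enumeration.
        candidates = [(x, y) for x in X for y in Y
                      if foll_ok(x, y) and lead_ok(x, y)
                      and all(x != s or f(y) <= opt(s) for s in cuts)]
        if not candidates:
            return None, cuts
        x, y = min(candidates, key=lambda p: F(*p))
        if f(y) <= opt(x):      # y is optimal for FOLL(x): the pair is bilevel feasible
            return (x, y), cuts
        cuts.append(x)          # bilevel infeasible: cut it off and re-solve


best, cuts = row_generation()
print(best, cuts)  # (2, (0, 2)) [0, 1]
```

The first two master solutions are feasible for the high point relaxation but not bilevel feasible, so two cuts are generated before a pair with $y \in Z(x)$ is returned.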
Unlike the no-good cuts, the follower optimality cuts can also be used when the follower variables are continuous. In fact, similar inequalities are well known in the context of stochastic programming for problems where $x$ is binary and $y$ continuous (Laporte and Louveaux 1993). They have also been used for optimistic bilevel problems with binary leader variables and continuous follower ones arising from electricity market applications (see Kleinert and Schmidt 2019 and other papers by the same authors). The results in Kleinert and Schmidt (2019) show that, even for problems with purely continuous follower variables, where one could have chosen the compact reformulation via linear duality (Wen and Yang 1990), a branch-and-cut approach based on inequalities of this type is preferable.
In Sect. 2, we introduce some notation, make the necessary assumptions and investigate the relationship between the optimistic and the pessimistic setting. In Sects. 3 and 4, we define single-level reformulations for the optimistic and the pessimistic problem and theoretically compare the follower optimality cuts and the no-good ones. In Sect. 5, we outline the algorithm to solve LBP. In Sect. 6, we discuss the computational experience. Conclusions are given in Sect. 7.

The optimistic versus the pessimistic policy
The relevant notation used throughout the paper is summarized in Table 1. For a matrix $T$, $T_j$ denotes its $j$-th column and $T^i$ its $i$-th row. Let
$$
P = \{(x, y) : Ax + Ly \ge b,\ Gx + Hy \ge r,\ x_i \text{ integer for } i \in N,\ y_i \text{ integer for } i \in F\}
$$
be the set of the leader/follower pairs satisfying both the leader and the follower constraints. The single-level problem obtained by optimizing the leader objective function over $P$ is known as the high point relaxation (HPR) of LBP (Bard and Moore 1990).

$$
\text{HPR}\qquad \min\{c^T x + d^T y : (x, y) \in P\}
$$
The relationship between HPR and LBP is discussed in Sect. 5.1. Let $P_x = \{x : \exists\, y \text{ such that } (x, y) \in P\}$ be the projection of $P$ in the space of the $x$ variables. Denote by $P(x) = \{y : Hy \ge r - Gx,\ y_i \text{ integer for } i \in F\}$ the feasible region of the follower problem for a given leader solution $x$.
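A tiny example, solved by enumeration, shows why HPR is only a relaxation of LBP. The instance (leader objective $2x - y$, follower objective $y$, follower constraint $y \ge x$, no leader constraints, $x, y \in \{0, 1\}$) is made up for illustration; every $Z(x)$ is a singleton here, so the policy plays no role.

```python
X, Y = [0, 1], [0, 1]
leader_obj = lambda x, y: 2 * x - y      # c^T x + d^T y with c = 2, d = -1
follower_obj = lambda y: y               # f^T y with f = 1
in_P = lambda x, y: y >= x               # the only constraint (a follower one)


def hpr_value():
    """HPR: optimize the leader objective over P, ignoring follower optimality."""
    return min(leader_obj(x, y) for x in X for y in Y if in_P(x, y))


def bilevel_value():
    """LBP: only pairs (x, y) with y optimal for FOLL(x) are admissible."""
    best = None
    for x in X:
        feasible = [y for y in Y if in_P(x, y)]
        opt_x = min(follower_obj(y) for y in feasible)
        for y in feasible:
            if follower_obj(y) == opt_x:
                value = leader_obj(x, y)
                best = value if best is None else min(best, value)
    return best


print(hpr_value(), bilevel_value())  # -1 0
```

HPR picks $(x, y) = (0, 1)$, which the follower would never choose because $y = 0$ is cheaper for it; the bilevel optimum is $(0, 0)$ with value 0.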

Assumptions
We assume that all the variables have finite bounds or, equivalently, that $P$ and $P(x)$, for any $x$, are bounded sets. This ensures that the follower problem is not unbounded and that $Z(x)$ is a bounded set for any $x$ (see Sect. 2.2); it rules out an unbounded HPR, which is used as the initial formulation of our branch-and-cut approach; and it guarantees that the general integer variables can be binarized (see Sect. 3.1). What happens when this assumption is relaxed is discussed in Sect. 5.1. We also suppose that all the leader variables are integer, that is, $|N| = n_x$. This eliminates the possibility of an optimal value that cannot be attained, as in the examples in Bard and Moore (1990) and Köppe et al. (2010). The proposed approach can handle the presence of some continuous leader variables under Assumption 1 in Fischetti et al. (2017), which states that none of the continuous leader variables appears in the follower constraints. For the pessimistic case, we also require that no leader constraint contains both follower variables and continuous leader variables. Indeed, under these assumptions the continuous leader variables are irrelevant to the leader/follower relation, which is only affected by the integer ones. Formulations and algorithms based on the follower optimality cuts do not require any specific assumption on the follower variables, which can be integer ($|F| = n_y$), mixed integer ($0 < |F| < n_y$) or continuous ($F = \emptyset$). However, we do not consider the latter case in the experiments. By contrast, the no-good cuts require all the follower variables to be integer, that is, $|F| = n_y$.

Table 1 Notation
Symbol | Meaning | Section
FOLL($x$), $Z(x)$, $\mathrm{opt}(x)$ | The follower problem for $x$, its optimal solutions and its optimal value | Sect. 1
$P$, $P_x$ | The pairs $(x, y)$ satisfying the leader and the follower constraints; the projection of $P$ in the $x$ space | Sect. 2
$P(x)$ | The feasible region of the follower problem | Sect. 2
$u^x$, $\ell^x$, $u^y$, $\ell^y$ | Upper and lower bounds on the leader and follower variables | Sect. 3.1

Bilevel feasible solutions
Let $B = \{(x, y) \in P : y \in Z(x)\}$ be the set of the pairs $(x, y) \in P$ such that $y$ is lower level optimal for $x$.
Belonging to $B$ is, in general, a necessary but not sufficient condition for a pair $(x, y)$ to be a feasible solution of LBP. Indeed, as we discuss below, extra conditions are needed to ensure that the policy (optimistic or pessimistic) is enforced if $|Z(x)| > 1$ for some $x$. Denote by
$$
Z^F(x) = \{y \in Z(x) : Ax + Ly \ge b\}
$$
the set of the optimal solutions $y$ of FOLL($x$) such that $(x, y)$ is feasible for the leader constraints. Let
$$
Z^I(x) = Z(x) \setminus Z^F(x)
$$
be the solutions $y \in Z(x)$ with the property that $(x, y)$ does not satisfy $Ax + Ly \ge b$. $Z^I(x)$ and $Z^F(x)$ form a partition of $Z(x)$. Since $B \subseteq P$, any pair in $B$ must satisfy both the leader and the follower constraints. Then, pairs $(x, y)$ with $y \in Z^F(x)$ belong to $B$, whereas the ones with $y \in Z^I(x)$ do not belong to $B$. Define $Y^o(x)$ and $Y^p(x)$ as the following sets.
$$
Y^o(x) = \arg\min\{d^T y : y \in Z^F(x)\}, \qquad Y^p(x) = \arg\max\{d^T y : y \in Z^F(x)\}
$$
The set of the feasible solutions of the optimistic problem is defined as follows.
Definition 1 Let $\Omega$ be the set of the feasible solutions of optimistic LBP. Any $(x, y) \in \Omega$ must satisfy: $(x, y) \in B$; $y \in Y^o(x)$.
The set for the pessimistic problem is defined below.
Definition 2 Let $\Pi$ be the set of the feasible solutions of pessimistic LBP. Any $(x, y) \in \Pi$ must satisfy: $(x, y) \in B$; $Z^I(x) = \emptyset$; $y \in Y^p(x)$.
For the optimistic case, a non-empty $Z^F(x)$ ensures that there is at least one feasible pair $(x, y) \in \Omega$ for the considered leader vector $x$. This is not necessarily the case in the pessimistic setting. In fact, even when $Z^F(x) \neq \emptyset$ and, hence, $Y^p(x) \neq \emptyset$, no pair $(x, y)$ for the considered $x$ can belong to $\Pi$ if $Z^I(x)$ is non-empty. The presence of an optimal solution $y$ of FOLL($x$) such that $(x, y)$ does not satisfy the leader constraints completely forbids the choice of that vector $x$ (Lozano and Smith 2017).
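The sketch below, a brute-force illustration with made-up data, splits a given $Z(x)$ into $Z^F(x)$ and $Z^I(x)$ and then selects $Y^o(x)$ and $Y^p(x)$ by minimizing and maximizing $d^T y$ among the leader-feasible optimal responses, following the verbal definitions of the two policies. The helper name `tie_sets` is hypothetical.

```python
def tie_sets(x, Z, A, L, b, d):
    """Split Z(x) into Z^F(x) / Z^I(x) and select Y^o(x) / Y^p(x) over Z^F(x).

    Hypothetical helper: A, L, b, d are dense Python lists; x and each y are tuples.
    """
    def leader_feasible(y):
        # Ax + Ly >= b, row by row
        return all(sum(a * x_j for a, x_j in zip(A_i, x))
                   + sum(l * y_j for l, y_j in zip(L_i, y)) >= b_i
                   for A_i, L_i, b_i in zip(A, L, b))

    def leader_cost(y):
        return sum(d_j * y_j for d_j, y_j in zip(d, y))   # d^T y

    ZF = [y for y in Z if leader_feasible(y)]
    ZI = [y for y in Z if not leader_feasible(y)]
    Yo = [y for y in ZF if leader_cost(y) == min(map(leader_cost, ZF))]  # optimistic
    Yp = [y for y in ZF if leader_cost(y) == max(map(leader_cost, ZF))]  # pessimistic
    return ZF, ZI, Yo, Yp


# Toy data: Z(x) for x = (2,) of the follower  min y1 + y2  s.t.  y1 + y2 >= x1,
# leader constraint  -y1 >= -1  (i.e. y1 <= 1), leader cost d^T y = y1.
ZF, ZI, Yo, Yp = tie_sets((2,), [(0, 2), (1, 1), (2, 0)],
                          A=[[0]], L=[[-1, 0]], b=[-1], d=[1, 0])
print(ZF, ZI, Yo, Yp)  # [(0, 2), (1, 1)] [(2, 0)] [(0, 2)] [(1, 1)]
pessimistic_admissible = not ZI   # False: (2, 0) in Z^I(x) forbids this x
```

Optimistically, the leader obtains $y = (0, 2)$; pessimistically, $x$ is discarded altogether, even though $Y^p(x) = \{(1, 1)\}$ is non-empty.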
In the optimistic setting, we can find a solution in $\Omega$ by optimizing the leader objective function over $B$, without the need to consider $Y^o(x)$, as we prove below.
Theorem 1 Optimistic LBP can be rephrased as
$$
\min_{(x, y) \in B}\; c^T x + d^T y.
$$
Proof Although $B$ may include integer pairs $(x, y)$ not belonging to $\Omega$, those solutions are never optimal. Suppose that this is not true, that is, problem $\min_{(x, y) \in B} c^T x + d^T y$ admits an optimal solution not respecting the optimistic policy. Let $(x, y) \in B$ be an optimal solution of the problem such that $y \in Z^F(x) \setminus Y^o(x)$, and let $w$ be a follower solution in $Y^o(x)$, which is non-empty because $y \in Z^F(x)$ and all the variables are bounded. Then, $d^T w < d^T y$ and, hence, $(x, y)$ cannot be optimal, as $(x, w) \in B$ has a smaller objective value.
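Theorem 1 can be checked by brute force on a small instance. The toy LBP below is hypothetical (follower: $\min y_1 + y_2$ s.t. $y_1 + y_2 \ge x$; leader constraint $y_1 \le 1$; leader objective $-x + y_1$), and the sketch simply compares the best pair over $B$ with the best pair over $\Omega$.

```python
from itertools import product

# Hypothetical toy LBP (brute force only): x in {0,1,2}, y in {0,1,2}^2
X = range(3)
Y = list(product(range(3), repeat=2))
f = lambda y: y[0] + y[1]                  # follower objective
foll_ok = lambda x, y: y[0] + y[1] >= x    # follower constraint
lead_ok = lambda x, y: y[0] <= 1           # leader constraint
F = lambda x, y: -x + y[0]                 # leader objective c^T x + d^T y


def Z(x):
    """Optimal follower responses Z(x)."""
    feas = [y for y in Y if foll_ok(x, y)]
    best = min(map(f, feas))
    return [y for y in feas if f(y) == best]


# B: pairs satisfying the leader constraints with y optimal for FOLL(x)
B = [(x, y) for x in X for y in Z(x) if lead_ok(x, y)]


def Yo(x):
    """Optimistic selection: leader-best among the leader-feasible optimal responses."""
    ZF = [y for y in Z(x) if lead_ok(x, y)]
    return [y for y in ZF if F(x, y) == min(F(x, w) for w in ZF)]


Omega = [(x, y) for (x, y) in B if y in Yo(x)]

best_B = min(B, key=lambda p: F(*p))
best_Omega = min(Omega, key=lambda p: F(*p))
print(best_B, best_Omega, F(*best_B))  # (2, (0, 2)) (2, (0, 2)) -2
print(sorted(set(B) - set(Omega)))     # [(1, (1, 0)), (2, (1, 1))]
```

Here $B \setminus \Omega$ is non-empty, yet the optimum over $B$ lies in $\Omega$, as Theorem 1 predicts.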
The approaches for the optimistic setting in the literature optimize over $B$ and not over $\Omega$, and all bilevel feasibility cuts for the optimistic problem only eliminate pairs not belonging to $B$, whereas they are not able to cut away solutions in $B \setminus \Omega$. This is also true for the follower optimality cuts and for the algorithm for the optimistic setting presented in this paper. Moreover, separating over $B$ is difficult, as it amounts to solving the follower problem. Note that if optimistic LBP does not admit an optimal solution, optimizing over $B$ or over $\Omega$ may lead to different outcomes. This may be the case if we relax the assumption that the $y$ variables have finite bounds (see Sect. 5.1).
If $Y^o(x) = \emptyset$ for every $x$ because the problem of minimizing $d^T y$ over $Z^F(x)$ is unbounded, then $\Omega$ is empty and LBP is infeasible, whereas the problem in Theorem 1 is unbounded. Optimizing over $B$ corresponds to a situation where ties are broken ex post (see Sect. 1). In most of the literature on optimistic LBP, $B$ is regarded as the set of the bilevel feasible solutions (DeNegre and Ralphs 2009; Fischetti et al. 2017; Xu and Wang 2014). It is easy to see that Theorem 1 does not apply to the pessimistic case, where ties cannot be broken ex post.

When the policy does not matter
In general, the sets of the feasible solutions of the optimistic and the pessimistic problem ($\Omega$ and $\Pi$) may be different when $|Z(x)| > 1$ for some $x$. Now, we provide necessary and sufficient conditions to have $\Omega = \Pi$, that is, we identify cases where the policy does not matter, although the follower problem may have multiple optimal solutions.
Theorem 2 $\Omega = \Pi$ if and only if $\Pi = B$.
Proof Suppose that $\Pi \neq B$ and let $(x, y)$ be a pair in $B \setminus \Pi$. Since $(x, y) \in B$, then $y \in Z^F(x)$ and, since $(x, y) \notin \Pi$, either $Z^I(x) \neq \emptyset$ or there exists $w \in Z^F(x)$ such that $d^T w > d^T y$. The former implies that there is no pair in $\Pi$ for the given $x$, whereas there exists at least one pair for $x$ in $\Omega$, as $Z^F(x) \neq \emptyset$. The latter means that any pair $(x, y^p) \in \Pi$ has the property that $d^T y^p \ge d^T w > d^T y$. On the contrary, any pair $(x, y^o) \in \Omega$ has the property that $d^T y^o \le d^T y$. In both cases, $\Omega \neq \Pi$. Conversely, suppose that $\Omega \neq \Pi$. Then, by definition, there exists $(x, y) \in B$ that belongs either only to $\Omega$ or only to $\Pi$. Therefore, either $\Pi \neq B$, and the proof is complete, or $\Omega \neq B$. Assume that $\Omega \neq B$ and let $(x, y)$ be a pair in $B \setminus \Omega$. Since $(x, y) \in B$, then $Z^F(x) \neq \emptyset$ and, since $(x, y) \notin \Omega$, there exists at least another solution $w \in Z^F(x)$ such that $d^T w < d^T y$. As $y$ is worse than $w$ from the leader perspective, in the pessimistic setting the follower will prefer $y$ over $w$. Hence, there exists at least one pair, $(x, w)$, that belongs to $B$ but not to $\Pi$. It follows that $\Pi \neq B$. Now, we derive conditions to guarantee that $\Pi = B$.

Definition 3 LBP is insensitive to the leader objective function if, for any $x$ and any $y, w \in Z^F(x)$, the pairs $(x, y)$ and $(x, w)$ have the same leader objective value.
A sufficient, but not necessary, condition for LBP to be insensitive to the leader objective function is that the leader objective coefficients of the follower variables are a scaled copy of the follower ones, that is, $d = \alpha f$ for some scalar $\alpha$.
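The sufficiency of this condition, read as $d = \alpha f$, takes one line: every optimal follower response has the same leader cost.

```latex
d = \alpha f \ (\alpha \in \mathbb{R}) \;\Longrightarrow\; d^T y = \alpha\, f^T y = \alpha\, \mathrm{opt}(x) \qquad \forall\, y \in Z^F(x) \subseteq Z(x)
```

Hence, for fixed $x$, all pairs $(x, y)$ with $y \in Z^F(x)$ have leader objective value $c^T x + \alpha\, \mathrm{opt}(x)$. The condition is not necessary: for instance, LBP is trivially insensitive whenever every $Z^F(x)$ contains at most one solution, whatever $d$ is.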

Theorem 3 $\Omega = B$ if and only if LBP is insensitive to the leader objective function.
Proof If LBP is insensitive to the leader objective function, all the solutions in $Z^F(x)$ minimize $d^T y$. It follows that, for any $(x, y) \in B$, $y \in Y^o(x)$ and, hence, $B = \Omega$. Suppose now that LBP is not insensitive to the leader objective function, that is, there exist $x, y, w$ such that $y, w \in Z^F(x)$ and $d^T y > d^T w$. Therefore, $(x, y) \in B \setminus \Omega$.
Unfortunately, LBP being insensitive to the leader objective function does not ensure that $\Pi = B$. In fact, even if the condition is satisfied, there may exist $y \in Z^I(x)$ that makes $x$ infeasible and, as a result, $\Pi \neq B$.

Definition 4 LBP is independent of the leader constraints if, for any $(x, y) \in B$ and any $w \in Z(x)$, the pair $(x, w)$ satisfies the leader constraints, that is, $Ax + Lw \ge b$.
This implies that $Z^I(x) = \emptyset$ for all $x$ such that there exists $y$ with the property that $(x, y) \in B$. A sufficient, but not necessary, condition for LBP to be independent of the leader constraints is that $L = 0$, that is, the leader constraints do not depend on the follower variables.
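The sufficiency of $L = 0$ can be checked directly: the leader constraints then see only $x$, so one leader-feasible optimal response makes all of them leader-feasible.

```latex
L = 0:\quad (x, y) \in B,\ w \in Z(x) \;\Longrightarrow\; Ax + Lw = Ax = Ax + Ly \ge b \;\Longrightarrow\; w \in Z^F(x), \text{ hence } Z^I(x) = \emptyset
```

The condition is not necessary: for instance, a leader constraint involving $y$ that is implied by the follower constraints is satisfied by every $y \in P(x)$ and therefore never produces a solution in $Z^I(x)$.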

Theorem 4 $\Pi = \Omega$ if and only if LBP is independent of the leader constraints and insensitive to the leader objective function.
Proof Suppose that the problem is not independent of the leader constraints. Then, there exists $x$ such that both $Z^I(x)$ and $Z^F(x)$ are non-empty. It follows that no pair including $x$ belongs to $\Pi$, whereas $B$ contains at least one pair for $x$, because $Z^F(x) \neq \emptyset$. Hence, $\Pi \neq B$ and, by Theorem 2, $\Pi \neq \Omega$. Suppose now that LBP is independent of the leader constraints, but not insensitive to the leader objective function. Then, there exist $x, y, w$ such that $y, w \in Z^F(x)$ and $d^T y < d^T w$. It follows that $(x, y) \in B \setminus \Pi$ and, hence, by Theorem 2, $\Pi \neq \Omega$. Assume now that LBP is independent of the leader constraints and insensitive to the leader objective function. Therefore, $Z^I(x) = \emptyset$ for every leader solution $x$ such that $Z^F(x) \neq \emptyset$, and all the solutions in $Z^F(x)$ maximize $d^T y$, that is, $Y^p(x) = Z^F(x)$. Hence, $\Pi = B$ and, by Theorem 2, $\Omega = \Pi$.
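Theorems 2 to 4 can be sanity-checked by enumeration on small instances. The sketch below builds $B$, $\Omega$ and $\Pi$ for three hypothetical variants of a toy LBP that share the follower problem ($\min y_1 + y_2$ s.t. $y_1 + y_2 \ge x$) and asserts the three equivalences. Independence of the leader constraints is checked through the consequence stated after Definition 4 ($Z^I(x) = \emptyset$ whenever $Z^F(x) \neq \emptyset$); insensitivity is checked directly on each $Z^F(x)$.

```python
from itertools import product

X = range(3)                                 # leader x in {0,1,2}
Y = list(product(range(3), repeat=2))        # follower y in {0,1,2}^2
f = lambda y: y[0] + y[1]                    # follower objective
foll_ok = lambda x, y: y[0] + y[1] >= x      # follower constraint


def Z(x):
    feas = [y for y in Y if foll_ok(x, y)]
    best = min(map(f, feas))
    return [y for y in feas if f(y) == best]


def analyze(lead_ok, F):
    """Return B, Omega, Pi and the two properties for a toy LBP (brute force)."""
    B, Omega, Pi = set(), set(), set()
    independent = insensitive = True
    for x in X:
        ZF = [y for y in Z(x) if lead_ok(x, y)]
        ZI = [y for y in Z(x) if not lead_ok(x, y)]
        if not ZF:
            continue                          # no pair for x in B, Omega or Pi
        vals = [F(x, y) for y in ZF]
        B |= {(x, y) for y in ZF}
        Omega |= {(x, y) for y in ZF if F(x, y) == min(vals)}
        if not ZI:                            # pessimistic: any y in Z^I(x) forbids x
            Pi |= {(x, y) for y in ZF if F(x, y) == max(vals)}
        independent = independent and not ZI  # Z^I(x) empty whenever Z^F(x) is not
        insensitive = insensitive and len(set(vals)) == 1
    return B, Omega, Pi, independent, insensitive


cases = {
    "y1 <= 1, d = (1, 0)": (lambda x, y: y[0] <= 1, lambda x, y: -x + y[0]),
    "y1 <= 1, d = f":      (lambda x, y: y[0] <= 1, lambda x, y: -x + y[0] + y[1]),
    "L = 0,   d = f":      (lambda x, y: True,      lambda x, y: -x + y[0] + y[1]),
}
for name, (lead_ok, F) in cases.items():
    B, Omega, Pi, ind, ins = analyze(lead_ok, F)
    assert (Omega == Pi) == (Pi == B)         # Theorem 2
    assert (Omega == B) == ins                # Theorem 3
    assert (Omega == Pi) == (ind and ins)     # Theorem 4
    print(f"{name}: independent={ind}, insensitive={ins}, Omega == Pi: {Omega == Pi}")
```

Only the last variant, where the leader constraints do not involve $y$ and $d = f$, has $\Omega = \Pi$; in the second one $\Omega = B$ holds but $\Pi \neq B$, illustrating that insensitivity alone is not enough.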

When the policy does matter
Contrary to what one may think, $f = 0$ does not imply that the policy can be ignored. If $f = 0$, then $Z(x) = P(x)$ and optimistic LBP reduces to optimizing the leader objective function over $P$, that is, to solving HPR. In fact, for a given $x$, all solutions $y \in P(x)$ are equally good for the follower, and ties are broken according to the leader preferences. Then, the leader can choose both $x$ and the corresponding $y \in P(x)$ at the same time, while the follower plays no role. By contrast, in the pessimistic case, we still have to solve a bilevel problem. Indeed, the follower solutions in $P(x)$ are equivalent for the follower, but not for the leader, who cannot pick the preferred one. However, if LBP is independent of the leader constraints, then pessimistic LBP can be solved by solving an optimistic bilevel problem.
Theorem 5 If $f = 0$ and LBP is independent of the leader constraints, then pessimistic LBP is equivalent to the optimistic LBP where $f$ is replaced by $-d$.
Proof Assumption $f = 0$ implies that $Z(x) = P(x)$. The independence of the leader constraints guarantees that, for any $x \in P_x$, $Z^I(x) = \emptyset$. In fact, if $x \in P_x$, there exists $y \in P(x)$ with the property that $(x, y) \in P$, that is, $(x, y)$ satisfies the leader constraints. Since $P(x) = Z(x)$, then $y \in Z^F(x) \neq \emptyset$ and, as a consequence, $Z^I(x) = \emptyset$. The pessimistic setting requires computing a follower solution in $Y^p(x)$, that is, a solution maximizing $d^T y$ over $P(x)$. Consider now the optimistic bilevel problem where $f = 0$ is replaced by $f = -d$. Its follower problem computes the solutions minimizing $f^T y = -d^T y$, that is, maximizing $d^T y$, over $P(x)$.
Hence, any optimistic problem with $f = -d$ that is also independent of the leader constraints can be seen as an equivalent optimistic reformulation of a corresponding pessimistic problem with $f = 0$.
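The effect of $f = 0$ and the equivalence in Theorem 5 can be seen on a one-dimensional toy instance, made up for illustration: $x, y \in \{0, 1, 2\}$, follower constraint $y \ge x$, no leader constraints ($L = 0$), leader objective $x - y$. Since $y$ is scalar here, $f$ is a scalar as well.

```python
X = range(3)                 # leader x in {0,1,2}
Y = range(3)                 # single follower variable y in {0,1,2}
c, d = 1, -1                 # leader objective c*x + d*y
F = lambda x, y: c * x + d * y
# Follower constraint y >= x; no leader constraints (L = 0), hence independence.
P = lambda x: [y for y in Y if y >= x]


def optimistic(f):
    """Optimistic LBP, optimized over B (Theorem 1), with scalar follower cost f."""
    best = None
    for x in X:
        m = min(f * y for y in P(x))
        for y in P(x):
            if f * y == m:                   # y in Z(x)
                v = F(x, y)
                best = v if best is None else min(best, v)
    return best


def pessimistic_f0():
    """Pessimistic LBP with f = 0: Z(x) = P(x), Z^I(x) is empty, and the tie is broken
    by maximizing d*y; since F(x, .) and d*y differ by the constant c*x, this is the
    same as maximizing F(x, .) over P(x)."""
    return min(max(F(x, y) for y in P(x)) for x in X)


print(optimistic(0), pessimistic_f0(), optimistic(-d))  # -2 0 0
```

With $f = 0$, the optimistic problem collapses to HPR (value $-2$), while the pessimistic one has value $0$; replacing $f$ by $-d$ turns the pessimistic problem into an optimistic one with the same value.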

Single-level reformulations for optimistic LBP
Here, we present valid inequalities that describe $B$, thus leading to a (possibly non-compact) single-level reformulation of optimistic LBP according to Theorem 1. A description of $B$ can be obtained by adding to $P$ some valid inequalities that eliminate the integer pairs $(x, y) \in P \setminus B$. We consider two alternative sets of inequalities that serve this same purpose: the optimistic follower optimality cuts (see Sect. 3.2) and our version of the no-good cuts introduced in DeNegre and Ralphs (2009) (see Sect. 3.3).

Binarization
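To avoid additional assumptions on the problem parameters, we binarize some of the general integer variables of the problem, if any. Let $\ell^x$ and $u^x$ be lower and upper bounds on the leader variables. Both for the formulation that uses the follower optimality cuts and for the one that uses our version of the no-good cuts, for each $i \in N$, auxiliary binary variables $x^k_i$, $k = 1, \dots, u^x_i - \ell^x_i$, are added to the problem, together with the following constraints.
$$
x_i = \ell^x_i + \sum_{k=1}^{u^x_i - \ell^x_i} x^k_i, \qquad x^k_i \ge x^{k+1}_i \quad k = 1, \dots, u^x_i - \ell^x_i - 1, \qquad x^k_i \in \{0, 1\} \quad k = 1, \dots, u^x_i - \ell^x_i
$$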
That is, variable $x^k_i$ is the $k$-th binary variable in the binarization scheme for the original integer variable $x_i$. Since $x_i$ may take values in $\{\ell^x_i, \dots, u^x_i\}$ and the integer value is computed from the binarized variables using the first equation, $u^x_i - \ell^x_i$ binary variables are needed to represent all the possible values that $x_i$ can take. In this way, $x$ is expressed as the sum of its lower bound and a set of auxiliary binary variables, while the precedence constraints are used to reduce symmetry. This particular binarization scheme (Roy 2007) has been proved to have better theoretical properties than other schemes (Bonami and Margot 2015; Dash et al. 2018). In the following, when we say that $x$ takes integer value $s$, since the binary variables are set to one in lexicographic order by the precedence constraints, we mean that $x^k_i = 1$ for $k \le l1_i^{s_i}$ and $x^k_i = 0$ for $k > l1_i^{s_i}$, where $l1_i^{s_i} = s_i - \ell^x_i$ represents the index of the last binary variable $x^k_i$ taking value 1 when $x_i = s_i$. The follower optimality cuts are valid inequalities including the original $y$ variables and the artificial variables $x^k_i$. Our version of the no-good cuts requires the general integer $y$ variables to be binarized as well, and they are inequalities including the artificial variables $x^k_i$ and the artificial variables $y^k_j$ introduced to binarize the follower variables $y$, with $\ell^y$ and $u^y$ being lower and upper bounds on the follower variables.
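The binarization above can be mirrored in a few lines. The sketch below encodes an integer value as the bits $x^k_i$ (the first $v - \ell$ of them equal to one), decodes it after checking the precedence constraints, and returns the index $l1$ of the last bit set to one. The function names are hypothetical, and the convention that $l1 = 0$ means "no bit set" is our own.

```python
def binarize(v, lb, ub):
    """Unary encoding of integer v in [lb, ub]: ub - lb bits, the first v - lb equal to 1."""
    if not lb <= v <= ub:
        raise ValueError("value outside [lb, ub]")
    return [1 if k <= v - lb else 0 for k in range(1, ub - lb + 1)]


def decode(bits, lb):
    """x_i = l_i + sum_k x_i^k, after checking the precedence constraints x^k >= x^{k+1}."""
    if any(a < b for a, b in zip(bits, bits[1:])):
        raise ValueError("precedence constraints violated")
    return lb + sum(bits)


def l1_index(v, lb):
    """1-based index of the last bit equal to 1 when x_i = v (0: no bit is set)."""
    return v - lb


bits = binarize(3, 1, 5)
print(bits, decode(bits, 1), l1_index(3, 1))  # [1, 1, 0, 0] 3 2
```

Because of the precedence constraints, each integer value has exactly one binary representation, which is what removes the symmetry mentioned above.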

Optimistic follower optimality (OFO) cuts
Consider an integer solution $(s, \eta) \in P$, where $\eta$ is not optimal for FOLL($s$). Such a pair, which belongs to $P$ but not to $B$, violates the optimistic follower optimality cut (1) for $s$.
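Definition 5 Let $s \in P_x$; the corresponding optimistic follower optimality (OFO) cut is the inequality below.
$$
f^T y \le \mathrm{opt}(s) + M^O \sum_{i \in N} \left[ \left(1 - x_i^{l1_i^{s_i}}\right) + x_i^{l1_i^{s_i} + 1} \right] \qquad (1)
$$
where the term $(1 - x_i^{l1_i^{s_i}})$ is omitted when $s_i = \ell^x_i$ and the term $x_i^{l1_i^{s_i} + 1}$ is omitted when $s_i = u^x_i$.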
$M^O$ is a sufficiently large value, whose purpose is to guarantee that the constraint is automatically satisfied for integer $x \neq s$ (see Sect. 5.5). Note that FOLL(s) is feasible (and bounded) for any $s \in P_x$, as $s \in P_x$ implies that there exists $y$ such that $(s, y) \in P$, that is, $P(s) \neq \emptyset$.
Consider now the model that optimizes the leader objective function and whose feasible region oSLF is obtained by adding to $P$ an OFO cut for every $s \in P_x$. The description of oSLF is non-compact, as it may include an exponential number of OFO inequalities. Let $B_s \subseteq B$ be the set including the pairs of $B$ whose leader component is $s$.

Theorem 6 The OFO inequalities are valid for $B \supseteq \Omega$ and, according to Theorem 1, optimistic LBP can be solved by the (possibly non-compact) single-level problem below.
Proof If $x = s$, the OFO inequality reduces to $f^T y \le opt(s)$, which is valid for $B_s$, as it forces the choice of a follower solution $y$ whose follower objective value is optimal for the given leader choice $s$. For integer $x \neq s$, the bigM part makes the inequality automatically satisfied and, hence, valid for $B$. In the binarized setting, to ensure that $x_i \neq s_i$ for some $i$, we must ensure that either $x_i$ is decreased by at least one (at least $x_i^{\mathrm{l1}(s_i)}$ becomes 0) or $x_i$ is increased by at least one (at least $x_i^{\mathrm{l1}(s_i)+1}$ becomes 1). Since oSLF includes an OFO inequality for every leader solution in $P_x$, according to the reformulation of optimistic LBP in Theorem 1, we can solve optimistic LBP by optimizing the leader objective function over oSLF.
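The role of the bigM part in the binarized setting can be checked numerically. The sketch below assumes that the OFO cut for $s$ reads $f^T y \le opt(s) + M^O \big(\sum_{i \in N} (1 - x_i^{\mathrm{l1}(s_i)}) + \sum_{i \in N} x_i^{\mathrm{l1}(s_i)+1}\big)$, with nonexistent terms omitted, which is our reading of the proof above; the function names are hypothetical. The bracketed sum counts how many leader variables differ from $s$, so it is zero exactly when $x = s$.

```python
def deviation_term(x_bits, s, lower):
    """Sum of the binary terms multiplying M^O; equals 0 iff x encodes s.

    With k = s_i - lower_i, (1 - x_i^k) becomes 1 if x_i decreased and
    x_i^(k+1) becomes 1 if x_i increased (1-based superscripts).
    """
    total = 0
    for bits, s_i, l_i in zip(x_bits, s, lower):
        k = s_i - l_i
        if k >= 1:
            total += 1 - bits[k - 1]  # x_i^k dropped to 0
        if k < len(bits):
            total += bits[k]          # x_i^(k+1) raised to 1
    return total


def ofo_cut_satisfied(f, y, opt_s, big_m, x_bits, s, lower):
    """Evaluate f^T y <= opt(s) + M * deviation_term(x) (assumed cut form)."""
    lhs = sum(fi * yi for fi, yi in zip(f, y))
    return lhs <= opt_s + big_m * deviation_term(x_bits, s, lower)


lower, s = [0, 0], [1, 2]           # bounds [0, 2] and [0, 3] for x_1, x_2
x_at_s = [[1, 0], [1, 1, 0]]        # binarized x = (1, 2)
x_moved = [[1, 1], [1, 1, 0]]       # binarized x = (2, 2)
print(deviation_term(x_at_s, s, lower))   # 0
print(deviation_term(x_moved, s, lower))  # 1
```

With $f = (1, 1)$, $y = (3, 2)$ and $opt(s) = 4$, the cut is violated at $x = s$ but satisfied at $x = (2, 2)$ for any large enough $M^O$.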

Binarized no-good cuts
Suppose now that the follower variables are integer. An integer pair (s, η) ∈ P with η not optimal for FOLL(s) can be eliminated by the corresponding no-good cut (2).

Definition 6 For an integer pair
The original definition of the no-good cuts for general integer variables requires $x$ and $y$ to satisfy a slack condition (see step 2 of Algorithm 1 in DeNegre and Ralphs 2009), which makes them valid only for problems with purely integer parameters (constraint coefficients and right-hand sides). The version with the binarized variables (2) that we propose extends their application to problems where some parameters are not integer. The original inequalities can be simplified when the variables are binary and, in that case, the definition in DeNegre and Ralphs (2009) coincides with the one we give here. Let oSLG be the set obtained by adding to $P$ a no-good inequality for every $(s, \eta) \in P \setminus B$. In general, there can be exponentially many such inequalities. Using the same argument as in the proof of Theorem 6, it is easy to see that the binarized no-good cuts are valid for $B \supseteq \Omega$ and that, according to the reformulation of optimistic LBP in Theorem 1, optimistic LBP can be solved by the (possibly non-compact) single-level problem below.
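On binary vectors, a standard no-good cut for a point $\hat w$ reads $\sum_{j : \hat w_j = 1} (1 - w_j) + \sum_{j : \hat w_j = 0} w_j \ge 1$; here we assume that $\hat w$ stacks the binarized leader and follower values of $(s, \eta)$, so that this is the shape of (2) on binarized variables. The minimal Python sketch below, with hypothetical helper names, writes the cut as $a^T w \ge b$ and checks by enumeration that it removes exactly one binary point.

```python
from itertools import product


def no_good_cut(point):
    """Return (a, b) such that a^T w >= b cuts off exactly the given binary point."""
    a = [-1 if v == 1 else 1 for v in point]  # (1 - w_j) for ones, w_j for zeros
    b = 1 - sum(point)                        # constants of the (1 - w_j) terms moved right
    return a, b


def satisfies(a, b, w):
    return sum(aj * wj for aj, wj in zip(a, w)) >= b


point = (1, 0, 1)
a, b = no_good_cut(point)
cut_off = [w for w in product((0, 1), repeat=3) if not satisfies(a, b, w)]
print(a, b, cut_off)  # [-1, 1, -1] -1 [(1, 0, 1)]
```

The left-hand side equals $-\sum_j \hat w_j$ plus the Hamming distance between $w$ and $\hat w$, so only $\hat w$ itself violates the cut.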

A comparison between oSLF and oSLG
By construction, a no-good cut eliminates a single infeasible solution $(x, y)$ at a time, whereas the follower optimality cut for $s$ removes at the same time all the solutions $(s, y)$ such that $y \notin Z(s)$. Hence, oSLG has, in general, more constraints than oSLF. We now compare oSLF and oSLG from a polyhedral perspective, thus investigating the relative strength of the follower optimality and of the no-good inequalities. The results we derive are independent of binarization, as all the theorems refer to binary problems, where no binarization is needed. We assume that the reader is familiar with the definition of a formulation for an integer problem and with the criteria for comparing different formulations of the same problem (Wolsey 1998). In what follows, we use the symbols PoSLF and PoSLG to denote the polyhedra corresponding to the sets oSLF and oSLG defined above, when the integrality restrictions on the variables are removed.
Theorem 7 Neither PoSLF is stronger than PoSLG nor the opposite holds.
Proof Consider the bilevel programming problem below.
while the no-good cuts associated with solutions not optimal for the follower problem when x = 0 are reported below.
If the variables are binary, any OFO inequality is a binary knapsack. A binary knapsack is an inequality $\sum_i q_i v_i \le t$, where $v$ are binary variables. Sets $C^+ \subseteq \{i : q_i > 0\}$ and $C^- \subseteq \{i : q_i < 0\}$ such that $\sum_{i \in C^+} q_i > t - \sum_{i : q_i < 0,\, i \notin C^-} q_i$ define a cover. The corresponding cover inequality $\sum_{i \in C^+} v_i - \sum_{i \in C^-} v_i \le |C^+| - 1$ is valid for the convex hull of the integer feasible solutions of the knapsack polyhedron (Balas 1975). For an OFO constraint, a cover inequality has the form $\sum_{i \in X^+} x_i - \sum_{i \in X^-} x_i + \sum_{i \in Y^+} y_i - \sum_{i \in Y^-} y_i \le |X^+| + |Y^+| - 1$. In fact, as soon as a variable in $\{i : s_i = 1\}$ is not selected or a variable in $\{i : s_i = 0\}$ is selected, the knapsack is automatically satisfied by the bigM part and no cover can exist. Hence, any cover must have $X^+ = \{i : s_i = 1\}$ and $X^- = \emptyset$. Consider the set oSLFcov that includes, for each OFO inequality, also its cover inequalities, and denote by PoSLFcov the corresponding polyhedron when the integrality restrictions on the variables are removed.
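The cover condition with negative coefficients can be checked on a toy knapsack. In this minimal Python sketch (hypothetical helper names, data chosen for illustration), `is_cover` tests $\sum_{i \in C^+} q_i > t - \sum_{i : q_i < 0, i \notin C^-} q_i$, and enumeration confirms that every binary point satisfying the knapsack also satisfies the cover inequality.

```python
from itertools import product


def is_cover(q, t, c_plus, c_minus):
    """Cover condition: sum_{C+} q_i > t - sum_{q_i < 0, i not in C-} q_i."""
    if any(q[i] <= 0 for i in c_plus) or any(q[i] >= 0 for i in c_minus):
        raise ValueError("C+ needs positive and C- negative coefficients")
    rest = sum(q[i] for i in range(len(q)) if q[i] < 0 and i not in c_minus)
    return sum(q[i] for i in c_plus) > t - rest


def cover_holds(v, c_plus, c_minus):
    """Cover inequality: sum_{C+} v_i - sum_{C-} v_i <= |C+| - 1."""
    return sum(v[i] for i in c_plus) - sum(v[i] for i in c_minus) <= len(c_plus) - 1


q, t = [3, 2, -1], 3                 # knapsack 3 v0 + 2 v1 - v2 <= 3
c_plus, c_minus = {0, 1}, {2}
print(is_cover(q, t, c_plus, c_minus))  # True
knapsack = [v for v in product((0, 1), repeat=3)
            if sum(a * b for a, b in zip(q, v)) <= t]
print(all(cover_holds(v, c_plus, c_minus) for v in knapsack))  # True
```

Here the cover inequality $v_0 + v_1 - v_2 \le 1$ forbids the combination $v_0 = v_1 = 1$, $v_2 = 0$, which violates the knapsack.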

Theorem 8 PoSLFcov is stronger than both PoSLF and PoSLG.
Proof That PoSLFcov is stronger than PoSLF follows from the results on the knapsack problem. We focus here on PoSLFcov being stronger than PoSLG. We first prove that there is no fractional solution satisfying the inequalities in PoSLFcov but not the no-good cuts in PoSLG, that is, PoSLFcov is at least as strong as PoSLG. Then we provide an example where PoSLFcov is strictly stronger than PoSLG, completing the proof. Suppose that PoSLFcov is not at least as strong as PoSLG, that is, assume that there exists a fractional solution $(w, z)$ that satisfies the inequalities in PoSLFcov but violates the no-good cut for a pair $(s, \eta)$. This means that Then, the elements of the left-hand side of the inequality above can be rewritten as follows: (i) i: is a cover of the OFO cut for s, the corresponding cover inequality is violated by $(w, z)$, contradicting the assumption we made and confirming that PoSLFcov is at least as strong as PoSLG. If the no-good inequality is valid, then $(s, \eta) \notin B$, which means that $f^T \eta > opt(s)$. Since $\sum_{i : \eta_i = 0} f_i \eta_i = 0$, it must hold that i: Hence, $X^+, X^-, Y^+, Y^-$ is a cover whose inequality is violated. Therefore, there exists no solution that satisfies the cover inequalities but not the no-good cuts. On the contrary, there may exist solutions satisfying the no-good cuts but not the cover inequalities. An example is the following. Consider again the problem in the proof of Theorem 7. Solution $(0, \nu) = (0, (1, 1/2, 1/2, 1))$ satisfies the no-good cuts, but it violates the cover inequality $y_1 + y_2 \le 1$ of the OFO cut for $x = 0$ corresponding to the sets $X^- = Y^- = X^+ = \emptyset$ and $Y^+ = \{1, 2\}$.

Strengthening the optimistic cuts
OFO and no-good cuts can be strengthened by eliminating some variables from them. To do that, we must prove that the considered inequality remains valid for B, independently of the value of the eliminated variables. In Theorem 9, we provide conditions guaranteeing that some constraints can be strengthened. All the conditions are independent of binarization.

Theorem 9
The following results hold:

A simple case where there exists $z \in Z(x) \cap P(\mu)$ is when $P(x) \subseteq P(\mu)$.

Definition 7 A leader variable
A variable that is either positively or negatively redundant is said to be coherent.
When we increase a positively redundant variable or decrease a negatively redundant one, the feasible region of the follower problem enlarges. Hence, all solutions $\mu$ obtained from $x$ by increasing a positively redundant variable or decreasing a negatively redundant one have the property that $P(x) \subseteq P(\mu)$. We must prove that $f^T y \le opt(s)$ is valid for $B_\mu$ for any $\mu$ obtained from $s$ by arbitrarily setting the values of the removed variables. The removal allows the positively redundant variables to be arbitrarily increased and the negatively redundant ones to be arbitrarily decreased. Since an increase (decrease) in a positively (negatively) redundant leader variable enlarges the feasible region of the follower, that is, $P(s) \subseteq P(\mu)$, we have $opt(\mu) \le opt(s)$, and it follows that the strengthening is correct.
The above strengthening assumes that $P(s) \subseteq P(\mu)$ for every considered $\mu$ and, hence, that $Z(s) \subseteq P(\mu)$. However, to obtain a correct strengthening, we only need one optimal solution $y \in Z(s)$ to belong to $P(\mu)$. In fact, if $y \in P(\mu)$ for every considered $\mu$, then $opt(\mu) \le f^T y = opt(s)$.

Definition 8 A leader variable x i is singular if it does not share follower constraints with other leader variables.
Given an integer vector $z$, let $z^{i+}$ be the vector with $z^{i+}_j = z_j$ for $j \neq i$ and $z^{i+}_i = z_i + 1$. In the same way, $z^{i-}$ is the vector with $z^{i-}_j = z_j$ for $j \neq i$ and $z^{i-}_i = z_i - 1$. Let $s$ be a leader solution and let $y$ be an optimal solution of FOLL(s).
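Definition 8 and the shifted vectors $z^{i+}$, $z^{i-}$ are easy to compute. In this minimal Python sketch, `G` is the matrix of leader coefficients in the follower constraints $Gx + Hy \ge r$ (one row per follower constraint); the function names and the example matrix are hypothetical.

```python
def is_singular(G, i):
    """x_i is singular if no follower constraint involves x_i and another leader variable."""
    for row in G:
        if row[i] != 0 and any(c != 0 for j, c in enumerate(row) if j != i):
            return False
    return True


def shifted(z, i, delta):
    """Return z^{i+} (delta=+1) or z^{i-} (delta=-1): only component i changes."""
    w = list(z)
    w[i] += delta
    return w


G = [[1, 0, 0],
     [0, 2, -1]]
print([is_singular(G, i) for i in range(3)])  # [True, False, False]
print(shifted([1, 2, 3], 1, +1), shifted([1, 2, 3], 1, -1))  # [1, 3, 3] [1, 1, 3]
```

Leader variables $x_1$ and $x_2$ share the second follower constraint, so neither is singular, while $x_0$ appears alone.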

Theorem 11
The following strengthening is correct for the OFO cuts: if $x_i$ is singular, $s_i = u^x_i - 1$ and $y \in P(s^{i+})$, then remove $x_i^{\mathrm{l1}(s_i)+1}$; if $x_i$ is singular, $s_i = \ell^x_i + 1$ and $y \in P(s^{i-})$, then remove $x_i^{\mathrm{l1}(s_i)}$.

Proof We must prove that, for any leader solution $\mu$ obtained from $s$ by arbitrarily increasing (decreasing) $x_i$, for which $x_i^{\mathrm{l1}(s_i)+1}$ ($x_i^{\mathrm{l1}(s_i)}$) has been removed, it holds that $y \in P(\mu)$ and, hence, $opt(\mu) \le opt(s)$. Since $s_i = u^x_i - 1$ ($s_i = \ell^x_i + 1$), increasing (decreasing) $x_i$ can only yield the value $s^{i+}_i$ ($s^{i-}_i$). Let $k$ be a follower constraint with a nonzero coefficient for the singular variable $x_i$. If $k$ is satisfied by $(s^{i+}, y)$ and $x_i^{\mathrm{l1}(s_i)+1}$ was removed (by $(s^{i-}, y)$ and $x_i^{\mathrm{l1}(s_i)}$ was removed), then it remains satisfied for any modification of the other leader variables, which do not appear in the constraint. Hence, if $y \in P(s^{i+})$ ($y \in P(s^{i-})$), it also belongs to $P(\mu)$.
For the no-good cuts, we can also remove some y variables.

Theorem 12
The following strengthening is correct for the no-good cuts: if $f_i \ge 0$, then remove $y$

Proof If $f_i \ge 0$, for any solution $\nu$ obtained by increasing $y_i$ with respect to $\eta_i$, we have $f^T \nu \ge f^T \eta$. The same happens when $f_i \le 0$ and $y_i$ is decreased. Therefore, if $\eta$ is not optimal for FOLL(s), $\nu$ is not optimal either.
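Assuming binary follower variables, the rule in the proof of Theorem 12 can be sketched in Python as follows: when $\eta_i = 0$ and $f_i \ge 0$, the term $y_i$ (which becomes 1 only if $y_i$ increases) is dropped; when $\eta_i = 1$ and $f_i \le 0$, the term $1 - y_i$ is dropped. The representation of a cut as a list of terms and the function names are hypothetical choices for illustration.

```python
def strengthened_no_good(s, eta, f):
    """Terms of the binary no-good cut sum(terms) >= 1 for the pair (s, eta).

    Each term is (var, index, kind): 'up' stands for w_j, 'down' for 1 - w_j.
    """
    terms = [("x", i, "down" if v == 1 else "up") for i, v in enumerate(s)]
    for i, v in enumerate(eta):
        if v == 0 and f[i] >= 0:
            continue  # raising y_i cannot lower the follower cost
        if v == 1 and f[i] <= 0:
            continue  # lowering y_i cannot lower the follower cost
        terms.append(("y", i, "down" if v == 1 else "up"))
    return terms


def lhs(terms, x, y):
    """Evaluate the left-hand side of the cut at (x, y)."""
    values = {"x": x, "y": y}
    return sum(values[var][i] if kind == "up" else 1 - values[var][i]
               for var, i, kind in terms)


terms = strengthened_no_good(s=[1, 0], eta=[0, 1, 0], f=[2, -1, -3])
print(terms)  # [('x', 0, 'down'), ('x', 1, 'up'), ('y', 2, 'up')]
```

The strengthened cut is still violated by $(s, \eta)$, since its left-hand side is 0 there, but it has fewer terms than the full no-good cut.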

Single-level reformulations of pessimistic LBP
We now define a single-level non-compact reformulation of the pessimistic problem by adding to $P$ some valid inequalities that eliminate the pairs $(x, y) \in P \setminus \Pi$. Since the property outlined in Theorem 1 does not hold anymore, we must exclude three kinds of pairs from $P$. Pairs $(x, y)$ where $y \notin Z(x)$ can be eliminated using the same cuts applied in the optimistic framework (optimistic follower optimality or no-good cuts). If a leader solution $x$ has the property that $Z_I(x) \neq \emptyset$, then no $(x, y)$ is feasible for the pessimistic problem; those solutions are eliminated by the restricted no-good inequalities (see Sect. 4.1). Pairs $(x, y)$ where $Z_I(x) = \emptyset$ but $y \in Z_F(x) \setminus Y_p(x)$ are treated using either the pessimistic follower optimality cuts, introduced here for the first time (see Sect. 4.2), or the no-good cuts, used here for the first time in a pessimistic setting (see Sect. 4.3). We assume that the binarization technique described in Sect. 3.1 is applied to the $x$ variables and, when the no-good cuts are used, also to the $y$ variables that are restricted to be integer.

Restricted no-good inequalities
Let $(s, \eta) \in B$ be an integer pair such that $\eta \in Z(s)$ but $Z_I(s) \neq \emptyset$. Then, the current leader solution can be eliminated by inequality (3).
Definition 9 Let $s \in P_x$ be an integer leader solution with $Z_I(s) \neq \emptyset$; then the corresponding restricted no-good cut is the inequality below.
Restricted no-good cuts are trivially valid for $\Pi$, as they require an integer vector $x$ to differ from the infeasible solution $s$ in at least one entry in order to be feasible. To verify whether $Z_I(s) \neq \emptyset$, one can solve the problem below, where $\varphi_i$ is a binary variable that takes value 1 when leader constraint $i$ can be violated and 0 otherwise, while $\delta_i$ is the corresponding amount of violation.

FEAS(s) max
$M^F_i$ is a sufficiently large value, whose purpose is to guarantee that the corresponding constraint is automatically satisfied when $\varphi_i = 0$ (see Sect. 5.5). Set $\Phi$ includes the indices of the leader constraints and $\delta \in \mathbb{R}^{|\Phi|}$. Constraints (4) and (5) guarantee that $y \in Z(s)$. When $\varphi_i = 1$, the bigM part of constraint (6) vanishes and, if $\delta_i > 0$, leader constraint $i$ is violated by $(s, y)$, so that $Z_I(s) \neq \emptyset$. If $\varphi_i = 0$, then $\delta_i = 0$ by constraints (7), while constraint (6) is deactivated by the bigM part. If FOLL(s) is feasible, then FEAS(s) is feasible as well, as any solution $y \in Z(s)$ can be completed by $\delta = \varphi = 0$. Moreover, FEAS(s) is not unbounded, as all the variables in the objective function are bounded from above.

Pessimistic follower optimality (PFO) cuts
Let $(s, \eta)$ be an integer pair in $B$ such that $Z_I(s) = \emptyset$ but $\eta \notin Y_p(s)$. Checking whether $\eta \in Y_p(s)$ amounts to solving the worst-case problem below.

$$\mathrm{WORST}(s) \quad \max\ d^T y \quad \text{s.t.} \quad f^T y \le opt(s), \quad Hy \ge r - Gs, \quad y_i \text{ integer for } i \in F$$

Since $P(s)$ is bounded, problem WORST(s) cannot be unbounded. Since $\eta \in Z(s)$, WORST(s) cannot be infeasible. Moreover, since $Z_I(s) = \emptyset$, $y \in Z_F(s)$ for any feasible $y$. Denote by $optw(s)$ the optimal value of WORST(s). If $d^T \eta < optw(s)$, then $\eta \notin Y_p(s)$ and solution $(s, \eta)$ can be eliminated by inequality (8).
Definition 10 Let $s \in P_x$ be an integer leader solution such that $Z_I(s) = \emptyset$; then the pessimistic follower optimality (PFO) cut corresponding to $s$ is the inequality below.
$M^P$ is a sufficiently large value, whose purpose is to guarantee that the constraint is automatically satisfied for integer solutions $x \neq s$ (see Sect. 5.5).
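For small integer follower problems, FOLL(s) and WORST(s) can be solved by enumeration, which makes the test $d^T \eta < optw(s)$ easy to reproduce. The minimal Python sketch below stands in for the MIP solves; the function name `worst`, the bounds `y_lo`/`y_hi` and the toy data are hypothetical, and we assume that $Z_I(s) = \emptyset$ has already been checked.

```python
from itertools import product


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def worst(s, f, d, G, H, r, y_lo, y_hi):
    """Brute-force FOLL(s) and WORST(s); returns opt(s), optw(s) and Y_p(s)."""
    rhs = [rk - dot(Gk, s) for Gk, rk in zip(G, r)]  # Hy >= r - Gs
    region = [y for y in product(*(range(a, b + 1) for a, b in zip(y_lo, y_hi)))
              if all(dot(Hk, y) >= bk for Hk, bk in zip(H, rhs))]
    opt = min(dot(f, y) for y in region)           # FOLL(s), assumed feasible
    Z = [y for y in region if dot(f, y) == opt]    # follower optimal set Z(s)
    optw = max(dot(d, y) for y in Z)               # WORST(s): max d^T y over Z(s)
    return opt, optw, [y for y in Z if dot(d, y) == optw]


opt, optw, Yp = worst(s=[0], f=[1, 1], d=[2, 5], G=[[0]], H=[[1, 1]], r=[1],
                      y_lo=[0, 0], y_hi=[1, 1])
print(opt, optw, Yp)  # 1 5 [(0, 1)]
eta = (1, 0)
print(dot([2, 5], eta) < optw)  # True: eta is not in Y_p(s), so a PFO cut applies
```

Both $(0, 1)$ and $(1, 0)$ are optimal for the follower, but only $(0, 1)$ is the worst case for the leader, so a pair $(s, (1, 0))$ would be cut away.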
Consider the problem obtained by optimizing the leader objective function over the set pSLF, built by adding to $P$ an OFO cut for every integer $x \in P_x$, a restricted no-good cut for every integer $x \in P_x$ with $Z_I(x) \neq \emptyset$, and a PFO cut for every integer $x \in P_x$ with $Z_I(x) = \emptyset$.

Theorem 13 The PFO inequalities are valid for Π and pessimistic LBP can be solved by the (possibly non-compact) single-level problem below.
Proof When $x = s$, the PFO inequality reduces to $d^T y \ge optw(s)$, which forces the choice of a solution $y \in Y_p(s)$. When there exists $i \in N$ such that $x_i \neq s_i$, the PFO cut is deactivated by the bigM part. The OFO inequalities are valid for $B \supseteq \Pi$ by Theorem 6, and the restricted no-good cuts are also trivially valid. Consider an integer pair $(x, y) \in P$. If $y \notin Z(x)$, then $(x, y)$ is eliminated by an OFO cut and does not belong to pSLF. If $y \in Z(x)$ but $Z_I(x) \neq \emptyset$, then $(x, y)$ is excluded by a restricted no-good cut. If $y \in Z(x)$, $Z_I(x) = \emptyset$ and $y \notin Y_p(x)$, then $(x, y)$ is cut by a PFO inequality. Hence, any pair in $P$ that has not been cut by an OFO, a PFO or a restricted no-good inequality belongs to $\Pi$. It follows that pSLF corresponds to a (possibly non-compact) single-level reformulation of pessimistic LBP.

Pessimistic no-good cuts
Although the no-good inequalities have so far been used only in the optimistic setting, they are general-purpose cuts that eliminate a single solution, independently of the reason for its infeasibility. Therefore, they can be used in the pessimistic framework as well. If $(s, \eta)$ is an integer pair in $B$ with $Z_I(s) = \emptyset$ and $\eta \notin Y_p(s)$, it can be cut away by inequality (9).

Definition 11 Let $(s, \eta) \in B$ be an integer pair with $Z_I(s) = \emptyset$ and $\eta \notin Y_p(s)$; then the corresponding no-good cut is the inequality below.
We call optimistic no-good cuts the inequalities (2) generated to eliminate $(x, y) \notin B$, and pessimistic no-good cuts the inequalities (9) added to forbid $(x, y)$ with $y \notin Y_p(x)$. Using the same argument as in the proof of Theorem 13, it is not difficult to see that the pessimistic no-good cuts are valid for $\Pi$. If optimistic, pessimistic and restricted no-good inequalities are added to $P$ for every $(x, y) \notin \Pi$, we obtain the set pSLG, which has the property that pessimistic LBP can be solved by solving the (possibly non-compact) single-level problem below.
It is easy to see that the relationship between the PFO cuts and the pessimistic no-good inequalities is the same as the one described in the optimistic setting for the OFO and the optimistic no-good cuts (see Sect. 3.4).

Strengthening the pessimistic cuts
Strengthening the pessimistic cuts is much harder than strengthening the optimistic inequalities. Below, we illustrate some conditions under which the pessimistic cuts may be strengthened.

Theorem 14
The following results hold:

The conditions above do not automatically translate into strengthening procedures, as there is no easy way to detect when there exists $y \in Y_p(x) \cap Z(\mu)$. However, when $f = 0$, it follows that $P(x) = Z(x)$ for any $x \in P_x$. Therefore, a sufficient condition is $P(x) \subseteq P(\mu)$. Under this assumption and using Definitions 7 and 8 in Sect. 3.5, we could derive results similar to the ones for the optimistic case. We omit a detailed description, because their practical application is limited by the assumption $f = 0$ (see also Sect. 2.4).

The branch-and-cut algorithm
We can now define a branch-and-cut algorithm to solve optimistic and pessimistic LBP. Depending on the cuts that are used (follower optimality or no-good) and on the setting (optimistic or pessimistic), we can define four versions of the algorithm: Fo: optimistic setting with OFO cuts; Go: optimistic setting with no-good cuts; Fp: pessimistic setting with restricted no-good, OFO and PFO cuts; Gp: pessimistic setting with restricted, optimistic and pessimistic no-good cuts.
We assume that the reader is familiar with the branch-and-bound algorithm, which is a popular technique for solving integer programming problems (Wolsey 1998). Branch-and-cut (Padberg and Rinaldi 1991) is an improved branch-and-bound exploration of the solution space, which starts from a suitable initial formulation of the problem and where each node of the tree is processed according to a user-defined policy that adds user-defined cuts to the problem to eliminate some unwanted solutions. The initial formulation we use is the HPR problem (see Sect. 2), and in Sect. 5.1 we illustrate the relation between HPR and LBP and what happens when the assumption that the variables have finite bounds is removed. Our policy to process the nodes and generate cuts (separation) is described in Sects. 5.2 and 5.3 and summarized in Algorithms 1 and 3 for the optimistic and the pessimistic problem, respectively.
Since the cuts we use define a single-level reformulation of LBP, we do not need dedicated branching rules, branching on integer solutions or accessing the current basis, as in Bard and Moore (1990); DeNegre and Ralphs (2009); Fischetti et al. (2017); Wang and Xu (2017); Xu and Wang (2014). This leads to a simpler algorithm that can easily be integrated with the standard behavior of a solver. In the following, branch simply means applying any general purpose branching rule. The binarization procedure described in Sect. 3.1 is applied to the x variables, for Fo and Fp, and also to the y variables, for Go and Gp. Also note that binarization is not required in the problems that are solved in the separation phase.

The high point relaxation
The HPR problem consists of minimizing the leader objective function over P (see Sect. 2). Hence, it can be modeled as follows.
$$\mathrm{HPR} \quad \min\ c^T x + d^T y \quad \text{s.t.} \quad Ax + Ly \ge b, \quad Gx + Hy \ge r$$

If $G = 0$, optimistic LBP is not a bilevel problem anymore, as it can be solved by first solving the follower problem (whose feasible region is independent of $x$) to compute the corresponding optimal value $k$ and then solving HPR with the additional constraint $f^T y \le k$. This is not true, in general, for the pessimistic case. Indeed, if $L \neq 0$ and the problem is reformulated in a single-level fashion as described above, it is not guaranteed that the produced optimal $x$ corresponds to an empty $Z_I(x)$, as is required for a pessimistic solution to be feasible. However, a similar approach can be adopted for the pessimistic case as well if $G = 0$ and $L = 0$, that is, if not only the feasible region of the follower problem is independent of the leader variables, but also the feasible region of the leader problem is independent of the follower variables.
Since $\Pi \subseteq B \subseteq P$ and $\Omega \subseteq B \subseteq P$, if HPR is infeasible, then LBP is trivially infeasible as well. Neither HPR nor LBP can be unbounded if all the variables have finite bounds. However, LBP can be infeasible when HPR is not. This happens when $Z(x) = Z_I(x)$ for every $x$ such that $P(x) \neq \emptyset$ and, in the pessimistic setting, also when $Z_I(x) \neq \emptyset$ for every $x$ such that $P(x) \neq \emptyset$, even if $Z_F(x)$ is not empty.
If we relax the assumption that the variables have finite bounds, the situation becomes much more complicated. Suppose that we remove the bounds on the $y$ variables but keep those on the $x$ variables. As a consequence, HPR, LBP, FOLL(x) and WORST(x) may become unbounded, and we have the possibilities illustrated in Theorem 15, which generalizes the results in Xu and Wang (2014). Note that the pessimistic problem can never be unbounded. In fact, as we prove below, if the problem of maximizing $d^T y$ over $Z_F(x)$ is unbounded, then $Y_p(x) = \emptyset$ for any $x$ and LBP is infeasible. The same happens to optimistic LBP if ties are broken ex ante (see Sect. 1) and the problem of minimizing $d^T y$ over $Z(x)$ is unbounded.

Proof The initial assumptions exclude cases where LBP is infeasible, independently of the removal of the bounds on the $y$ variables.
1. Suppose that $L\gamma \ge 0$ and $d^T\gamma < 0$ for some $\gamma \in R$. For any $x$, $y$, $\lambda$ with $(x, y) \in P$ and $\lambda > 0$, vector $w = y + \lambda\gamma$ is such that $(x, w)$ belongs to $P$, and the value $d^T w$ can become arbitrarily negative. Hence, HPR is unbounded. On the contrary, if $d^T\gamma \ge 0$ any time that $L\gamma \ge 0$ and $H\gamma \ge 0$, this cannot happen.

2. Suppose that $f^T\gamma < 0$ for some $\gamma \in R$. For any $x$, $y$, $\lambda$ with $y \in P(x)$ and $\lambda > 0$, vector $w = y + \lambda\gamma$ belongs to $P(x)$ as well. Then $f^T w$ can become arbitrarily negative and FOLL(x) is unbounded for any $x$ with $P(x) \neq \emptyset$. As a consequence, $Z(x) = \emptyset$ and LBP is infeasible in both the optimistic and the pessimistic case. If ties are broken ex ante, we are optimizing over $\Omega$. Assume that there exists $\gamma \in R$ such that $f^T\gamma = 0$, $L\gamma \ge 0$ and $d^T\gamma < 0$. Hence, for any $x$, $y$, $\lambda$ such that $y \in Z_F(x)$ and $\lambda > 0$, vector $w = y + \lambda\gamma$ has the property that $w \in Z_F(x)$, and $d^T w$ can become arbitrarily negative. It follows that the problem of minimizing $d^T y$ over $Z(x)$ is unbounded, $Y_o(x) = \emptyset$ for any $x$, $\Omega = \emptyset$ and optimistic LBP is infeasible. If neither of the two possibilities holds, then it is easy to see that the problem admits a finite optimum. In fact, if $f^T\gamma > 0$ for all $\gamma \in R$, then the follower problem cannot be unbounded and $Z(x)$ is always a bounded set. Since LBP cannot be unbounded and we assumed that there exists at least one $x$ with $Z_F(x) \neq \emptyset$, LBP admits a finite optimum.

3. For the proof that optimistic LBP is infeasible when there exists $\gamma \in R$ with $f^T\gamma < 0$, see point 2. If ties are broken ex post, we are optimizing over $B$. If $f^T\gamma > 0$ for all $\gamma \in R$, the follower problem cannot be unbounded. Since we assumed that $Z_F(x) \neq \emptyset$ for at least one $x$, optimistic LBP cannot be infeasible. Assume that $f^T\gamma \ge 0$ for all $\gamma \in R$ and that there exists $\gamma \in R$ such that $f^T\gamma = 0$, $L\gamma \ge 0$ and $d^T\gamma < 0$. Hence, for any $x$, $y$, $\lambda$ such that $y \in Z_F(x)$ and $\lambda > 0$, vector $w = y + \lambda\gamma$ has the property that $w \in Z_F(x)$, $(x, w) \in B$, and $d^T w$ can become arbitrarily negative. It follows that optimistic LBP is unbounded. If $f^T\gamma > 0$ for all $\gamma \in R$ and $f^T\gamma = 0$ for all $\gamma \in R$ such that $L\gamma \ge 0$ and $d^T\gamma < 0$, then $w$ does not belong to $Z_F(x)$ for any $\lambda$ and the problem cannot be unbounded.

4. If $f^T\gamma < 0$ for some $\gamma \in R$, then pessimistic LBP is infeasible (see point 2). Suppose that there exists $\gamma \in R$ such that $f^T\gamma = 0$, $L\gamma \ge 0$ and $d^T\gamma > 0$. Hence, for any $x$, $y$, $\lambda$ such that $Z_I(x) = \emptyset$, $y \in Z_F(x)$ and $\lambda > 0$, vector $w = y + \lambda\gamma$ has the property that $w \in Z_F(x)$, and $d^T w$ can become arbitrarily large. It follows that $Y_p(x) = \emptyset$ for any $x$, $\Pi = \emptyset$ and, hence, pessimistic LBP is infeasible. If neither of the two possibilities holds, then it is easy to see that the problem admits a finite optimum.
It follows from Theorem 15 that, when HPR is unbounded, LBP can be infeasible or can admit a finite optimum in both the optimistic and the pessimistic case. The same happens when HPR admits a finite optimum. If ties are broken ex post, optimistic LBP can also be unbounded when HPR is unbounded. In what follows, we come back to the original assumption that all the variables have finite bounds (see Sect. 2.1).
Algorithm 1 Processing a node of the branch-and-bound tree for Fo and Go
1: solve the linear relaxation of the problem at the node and let (x, y) be the current solution
2: if $c^T x + d^T y > UB$ then
3:     prune the node and exit
4: else
5:     if (x, y) is not integer then
6:         branch and exit
7:     else
8:         solve FOLL(x), let $y^*$ be a solution in $Z(x)$
9:         if $f^T y = f^T y^*$ then
10:            exit and return feasible solution (x, y)
11:        else
12:            add the OFO cut for x (or the no-good inequality for (x, y))
13:            go back to step 1

The optimistic case
The procedure for processing a node of the branch-and-bound tree in the optimistic case is the following. The linear relaxation of the current problem is solved. Let $(x, y)$ be the computed solution and assume that the node has not been pruned, that is, $(x, y)$ has a better value than the best upper bound UB known for the problem so far. If $(x, y)$ is fractional, we branch. Otherwise, the solution is processed by the following separation phase. We solve FOLL(x) to test whether $(x, y)$ belongs to $B$. The follower problem can never be infeasible at this stage, as $P(x) \neq \emptyset$ for all $(x, y) \in P$. If $f^T y > opt(x)$, then $(x, y)$ is eliminated by adding the corresponding OFO (or no-good) inequality to the current formulation. After that, the new optimal solution of the linear relaxation is computed and processed. If there is no violated inequality, $(x, y)$ belongs to $B$ and, hence, it is feasible for the optimistic problem, according to Theorem 1. This means that the current node can be closed and that the current solution can be used to update the best upper bound, if appropriate. This procedure is summarized in Algorithm 1. Before they are added to the current formulation, the generated cuts are strengthened according to the criteria defined in Sect. 3.5. Note that every time the separation phase is used, a feasible solution is also generated. In fact, either $(x, y) \in B$ or $(x, y^*) \in B$, where $y^*$ is the computed optimal solution of FOLL(x).
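The separation logic of Algorithm 1 can be mimicked on toy instances with a minimal Python sketch. It rests on strong simplifying assumptions: all variables are integer with small finite bounds, the "master" problem (HPR plus the cuts added so far) is solved by plain enumeration instead of LP relaxation and branching, and FOLL(x) is also solved by enumeration. The instance data and all names (`solve_optimistic`, `inst`, ...) are hypothetical; this is not the authors' implementation, only an illustration of when OFO cuts are generated and when the loop stops.

```python
from itertools import product


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def box(lo, hi):
    return list(product(*(range(a, b + 1) for a, b in zip(lo, hi))))


def follower_ok(x, y, inst):  # Gx + Hy >= r
    return all(dot(g, x) + dot(h, y) >= rk
               for g, h, rk in zip(inst["G"], inst["H"], inst["r"]))


def leader_ok(x, y, inst):    # Ax + Ly >= b
    return all(dot(a, x) + dot(l, y) >= bk
               for a, l, bk in zip(inst["A"], inst["L"], inst["b"]))


def foll_opt(x, inst):
    """opt(x): brute-force FOLL(x) over the follower region P(x)."""
    return min(dot(inst["f"], y) for y in box(inst["ylo"], inst["yhi"])
               if follower_ok(x, y, inst))


def solve_optimistic(inst):
    """Cutting-plane loop with OFO cuts; enumeration replaces LP + branching."""
    P = [(x, y) for x in box(inst["xlo"], inst["xhi"])
         for y in box(inst["ylo"], inst["yhi"])
         if leader_ok(x, y, inst) and follower_ok(x, y, inst)]
    cuts = {}  # s -> opt(s): OFO cut f^T y <= opt(s), active only when x == s
    while True:
        remaining = [(x, y) for x, y in P
                     if x not in cuts or dot(inst["f"], y) <= cuts[x]]
        if not remaining:
            return None, cuts  # no pair survives the cuts
        x, y = min(remaining,
                   key=lambda p: dot(inst["c"], p[0]) + dot(inst["d"], p[1]))
        opt = foll_opt(x, inst)  # separation: solve FOLL(x)
        if dot(inst["f"], y) <= opt:
            return (x, y), cuts  # (x, y) in B: bilevel feasible
        cuts[x] = opt            # add the OFO cut for s = x


# Follower: max y s.t. x + y <= 3; leader: min -2x + 3y.
inst = dict(A=[], L=[], b=[], G=[[-1]], H=[[-1]], r=[-3], f=[-1],
            c=[-2], d=[3], xlo=[0], xhi=[2], ylo=[0], yhi=[3])
print(solve_optimistic(inst))  # (((2,), (1,)), {(2,): -1, (1,): -2})
```

On this instance, the first two master solutions $(2, 0)$ and $(1, 0)$ are not optimal for the follower and are cut away; the third, $(2, 1)$, passes the separation check and is the bilevel optimum.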
Sometimes it is useful to process the fractional solutions as well, instead of just branching on them. When a fractional solution does not belong to the integer hull of the current formulation, it can be eliminated by a general-purpose cut for mixed integer problems (e.g., a Chvátal-Gomory cut). The follower optimality cuts can also be used on some fractional vertices. Indeed, they only require the integrality of the current $x$ to define FOLL(x) and the violated inequality, if any, while the $y$ variables can be fractional, as only the value $f^T y$ matters. Hence, the follower optimality cuts can be used any time that the current $x$ is integer, even if $y$ is not. Assume now that $x$ is fractional. Under some assumptions, we can define an equivalent integer leader solution $\mu$ with the property that, if $f^T y > opt(\mu)$, then the OFO inequality for $\mu$ also eliminates the fractional vertex $(x, y)$. Suppose that any fractional component of $x$ corresponds to a binary coherent variable (see Definition 7 in Sect. 3.5).
Theorem 16 Let $\mu$ be the integer leader solution obtained by rounding $x_i$ to 1 if $G_i \le 0$ and to 0 if $G_i \ge 0$. If the OFO inequality for $\mu$ is violated by $(\mu, y)$, it is also violated by $(x, y)$.
Proof When we strengthen the OFO inequality for μ according to Theorem 10, it does not contain any leader variable x i that was fractional in solution x. Then, if the inequality is violated by (μ, y), it is violated by (x, y).
The improved separation algorithm that uses the OFO inequalities also on some fractional solutions, as described above, is summarized in Algorithm 2.
Algorithm 2 Processing a node of the branch-and-bound tree for Fo: the improved procedure for binary coherent x
1: solve the linear relaxation of the problem at the node and let (x, y) be the current solution
2: if $c^T x + d^T y > UB$ then
3:     prune the node and exit
4: else
5:     if x is not integer then
6:         let $\mu$ be the solution obtained by rounding $x_i$ to 1 if $G_i \le 0$ and to 0 if $G_i \ge 0$
7:         solve FOLL($\mu$) and let w be a solution in $Z(\mu)$
8:         if $f^T y > f^T w$ then
9:             add the OFO cut for $\mu$
10:            go back to step 1
11:        else
12:            branch and exit
13:    else
14:        solve FOLL(x) and let $y^*$ be a solution in $Z(x)$
15:        if $f^T y > f^T y^*$ then
16:            add the OFO cut for x
17:            go back to step 1
18:        else if y is integer then
19:            exit and return feasible solution (x, y)
20:        else
21:            branch and exit

Algorithm 3 Processing a node of the branch-and-bound tree for Fp and Gp
1: solve the linear relaxation of the problem at the node and let (x, y) be the current solution
2: if $c^T x + d^T y > UB$ then
3:     prune the node and exit
4: else
5:     if (x, y) is not integer then
6:         branch and exit
7:     else
8:         solve FOLL(x) and let $y^*$ be a solution in $Z(x)$
9:         if $f^T y = f^T y^*$ then
10:            solve FEAS(x) and let v be the optimal value
11:            if v > 0 then
12:                add the restricted no-good inequality for x
13:                go back to step 1
14:            else
15:                solve WORST(x), let w be a solution in $Y_p(x)$
16:                if $d^T y = d^T w$ then
17:                    exit and return feasible solution (x, y)
18:                else
19:                    add the PFO cut for x (or the pessimistic no-good cut for (x, y))
20:                    go back to step 1
21:        else
22:            add the OFO cut for x (or the optimistic no-good inequality for (x, y))
23:            go back to step 1

The pessimistic case
The policy for processing a node of the branch-and-bound tree in the pessimistic case is the following. The linear relaxation of the problem corresponding to the current node is solved. Let $(x, y)$ be the computed solution and assume that the node has not been pruned and that the solution is integer; otherwise, we branch. Then, $(x, y)$ is processed by the following separation phase. We solve FOLL(x) to verify whether $(x, y) \in B$ and, if this is not the case, we add to the problem an OFO cut or an optimistic no-good inequality. If $(x, y) \in B$, we must check whether $Z_I(x) = \emptyset$ by solving FEAS(x). If there exists a violated restricted no-good inequality, it is added to the formulation. If no restricted no-good cut is generated, we solve WORST(x) to test whether $y \in Y_p(x)$. If $d^T y = optw(x)$, then $y \in Y_p(x)$, that is, the current solution is feasible for the pessimistic problem. The node can be closed and, if appropriate, $(x, y)$ can be used to update UB. Otherwise, a PFO or a pessimistic no-good inequality is added to the formulation. The procedure described above is summarized in Algorithm 3. Each time that separation is used, either a feasible solution is produced ($(x, y)$ or $(x, w)$, with $w$ being the optimal solution of WORST(x)) or it is proved that $x$ cannot be completed by any suitable follower solution and must be discarded (a restricted no-good inequality is added).
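The decision sequence of the pessimistic separation phase can be reproduced by brute force on a toy instance. In this minimal Python sketch, the check "some optimal follower response violates a leader constraint" plays the role that FEAS(x) plays in the text, and the maximization over $Z(x)$ plays the role of WORST(x); enumeration replaces the MIP solves, the input pair is assumed to be an integer point of $P$, and the instance and function names are hypothetical.

```python
from itertools import product


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def rows_ok(M1, M2, rhs, x, y):
    return all(dot(m1, x) + dot(m2, y) >= rk for m1, m2, rk in zip(M1, M2, rhs))


def separate_pessimistic(x, y, inst):
    """Classify an integer pair (x, y) in P following the order of Algorithm 3."""
    ys = product(*(range(a, b + 1) for a, b in zip(inst["ylo"], inst["yhi"])))
    region = [w for w in ys if rows_ok(inst["G"], inst["H"], inst["r"], x, w)]
    opt = min(dot(inst["f"], w) for w in region)        # FOLL(x)
    if dot(inst["f"], y) > opt:
        return "OFO cut"                                 # (x, y) not in B
    Z = [w for w in region if dot(inst["f"], w) == opt]
    if any(not rows_ok(inst["A"], inst["L"], inst["b"], x, w) for w in Z):
        return "restricted no-good cut"                  # Z_I(x) is nonempty
    optw = max(dot(inst["d"], w) for w in Z)             # WORST(x)
    if dot(inst["d"], y) < optw:
        return "PFO cut"                                 # y not in Y_p(x)
    return "feasible"


# Leader constraint x - y_1 >= 0; follower: min y_0 + y_1 s.t. y_0 + y_1 >= 1.
inst = dict(A=[[1]], L=[[0, -1]], b=[0], G=[[0]], H=[[1, 1]], r=[1],
            f=[1, 1], d=[2, 5], ylo=[0, 0], yhi=[1, 1])
for x, y in [((1,), (1, 1)), ((0,), (1, 0)), ((1,), (1, 0)), ((1,), (0, 1))]:
    print(x, y, separate_pessimistic(x, y, inst))
# classifications: OFO cut, restricted no-good cut, PFO cut, feasible
```

For $x = 0$ the optimal response $(0, 1)$ violates the leader constraint, so $x = 0$ is discarded as a whole; for $x = 1$ only the worst-case response $(0, 1)$ is accepted.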
The result in Theorem 16 holds for the restricted no-good inequalities and the PFO cuts as well. However, it requires stricter assumptions for the pessimistic policy. For this reason, this improved separation scheme is not used in the experiments for the pessimistic setting.

Preprocessing
Before solving the problem, some variables can be fixed according to the result below, which generalizes the one in Fischetti et al. (2017).

Proof
The proof for $f_i \neq 0$ can be found in Fischetti et al. (2017). If $L_i = 0$, then $y_i$ does not appear in the leader constraints. If $f_i = 0$ and $H_i \ge 0$ ($H_i \le 0$), we can increase (decrease) the value of $y_i$ without affecting the optimality or the feasibility of the follower problem. In a pessimistic perspective, the follower is supposed to choose, among the optimal solutions, the worst possible one from the leader's point of view. Then, if $d_i \ge 0$ ($d_i \le 0$), the follower is assumed to choose the largest (smallest) possible value for $y_i$. Hence, we can fix $y_i$ to its upper (lower) bound. In the optimistic problem, the follower is assumed to choose the best solution for the leader; then we can fix $y_i$ to its upper (lower) bound if $d_i \le 0$ ($d_i \ge 0$).
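The case $f_i = 0$ discussed in the proof can be written as a small decision rule. The Python sketch below assumes that fixing also requires $L_i = 0$ ($y_i$ absent from the leader constraints), that `H_col` holds the coefficients of $y_i$ in the follower constraints $Hy \ge r - Gx$, and that the case $f_i \neq 0$ (covered by Fischetti et al. 2017) is simply not handled; the function name is hypothetical.

```python
def fix_follower_variable(f_i, L_col, H_col, d_i, pessimistic):
    """Return 'upper', 'lower' or None for follower variable y_i (case f_i = 0 only)."""
    if f_i != 0 or any(v != 0 for v in L_col):
        return None  # outside the case sketched here
    can_increase = all(v >= 0 for v in H_col)  # raising y_i keeps Hy >= r - Gx
    can_decrease = all(v <= 0 for v in H_col)  # lowering y_i keeps Hy >= r - Gx
    if pessimistic:   # the follower picks the worst optimal response for the leader
        if can_increase and d_i >= 0:
            return "upper"
        if can_decrease and d_i <= 0:
            return "lower"
    else:             # the follower picks the best optimal response for the leader
        if can_increase and d_i <= 0:
            return "upper"
        if can_decrease and d_i >= 0:
            return "lower"
    return None


print(fix_follower_variable(0, [0], [1, 2], 3, pessimistic=True))   # upper
print(fix_follower_variable(0, [0], [1, 2], 3, pessimistic=False))  # None
```

The same column can be fixed in the pessimistic setting but not in the optimistic one, because the sign of $d_i$ plays opposite roles in the two settings.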
To reduce the size of the search tree, it is useful to derive a globally valid upper bound $K$ on the optimal follower value. A simple way to compute such an upper bound is to maximize $f^T y$ over $\{(x, y) : Gx + Hy \ge r,\ x_i \text{ integer for } i \in N,\ y_i \text{ integer for } i \in F\}$. Suppose now that all the leader variables are coherent and let $x$ be the leader solution where

Proof When $opt_j$ is computed, the variables in $\Delta^- \cup \Gamma$ take the worst possible value, while the others do not change. Then, for any solution where the variables in $\Delta^- \cup \Gamma$ take any value while the others do not change, the optimal value of the follower problem is no larger than $opt_j$, and the OFO cut is valid if this value is used as bigM. If some variable $i > j$ changes, the cut is deactivated by $M^O_{s_i}$.

The experiments
We implemented the algorithms defined in Sect. 5 and tested them on a set of benchmark instances.

The instances
The optimistic test-bed includes 390 instances, belonging to the following sets of known instances from the literature:
S1o: fifty instances with integer leader and follower variables, originally introduced in DeNegre (2011) and downloaded from Fischetti et al. (2019);
S2o: one hundred problems with integer leader variables and mixed integer follower variables, originally introduced in Xu and Wang (2014) and provided by the authors of Lozano and Smith (2017);
S3o: sixty large instances with integer leader variables and mixed integer follower variables, originally introduced in Fischetti et al. (2017) and downloaded from Fischetti et al. (2019);
S4o: one hundred instances with integer leader and follower variables, originally derived in Wang and Xu (2017) from those of S2o;
S5o: eighty clique interdiction instances, originally introduced in Tang et al. (2016) and downloaded from Tang et al. (2019).
A more detailed description of these instances can be found in the papers mentioned above. Most of these instances cannot be used for the pessimistic problem, because the optimistic and the pessimistic settings produce the same solution: either L = 0 and f = −d, which are sufficient conditions for Theorem 4 to hold (S5o), or the follower problem admits a unique optimal solution in most cases (Lozano and Smith 2017) (S1o, S2o, S3o, S4o). Therefore, the pessimistic test-bed includes the 263 instances below:
S1p: the three instances of S1o where the optimistic and the pessimistic settings have different solutions;
S2p: one hundred instances with integer leader variables and mixed integer follower variables, originally introduced and provided by the authors of Lozano and Smith (2017);
S3p: one hundred instances with integer leader variables and mixed integer follower variables, originally introduced and provided by the authors of Lozano and Smith (2017);
S4p: sixty large instances with integer leader variables and mixed integer follower variables, obtained from S3o by randomly setting f_i to 1 or 0 with probability α = 0.15 and 1 − α, respectively.
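Read literally, the S4p construction replaces each follower objective coefficient f_i of an S3o instance by 1 with probability α and by 0 with probability 1 − α. Presumably this creates follower problems with many alternative optima, where the optimistic and pessimistic settings can differ. The sketch below assumes one independent draw per coefficient and a seeded generator for reproducibility. The function name and the seed are our assumptions, not details from the source.

```python
import random
from typing import List


def perturbed_follower_objective(n_follower: int, alpha: float = 0.15,
                                 seed: int = 0) -> List[int]:
    """Draw f_i = 1 with probability alpha and f_i = 0 otherwise, independently."""
    rng = random.Random(seed)  # seeded so the generated instance is reproducible
    # rng.random() is uniform on [0, 1), so the comparison succeeds with probability alpha.
    return [1 if rng.random() < alpha else 0 for _ in range(n_follower)]


f = perturbed_follower_objective(1000)
print(sum(f) / len(f))  # close to alpha = 0.15 for large n_follower
```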
For the instances in S2o, S3o, S2p, S3p and S4p, the number of follower variables is equal to the number of leader variables, and 50% of the follower variables are required to be integer. The number of leader and follower constraints is 0.4 times the number of leader variables. Instances in S4o follow the same scheme, but all the follower variables are required to be integer. For the instances in S2o, S4o, S2p and S3p, the number of leader variables ranges from 10 to 460, while for the problems in S3o and S4p it ranges from 500 to 1000. Since S2o and S3o contain continuous lower-level variables, they cannot be solved by Go, which requires all the follower variables to be integer. For the same reason, Gp can be used only on S1p.
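The sizing rule above can be written as a small helper. Two details are our assumptions: the factor 0.4 applies separately to the leader and to the follower constraint counts, and half of an odd number of follower variables is rounded down. The function name is hypothetical.

```python
from typing import Dict


def instance_dimensions(n_leader: int, all_follower_integer: bool = False) -> Dict[str, int]:
    """Sizes of an S2o/S3o/S2p/S3p/S4p-style instance (S4o: all_follower_integer=True)."""
    n_follower = n_leader  # as many follower variables as leader variables
    n_follower_integer = n_follower if all_follower_integer else n_follower // 2
    m = round(2 * n_leader / 5)  # 0.4 times the number of leader variables
    return {
        "n_leader": n_leader,
        "n_follower": n_follower,
        "n_follower_integer": n_follower_integer,
        "m_leader": m,
        "m_follower": m,
    }


# 460 follower variables (230 integer) and 184 constraints per level.
print(instance_dimensions(460))
```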

The implementation details
The experiments were run on a dual-core Intel i7-6600U @ 2.6 GHz with 20 GB RAM. The algorithms are implemented in C/C++ under Linux, using CPLEX 12.6 with the following settings: a time limit of 3600 s, with probing, cuts and heuristics disabled. For the instances of S1o, whose variable bounds are quite weak, we recomputed in preprocessing the bounds of every variable to be binarized. The improved version of the optimistic method illustrated in Algorithm 2 is used only on S5o.
We compare our results with those reported in the literature for the algorithms in Fischetti et al. (2017), Lozano and Smith (2017), Wang and Xu (2017) and Xu and Wang (2014), denoted by A1, A2, A3 and A4, respectively. A1 was tested on a cluster of Intel Xeon E5-2670 v2 @ 2.5 GHz with 12 GB RAM, using CPLEX 12.8 in multi-thread mode (4 threads); the results for A2 and A4 were obtained on a dual-core Intel i7-3537U @ 2 GHz with 8 GB RAM, using CPLEX 12.6 within an algorithm coded in Java under Windows; A3 is implemented in MATLAB using TOMLAB/CPLEX on a desktop computer with 3 GB RAM and a 2.4 GHz CPU. For a comparison of the machines, see PassMark Software (http://www.cpubenchmark.net), a website that scores CPU performance.

General statistics
General statistics on the computational behavior of Fo and Fp are reported in Figs. 1, 2 and 3. These figures show how the running time, the separation time and the number of branch-and-bound nodes generated by Fo and Fp change as the size of the instances grows. Data are obtained by running Fo and Fp on the 60 instances of S4p. The x-axis reports the number of leader variables in the considered instances. The number of follower variables and of constraints can be derived as illustrated in Sect. 6.1. Each plotted value is the average over the 10 instances of the same size. In Figs. 1 and 2, the y-axis reports times in seconds. In Fig. 3, the y-axis reports the number of generated branch-and-bound nodes.
The solution time doubles every time the number of leader (and follower) variables increases by 200, both in the optimistic and in the pessimistic setting. In the optimistic setting, the separation time is quite limited with respect to the solution time, whereas the pessimistic setting generally requires larger separation times (see also Sect. 6.5). The number of branch-and-bound nodes depends less on the size of the instances and follows a similar trend in the optimistic and the pessimistic setting. On average, the pessimistic setting requires about six times the computational effort needed to solve the optimistic version of the problem on the same instances.
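Taken at face value, the doubling observation corresponds to exponential growth of the solution time T in the number n of leader variables. The formula below is our restatement of that empirical trend over the S4p range (n from 500 to 1000), not a model fitted in the paper. Over the full range it predicts a ratio of roughly 5.7 between the largest and the smallest instances:

```latex
T(n) \approx T(n_0)\, 2^{(n - n_0)/200},
\qquad
\frac{T(1000)}{T(500)} \approx 2^{500/200} = 2^{2.5} \approx 5.7 .
```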

Comparison with other approaches on the optimistic test-bed
We discuss the results of Fo, Go, A1, A2, A3 and A4 on the instances of the optimistic test-bed defined in Sect. 6.1, which are reported in Tables 2, 3 and 4. For each algorithm and instance of S1o, we report the solution time (in seconds) if the problem has been solved to optimality by the considered approach, or a symbol otherwise. For S2o-S5o, each line of the corresponding tables represents a class that contains ten instances of the same size, and average solution times (in seconds) are reported. The best results are in bold. The times reported for Fo and Go are overall times and include input, preprocessing and solution time. The times for the other approaches are those given in the corresponding papers. The results for A4 are taken from Lozano and Smith (2017), where the algorithm was reimplemented. No total times for A3 are given in Wang and Xu (2017), so we derived them from the available information, as discussed in the comment to Table 3. Table 2 presents a comparison between Fo, Go and A1 on S1o. Both Fo and A1 are able to solve all the instances, whereas Go solves only 48 of the 50 instances within the time limit. Row avg times or solved instances gives the average time spent to solve the instances in the set if all of them have been solved to optimality, or the number of solved instances otherwise. Row best approach reports the total number of instances on which each approach is the best one. Fo is the best approach in most cases, in particular on instances 20-20-50-0110-10-10, 20-20-50-0110-15-5 and 20-20-50-0110-15-6, which turned out to be the hardest ones in this set, requiring more time to be solved than any other instance in S1o for all the considered approaches. Most of the instances in this set are easy for all the tested algorithms.
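The two summary rows of Table 2 can be reproduced from per-instance times with a few lines of code. In the sketch below, an unsolved instance is encoded as None, and a tie for the best time credits every tied approach. The tie rule, the function name, the data layout and the toy numbers are our assumptions, since the text does not specify them.

```python
from typing import Dict, List, Optional, Tuple


def summary_rows(times: Dict[str, List[Optional[float]]]
                 ) -> Tuple[Dict[str, Tuple[str, float]], Dict[str, int]]:
    """times[approach][k] is the time on instance k, or None if not solved."""
    avg_or_solved: Dict[str, Tuple[str, float]] = {}
    for approach, ts in times.items():
        solved = [t for t in ts if t is not None]
        if len(solved) == len(ts):
            # All instances solved: report the average time.
            avg_or_solved[approach] = ("avg", sum(solved) / len(solved))
        else:
            # Otherwise: report how many instances were solved.
            avg_or_solved[approach] = ("solved", len(solved))
    best = {approach: 0 for approach in times}
    n_instances = len(next(iter(times.values())))
    for k in range(n_instances):
        solved_k = {a: ts[k] for a, ts in times.items() if ts[k] is not None}
        if solved_k:
            fastest = min(solved_k.values())
            for a, t in solved_k.items():
                if t == fastest:
                    best[a] += 1
    return avg_or_solved, best


avg, best = summary_rows({"Fo": [1.0, 2.0], "Go": [1.0, None], "A1": [3.0, 1.5]})
print(avg)   # {'Fo': ('avg', 1.5), 'Go': ('solved', 1), 'A1': ('avg', 2.25)}
print(best)  # {'Fo': 1, 'Go': 1, 'A1': 1}
```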
In Table 3, we present the results on S2o obtained by Fo, A1, A2 and A4, as well as those on S4o obtained by A3, Go and Fo. Column #lead is the number of leader variables of the considered group of instances. Times for A3 are obtained by summing the times to find and to certify the optimal solution (last two values of each line of Table 5 in Wang and Xu (2017)) and considering, for each class, only the best option among the six versions of the algorithm. For different groups, the best option may correspond to different versions. For S2o, Fo is the best algorithm on almost all of the 10 classes considered, largely outperforming A2 and A4 and being four times faster than A1. For S4o, both Go and Fo outperform A3. Fo is the best algorithm, being about nine times faster than Go, which is in turn about eight times faster than A3. Note that in Wang and Xu (2017), the authors report a comparison between A3 and the algorithms in Bard and Moore (1990), DeNegre and Ralphs (2009) and Xu and Wang (2014), showing that A3 outperforms all of them. It follows that Fo also outperforms the algorithms in Bard and Moore (1990) and Xu and Wang (2014), and that Go with the binarized no-good inequalities outperforms the original approach in DeNegre and Ralphs (2009), where the cuts are defined using a slack condition.
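The rule used to build the A3 column amounts to a one-line minimum: sum the time to find and the time to certify the optimum, and keep the best of the algorithm versions in each class. The function name and the numbers below are made up for illustration and are not taken from Wang and Xu (2017).

```python
from typing import Sequence, Tuple


def a3_class_time(versions: Sequence[Tuple[float, float]]) -> float:
    """Best total time over versions, each given as (time_to_find, time_to_certify)."""
    return min(find + certify for find, certify in versions)


# Hypothetical class with three versions: totals are 30, 35 and 22, so 22 is reported.
print(a3_class_time([(10, 20), (5, 30), (20, 2)]))
```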
In Table 4, we present the results on S3o and S5o obtained by Fo and A1. For S3o, Fo is the best approach, solving all the instances in shorter computing times. For the clique interdiction instances in S5o, v is the number of nodes in the graph and d is the edge density (the number of edges as a percentage of the total number of possible edges). See Mattia (2021), Tang et al. (2016) and references therein for more details on the clique interdiction problem. We do not report the results of Go on this set, as it solves none of the instances. S5o is the least favorable set for an algorithm, like Fo, whose cuts are entirely based on the optimal value of the follower problem.
In fact, in these problems the leader objective function (c = 0 and d = −f) drives the y solutions exactly in the opposite direction with respect to the follower optimum, leading to the generation of a large number of pairs not belonging to B. Conversely, approaches whose cuts and/or branching rules are derived using different strategies are expected to be more effective on such instances. The experiments confirm this intuition: A1 performs better than Fo, being twice as fast, although Fo outperforms the dedicated approach in Tang et al. (2016), where these instances were originally proposed.

Comparison with other approaches on the pessimistic test-bed
We discuss the results of Fp, Gp and A2 on the instances of the pessimistic test-bed defined in Sect. 6.1. The results are reported in Table 5. For each algorithm and instance of S1p, we report the solution time (in seconds) if the problem has been solved to optimality by the considered approach, or a symbol otherwise. Rows avg times or solved instances and best approach have the same meaning as the corresponding rows in Table 2. For S2p and S3p, each line of the table represents a class that contains ten instances of the same size, and average solution times (in seconds) are reported. The best results are in bold. The times reported for Fp and Gp are overall times and include input, preprocessing and solution time. The times for A2 are taken from Lozano and Smith (2017). For S1p, Fp performs better than Gp, which is not able to solve all the instances. Fp also outperforms A2 on S2p and S3p, being about seven times faster. Figure 2 shows that Fp can also solve the much larger instances in S4p in reasonable computing times. Confirming what is discussed in Lozano and Smith (2017) and Xu and Wang (2014), Figs. 1, 2 and 3 show that the pessimistic problem is, in general, harder than the optimistic one: it requires larger solution and separation times, and the corresponding search tree enumerates more nodes than in the optimistic case on the same instances. In fact, the reformulation of the optimistic problem via Theorem 1 allows a given solution to be checked only once before it is declared feasible or cut off, while the pessimistic version of the problem cannot benefit from this result. For this reason, each solution generated during the algorithm may need to be tested three times before a decision on it is made. In addition, the strengthening techniques available in the optimistic case (see Sect. 3.5) further reduce the computing times for the optimistic problem.

Conclusions
We studied bilevel programming problems where both the leader and the follower variables can be integer. Necessary and sufficient conditions for the optimistic and the pessimistic policy to be equivalent were provided. We introduced a new family of inequalities, the follower optimality cuts, which allowed us to reformulate the bilevel problem as a noncompact single-level problem. The strength of these cuts with respect to the no-good inequalities was discussed. Finally, we devised a branch-and-cut algorithm and presented computational tests showing that the proposed approach outperforms the other approaches in the literature on most of the well-known sets of benchmark instances considered in the experiments.
Data availability Enquiries about data availability should be directed to the authors.

Conflict of interest
The author has no conflict of interest concerning this study.
Human and animal rights This paper does not contain any studies with human participants or animals performed by the author.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.