1 Introduction

Water utilities are facing unprecedented operational challenges from growing water demand, ageing water infrastructure and more stringent environmental standards. Moreover, utilities are required to continuously improve the quality of service and satisfy customers’ expectations for a cost-efficient operation. Consequently, novel technologies and operational innovation are urgently needed to achieve adaptive, optimal and intelligent management of water supply networks.

The hydraulic pressure in pipes is a critical control variable for water supply networks because high pressure is closely related to leakage losses and burst frequency [26]. Significant benefits can be realized by continuously controlling the operational pressure, under stochastic changes in demand, close to a minimum level as defined by regulations [32]. For over two decades, the subdivision of water distribution networks into small sectors, called District Metered Areas (DMAs), has been successfully applied in the pursuit of low-cost leak reduction methods by facilitating simplistic demand metering and pressure control [23]. The sectorization of water networks is implemented through the installation of closed boundary valves in order to form small metered areas (sectors); consequently, the flow into these sectors can be accurately measured and the pressure can be reduced to continuously maintain the minimum pressure requirements at a critical point. While this practice has allowed an efficient leakage management, it has severely reduced the redundancy in network connectivity thus affecting the system resilience [43] and water quality [3]. Recent work on water distribution networks with dynamically adaptively network topology [43] pioneers a hybrid mode of operation that integrates the benefits of leakage reduction and management provided by sectorised networks with the extra benefits of improved network connectivity, redundancy and resilience. Dynamically adaptive networks are segregated into small sectors during periods of low demand (e.g. at night) in order to maximize the detection and pre-localisation of leaks. The sectors are then aggregated in order to achieve an optimal pressure and effective resilience management during the remaining daily operation. This hybrid mode of operation provides unique control opportunities to achieve short and long-term operational gains.

To benefit from these emerging and advanced control schemes, the retrofitting of existing networks with control valves, which provide advanced forms of flow and pressure modulation [41, 43], requires the formulation and solution of both design and operational optimization problems. In the present work we focus on the mathematical optimization for network pressure management, minimizing average zone pressure through the optimal placement and operation of pressure control valves.

The placement of control valves is a challenging design problem, even for a relatively small and simple network as shown in Fig. 1a. The number of possible combinations of locations for \(n_v\) valves in a network with \(n_p\) pipes is equal to \(\left( {\begin{array}{c}n_p\\ n_v\end{array}}\right) =\frac{n_p!}{n_v!(n_p-n_v)!}\).

In Fig. 1b, we show an operational network model with over 250 control valves and about 110,995 links, serving approximately 1,000,000 customers in South West England. Searching for the optimal combination of control valve locations is combinatorially infeasible for operational networks. The application of scalable mathematical programming tools adapted to the special structure of this problem is therefore required to facilitate the cost–benefit analysis, adoption and implementation of advanced pressure management solutions.

Fig. 1
figure 1

a Network model from [40] with 22 junctions, 3 reservoirs, 37 links, 7770 possible configurations for the placement of 3 valves. b Operational network (South West England) investigated by the authors: 106,804 junctions, 32 water sources, 110,995 pipes and \({\approx }\)250 PRVs with the intention to significantly increase these in the next 5 years

The problem formulation combines binary variables (that denote whether a valve is placed on a link or not) with continuous variables representing nodal pressures and pipe flows. Valve control is embedded in the optimal placement problem, since the operational settings of a valve are defined by the pressure at its outlet. The use of physically feasible models for operational water distribution networks involves the formulation of hydraulic equations that lead to nonlinear constraints. Together with the discrete decision variables, these result in a nonconvex optimization problem with a large number of integer variables, belonging to the class of mixed integer nonlinear programming (MINLP). These problems are particularly challenging since they combine the difficulties of handling nonconvex nonlinear constraints with the complication of optimizing within a space of discrete variables. Although mixed integer nonlinear programs are intractable in most general cases, the growing importance of large scale mixed integer problems in several engineering applications has motivated recent research in various solution methods. For an extensive survey of these emerging methods see [5, 17].

Optimal valve placement problems have been studied using both mathematical programming and heuristic methods. Genetic algorithms and meta-heuristic approaches have been applied [1, 29, 30]. Nonetheless, these methods have various limitations. Firstly, such heuristics do not guarantee convergence to optimal solutions (not even to local optima). Secondly, they require a large number of function evaluations of objective and constraints in order to achieve good quality solutions. Therefore, with applications in advanced dynamic control schemes too, it is important to study reliable and effective mathematical methods for optimal valve placement and control. A mathematical formulation of the problem was first proposed in [20], where a mixed integer linear programming method was applied to a linear approximation of the original MINLP problem. A direct solution for optimal valve placement with the MINLP solver BONMIN has been proposed in [15], considering a single steady-state snapshot of daily network operation. However, a faithful representation of real world water distribution systems requires the inclusion of multiple demand scenarios. These were considered in the recent work in [13], where a direct solution using BONMIN was compared with a penalty method. Since the mixed integer problem in study involves only binary variables, it can be equivalently reformulated as a mathematical program with complementarity constraints (MPCC). In fact, each binary variable can assume just one of the two complementary values, 0 or 1. The MPCC was solved in [13] using a penalty method, coupled with a heuristic rounding scheme.

In this manuscript, we build upon ideas outlined in [13] and present a rigorous analysis of the penalty method and its theoretical properties together with an evaluation of alternative practical algorithmic implementations, neither of which were discussed in [13]. In the penalty method, the optimization is performed in a feasible set obtained by dropping the complementarity constraints, which are then embedded into the objective function by penalizing complementarity violations. A sequence of penalty problems can then be solved using standard NLP solvers and the solutions will converge to a stationary point of the original MPCC problem. In addition, we propose the application of a relaxation method . In this case, we solve a sequence of relaxed problems with ‘regular’ approximations for the complementarity constraints; these approximate feasibility sets are described using a relaxation parameter. The relaxed problems can be solved with standard NLP solvers. The relaxed feasible sets converge to the pathological feasibility set of the MPCC, hence the sequence of solutions will converge to a stationary solution of the original problem. Convergence properties under suitable assumptions have been discussed in a more general form for both penalty methods [21, 35] and relaxation methods [34, 37]. We survey, evaluate and tailor these results to the optimal valve placement problem, and show that the convergence to at least stationary points is guaranteed.

In addition, we propose algorithmic implementations of both methods and apply them to a case study. Since the optimization problem is nonconvex, like most nonlinear programming algorithms, under suitable assumptions all the methods can guarantee convergence only to local minimum points and the quality of the solutions will depend on the initial points. We take this into account when comparing different approaches and therefore we consider a solution qualitatively good if it is obtained with various random initial guesses and provides an average zone pressure close to the best known solution.

2 Problem formulation

In the present formulation we model a water distribution network with \(n_0\) water sources (e.g. reservoirs or tanks), \(n_n\) nodes and \(n_p\) pipes, as a directed graph (VE), with \(n_n+n_0\) vertices and \(2n_p\) links, since we consider bidirectional positive flows. This means that link j and link \(j+n_p\) correspond to the same physical pipe. Moreover, we include in the formulation \(n_l\) different demand scenarios.

Let \(k \in \{1,\ldots ,n_l\}\) be a time step. We define the vectors of unknown pressure heads and flows as \(p^k=[p^k_1,\ldots ,p^k_{n_n}]^T\) and \(q^k=[q^k_1,\ldots ,q^k_{2n_p}]^T\), respectively. Each node i has known elevation and demand \(e_i\) and \(d_i^k\). Moreover, hydraulic heads at the water sources are known and denoted by \({h_0^k}_i\) for each \(i=1,\ldots ,n_0\). Let link j have flow \(q_j\) going from node \(i_1\) to node \(i_2\). The friction head loss across the pipe can be represented by either the Hazen–Williams (HW) or Darcy–Weisbach (DW) formulae. In DW models the relation between friction head loss and flow is defined by an implicit semi-empirical equation, which involves non-smooth terms, and it can be numerically calculated through an iterative process [27, Section 2.2.2]. This complicates the use of these models in a smooth mathematical optimization framework. Similarly, HW formula is semi-empirical and is given by \(HW_f(q^k_j)=r_j (q^k_j)^{n}\), where \(r_j\), the resistance coefficient of the pipe, is defined by \(r_j=\frac{10.67 L_j}{C_j^{n} D_j^{4.871}}\) with \(n=1.852\) and \(L_j,C_j,D_j\) denote the length, roughness and diameter of the pipe, respectively. The HW formula involves a non-smooth exponential function since the corresponding Hessian is unbounded around the origin. Therefore, the HW formula is difficult to handle as a constraint for most nonlinear programming solvers that rely on second-order information.

In the framework of mathematical optimization for water distribution networks, the use of nonlinear programming techniques requires a smooth function that closely approximates the head loss curve over a range of flows. In particular, an approximation of the HW head loss formula with a piecewise function is proposed in [10], using a quintic polynomial approximation near zero. As noted in [16], such approach introduces computational complexities due to the high order polynomial function. Moreover, in the present framework, the use of a piecewise approximation would require the introduction of a large number of binary variables in addition to those needed to model the placement of valves. On the other hand, various explicit approximations of the DW head loss formula can be found in literature. In particular, a smooth and asymptotically consistent approximation was presented in [11]. A smooth quadratic friction loss approximation for both HW and DW models, determined by a minimization of the integral of relative errors, was considered in [15]; the analysis reported in [16] have showed that, in practice, the use of polynomial quadratic friction loss formulae does not affect significantly the distribution of network pressures and flows. In the present manuscript, we consider a quadratic polynomial \(h_f(q):=aq^2+bq\) where, in contrast to the cited works, we choose the coefficients of \(h_f\) so that the integral of absolute errors is minimized. This is because we are mainly interested in the absolute violation of optimization constraints.

In the case of a HW model, the quadratic approximation is determined as follows. Let \(j \in \{1,\ldots ,2n_p\}\) be a link in our water distribution network model and let the corresponding maximum allowed flow velocity magnitude in the link be \({V_j^{max}}>0\). We set \({Q_j^{max}}:=~\frac{\pi D_j^2}{4}{V_j^{max}}\) and minimize the integral  \(J_{Q_j^{max}}(a,b):=\int _{0}^{{Q_j^{max}}} (aq^2+bq-r_j q^{n} )^2 dq\). After few steps of calculations, it is possible to find coefficients \(a^*\) and \(b^*\) that minimize the above integral. Although not discussed here, we can also find a quadratic fit for DW head loss—see [16]. For both DW and HW models, once a quadratic approximation for head losses is identified, the optimization problem to minimize average zone pressure can be formulated as follows [15]:

$$\begin{aligned} \min&\sum ^{n_l}_{k=1}\frac{1}{W}\sum ^{n_n}_{i=1}w_ip^k_i \end{aligned}$$
(1a)
$$\begin{aligned} \text {subject to: } \quad&A_{12}^Tq^k - d^k =0, \quad \forall k=1,\ldots ,n_l, \end{aligned}$$
(1b)
$$\begin{aligned}&Q(q^k)\bigl (-A_{12}p^k -A_{12}e - A_{10}h_0^k -h_f(q^k) \bigr ) \ge 0, \quad \forall k=1,\ldots ,n_l, \end{aligned}$$
(1c)
$$\begin{aligned}&-A_{12}p^k -A_{12}e-A_{10}h_0^k - h_f(q^k) - {M^k z} \le 0, \quad \forall k=1,\ldots ,n_l, \end{aligned}$$
(1d)
$$\begin{aligned}&{z_{j}+z_{n_p+j}}\le 1, \quad \forall j=1,\ldots ,n_p, \end{aligned}$$
(1e)
$$\begin{aligned}&{\sum _{j=1}^{2n_p} z_{j}} = n_v, \end{aligned}$$
(1f)
$$\begin{aligned}&p^k_{min} \le p^k \le p^k_{max}, \quad \forall k=1,\ldots ,n_l, \end{aligned}$$
(1g)
$$\begin{aligned}&0 \le q^k \le Q^{max},\quad \forall k=1,\ldots ,n_l, \end{aligned}$$
(1h)
$$\begin{aligned}&{z} \in \{0,1\}^{2n_p}, \end{aligned}$$
(1i)

where the objective is the minimization of average zone pressure (AZP) at each demand scenario, expressed with the weighted sum of nodal pressures (1a), where weights \(w_i\) are defined by \(w_i:=\sum _{j \in I(i)}\frac{L_j}{2}\), with I(i) the set of indices of links that are connected to node i. Moreover, we set \(W:=\sum _{i=1}^{n_n}w_i\).

The optimization problem is primarily subject to hydraulic constraints, in order to ensure the physical feasibility of the solutions. Therefore, mass and energy conservation laws have to be satisfied at each demand pattern, these are expressed by (1b) and the couple (1c), (1d), respectively. While (1b) expresses the conservation of flow at each junction node, (1c) and (1d) consider the head loss across all links. The matrices \(A_{12}^T \in {\mathbb {R}}^{n_n \times 2n_p}\) and \(A_{10}^T \in {\mathbb {R}}^{n_0 \times 2n_p}\) are the node-edge incidence matrices for the \(n_n\) nodes and the \(n_0\) water sources, respectively. Moreover, \(Q \in {\mathbb {R}}^{2n_p \times 2n_p}\) is the diagonal matrix of flows \(q^k\), i.e. \(Q(q^k)_{j,j}:=q^k_j\), and \(h_f(q^k):=[h_f(q^k_1) \; ... \; h_f(q^k_{2n_p})]^T\). Finally, \(M^k \in {\mathbb {R}}^{2n_p \times 2n_p}\) is a diagonal matrix of sufficiently big constants, see below for a possible choice.

Let \(i_1 \xrightarrow {j}i_2\), with \(q^k_j > 0\) and \({z_j}=0\). Then the corresponding rows in (1c) and (1d) are equivalent to the Bernoulli equation: \(p^k_{i_1}+e_{i_1}-p^k_{i_2}-e_{i_2}=h_f(q^k_j)\). The placement of a valve on the pipe j disables (1d) provided \(M^k_{jj}\) is big enough; the resulting constraint is \(p^k_{i_1}+e_{i_1} - p^k_{i_2}-e_{i_2}-h_f(q^k_j) \ge 0\). Therefore, the head loss across the link will be greater than or equal to the friction loss, this means that the pressure at the downstream node will be further reduced by the action of the valve compared to just the friction loss. When \(q^k_j=0,\) both constraints are disabled, since in this case there is no flow in link j at time k. As shown in [39], a solution where flows in both directions \(i_1 \rightarrow i_2\) and \(i_2 \rightarrow i_1\) are strictly positive is infeasible for constraints (1b)–(1i). Note that constraints (1c) and (1d) are nonconvex, as it is possible to verify directly from the definition of convex function [9, Definition 3.1.1].

Finally, the big-M constants \(M^k_{j,j}\) are chosen to be as tight as possible based on the characteristics of each link. Given \(i_1 \xrightarrow {j}i_2\), let \((h^k_{max})_{i_1}\) and \((h^k_{min})_{i_2}\) be the maximum and the minimum possible hydraulic heads at node \(i_1\) and \(i_2\), then we set \(M^k_{j,j}:={(h^k_{max})}_{i_1}-{(h^k_{min})}_{i_2}\).

Additional constraints also arise from explicit physical, performance and economic constraints of the optimization problem. Inequality (1e) and equality (1f) state that just 1 valve is allowed on each pipe and that the total number of valves in the network is \(n_v\), respectively. The minimum and maximum allowed pressures at each junction node are specified by (1g), with \((p^k_{max})_i= \max _{l\in \{1,\ldots ,n_0\}}({h^k_0}_l) - e_i\), for each node i and time step k. Moreover, in (1h) we specify positive flow rates and fix the maximum flow in each pipe to \(Q_j^{max}\). Finally, we have the binary constraint (1i). Problem (1) is nonconvex MINLP with \(n_l(n_n+2n_p)\) continuous variables and \(2n_p\) binary variables. In addition, the problem formulation includes \(4n_pn_l\) nonlinear constraints and \(n_l(3n_n+4n_p)+n_p+1\) linear constraints.

3 Reformulation of MINLPs as mathematical programs with complementarity constraints

Problem (1) is a nonconvex MINLP, with polynomial nonlinear constraints and \(N=n_l(n_n+2n_p)+2n_p\) variables. The vector of unknowns is given by \(x=[p^T,q^T,{z}^T]^T \in {\mathbb {R}}^N\). It is difficult to handle this class of optimization problems, which combine nonconvex nonlinear constraints with discrete decision variables. Moreover, when considering a real-world operational network model like the one in Fig. 1b, the dimension of the optimization problem becomes very large (more than 8 million variables and 13 million constraints, if \(n_l=24\)). This is the main motivation for the investigation of possible scalable mathematical programming methods for the solution.

Problem (1) can be written in a more compact form. We define the set of indices corresponding to the binary variables as \(B:=\big \{n_l (n_n+2n_p)+1,\ldots ,n_l(n_n + 2n_p)+2n_p \big \}.\) Similarly, let I and E represent the index sets for the rows of inequalities and equalities in (1), respectively. Then (1) becomes:

$$\begin{aligned} \begin{aligned} \min _{x \in {\mathbb {R}}^N} \quad&f(x) \, \\ \text {subject to}\quad&g_{i}(x) \ge 0, \quad \forall i \in I \\&h_i(x) = 0, \quad \forall i \in E,\\&x_j \in \{0,1\}, \quad \forall j \in B, \end{aligned} \end{aligned}$$
(2)

where \(g(x)=(g_i(x))_{i \in I}\) is the vector whose components are the rows of inequalities (1c), (1d), (1e), (1g) and (1h). Analogously, \(h(x)=(h(x)_i)_{i \in E}\) is the vectors whose components are the rows of equalities (1b) and (1f). Since its integer constraints are binary, Problem (2) can also be reformulated as a mathematical program with complementarity constraints (MPCC):

$$\begin{aligned} \begin{aligned} \min _{x \in {\mathbb {R}}^N} \quad&f(x) \\ \text {subject to}\quad&g_i(x) \ge 0, \quad \forall i \in I, \\&h_i(x) = 0, \quad \forall i \in E,\\&0 \le x_j \perp 1-x_j \ge 0, \quad \forall j \in B. \\ \end{aligned} \end{aligned}$$
(3)

We note that the feasible region of Problem (3) is disjoint and in [4] it is recommended to avoid such pathological case in the formulation of complementarity constraints. However, various techniques to handle nonconvex, ill-conditioned problems without a strict relative interior are adopted by advanced nonlinear programming solvers—see as example [42]. Although these algorithmic modifications are not sufficient to deal with general badly posed problems, the analysis reported in Sect. 6 shows that, in practice, an MPCC reformulation represents a valid alternative to standard MINLP techniques for the solution of optimal valve placement and operation in water distribution networks.

In nonlinear optimization, in order to guarantee the convergence of standard solution methods to stationary points, usually linear independence constraints qualification (LICQ) is required. For an exhaustive survey on nonlinear programming see [31]. The feasible set of a mathematical program with complementarity constraints has a special structure which results in the violation of standard constraints qualifications. Various methods have been proposed in order to deal with this pathological characteristic. Since we are interested in the class of MPCC problems represented by (3) we review all the definitions and results in a particular form, using the same notations as Problem (3). For a more general discussion we refer the reader to [21, 35,36,37] and the references therein.

It is necessary to introduce a suitable constraints qualification for (3). Although multiple variants of standard constraints qualifications exist for the MPCC framework, here it suffices to consider MPCC–LICQ for our purposes.

First of all we define the following set of indices: \(I_g(x):=\{ i \in I \, | \, g_i(x)=0 \}, I_0(x):= \{ j \in B \, | \, x_j=0 \}, I_1(x):= \{ j \in B \, | \, x_j=1 \}\). In the following, given \(j \in B\), we denote with \({\mathbf {e}}_j \in {\mathbb {R}}^N\) the jth column of the identity matrix.

Definition 1

[35, Definition 2.3] A feasible point \(x^\star \) for (3) is said to satisfy MPCC–LICQ if the gradients \(\bigl \{\nabla g_i(x^\star ) \, \big | \, i \in I_g(x^\star ) \bigr \} \cup \bigl \{ \nabla h_i(x^\star ) \, \big | \, i \in E \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, \big | \, j \in B \bigr \} \) are linearly independent.

As noted in [38], MPCC–LICQ is a generic condition for mathematical programs with complementarity constraints. In particular, it is possible to prove that Problem (3) satisfies the above constraint qualification at every feasible point except a set of measure zero, once a small perturbation is applied to the optimization constraints; for the sake of brevity we omit the proof which can be found in the Appendix to [33].

For a general MPCC, multiple stationarity conditions can be formulated, the main ones being C-stationarity, M-stationarity, B-stationarity and strong stationarity [35, 37]. Given the definitions presented in [37], we have that C-stationarity, M-stationarity and strong stationarity differ only on the index set where both complementary terms are active—in our case we have \(I_0(x)\cap I_1(x)=\emptyset \). Consequently, for every problem with binary constraints, these three stationarity conditions are equivalent. Moreover, if MPCC–LICQ holds, B-stationarity and strong stationarity are equivalent [35]. Therefore, all stationarity concepts are equivalent to strong stationarity for Problem (3), once an arbitrarily small perturbation of the constraints is applied. In what follows we will refer to a strong stationary point simply as stationary.

Definition 2

Let \(x^\star \) be a feasible point for (3). \(x^\star \) is said to be stationary if there exist multipliers \(\lambda ^\star =(\lambda ^\star _i)_{i \in I}, \, \mu ^\star =(\mu ^\star _i)_{i \in E}, \, \gamma ^\star =(\gamma ^\star _j)_{j \in B}, \nu ^\star =(\nu ^\star _j)_{j \in B}\) such that

$$\begin{aligned} \begin{aligned} \nabla f(x^\star ) -&\sum _{i\in I} {\lambda ^\star _i \nabla g_i(x^\star )} - \sum _{i \in E}{\mu ^\star _i \nabla h_i(x^\star )} - \sum _{j \in B}{\gamma ^\star _j {\mathbf {e}}_j} + \sum _{j \in B} {\nu ^\star _j {\mathbf {e}}_j}=0\\ \text { and }\quad&\lambda ^\star _i \ge 0, \quad \forall i\in I_g(x^\star ),\quad \gamma ^\star _j = 0, \quad \forall j \in I_1(x^\star ),\quad \nu ^\star _j=0, \quad \forall j \in I_0(x^\star ). \end{aligned} \end{aligned}$$

We have the following result on necessary conditions for local optimality, for a proof see [35] or [36] .

Theorem 1

[35, Theorem 2.4] A local minimizer \(x^\star \) for Problem (3) satisfying MPCC–LICQ is stationary. Moreover, the multipliers \((\lambda ^\star ,\mu ^\star ,\gamma ^\star ,\nu ^\star )\) are unique.

In the next sections we present a rigorous mathematical framework for penalty and relaxation methods. Moreover, we propose their algorithmic implementations for the solution of our optimization problem.

4 Penalty method

Using the same notation of Problem (3) we introduce the following penalty function \( \Psi (x)= \sum _{j \in B} x_j(1-x_j)\). For \(\rho >0\) fixed, consider the nonlinear program PEN(\(\rho \)):

$$\begin{aligned} \begin{aligned} \min _{x \in {\mathbb {R}}^N}\quad&f(x)+\rho \Psi (x) \\ \text {subject to}\quad&g_{i}(x) \ge 0, \quad \forall i \in I, \\&h_i(x) = 0, \quad \forall i \in E,\\&0 \le x_j \le 1, \quad \forall j \in B. \\ \end{aligned} \end{aligned}$$
(4)

Penalty approaches have been studied in [21, 35], while in [13] a similar method was applied to optimal valve placement in water distribution networks, coupled with a rounding scheme to improve the convergence of their algorithm. The overall strategy consists in the solution of a sequence of penalty problems, for increasing values of parameter \(\rho \). The claim is that for \(\rho \) sufficiently large this sequence will converge to a solution of Problem (3).

The following result states necessary conditions for the convergence of the penalty approach, the proof is a particular case of Theorem 2.1 in [21].

Theorem 2

(Stationarity of the limit point) Suppose that for each \(k \in {\mathbb {N}}, x^k\) is a stationary point of PEN(\(\rho ^k\)), i.e. \(x^k\) satisfies the KKT conditions for this nonlinear problem. Moreover, assume that \(\lim _{k \rightarrow \infty } \rho ^k=+\infty \) and \(\lim _{k \rightarrow \infty }x^k= x^\star \), with \(x^\star \) feasible point for Problem (3) that satisfies the MPCC–LICQ. Then \(x^\star \) is stationary for Problem (3), in the sense of Definition 2.

In view of Theorem 2, if a sequence of stationary points for penalized problems converges to a feasible point of Problem (3) which satisfies MPCC–LICQ, then this is also a stationary point. However, it is not automatically true that every limit point is feasible. In the following we focus on this issue.

The next Lemma was proved in [21] in a more general form.

Lemma 1

Let \((x^k)_{k \in {\mathbb {N}}}\) be a sequence of stationary points of PEN(\(\rho ^k\)) with \(\rho ^k \rightarrow + \infty \). Assume that \(\lim _{k \rightarrow + \infty } x^k={\bar{x}}\). Let \({\hat{x}}\) be a feasible point for Problem (3) that satisfies MPCC–LICQ. Then there exists \(\varepsilon >0\) such that if \({\bar{x}} \in B({\hat{x}},\varepsilon ), {\bar{x}}\) is feasible for Problem (3).

Therefore, if the iterations arrive sufficiently close to a feasible point that satisfies MPCC–LICQ then the limit point is feasible. We need also the following lemma, for the sake of brevity we omit the technical proof, which can be found in [21].

Lemma 2

Let \(x^\star \) be a strict local minimum for Problem (3). Let \(r>0\) be such that \(x^*\) is the only minimum for (3) in \(B(x^*,r)\). Then, there exists \(\rho (r)>0\) such that PEN(\(\rho \)) has a local minimum in \(B(x^*,r)\), for all \(\rho > \rho (r)\).

Finally, under standard assumptions on the strict local minimum \(x^*\), one can prove that if the sequence \((x^k)_k\) of the penalty method is sufficiently close to \(x^*\) then \(x^k \rightarrow x^*\). Before proceeding to the next Theorem, we include the following definition.

Definition 3

Let \(x^*\) be a stationary point for (3) with associated multipliers \((\lambda ^\star ,\mu ^\star ,\gamma ^\star ,\nu ^\star )\). We say that the strong second-order sufficient condition (MPCC–SSOSC) holds at \(x^*\) if

$$\begin{aligned} u^T \Bigg ( \nabla ^2f(x^*)-\sum _{i\in I} {\lambda ^\star _i \nabla ^2 g_i(x^\star )} - \sum _{i \in E}{\mu ^\star _i \nabla ^2 h_i(x^\star )}\Bigg ) u>0 \end{aligned}$$

for all vectors \(u \ne 0\) such that

$$\begin{aligned} \begin{aligned}&\nabla g_i(x^*)^Tu=0, \quad \forall i \text { such that }\lambda _i>0\\&\nabla h_i(x^*)^Tu=0, \\&u_j=0, \quad \forall j \in I_0(x^*) \text { and } \gamma ^*_j \ne 0, \\&u_j=0, \quad \forall j \in I_1(x^*) \text { and } \nu ^*_j \ne 0. \end{aligned} \end{aligned}$$

We are now in a position to state the final result of this Section, which is presented in [21] in a more general form - we review its proof here for the sake of completeness.

Theorem 3

Assume that \(x^\star \) is a stationary point for Problem (3) at which MPCC–LICQ and MPCC–SSOSC hold. Let \(S(\rho )\) be the set of stationary points of PEN(\(\rho \)). Then there exists an \(r>0\) and \(\rho (r)\) such that \(S(\rho ) \cap B(x^\star ,r) \ne \emptyset \), for all \(\rho >\rho (r)\).

Moreover, if \((x^k)_{k \in {\mathbb {N}}} \subset B(x^\star ,r)\) is a sequence of stationary points of PEN(\(\rho ^k\)) with \(\rho ^k \rightarrow +\infty \), then \(\lim _k x^k=x^\star \).

Proof

We observe that \(x^*\) satisfies the upper level strict complementarity condition (ULSC) given in [36], because \(I_0(x^*)\cap I_1(x^*)=\emptyset \). Therefore, under the assumptions of the Theorem, \(x^*\) is a strict local minimum and a locally unique stationary point of Problem (3)—see [36, Theorem 11] . As a consequence, there exists \(\delta _1>0\) such that \(x^*\) is the unique stationary point of (3) in \(B(x^*,\delta _1)\). Furthermore, since MPCC–LICQ holds at \(x^*\) there exists \(\delta _2>0\) such that MPCC–LICQ holds at any point \(x \in B(x^*,\delta _2)\) feasible for Problem (3). Now let \(r=\min (\delta _1,\delta _2)\). Then by Lemma 2 \(\exists \rho (r)>0\) such that \(S(\rho ) \cap B(x^\star ,r) \ne \emptyset \), for all \(\rho >\rho (r)\).

Consider a sequence \((x^k)_{k \in {\mathbb {N}}}\subset B(x^*,r) \) of stationary points of PEN(\(\rho ^k\)) with \(\rho ^k \rightarrow + \infty \). Then \((x^k)\) has a limit point \({\bar{x}} \in B(x^*,r)\) that is feasible for Problem (3) and satisfies MPCC–LICQ. Hence by Theorem 2 we have that \({\bar{x}}\) is stationary for (3). Since \(x^*\) is the unique stationary point in \(B(x^*,r)\) then \({\bar{x}}=x^*\). \(\square \)

In view of the above results, we implement a Matlab algorithm for the solution of Problem 3 with the penalty approach, see Algorithm 1. The initial guess \(g_0\) is selected randomly and the complementarity violation is evaluated as follows:

$$\begin{aligned} \texttt {Vio}(x)=\max _{j \in B}\bigl ({\min {(x_j,1-x_j)}\bigr )} \end{aligned}$$
(5)

At each step of the algorithm we solve PEN(\(\rho ^k\)) using the nonlinear solver for large scale sparse optimization problems IPOPT [42]. The stopping criteria is motivated by Lemma 1 and is defined in terms of the complementarity violation, so that the end point is sufficiently close to a feasible solution of Problem (3).

figure a

5 Relaxation method

Alternative approaches for the solution of Problem (3) include relaxation methods. In the present work we focus on the technique proposed in [37]—for a general analysis see [18, 34, 35]. A theoretical study of the present relaxation approach is included in [37] as appendix, however for completeness is reviewed here.

For \(t>0,\) consider the nonlinear program REL(t)

$$\begin{aligned} \begin{aligned} \min _{x \in {\mathbb {R}}^N} \,&f(x) \\ \text {subject to: }&g_i(x) \ge 0, \quad \forall i \in I, \\&h_i(x) = 0, \quad \forall i \in E, \\&0 \le x_j \le 1, \quad \forall j \in B, \\&\sum _{j \in B}x_j(1-x_j) \le t. \end{aligned} \end{aligned}$$
(6)

Let the feasible set of REL(t) be denoted by \({\mathbf {F}}(t), t \ge 0\). It is evident that the feasible set of Problem (3) coincides with \({\mathbf {F}}(0)\). The following Lemma is proved in [18], in a slightly more general form. We omit the details for the sake of brevity.

Lemma 3

The following relations hold:

  1. (a)

    \(\bigcap _{t>0}{\mathbf {F}}(t)={\mathbf {F}}(0)=\text {feasible set of}\) (3)

  2. (b)

    \({\mathbf {F}}(t_1) \subseteq {\mathbf {F}}(t_2), \quad \forall t_1 \le t_2\)

In contrast to the penalty method, in view of the above lemma we do not need additional conditions to guarantee that a limit point of a sequence of solutions of relaxed problems with \(t \rightarrow 0\) is feasible for Problem (3). The following result is proved in [18] and it is a direct consequence of Lemma 3.

Theorem 4

  1. (a)

    Let \({\bar{t}}>0\) and assume that \({\bar{x}}\) is a feasible point for Problem (3) which is also a local minimum for REL(\({\bar{t}}\)). Then \({\bar{x}}\) is a local minimum of REL(t), for all \(t \in [0,{\bar{t}}]\).

  2. (b)

    Let \({\bar{t}}>0\) and assume that \({\bar{x}}\) is feasible point for Problem (3) which is also a global minimum of REL(\({\bar{t}}\)). Then \({\bar{x}}\) is a global minimum of REL(t), for all \(t \in [0,{\bar{t}}]\).

  3. (c)

    Let \((t^k)_{k \in {\mathbb {N}}} \subset (0,+\infty )\) such that \(t^k \xrightarrow {k \rightarrow +\infty } 0\) and let \((x^k)_k\) be a sequence of global minima of the problems REL(\(t_k\)). Furthermore let \({\bar{x}}\) an accumulation point of \((x^k)_k\). Then \({\bar{x}}\) is a global minimum of Problem (3).

In view of the above Theorem we can find a global solution for Problem (3) by generating a sequence of global solutions of relaxed problems. However, most NLP solvers are designed to find only stationary points. Hence in the following we focus on the set of stationary points of problems (3) and REL(t). The following results are proved in [37] in a more general form, for completeness we derive them in our framework.

Lemma 4

[37, Lemma 8.1] Suppose that MPCC–LICQ holds at a feasible point \({\bar{x}}\) for Problem (3). Then there exist a neighbourhood U of \({\bar{x}}\) and \({\bar{t}}>0\) such that for all \(t \in (0,{\bar{t}})\) the standard LICQ holds at every feasible point \(x \in U\) for REL(t).

Proof

The linear independent family of the active constraint gradients for Problem (3) is given by

$$\begin{aligned} \bigl \{ \nabla g_i({\bar{x}})\, | \, i \in I_g({\bar{x}}) \bigr \} \cup \bigl \{ \nabla h_i({\bar{x}})\, | \, i \in E \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_0({\bar{x}}) \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_1({\bar{x}}) \bigr \}\nonumber \\ \end{aligned}$$
(7)

We define \(I_{++}(x):=\bigl \{ j \in B \, \big | \, x_j>0,\,1-x_j>0 \bigr \}\). Note that by continuity of g, there exists a neighbourhood U of \({\bar{x}}\) and \({\bar{t}}>0\) such that for all \(t \in (0,{\bar{t}})\) and \(x \in U\cap {\mathbf {F}}(t)\) we have:

$$\begin{aligned} \begin{aligned}&I_g(x) \subseteq I_g({\bar{x}}), \; I_0(x) \subseteq I_0({\bar{x}}), \; I_1(x) \subseteq I_1({\bar{x}}),\\&I_0(x) \cup \big (I_{++}(x)\cap I_0({\bar{x}}) \big ) \subseteq I_0({\bar{x}}), \\&I_1(x) \cup \big (I_{++}(x) \cap I_1({\bar{x}}) \big )\subseteq I_1({\bar{x}}). \end{aligned} \end{aligned}$$
(8)

If \(\sum _{j \in B} x_j(1-x_j)<t\) then the family of active constraint gradients is

$$\begin{aligned} \bigl \{ \nabla g_i(x)\, | \, i \in I_g(x) \bigr \} \cup \bigl \{ \nabla h_i(x)\, | \, i \in E \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_0(x) \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_1(x) \bigr \} \end{aligned}$$
(9)

Therefore the assertion of the Lemma is verified as a direct consequence of the continuity of the derivatives of g and h.

Now suppose that \(\sum _{j \in B} x_j(1-x_j)=t\). In this case the family of active constraint gradients for REL(t) is given by

$$\begin{aligned} \begin{aligned}&\bigl \{ \nabla g_i(x)\, | \, i \in I_g(x) \bigr \} \cup \bigl \{ \nabla h_i(x)\, | \, i \in E \bigr \} \\&\cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_0(x) \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_1(x) \bigr \} \cup \Bigl \{ \sum _{j \in B} (1-2x_j){\mathbf {e}}_j \Bigr \} \end{aligned} \end{aligned}$$
(10)

We have that

$$\begin{aligned} \begin{aligned} \sum _{j \in B} (1-2x_j){\mathbf {e}}_j&=\sum _{j \in I_0(x)}{\mathbf {e}}_j - \sum _{j \in I_1(x)}{\mathbf {e}}_j +\sum _{j \in I_{++}(x)}(1-2x_j){\mathbf {e}}_j \\&=\sum _{j \in I_0(x)}{\mathbf {e}}_j - \sum _{j \in I_1(x)}{\mathbf {e}}_j \\&\quad +\sum _{j \in I_{++}(x) \cap I_0({\bar{x}})}(1-2x_j){\mathbf {e}}_j + \sum _{j \in I_{++}(x) \cap I_1({\bar{x}})}(1-2x_j){\mathbf {e}}_j, \end{aligned} \end{aligned}$$

Then consider the following family:

$$\begin{aligned} \begin{aligned}&\bigl \{ \nabla g_i(x)\, | \, i \in I_g(x) \bigr \} \cup \bigl \{ \nabla h_i(x)\, | \, i \in E \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_0(x) \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_1(x) \bigr \} \\&\cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_{++}(x)\cap I_0({\bar{x}}) \bigr \} \cup \bigl \{ {\mathbf {e}}_j \, | \, j \in I_{++} \cap I_1({\bar{x}}) \bigr \} \end{aligned}\nonumber \\ \end{aligned}$$
(11)

Since (8) holds, then vectors in (11) are linearly independent if \(x \in U\cap {\mathbf {F}}(t)\) and \(t \in (0,{\bar{t}})\), since the gradients of g and h are continuous. This implies that also the family (10) is linearly independent when \(x \in U\cap {\mathbf {F}}(t)\) and \(t \in (0,{\bar{t}})\). \(\square \)

Theorem 5

[37, Theorem 8.4] Let \((t^k)_{k \in {\mathbb {N}}}\) be sequence of positive numbers with \(t^k\searrow 0\). Let \((x^k)_{k \in {\mathbb {N}}}\) be a sequence such that \(x^k\) is a stationary point of \(REL(t^k)\), for all \(k \in {\mathbb {N}}\).

Assume that \(x^k \xrightarrow {k \rightarrow +\infty }x^*\), with \(x^*\) satisfying MPCC–LICQ. Then \(x^*\) is a stationary point for REL(0).

Proof

First of all, note that from Lemma 3 \({\bar{x}}\) is feasible for REL(0). Moreover, in view of Lemma 4, for k sufficiently large, \(x^k\) satisfies standard LICQ for REL(\(t^k\)). This implies that since \((x^k)\) is a sequence of stationary points of REL(\(t^k\)), there exist multipliers \((\lambda ^k,\mu ^k,\xi ^k,\eta ^k,\delta ^k)\) such that the following KKT conditions hold:

$$\begin{aligned} \begin{aligned} \nabla f(x^k) =&\sum _{i \in I} \lambda ^k_i \nabla g_i(x^k) + \sum _{i \in E} \mu ^k_i \nabla h_i(x^k) \\&+\sum _{j \in B} \left( \xi ^k_j-\delta ^k\left( 1-x^k_j\right) \right) {\mathbf {e}}_j - \sum _{j \in B} \left( \eta ^k_j - \delta ^k x^k_j\right) {\mathbf {e}}_j,\\ \text {with}\quad&\lambda ^k_i g_i(x^k)=0, \quad \forall i \in I, \qquad \xi ^k_j x^k_j =0, \quad \forall j \in B,\\&\eta ^k_j \left( 1-x^k_j\right) =0, \quad \forall j \in B, \\&\delta ^k \left( \sum _{j \in B}x^k_j\left( 1-x^k_j\right) -t^k \right) =0,\\&\lambda ^k \ge 0, \; \xi ^k \ge 0, \; \eta ^k \ge 0, \; \delta ^k \ge 0. \end{aligned} \end{aligned}$$
(12)

Now consider the sequence of multipliers \((\lambda ^k,\mu ^k,\gamma ^k,\nu ^k)\) with

$$\begin{aligned} \gamma ^k:= \xi ^k-\delta ^k(1-x^k), \quad \nu ^k:=\eta ^k-\delta ^kx^k. \end{aligned}$$

Assume that \((\lambda ^k,\mu ^k,\gamma ^k,\nu ^k)\) is unbounded. Then \(\beta ^k:=||(\lambda ^k,\mu ^k,\gamma ^k,\nu ^k)|| \rightarrow +\infty \) as \(k \rightarrow +\infty \). Since \(\big |\big |\big (\frac{\lambda ^k}{\beta ^k},\frac{\mu ^k}{\beta ^k},\frac{\gamma ^k}{\beta ^k},\frac{\nu ^k}{\beta ^k}\big ) \big | \big |=1\), possibly passing to a subsequence, we can conclude that \((\lambda ^k,\mu ^k,\gamma ^k,\nu ^k)/\beta ^k \rightarrow ({\bar{\lambda }},{\bar{\mu }},{\bar{\gamma }},{\bar{\nu }}) \ne 0 \). In this case, if we divide the first equation in (12) by \(\beta ^k\), then as \(k \rightarrow +\infty \) it converges to

$$\begin{aligned} 0=\sum _{i \in I}{\bar{\lambda }}_i\nabla g_i(x^*) + \sum _{i \in E}{\bar{\mu }}_i \nabla h_i (x^*)+\sum _{j \in B}{\bar{\gamma }}_j{\mathbf {e}}_j - \sum _{j \in B}{\bar{\nu }}_j {\mathbf {e}}_j, \end{aligned}$$

with \({\bar{\lambda }}_i g_i(x^*)=0, \; \forall i \in I, \quad {\bar{\gamma }}_jx^*_j=0, \; \forall j \in B, \quad {\bar{\nu }}_j\left( 1-x^*_j\right) =0, \; \forall j \in B\).

Therefore

$$\begin{aligned} 0=\sum _{i \in I_g(x^*)}{\bar{\lambda }}_i\nabla g_i(x^*) + \sum _{i \in E}{\bar{\mu }}_i \nabla h_i (x^*)+\sum _{j \in I_0(x^*)}{\bar{\gamma }}_j{\mathbf {e}}_j - \sum _{j \in I_1(x^*)}{\bar{\nu }}_j {\mathbf {e}}_j \end{aligned}$$

which contradicts the assumption of MPCC–LICQ at \(x^*\). Hence we can conclude that, possibly passing to a subsequence, there exist \((\lambda ^*,\mu ^*,\gamma ^*,\nu ^*)=\lim _k (\lambda ^k,\mu ^k,\gamma ^k,\nu ^k)\) such that

$$\begin{aligned} \begin{aligned}&\nabla f(x^*) = \sum _{i \in I} \lambda ^*_i \nabla g_i(x^*) + \sum _{i \in E} \mu ^*_i \nabla h_i(x^*) + \sum _{j \in B} \gamma ^* {\mathbf {e}}_j - \sum _{j \in B} \nu ^*{\mathbf {e}}_j,\\&\lambda \ge 0, \qquad \lambda ^*_i g_i(x^*)=0, \quad \forall i \in I,\\&\gamma ^*_j x^*_j=0, \quad \forall j \in B,\qquad \nu ^*_j\left( 1-x^*_j\right) =0, \quad \forall j \in B, \end{aligned} \end{aligned}$$

\(\Rightarrow (x^*;\; \lambda ^*,\mu ^*,\gamma ^*,\nu ^*)\) satisfies the conditions for MPCC to stationarity stated in Definition 2. \(\square \)

Note that, in contrast to the penalty method, we do not need to prove that the limit point is actually feasible for Problem (3). This is the result of a different approach: the feasible set is restricted at each iteration, converging to the region of feasibility of the original problem, while in the penalty approach it remains constant.

We implement Algorithm 2 for the solution of Problem (3) with the relaxation method outlined above. We use IPOPT [42] for the solution of the relaxed problems and Vio(x) is the same function defined in (5). Since REL(0) fails to satisfy standard constraint qualifications, REL(t) can be difficult to solve for nonlinear solvers when t becomes very small. Therefore, by putting a minimum bound on it within the algorithm, we prevent t from becoming too small in value so that it does not cause pathological behaviour. Nonetheless, at practical tolerances set for the NLPs, t does not seem to get too small to negatively impact convergence.

figure b

In summary, we have presented two different approaches for the solution of Problem (3). Both approaches employ the iterative solution of smooth nonlinear programs. Nonetheless, both penalty and relaxation approaches are local methods and so converge to stationary points of Problem (3). If the nonlinear constraints are nonconvex, which is the case in our problem, then the convergence is highly influenced by initial guess and algorithmic parameters. Finally, we note that water distribution networks have a particularly sparse structure, which is retained by constraints (1b)–(1d). Consequently, the nonlinear programs within penalty and relaxation methods can be efficiently solved using tailored sparse techniques, offering scalable approaches for large scale water networks.

6 Case study

The developed mathematical programming approaches for the solution of the optimization problem (1) have been evaluated using a benchmarking network model. All computations were performed within MATLAB 2015a-64 bit for Windows 7, installed on a 2.50GHz Intel\(^\circledR \) Xeon(R) CPU E5-2640 0 with 18 Cores. The solver BONMIN (v1.8.3) [7] with the branch-and-bound algorithm B-BB is used for the direct solution of the MINLP original problem. The continuous relaxed sub-problems within the branch-and-bound algorithm are solved using the interior point solver for large scale nonlinear optimization IPOPT (v3.12.3) [42]. Similarly, all the NLP subproblems considered by penalty and relaxation methods were solved using IPOPT. Both BONMIN and IPOPT were implemented in Matlab through the interface provided by OPTI Toolbox [12]. Moreover, in the implementation of IPOPT we directly supply the solver with sparse gradients and Jacobians, in order to take advantage of the very sparse structure of our problem. IPOPT has an option to approximate the Hessian of the optimization problem by a limited-memory quasi-Newton method (L-BFGS) [25]. We have tested the behaviour of the solvers with an exact sparse Hessian and an L-BFGS approximation. It was noted that better performances were obtained using L-BFGS approximation, both in terms of number of iterations and quality of solutions. In our opinion, this possibly counter-intuitive behaviour is caused by the nonconvexity and ill-conditioning of the optimization problem: the regularization actuated by L-BFGS is more effective than the one directly operated by IPOPT on the exact Hessian matrix. The linear systems within IPOPT were solved using the linear solver MA57 [14], which is well suited for sparse linear systems.

The selected benchmarking network model has been studied in [1, 13, 15, 24, 29, 30, 40], see Fig. 1a. The network has 22 nodes, 37 pipes and 3 reservoirs. Details on pipes’ characteristics, nodal demands and reservoirs’ levels are presented in [13, 24]. We set minimum pressure at each node to 30 m and maximum velocity in each pipe to 1 m/s. Moreover, in contrast to [15], we include in our formulation dynamic demand scenarios, one for each hour of the day. The resulting optimization problem has 2304 continuous variables and 74 binary variables, with 529 equality constraints and 3589 inequality constraints. Since, as stated in Sect. 2, some nonlinear constraints in (1) are nonconvex, gradient-based nonlinear programs solvers will converge only to local minima. The same holds for BONMIN, whose branch-and-bound algorithm is based on the solution of continuous relaxations of the original MINLP through the solver IPOPT. In order to provide global optimality bounds, it is necessary to consider convex relaxations of the nonlinear constraints in (1). While possible relaxation approaches have been outlined in [19, 22], the study of tailored convex relaxations for the problem of optimal placement and operation of control valves in water distribution networks is outside the scope of this manuscript and will be investigated in future work.

We modify few BONMIN options in order to improve the quality of the local solutions in the considered nonconvex case, as suggested in [8, Section 5.3]. In particular, we set the options allowable_gap and allowable_fraction_gap to negative values , dynamic_def_cutoff_decr to ’yes’ and num_resolve_at_root to 50. All other BONMIN options were set to the default values by the advanced settings interface provided by OPTI Toolbox.

Moreover, for numerical reasons, we implement the non-normalized version of the objective function in Problem (1), since it seems to fit better with the scale of the problem: \(\sum ^{n_l}_{k=1}\sum ^{n_n}_{i=1}w_i p^k_i\). In the recent work [13], a problem formulation with multiple demand conditions for the same network was provided, but BONMIN was reported to fail in the case where 24 demand patterns were considered. Also in our initial implementation BONMIN failed to converge sometimes or took a long computational time to converge. We have overcome these numerical issues by substituting (only when BONMIN was used) the constraint (1f) with the inequality \(\sum _{j=1}^{2n_p} v_{j} \le n_v\), which results in an equivalent optimization problem from the engineering perspective since having fewer control valves can never improve performance. Moreover, the direct implementation of sparse gradients and Jacobians has improved the performance of the solver, allowing convergence to good local optimal solutions for this network.

Table 1 Optimal placement of control valves with BONMIN

Table 1 summarizes the results. The optimal control valve placements are obtained starting from random initial guesses. Most of the locations coincide with those reported in [2, 30], where a control valve placement problem was solved by coupling genetic algorithms and the hydraulic simulator EPANET with the objective to minimize leakage. This observation demonstrates that our framework can be effectively applied to optimal placement of control valves where the leakage management is explicitly formulated. Furthermore, we tested the application of BONMIN with default options, using the same random starting points employed for the first set of simulations. The solver found the same local optima as in Table 1, when tailored options for nonconvex problems were employed. Concerning computational performance, CPU time was less when default BONMIN options were used, for \(n_v=1,2,3\). On the other hand, in the cases of 4 and 5 control valves, convergence times with the two different options’ choices were comparable. These CPU times may appear counterintuitive, since using BONMIN options for nonconvex problems require the solution of more continuous relaxations by IPOPT. However, this behaviour can be explained by the specific topological characteristics for this case study. In fact, as reported in [30], multiple location combinations with similar performance can be found in the case of \(n_v=4,5\), while this is not observed for \(n_v=1,2,3\). This suggests that the nonconvexity of the problem is enhanced for \(n_v=4,5\) and the choice of tailored options helps BONMIN to achieve convergence to the (local) optima. In Table 1, we report the computational time for BONMIN to converge with and without the additional tailored options for nonconvex problems.

Table 1 also shows the minimum AZP achieved when optimally placing a different number of control valves. When we move from an optimal configuration with \(n_v\) control valves to an optimized set of \(n_v +1\) control valves, AZP has a slight decrement. Nonetheless, note that a relatively small decrease in AZP can result in significant reduction in the amount of leakage [26]. An in depth cost–benefit analysis is a crucial stage for the evaluation of advanced and optimal control schemes and, as shown in this benchmarking study, our mathematical optimization framework can provide effective tools to support the design process for smart water distribution networks.

With reference to the numerical results reported in [13], where BONMIN was reported to fail when considering the optimal placement of more than 2 control valves, we demonstrate that the computational performance of our implementation is reasonable for the presented case study. Nonetheless, more issues can arise when dealing with large scale networks (i.e. with thousands of nodes and pipes), where the number of integer variables grows, the algorithmic complexity of branch and bound becomes impractical [17]. This is the main motivation for the study of possible alternative scalable approaches. We apply reformulation methods outlined in Sects. 4 and 5, respectively, to the case study. Average zone pressure is minimized for an extended period operation with 24 demand scenarios.

With reference to Algorithms 1 and 2, different parameters choices were tested. The penalty method was implemented with \(\alpha =10^{-2}\) and \(\beta =1.1\) as proposed in [13]; in what follows, we refer to Algorithm 1 with these settings as PEN1. In a similar fashion to the approach considered in [28], we also tested the choices \(\alpha =1\) and \(\beta =10\), which we denote as PEN2. The relaxation method was implemented with \(c=10^{-4}\). In order to assess the reliability of the proposed algorithms, we used 100 randomly generated initial guesses for NLPs. The best AZP values were obtained with the solutions provided by BONMIN; therefore, we compare the solutions of reformulation approaches with those of BONMIN. For each number of control valves, we define the following. Let \(\text {AZP}_\text {best}\) be the average zone pressure corresponding to the best solution found by us and let \(\text {AZP}_\text {PEN1}(k), \text {AZP}_\text {PEN2}(k), \text {AZP}_\text {REL}(k)\) correspond to the solutions obtained with initial point \(x_0(k)\) using PEN1, PEN2 and REL, respectively. Then let \(\forall \delta \in [0,1]\):

$$\begin{aligned} \begin{aligned} \text {P}_\text {PEN1}(\delta ):=\bigg |\bigg \{ k \in \{1,\ldots ,100\} \, \big | \, \text {AZP}_\text {PEN1}(k)-\text {AZP}_\text {best}<\delta \bigg \}\bigg |, \\ \text {P}_\text {PEN2}(\delta ):=\bigg |\bigg \{ k \in \{1,\ldots ,100\} \, \big | \, \text {AZP}_\text {PEN2}(k)-\text {AZP}_\text {best}<\delta \bigg \}\bigg |, \\ \text {P}_\text {REL}(\delta ):=\bigg |\bigg \{ k \in \{1,\ldots ,100\} \, \big | \, \text {AZP}_\text {REL}(k)-\text {AZP}_\text {best}<\delta \bigg \}\bigg |, \end{aligned} \end{aligned}$$

be the percentages of solutions whose AZP is less than \(\delta \) above the best one. In Fig. 2 we report the plots of \(\text {P}_\text {PEN1}, \text {P}_\text {PEN2}, \text {P}_\text {REL}\) as functions of \(\delta \), for all number of control valves considered in our experiment. The three reformulation algorithms have consistently converged to a solution whose AZP is less than 1 m above the best solution. Given the level of uncertainty in the modelling of water distribution networks, we can consider the application of reformulation methods sufficiently accurate. In particular, the Fig. 2d, e confirm that for this case study the problem has the properties already observed in [30]: when \(n_v=4,5\) there exist multiple stationary points with similar AZP. In these cases, PEN2 is shown to be the most reliable, since in all instances it has converged to a solution whose AZP is less than 0.2 m above the best solution. On the other hand, the relaxation method seems to be the more sensitive to the choice of the initial guess.

Fig. 2
figure 2

Reliability of the reformulation methods for the placement of different number of control valves, with multiple starting points. On the vertical axis we indicate the percentage of solutions whose AZP is less than \(\delta \) above the best one. a \(n_v=1\), b \(n_v=2\), c \(n_v=3\), d \(n_v=4\), e \(n_v=5\)

We also tested the three algorithms starting from a point which satisfies hydraulic equations for the case study network without control valves. When \(n_v=1,2,3\) the three methods converge to the best solution, while for \(n_v=4,5\) they all converge to solutions whose AZP values are less than 0.5 m above the best solution. Given the level of uncertainty present in the modelling of water distribution networks, these performances suggest that all three algorithms (PEN1, PEN2 and REL) can be effectively used in practice. Average computational time required to converge by the reformulation approaches was significantly smaller than that of BONMIN. Moreover, as shown in Table 1, the computational cost of the branch and bound approach grows more than linearly as the number of control valves is increased. On the other hand, the reformulation methods do not show this behaviour since their complexity does not depend on the number of control valves but the network model size; the CPU times in Table 1 for different number of control valves are comparable. The two penalty approaches had very similar performances in terms of quality of solutions; however, as shown in Table 2, PEN1 requires more computational time to converge to a local solution. Therefore, for the present case study, Algorithm 1, with \(\alpha =1, \beta =10\) seems to be the best choice. The good performance of the penalty approach is in line with the results in [6], where the coupling of penalty methods and interior point algorithms was shown to be effective for a library of mathematical programs with complementarity constraints. Moreover, the penalty method can probably be improved by better update strategies for the penalty parameter \(\rho .\) As shown in [28], in order to improve the robustness of the algorithm the increment of \(\rho \) should be included within the inner iterations of the interior point method, updating the parameter only if the complementarity violation of the current IPOPT barrier iterate is larger than a fixed bound.

Table 2 Average computational time required by the three tested algorithms to converge to local optimal solutions, compared with the application of BONMIN when default options are implemented

The study in [28] has proved that, when an interior point method is applied, there is a one-to-one correspondence between KKT points of PEN(\(\rho \)) and REL(t). Nonetheless, the solution of penalized and relaxed nonlinear programs presents different numerical and practical challenges. In fact, we observe that the feasible sets of relaxed nonlinear programs considered in our manuscript are disconnected and their interior is progressively reduced to the empty set as the relaxation parameter goes to zero - see Problem (6). This affects the reliability and robustness of interior point solvers like IPOPT. In comparison, the feasible set of penalized problems is not modified whereas their objective functions are updated as the penalty parameter increases. These characteristics of the feasible sets involved in relaxation and penalty algorithms can explain the differences in terms of reliability and robustness between relaxation and penalty methods reported in Fig. 2. The investigation of algorithmic modifications to adapt these nonlinear programming methods for design problems in large scale water distribution networks is the subject of further work.

7 Conclusions

We have presented a rigorous mathematical framework for the optimal placement and operation of control valves in water distribution networks, addressing the problem of minimizing average zone pressure. The resulting optimization problem belongs to the class of mixed integer nonlinear programs. We have implemented and evaluated a direct mixed integer solver, using a benchmarking water network. The solver has provided good quality solutions but can have infeasibly large computational time when dealing with large scale networks, where the number of binary variables grows. Therefore, we have investigated two alternative approaches in order to solve the optimization problem: the penalty and relaxation methods. We have presented a theoretical formulation for both methods, with a review of principal convergence results and algorithmic properties We have then applied these to the optimal placement and operation of control valves in a water network. The methods demonstrated good performance in terms of quality of the solutions and significant reduction in CPU time compared to standard branch and bound approaches. We tested different strategies for tuning the parameters within the penalty approach, improving the performance in comparison to previous solutions. In addition, we considered the relaxation approach for the solution of mathematical programs with complementarity constraints in the framework of optimal control valve placement and operation in water supply systems.

The penalized and relaxed problems we have presented have sparse nonlinear structures. These problems were efficiently solved using tailored techniques for large sparse nonlinear programs. The presented work informs the application of penalty and relaxation approaches to the optimization problem for large-scale operational water distribution networks.