A fast algorithm for quadratic resource allocation problems with nested constraints

We study the quadratic resource allocation problem and its variant with lower and upper constraints on nested sums of variables. This problem occurs in many applications, in particular battery scheduling within decentralized energy management (DEM) for smart grids. We present an algorithm for this problem that runs in $O(n \log n)$ time and, in contrast to existing algorithms, achieves this time complexity using relatively simple and easy-to-implement subroutines and data structures. This makes our algorithm very attractive for real-life adoption and implementation. Numerical comparisons of our algorithm with a subroutine for battery scheduling within an existing tool for DEM research indicate that our algorithm significantly reduces the overall execution time of the DEM system, especially when the battery is expected to be completely full or empty multiple times in the optimal schedule. Moreover, computational experiments with synthetic data show that our algorithm outperforms the currently most efficient algorithm by more than one order of magnitude. In particular, our algorithm solves all considered instances with up to one million variables in less than 17 seconds on a personal computer.


Introduction

Resource allocation problems and energy management
The resource allocation problem is a classical and well-researched problem in the optimization and operations research literature. The objective of the resource allocation problem is to divide a fixed amount of resource (e.g., time, money, energy) over a set of activities while minimizing a given cost function (or maximizing a given utility function). In the most studied version of this problem, the cost functions are quadratic, which leads to the following formulation of the so-called quadratic resource allocation problem (QRAP):

QRAP: $\min_{x \in \mathbb{R}^n} \ \sum_{i \in N} \frac{x_i^2}{2 a_i}$
s.t. $\sum_{i \in N} x_i = R$, (1)
$l_i \le x_i \le u_i, \quad i \in N$, (2)

where $a \in \mathbb{R}^n_{>0}$, $R \in \mathbb{R}$, $l, u \in \mathbb{R}^n$, and $N := \{1, \ldots, n\}$. The problem QRAP has been studied extensively over the last decades due to its wide applicability in, among others, engineering, finance, and machine learning (see also the surveys in [32,33]). As a consequence, many efficient algorithms have been developed for this problem and its generalizations.
In this article, we study an extension of QRAP, namely the QRAP with lower and upper constraints on nested sums of variables (QRAP-NC). This problem can be formulated as follows:

QRAP-NC: $\min_{x \in \mathbb{R}^n} \ \sum_{i \in N} \frac{x_i^2}{2 a_i}$
s.t. $\sum_{i \in N} x_i = R$,
$L_j \le \sum_{i \in N_j} x_i \le U_j, \quad j \in N \setminus \{n\}$,
$l_i \le x_i \le u_i, \quad i \in N$,

where $N_j := \{1, \ldots, j\}$ for $j \in N \setminus \{n\}$, $L, U \in \mathbb{R}^{n-1}$, and we define $L_n = U_n = R$ for convenience. Note that if $L_j = U_j$ for some $j \in N_{n-1}$, we may split up the problem QRAP-NC into two smaller instances of QRAP-NC that involve the variables $x_1, \ldots, x_j$ and $x_{j+1}, \ldots, x_n$ respectively. Thus, we assume without loss of generality that $L_j < U_j$ for all $j \in N_{n-1}$. Moreover, we may assume that $L_1 = l_1$, $U_1 = u_1$, and $L_j \ge L_{j-1} + l_j$ and $U_j \le U_{j-1} + u_j$ for $j \in N_{n-1} \setminus \{1\}$.
The problem QRAP-NC has numerous applications in, among others, machine learning, telecommunications, and speed optimization problems (see also the overviews in [1,42]). Our particular motivation for studying QRAP-NC is its application in decentralized energy management (DEM) for smart distribution grids. In DEM, the goal is to optimize the joint energy consumption of multiple devices within, e.g., a neighborhood. In a DEM system, devices optimize their own consumption locally, but this local optimization is coordinated to obtain certain global objectives (hence the term "decentralized"). In the context of DEM, we are interested in the optimization of storage devices such as electrical batteries and heat buffers. Energy storage devices play an important role in DEM systems since they are quite flexible in their energy usage and are thus suitable to compensate for peak consumption or production of energy in the distribution grid (see, e.g., [36,29,46]).
One important example of a device-level optimization problem within DEM is the scheduling of a battery within a neighborhood. We consider the situation where the charging and discharging of the battery has to be scheduled over a set N of equidistant time intervals, each of length $\Delta t$. Given the power profile $p := (p_i)_{i \in N}$ of the neighborhood, the goal is to determine for each time interval $i \in N$ the charging power $x_i$ of the battery during this interval so that the combined battery and neighborhood profile is flattened as much as possible. Aiming for this goal reduces the stress put on the grid and the risk of blackouts. The (physical) restrictions of the battery are given by a minimum and maximum charging rate $X^{\min}$ and $X^{\max}$ and a capacity $C$. Given the amount of energy present in the battery (the state-of-charge (SoC)) at the start and end of the scheduling horizon, denoted by $S^{\text{start}}$ and $S^{\text{end}}$, we can formulate the resulting device-level optimization problem as follows (see also [41]):

BATTERY: $\min_{x \in \mathbb{R}^n} \ \sum_{i \in N} (p_i + x_i)^2$
s.t. $0 \le S^{\text{start}} + \Delta t \sum_{i \in N_j} x_i \le C, \quad j \in N \setminus \{n\}$,
$S^{\text{start}} + \Delta t \sum_{i \in N} x_i = S^{\text{end}}$,
$X^{\min} \le x_i \le X^{\max}, \quad i \in N$.

Note that this is an instance of QRAP-NC by applying the variable transform $y := p + x$.
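To illustrate this transform, the following sketch builds the QRAP-NC data $(a, l, u, L, U, R)$ for a BATTERY instance. All function and variable names are ours, not from [41]; the objective $\sum_i (p_i + x_i)^2$ becomes $\sum_i y_i^2/(2a_i)$ with $a_i = 1/2$, and any common positive scaling of $a$ leaves the minimizer unchanged.

```python
def battery_to_qrap_nc(p, x_min, x_max, cap, s_start, s_end, dt):
    """Sketch of the transform y := p + x that casts BATTERY as QRAP-NC."""
    n = len(p)
    a = [0.5] * n                     # objective sum_i y_i^2 = sum_i y_i^2/(2*0.5)
    l = [pi + x_min for pi in p]      # from X_min <= x_i <= X_max
    u = [pi + x_max for pi in p]
    # SoC constraint 0 <= S_start + dt * sum_{i in N_j} x_i <= C for j < n:
    L, U, cum_p = [], [], 0.0
    for j in range(n - 1):
        cum_p += p[j]
        L.append(cum_p - s_start / dt)
        U.append(cum_p + (cap - s_start) / dt)
    # Final SoC fixed to S_end: sum_i y_i = sum_i p_i + (S_end - S_start)/dt.
    R = sum(p) + (s_end - s_start) / dt
    return a, l, u, L, U, R
```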
An important feature of the DEM paradigm is that device-level problems have to be solved locally. This means that the corresponding device-level optimization algorithms are executed on embedded systems with limited computational power (see, e.g., [5]) that are located within, e.g., households. Since these algorithms are called multiple times within the DEM system as a subroutine, it is important that they are very efficient. Therefore, efficient and tailored device-level optimization algorithms are crucial ingredients for the real-life implementation of DEM systems. In particular, for the optimization of storage devices, this means that fast and tailored algorithms to solve QRAP-NC are crucial. For more background on DEM, we refer to [39,12].

Background and contribution
There is a rich literature on solution approaches for QRAP-NC with only upper nested constraints on sums of variables, i.e., where only nested constraints of the form $\sum_{i \in N_j} x_i \le U_j$, $j \in N_{n-1}$, are given. This case has been studied mainly in the context of convex optimization over submodular constraints (see, e.g., [18,19,43]). However, the literature on the general case of QRAP-NC is limited. The authors in [41] propose an infeasibility-guided divide-and-conquer algorithm, to which we shall refer in this article as ALG inf. This algorithm solves a relaxation of the problem where the nested constraints are ignored and, subsequently, splits up the problem into two smaller instances of QRAP-NC at the variable for which the lower or upper nested constraint is violated most in the solution to the relaxation. The worst-case time complexity of this algorithm is $O(n^2)$. Furthermore, [42] proposes a decomposition-based algorithm, hereafter referred to as ALG dec, that solves QRAP-NC in $O(n \log n)$ time. This algorithm decomposes QRAP-NC into a hierarchy of QRAP subproblems whose single-variable bounds are optimal solutions to QRAP subproblems further down in the hierarchy. Currently, this is the most efficient algorithm for QRAP-NC.
As mentioned before, we are interested in algorithms for QRAP-NC that are fast in practice. Although the decomposition-based algorithm ALG dec has a good worst-case time complexity, we observe several disadvantages of this approach that may make it less favorable in practice than its worst-case time complexity suggests:

1. Each level of recursion within ALG dec solves a series of instances of QRAP whose parameters are determined by optimal solutions to multiple instances of QRAP on earlier levels. Since each instance is solved from scratch, much time is spent on initializing the subproblems.
2. ALG dec achieves for each level of recursion an O(n) time complexity by solving the QRAP subproblems using an O(n) time algorithm such as the ones in [27]. These O(n) time algorithms repeatedly call linear-time algorithms such as [6] to find the median of a set. However, these median-find algorithms are relatively slow in practice due to a large constant factor in their complexity [6]. Moreover, they are significantly more difficult to implement than simple sorting- or sampling-based strategies [26,2].
To alleviate these issues, we propose in this article a new algorithm for QRAP-NC, called ALG seq, which has the same time complexity as ALG dec, namely $O(n \log n)$, but in contrast requires only relatively simple and fast subroutines to attain this complexity. As a consequence, this algorithm is both faster in practice and easier to implement than ALG dec. These are generally more important criteria for the actual adoption of a given algorithm than its polynomial worst-case time complexity [30]. Our algorithm builds upon the monotonicity results for QRAP-NC derived in [42] and solves a sequence of QRAP subproblems that have a sequential nested structure rather than the divide-and-conquer structure of both ALG dec and ALG inf. More precisely, for each $j \in N$, the $j$-th subproblem involves only the first $j$ variables $x_1, \ldots, x_j$. As a consequence, our approach can solve its first $j$ subproblems without any knowledge of the parameters involving indices higher than $j$, whereas both ALG inf and ALG dec require all problem parameters to be known a priori. This makes our algorithm particularly useful in situations where problem parameters arrive over time. This is, e.g., the case when each variable denotes a decision for a specific time slot and all parameters related to this time slot become available only during or at the start of this time slot. Moreover, due to the nested structure, each input and bookkeeping parameter is accessed within a relatively small time period instead of frequently throughout the entire course of the algorithm. This is beneficial for caching, since it increases the number of times a value can be accessed quickly from a cache instead of relatively slowly from the main memory.
We attain the $O(n \log n)$ complexity using an efficient implementation of double-ended priority queues [28,8] for several bookkeeping parameters. This data type supports insertion of arbitrary elements and finding and deletion of minimum and maximum elements in at most $O(\log n)$ time. Our approach requires O(n) such operations, which leads to an overall time complexity of $O(n \log n)$. Double-ended priority queues can be implemented using specialized data structures such as min-max heaps [4] or by a simple coupling of a standard min-heap and max-heap (see also [8]). The latter heaps are among the most basic data structures, and many efficient implementations exist for different programming languages [9]. Thus, we can achieve the time complexity of $O(n \log n)$ using relatively simple data structures, as opposed to ALG dec, where a more involved implementation of a linear-time median algorithm is required.
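For illustration, such a coupling of a standard min-heap and max-heap can be sketched as follows, using lazy deletion to keep the two heaps consistent. This is a minimal sketch of one possible realization; the class and method names are ours.

```python
import heapq
from collections import Counter

class DEPQ:
    """Double-ended priority queue as a coupled min-heap and max-heap
    with lazy deletion: every insert goes into both heaps, and a pop from
    one heap schedules the removal of the same value from the other."""

    def __init__(self):
        self._min = []              # min-heap of values
        self._max = []              # max-heap of negated values
        self._del_min = Counter()   # values already removed via pop_max
        self._del_max = Counter()   # values already removed via pop_min
        self._size = 0

    def insert(self, v):
        heapq.heappush(self._min, v)
        heapq.heappush(self._max, -v)
        self._size += 1

    def pop_min(self):
        while self._del_min[self._min[0]] > 0:       # drop stale entries
            self._del_min[heapq.heappop(self._min)] -= 1
        v = heapq.heappop(self._min)
        self._del_max[v] += 1        # schedule lazy removal from the max-heap
        self._size -= 1
        return v

    def pop_max(self):
        while self._del_max[-self._max[0]] > 0:      # drop stale entries
            self._del_max[-heapq.heappop(self._max)] -= 1
        v = -heapq.heappop(self._max)
        self._del_min[v] += 1        # schedule lazy removal from the min-heap
        self._size -= 1
        return v

    def __len__(self):
        return self._size
```

All three operations run in amortized $O(\log n)$ time, since every element is pushed and popped at most once per heap.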
Our algorithm for QRAP-NC also leads to efficient and fast algorithms for instances of QRAP-NC where we replace each term $\frac{x_i^2}{2 a_i}$ by $a_i f\left(\frac{x_i}{a_i}\right)$ for each $i \in N$ with a given convex function $f$. Such a structure is present in many applications considered in the literature, in particular in most of the applications surveyed or evaluated in [1,42]. We obtain such efficient algorithms by a reduction result in [37], which states that any optimal solution to an instance of QRAP-NC is also optimal for this instance when we take as objective function $\sum_{i \in N} a_i f\left(\frac{x_i}{a_i}\right)$. As a consequence, our algorithm also solves such problems in $O(n \log n)$ time. This leads to faster algorithms for a wide range of practical problems, including the vessel speed optimization problem [31,24] and processor scheduling with agreeable deadlines [23,16].
We evaluate the performance of our algorithm ALG seq and compare it to the state-of-the-art algorithms ALG inf and ALG dec. For this evaluation, we use both synthetic instances and realistic instances of the battery scheduling problem BATTERY using real power consumption data as input. With regard to the realistic instances, we compare our approach to a tailored implementation of ALG inf using DEMKit, an existing simulation tool for DEM research [21]. Within DEMKit, the battery scheduling problem is used as a subroutine within a distributed optimization framework that coordinates the energy consumption of multiple devices [17]. Our results indicate that the number of tight nested constraints in an optimal solution greatly influences which algorithm is faster for a given problem instance. In particular, ALG seq is on average faster than ALG inf, except when the percentage of tight nested constraints is relatively low (less than 2%). Moreover, the execution time of ALG seq is more stable than that of ALG inf, which makes our algorithm more suitable for use in DEM systems that employ a high level of parallelism (see, e.g., [20]). With regard to the synthetic instances, we study the scalability of ALG seq, ALG inf, and ALG dec. Our results indicate that both our algorithm ALG seq and ALG inf are at least one order of magnitude faster than ALG dec and that ALG seq is on average almost twice as fast as ALG inf. In particular, ALG seq solves instances with up to one million variables in less than 17 seconds.
The outline of the remainder of this article is as follows. In Section 2, we present a simple procedure to solve QRAP, which forms an important ingredient for our eventual approach for solving QRAP-NC. In Section 3, we present an initial sequential algorithm ALG seq for solving QRAP-NC with an $O(n^2)$ worst-case time complexity. Based on this algorithm, we derive in Section 4 an $O(n \log n)$ time algorithm for this problem. In Section 5, we evaluate the performance of this algorithm and compare it to the state of the art. Finally, we provide our conclusions in Section 6.

A breakpoint search algorithm for QRAP
In this section, we discuss a simple approach to solve QRAP that belongs to the class of so-called breakpoint search methods [27,33], which structurally search for the optimal Lagrange multiplier corresponding to the resource constraint (1). This approach forms an important ingredient of our $O(n \log n)$ time algorithm for QRAP-NC in Section 4.
We start by considering the Lagrangian relaxation of QRAP:

QRAP[δ]: $\min_{l \le x \le u} \ \sum_{i \in N} \left( \frac{x_i^2}{2 a_i} - \delta x_i \right)$,

where $\delta \in \mathbb{R}$ is the Lagrange multiplier corresponding to the resource constraint (1). We denote the optimal solution to this problem by $x[\delta] := (x_i[\delta])_{i \in N}$. Since the objective function of this problem is separable, the optimal solution to QRAP[δ] is given by

$x_i[\delta] = \min\left(u_i, \max\left(l_i, a_i \delta\right)\right) = \begin{cases} l_i & \text{if } \delta \le \frac{l_i}{a_i}, \\ a_i \delta & \text{if } \frac{l_i}{a_i} \le \delta \le \frac{u_i}{a_i}, \\ u_i & \text{if } \delta \ge \frac{u_i}{a_i}. \end{cases}$

Observe that $x_i[\delta]$ is a continuous piecewise linear non-decreasing function of $\delta$. More precisely, $x_i[\delta]$ is constant for $\delta \le \frac{l_i}{a_i}$, linear with slope $a_i$ for $\delta \in [\frac{l_i}{a_i}, \frac{u_i}{a_i}]$, and again constant for $\delta \ge \frac{u_i}{a_i}$ (see also Figure 1). For each $i \in N$, we call the points where $x_i[\delta]$ has "kinks", i.e., where $x_i[\delta]$ is non-differentiable, the breakpoints of $x_i[\delta]$. We denote these breakpoints for $i \in N$ by $\alpha_i$ and $\beta_i$ respectively, i.e., $\alpha_i := \frac{l_i}{a_i}$ and $\beta_i := \frac{u_i}{a_i}$, where we refer to $\alpha_i$ as the lower breakpoint of $x_i[\delta]$ and to $\beta_i$ as the upper breakpoint of $x_i[\delta]$. We denote the multiset of lower breakpoints by $A := \{\alpha_i \mid i \in N\}$ and the multiset of upper breakpoints by $B := \{\beta_i \mid i \in N\}$. The reason for defining $A$ and $B$ as multisets is so that we can readily associate each breakpoint value in the set with one index in $N$.
Note that also the sum $z[\delta] := \sum_{i \in N} x_i[\delta]$ is continuous, piecewise linear, and non-decreasing. Moreover, it has $2n$ breakpoints, namely those of all terms $x_i[\delta]$. Thus, the multiset of breakpoints of $z[\delta]$ is given by $A \cup B$. Feasibility of the original problem QRAP implies that there exists a value $\hat\delta$ for the Lagrange multiplier $\delta$ such that $z[\hat\delta] = R$, meaning that $x[\hat\delta]$ is optimal not only for QRAP[$\hat\delta$] but also for the original problem QRAP. Note that this multiplier is not necessarily unique: in general, there may exist an interval of multipliers $\delta$ for which $z[\delta] = R$. Our approach to find the value $\hat\delta$ consists of two steps. First, we aim to find two consecutive breakpoints $\delta_1$ and $\delta_2$ such that $\delta_1 \le \hat\delta \le \delta_2$. Since $z$ is non-decreasing, this is equivalent to finding two consecutive breakpoints $\delta_1$ and $\delta_2$ such that $z[\delta_1] \le R \le z[\delta_2]$. For this, we may consider all breakpoints in $A \cup B$ in non-decreasing order until we have found the first, i.e., smallest, breakpoint $\delta$ such that $\hat\delta < \delta$. In detail, for each candidate breakpoint $\delta$, we compute $z[\delta]$ and if $z[\delta] > R$, we set $\delta_2 := \delta$ and $\delta_1$ as the previously considered breakpoint. To compute $z[\delta]$ efficiently, we keep track of the bookkeeping sums $P(\delta) := \sum_{i: \delta \le \alpha_i} l_i + \sum_{i: \beta_i \le \delta} u_i$ and $Q(\delta) := \sum_{i: \alpha_i < \delta < \beta_i} a_i$, so that $z[\delta] = P(\delta) + \delta Q(\delta)$, and update these values each time a new breakpoint has been considered (see Table 1).
Type of δ                      Update P(δ)            Update Q(δ)
lower breakpoint (δ = α_i)     P(δ) := P(δ) − l_i     Q(δ) := Q(δ) + a_i
upper breakpoint (δ = β_i)     P(δ) := P(δ) + u_i     Q(δ) := Q(δ) − a_i

Table 1: Updating the bookkeeping sums P(δ) and Q(δ) when searching the breakpoints in non-decreasing order.
In a second step, given the consecutive breakpoints $\delta_1$ and $\delta_2$ with $\hat\delta \in [\delta_1, \delta_2]$, we determine $\hat\delta$ and $x[\hat\delta]$. Note that, since $x[\delta]$ is non-decreasing, we have for each $i \in N$: $x_i[\hat\delta] = l_i$ if $\alpha_i \ge \delta_2$; $x_i[\hat\delta] = u_i$ if $\beta_i \le \delta_1$; and $l_i < x_i[\hat\delta] < u_i$ otherwise. Thus, given $\delta_1$ and $\delta_2$, we know whether a given variable $x_i[\hat\delta]$ equals its lower bound $l_i$, its upper bound $u_i$, or is strictly in between these bounds. To find $x_i[\hat\delta]$ for those variables that are strictly in between their bounds, note that, by definition of $\hat\delta$, we have $z[\hat\delta] = P + \hat\delta Q = R$, where $P$ and $Q$ are the bookkeeping sums on the interval $[\delta_1, \delta_2]$, so that $\hat\delta = \frac{R - P}{Q}$, from which we can directly compute $x_i[\hat\delta]$ by $x_i[\hat\delta] = a_i \hat\delta$. Algorithm 1 summarizes the sketched approach. To efficiently compute the minimum remaining breakpoint in each iteration, we can implement the multisets $A$ and $B$ as sorted lists. As a consequence, each iteration of the algorithm takes O(1) time. Since the maximum number of iterations is $2n$ (one for each breakpoint), the overall complexity of this approach is $O(n \log n)$ due to the initial sorting of the breakpoints. If this sorting is given (for example if the breakpoints have already been sorted in a previous run of the algorithm), the time complexity of the algorithm reduces to O(n).
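To illustrate, the breakpoint search can be implemented as follows. This is a simplified sketch (not the paper's Algorithm 1 verbatim) assuming $a_i > 0$ and a feasible instance; all names are ours.

```python
def solve_qrap(a, l, u, R):
    """Breakpoint search for QRAP:
    min sum_i x_i^2 / (2 a_i)  s.t.  sum_i x_i = R,  l_i <= x_i <= u_i.
    Assumes a_i > 0 and sum(l) <= R <= sum(u)."""
    n = len(a)
    # Lower breakpoints alpha_i = l_i/a_i and upper breakpoints beta_i = u_i/a_i.
    events = sorted([(l[i] / a[i], i, False) for i in range(n)]
                    + [(u[i] / a[i], i, True) for i in range(n)])
    # On the current segment between breakpoints, z(delta) = P + Q * delta.
    P, Q = float(sum(l)), 0.0
    delta = None
    for bp, i, is_upper in events:
        if P + Q * bp >= R:
            # The optimal multiplier lies on the segment ending at this breakpoint.
            delta = bp if Q == 0.0 else (R - P) / Q
            break
        if is_upper:                  # x_i reaches its upper bound u_i
            P, Q = P + u[i], Q - a[i]
        else:                         # x_i leaves its lower bound l_i
            P, Q = P - l[i], Q + a[i]
    if delta is None:                 # R == sum(u): all variables at their upper bound
        delta = events[-1][0]
    return [min(u[i], max(l[i], a[i] * delta)) for i in range(n)]
```

For instance, `solve_qrap([1.0, 2.0], [0.0, 0.0], [1.0, 1.0], 1.5)` distributes the resource proportionally to $a_i$ until $x_2$ hits its upper bound, returning `[0.5, 1.0]`.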
We conclude this subsection with two observations that are crucial for the efficiency of our algorithm for QRAP-NC presented in the following section:

1. Instead of searching the breakpoints in non-decreasing order, we may also search them in non-increasing order and continue the search until we find the first, i.e., largest, breakpoint $\delta_1$ such that $\delta_1 < \hat\delta$.
2. Solving two instances of QRAP that differ only in the value of $R$ in the resource constraint (1) can be done simultaneously in one run of Algorithm 1. This is because the multisets of breakpoints for these two instances of QRAP are the same. Thus, we can modify Algorithm 1 such that it continues the breakpoint search after the optimal multiplier for the smallest of the given values of $R$ has been found. Note that, essentially, the optimal multiplier for a given value of $R$ serves as the starting candidate for the optimal multiplier for instances with a higher value of $R$. This is in fact one of the two crucial observations for our approach for solving the QRAP subproblems, which we discuss further in Section 4.2.
An initial sequential algorithm for QRAP-NC

In this section, we present our initial sequential algorithm for the problem QRAP-NC. This algorithm solves the problem as a sequence of $2n-1$ instances of QRAP whose single-variable bounds (2) are optimal solutions to previous QRAP subproblems. For this, we consider a sequence of restricted subproblems in which we take into account only a subset of the variables. More precisely, we define for each $j \in N$ and $C \in \mathbb{R}$ the following subproblem:

QRAP-NC$_j$(C): $\min_{x \in \mathbb{R}^j} \ \sum_{i \in N_j} \frac{x_i^2}{2 a_i}$
s.t. $\sum_{i \in N_j} x_i = C$,
$L_k \le \sum_{i \in N_k} x_i \le U_k, \quad k \in N_{j-1}$, (5)
$l_i \le x_i \le u_i, \quad i \in N_j$.

Throughout this article, we denote the optimal solution to this subproblem by $x^j(C) := (x^j_i(C))_{i \in N_j}$, where we use the brackets $(\cdot)$ instead of $[\cdot]$ to emphasize the distinction of this solution from an optimal solution $x[\delta]$ of the Lagrangian relaxation QRAP[δ] of QRAP. Note that this optimal solution is unique since the objective function of the corresponding problem is strictly convex and all constraints are linear. Moreover, observe that the $n$-th subproblem QRAP-NC$_n$(R) is equal to the original problem QRAP-NC.
The key ingredient of our algorithm is that we can replace the nested constraints (5) by specific single-variable constraints without changing the optimal solution. By doing this, we transform an instance of QRAP-NC into an equivalent instance of QRAP. More precisely, we show that each subproblem QRAP-NC$_j$(C) yields the same optimal solution as the following instance of QRAP:

QRAP$_j$(C): $\min_{x \in \mathbb{R}^j} \ \sum_{i \in N_j} \frac{x_i^2}{2 a_i}$
s.t. $\sum_{i \in N_j} x_i = C$, (6)
$x^{j-1}_i(L_{j-1}) \le x_i \le x^{j-1}_i(U_{j-1}), \quad i \in N_{j-1}$, (7)
$l_j \le x_j \le u_j$, (8)

where the bounds $x^{j-1}(L_{j-1})$ and $x^{j-1}(U_{j-1})$ in (7) are the optimal solutions of the problems QRAP-NC$_{j-1}$($L_{j-1}$) and QRAP-NC$_{j-1}$($U_{j-1}$) respectively. Note that the single-variable bounds for $x_j$ in (8) are the same as those of the original subproblem QRAP-NC$_j$(C). The validity of this transformation is proven by Lemmas 1-3. First, Lemma 1 shows that the optimal solution $x^j(C)$ to the subproblem QRAP-NC$_j$(C) is non-decreasing in $C$. Subsequently, Lemma 2 uses this property to show that when adding the alternative single-variable bounds (7) to the problem formulation of QRAP-NC$_j$(C), the optimal solution $x^j(C)$ to QRAP-NC$_j$(C) is not cut off. Finally, Lemma 3 shows that the alternative single-variable bounds (7) are stronger than the nested constraints (5).
Lemma 1. For given $j \in N$, the optimal solution $x^j(C)$ to QRAP-NC$_j$(C) is non-decreasing in $C$.

Proof. This proof is based on the proof of Theorem 2 in [42] and is given in Appendix A.1.
Lemma 2. Let $j \in N_{n-1}$ and let $C$ be such that QRAP-NC$_{j+1}$(C) is feasible. Then $x^j_i(L_j) \le x^{j+1}_i(C) \le x^j_i(U_j)$ for all $i \in N_j$.

Proof. Let $\bar x := (x^{j+1}_1(C), \ldots, x^{j+1}_j(C))$ be the vector of the first $j$ components of the optimal solution to the problem QRAP-NC$_{j+1}$(C). Since $\bar x$ is feasible for all nested constraints (5) for $k \in N_j$, this vector is also the optimal solution to QRAP-NC$_j$(A), where $A := \sum_{i \in N_j} \bar x_i$, i.e., we have $\bar x = x^j(A)$. Moreover, the nested constraint of QRAP-NC$_{j+1}$(C) for $k = j$ implies $L_j \le A \le U_j$. Hence, Lemma 1 yields $x^j(L_j) \le x^j(A) = \bar x \le x^j(U_j)$.
Lemma 3. If for a given $j \in N_{n-1}$ and vector $y \in \mathbb{R}^j$ we have $x^j_i(L_j) \le y_i \le x^j_i(U_j)$ for all $i \in N_j$, then $L_k \le \sum_{i \in N_k} y_i \le U_k$ for all $k \in N_j$.

Proof. The sum of the inequalities $x^j_i(L_j) \le y_i \le x^j_i(U_j)$ over $i \in N_k$ yields $\sum_{i \in N_k} x^j_i(L_j) \le \sum_{i \in N_k} y_i \le \sum_{i \in N_k} x^j_i(U_j)$. Since $x^j(L_j)$ and $x^j(U_j)$ are feasible for QRAP-NC$_j$($L_j$) and QRAP-NC$_j$($U_j$) respectively and $k \le j$, we have $L_k \le \sum_{i \in N_k} x^j_i(L_j)$ and $\sum_{i \in N_k} x^j_i(U_j) \le U_k$, and the result of the lemma follows.
Lemma 2 implies that, given optimal solutions $x^{j-1}(L_{j-1})$ and $x^{j-1}(U_{j-1})$, we can replace the nested constraints (5) in QRAP-NC$_j$(C) by the single-variable bounds (7) without cutting off the optimal solution to QRAP-NC$_j$(C). Moreover, since these single-variable bounds are stronger than the nested constraints by Lemma 3, adding these constraints does not change the optimal objective value. It follows directly that any optimal solution to QRAP$_j$(C) is also optimal for QRAP-NC$_j$(C).
Based on Lemmas 1-3, the following approach can be used to solve QRAP-NC. We successively solve the subproblems QRAP$_j$($L_j$) and QRAP$_j$($U_j$) from $j = 1$ to $n-1$ and finally the subproblem QRAP$_n$(R), whereby in each step we use the optimal solutions to the preceding subproblems QRAP$_{j-1}$($L_{j-1}$) and QRAP$_{j-1}$($U_{j-1}$) as input. Note that each of the subproblems is an instance of QRAP. This approach is summarized in Algorithm 2.
Algorithm 2 An initial sequential algorithm for QRAP-NC.

for j = 1 to n−1 do
    Compute optimal solutions $x^j(L_j)$ and $x^j(U_j)$ to QRAP$_j$($L_j$) and QRAP$_j$($U_j$) respectively
end for
Compute optimal solution $x^n(R)$ to QRAP$_n$(R)
return Optimal solution $x := x^n(R)$

Since each subproblem QRAP$_j$($\cdot$) can be solved in O(n) time [10], the worst-case time complexity of Algorithm 2 is $O(n^2)$. However, linear-time algorithms for QRAP such as [10] attain their linear time complexity by employing linear-time algorithms for median finding, which are, as already mentioned, in general slower than simple sorting- or sampling-based approaches [26,2]. Note that also the $O(n \log n)$ time algorithm in [42] attains its worst-case time complexity by using such slow linear-time algorithms as a subroutine.
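The sequential approach can be sketched compactly as follows. A simple bisection on the Lagrange multiplier stands in for the breakpoint search as the QRAP solver; all function names are ours, and this is an illustrative sketch rather than the paper's implementation.

```python
def qrap_point(a, l, u, delta):
    # Optimal solution x[delta] of the Lagrangian relaxation of QRAP.
    return [min(ui, max(li, ai * delta)) for ai, li, ui in zip(a, l, u)]

def solve_qrap(a, l, u, R, iters=200):
    # Bisection on the multiplier; sum_i x_i[delta] is non-decreasing in delta.
    lo = min(li / ai for li, ai in zip(l, a)) - 1.0
    hi = max(ui / ai for ui, ai in zip(u, a)) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(qrap_point(a, l, u, mid)) < R:
            lo = mid
        else:
            hi = mid
    return qrap_point(a, l, u, 0.5 * (lo + hi))

def solve_qrap_nc(a, l, u, L, U, R):
    """Sequential scheme: solve QRAP_j(L_j) and QRAP_j(U_j) for j = 1..n-1,
    then QRAP_n(R); L and U have length n-1 with l_1 <= L_1 and U_1 <= u_1."""
    n = len(a)
    if n == 1:
        return [R]
    xL, xU = [L[0]], [U[0]]                    # x^1(L_1) and x^1(U_1)
    for j in range(1, n):
        lb, ub = xL + [l[j]], xU + [u[j]]      # single-variable bounds (7)-(8)
        if j < n - 1:
            xL = solve_qrap(a[:j + 1], lb, ub, L[j])
            xU = solve_qrap(a[:j + 1], lb, ub, U[j])
        else:
            return solve_qrap(a, lb, ub, R)
```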
In the next section, we propose an algorithm that solves QRAP-NC in $O(n \log n)$ time and that, as opposed to the algorithm in [42], does not require linear-time median-finding algorithms. Instead, it only requires a simple data structure for double-ended priority queues to store several bookkeeping parameters.
We conclude this section with two remarks that may be of independent interest:

1. It can be shown that Lemmas 1-3 also hold for the case where the variables are integer-valued, i.e., $x \in \mathbb{Z}^n$ (see also Theorem 5 in [42]), given that all parameters $a$, $L$, $U$, $l$, and $u$ are also integer-valued and nonnegative. As a consequence, when solving each subproblem QRAP$_j$($\cdot$) with integer variables, Algorithm 2 computes an optimal solution to QRAP-NC with integer variables. The worst-case time complexity of this algorithm is $O(n^2)$ since each QRAP$_j$($\cdot$) subproblem with integer variables can be solved in O(j) time [25].
2. Lemmas 1-3 can be generalized to the case where the objective function is a sum of separable convex cost functions $f_i$, i.e., where we replace each term $\frac{x_i^2}{2 a_i}$ by a convex function $f_i(x_i)$. For this more general problem, this leads to a sequential algorithm that is very similar to Algorithm 2. However, initial computational tests indicated that both this algorithm and Algorithm 2 are in practice much slower than both ALG inf and ALG dec.

A fast O(n log n) time algorithm for QRAP-NC
The sequential algorithm derived in the previous section does not match the best known time complexity of the algorithm in [42]. However, we show in this section that we can implement Algorithm 2 such that its time complexity reduces to $O(n \log n)$ without requiring a linear-time median-finding algorithm. Instead, we only require a data type that supports insertion of elements and finding and removal of minimum and maximum elements in $O(\log n)$ time, such as a double-ended priority queue.
The key to efficiency in our approach is that we do not explicitly compute the solution to each QRAP subproblem. Instead, we only compute an optimal Lagrange multiplier corresponding to the resource constraint (6), which characterizes the entire optimal solution to this subproblem. Subsequently, we use these multipliers to reconstruct the optimal solution to the original problem QRAP-NC using two sets of simple recursive relations that can be evaluated in O(n) time. In order to compute the Lagrange multipliers without explicitly storing intermediate solutions, we exploit the special structure of these multipliers and of a specific algorithm for solving QRAP.
First, in Section 4.1, we introduce some notation. Second, in Section 4.2, we derive an efficient approach for computing the optimal Lagrange multipliers of the subproblems QRAP$_j$($L_j$) and QRAP$_j$($U_j$). Based on these optimal Lagrange multipliers, we derive in Section 4.3 two simple recursions to compute the optimal solution $x$ to QRAP-NC. Finally, in Section 4.4, we present an $O(n \log n)$ algorithm for QRAP-NC and discuss an implementation that attains this worst-case time complexity.

Notation
We introduce the following notation concerning the subproblems QRAP$_j$($L_j$) and QRAP$_j$($U_j$) that we use throughout the remainder of this article. We denote for $j \in N$ the lower and upper single-variable bounds (7) and (8) of QRAP$_j$(C) with $C \in [L_j, U_j]$ by $\bar{l}^j := (\bar{l}^j_i)_{i \in N_j}$ and $\bar{u}^j := (\bar{u}^j_i)_{i \in N_j}$, where $\bar{l}^j_i := x^{j-1}_i(L_{j-1})$ and $\bar{u}^j_i := x^{j-1}_i(U_{j-1})$ for $i < j$, and $\bar{l}^j_j := l_j$ and $\bar{u}^j_j := u_j$. Furthermore, we denote by $\alpha^j := (\alpha^j_i)_{i \in N_j}$ and $\beta^j := (\beta^j_i)_{i \in N_j}$ the lower and upper breakpoints for the subproblem QRAP$_j$(C). We call the breakpoints corresponding to $i = j$, i.e., $\alpha^j_j$ and $\beta^j_j$, initial breakpoints, since QRAP$_j$(C) is the first subproblem, i.e., the one with lowest index $j$, in which we have to compute breakpoint values for the variable $x_j$. Note that we can compute these breakpoints directly as $\alpha^j_j := \frac{l_j}{a_j}$ and $\beta^j_j := \frac{u_j}{a_j}$ by definition of the subproblem QRAP$_j$(C).
Furthermore, let $\kappa_j$ and $\lambda_j$ denote the optimal Lagrange multipliers for the subproblems QRAP$_j$($L_j$) and QRAP$_j$($U_j$) respectively and define $\kappa := (\kappa_j)_{j \in N}$ and $\lambda := (\lambda_j)_{j \in N}$, where we set $\kappa_1 := \alpha^1_1$ and $\lambda_1 := \beta^1_1$. If the optimal Lagrange multiplier for a given subproblem QRAP$_j$($L_j$) is not unique, we define without loss of generality $\kappa_j$ as the maximum optimal Lagrange multiplier. Analogously, we define $\lambda_j$ as the minimum optimal Lagrange multiplier of subproblem QRAP$_j$($U_j$). Note that $x^j(L_j) = x^j[\kappa_j]$ and $x^j(U_j) = x^j[\lambda_j]$ by definition of the subproblems QRAP$_j$($L_j$) and QRAP$_j$($U_j$) and of $\kappa_j$ and $\lambda_j$. Finally, for a given subproblem QRAP$_j$(C), we define the set of its lower breakpoints as $A^j := \{\alpha^j_i \mid i \in N_j\}$ and the set of its upper breakpoints as $B^j := \{\beta^j_i \mid i \in N_j\}$. Recall that in Section 2 we defined breakpoint sets as multisets for convenience when solving QRAP. However, for our fast algorithm for QRAP-NC, it is crucial that the breakpoint sets do not contain duplicate elements. Therefore, in this section and the remainder of this article, we regard $A^j$ and $B^j$ as ordinary sets.

Computing the optimal Lagrange multipliers of the subproblems
The goal of this subsection is to derive an efficient approach for computing the optimal Lagrange multiplier of each QRAP subproblem in Algorithm 2 without explicitly calculating any of the intermediate optimal solutions $x^j(L_j)$ and $x^j(U_j)$ for $j \in N$. If we were to solve each pair of subproblems QRAP$_j$($L_j$) and QRAP$_j$($U_j$) from scratch, e.g., using Algorithm 1, we would have to explicitly compute the breakpoint sets for each pair of subproblems. This leads to $O(n^2)$ computations and thus forms an efficiency bottleneck within the algorithm.
We show that we can apply the breakpoint search procedure in Algorithm 1 for solving the subproblems such that each breakpoint set $A^{j+1}$ can be obtained from the previous set $A^j$ in O(1) amortized steps, i.e., the total number of steps required to carry out this construction for all $j \in N_{n-1}$ is O(n). This can be done because of two intermediate results that we show in this subsection. First, the number of distinct values that the breakpoints can take is not $O(n^2)$ but O(n). We obtain this result by unveiling a useful relation between breakpoints of consecutive subproblems, i.e., between $\alpha^j, \beta^j$ and $\alpha^{j+1}, \beta^{j+1}$. Second, when constructing the breakpoint sets, each distinct breakpoint value is included in or removed from a breakpoint set at most twice during the entire procedure. For this, it is important that we solve each lower subproblem QRAP$_j$($L_j$) by considering the breakpoints in non-decreasing order and each upper subproblem QRAP$_j$($U_j$) by considering the breakpoints in non-increasing order. Together, these two results imply that the construction of the breakpoint sets requires in total O(n) additions and removals of breakpoint values. By using an appropriate data structure such as a double-ended priority queue for maintaining the breakpoint sets, each of these steps can be executed in $O(\log n)$ time, which leads to an overall $O(n \log n)$ complexity for computing the optimal Lagrange multipliers $\kappa$ and $\lambda$.
The outline of the remainder of this subsection is as follows. First, in Section 4.2.1, we analyze the relation between breakpoints of consecutive subproblems and show that the number of distinct breakpoint values is O(n). Subsequently, in Section 4.2.2, we use this information and the structure of Algorithm 1 to construct the breakpoint sets for each subproblem from those of the preceding subproblems. Finally, in Section 4.2.3, we discuss how the updating of the bookkeeping parameters within the breakpoint search procedure must be adjusted when applying this procedure to the subproblems QRAP$_j$($L_j$) and QRAP$_j$($U_j$).

Relation between consecutive breakpoints
We first show how we can efficiently obtain the breakpoint set of a given subproblem QRAP$_{j+1}$(C) based on the breakpoint set and optimal Lagrange multipliers of the preceding subproblems QRAP$_j$($L_j$) and QRAP$_j$($U_j$). For given $j \in N_{n-1}$ and $i \in N_j$, we have $\bar{l}^{j+1}_i = x^j_i(L_j) = x_i[\kappa_j] = \min(\bar{u}^j_i, \max(\bar{l}^j_i, a_i \kappa_j))$. Summarizing, we can determine $\alpha^{j+1}_i$ from the previous breakpoints $\alpha^j_i$ and $\beta^j_i$ and the optimal Lagrange multiplier $\kappa_j$ as follows:

$\alpha^{j+1}_i = \begin{cases} \alpha^j_i & \text{if } \kappa_j \le \alpha^j_i, \\ \kappa_j & \text{if } \alpha^j_i < \kappa_j < \beta^j_i, \\ \beta^j_i & \text{if } \kappa_j \ge \beta^j_i. \end{cases}$ (9)

Analogously, we obtain the following expression for the upper breakpoint $\beta^{j+1}_i$ in terms of the previous breakpoints $\alpha^j_i$ and $\beta^j_i$ and the optimal Lagrange multiplier $\lambda_j$:

$\beta^{j+1}_i = \begin{cases} \alpha^j_i & \text{if } \lambda_j \le \alpha^j_i, \\ \lambda_j & \text{if } \alpha^j_i < \lambda_j < \beta^j_i, \\ \beta^j_i & \text{if } \lambda_j \ge \beta^j_i. \end{cases}$ (10)

Note that it follows from these relations that $\alpha^j_i \le \alpha^{j+1}_i$ and $\beta^j_i \ge \beta^{j+1}_i$ for each $j \in N_{n-1}$. Moreover, note that the only values that the breakpoints can take are those of the initial breakpoints $\alpha^j_j$ and $\beta^j_j$ or of the optimal Lagrange multipliers in $\kappa$ and $\lambda$. Thus, the number of distinct values among all breakpoints is limited by $4n$.
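The relations above amount to clamping the optimal multipliers to the previous breakpoint intervals, which can be sketched as follows (function names are ours; the new initial breakpoints $\alpha^{j+1}_{j+1}$ and $\beta^{j+1}_{j+1}$ are added separately):

```python
def next_breakpoints(alpha, beta, kappa, lam):
    """Breakpoints of subproblem j+1 (for indices i in N_j) from those of
    subproblem j and the optimal multipliers kappa_j <= lambda_j."""
    clamp = lambda v, lo, hi: min(hi, max(lo, v))
    alpha_next = [clamp(kappa, a, b) for a, b in zip(alpha, beta)]
    beta_next = [clamp(lam, a, b) for a, b in zip(alpha, beta)]
    return alpha_next, beta_next
```

Note that the clamp form makes the monotonicity $\alpha^j_i \le \alpha^{j+1}_i$ and $\beta^{j+1}_i \le \beta^j_i$ immediate.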

Constructing consecutive breakpoint sets
As observed at the end of Section 2, we can solve a given QRAP subproblem by searching its breakpoints either in non-decreasing or non-increasing order. In particular, we can solve all lower subproblems QRAP_j(L_j) by searching the breakpoints in non-decreasing order and all upper subproblems QRAP_j(U_j) by searching the breakpoints in non-increasing order. When doing this, note that for solving the upper subproblem QRAP_j(U_j) we can use as breakpoint sets the sets that "remain" from the breakpoint search for the lower subproblem. More precisely, instead of the sets A^j and B^j that we also use as breakpoint sets for the lower subproblem QRAP_j(L_j), we can use the sets {α_i^j ∈ A^j | α_i^j ≥ κ^j} and {β_i^j ∈ B^j | β_i^j ≥ κ^j} respectively. This is because κ^j ≤ λ^j and thus in the breakpoint search for the upper subproblem QRAP_j(U_j) no breakpoints smaller than κ^j need to be considered.
We define the sets Ã^j and B̃^j as the sets of lower and upper breakpoints that remain to be considered after solving the subproblems QRAP_j(L_j) and QRAP_j(U_j) in the way described in the previous paragraph, i.e., we have Ã^j := {α_i^j ∈ A^j | κ^j ≤ α_i^j ≤ λ^j} and B̃^j := {β_i^j ∈ B^j | κ^j ≤ β_i^j ≤ λ^j}. We call these sets the remaining breakpoint sets of the subproblems QRAP_j(L_j) and QRAP_j(U_j). In the following, we relate these two remaining breakpoint sets to the breakpoint sets of the next two subproblems, i.e., to the sets A^{j+1} and B^{j+1}. For this, we focus on the relation between the lower remaining breakpoint set Ã^j and the lower breakpoint set A^{j+1}; the relation between the upper remaining breakpoint set B̃^j and the upper breakpoint set B^{j+1} is analogous.
For each i ∈ N_j, we consider four cases for the value of α_i^{j+1}:

1. If κ^j ≤ α_i^j ≤ λ^j, it follows from Equation (9) that α_i^{j+1} = α_i^j. Thus, all values in Ã^j act as breakpoint values for the next subproblems, i.e., Ã^j ⊆ A^{j+1}.

2. If α_i^j < κ^j < β_i^j, it follows from Equation (9) that α_i^{j+1} = κ^j. Thus, the multiplier κ^j acts as the new lower breakpoint of i and must be added to A^{j+1}.
3. If α_i^j < κ^j and β_i^j ≤ κ^j, it follows from Equations (9) and (10) that x_i^j(L_j) = ū_i^j and α_i^{j+1} = β_i^{j+1} = β_i^j respectively. This means that α_i^{j′} = β_i^{j′} = β_i^j and l̄_i^{j′} = ū_i^{j′} = ū_i^j for all j′ > j. Thus, in all remaining subproblems, the lower and upper breakpoints of i coincide and x_i^{j′}(C) = ū_i^j for any j′ > j and L_{j′} ≤ C ≤ U_{j′}, regardless of the values of the future optimal Lagrange multipliers κ^{j′} and λ^{j′}. This means that we can remove this index (variable) from the breakpoint search.
4. Finally, if α_i^j > λ^j, it follows from Equations (9) and (10) that α_i^{j+1} = α_i^j and β_i^{j+1} = α_i^j respectively. Analogously to the case α_i^j ≤ β_i^j ≤ κ^j, it follows that l̄_i^{j′} = ū_i^{j′} = l̄_i^j and x_i^{j′}(C) = l̄_i^j for all j′ > j and L_{j′} ≤ C ≤ U_{j′}. Thus, also in this case we can remove the index i from the breakpoint search.
These four cases imply that we can construct A^{j+1} from Ã^j by adding the new initial breakpoint value α_{j+1}^{j+1} and, if the second case applies to at least one index, the multiplier κ^j. Analogously, we can construct B^{j+1} from B̃^j by adding β_{j+1}^{j+1} and, if required, the multiplier λ^j. The above constructions show how the breakpoint sets evolve over the course of the algorithm. First, in this construction, at most 4n additions of breakpoint values to a breakpoint set occur. Second, during the breakpoint search procedure of Algorithm 1, breakpoints are only removed and not added. This means that updating the breakpoint sets can be done in O(n) steps, i.e., by O(n) additions and removals of breakpoint values.

Updating bookkeeping parameters
In order to efficiently compute the sums z_j[δ] := Σ_{i∈N_j} x_i^j[δ] for a given breakpoint δ, we define bookkeeping parameters analogously to those in the breakpoint search procedure for QRAP in Algorithm 1. Each breakpoint value κ^{j′} and λ^{j′} in a given breakpoint set acts as a collective breakpoint for one or multiple activities. As a consequence, within the breakpoint search procedure, they have the same function as the "regular" initial lower and upper breakpoint values α_i^i and β_i^i. Thus, when a breakpoint value of the form κ^{j′} or λ^{j′} has been considered, we require an efficient update of the bookkeeping sums P^j(κ^{j′}), Q^j(κ^{j′}) or P^j(λ^{j′}), Q^j(λ^{j′}) respectively. In the case of κ^{j′}, we update P^j(κ^{j′}) by subtracting from this value the sum of the lower bounds l̄_i^j of those activities i whose lower breakpoint equals κ^{j′}, i.e., for which α_i^j = κ^{j′}; note that α_i^j = κ^{j′} if and only if α_i^{j″} = κ^{j′} for all j″ ∈ {j′, …, j}. Analogously, we update the bookkeeping sum Q^j(κ^{j′}) by adding to this value the sum of the parameters a_i of those activities i with α_i^j = κ^{j′}. Thus, the updates take the form of subtracting the corresponding sum of lower bounds from P^j(κ^{j′}) and adding the corresponding sum of slopes a_i to Q^j(κ^{j′}).
The updates for the case of λ^{j′}, i.e., for P^j(λ^{j′}) and Q^j(λ^{j′}), are analogous to those for the case of κ^{j′}. Table 2 provides an overview of the updates of the bookkeeping sums for both these cases for each of the four breakpoint value types α_i^i, β_i^i, κ^{j′}, and λ^{j′}.
Table 2: Updating the bookkeeping sums P^j(δ) and Q^j(δ) when searching the breakpoints in non-decreasing order (QRAP_j(L_j)) and non-increasing order (QRAP_j(U_j)).
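To illustrate how such bookkeeping sums support a breakpoint search, the sketch below solves a plain QRAP instance by sweeping the breakpoints in non-decreasing order while maintaining z[δ] = P + Q·δ, where P is the sum of the bounds of the variables fixed at a bound and Q is the sum of the slopes a_i of the free variables (cf. Figure 1). This is our own simplified illustration of the idea behind Algorithm 1, not the paper's pseudocode; all names are ours.

```python
def solve_qrap_sweep(a, l, u, C):
    """Find the multiplier delta with
    z[delta] = sum_i max(l_i, min(a_i * delta, u_i)) = C
    by sweeping the breakpoints in non-decreasing order.
    Assumes sum(l) <= C <= sum(u) and a_i > 0."""
    events = []  # (breakpoint value, change in P, change in Q)
    for ai, li, ui in zip(a, l, u):
        events.append((li / ai, -li, ai))  # lower breakpoint: i becomes free
        events.append((ui / ai, ui, -ai))  # upper breakpoint: i fixed at u_i
    events.sort()
    P, Q = sum(l), 0.0  # below all breakpoints, z[delta] = sum(l)
    for bp, dP, dQ in events:
        if Q > 0.0 and P + Q * bp >= C:
            return (C - P) / Q  # optimal delta lies at or before this breakpoint
        P, Q = P + dP, Q + dQ
    return events[-1][0]  # degenerate fallback
```

The optimal solution is then recovered as x_i = max(l_i, min(a_i·δ, u_i)).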

Recovering the optimal solution to QRAP-NC
In the previous section, we found an efficient way to compute the optimal Lagrange multipliers κ^j and λ^j for the QRAP subproblems QRAP_j(L_j) and QRAP_j(U_j). In this section, we show how we can use these values to compute the optimal solution x^n(R). For this, we first determine which nested constraints are tight in x^n(R) and use this information to reconstruct the individual terms x_i^n(R) for i ∈ N. To this end, for each j ∈ N_{n−1}, let ĵ_j denote the smallest index larger than or equal to j such that one of its corresponding nested constraints is tight in x^n(R). Furthermore, let V_j denote the value of the tight nested constraint corresponding to the index ĵ_j and χ_j the corresponding multiplier, i.e., V_j ∈ {L_{ĵ_j}, U_{ĵ_j}} and χ_j ∈ {κ^{ĵ_j}, λ^{ĵ_j}}. The main result in this subsection is that the values χ_j act as optimal Lagrange multipliers for the resource constraint (6) in the subproblem QRAP_n(R). As a consequence, given these values, we can calculate x^n(R) directly using a relation similar to the Lagrangian relaxation solution in Equation (3).
To show this result, we prove Lemmas 4 and 5. First, Lemma 4 shows how we can iteratively compute χ from the optimal multipliers κ and λ using a simple recursive relation. Second, Lemma 5 shows how we can calculate x^n(R) from χ using a relation similar to that in Equation (3).
Lemma 5. For each i ∈ N, we have x_i^n(R) = max(l_i, min(a_i χ_i, u_i)). Proof. See Appendix A.3.
Note that, starting from χ_n = κ^n and using Lemma 4, we can compute the values χ_j recursively as χ_j = max(κ^j, min(χ_{j+1}, λ^j)). Thus, given the optimal Lagrange multipliers κ and λ, we can compute the optimal solution x^n(R) in O(n) time using the two relatively simple recursions in Equations (11) and (12).
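In code, this recovery step amounts to one backward pass for χ followed by one forward pass for x. The sketch below is our own reading of Lemma 4 and the Lagrangian relation in Equation (3) (0-indexed; names ours), not the paper's implementation.

```python
def recover_solution(a, l, u, kappa, lam):
    """Sketch of Equations (11) and (12): compute chi backward from
    chi_n = kappa^n via chi_j = max(kappa^j, min(chi_{j+1}, lambda^j)),
    then recover x_i = max(l_i, min(a_i * chi_i, u_i))."""
    n = len(a)
    chi = [0.0] * n
    chi[-1] = kappa[-1]  # L_n = U_n = R, so the last nested constraint is tight
    for j in range(n - 2, -1, -1):
        chi[j] = max(kappa[j], min(chi[j + 1], lam[j]))
    return [max(l[i], min(a[i] * chi[i], u[i])) for i in range(n)]
```

The backward clamp reproduces the three cases of Lemma 4: χ_j = κ^j when χ_{j+1} ≤ κ^j, χ_j = λ^j when χ_{j+1} ≥ λ^j, and χ_j = χ_{j+1} otherwise.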

An O(n log n) time algorithm for QRAP-NC
In the previous two subsections, we derived an efficient approach to compute the optimal Lagrange multipliers κ and λ for the QRAP_j(L_j) and QRAP_j(U_j) subproblems and to compute from these multipliers the optimal solution x. In this subsection, we combine these two ingredients to formulate a fast and efficient algorithm for QRAP-NC (Algorithm 3). More precisely, in the first part of this subsection, Section 4.4.1, we present our algorithm and discuss several of its details regarding the subroutines for computing the optimal Lagrange multipliers of the QRAP_j(L_j) and QRAP_j(U_j) subproblems. This includes several procedures that deal with corner cases and with the updating of the breakpoint sets and the bookkeeping parameters. In the second part, Section 4.4.2, we focus on the efficiency of the algorithm. In particular, we prove in Lemma 6 that the algorithm has an O(n log n) worst-case time complexity when using an appropriate data structure.

Description of the algorithm
Algorithm 3 captures our approach for solving QRAP-NC. First, in Lines 3-13, the algorithm initializes all problem parameters, the initial breakpoint values and breakpoint sets, and the initial bookkeeping parameters. Throughout the entire algorithm, it maintains four separate sets A, B, K, and L of breakpoint values corresponding to the "source" of the values, i.e., whether they are one of the initial breakpoint values α_i^i or β_i^i or one of the optimal Lagrange multipliers κ^j or λ^j respectively. Second, in Lines 14-16, the algorithm applies for each j ∈ N\{1} the procedure SolveSubproblems(j) (see Algorithm 4) that computes the optimal Lagrange multipliers κ^j and λ^j for the two subproblems QRAP_j(L_j) and QRAP_j(U_j). Finally, using the obtained vectors of optimal Lagrange multipliers κ and λ, the algorithm computes in Lines 17-22 the (alternative) multiplier values χ using the recursion in Equation (12) and from these values the solution x^n(R) using Equation (11).
The procedure SolveSubproblems(j) carries out the breakpoint search procedure for the subproblems QRAP_j(L_j) and QRAP_j(U_j) as described in Section 2 (Lines 38-59). This is done by first initializing the bookkeeping parameters for these breakpoint search procedures in Lines 39-46 and Lines 49-56 and subsequently applying the procedures LowerSubproblem(j) (Line 47, Algorithm 5) and UpperSubproblem(j) (Line 57, Algorithm 6), which are identical in nature to Lines 5-22 of Algorithm 1. Before carrying out the breakpoint search procedure, two possible corner cases are considered in Lines 1-38 with regard to the relation between the to-be-computed multipliers κ^j and λ^j and their predecessors κ^{j−1} and λ^{j−1}. We briefly discuss these corner cases for κ^j; the corner cases for λ^j are analogous.
The first corner case occurs when κ^j = κ^{j−1} (Lines 1-9 in SolveSubproblems(j)). This case corresponds to Lines 5-7 in Algorithm 1, where the currently considered candidate multiplier δ_i leads to a sum larger than C, i.e., z[δ_i] > C. In QRAP-NC_j(L_j), this case occurs if and only if L_{j−1} + x_j^j[κ^j] > L_j. The second corner case occurs when the solution to the preceding subproblem carries over unchanged, i.e., when x_i^j(L_j) = x_i^{j−1}(L_{j−1}) for all i ∈ N_{j−1} and L_{j−1} + max(l_j, min(a_j χ_j, u_j)) > L_j. In both cases, it is not necessary to carry out the actual breakpoint search to find κ^j since either κ^j = κ^{j−1} (the first case) or κ^j = (L_j − L_{j−1})/a_j (the second case).

[Algorithm 3 (listing): An O(n log n) time algorithm for QRAP-NC.]
Whether or not one of the above-mentioned corner cases occurs partly determines whether or not we have to include the new initial breakpoint values α_j^j and β_j^j in the breakpoint search procedure. The algorithm makes this decision in Lines 33-38: α_j^j and β_j^j are included only if they lie between the lowest and highest breakpoint values that can be considered in the breakpoint search. This lowest value is κ^j if κ^j ≤ κ^{j−1} (when one of the two corner cases for κ^j occurs and thus this value has already been determined) and κ^{j−1} otherwise (when breakpoint search is required to find κ^j). Analogously, the highest value is λ^j if λ^j ≥ λ^{j−1} and λ^{j−1} otherwise.

Time complexity
We now establish the worst-case time complexity of Algorithm 3 by means of the following lemma. Lemma 6. Algorithm 3 can be implemented such that its worst-case time complexity is O(n log n).
Proof. Observe that, throughout the algorithm and all its procedures, all operations have a total time complexity of O(n) except for four operations on the sets A, B, K, and L of to-be-considered breakpoints. For each of these breakpoint sets, say D, these are finding the minimum and maximum breakpoint in D (Lines 2 and 18 in Algorithm 4 and Line 2 in Algorithms 5 and 6), inserting a breakpoint value into D (Lines 13, 29, 34, and 37 in Algorithm 4), and removing the minimum or maximum breakpoint from D (Lines 15 and 31 in Algorithm 4 and Lines 16, 20, 24, and 28 in Algorithms 5 and 6). As we showed in Section 4.2, each breakpoint value is inserted and removed at most once during the course of the algorithm. Moreover, in the worst case, we have to find the minimum and maximum breakpoint value in D O(n) times. Thus, the total number of breakpoint set operations is O(n). If we maintain the breakpoint sets as min-max heaps [4], each of these operations can be executed in O(1) time (finding the minimum and maximum) or O(log n) time (inserting and removing a breakpoint). This means that the total time complexity of all breakpoint set operations is O(n log n) if we use min-max heaps to store the breakpoint sets. It follows that Algorithm 3 can be implemented such that its worst-case time complexity is O(n log n).
In practice, carrying out the breakpoint set operations might be faster if we use a data structure other than min-max heaps to maintain the breakpoint sets A, B, K, and L. For instance, when n is small, simple arrays might suffice for fast insertion and removal of breakpoints, even though this increases the worst-case time complexity to O(n²). On the other hand, [19] suggests maintaining the breakpoint sets by means of a so-called disjoint set data structure (see, e.g., [11]). Using such a structure, a sequence of O(n) breakpoint insertions and deletions in sets of size at most n can be done in O(n) time using the algorithm in [14]. However, it is unclear whether the algorithm in [14] is fast in practice, for two reasons. First, it is complicated and cumbersome to implement compared to other algorithms for insertion and removal operations on disjoint set data structures [15]. Second, although the authors mention in the preliminary study [13] that their algorithm outperforms the then state-of-the-art, the literature contains hardly any studies on its practical performance. Alternatively, one could use other algorithms (e.g., those evaluated in [34]) that have a worse worst-case time complexity but have been shown to be fast in practice.

Evaluation
In this section, we evaluate the performance of our Algorithm 3 as presented in Section 4.4, to which we refer as ALG_seq for clarity, and compare it with the state-of-the-art algorithms ALG_inf from [41] and ALG_dec from [42]. We carry out two types of experiments. First, we evaluate the performance of our algorithm on realistic instances of the battery scheduling problem BATTERY. For this, we tailor ALG_seq to this problem and compare this implementation to a tailored implementation of ALG_inf within the simulation tool DEMKit [21]. Second, we compare the execution time and scalability of our algorithm and of ALG_inf and ALG_dec on synthetic instances with sizes ranging from 10 to one million variables. We have implemented all three algorithms in Python (version 3.5) to be able to compare them to the implementation in DEMKit, which is also written in Python. All simulations and computations have been executed on a 2.60 GHz Dell Inspiron 15 with an Intel Core i7-6700HQ CPU and 16 GB of RAM.
In Section 5.1, we describe in more detail the problem instances that we use in the evaluation. Subsequently, in Section 5.2, we discuss several implementation choices, and in Section 5.3 we present and discuss the results of our evaluation.

Problem instances
For the comparison of the tailored implementation of our algorithm ALG_seq with the tailored implementation of ALG_inf within DEMKit, we generate realistic instances of the problem BATTERY. For this, we consider the setting where a battery charging schedule for two consecutive days needs to be computed. This scheduling horizon is divided into 15-minute time intervals, resulting in n = 192. To study the influence of the battery size on the solving time, we consider three scenarios that correspond to three different battery sizes and denote them by Small, Medium, and Large. In these scenarios, the battery capacity is 20 kWh, 100 kWh, or 180 kWh and the (dis)charging rate is 4 kW, 20 kW, or 36 kW respectively. This leads to Δt = 1/4 and to the values for X_min, X_max, and D as given in Table 3. Note that this is equivalent to the situation where either 10, 50, or 90 percent of the households have installed a smaller "home" battery with a capacity of 5 kWh and a (dis)charging rate of 1 kW, which corresponds to real-life field tests such as described in [35]. We set both the initial and target SoC to a given fraction of the capacity, i.e., S_start = S_end = sD, where s ∈ {0, 0.1, 0.2, …, 1}. For each scenario, we simulate 50 battery schedules of two days. As input for the base load p, we use measurement data of the actual power consumption of 40 households for 100 consecutive days that were obtained in the field test described in [22].

Table 3: Parameter choices for the battery scheduling problem for each scenario.
For the scalability analysis, we generate synthetic instances in the same way as in [42]. For this, we consider instance sizes n in the set {10, 20, 50, 100, 200, 500, …, 10^6} and for each of these sizes, we generate 10 instances. In each instance, we sample the parameters a, l, and u from the uniform distributions U(0, 1), U(0.1, 0.5), and U(0.5, 0.9) respectively. To generate the nested bounds L and U, we first draw for each i ∈ N two values X_i and Y_i from the uniform distribution U(l_i, u_i). Subsequently, we define for each j ∈ N the values v_j := Σ_{i∈N_j} X_i and w_j := Σ_{i∈N_j} Y_i and we set L_j := min(v_j, w_j) and U_j := max(v_j, w_j) for j < n and
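For reproducibility, this instance generation can be sketched as follows. Since the definition of the final bound is cut off in our source, the sketch closes the recursion by setting L_n = U_n to the midpoint of the two cumulative sums; this choice is our assumption, not necessarily the one used in [42].

```python
import random

def generate_instance(n, seed=0):
    """Generate a synthetic QRAP-NC instance: a ~ U(0,1), l ~ U(0.1,0.5),
    u ~ U(0.5,0.9); nested bounds from cumulative sums of X_i, Y_i ~ U(l_i, u_i)."""
    rng = random.Random(seed)
    a = [rng.uniform(0.0, 1.0) for _ in range(n)]
    l = [rng.uniform(0.1, 0.5) for _ in range(n)]
    u = [rng.uniform(0.5, 0.9) for _ in range(n)]
    X = [rng.uniform(l[i], u[i]) for i in range(n)]
    Y = [rng.uniform(l[i], u[i]) for i in range(n)]
    L, U, v, w = [], [], 0.0, 0.0
    for j in range(n):
        v, w = v + X[j], w + Y[j]
        L.append(min(v, w))
        U.append(max(v, w))
    # Assumption: set L_n = U_n =: R to the midpoint of the final cumulative
    # sums (the source text is truncated at this point).
    L[-1] = U[-1] = (L[-1] + U[-1]) / 2.0
    return a, l, u, L, U
```

By construction, the resulting nested bounds satisfy L_j ≤ U_j for all j and are non-decreasing in j, so every instance is feasible with nontrivial nested constraints.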

Implementation details
In both the divide-and-conquer algorithm ALG_dec and the infeasibility-guided algorithm ALG_inf, we use Algorithm 1 to solve the QRAP subproblems. Note that using this algorithm instead of linear-time algorithms such as the one in [27] increases the worst-case time complexity of ALG_dec and ALG_inf by a factor O(log n). However, for practically relevant problem sizes, this procedure is generally faster in practice and easier to implement than the linear-time algorithms in, e.g., [27].
For the double-ended queues needed in ALG_seq for the optimal Lagrange multipliers κ and λ, we use the Python container datatype deque. Moreover, we initially implemented the double-ended priority queues for the lower and upper initial breakpoint values (α_i^i)_{i∈N} and (β_i^i)_{i∈N} as symmetric min-max heaps [3]. However, initial tests indicated that using instead a coupled min-heap and max-heap implementation with total correspondence leads to similar or even lower execution times of the overall algorithm. Moreover, the latter data structure is much simpler to implement using the standard Python library heapq. Therefore, we use this method instead of min-max heaps. In this alternative method, we insert new breakpoints into both the min-heap and the max-heap and use the min-heap to find and delete a minimum breakpoint (in the lower subproblems) and the max-heap to find and delete a maximum breakpoint (in the upper subproblems). Moreover, we assign to each breakpoint a flag that is 1 if the breakpoint has been removed from either of the heaps and 0 otherwise. This prevents us from finding a minimum (maximum) breakpoint in the min-heap (max-heap) that was already considered in the other heap and thus has been removed from the breakpoint search.
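A minimal version of this coupled-heap construction looks as follows. It is a sketch of the data structure described above, with one deviation: instead of a flag per breakpoint, it keeps a set of removed entry ids, which is equivalent bookkeeping (class and method names are ours).

```python
import heapq

class DoubleEndedPQ:
    """Double-ended priority queue as a coupled min-heap and max-heap
    with lazy deletion."""

    def __init__(self):
        self._min, self._max = [], []
        self._removed = set()  # ids of entries already deleted via either heap
        self._next_id = 0

    def insert(self, value):
        # Insert into both heaps under a shared unique id.
        entry_id = self._next_id
        self._next_id += 1
        heapq.heappush(self._min, (value, entry_id))
        heapq.heappush(self._max, (-value, entry_id))

    def pop_min(self):
        # Skip entries that were already removed via pop_max.
        while True:
            value, entry_id = heapq.heappop(self._min)
            if entry_id not in self._removed:
                self._removed.add(entry_id)
                return value

    def pop_max(self):
        # Skip entries that were already removed via pop_min.
        while True:
            neg_value, entry_id = heapq.heappop(self._max)
            if entry_id not in self._removed:
                self._removed.add(entry_id)
                return -neg_value
```

Each breakpoint is pushed twice and popped at most twice in total, so a sequence of n insertions and deletions still costs O(n log n) overall.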

Results and discussion
In this section, we present and discuss the results of our evaluation. First, we discuss the results of the comparison of the tailored implementation of ALG_seq with the tailored implementation of ALG_inf within DEMKit. Figure 2 shows the ratios between the execution times of the tailored implementation of ALG_inf and that of ALG_seq. Moreover, Tables 4-6 contain for each scenario and each initial and target SoC value the mean, maximum, and coefficient of variation (CoV) of the execution times. The CoV is the sample deviation divided by the sample mean and is a suitable measure of the variation between samples when comparing collections of samples with significantly different sample means.
Tables 4-6 show that the mean execution time of ALG_seq is similar in each scenario, whereas that of ALG_inf appears to decrease as the battery size increases. This implies that also the ratios between the execution times decrease as the battery size increases, which is confirmed by the boxplots in Figure 2.

Table 5: The mean, maximum, and coefficient of variation of the execution times of the tailored implementation of ALG_seq and the tailored implementation of ALG_inf within DEMKit for the scenario Medium.

In particular, a smaller battery size seems to imply that ALG_seq is likely to be faster than ALG_inf, whereas ALG_inf is likely to be faster for larger battery sizes. The reason for this is that the execution time of ALG_inf heavily depends on the number of tight nested constraints in an optimal solution. To support this, we plot in Figure 3 boxplots of these numbers. Note that when the initial SoC is 20% or 30% of the battery capacity in the scenario Large, in only 4 of the 50 instances the number of tight constraints was more than 1, meaning that in the remaining 46 instances the optimal solution to the relaxation of the problem did not violate any of the nested constraints. The relation between the number of tight nested constraints and the ratios is also strongly visible when comparing Figures 2 and 3: the ratios increase as the number of tight constraints increases. From these results, we can derive a "rule of thumb" for the choice of a proper algorithm given the expected number of tight nested constraints. To this end, we compute for each number of tight constraints the percentage of instances in which the tailored implementation of ALG_seq runs faster than the tailored implementation of ALG_inf within DEMKit, given that the optimal solution has this particular number of tight nested constraints (see Table 7). These values suggest that when the number of tight constraints exceeds 4/192 ≈ 2.1 percent of the number of nested constraints, our algorithm is faster in more than 50% of the instances. This rule of thumb is in line with the physical interpretation of tight nested constraints in BATTERY: a battery being completely empty or full is equivalent to a nested constraint of BATTERY being tight. When the charging rates of the battery are large, the battery is better able to flatten, at a given moment, large peaks or drops in power consumption. However, the latter also depends on whether there is enough space
(energy) left in the battery to store (dispatch) this energy, which is more likely when the battery capacity is large.Thus, when adopting a large battery for load profile flattening, it is less likely that it will be completely empty or full.
Although the ratio between the execution times of ALG_seq and ALG_inf appears to depend significantly on the battery size, the maximum and CoV of the execution times of ALG_seq are on average around 1.9 and 3.0 times smaller than those of ALG_inf, respectively. This means that the execution times of ALG_seq are on average more stable than those of ALG_inf, regardless of the battery size. For DEM in general and DEMKit in particular, this is beneficial since the coordination and optimization of schedules for different devices is often done in parallel due to the decentralized nature of the coordination (see, e.g., [20]). As a consequence, the execution time of the entire coordination and optimization framework is constrained by the maximum execution time required for solving one (subset of) device-level optimization problem(s). Thus, using ALG_seq instead of ALG_inf within such a framework may significantly reduce the overall execution time of the framework.
In the following, we present and discuss the results of the scalability evaluation. Figure 4 shows the execution times of the three algorithms ALG_seq, ALG_inf, and ALG_dec, and Table 8 shows for each studied instance size n the mean and CoV of the execution times of the corresponding instances. The added regression lines in Figure 4 are the fitted power laws of the execution times, i.e., for each algorithm we fit the function φ(n) = c1·n^{c2} to the execution times. These lines indicate that the practical execution time of both ALG_seq and ALG_dec is close to O(n) and that of ALG_inf is actually slightly less than O(n). Thus, it can be expected that for very large values of n, more precisely for n > 2.90·10^9, ALG_inf is on average faster than ALG_seq. However, the CoVs for ALG_inf are around one order of magnitude larger than those of both ALG_seq and ALG_dec. This suggests that the execution time of the latter two algorithms is significantly less affected by the choice of problem parameters than that of ALG_inf, which is in line with the results of the comparison of the tailored implementation of ALG_seq for BATTERY with that of ALG_inf within DEMKit. The results in Figure 4 and Table 8 indicate that on average ALG_seq is 27.2 times faster than ALG_dec and 1.95 times faster than ALG_inf. With regard to the performance of ALG_dec, we acknowledge that ALG_dec, and in particular the updating scheme for the single-variable bounds, can probably be implemented more efficiently than in our current implementation. To reduce the influence of the overall implementation on the results of this study, we measured the total time that ALG_dec spends on solving QRAP subproblems and compared this to the execution times of ALG_seq and ALG_inf. This alternative time represents the time that is minimally required to solve all QRAP subproblems regardless of the implementation of the scheme used to update the single-variable bounds. These measurements indicate that on average around 59% of the
total execution time of ALG dec is spent on solving QRAP subproblems.However, this time is still on average 15.9 times more than the execution time of ALG seq and 9.5 times more than that of ALG inf .
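The power-law fits φ(n) = c1·n^{c2} used above reduce to ordinary least squares in log-log space. The sketch below is our own code for this computation (the timing data in the test is made up for illustration, not taken from the experiments).

```python
import math

def fit_power_law(ns, times):
    """Fit phi(n) = c1 * n**c2 by least squares on the log-log data,
    i.e., regress log t on log n."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    c2 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    c1 = math.exp(mean_y - c2 * mean_x)
    return c1, c2
```

An exponent c2 close to 1 then corresponds to execution times growing roughly linearly in n.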

Conclusions
We proposed an O(n log n) time algorithm for quadratic resource allocation problems with lower and upper bound constraints on nested sums of variables.As opposed to existing algorithms with the same time complexity, our algorithm can achieve the O(n log n) time complexity using only basic data structures and is therefore easier to implement.In computational experiments, we demonstrate the good practical performance of our approach, both on synthetic data and on realistic instances from the application area of decentralized energy management (DEM) for smart grids.
Our approach builds upon monotonicity arguments that find their origin in the validity of greedy algorithms for convex optimization problems over polymatroids [18,19]. Such monotonicity arguments have primarily been studied for resource allocation problems where the objective function is separable, i.e., can be written as the sum of single-variable functions. However, in previous work [38], we proved the validity of similar monotonicity arguments for solving a nonseparable resource allocation problem with so-called generalized bound constraints. Moreover, recent results on the use of interior-point methods for nested resource allocation problems [40,45] suggest that incorporating specific nonseparable terms in the objective function does not increase the complexity of the used solution method. Thus, one interesting direction for future research is to investigate whether one can use monotonicity arguments to derive efficient algorithms for resource allocation problems over nested constraints with nonseparable objective functions.
With regard to the application within DEM systems, we compared our algorithm with an existing implementation of the state-of-the-art algorithm of [41] within a simulation tool for DEM research. One of our objectives was to decide which of these two algorithms is more suitable for a given (type of) problem instance. It would be worthwhile to conduct a more thorough comparison and to develop an automated procedure to decide which algorithm is most likely to be faster. Moreover, the nonseparable version of the studied problem mentioned in the previous paragraph is related to energy management of batteries in three-phase distribution networks, where load profile flattening on all three phases together is required to avoid blackouts in these networks [44,22,38]. Thus, research in this direction is also relevant in the context of DEM.

A Proofs of Lemmas 1, 4, and 5
A.1 Proof of Lemma 1

Lemma 1. If L_j ≤ A ≤ B ≤ U_j, we have x^j(A) ≤ x^j(B) for a given j ∈ N.
Proof. Suppose that there exists an index s ∈ N such that x_s^j(A) > x_s^j(B). Let r be the largest index with r ≤ s such that Σ_{k∈N_{r−1}} x_k^j(A) ≥ Σ_{k∈N_{r−1}} x_k^j(B), and let t be the smallest index with t ≥ s such that the analogous inequality holds for N_t. Observe that we cannot have r = s = t simultaneously. Indeed, if r = s = t, then the definitions of r, s, and t imply x_s^j(A) ≤ x_s^j(B), which is a contradiction. Thus, either r < s or s < t or both. We show that we obtain a contradiction if r < s; the proof for the case where s < t is symmetrical. If r < s, it follows from the definitions of r and s that x_r^j(A) < x_r^j(B).
It follows that x_s^j(A) ≤ x_s^j(B).
A.2 Proof of Lemma 4

Proof. We have χ_n = κ^n = λ^n since we defined L_n = U_n = R and, by definition of the solution x^n(R), the nested constraints L_n ≤ Σ_{i∈N} x_i^n(L_n) and Σ_{i∈N} x_i^n(U_n) ≤ U_n are tight. We prove the lemma by considering each of its three cases separately for each j < n:

1. We prove this part of the lemma for the case that j is the largest index smaller than ĵ_{j+1} such that χ_{j+1} ≤ κ^j, i.e., χ_{k+1} > κ^k for all k ∈ {j + 1, …, ĵ_{j+1} − 1}. Using this result, we show that the other case, i.e., both the situation where ĵ_j = ĵ_{j+1} and the situation where there exists an index k > j that is the largest index in the set {j + 1, …, ĵ_{j+1} − 1} such that χ_{k+1} ≤ κ^k, leads to a contradiction. In the former situation, it follows that j + 1 > ĵ_j = ĵ_{j+1} ≥ j + 1, which is a contradiction. In the latter situation, the lemma applies for k, meaning that Σ_{i∈N_k} x_i^n(R) = L_k and thus ĵ_k = k. However, we also have by definition of ĵ_{j+1} that ĵ_k = ĵ_{j+1} since j + 1 ≤ k < ĵ_{j+1}. This implies k = ĵ_{j+1}, which is a contradiction.
On the one hand, if V_{j+1} = L_{ĵ_{j+1}}, the claim follows since Σ_{i∈N_{ĵ_{j+1}}} x_i^n(R) = L_{ĵ_{j+1}} and by Lemma 2. On the other hand, if V_{j+1} = U_{ĵ_{j+1}}, the claim follows directly from Lemma 2. In both cases, it follows that Σ_{i∈N_j} x_i^n(R) = L_j, from which it follows directly that χ_j = κ^j.
2. The proof for the case λ^j ≥ χ_{j+1} is analogous to the proof for the case χ_{j+1} ≤ κ^j.
3. Suppose that x_i^j(L_j) = x_i^n(L_n) holds for all i < j + 1. By Lemma 2, this implies that x_i^k(L_k) = x_i^j(L_j) = x_i^n(L_n) for all k ∈ {j, …, n} and i < j + 1. In particular, we have that x_i^k(L_k) = l̄_i^k for all k ∈ {j + 1, …, n}, which implies that κ^k ≤ α_i^k. Furthermore, note that for any k ∈ N there is at least one index i_k ≤ k such that α_{i_k}^k ≤ κ^k < β_{i_k}^k; otherwise, there exists ε > 0 such that κ^k + ε is also an optimal Lagrange multiplier. It follows from the relation between α_{i_k}^k and α_{i_k}^{k+1} in Equation (9) that α_{i_k}^{k+1} = κ^k for any k < n. This implies in particular that α_{i_k}^{k+1} = κ^k ≤ α_{i_{k−1}}^k for all k ∈ {j + 1, …, n}. It follows that κ^{j+1} ≤ α_{i_j}^{j+1} = κ^j and thus that κ^{j+1} < χ_{j+1}. Since χ_{j+1} ∈ {κ^{ĵ_{j+1}}, λ^{ĵ_{j+1}}}, we have χ_{j+1} = λ^{ĵ_{j+1}}, from which it follows that Σ_{i∈N_{ĵ_{j+1}}} x_i^n(R) = U_{ĵ_{j+1}}. However, this implies that Σ_{i∈N_{ĵ_{j+1}}} x_i^{ĵ_{j+1}}(L_{ĵ_{j+1}}) = Σ_{i∈N_{ĵ_{j+1}}} x_i^{ĵ_{j+1}}(U_{ĵ_{j+1}}), from which it follows that L_{ĵ_{j+1}} = U_{ĵ_{j+1}} by the monotonicity of x^{ĵ_{j+1}}(·) as proven in Lemma 1. However, this contradicts the assumption that L_k < U_k for all k < n. Hence, there must be at least one index i such that x_i^j(L_j) < x_i^n(R). It follows that L_j = Σ_{i∈N_j} x_i^j(L_j) < Σ_{i∈N_j} x_i^n(R). To prove that Σ_{i∈N_j} x_i^n(R) < U_j, we can use a similar argument in which we show that the proposition x_i^j(U_j) = x_i^n(R) cannot hold for all i < n. Together, this implies that L_j < Σ_{i∈N_j} x_i^n(R) < U_j, from which it follows directly that χ_j = χ_{j+1}.
A.3 Proof of Lemma 5

Lemma 5. For each $i \in N$, we have

Proof. Let $J$ denote the set of indices whose corresponding nested lower or upper constraint is tight in $x^n(R)$. More precisely, $J := \{k_j \mid j \in N\} \equiv \{j_1, \ldots, j_q\}$, where $q := |J|$ and $j_1 < \cdots < j_q$. For a given $p \in \{1, \ldots, q\}$, note that since either the lower or the upper nested constraint corresponding to $j_p$ is tight in the solution $x^n(R)$, we have that $\sum_{i \in N_{j_p}} x_i^n(R) = V_{j_p}$. This implies that the vector $(x_i^n(R))_{1 \le i \le j_p}$ is the optimal solution to the subproblem QRAP-NC$_{j_p}(V_{j_p})$, i.e., to the problem

QRAP-NC$_{j_p}(V_{j_p})$: $\min \sum_{i \in N_{j_p}} \frac{x_i^2}{2a_i}$ s.t. $\sum_{i \in N_{j_p}} x_i = V_{j_p}$, $L_k \le \sum_{i \in N_k} x_i \le U_k$ for $k < j_p$, and $l_i \le x_i \le u_i$ for $i \in N_{j_p}$.
Note that in the optimal solution $(x_i^n(R))_{i \in N_{j_p}}$ to this problem, none of the nested constraints (19) for $k$ with $j_{p-1} < k < j_p$ is tight. As a consequence, when deriving the reformulated equivalent problem QRAP$_{j_p}(V_{j_p})$, it follows from Lemmas 2 and 3 that we may replace the single-variable bounds (7) for $i$ with $j_{p-1} < i < j_p$ by the original variable bounds $l_i \le x_i \le u_i$. Thus, we can reformulate QRAP-NC$_{j_p}(V_{j_p})$ to

QRAP$_{j_p}(V_{j_p})$: $\min \sum_{i \in N_{j_p}} \frac{x_i^2}{2a_i}$ s.t. $\sum_{i \in N_{j_p}} x_i = V_{j_p}$, $x_i^{j_{p-1}}(L_{j_{p-1}}) \le x_i \le x_i^{j_{p-1}}(U_{j_{p-1}})$ for $i \in \{1, \ldots, j_{p-1}\}$, and $l_i \le x_i \le u_i$ for $i \in \{j_{p-1}+1, \ldots, j_p\}$.
Recall that $\chi_{j_p}$ is the optimal Lagrange multiplier for this problem. As a consequence, we can directly compute $x_i^n(R)$ for $i \in \{j_{p-1}+1, \ldots, j_p\}$ using Equation (3). The result of the lemma follows since we have $\chi_{j_p} = \chi_i$ for each $i \in \{j_{p-1}, \ldots, j_p\}$ by definition of $j_p$ and $\chi_i$.
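To illustrate this last step: assuming the objective $\sum_i x_i^2/(2a_i)$, once the multiplier $\chi_{j_p}$ of the reduced problem is known, every variable follows by clipping $a_i \chi_{j_p}$ to its bounds, in the spirit of Equation (3). The routine below is our own sketch, not the paper's $O(n \log n)$ method; it locates the multiplier by plain bisection on the monotone map $\chi \mapsto \sum_i \max(l_i, \min(u_i, a_i\chi))$.

```python
from typing import List, Tuple

def solve_qrap(a: List[float], l: List[float], u: List[float], V: float,
               tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Solve min sum x_i^2/(2 a_i) s.t. sum x_i = V, l <= x <= u (a_i > 0)."""
    assert sum(l) - tol <= V <= sum(u) + tol, "infeasible resource value"
    lo = min(li / ai for li, ai in zip(l, a))  # below all breakpoints l_i/a_i
    hi = max(ui / ai for ui, ai in zip(u, a))  # above all breakpoints u_i/a_i
    for _ in range(200):  # bisection on the multiplier
        mid = (lo + hi) / 2
        total = sum(max(li, min(ui, ai * mid)) for ai, li, ui in zip(a, l, u))
        if total < V:
            lo = mid
        else:
            hi = mid
    chi = (lo + hi) / 2
    # Recover each variable from the multiplier, as in Equation (3)-style formulas.
    x = [max(li, min(ui, ai * chi)) for ai, li, ui in zip(a, l, u)]
    return chi, x
```

For example, on $a = (1, 2, 1)$, $l = (0,0,0)$, $u = (2,2,2)$, and $V = 4$, the multiplier converges to $\chi = 1$ and the solution to $x = (1, 2, 1)$.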

Figure 1: The function $x_i[\delta]$ for a given $i \in N$. The slope of the line segment for $\delta \in [l_i/a_i, u_i/a_i]$ is $a_i$.
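A minimal sketch of this function, assuming the quadratic objective $x_i^2/(2a_i)$ so that the interior solution for multiplier $\delta$ is $a_i\delta$ (the function and parameter names are ours, not the paper's):

```python
def x_of_delta(delta: float, a: float, l: float, u: float) -> float:
    """Single-variable solution x_i[delta]: the interior value a * delta
    clipped to [l, u]. Piecewise linear and non-decreasing in delta, equal
    to l below l/a, equal to u above u/a, and with slope a in between
    (cf. Figure 1)."""
    return max(l, min(u, a * delta))
```

The clipping points $l_i/a_i$ and $u_i/a_i$ are exactly the breakpoints at which the variable hits its lower and upper bound, respectively.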

Figure 2: Boxplots of the execution time of the tailored implementation of ALG$_{\text{inf}}$ within DEMKit divided by that of the tailored implementation of ALG$_{\text{seq}}$ for the three scenarios. Ratios larger than 1 imply that ALG$_{\text{seq}}$ was faster than ALG$_{\text{inf}}$.

Figure 3: Boxplots of the number of tight constraints in the optimal solutions for the three scenarios.

Table 6: The mean, maximum, and coefficient of variation of the execution times of the tailored implementations of ALG$_{\text{seq}}$ and ALG$_{\text{inf}}$ within DEMKit for the scenario Large.

Table 7: Percentage of instances where the tailored implementation of ALG$_{\text{seq}}$ is faster than the tailored implementation of ALG$_{\text{inf}}$ within DEMKit, given the number of tight nested constraints in their optimal solutions. When this number is at least $\approx 3.6$ percent of the 7192 instances' constraints, the tailored implementation of our algorithm ALG$_{\text{seq}}$ is always faster.

Table 8: Mean and coefficient of variation of the execution times.