Incentivising Demand Side Response through Discount Scheduling using Hybrid Quantum Optimization

Demand Side Response (DSR) is a strategy that enables consumers to actively participate in managing electricity demand. It aims to alleviate strain on the grid during high demand and promote a more balanced and efficient use of (renewable) electricity resources. We implement DSR through discount scheduling, which involves offering discrete price incentives to consumers to adjust their electricity consumption patterns to times when their local energy mix consists of more renewable energy. Since we tailor the discounts to individual customers' consumption, the Discount Scheduling Problem (DSP) becomes a large combinatorial optimization task. Consequently, we adopt a hybrid quantum computing approach, using D-Wave's Leap Hybrid Cloud. We benchmark Leap against Gurobi, a classical Mixed Integer optimizer in terms of solution quality at fixed runtime and fairness in terms of discount allocation. Furthermore, we propose a large-scale decomposition algorithm/heuristic for the DSP, applied with either quantum or classical computers running the subroutines, which significantly reduces the problem size while maintaining solution quality. Using synthetic data generated from real-world data, we observe that the classical decomposition method obtains the best overall \newp{solution quality for problem sizes up to 3200 consumers, however, the hybrid quantum approach provides more evenly distributed discounts across consumers.


I. INTRODUCTION
The rising demand for energy resources and the growing adoption of renewable electricity sources have prompted a search for innovative solutions to optimize energy consumption in order to reduce grid congestion and carbon emissions.Demand Side Response (DSR) [1] has emerged as a promising strategy that focuses on actively managing and adjusting energy consumption patterns in response to grid conditions.Various studies in literature explore DSR, detailing its impact on smart grid technology [2], load scheduling [3], energy economics [4], as well as optimal control and pricing schemes [5].
Price adjustment serves as a straightforward method to influence consumer behavior.With the emergence of smart devices and the electrification of heating and transportation, the response to price incentives can be progressively automated.Typically, DSR is achieved by handing out a dynamic price to all customers simultaneously.However, the diverse usage patterns among consumers may favor alternative dynamic pricing policies.Therefore, we aim to find individual price patterns on a per-customer basis to achieve an optimal load shift.We call the distribution of discounts or penalties to specific customers the Discount Scheduling Problem (DSP).The number of customers to be considered in such a problem, i.e., an urban power grid, can become prohibitively large to be solved by classical resources.
In recent years, Quantum Computing (QC) has garnered significant attention as a potential game-changer in various domains, including optimization.Leveraging the principles of quantum mechanics, quantum optimization algorithms are hypothesized to solve complex optimization problems more efficiently than their classical counterparts.Besides gate-based universal quantum computing, Adiabatic Quantum Computing (AQC) has emerged, which can be shown in general to be equivalent to gate-based approaches [6].Quantum Annealing (QA) [7], [8], a subset of AQC, has been widely adopted for solving optimization problems [9], [10].As the industry leader in quantum annealing hardware, D-Wave's quantum annealer is employed in this work to optimize the DSP.The limited size of current quantum computing hardware forces us to utilize hybrid quantum computing approaches, like Leap, which is a Cloud service offered by D-Wave and is based on internal problem size reduction [11].In this work, we additionally develop a customized hybrid approach that performs a problem-specific decomposition.
This paper aims to investigate the applicability of QA to DSP optimization and benchmarks the performance of hybrid quantum-classical routines against purely classical counterparts.The overall structure is as follows: After giving a concise literature review in Sec.II, we describe the problem formulation and mathematical modeling of the DSP as a Quadratic Integer Programming (QIP) problem in Sec.III.Since the problem should be solvable for a customer count in realistic scenarios, Sec.IV motivates and develops a problem-specific decomposition algorithm for problem size reduction.This decomposition routine proves to be very effective, as the benchmarking of classical and quantumenhanced solvers, in Sec.V shows.Finally, in Sec.VI we discuss the overall summary of the work and the implications of applied quantum computing to large scale optimization problems in industry targeted to increasing renewable energy usage.

II. LITERATURE REVIEW A. RELATED WORK
Recently, quantum computing applications in the power and energy sector [12]- [15] are gaining attention for the development of smart grid technology.Several important problems are addressed using quantum computing, for example power flow calculations [16], [17] or energy grid classification [18].The traditional planning and scheduling tasks in power systems, such as the minimization of generation cost or the maximization of revenue from electricity generation, are generally formulated as combinatorial optimization problems, which are often NP-hard.Using quantum-inspired optimization algorithms is expected to outperform their classical counterparts [13], [19].A wide range of optimization problems can be converted into quadratic unconstrained binary optimization (QUBO) problems [20], which can be efficiently solved with the Quantum Approximate Optimization Algorithm (QAOA) [21] using gate-based universal quantum computers or using D-Wave quantum annealers.In the literature, there exist multiple quantum computing approaches towards unit commitment [22]- [25] and other mixed integer problems [26], using quantum-inspired ADMM [27] or Benders' decomposition methods [28].Quantum annealing approaches are also used for community detection in electrical grids [29], peer-to-peer energy trading [30] or coalition structure optimization [31], [32].Several research studies benchmark the performance of classical algorithms vs. hybrid quantum-classical algorithms such as Leap on large-scale instances.These include transport robot scheduling [33], job shop scheduling [34], power network partition [35] and SAT problems [36].
As one of this work's main contributions is developing a problem-specific decomposition method to solve large instances of the DSP on currently available hardware, we give a brief overview of combinatorial problem decomposition algorithms in the context of quantum optimization here.Divide-and-conquer approaches have been used for various problem instances, such as the MaxClique problem [37]- [40], Minimum Vertex Cover [40], [41], Community Detection [42] and MaxCut [42], [43].They all combine the splitting of the problem into sub-problems using problemrelated methods.In special cases, such as Ref. [43], quantum optimization is further utilized in recombining the solution because of the special Z 2 symmetry of MaxCut solutions.Quantum Local Search (QLS) [44] takes local sub-problems of a graph-based problem and iteratively improves a global solution vector.Although applicable to any graph-based problem, QLS has been specifically tested for the Maximum Independent Set problem.The recent emergence of distributed quantum computing has led to the development of decomposition algorithms that still allow for a limited amount of quantum information exchange between the optimization of the sub-problems [45], [46], which was successfully demonstrated for the Maximum Independent Set problem.Apart from problem-specific methods, general QUBO decomposition methods have been devised, like QBSolv [47].Here, subsets of binary variables of the full QUBO are selected as sub-problems, which are solved on a quantum annealer, while in parallel, a classical heuristic optimizes the original problem.During the process, solutions to the sub-problems will incrementally improve the current solution state of the heuristic.

B. INTRODUCTION TO QUANTUM ANNEALING
Quantum annealing (QA) is a heuristic for solving combinatorial optimization problems, first proposed in 1998 by Kadowaki and Nishimori [9].QA utilizes the adiabatic theorem to find the unknown ground state of an Ising Hamiltonian H Ising , whose minimal energy state corresponds to the solution of a target problem.
With H Init being the initial Hamiltonian, the annealing pro- Normalization constants for the penalty terms.
cess can be described by the following dynamic Hamiltonian: where σ x,z are Pauli matrices operating on qubit i, and h i and J i,j are the qubit biases and coupling strengths, which encode the specific problem.A(s) and B(s) are known as the annealing schedule, with s ∈ [0, 1].At s = 0, A(s) ≫ B(s), while A(s) ≪ B(s) for s = 1.As we increase s from 0 to 1, the system undergoes a gradual change from H Init to H Ising .The adiabatic theorem of quantum mechanics states that if that evolution happens slowly enough and the system is initialized in the trivial ground state of H init , then the state will remain in the ground state of the momentary Hamiltonian [48].Eventually, at s = 1, the state will be in the ground state of the H Ising .Finding the ground state of the Ising model is isomorphic to QUBO [20], therefore, measuring the final state will reveal the solution to an NP-hard optimization task.
In quantum annealing, this transition speed will typically be faster than required for the adiabatic theorem, due to practical considerations.Nevertheless, experimental evidence suggests that, depending on the spin glass model, faster evolution times still output the optimal solution with high probability [49].Thus, measuring the output repeatedly will eventually find the correct solution.

III. DISCOUNT SCHEDULING PROBLEM FORMULATION
Given a discrete time horizon of N T steps t, a set of N C customers c with projected consumption data d c,t , and the local CO 2 grid intensity I t [g/kWh], the goal of the DSP is to assign each customer individual discrete discounts z c,t ∈ Z, such that the overall CO 2 emissions are minimized, but the overall consumption remains equal.Furthermore, grid constraints must be satisfied at any timestep, and the overall consumption deviations of a single customer should be kept to a minimum.The discount categories Z are defined as a symmetric set max Z = − min Z = z max , where positive discounts are referred to as penalties, e.g.Z = {−0.5,−0.25, 0, 0.25, 0.5}.
Reducing CO 2 emissions has the advantage of increasing consumption during periods of abundant local renewable energy production.However, any other linear or quadratic function constructed from the discounts can be used as an objective for the DSP (e.g.minimizing the operational costs based on spot market prices).

A. PRELIMINARY CONSIDERATIONS
Since the distribution system operator (DSO) cannot yet automatically influence the consumption of the customer at a certain time, we have to go the detour over price incentives.We assume customers are strictly economically motivated, i.e., they alter their consumption based on price.Of course, the convenience of having access to electricity at all times is more important than saving on the cost, such that, in reality, customers cannot vary their consumption arbitrarily at any given time.However, with the emergence of electric vehicles (EVs) with home charging and heat pumps, automatically varying the load becomes possible.The given discounts then act as a protocol that communicates to a smart home appliance on the customer side when to use electricity and when not, e.g., start or stop charging the EV.
The central assumption of the DSP is that a given discount (or penalty) influences the customers to increase (or decrease) their consumption proportionally.The consumption changes as follows when given a discount z c,t .The constant χ c is the (negative) price elasticity of demand of customer c.I.e., the higher χ c is, the more customer c responds to price incentives (lower its demand if price increases and vice versa).In principle, the price elasticity takes positive values below one, where χ c = 1 means a full reflection of the price change on the load change.
In literature, different estimations of the electricity demand price elasticity have been made, reaching values between 0.65 and 0.85 in residential U.S. customers [50] and 0.8 to 0.9 in Swiss households [51].A metastudy [52] on dynamic pricing reveals that the short-term price elasticity has to be estimated lower than the long-term elasticity.Nevertheless, response to dynamic pricing may be increased by automation of the load of smart devices and other enabling technologies [52].In reality, price elasticity will vary between customers, so we formulate it as a customer-specific value.
The DSO can measure the response of individual customers and adjust the elasticities for a more accurate model.Discrete discounts allow users to change their behavior more distinctly.For instance, providing a small discount to a thousand customers might not necessarily have the intended effect, then supplying only a few customers with moderate discounts can have.Therefore, restricting to a discrete set of categories Z is sound.

B. MATHEMATICAL FORMULATION
Collecting the considerations, we can finally formulate the DSP optimization problem as QIP for minimizing CO 2 production through load shifting.All terms and constraints will VOLUME 4, 2016 be explained separately in the following sub-sections.minimize: such that: Here, C(z) is the cost function to be minimized, and D c is the total power draw of a customer D c = t d c,t .Furthermore, λ i refers to penalty factors and N i to normalization constants employed to keep the impact of the penalty factors independent of problem size and data.The normalization constants are chosen such that the effect of each penalty term is 1 if all discounts are assigned in the worst possible way; see Table 1.
The formulation of the objective and the purpose of all penalty terms and constraints present in the problem statement (5)- (10) will be explained in the following subsections.

1) CO 2 Emission Minimization
The combined CO 2 emission is proportional to the changed consumption ( 4) and serves as the main objective of the minimization formulation.The normalization constant N 0 is chosen to map the range of CO 2 emissions between 0 and 1.Therefore, we utilize the trivial origin configuration P (z = 0) as the maximal value and compute a naive lower bound for the CO 2 emissions to set N 0 = E(0) − E min : which gives all customers the full discount if I t is smaller than the average and the full penalty if I t is larger, respectively.
Note that this lower-bound solution does not satisfy the constraints of the formulation (9), (10).Therefore, it is substantially smaller than the actual best solution.
A perfect equality can generally not be achieved because of the discrete discounts, except for the trivial case z c,t = 0. Therefore, it is represented by the penalty term Eq. ( 6) as a quadratic soft-constraint with penalty factor λ 1 .

3) Discount change penalty
As discussed in Sec.III-A, longer periods with similar discounts exhibit better customer response.We, therefore, employ a penalty function that tries to minimize consecutive discount changes in Eq. (7).The corresponding penalty factor λ 2 will be chosen small (λ 2 < λ 1 ).

4) Discount regularization
We attempt to assign tarif discounts that affect the objective function C by a large enough amount.Suppose a customer consumes an equal amount at two timesteps with I t1 = I t2 .Assigning z c,t1 = −z c,t2 = z max would not change the cost compared to z c,t1 = z c,t2 = 0, but can be given anyways.A small L2-regularization, see Eq. ( 8), ensures that discounts are only given if they benefit the overall goal, with λ 3 ≤ λ 2 .L2-regularization is chosen over L1-regularization since it naturally maps into QUBO.

5) Global consumption deviation constraint
Even though the per-customer consumption deviation is softconstrained (6), the consumption deviation of all customers together can be zero up to numeric precision.Globally, i.e., the combined view of all customers, we do not want any change in overall consumption.Hence, it is a hard constraint, see Eq. ( 9).

6) Power restriction constraint
The momentary change in consumption (power restriction constraint) of all customers combined must be bounded due to grid voltage peaks and therefore the hard-constrained Eq. ( 10) has been introduced.Additionally, for load shifting, we require a time-window where consumption can be increased and decreased.The values for ∆p t can be determined using power flow computations and can, in principle, also be asymmetric.Of course, the presented power restriction is a simplification, but it suffices for an initial investigation of the problem.

C. DISCOUNT ENCODING
Discrete discounts z c,t ∈ Z offer another benefit, which is that they can relatively easily be encoded through binary variables [20].This makes translating the problem formulation into QUBO easier, which is required for employing quantum optimization techniques.
We will focus on integer encoding of the discount set Z: Here, we discretize the range [−z max , z max ] into N K linearly spaced categories.Generally, the range can also be asymmetric but is not considered in this work.Therefore, This range can subsequently be expressed using with Every bit combination x c,t,k results in a valid encoding, making an additional penalty term for encoding obsolete [20].This encoding is very space efficient, allowing an exponential number of categories to be represented with a linearly growing number of qubits.An alternative method for encoding discounts would be one-hot encoding, where N K bits encode every item of Z by only setting one bit to 1 and the other ones to zero.This is a more general framework that allows any Z (not just linearly spaced) to be encoded through binary variables.However, it requires N k binary variables per customer and an additional constraint.In the context of QUBO, that constraint has to be enforced as an additional penalty term.Preliminary experiments have shown advantageous results for integer encoding compared to one-hot encoding.

D. ON CUSTOMER SAVINGS
Given the customers initially receive a flat tariff v 0 [C/kWh], the discount or penalty (v 0 → (1 + z c,t )v 0 ) only affects the consumption that deviates from the projected consumption dc,t − d c,t .Consequently, the customer pays for a specific timestep.The customer's cost change over the optimization horizon can be computed via the sum of momentary price differences through Eq. ( 16): Note that we have used the sum over the changed consumption as the baseline for our comparison, since in any case t d c,t ≈ t dc,t and we only want to compare the cost for the same amount of purchased electricity.
Substituting in the altered consumption from Eq. ( 4), we obtain a change in cost given by The absolute price change is dependent on the flat tariff and the total consumption of the customer.We will therefore look at the relative savings s c = −∆v c / t v 0 dc,t ≥ 0 in the experiments section.As z2 c,t ≥ 0 and χ c ≥ 0, the customer's price change is guaranteed to be ∆v c ≤ 0 negative, so a customer will always save money by complying with the incentives.The savings are exactly zero if the customer does not respond to incentives at all, i.e., χ c = 0.

E. GRID DATA
For the DSP, we require forecasted consumption data d c,t ≥ 0 [kWh] for each customer and predicted grid CO 2 intensity I t of the power generation in the considered region.We use standard load profiles of residential and industrial customers, which we modify by adding noise and shifting in time.Additionally, the load profiles get scaled to resemble various numbers of residents.Moreover, we include photovoltaic (PV) electricity generation by estimating the potential based on roof data of Munich residential areas and simulating the production from historical solar irradiance data.PV production reduces the customers' consumption.Grid infeed, i.e., if more PV is generated than consumed, is not specially considered.The grid CO 2 intensity is taken from the real-world data in Munich. 1 .The data used throughout this text consists of roughly 16000 customers and 76 timesteps, spanning a 19hour period with 15-minute intervals.The CO 2 and solar data are from January 13, 2023.

IV. PROBLEM DECOMPOSITION
The number of integer variables needed to construct the discount matrix is N C × N T .Given a one-day optimization horizon with 15-minute timesteps, each customer requires 96 integer decision variables in the problem.However, as the number of customers will grow quite large 2 , the number of integers grows akin.Even worse, the number of qubits in the quantum formulation is scarce, and every integer must be encoded with Q qubits.Thus, the move to a hybrid quantumclassical optimization scheme seems inevitable.
In this section, we propose a hybrid approach that is based on problem decomposition.Despite the drawback that decomposition increases solution bias, we find that we can manage the hard constraints of the DSP classically in a preprocessing step.This eliminates the need for a costly reformulation of inequality constraints with slack variables.Fig. 1 shows an overview of the steps taken for the decomposition.

A. MOTIVATION 1) Global Solution
Shifting the perspective from the individual customer level to a global scope, where all customers are regarded as a unified entity, we consider the overall consumption D t = c d c,t and the mutable consumption, i.e., the consumption weighted by the individual customer susceptibilities D t = c χ c d c,t .Furthermore, we can express the weighted average of all discounts given per customer-from now on called effective discount-as follows Utilizing the formulation of the effective discount, we can transform the CO 2 production from Eq. ( 11) into The global consumption deviation constraint, Eq. ( 9), and the power restriction constraint, Eq. ( 10) can be expressed solely in terms of the effective discount.Therefore, we represent the global version of the DSP as a linear program minimize: This formulation disregards any per-customer constraints that are still part of the DSP.Nevertheless, it is a useful tool to estimate how much CO 2 reduction is maximally possible with all the hard constraints ( 9), (10) in place.In fact, the solution ζ * t is guaranteed to give an optimal lower bound where Z = {z ∈ Z N C ×N K s.t. ( 9), (10) hold} is the set of feasible discount matrix configurations.The global DSP consists of only N T continuous variables.Thus, it can be quickly and efficiently solved using standard procedures like the Simplex method [53].
Given an optimal effective discount, ζ * t , we can utilize Eq. ( 19) to optimize the integers z c,t for the individual customers per timestep, i.e.
Additionally, we can include the penalty terms from the DSP ( 6)- (8) in the subsequent optimization.However, doing so would yield an optimization problem the same size as the original problem.
Nonetheless, the following section reveals that we can achieve a satisfactory approximation of a continuous effective discount by considering only a limited number of customers.As a result, we can divide the customers into smaller groups or chunks and optimize each chunk separately.

2) Representational Power
In this section, we motivate that Eq. ( 19) can be fulfilled for any arbitrary ζ t with sufficient accuracy given a small constant number of customers.We will focus on a discount range ζ t ∈ [−1/2, 1/2] and five discrete discounts z c ∈ {−1/2, −1/4, 0, 1/4, 1/2}.From the generated consumption data, see Sec.V, we take a random set of customers and compute for all available timesteps.Fig. 2 shows the result with different numbers of customers.The average over all timesteps is plotted, and the error bands indicate a 95% confidence interval.It is evident that even with only ten customers, the relative error remains consistently below 1%.As more customers are added, the error decreases significantly, reaching a negligible level.Therefore, we contend that by maintaining a small, constant number of customers within a chunk (e.g., 20-50 customers), it is possible to obtain a reliable approximation of an effective discount while still considering the per-customer soft-constraints of the DSP.

B. THE FULL DECOMPOSITION ROUTINE
We now assemble the pieces into a full hybrid routine for decomposition, as seen in Fig. 1.The process begins with solving the global DSP (21), followed by dividing customers into chunks.We sort the customers by total consumption and split them into M -sized groups, s.t. the largest customers are arranged in the first chunk, etc.We argue that it is better to have customers with comparable consumption in one chunk because they can counteract each other better than e.g. one industrial customer and 20 single households.For each chunk, we can define sub-problems in which special effective discounts per chunk are introduced in Sec.IV-B1.These sub-problems are of QUBO form and aim to assign discounts to customers such that the overall effect matches an effective discount while making sure, each customer does not deviate from the its total consumption by much.
Since we can solve the sub-problems sequentially, we can enhance the results by incorporating the errors from prior optimizations into the subsequent sub-problems in Sec.IV-B2.Eventually, all the chunks are collected, and a final postprocessing step shown in Sec.IV-B3 is applied to ensure that no constraints are violated.

1) Chunk Problems
The customers are partitioned into M = N c /m mutually exclusive chunks C j , s.t.j C j = {1, . . ., N C } and C i ∩ C j = ∅ ∀i ̸ = j.Note, that we require and expect the chunk size to be chosen, s.t.N C mod m = 0.
Most likely the consumption deviation per chunk is not zero, which, by default, introduces a bias in the consumption deviation soft-constraint (6).Thus, the first goal is to define chunk effective discounts ξ j t with the following properties: 1 where we define an alterable consumption for one chunk D j t = c∈Cj χ c d c,t , similar to the definition of the total mutable consumption.
We define the chunk-effective discount as follows where α t are arbitrarily chosen constants, s.t.
The conditions ( 25) and ( 26) are satisfied with this definition.
The values α t are chosen constant α t = 1/N T , but we have to make sure that ξ j t ∈ [−z max , z max ] ∀t, j.If this is not possible for one timestep t, α t has to be reduced while the remaining αs have to be increased.
The optimization objective is to approximate the following equality with the chunk effective discount as The objective can be reformulated as a least squares error problem to find an optimal z * c,t arg min and is directly in QUBO form after the binary representation of the discounts has been substituted into the formulation.
The previously discussed penalty terms and regularizationsconsumption deviation (6), discount change penalty (7) and discount regularization (8)-can be carried over to this optimization problem.

2) Sequential updating
When the sub-problems are solved in sequence, the error between the true achieved effective discount and the demanded one can be carried over into the next optimization to be corrected.For optimizing ξ j t , the procedure can be adapted as Doing so will significantly improve the overall accuracy of the method.Of course, one has to ensure that the altered ξs do not exceed the bounds [−z max , z max ].
3) Post-processing Finally, we describe a post-processing scheme that refines the result and ensures that the power restriction constraint (10) is held.Algorithm 1 describes the greedy improvement of the solution.
Conceptually, it is quite simple: For each timestep, we extract those customers whose discounts can be increased or decreased while also improving the consumption deviation penalty (6).Then we try all combinations between one increase and one decrease and investigate how the effective discount behaves.If ζ * t is negative, we want the real effective discount to be as close as possible but at least larger than ζ * t .If it is positive, the other way around.Doing so always satisfies constraint (10).We find the combination that matches the requirements the best and update the respective discounts if it achieves an improvement.Otherwise, the timestep is skipped.
Since all possible combinations of up and down moves have to be considered, the complexity of the Algorithm scales at worst with O(N T N 2 C /4).Nevertheless, limiting the possible moves to at most r provides sufficient accuracy, empirically.This then reduces the complexity to O(N T N C + N T r 2 ).

A. EXPERIMENTAL SETUP
To benchmark the performance of solving the DSP, we consider out-of-the-box solvers and our developed decomposition method and evaluate the results using a set of metrics that best represent the different goals described in the DSP formulation.

1) Investigated Solvers
An overview of the considered solvers and settings can be found in Table 2.As a state-of-the-art purely classical base- ; end line, we use Gurobi3 [54].This is compared to D-Wave's LeapHybridCQM solver [11] (called just Leap in the following), which is a quantum-classical hybrid algorithm that uses classical algorithms to optimize the problem while using quantum computers to solve suitable sub-tasks.This has the benefit of solving larger problems than possible directly on current quantum hardware while also supporting more sophisticated optimization models that include hard constraints.Like our decomposition routine, Leap partitions the problem into sub-problems via a proprietary algorithm.However, it follows a general ansatz compared to our problem-specific one.Leap is accessed through D-Wave's Cloud service.Both Leap and Gurobi solve the optimization problem presented in Eq. ( 5)- (10).These two out-of-the-box solvers are compared against our own problem-specific decomposition routine introduced in Sec.IV, subsequently called Decomp-Gurobi, Decomp-Leap or Decomp-QPU, depending on the method considered for solving the chunk problems (29).QPU refers to direct access to the D-Wave's Quantum Annealing processor Advantage 4.1 [11].When- ever a decomposition solver is followed by an integer, it refers to the chunk size m.The post-processing algorithm is used with a cut-off value r = 500.
In preliminary experiments, we additionally investigated D-Wave's QBsolv hybrid decomposition algorithm [47], but the performance was not comparable to the approaches presented here.Furthermore, we have noticed that the solution quality did not depend on the sub-solver chosen (e.g., D-Wave's QPU or a Simulated Annealing heuristic.),indicating that the classical Tabu Search [55] is solely responsible for the optimization work done.
The hybrid Leap solver only has the time limit as a control parameter exposed to the user.The time limit is bound from below by the minimum runtime that is heuristically calculated from the input problem (i.e., from the number of variables and couplings involved).Because it only depends on the problem structure and not the solution quality found, the minimum time limit cannot be used as the scaling metric.Furthermore, Leap exploits the full time limit setting and does not abort when satisfactory energy has been reached.Thus, a time-to-solution metric is not achievable within a single run and, therefore, also not considered in our benchmarking.
To alleviate the issue and ensure a fair comparison, we give each solver a heuristically increasing time limit of 0.1 s×N C .We observed that Leap tends to overrun the set timeout, which is the reason why we first run Leap with the linear growing timeout and then run the remaining solvers with the timeout matching Leaps runtime.Since the decomposition solver consists of multiple sub-solver calls, we set the timeout for each sub-solver as the whole timeout divided by the number of chunks, i.e., a timeout of 0.1 s × m.

2) Metrics
Because we consider an optimization task with multiple goals involved, it is not sufficient to consider only the objective value of our model as a performance metric.Instead, we simultaneously investigate multiple metrics: • Cost: The cost, or objective, of the optimization problem, Eq. ( 5)-( 8), is the main metric for comparing solver performance.To ensure an easier comparison between problem instances, we investigate the relative cost error with respect to the global solution from Eq. ( 22), defined with C(ζ * ) = E(ζ * )/N 0 .This is a guaranteed lower bound to the cost since all penalty terms are bounded from below by zero.
• CO 2 reduction: The CO 2 reduction is the central term in the DSP objective.Hence, it is also valuable to inspect it separately.We therefore compute the relative CO 2 reduction error through Eq. ( 22) E(0) is the CO 2 emission prior to discount scheduling.• Consumption deviation standard deviation: We expect the consumption deviations for each customer to be centered around zero since the problem is constrained to have a zero total consumption deviation.We therefore measure the standard deviation of the customer discount deviations as follows • Average discount changes: Since we strive to reduce the changes between two discount categories as much as possible, we measure the average discount changes: where δ refers to the Kronecker-Delta.• Average relative cost savings: Not a quantity that is optimized for in the objective, but interesting for the DSO, is the relative cost savings per customer, as defined in Sec.III-D.To obtain a single indicator of the performance, we evaluate the mean ⟨s c ⟩ c of the relative savings.

3) Parameters
Solving the DSP for a given dataset, consisting of the consumption of N C customers at N T timesteps, requires fixing a set of open variables and parameters.In a real-world scenario, the customer price elasticity on demand χ c could be measured from the individual customer's behavior.However, as it only acts as a proportionality constant, we set χ c = 1 for this investigation.Next, we use five discount categories, with a 50% discount maximally.That, in turn refers to the following valid discounts z c,t ∈ { −50%, −25%, 0%, 25%, 50% }.
As a consequence, a discount of, e.g., 50% would result in an increase in the customer's consumption by 50%.
The power deviation bounds ∆p t are set to a constant 10% of the average total consumption (0.1 × ⟨D t ⟩ t ).For the purpose of this novel problem formulation and benchmarking regarding scalability and solution quality, this is a pragmatic approach to approximation.In practice, however, those values may be derived from real-world grid constraints that can be inferred through power-flow calculations.
Finally, the remaining penalty parameters are fixed by analyzing a small-scale example with Gurobi and dialing in the strengths of the penalties, such that they have a reasonable effect for the Gurobi result.It is important to note that a comprehensive examination of the solver's response to parameter settings is beyond the scope of the current investigation.
An overview of all parameter settings is given in Table 3.

B. EXAMPLE WITH 100 CUSTOMERS
We first examine the optimization result of the different solvers in detail for a 100-customer example qualitatively before focusing on the previously discussed metrics.For that, we analyze the solutions of four solvers, Gurobi, Leap, and two m = 50 decomposition methods with the same solvers as the sub-routine.The results for the discount matrices z c,t can be seen in Fig. 3, while their overall effect on the consumption is displayed in Fig. 4. Visually, the individual discount matrices exhibit distinct patterns (cf.Gurobi and Leap), but the effective result stays comparable regarding the CO 2 reduction.The optimal CO 2 reduction is 12.45 kg, Leap differs by 0.3%, Gurobi by 0.6%, Decomp-Leap by 1.4% and Decomp-Gurobi by only 0.005%.Apart from the global optimization metrics, we are also interested in how the optimization performs per customer.In Fig. 5, one can see how the relative consumption changes are distributed.Furthermore, Fig. 6 visualizes the distribution of  The standard deviation of all customer consumption deviations (6) the average discount changes (7), which we both want to be small.The bottom right pane displays the average relative savings ⟨sc⟩.The error bands indicate the maximum and minimum of the three runs.
savings to the customers.Lastly, it remains important to note that the results for the Leap solvers vary throughout multiple runs.Here, only a single run has been picked, which is characteristic of the behavior of these solvers.Furthermore, no investigation towards direct QPU access has been made since the space requirements for a single customer are already 76 integer variables, i.e., 228 binary variables.The problem after gathering multiple customers in a chunk is, hence, not embeddable in the QPU since we are facing quite dense connectivity in the QUBO.For a reduced problem size, we perform investigations in Sec.V-F.

C. SCALING ANALYSIS
To test the performance of different solvers, we created test instances using generated data with N C ranging from 25 to 3200 customers and considering the full 76 timesteps.Our problem instances, therefore, consist of 1,900 to 243,200 integer variables.To account for the stochasticity of the results from the quantum solvers, we run the quantum solvers three times.
The results in terms of the objective function depending on problem size are visualized in Fig. 7.It is evident that a crossover in performance between Gurobi and Leap happens between 100 and 200 customers.After that size, Gurobi is not able to finish the root relaxation within the given time bounds and falls back to a heuristic solution, which has inferior performance.Although not a directly fair comparison since Gurobi runs on a local machine while the Leap hybrid solver is run on a proprietary D-Wave cloud architecture, we argue that the pattern generalizes, i.e., the inflection point where Gurobi doesn't reach satisfactory results anymore shifts to larger instances but eventually happens.In the regime N C < 200, Gurobi's MIP Gap roughly coincides with the relative cost error since the lower bound of Gurobi is almost equal to C(ζ * ).For low problem size, the Leap solver demonstrates high relative cost error, however, this error decreases rapidly up to low hundreds of consumers.Nonetheless, the decomposition routines outperform the general-purpose solvers, especially the purely classical Decomp-Gurobi approach.
Fig. 8 shows the relative CO 2 reduction error and three percustomer metrics.The relative CO 2 reduction error shows a similar pattern as the inset in Fig. 7, which is due to the emission reduction being the main part of the optimization objective.It is apparent that the decomposition routines in the higher problem instances produce results with almost perfect CO 2 reduction (less than 10 − 5 error), which can be explained by the fact that they have access to the best emission reduction bound.As a consequence, the per-customer penalties (Eq.( 6)-Eq.( 8)) are responsible for the cost error visible in Fig. 7.
Investigating the per-customer constraints, we notice that the Gurobi-based solvers outperform the quantumenhanced routines (N C < 200).This is likely due to Gurobi being better at handling smaller changes in the optimization objective.However, it is also important to note that, as apparent from the discount matrices in Fig. 3, Gurobi gives many customers not even a single discount.Hence, they do not receive any discount changes or consumption deviations, which reduces the average measure.
Examining the heuristic solutions of Gurobi, when solving its root relaxation aborts (N C ≥ 200), reveals that the discount matrix is almost completely filled with extremal values z c,t = ±z max .The constraints are satisfied, but the discounts are randomly distributed, which allows for computing the per-customer metrics analytically, supposing z c,t = ±z max with equal probability.The consumption deviation metric

Runtime [s]
Decomp-Gurobi [5] Decomp-Gurobi [10] Decomp-Gurobi [25] Decomp-Gurobi [50]  from Eq. ( 33) simplifies to z max ⟨D −2 c t d 2 c,t ⟩ c , which is e.g.0.064 in the N C = 400 case.The average discount change metric reduces to the probability of observing one discount change (0.5).Finally, s c = z 2 max = 1/4, since the customer savings are dependent on a weighted average over z 2 c,t = z 2 max = const., see Eq. ( 18).To conclude this analysis, we remark that Gurobi struggles at large problem sizes since its root relaxation cannot be solved within the given time constraints, which indicates a potential advantage of the quantum-enhanced solver here.Yet, the domain-specific decomposition routine provides even better results, especially in conjunction with the classical solver.We argue that since the decomposition-based solvers work so well, the space of good solutions is large, which makes this problem a fitting choice for heuristic-based solvers more than mathematical solvers, like Gurobi.

D. CHUNK SIZE EFFECT
After we observed that the decomposition solver provides satisfying results both with Gurobi and Leap employed as sub-solver, we are interested in what impact the chunk size has on the result.For that, we only inspect Decomp-Gurobi with different chunk sizes m = 5, 10, 25, 50 and focus on a reduced problem size frame up until N C = 800.We have seen that the problem complexity does not grow linearly with the problem size.Thus, we give a more generous timeout of 0.5 s × m in this investigation in order to isolate the effects of the decomposition routine from the solver performance 4 .The global effect, i.e., how much CO 2 is reduced, does not differ between the chunk sizes (below 1% error).The constant sequential updating of the objective also helps a lot with finding the best CO 2 reduction, even with five customer chunks.Fig. 9 shows the consumption deviation and discount changes, where a clear tendency that larger chunks result in lower per-customer metrics can be observed, i.e., less consumption deviation per customer and fewer overall discount changes.

E. FAIRNESS ANALYSIS
The goal of this section is to investigate how the solvers strategically distribute the discounts to the target customers.This is done by investigating how the relative savings s c are distributed between individual customers.Fig. 6 and Fig. 10 show two cumulative distribution plots of the results from 100 and 800 customer problem sizes.The more vertical (zero slope) a given cumulative line is, the fairer the discounts are distributed among the consumers, thereby implying a better social-welfare measure for the energy consumers.Except for Gurobi, the qualitative patterns of the solvers are similar.Leap produces a fair savings distribution, which means that all customers experience the same savings (10%-15%).
In Fig. 6 the splitting in half of the decomposition can be observed quite remarkably.The resolution of the 16 individual chunks in Fig. 10 is no longer possible.However, a kink in Decomp-Leap can be observed, which means that about 70% of the customers save a similar and relatively large amount (22%-25%), while fewer savings are distributed to a smaller group (10%-22%).Decomp-Gurboi reveals a straight but shallow curve, which means that customers will receive savings between 0% and 20% almost equally likely.

F. DIRECT QPU-ACCESS WITH DECOMPOSITION
A quantum annealing processor, such as D-Waves Advantage 4.1, suffers from limited connectivity between the physical qubits.However, for our QUBO sub-problems (29), we can analytically compute the number of couplings for a single qubit as follows: where Q is the number of qubits required to encode the discount, N T is the number of timesteps, and m is the number of customers per chunk.This term is derived by inspecting the terms in the QUBO formula and observing that we either have couplings within all customers of a chunk at a single timestep or couplings within all timesteps of a single customer.For the first case, one qubit is connected to all Q qubits of the other m−1 customers in the chunk and to Q−1  FIGURE 12.All metrics for the small problem sizes, also considering QUBO sub-problems that can be solved using D-Wave's QPU.Comparing both Quantum routines, one can observe that Decomp-QPU returns results with slightly lower cost than Leap.Yet, the CO 2 reduction is vastly better in the decomposed case.Notably, the difference of the Decomp solvers in the relative cost error is mostly due to the higher consumption deviation from the QPU results.To match the runtime of the Decomp-SA to Decomp-QPU, we had to manually adjusted the number of samples from the simulated annealing routine to 30.We also observe that Leap outruns the set timeout here.
qubits of the same customer.Analogue for the second case, but the Q − 1 connections within the timestep have already been covered in the first case.The derived quantity grows with the problem size, but the couplings per qubit of the D-Waves Pegasus graph is a constant 15 [11].Thus, physical qubits have to be chained together to logical qubits in order to allow for higher connectivity.Finding the best, so-called embedding, is itself an NPhard optimization problem, for which we utilize D-Wave's heuristic MinorMiner.
Fig. 11 shows the computed embeddings for the subproblem QUBOs with different problem sizes.It is apparent that we are very limited to small problem sizes.Since we do not want too few customers in a chunk to preserve flexibility, we settle at a reasonable middle ground of chunk size six and 12 timesteps.We interpolate the original data to 12 timesteps and use various (multiples of 6) customer sizes to compare the performance of Decomp-QPU against the other solvers.For each sub-problem, we take 100 readings from the QPU.We cannot directly steer the timeout in this case.Thus, we first run Decomp-QPU and then set the timeout of the remaining solvers to exactly that time.However, Leap has a minimum runtime of 5 s, which is the reason why we only include Leap in the cases where the Decomp-QPU time is more than 5 s, being the case from N c = 480 onwards.Embedding times are not considered since the embeddings are computed beforehand and remain constant for one timestep, chunk size (N T , m) combination, independent of grid data.
Again, we perform the analysis for different problem sizes, reaching from 60 to 1920 customers or 720 to 23,040 integer variables.The sub-problems comprise 72 integer variables, resulting in 216 binary variables in the QUBO formulation.In contrast to the previous analysis, we additionally investigate Simulated Annealing (SA) as a sub-problem QUBO solver in this instance.Due to the larger problem sizes of the previous sub-problems, the SA routine could not return results within the runtime boundaries we had set.Fig. 12 displays the results of the experiments.The previously discussed solvers (Gurobi, Decomp-Gurobi, Decomp-Leap) exhibited similar patterns to the investigation done for the larger problem-sizes (Fig. 7).Therefore, we only focus on the QPU and SA-based decomposition routines and Leap.
Decomp-SA exhibits the lowest cost error compared to the two quantum-enhanced methods.The two Decomp solvers (one using classical and the other using quantum compute) demonstrate similar performance in terms of CO 2 reduction.Interestingly, the consumption deviation, and therefore also the dominating factor in the cost, are measured at a very constant level between the problem sizes.Curiously, SA, as the sub-solver, performs better (≈ 2%) concerning consumption deviation than the QPU does (≈ 8%), leading to a gap in the cost.Leap exhibits similar performance as in our previous experiments.Most notably, although, is that Decomp-QPU seems to perform better than Leap regarding the optimization objective (1.3% error versus 1.7% error), and concerning the CO 2 reduction (4.5 × 10 −8 error versus 3.4 × 10 −3 error).That indicates that our developed hybrid quantum routine does seem to outperform the generalpurpose Leap for this particular task.

VI. CONCLUSION
We explored the feasibility of current quantum computing techniques for DSR by developing a mathematical formulation that utilizes discount scheduling to shift grid load to more appropriate times.Our formulation involves providing discretized discounts to multiple customers at different times to incentivize a load shift while ensuring the total consumption stays fixed.We chose CO 2 emission reduction as the main objective for our DSP implementation of DSR.With secondary objectives, such as maintaining grid stability and ensuring customer well-being, we formulated a QIP problem.
Upon close inspection of the problem, we developed a custom decomposition algorithm that compartmentalizes the problem into customer chunks.These sub-problems involve unconstrained integer optimization and can be effectively addressed on quantum computers if encoded correctly.Moreover, since the problems are solved sequentially, we incorporated the accumulated errors into the subsequent optimization problems.Lastly, we developed a post-processing algorithm that further refines the solution.
In the end, we benchmarked the performance of a classical general-purpose solver against D-Wave's Leap hybrid quantum-classical solver and our customized decomposition method with various (quantum or classical) sub-solvers employed.We observed that the classical solver fails to produce acceptable results after a specific problem size when using a linearly increasing timeout for the problem size.In contrast, the quantum-enhanced Leap continues to provide satisfactory results.This indicates a potential advantage of solving this particular problem using Leap over the purely classical counterpart, Gurobi.Nonetheless, the decomposition method with the classical solver as sub-solver developed the best-achieved results over the range of problem sizes we investigated.Furthermore, using quantum or simulated annealing for the QUBO problems has resulted in good performance.We found that decomposition paired with quantum annealing returned comparable energies to Leap.
We remark that the pairing of the decomposition method with Leap with large chunk sizes might be a promising pathway for utilizing the quantum-enhanced method for huge instances of this problem.This statement requires further experiments, but we argue that solving large sub-problems within time constraints may pose challenges for Gurobi, whereas Leap could yield acceptable results.Further future work includes the response of the solvers to different problem parameter settings and making the grid constraints more physically realistic rather than our realistic yet pragmatic chosen constant band.
Lastly, determining precise energy requirements of quantum computing hardware and the trade-off between quantum computing algorithms runtime and used energy is an active and interesting area of research [56]- [58].If the community can demonstrate practical quantum advantage (e.g.quantum runtime of seconds or hours as opposed to weeks or years for classical HPC runtime) for use-cases which themselves reduce CO 2 emissions, perhaps this may offset the potential CO 2 cost resulting from the manufacturing or operating future quantum computers.We believe this work on how quantum computers may solve optimization problems related to the sustainable energy transition by embedding sustainability notions into the use-case itself is a progressive step towards using quantum computing for important global issues.

APPENDIX. GLOSSARY
For a better overview of the used symbols and parameters in the formulation of the DSP, we provide an overview in Table 4.

2 )
Consumption deviation penaltyCustomers should not change the total energy they consume over the optimization horizon, i.e., t d c,t z c,t ≈ 0 ∀c ∈ {1, . . .N C }.

FIGURE 1 .FIGURE 2 .
FIGURE 1. Overview of the decomposition routine.The problem is split into sub-problems.Solutions can influence the following sub-problems via sequential updating.Finally, sub-solutions are gathered to a full solution and a post-processing step is employed that improves the solution quality greedily while also making the power restriction constraint is satisfied.

FIGURE 3 .
FIGURE 3. The discount matrices zc,t found by the investigated solvers for N C = 100.Blue indicates a discount, and red corresponds to a penalty.White means no discount given at all.Despite their effects on the overall consumption (see Fig.4) being the same, the discount matrices differ a lot from each other.It is apparent that Gurobi hands out the discounts more greedily than Leap, indicating a bigger impact of the regularization.

FIGURE 4 .FIGURE 5 .FIGURE 6 .
FIGURE 4. The effect of the DSP solution for problem size N C = 100.The plot shows the aggregated consumption with and without (z = 0) discounts in place, as well as the grid CO 2 intensity.Visually, the solutions of all solvers produce a similar effective consumption change, as already predicted in Sec.IV.As expected, times with high CO 2 emissions produce an effective decrease in consumption and vice versa.

4 FIGURE 7 .FIGURE 8 .
FIGURE 7. The relative cost error for different solvers with respect to problem size N C .Cost is the optimization objective known from Eq. (5)-(8).The relative value is taken with respect to the bound known from C(ζ * ).The inset shows the relative cost error with logarithmic scaling.The error bands indicate the maximum and minimum of the three runs.

FIGURE 9 .
FIGURE 9.Per-customer metrics evaluated with different chunk sizes in the decomposition.As expected, the metrics improve (get smaller) as the chunk sizes get larger since more flexibility remains in the chunk.

FIGURE 10 .
FIGURE 10.A cumulative distribution plot of the relative savings of the customers at N C = 800.As discussed earlier, Gurobis root relaxation does not finish anymore, which causes savings of around 25%. Leap produces fair discounts, similar to Fig.6.The other two solvers produce more complex, unfair savings distributions.

40 FIGURE 11 .
FIGURE 11.Embeddable sub-problem size for the D-Wave Advantage 4.1 QPU.The left-hand matrix shows how many physical qubits are needed when a sub-problem with N C customers and N T timesteps are embedded.A white field indicates that no embedding has been found.The right-hand plot shows the maximal chain length for the found embedding, i.e., how many qubits are maximally connected to form one logical qubit.All embeddings were found using D-Wave's MinorMiner package.

TABLE 2 .
Overview of the investigated Solvers

TABLE 3 .
Parameter setting for the investigated problems