A Machine Learning-Assisted Distributed Optimization Method for Inverter-Based Volt-VAR Control in Active Distribution Networks

The number of smart inverters in active distribution networks is growing rapidly, making it challenging to realize a fast, distributed Volt/Var control (VVC). This work proposes a machine learning-assisted distributed algorithm to accelerate the solution of the VVC strategy. We first observe the convergence process of the Alternating Direction Method of Multipliers (ADMM)-based VVC problem and explore the potential relationships between the convergence and time-series regression. Then, the long short-term memory (LSTM) technique is applied to learn the convergence process and regress the converged values of the dual and global variables with previous ADMM observations. After that, the LSTM-assisted ADMM algorithm is proposed, where the regressions are used for ADMM parameter updates. In this algorithm, the inputs of the LSTM model are carefully designed since the complementary conditions implied in the conventional ADMM should be considered. Unlike existing methods, the proposed method does not use the LSTM to determine the VVC strategy directly, indicating that it is non-intrusive and can satisfy all safety constraints during operations. The proof of its optimality and convergence is also given. The numerical simulations on the 33-bus distribution system demonstrate the effectiveness and efficiency of the proposed method.

improve the voltage quality and avoid potential voltage violations in ADNs. Some standards have been published requiring DGs equipped with smart inverters to provide reactive power control. IEEE Std 1547.9-2022 stipulates voltage support requirements through reactive power control [1]. Also, the European grid code EN 50438:2013 [2] and the German grid code VDE-AR-N 4105:2011-08 [3] have similar requirements for IBDGs providing reactive power through power factor adjustment.
Usually, VVC consists of two types: lower-level local control and upper-level optimization. The local control focuses on the voltage droop control of smart inverters. It adjusts the reactive power output of the inverter along a voltage droop curve around the setting point [4]. As the proportion of renewable energy connected to the grid continues to increase, local VVC has been further developed to deal with uncertainties [5]. On this basis, artificial intelligence techniques have been introduced to learn the droop curve for better voltage regulation [6], [7]. The upper-level optimization focuses on determining the setting points of the local voltage control. It collects system information to uniformly set the inverters' operating points and realize globally optimal regulation. In this work, the latter is studied in more detail.
Currently, research on VVC is developed using centralized and distributed methods [8]. Centralized methods collect the information and states of the whole ADN and then optimize the control strategies uniformly [9], [10]. The commands are solved and issued by a central operator to each IBDG [11]. With the development of the active distribution network, hierarchically-coordinated methods were proposed to integrate the inverter's droop control and realize a balance between handling uncertainties and real-time control [12]. Taking uncertainties into account, stochastic programming techniques were used in the VVC problem [13], [14]. Robust optimization has also caught researchers' attention [15], [16], as it can ensure that the voltage fluctuates within a reasonable range under worst-case operational situations. Furthermore, risks were introduced into the formed optimization problem [17]. Different risk indicators were used to evaluate the possibility of constraint violations, including conditional value-at-risk [18], conditional risk [19], etc. Chance-constrained optimization was also developed to guarantee all the operational and security constraints [20].
The performance of centralized VVC methods is satisfactory for small-scale ADNs. However, the rapidly growing number of IBDGs in the ADN imposes a huge communication demand [21] to obtain a centralized VVC strategy. Also, the formed optimization model can be computationally expensive and may not be solvable centrally [22]. These reasons make centralized methods challenging to apply.
Distributed methods have been developed to deal with the above problems. They can obtain an optimal solution without a central coordinator [23]. Each agent collects the information of its own area and determines the corresponding operation strategy in a distributed manner, which is suitable for VVC in future ADNs. Most of these studies were based on consensus-based control. Ref. [24] proposed a reactive power sharing method in inverter-based microgrids with consensus-based distributed voltage control. A distributed cooperative controller was developed with a consensus protocol to stabilize the voltage of wind farms [25]. The alternating direction method of multipliers (ADMM) developed from the consensus-based method and has become quite popular [26]. With ADMM, VVC can be implemented online based on a linear power flow model even when facing line failures [27]. A fully-distributed ADMM-based VVC method was then proposed [28], [29]. The information exchange between separated areas replaced the exchange between the areas and the central coordinator, thereby improving communication efficiency. Moreover, different power flow models and devices were considered in the VVC problem solved by ADMM [30].
Although ADMM has many advantages in solving the VVC problem distributedly, it still requires hundreds or even thousands of iterations for the distributed solutions to converge to the global optimum. It may seem that there is no need to accelerate the convergence of ADMM, since the calculation time is indeed very short if we only look at the running speed of the code. However, the actual operation is quite different from the simulation process when the distributed control method is applied to a real power system.
The real application of distributed VVC involves not only solving the optimization model but also data transmission between different controllers. As the iterative process progresses, problems such as data desynchronization, delay, and loss may occur during each transmission, all of which may degrade data quality. Although there is much research on improving communication speed and data quality in power system communication, we still need to point out that the more ADMM iterations the VVC process requires, the greater the possibility of data quality degradation. Therefore, reducing the number of distributed control iterations will effectively reduce the communication requirements in the power system, reduce the construction cost of communication equipment, and improve communication quality as much as possible.
To improve the convergence of ADMM, methods including self-adaptive ρ [31], Nesterov-type ADMM [32], and over-relaxed ADMM [33], among others, have been proposed. These methods achieve good convergence speed and performance by adjusting the model and parameters.
Recently, machine learning (ML) techniques have been used in power system operation [34], giving us some insights for accelerating the solution of VVC with ADMM. Noticing that VVC is a special case of the optimal power flow (OPF) problem, we mainly investigate different applications of ML to OPF. Neural network-related techniques were used to estimate the power flow solutions of DC [35] and AC models [36], [37], [38]. Deep reinforcement learning [39], [40] was also applied for the same purpose. Some scholars have further developed ML-based distributed methods based on centralized methods. An ML-based decentralized framework [41] and multi-agent reinforcement learning [42], [43] were developed for OPF problems as well.
Notably, most of these applications were intrusive, meaning ML is used to regress and predict the solution to OPF problems directly. The ML model is supposed to replace the actual physical model, which may lead to feasibility issues. In general, some corrections can be made to the final solutions obtained from ML [36]. Besides, the "optimality" of the solutions is also challenging to guarantee.
This work proposes a non-intrusive long short-term memory (LSTM)-assisted ADMM method to solve the VVC strategy distributedly and efficiently in a three-phase balanced distribution system. LSTM is used to learn the dynamic convergence process. Compared to other methods, the potential of ML in OPF can be further explored with our method. The contributions are threefold: 1) An LSTM-based distributed method for VVC in ADNs is proposed considering flexible IBDGs. It is noteworthy that the convergence process of the ADMM-based VVC can be regarded as a time-series regression. Thus, the LSTM technique is applied to learn and regress the converged values. The regressions are applied to update the parameters of ADMM to accelerate the distributed solutions. 2) A guideline for designing the LSTM model to speed up the solution is given. Considering that some special conditions need to be satisfied during convergence, general rules are given for selecting inputs and outputs of the LSTM model in the proposed method. Also, we give all the detailed hyperparameters for model training.
3) The convergence and optimality of the VVC strategy solved with the LSTM-assisted ADMM are proved. The proposed algorithm is non-intrusive, indicating that the acceleration is achieved by convergence-parameter improvement rather than by regressing the distributed VVC strategy directly. Therefore, this acceleration method does not affect the optimality of the distributed solutions.

In the following, Section II establishes the VVC optimization model with inverter-based sources. Section III gives a brief introduction to the area partition and the conventional ADMM for the VVC problem. Section IV proposes the LSTM-assisted algorithm and illustrates all details of its application. Section V verifies the effectiveness and efficiency of the proposed method. Section VI concludes the paper.

II. MATHEMATICAL MODEL FOR VOLT/VAR CONTROL
This section establishes the mathematical model for the VVC problem. The constraints and objective function are given.
The ADMM algorithm can then be applied to this model to realize distributed optimization for VVC.

A. Power Flow Balance Constraints
The undirected graph model is commonly used to model an ADN with a radial structure. Here, G(N, L) denotes the topology of the ADN, where N and L are the bus and line sets, respectively. In this work, the LinDistFlow model [44] is used to describe the power flow balance of the radial ADN.
Let v_i denote the voltage magnitude at bus i ∈ N. V_0 is the reference value, which usually equals 1 p.u. Let p_i and q_i denote the active and reactive power injections at bus i ∈ N, respectively. p_Di and q_Di represent the active and reactive load demands at bus i ∈ N. p_Gi and q_Gi represent the active and reactive power injections at bus i ∈ G_DG, where G_DG is the set of buses equipped with DGs. For each distribution line (ij) ∈ L, let p_ij and q_ij denote the active and reactive power flows. r_ij and x_ij represent the resistance and reactance of line (ij). With this notation, the model can be given as follows. Equations (1a) and (1b) reflect the active and reactive power balance of the ADN. Equation (1c) represents the nodal voltage relationship between two adjacent buses. Equations (1d) and (1e) calculate the active and reactive power injections at a single bus, respectively. Equation (1f) sets the voltage to 1.0 p.u., since we assume that the point of common coupling is located at bus #1.
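For reference, the standard LinDistFlow relations consistent with the description above take the following form; this is a sketch, and the exact linearization and equation numbering used in the paper are assumptions based on [44]:

```latex
% Sketch of the LinDistFlow model (form and numbering assumed)
p_{ij} + p_j = \sum_{k:(jk)\in\mathcal{L}} p_{jk}, \quad \forall (ij)\in\mathcal{L} \quad (1a)\\
q_{ij} + q_j = \sum_{k:(jk)\in\mathcal{L}} q_{jk}, \quad \forall (ij)\in\mathcal{L} \quad (1b)\\
v_j = v_i - \frac{r_{ij}\,p_{ij} + x_{ij}\,q_{ij}}{V_0}, \quad \forall (ij)\in\mathcal{L} \quad (1c)\\
p_i = p_{Gi} - p_{Di}, \quad \forall i\in\mathcal{N} \quad (1d)\\
q_i = q_{Gi} - q_{Di}, \quad \forall i\in\mathcal{N} \quad (1e)\\
v_1 = V_0 = 1.0~\text{p.u.} \quad (1f)
```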

B. Operational Constraints for IBDGs
Here, we assume that the DGs mentioned in this work are all equipped with smart inverters, which means that the renewable DGs can realize active and reactive power control during their operation. Hence, the constraints on the operating range of the DGs can be given as follows, where (·)^min and (·)^max are the lower and upper bounds of variable (·), respectively.
Constraints (2a) and (2b) restrict the range of active and reactive power generation of the DGs. By adjusting p_Gi and q_Gi, voltage regulation of the ADN can be achieved.

C. Safety Constraints
During the ADN's operation, some safety constraints must be satisfied, including nodal voltage and line capacity limits. The constraint for line capacity is given below, where s_ij is the capacity of distribution line (ij).
The constraint for voltage limits restricts each nodal voltage within its lower and upper bounds.

Remark: Some machine learning techniques have already been used for estimating the power flows and control strategies of ADNs. However, using such strategies can be troublesome since these safety constraints would sometimes be violated. This work instead uses machine learning to assist an existing distributed optimization technique and accelerate its solution. Hence, the optimality and feasibility of the strategy are guaranteed by optimization theory.
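The two safety constraints described above can be written in the standard form below; this is a sketch, and the numbering and exact expressions are assumptions:

```latex
% Line capacity (sketch): apparent power flow within the line rating
p_{ij}^2 + q_{ij}^2 \le s_{ij}^2, \quad \forall (ij)\in\mathcal{L}\\
% Nodal voltage limits (sketch)
v_i^{\min} \le v_i \le v_i^{\max}, \quad \forall i\in\mathcal{N}
```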

D. Objective Function for VVC Problem
In the inverter-based VVC problem, the operator of the ADN is mainly concerned with minimizing the line losses and voltage deviations. Since the IBDGs are involved in the VVC, it is also necessary to minimize their costs [26]. Considering that renewable DGs aim to output more active power and less reactive power, we then have the objective function as follows, where c_Pi and the other coefficients weight the different penalty terms. The first term reduces the gap between the real voltages and the reference values. The second term comes from [44] and calculates the line losses. The last two terms address the operating cost of the IBDGs. Providing reactive power can be regarded as an ancillary service; it does not produce profit directly like active power output. Hence, the minimization of reactive power is considered in the third term to avoid excessive reactive power generation. The fourth term encourages the system to absorb more active power.
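A sketch of an objective consistent with the four terms described above is given below; the coefficient symbols other than c_Pi, and the exact loss expression, are assumptions:

```latex
\min \;
\sum_{i \in \mathcal{N}} c_i^{V} \left(v_i - V_0\right)^2
+ \sum_{(ij) \in \mathcal{L}} c_{ij}^{L}\, r_{ij}\, \frac{p_{ij}^2 + q_{ij}^2}{V_0^2}
+ \sum_{i \in \mathcal{G}_{DG}} c_i^{Q}\, q_{Gi}^2
- \sum_{i \in \mathcal{G}_{DG}} c_{Pi}\, p_{Gi}
```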
Remark: We have noticed that there are many other methods in the distribution network to mitigate voltage problems during the system's operation, including but not limited to on-load tap changer adjustment, storage management, etc. These methods can easily be integrated into our optimization model to improve the system's operating performance jointly. Since our method is an improvement of an existing method, it does not conflict with other voltage regulation methods.

III. DISTRIBUTED VOLT/VAR CONTROL WITH CONVENTIONAL ADMM
Combined with the mathematical model established in the last section, this section introduces the area partition in the VVC problem.An ADMM-based distributed implementation of the centralized model is further given, laying the foundation of our proposed distributed method.

A. Area Partition in VVC Problem
The ADN sometimes needs to be divided into different areas for independent control, since the large volume of data has become a communication bottleneck for the central coordinator. Suppose the ADN is divided into several areas. The set of partitioned areas is defined as P. IBDGs are scattered across the different areas.
For each area a ∈ P, define x_a to represent the decision variables, including the v_i, p_ij, q_ij, p_Gi, and q_Gi related to a. Considering that areas need to exchange the values of boundary variables with neighboring areas, we divide the variables in x_a into local variables x_a^in and boundary variables x_a^adj for ease of explanation.
Fig. 1 gives an example of a three-area ADN. Taking area a as an instance, area a shares tie lines with areas b and c. The boundary variable x_a^adj can be expressed as x_a^adj = {x_ab^a, x_ac^a}. In general, x_a^adj = {x_ab^a | b ∈ P_a}, where P_a is the set of neighboring areas of a.
Fig. 1 shows that two adjacent areas share the same copies of the boundary variables. These copies should be driven to the same value when solving a distributed optimization problem. To this end, a global variable s_a corresponding to each x_a^adj is defined. Thereafter, we have the following consensus constraint for two adjacent areas a and b. With this notation, the distributed implementation of the centralized VVC optimization model can be given as (7), where f(x_a) is the objective function corresponding to each area a and χ_a is the feasible region for x_a. Σ_{a∈P} f(x_a) is equivalent to the objective function in (5).
The distributed VVC model established in this study is based on three-phase balanced systems. However, with the increasing integration of renewable energy, attention has increasingly been drawn to three-phase unbalanced systems. The proposed distributed control method can also be applied to such systems. Nonetheless, it should be noted that a direct application is not possible because a different VVC model is used: the local and boundary variables, which determine the inputs of the proposed method, must be redefined.
To clarify this point briefly, the optimization models in [45], [46], and [47] are used. Since the currents and power flows are distributed differently across each phase, all three phases must be considered when constructing the centralized VVC optimization model. The same principle applies to its distributed implementation. For an unbalanced system, physical quantities such as v_i^φ, p_ij^φ, and q_ij^φ (where φ ∈ {A, B, C} indexes the three phases) will be included as boundary and global variables. As a result, the global and dual variables of an unbalanced system will be three times the size of those of a balanced system. Nevertheless, the proposed method treats the ADMM-based distributed VVC problem as a dynamic system and performs regression on the converged values. From this perspective, the proposed method is also applicable to three-phase unbalanced systems, and its effectiveness can be preserved.

B. Implementation of ADMM
ADMM is an efficient algorithm for solving the distributed optimization problem (7). To implement ADMM, (7) must first be reformulated in augmented Lagrangian form.
Define the corresponding scaled dual variables λ_a for each consensus constraint (6). The augmented Lagrangian with penalty parameter ρ can then be obtained as in (8). With (8), the conventional ADMM can be implemented with the following steps. During the m-th iteration: 1) Minimization step: the value of x_a for each area a is solved with the following optimization problems in parallel. 2) Averaging step: the update of the global variables s_a is realized by averaging the corresponding boundary variables.
By exchanging the values of x_a^adj between neighboring areas, each area can update s_a independently. For each element s_g in s (where s = {s_a | ∀a}), s_g can be updated with (11), where N(g) = {1, . . ., |N(g)|} is the index set of all boundary variables x_h^adj ∈ {x_a^adj}_∀a corresponding to the global variable s_g, and |N(g)| is the cardinality of N(g). Taking the update of the active power flow on distribution line (ij) and the voltage at bus i in Fig. 1 as examples, the global variable corresponding to p_ij can be updated by averaging its boundary copies, while the global variable corresponding to the nodal voltage v_i can be updated with (12).

Fig. 2. Convergence process of different dual variables and global variables (including voltage and power flow variables). The figure shows a clear trend in their convergence, making it possible to regard the convergence as a time series.
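The minimization, averaging, and dual-update steps above can be sketched on a toy consensus problem. This is a minimal illustration only: the paper's x-update solves a full per-area OPF, and the targets, penalty parameter, and iteration count here are assumptions.

```python
import numpy as np

# Toy sketch of consensus ADMM: two areas hold copies x[0], x[1] of one
# shared boundary quantity and minimize f_a(x) = 0.5 * (x - t_a)^2.
t = np.array([1.0, 3.0])   # hypothetical local targets
rho = 1.0                  # ADMM penalty parameter
x = np.zeros(2)            # boundary-variable copies
lam = np.zeros(2)          # scaled dual variables (initialized to 0)
s = 0.0                    # global (consensus) variable

for m in range(100):
    # 1) Minimization step (closed form for the quadratic f_a):
    #    argmin_x f_a(x) + (rho/2) * (x - s + lam_a)^2
    x = (t + rho * (s - lam)) / (1.0 + rho)
    # 2) Averaging step: global variable = average of copies (plus duals)
    s = np.mean(x + lam)
    # 3) Dual update for each copy
    lam = lam + x - s

print(round(s, 4))   # 2.0: the consensus value is the mean of the targets
print(lam.sum())     # numerically zero (complementary condition, Lemma 1)
```

Note that the dual variables sum to zero at every iteration by construction of the averaging step, which is exactly the complementary condition the paper's Lemma 1 formalizes.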
It is noteworthy that the local controllers do not need to communicate with a central coordinator. The ADMM can be executed in a fully distributed fashion because the updates require only information exchanged between neighboring areas. When a feasibility tolerance σ is given, the convergence criterion of the conventional ADMM is:

IV. LSTM-ASSISTED ADMM
This section first analyzes the convergence process of the conventional ADMM. Based on the dynamic characteristics of the convergence, LSTM is applied for parameter updating in the ADMM. Then, the framework and detailed algorithm of the LSTM-assisted ADMM are given.

A. Dynamics of ADMM Convergence
Conventional ADMM often requires multiple rounds of iterations to converge. Fig. 2 exhibits the parameter changes over the first 200 ADMM iterations when solving the VVC problem on the 33-bus system. These curves show that the changes in s follow a regular pattern. Besides, studies in [48], [49], [50] reveal potential connections between ADMM convergence and dynamic systems. Intuitively, the parameter convergence in ADMM can be regarded as a time series, and the problem of ADMM acceleration is approximately equivalent to a time-series regression.

B. Application of LSTM on ADMM Convergence
The dynamics of ADMM convergence inspire our approach to speeding up the solution. If we could learn the convergence process of ADMM, the converged values of the dual and global variables could be regressed directly from previous rounds of parameters. In this work, LSTM is used to learn the convergence. The flow chart of LSTM participating in ADMM acceleration is shown in Fig. 3.
As Fig. 3 shows, the application of LSTM includes two parts: model training and application. In the training part, historical data (including loads and DG outputs) are brought into the VVC model. The optimal operation strategy is solved with conventional ADMM for each historical trial, and the changes in the dual and global variables are recorded to train the LSTM model. In the application part, the trained LSTM model is used to regress their converged values when solving the VVC model distributedly with new data. The regressed values are then used in the subsequent iterations with the steps in Section III-B, accelerating the convergence.
Remark: This work focuses only on the regression of the global and dual variables. Other ADMM parameters, such as the penalty parameter ρ, are not learned in this work since they do not exhibit similar dynamic features during the convergence process. Neural network techniques would be more suitable for learning such parameters.

C. LSTM Model
Having illustrated how LSTM can improve the efficiency of conventional ADMM, we now give the details of the LSTM model. Scenario generation, model structure design, and data processing are introduced.
1) Scenario Generation: Different scenarios, which can be used for model training, can be generated with various methods. In this work, Monte Carlo simulation is used. Scaling factors ξ_i^D and ξ_i^G are set for the load at each bus and the power generation of the renewable DGs, where ξ_i^D, ξ_i^G ~ U(−0.1, 0.1) follow a uniform distribution. The historical data, including p_Di, q_Di, and p_Gi, are scaled by these factors to produce the simulated data. Conventional ADMM is implemented on these scenarios to obtain the changes in the dual and global variables for LSTM training.
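The scenario generation step can be sketched as follows. The array names and base-load values are illustrative assumptions, and the multiplicative form (1 + ξ) is inferred from the description of the scaling factors rather than stated explicitly in the paper.

```python
import numpy as np

# Sketch of Monte Carlo scenario generation with per-bus scaling factors.
rng = np.random.default_rng(0)

p_D_hist = np.array([0.10, 0.06, 0.12])  # hypothetical historical loads (p.u.)
n_scenarios = 500

# One scaling factor per bus and scenario, drawn from U(-0.1, 0.1)
xi_D = rng.uniform(-0.1, 0.1, size=(n_scenarios, p_D_hist.size))

# Simulated loads: historical value perturbed by the scaling factor
p_D_sim = p_D_hist * (1.0 + xi_D)

print(p_D_sim.shape)  # (500, 3)
```

Each simulated scenario would then be solved with conventional ADMM, and the recorded dual/global trajectories form the training set.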
Remark: Different operating conditions may correspond to different scenarios, which will impact the effect of LSTM on ADMM acceleration. To deal with this problem, an LSTM model can be trained for each condition to prevent degradation of the model's performance. Scenario clustering can also be performed to train different LSTM models.
2) LSTM Inputs: For the ADMM algorithm, the optimal distributed operation strategy can be obtained directly once the converged values of the global and dual variables are known. Intuitively, if the values of the global and dual variables solved in previous rounds of iterations can be used to regress (or predict) their converged values, the regressions can then be used to accelerate the ADMM.
Having all global and dual variables as inputs to every LSTM unit is an option. However, Lemma 1 must be satisfied when using (10)-(12) for the ADMM iterations. Thus, the inputs of the LSTM model for the converged-value regression need to be further designed to satisfy Lemma 1.
Lemma 1 (Complementary condition): Suppose that λ_adj(g) = {λ_h^adj | h ∈ N(g)} are the dual variables corresponding to the global variable s_g ∈ s, where N(g) is the index set of all λ_h^adj. For each s_g, the duals λ_h^adj updated by (12) should satisfy the constraint that they sum to zero over h ∈ N(g) at each iteration m.
Proof: See the Appendix. Lemma 1 ensures that the iterated ADMM can converge to the globally optimal solution. In the conventional ADMM, we only need to ensure Σ_{h∈N(g)} λ_h^adj,(0) = 0 before the first iteration. Usually, we set λ_h^adj,(0) = 0; Lemma 1 is then guaranteed automatically in the following iterations (see the Appendix).
If LSTM were used directly to regress all global and dual variables involved in the ADMM iterations, the ADMM with regressions might not converge to the optimal solution because the LSTM cannot guarantee that (16) is always satisfied. The s_a could then no longer be updated with (11) either. In that case, it is meaningless to discuss whether LSTM improves the computational efficiency of ADMM.
To let the regressions obtained from the LSTM satisfy the condition stated in Lemma 1, we further design the inputs of the LSTM. The solution proposed in this work is to select only some of the dual variables and regress their converged values with the trained LSTM model. Suppose there are |N(g)| elements in the set N(g), and define the last element of N(g) as H. The LSTM model is then designed with the (|N(g)| − 1)-dimensional set {λ_h^adj | h ∈ N(g)\H} as part of its inputs.
With the trained LSTM model, the converged values of {λ_h^adj | h ∈ N(g)\H} can be regressed. Then, with the regressions and Lemma 1, the value of λ_H^adj can be obtained by solving the complementary condition for the held-out dual. This design of the model inputs ensures that the complementary condition can always be satisfied.
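The recovery of the held-out dual λ_H^adj from the regressed values, as implied by Lemma 1 (the duals of a global variable must sum to zero), can be sketched as follows; the array values are hypothetical.

```python
import numpy as np

# Sketch: the LSTM regresses all duals of s_g except the last one (index H);
# the held-out dual is then fixed by the complementary condition.
lam_regressed = np.array([0.8, -0.3, 0.1])  # hypothetical LSTM outputs, h in N(g)\{H}

# Lemma 1: sum over h in N(g) must be zero, so the last dual is determined.
lam_H = -lam_regressed.sum()

lam_all = np.append(lam_regressed, lam_H)
print(lam_all.sum())  # numerically zero, as Lemma 1 requires
```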
According to the above analysis, the inputs and outputs of the LSTM model can be defined as {s_g, λ_h^adj | ∀g, ∀h ∈ N(g)\H}. All these inputs are concatenated and then passed through the LSTM model. In the following, {s, λ} is still used to represent {s_g, λ_h^adj | ∀g, ∀h ∈ N(g)\H} for simplicity.

3) Data Processing: Since the acceleration of ADMM convergence is modeled as time-series regression, layer normalization, instead of batch normalization, is used to normalize the features at each time step [51]. Unlike the latter, layer normalization depends only on the inputs to a layer at the current time step and can substantially reduce the training time.
When applying machine learning techniques, we usually only need to normalize the inputs (i.e., the features), not the outputs (i.e., the labels). However, here only data from the first M iterations are used to regress converged values that require hundreds of iterations to reach. As shown in Fig. 2, the initial values of the dual variables are 0, while their values may grow significantly after hundreds of iterations. If the LSTM model is trained directly with unprocessed outputs, we find it difficult for the training to converge. For this reason, the labels in the training set are scaled by a constant τ, which brings λ_a^*, ∀a to an order of magnitude close to the inputs {λ^(m)}, m = 0, . . ., M − 1. In this work, we take τ = 1000. We trained the LSTM model with labels before and after scaling; the training process barely converges without label pre-processing.
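The label scaling can be sketched as follows. Dividing by τ is inferred from the statement that the regressed values are later multiplied by τ before reuse; the label values themselves are hypothetical.

```python
import numpy as np

# Sketch of label pre-processing: converged dual values are divided by tau
# so the labels land near the magnitude of the early-iteration inputs.
tau = 1000.0

lam_converged = np.array([850.0, -1200.0, 430.0])  # hypothetical converged duals
labels = lam_converged / tau                       # scaled training labels

print(labels)  # [ 0.85 -1.2   0.43]

# At prediction time, the regression is rescaled before ADMM continues:
lam_warm_start = labels * tau
print(np.allclose(lam_warm_start, lam_converged))  # True
```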
When the trained model is used to predict the converged values, the obtained regression values λ_a^*, ∀a are multiplied by τ and then used in the subsequent ADMM iterations.
Remark: Only the values of the dual variables need to be scaled. Since all global variables (including voltages and power flows) are processed in per unit during the calculation, their value changes are not very pronounced (as shown in Fig. 2). Therefore, there is no need to scale the global variables in the labels.
4) Structure of LSTM: LSTM is a type of recurrent neural network that can be used for the regression of time-series data [52]. Usually, an LSTM model is formed of several LSTM units. Each unit consists of a cell, an input gate, an output gate, and a forget gate [52]. The cell remembers values over arbitrary time intervals, while the three gates control the information flows. In this paper, we do not focus on the specific structure of the LSTM unit, only on its inputs and outputs. The structure we use is shown in Fig. 4, in which H^(m) and C^(m) are the output vector (a.k.a. hidden state vector) and the cell state vector, respectively. H^(M) is the final output used for the converged-value regression.
The sequence length of our LSTM model can be chosen arbitrarily. However, we want to use less information to regress the converged values, since collecting fewer iterations means better speedup performance. In our work, the sequence length M is set to 5 (for smaller systems), which means that only the parameters of the first five ADMM iterations need to be collected before applying the model for the regression. The mean squared error (MSE) is used as the loss function for model training; it is commonly applied to regression and forecasting tasks, which makes it suitable for the prediction of converged values.
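The shape of the training data implied by the choices above (sequence length M = 5, concatenated {s, λ} features per iteration, MSE loss) can be sketched as follows; the scenario and feature counts are illustrative assumptions.

```python
import numpy as np

# Sketch of the training-tensor layout for the sequence regression.
M = 5              # sequence length: first five ADMM iterations observed
n_features = 64    # size of concatenated {s, lambda} per iteration (assumed)
n_scenarios = 500  # Monte Carlo trials solved with conventional ADMM (assumed)

rng = np.random.default_rng(0)
X = rng.normal(size=(n_scenarios, M, n_features))  # iteration histories
y = rng.normal(size=(n_scenarios, n_features))     # scaled converged values

def mse(pred, target):
    """Mean squared error, the loss used for the converged-value regression."""
    return float(np.mean((pred - target) ** 2))

print(X.shape, y.shape)  # (500, 5, 64) (500, 64)
print(mse(y, y))         # 0.0 for a perfect regression
```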

D. Algorithm
According to the preceding discussion, we accelerate the ADMM by regressing the converged s_a^* and λ_a^* for each area. The optimal distributed control strategy satisfying the feasibility tolerance σ can then be solved by bringing the regression results into (10)-(12). Even though the regressions may be biased from the true values, they can still be used as a warm start from which the parameters are further updated until convergence. The detailed procedure of the LSTM-assisted ADMM is exhibited in Algorithm 1.
In Algorithm 1, we collect the values of the dual and global variables during the first M iterations for the regression.

Remark: Algorithm 1 requires one round of global communication between each partition and the central controller: the dual and global variables of the first M iterations need to be input into the trained LSTM model for the regression of the converged values. However, since only this one communication is performed, its impact can be mitigated by designing a reasonable communication scheme.
The regression is made only once, at the M-th iteration. After that, the ADMM iterations continue to be executed according to (10)-(12) until convergence. Note that the LSTM is only used to regress ADMM-related parameters and does not participate in the system operation directly. That is, the distributed control strategy is still determined by solving the optimization model. Therefore, the safety constraints on line capacities and nodal voltages can be satisfied. Besides, the algorithm's convergence is still determined by the feasibility tolerance σ, guaranteeing the optimality of the solved strategy.
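The warm-start mechanism of Algorithm 1 can be illustrated on a toy two-area consensus problem. This is a sketch with assumed numbers: here the "regressed" values injected at iteration M are the analytically known optimum, whereas the paper obtains them from the LSTM, and the per-area subproblem is a simple quadratic rather than an OPF.

```python
import numpy as np

# Toy two-area consensus ADMM: f_a(x) = 0.5 * (x - t_a)^2 with consensus x1 = x2.
t = np.array([1.0, 3.0])  # hypothetical local targets
rho, sigma = 1.0, 1e-6    # penalty parameter and feasibility tolerance

def run(warm_start_at=None):
    x, lam, s = np.zeros(2), np.zeros(2), 0.0
    for m in range(1, 1001):
        x = (t + rho * (s - lam)) / (1.0 + rho)   # minimization step
        s = np.mean(x + lam)                      # averaging step
        lam = lam + x - s                         # dual update
        if warm_start_at is not None and m == warm_start_at:
            # Inject "regressed" converged values (analytic optimum here);
            # the duals are chosen to satisfy Lemma 1 (they sum to zero).
            s = 2.0
            lam = np.array([-1.0, 1.0])
        if np.max(np.abs(x - s)) < sigma:         # feasibility criterion
            return m
    return 1000

base, assisted = run(), run(warm_start_at=5)
print(base > assisted)  # True: the warm start reduces the iteration count
```

Even if the injected values were slightly biased, subsequent iterations would still refine them to the same tolerance σ, which is why the warm start does not affect optimality.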
Remark: The LSTM can be replaced with other machine learning techniques for time-series regression. This work explores the potential of using time-series regression to speed up ADMM and gives a practical algorithm.

E. Extension
The composition of the data set used to train the LSTM model significantly impacts the proposed algorithm's performance. Theoretically, as long as we collect enough scenarios, we can train an LSTM model to accelerate the convergence of ADMM under every circumstance.
However, we have noticed that the operating scenarios of the power system are very complex. Therefore, if the scenarios from different time periods are put into one data set to train the model, problems such as non-convergence of the model training and a decline in the acceleration effect on ADMM may occur.
To make our proposed method more widely applicable, we propose an alternative plan: training a separate LSTM model for each time period.
It is noteworthy that the same time period on different days has similar load and renewable-output characteristics. Therefore, if the training dataset is formed with scenarios from only one specific time period, both the convergence speed of the LSTM training and the acceleration effect on ADMM will be much better.
This alternative is relatively easy to realize from the perspective of engineering applications. For the VVC problem studied in this paper, the control time interval is 15 minutes, so only 96 LSTM models need to be trained to accelerate the ADMM in each time period. Alternatively, the scenarios within 1 h can be aggregated and 24 models trained accordingly.
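As a small engineering sketch of this per-period scheme (the function name and interface are our own, not from the paper), mapping a control instant to the corresponding model is a simple index computation:

```python
# Map a control instant to the index of the per-period LSTM model.
# With 15-minute control intervals there are 96 models per day; aggregating
# scenarios hourly reduces this to 24 models.

def model_index(hour, minute, interval_min=15):
    """Return which of the 24*60/interval_min per-period models to load."""
    if 24 * 60 % interval_min:
        raise ValueError("interval must divide the day evenly")
    return (hour * 60 + minute) // interval_min

print(model_index(0, 0))        # -> 0   (first 15-min model of the day)
print(model_index(13, 30))      # -> 54  (13*4 + 2)
print(model_index(13, 30, 60))  # -> 13  (hourly aggregation: 24 models)
```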

V. CASE STUDY
This section verifies the effectiveness of the proposed method. The parameters related to the test system and algorithm are given first. Then, the control performances of the conventional methods and the proposed LSTM-assisted method are compared. Since Lemma 1 is the foundation of the proposed method, we further discuss its importance and give the corresponding results.

A. System Configurations
1) System Parameters: The proposed method is demonstrated on the modified 33-bus system [53]. Two test systems with different area partitions are tested to verify the model's effectiveness. Four DGs are located at buses #18, #22, #25, and #33. Detailed information on the system, including system parameters, load profiles, and power outputs, is provided in [54]. The two test systems are shown in Figs. 5 and 6; the only difference between them is the area partition. The training of the LSTM involves many settings and hyperparameters, given as follows. According to the structure built in Fig. 4, the concatenated inputs are fed into the corresponding LSTM units and then pass through two dense layers, all with 256 hidden units. The sigmoid activation is applied in the LSTM unit layer [55]; rectified linear units are used in both dense layers. The mean squared error (MSE) loss function and the Adam optimizer are used for backpropagation. The batch size is set as 32. The learning rate decays from 0.1 to 1e-5 with a linear decay schedule. The training of the LSTM model is stopped when the validation accuracy stagnates or after 1000 epochs.
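To make the role of the sigmoid activation in the LSTM units concrete, the following self-contained NumPy sketch performs single LSTM-cell steps over a short sequence. The dimensions and random weights are illustrative only; the paper's actual model has 256 hidden units and two dense layers and is trained with the MSE loss and Adam, presumably in a standard deep learning framework.

```python
import numpy as np

# A single LSTM cell step in plain NumPy, to make the role of the sigmoid
# gate activations explicit. Dimensions are illustrative; weights here are
# random placeholders rather than trained parameters.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step: x input (n_in,), h hidden (n_h,), c cell state (n_h,).
    W: (4*n_h, n_in), U: (4*n_h, n_h), b: (4*n_h,), stacked as [i, f, g, o]."""
    n_h = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:n_h])            # input gate (sigmoid)
    f = sigmoid(z[n_h:2 * n_h])      # forget gate (sigmoid)
    g = np.tanh(z[2 * n_h:3 * n_h])  # candidate cell state
    o = sigmoid(z[3 * n_h:])         # output gate (sigmoid)
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_h = 24, 8                    # e.g. a 24-dim input, as in Case I
W = rng.normal(size=(4 * n_h, n_in)) * 0.1
U = rng.normal(size=(4 * n_h, n_h)) * 0.1
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):                   # feed the first M = 5 observations
    x_t = rng.normal(size=n_in)
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)  # (8,)
```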
Based on Lemma 1, 12 global variables and 12 dual variables are involved in the LSTM model for Case I in Fig. 5. Hence, the dimension of the inputs and outputs corresponding to Case I is 24. As for Case II in Fig. 6, 15 global variables and 16 dual variables are involved, leading to a size of 31 for the LSTM's input dimension. The other dual variables involved (12 in Case I; 15 in Case II) can be calculated with the regressions and (17). Then Lemma 1 is guaranteed automatically.
As for the parameters related to the conventional and LSTM-assisted ADMM, the penalty parameter ρ is chosen as 1e5. The feasibility tolerance σ is set as 1e-3, which is a practical choice [26].
All the mentioned methods are implemented using Python 3.9.5 and solved by Gurobi 9.5.1, on a personal computer with an Intel Core i7-12700 processor and 32 GB memory.

B. Effect of Inverter-Based Volt/Var Control
We first compare the voltage regulation effect with and without reactive power control to illustrate the impact of VVC. The voltage profiles of the 33-bus system under a typical scenario are given in Fig. 7.
The results show that inverters without reactive power control may face voltage issues. The voltage of bus #18 reaches 1.0472 p.u., which is relatively close to the upper limit of 1.05 p.u. This situation usually occurs when the output of the DGs is high while the loads are insufficient. IBDGs with reactive power control can alleviate this problem and guarantee voltage safety.

C. Effectiveness Analysis
In this section, we compare the results of the centralized method and the conventional and LSTM-assisted ADMM. The optimal solution obtained by the centralized method is used as the benchmark to compare the differences between the solutions obtained by the latter two methods. The voltage control effects and setting points are exhibited in Figs. 8 and 9, respectively.
Distributed optimization theory [56] guarantees the optimality of the conventional ADMM; therefore, it yields the same results as the centralized method. Also, the conventional ADMM and the LSTM-assisted ADMM achieve similar precision in the solved distributed solutions, because the feasibility tolerances for declaring convergence are set the same for both methods. The LSTM model only provides a better regression value for the global and dual variables than the initial value 0. The values satisfy Lemma 1 and hence can then be used for iterations with (10)-(12), showing that the LSTM only improves the parameters rather than replacing the conventional ADMM method.

D. Computational Analysis
On the premise of ensuring the precision of the distributed solutions, we further analyze the computational efficiency of the proposed method. Usually, ADMM needs many rounds of iterations to converge to the optimal values, and excessive solution time may cause the control strategy to fail.
To illustrate the efficiency of the LSTM-assisted ADMM, another 100 scenarios are generated for Case I with (15). The changes in the primal and dual residuals are exhibited in Fig. 10. The solid lines indicate the mean values over all runs of the 100 scenarios, and the shaded areas indicate the corresponding 95% confidence intervals.
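The residual traces in Fig. 10 follow the usual consensus-ADMM residual definitions; since the paper's exact norm expressions are in equations omitted from this excerpt, the form below is an assumption based on the standard formulation:

```python
# Assumed standard consensus-ADMM residuals: pri measures the consensus
# violation between local boundary copies and the global variable, and
# dual measures the (rho-scaled) change in the global variable between
# iterations m-1 and m.

from math import sqrt

def residuals(x_local, s, s_prev, rho):
    """x_local: local boundary copies; s / s_prev: matching global values
    at iterations m and m-1; rho: the ADMM penalty parameter."""
    pri = sqrt(sum((x - g) ** 2 for x, g in zip(x_local, s)))
    dual = rho * sqrt(sum((g - gp) ** 2 for g, gp in zip(s, s_prev)))
    return pri, dual

pri, dual = residuals([1.02, 0.98], [1.0, 1.0], [1.01, 0.99], rho=1e5)
print(pri, dual)  # roughly 0.0283 and 1414.2
```

Both residuals falling below the tolerance σ is the stopping condition used by the conventional and LSTM-assisted variants alike.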
In the simulation, the parameters {s^(m), λ^(m)}, m = 0, …, 4, of the first five iterations are input into the trained LSTM model. The model outputs are used as the 6th iterate of the subsequent ADMM procedure. As shown in Fig. 10, both residuals exhibit a step between the 5th and 6th iterations. This phenomenon is caused by the regression values differing considerably from the initial values (as illustrated in Section IV-C3 and Fig. 2). Nonetheless, the regression values are closer to the true converged values than the values obtained in the first few iterations, so the residuals decrease rapidly.
Case I shows a special case where each boundary bus is connected to only one adjacent area; that is, |N(g)| = 2 holds for all s_g. In this case, only half of the dual variables need to be regressed with the LSTM model, and the other half only need to take the corresponding inverse. Typically, VVC tends to take this special form of area partition. Case II corresponds to a more general case, where bus #6 in area 3 is connected to areas 2 and 4 at the same time. Hence, there are 3 dual variables corresponding to the global variable v_6, i.e., |N(g)| = 3. According to Lemma 1, the regression values of two of the three dual variables should be learned; the third can be solved with (17). The convergence processes of the primal and dual residuals for Case II are exhibited in Fig. 11.
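Assuming the complementary condition of Lemma 1 means that the dual variables attached to one global variable sum to zero (consistent with "taking the corresponding inverse" when |N(g)| = 2), recovering the non-regressed dual in the style of (17) is a one-line computation; the function name below is ours:

```python
# Recover the dual variable that is not regressed by the LSTM, assuming
# the complementary condition of Lemma 1 requires the dual variables
# attached to one global variable to sum to zero. For |N(g)| = 2 this
# reduces to taking the inverse of the regressed value.

def recover_last_dual(regressed):
    """regressed: the |N(g)| - 1 regressed duals for one global variable."""
    return -sum(regressed)

# Case I style: |N(g)| = 2, one regressed dual
print(recover_last_dual([0.7]))   # -> -0.7 (the "inverse")

# Case II style: |N(g)| = 3 for v_6, two regressed duals
lams = [0.4, -0.1]
lams.append(recover_last_dual(lams))
print(sum(lams))                  # -> 0.0 (Lemma 1 holds automatically)
```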
After analyzing the convergence process, we compare the speedup ratio of the LSTM-assisted ADMM. Cases I and II are each tested with 100 scenarios; the average numbers of iterations with the conventional and LSTM-assisted methods are given in Table I. The results indicate that the proposed method achieves a considerable speedup compared to the conventional one.
Admittedly, hyperparameter tuning for the LSTM model greatly impacts the convergence speed. However, the LSTM-assisted method can still outperform the conventional method, since the regression value is equivalent to providing a warm start to the ADMM iterations. Besides, since the LSTM technique is only used to assist the ADMM iterations, the safety of the system operation can be guaranteed; that is, constraints (3) and (4) in Section II-C will not be violated.
Remark: In Case II, two out of three dual variables are selected for regression. However, it is currently unclear, either qualitatively or quantitatively, how to select the dual variables for regression to improve the acceleration performance. Our simulation results suggest that choosing different dual variables can impact the convergence speed, but the effect is not significant. This will be discussed in the following section.
E. Scalability Analysis
To examine the scalability of the proposed method, it is further tested on a 123-bus system. The penalty parameter ρ is set as 2e5. All hyperparameters used by the LSTM in this system are consistent with those used in the 33-bus system, except for the sequence length M, which is set as 13. We will discuss how to select M in the next subsection.
We generated 5000 scenarios to train the model and used another 100 scenarios to verify the speedup effect of the proposed method on the large-scale system. The comparison results are shown in Table II.
The proposed method demonstrates strong scalability on large-scale systems, achieving a speedup ratio of 2.109. Optimizing the algorithm's hyperparameters is likely to enhance the acceleration effect further. We believe the proposed method can maintain a relatively stable acceleration effect owing to the inherent characteristics of ADMM: as previously noted, the convergence process of ADMM has a clear trend and can be modeled as a dynamic system. This attribute permits the LSTM network to swiftly learn the relevant rules from the first M iterations of the convergence parameters.

F. Impact of Hyperparameters
The proposed method combines the traditional distributed optimization method ADMM with the machine learning method LSTM and involves numerous hyperparameters. These hyperparameters can be classified into three categories based on whether they affect ADMM, the LSTM, or Algorithm 1.
1) Hyperparameters in ADMM: The penalty parameter ρ is the only hyperparameter that may affect ADMM. The value of ρ only determines how fast the ADMM converges. According to our analysis, the solution of ADMM does not impact the LSTM or the proposed Algorithm 1 directly.
However, the convergence process of ADMM is used for LSTM training, so it is necessary to choose a proper ρ in the underlying traditional ADMM to realize an overall speedup of the proposed Algorithm 1. There have been many studies on ρ selection in power system voltage control [26]. In practical applications, it is sufficient to select a ρ that makes the traditional ADMM converge faster according to the existing research.
2) Hyperparameters in LSTM: Hidden units, batch size, learning rates, number of epochs, and others affect the training and application of the LSTM. In this work, hyperparameter tuning was done through trial and error, by systematically testing different combinations.
Many scholars have carried out useful explorations of hyperparameter tuning in LSTM. Random search, grid search [58], and Bayesian optimization [59] have been proposed to assist tuning; these methods can further optimize the hyperparameters in the LSTM.
Since this paper does not make any improvements to the LSTM algorithm itself, in-depth hyperparameter tuning for the LSTM is not discussed. Nevertheless, it is worth noting that the LSTM used in Algorithm 1 can be replaced by other, more advanced time series regression methods, thereby reducing the influence of the machine learning hyperparameters on our algorithm.
3) Hyperparameters in the Proposed Algorithm: The proposed algorithm also involves hyperparameters that require selection and tuning. The sequence length M (also used in the LSTM), the global variables s_a^(m), and the dual variables λ_a^(m) are the main hyperparameters used in Algorithm 1.
The selection of M: We tried sequences of different lengths M for the regression of the converged values. The attempts show that growing the data sequence leads to a very long training time while the regression effect is not significantly improved. For sequence lengths from 3 to 10, we found that a length of 5 for the 33-bus system (13 for the 123-bus system) achieves a better speedup effect while keeping the training time within a reasonable range. Depending on the scale of the test system, M may need to be adjusted accordingly; at present, we can only determine M by trial and error. Fig. 12(a) and (b) show the relationship between the average number of iterations for convergence and the sequence length M in the two test systems.
The selection of λ_a: In this algorithm, all involved global variables s_a are used for model training and evaluation. However, according to Lemma 1, only (|N(g)| − 1) of the dual variables λ_a are needed to guarantee optimality. To illustrate the impact of λ_a on the algorithm's performance, we conducted the following simulations. In the 33-bus system (Case II), we experimented with different combinations of dual variables and performed a total of 3 simulations (because |N(g)| = 3 for the global variable v_6). While keeping other parameters unchanged, we observed that the average numbers of iterations for convergence fluctuated among 93.67, 92.87, and 87.64. The experimental results show that selecting different dual variables has limited impact on both the training and the utilization of the model. However, we cannot offer strict theoretical proof and rely on simulation results to support this conclusion. We believe that the complementarity among the dual variables (as described in Lemma 1) may explain why the algorithm is not very sensitive to variations in the input dual variables.

G. Discussion
In the preceding simulations, we analyzed the effectiveness and efficiency of the proposed method.In this section, we will discuss the effect of Lemma 1 on LSTM-assisted ADMM.
Take Case I as an example. If we neglect the existence of Lemma 1, all global and dual variables are treated as inputs to the LSTM model, and the dimensionality of the model input changes from 24 to 36. We retrain the model with the new inputs. Then, the new regression values for all 36 inputs are used directly in the following iteration.
The variation of the residuals with the ADMM iterations is shown in Fig. 13, which exhibits the first 100 iterations. We can observe that although the LSTM may have learned the convergence process and be able to make good regressions, the regressions do not necessarily satisfy Lemma 1. Under this circumstance, the proposed method cannot guarantee a speedup of the algorithm: when the model input is 36-dimensional rather than 24-dimensional, the iterations required for the LSTM-assisted ADMM to converge often exceed 500.
Besides, the regressions of the dual variables no longer satisfy (16). Hence, the LSTM-assisted ADMM cannot find the global optimum, since (11) can no longer be used to update the dual variables in the following iterations. Sometimes, although the algorithm converges, the value it converges to is incorrect. We compare the distributed voltage control effects under the same feasibility tolerance. The results in Fig. 14 show that the voltage magnitudes obtained from the LSTM-assisted ADMM deviate far from the optimal values. These discussions illustrate that it is essential to design the model inputs using the rules given in Section IV-B.

VI. CONCLUSION
This work proposes an ML-assisted algorithm to speed up the solution of the distributed VVC problem with inverter-based DGs. The method uses an LSTM to learn the convergence process of the conventional ADMM. The converged values of the global and dual variables involved in the conventional method can then be regressed with the trained LSTM model, and the distributed VVC solution can be obtained faster by using these regression values. Considering that the complementary condition needs to be met when solving the ADMM-based VVC problem, we designed the structure of the LSTM model and selected its inputs carefully. Since the proposed method is non-intrusive, its optimality and convergence are guaranteed by optimization theory, and the safety constraints are satisfied in the iterations. These properties are all proved in this paper.
We test the effectiveness of reactive power control for voltage regulation with two modified 33-bus systems. The precision of the distributed strategies solved by the conventional and LSTM-assisted methods is the same under the same feasibility tolerances. As for efficiency, the LSTM-assisted method is faster: over 100 tested scenarios, the average number of iterations required to converge dropped from 114.17 to 47.34, a speedup of roughly 2.4 times. The results show that the ML-assisted method is superior to the conventional one.

APPENDIX COMPLEMENTARY CONDITION FOR FULLY-DISTRIBUTED ADMM
In the conventional ADMM, the following step is used to update the global variables s during the iterations. Usually, s_a^(m) for each area a can be updated separately in the respective areas with the necessary information exchange. For each element s_g ∈ s, the analytical form of the s_g^(m) update can be written as in [56]. Each global variable s_g^(m) is thus determined by the local copies and dual variables of its adjacent areas. Therefore, if (11) is used to update the dual variables in the fully-distributed ADMM, the complementary condition in Lemma 1 must be satisfied; otherwise, (20) must be used for updating. In practice, (11) is more commonly used because its calculation is more straightforward and fewer information exchanges are required.
We use the LSTM model to regress the global and dual variables in this work.If we still want to obtain the optimal distributed solutions, the regressions should also satisfy Lemma 1.
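The reduction of the full update to the simpler form can be checked numerically. The sketch below assumes the standard consensus-ADMM shapes of the two updates (average of x_h + λ_h/ρ versus a plain average of x_h), which matches the discussion above but is not copied from the paper's omitted equations:

```python
# The full s-update (assumed form of (20)) averages x_h + lam_h / rho over
# the areas h in N(g); the simpler form (assumed form of (11)) averages only
# the local copies x_h. They coincide exactly when the dual variables
# attached to the global variable sum to zero, i.e., when Lemma 1 holds.

def s_update_full(x_adj, lam_adj, rho):
    return sum(x + l / rho for x, l in zip(x_adj, lam_adj)) / len(x_adj)

def s_update_simple(x_adj):
    return sum(x_adj) / len(x_adj)

x_adj = [1.01, 0.99, 1.00]   # local boundary copies from the areas in N(g)
lam_ok = [0.2, -0.5, 0.3]    # satisfies the complementary condition (sum = 0)
lam_bad = [0.2, 0.2, 0.2]    # violates it (sum = 0.6)

print(abs(s_update_full(x_adj, lam_ok, 1.0) - s_update_simple(x_adj)) < 1e-12)
print(abs(s_update_full(x_adj, lam_bad, 1.0) - s_update_simple(x_adj)) > 1e-3)
```

Both checks print True: with the complementary condition, the dual terms cancel and the simple average suffices; without it, the two updates diverge.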

Fig. 1 .
Fig. 1. Example of area partition in the VVC problem.
The updates of s_a^(m) and λ_a^(m) are completed only by exchanging the values of boundary variables among neighbors. The convergence criterion for the conventional ADMM includes the primal residual δ_pri^(m) and the dual residual δ_dual^(m), which can be calculated as follows:

Fig. 3 .
Fig. 3. The process of LSTM-assisted ADMM training and application in the VVC problem.
The curves are regular (the composition of s_a^(m) and λ_a^(m) can be found in Fig. 1). Different curves in Fig. 2 represent different variables in s_a^(m) and λ_a^(m). During the iterations, the global variables s_a^(m) and dual variables λ_a^(m) are updated in each area.

Fig. 4 .
Fig. 4. Structure of the LSTM model used to speed up ADMM.
in the first M iterations. Then, the time series data {λ_a^(m), s_a^(m) | ∀a}, m = 0, …, M−1, is fed into the trained LSTM model for the regression of the converged values.

2) Algorithm-Related Parameters: With (15), 1000 scenarios are generated with ξ_i^D ∈ [−0.1, 0.1] and ξ_i^G ∈ [−0.1, 0.1]. The optimal distributed solutions of these scenarios are solved by the conventional ADMM first. Then, the parameters produced during the convergence are used to train the LSTM model.
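The scenario generation described above can be illustrated with a hedged sketch: the exact form of (15) is not reproduced in this excerpt, so the multiplicative perturbation of base loads and DG outputs by uniform factors ξ ∈ [−0.1, 0.1] is an assumption.

```python
import random

# Hedged sketch of (15)-style scenario generation: base loads and DG outputs
# are perturbed by independent uniform factors xi in [-0.1, 0.1]. The exact
# functional form of (15) is an assumption here.

def generate_scenarios(base_load, base_gen, n=1000, width=0.1, seed=0):
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        loads = [p * (1 + rng.uniform(-width, width)) for p in base_load]
        gens = [g * (1 + rng.uniform(-width, width)) for g in base_gen]
        scenarios.append((loads, gens))
    return scenarios

# Illustrative base values in p.u.; not taken from the paper's data.
scen = generate_scenarios([0.1, 0.2, 0.05], [0.5, 0.4], n=1000)
print(len(scen))  # 1000
```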

Fig. 7 .
Fig. 7. Comparison of voltage regulation effect with and without reactive power control.

Fig. 10 .
Fig. 10. Changes in primal and dual residuals for conventional and LSTM-assisted ADMM (for Case I).

Fig. 11 .
Fig. 11. Changes in primal and dual residuals for conventional and LSTM-assisted ADMM (for Case II).

Fig. 12 .
Fig. 12. The average number of iterations for convergence (33-bus and 123-bus systems). (a) and (b) show the relationship between the average number of iterations for convergence and the sequence length M in the two test systems.
obtain the simpler form (11) from (20), the following complementary condition needs to be satisfied for each global variable s_g. That is, we have: Σ_{h∈N(g)} λ_adj^{h(m)} = 0. Thus, for Σ_{h∈N(g)} λ_adj^{h(m)}, we have:
Σ_{h∈N(g)} λ_adj^{h(m)} = Σ_{h∈N(g)} λ_adj^{h(m−1)} + ρ Σ_{h∈N(g)} (x_adj^{h(m)} − s_g^(m))  (22)
If (11) is used for variable updating, we can derive the following relationships: the initial value for λ is always set as 0. With this setting, (23) can be further derived as:

TABLE I
THE AVERAGE NUMBER OF ITERATIONS FOR CONVERGENCE (FOR 33-BUS SYSTEM CASE I)

TABLE II
THE AVERAGE NUMBER OF ITERATIONS FOR CONVERGENCE (FOR 123-BUS SYSTEM)