Two-stage ANN-based bidding strategy for a load aggregator using decentralized equivalent rival concept

As an intermediary between the wholesale and retail electricity markets, a typical load aggregator (LA) submits an optimal bid to the system operator to meet the expected demand of its customers. An effective bidding strategy is therefore crucial for an LA to increase its profit. Within this context, this paper proposes a two-stage artificial neural network (ANN)-based adaptive bidding strategy procedure for an LA. The procedure reveals, models, and predicts the aggregate behaviour of the competitors in an hourly electricity market. To this end, we develop the concept of a decentralized equivalent rival (DER), whose behaviour in the electricity market reflects the aggregate behaviour of all individual competitors. We also model an equivalent market whose outcomes are approximately equal to those of the real market. The participants of the equivalent market are the LA and its corresponding DER. The proposed approach accounts for transmission constraints. Its performance is examined on an illustrative example and on the IEEE 30-bus test system with transmission network constraints. The proposed ANN-based adaptive bidding strategy is compared with a Q-learning-based bidding approach, and the results are analysed.


INTRODUCTION
To meet the expected demand of its customers, a load aggregator (LA) purchases the required energy by submitting its optimal bid to the wholesale electricity market. An LA can thus be regarded as an intermediary between the system operator and scattered small-scale consumers. Through an LA, small consumers can participate indirectly in the electricity market, and the system operator gains additional flexibility. However, these benefits are realized only when the LA adopts an effective bidding strategy. A few notable works address LA bidding strategies. In [1], a bidding strategy for an LA is proposed to reduce the risk of profit loss arising from price volatility. In [2], a bi-level optimization model is adopted to maximize the profit of an LA in a day-ahead forward market for energy and reserve. In [3], an LA bidding strategy is proposed with a case study in Taiwan. It is based on an artificial neural network (ANN) and fuzzy logic and addresses price and load reduction.
Designing an effective bidding strategy for an LA depends heavily on the rivals' behaviour. However, the above-mentioned papers do not model this behaviour, which is an important gap.
In the electricity market literature, a few techniques provide additional information about competitors. These include scenario-based approaches [4,5], price and load forecasting [6-14], and competitors' behaviour analysis [15,16]. In scenario-based approaches, uncertainties are modelled using a set of scenarios. Examples of such uncertainties are rivals' offers, market prices, wind power production, and demand bids. These approaches cannot precisely model the competitors' behaviour or the impact of that behaviour on market outcomes and participants' results. Forecasting techniques have small average errors. However, they cannot model actual electricity price fluctuations or the influence of individual participants' behaviour on the forecasted parameters [17].
Studies focusing on competitors' behaviour analysis mostly try to estimate the market players' aggregate supply curve from data that are assumed to be available. In most cases, these assumptions are incompatible with the reality of electricity markets. For example, [18] assumes that the participants' marginal costs are available. It then proposes a bid function equilibrium model to predict the participants' supply curves from this information, which is not available in real electricity markets. In [19], an inverse optimization method is proposed to estimate the market participants' historical bids, which could then be used to estimate the aggregate supply curve. The basic assumption of [19] is that the market players' technical characteristics and allocated power are available. In practice, this information is confidential.
In our previous paper [20], we proposed an approach to model and predict the market participants' behaviour based on practically available data. There, we introduced the concept of the equivalent rival, whose behaviour in the electricity market reflects the aggregate behaviour of all individual competitors. We used this concept to obtain the competitors' behaviour from the viewpoint of a power plant. In [21], a Bayesian inference approach based on Markov chain Monte Carlo and sequential Monte Carlo methods is proposed to estimate the net aggregate supply curve. In [20], we showed that our approach estimates the net aggregate supply curve more accurately than the method of [21]. Moreover, the approach of [20] predicts the power allocated to the intended power plant and the market-clearing price with high accuracy. The resulting model of competitors' behaviour was then used to obtain the optimal bidding strategy. However, the approach of [20] addresses a one-sided electricity market and neglects transmission constraints. We extend the equivalent rival concept of [20] into the decentralized equivalent rival (DER) concept. This extension accounts for demand-side bidding, transmission line constraints, and congestion effects.
In this paper, we show that the proposed approach, which is based on the DER concept, can effectively model and predict the competitors' behaviour. This holds in a two-sided electricity market with transmission line constraints. Based on the DER concept, we propose a bidding strategy from the perspective of a load aggregator. The main advantages of the proposed approach are as follows.
(i) It models not only the net aggregate supply curve but also the rivals' behaviour from the viewpoint of a load aggregator.
(ii) It uses practically available data and overcomes the data shortage by developing the equivalent market and introducing the DER. It can accurately predict the market results for each possible bid of the intended load aggregator and find the optimal bid based on this prediction. As a result, the actual electricity price fluctuations become predictable, as does the influence of individual participants' behaviour on market outcomes and participants' results.
(iii) It is developed for a two-sided electricity market with transmission line constraints, and no further restriction is imposed on its application.
Within this framework, the major contributions of this paper are:
- a novel ANN-based bidding strategy for an LA;
- a DER concept for a two-sided electricity market with power network constraints;
- a method to reveal and model the competitors' behaviour from an LA's point of view, using the DER concept and practically available data;
- a method to predict the power allocated to the intended LA and the Local Marginal Price (LMP) at its bus.

The paper is organized as follows. Section 1 is the introduction. Section 2.1 gives an overview of the proposed approach by presenting the concepts of the DER and the equivalent market. Section 2.2 introduces the notation used throughout the paper. Section 3 presents the market-clearing model and describes the development of the equivalent market. Section 4 presents the proposed two-stage ANN-based method for revealing and modelling the competitors' behaviour. Within it, Sections 4.1, 4.2, and 4.3 cover data fitting via ANN, the first stage (bid estimation), and the second stage (bid prediction), respectively. Section 5 describes how the model obtained in Section 4 is used in the LA's bidding strategy. Sections 6.1 and 6.2 examine the effectiveness of the proposed approach on an illustrative example and on the IEEE 30-bus test system, respectively. Section 6.3 provides the robustness and adaptability analysis. Finally, Section 7 concludes the paper.

Approach overview
The DER concept extends the equivalent rival concept presented in [20]. In this paper, we define a DER entity from the viewpoint of the intended LA. The DER has one generation unit and one demand unit at each bus. The LA and its corresponding DER compete in an equivalent market, where the DER's actions reflect the aggregate behaviour of all competitors in the real market. From the LA's perspective, this approach replaces the real electricity market, which has an unknown number of players, by an equivalent market. The equivalent market has two completely known players: the LA and its DER. The actions of the DER's units (its generation and demand units at each bus) must be chosen so that the equivalent market-clearing results are almost equal to those of the real market. Determining the bidding strategies of the DER's units thus models the competitors' behaviour. With this model, the LA can predict the market-clearing results for any of its own bids with sufficient accuracy. These results are the power allocated to the LA and the LMP at its bus. The LA uses this additional information to optimize its bidding strategy.

We assume that the power system topology, the transmission line constraints, and the LMPs are available to all players. We also assume that the total network load and the aggregate production of renewable sources are predictable over short time periods. Finally, we assume that the strategic players set their bids based on the remaining load, i.e., the total network load minus the aggregate renewable production.

The two-stage ANN-based approach designed in this paper consists of four steps:
(i) Equivalent market development.
(ii) Bid estimation: estimating the bids of the DER units for previous market runs.
(iii) Bid prediction: revealing the bidding strategy of each DER unit and predicting its next bid.
(iv) Bid optimization: determining the optimal bid of the LA based on the predictions of step (iii).

Figure 1 presents a holistic view of the proposed approach. The details are provided in the following sections.
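Steps (ii) to (iv) can be sketched as one function for hour H. This is a minimal sketch under one assumption: the equivalent market of step (i), the bid estimator, the bid predictor trainer, and the LA profit evaluation are passed in as plain callables. The names `choose_next_bid`, `estimate_der_bids`, `fit_predictor`, `predict_la_profit`, and the `remaining_load` record key are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: one past-hour record, and a flat tuple of DER bid parameters.
HourRecord = Dict[str, float]
DerBids = Tuple[float, ...]


def choose_next_bid(
    history: Sequence[HourRecord],
    remaining_load_next: float,
    bid_options: Sequence[float],
    estimate_der_bids: Callable[[HourRecord], DerBids],
    fit_predictor: Callable[[List[float], List[DerBids]], Callable[[float], DerBids]],
    predict_la_profit: Callable[[float, DerBids], float],
) -> float:
    """Steps (ii)-(iv) for hour H; step (i), the equivalent market, is assumed
    to be built into the callables (hypothetical interface)."""
    # Step (ii): estimate DER bids for every past hour h < H.
    loads = [rec["remaining_load"] for rec in history]
    der_history = [estimate_der_bids(rec) for rec in history]
    # Step (iii): learn the map remaining load -> DER bids, then predict hour H.
    predictor = fit_predictor(loads, der_history)
    der_next = predictor(remaining_load_next)
    # Step (iv): evaluate every candidate LA bid against the predicted DER bids.
    return max(bid_options, key=lambda bid: predict_la_profit(bid, der_next))
```

Keeping the stages behind callables mirrors Figure 1. The equivalent-market model, the two ANNs, and the profit model can each be swapped without changing the loop.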

Notation
LMP_lh: Local Marginal Price (LMP) of bus l at hour h.
LMP_h: vector of LMPs at hour h.
LMP″_lh: LMP of bus l at hour h in the equivalent market (estimate of LMP_lh).
H: hour for which the LA decides on its bid.
P_gih: active power produced by generation unit i at hour h in the real market.
P_djh: active power consumed by demand unit j at hour h in the real market.
P′_gxh: active power produced by the xth generation unit of the DER at hour h in the equivalent market.
P′_dxh: active power consumed by the xth demand unit of the DER at hour h in the equivalent market.
P″_dkh: active power consumed by demand unit k (the LA) at hour h in the equivalent market (estimate of P_dkh).
The marginal cost functions of generation unit i and demand unit j are each characterized by a slope and an intercept.
a_ih, b_ih: slope and intercept of the bid function of generation unit i at hour h.
c_jh, d_jh: slope and intercept of the bid function of demand unit j at hour h.
P′min_gx, P′max_gx (P′min_dx, P′max_dx): minimum and maximum active power limits of the xth generation (demand) unit of the DER in the equivalent market.
Pmax_Dag: maximum aggregate power consumption in the market.
P_grnh: total power generation of renewable resources at hour h.
P_Th: total network load at hour h.
MAPE: mean absolute percentage error.

EQUIVALENT MARKET DEVELOPMENT
Assume that n producers and m consumers participate in the real market of a power system with L buses. Without loss of generality, an hour-ahead market is considered. Each player (generation or demand unit) has a cost function, a marginal cost function, and a bid function, given in Equations (1)-(3), respectively.
In Equation (1a), C_gi(P_gih) is the cost function of producer i, i.e., the cost of generating power P_gih at hour h. In Equation (1b), C_dj(P_djh) is the cost function of the jth LA, i.e., the cost associated with consuming power P_djh at hour h. Equation (2) gives the marginal cost function of each player, which is the derivative of its cost function. Equation (3) gives the players' bid functions. In a competitive market, players increase their profit by adopting strategic bid functions. Based on its cost function and bid function, each player submits its bid parameters to the independent system operator (ISO). These are a_ih and b_ih for producers and c_jh and d_jh for consumers. The ISO clears the market by solving an optimization problem over the decision variables P_djh, ∀j, and P_gih, ∀i (Equations (4)-(8)). This problem maximizes social welfare subject to the operational constraints. Equation (4) is the social welfare to be maximized by the ISO. The constraint of Equation (5) ensures the balance between generated and consumed active power at each bus. The constraint of Equation (6) keeps the active power flow of each line within its limits. Equations (7) and (8) keep the active power of each generation and demand unit within its minimum and maximum limits.

The first step of the proposed approach is to develop an equivalent market whose outcomes are approximately equal to those of the real market. The real and equivalent markets have exactly the same number of buses (L), power network topology, and transmission network constraints. The real market has n + m players located at different buses: n producers and m consumers. The equivalent market has two players: the LA and the DER. The DER's actions in the equivalent market reflect the aggregate behaviour of all individual competitors in the real market. The DER has L production units and L demand units, i.e., one production unit and one demand unit at each bus. Therefore, the equivalent market has 2 players with 2L + 1 units.
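The welfare-maximizing clearing of Equations (4)-(8) can be made concrete with a minimal single-bus sketch. The sketch ignores the line limits of Equation (6) and rests on two assumptions about the bids. A producer's bid price is a·P + b. A consumer's bid price is d − c·P with c > 0; this sign convention is an assumption, since Equation (3) itself is not shown here. At the clearing price λ, each unit dispatches its inverted bid curve clipped to its limits. Because total supply minus total demand is non-decreasing in λ, the price can be found by bisection. The names `Unit` and `clear_single_bus` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Unit:
    slope: float      # a_ih for producers; c_jh (assumed > 0) for consumers
    intercept: float  # b_ih for producers; d_jh for consumers
    p_min: float
    p_max: float
    is_producer: bool

    def dispatch(self, price: float) -> float:
        """Power this unit supplies/consumes at the given price, clipped to its limits."""
        if self.is_producer:   # assumed bid: price = a * P + b
            p = (price - self.intercept) / self.slope
        else:                  # assumed bid: price = d - c * P
            p = (self.intercept - price) / self.slope
        return min(max(p, self.p_min), self.p_max)


def clear_single_bus(units: List[Unit], lo: float = 0.0, hi: float = 1000.0,
                     iters: int = 100) -> Tuple[float, List[float]]:
    """Bisect on the price until total supply matches total demand (no network)."""
    def imbalance(price: float) -> float:
        supply = sum(u.dispatch(price) for u in units if u.is_producer)
        demand = sum(u.dispatch(price) for u in units if not u.is_producer)
        return supply - demand  # non-decreasing in price

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0:
            hi = mid
        else:
            lo = mid
    price = 0.5 * (lo + hi)
    return price, [u.dispatch(price) for u in units]
```

For a producer with a = 0.1, b = 10 and a consumer with c = 0.1, d = 50, the two curves cross at 200 MW and 30 $/MWh. With transmission constraints, the same idea requires a full DC optimal power flow solver instead of a scalar bisection.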
In the real market, the LA has no information about the minimum and maximum limits of the competitors' power generation and consumption. In the equivalent market, by contrast, the limits of the rival (the DER) are known to the LA. This is because we set the limits of the DER units so that the market-clearing model of the equivalent market is feasible under any condition. For this purpose, the minimum active power limit of all DER generation and demand units is set to zero, the lowest possible value. The maximum active power limit of each unit is set to an arbitrary value higher than the maximum expected load in the real market. Accordingly, the power produced/consumed by the DER's generation/demand units in the equivalent market is bounded as follows:
$$0 \le P'_{gxh} \le P'^{\max}_{gx}, \quad 0 \le P'_{dxh} \le P'^{\max}_{dx}, \quad P'^{\max}_{gx} > P^{\max}_{Dag}, \quad P'^{\max}_{dx} > P^{\max}_{Dag}, \quad \forall x \in \{1, \dots, L\} \qquad (9)$$
Since P′max_dx > Pmax_Dag, each demand unit of the DER can represent all consumers of the real market on its own. Likewise, since P′max_gx > Pmax_Dag, each generation unit of the DER can represent all producers of the real market on its own. These settings let the generation and demand units of the equivalent market model the producers and consumers of the real market under any operational condition.
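The unit set of the equivalent market and its limits from Equation (9) can be written down directly. This is a minimal sketch. It assumes the DER upper limits are a multiplicative margin above Pmax_Dag (any value above Pmax_Dag satisfies Equation (9)), and the LA's own limits are supplied by the caller. `EqUnit`, `build_equivalent_market`, and `margin` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EqUnit:
    name: str
    bus: int
    kind: str     # "gen" or "demand"
    p_min: float
    p_max: float


def build_equivalent_market(n_buses: int, la_bus: int, la_p_min: float,
                            la_p_max: float, p_max_dag: float,
                            margin: float = 1.1) -> List[EqUnit]:
    """LA plus one DER generation unit and one DER demand unit at every bus."""
    if margin <= 1.0:
        # Equation (9) requires the DER upper limits to exceed Pmax_Dag.
        raise ValueError("margin must be > 1")
    cap = margin * p_max_dag  # assumed choice of P'max_gx = P'max_dx
    units = [EqUnit("LA", la_bus, "demand", la_p_min, la_p_max)]
    for bus in range(1, n_buses + 1):
        units.append(EqUnit(f"Gu{bus}", bus, "gen", 0.0, cap))
        units.append(EqUnit(f"Du{bus}", bus, "demand", 0.0, cap))
    return units
```

For L = 5 this yields 2L + 1 = 11 units. For L = 30 it yields 30 generation units and 30 + 1 demand units, which matches the unit counts stated for the test systems.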
Assume that the LA is indexed by k and located at bus T. Then Equations (4)-(8) of the real market correspond to Equations (10)-(15) of the equivalent market, which form the DC optimal power flow model of the equivalent market. From the LA's point of view, the solution of this model should be approximately equal to that of the DC optimal power flow model of the real market. That is, the power allocated to the LA and the LMPs of all buses should be approximately equal in both solutions:
$$P''_{dkh} \approx P_{dkh}, \qquad LMP''_{lh} \approx LMP_{lh}, \quad \forall l \in \{1, 2, \dots, L\} \qquad (16)$$
Therefore, we need to find bid parameters for the DER units (a′_jh, b′_jh, c′_jh, d′_jh, ∀h) that achieve this goal. In the next section, we propose a two-stage ANN-based method that finds such bid parameters from the power allocated to the LA and the LMPs of all buses in the real market.
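One common way to write a DC optimal power flow for the equivalent market is shown below. It is a sketch under stated assumptions, not necessarily the exact form of Equations (10)-(15).

The assumptions are:
- Bids are linear, so each welfare term is the integral of a bid curve. A producer's bid price is a′_xh P + b′_xh. A consumer's bid price is d − c P with c > 0.
- Line flows are expressed through power transfer distribution factors PTDF_ℓx with line limits F^max_ℓ, instead of per-bus balance with voltage angles.
- P^min_dk and P^max_dk denote the LA's own limits.

PTDF_ℓx, F^max_ℓ, P^min_dk, and P^max_dk are symbols introduced here for illustration. DER units are indexed by their bus x.

```latex
\begin{aligned}
\max_{P'_{gxh},\,P'_{dxh},\,P''_{dkh}} \quad
  & \Big(d_{kh}P''_{dkh} - \tfrac{1}{2}c_{kh}{P''_{dkh}}^{2}\Big)
    + \sum_{x=1}^{L}\Big(d'_{xh}P'_{dxh} - \tfrac{1}{2}c'_{xh}{P'_{dxh}}^{2}\Big)
    - \sum_{x=1}^{L}\Big(\tfrac{1}{2}a'_{xh}{P'_{gxh}}^{2} + b'_{xh}P'_{gxh}\Big) \\
\text{s.t.} \quad
  & \sum_{x=1}^{L}P'_{gxh} - \sum_{x=1}^{L}P'_{dxh} - P''_{dkh} = 0 \\
  & \Big|\sum_{x=1}^{L}\mathrm{PTDF}_{\ell x}\big(P'_{gxh} - P'_{dxh}\big)
      - \mathrm{PTDF}_{\ell T}\,P''_{dkh}\Big| \le F^{\max}_{\ell}, \quad \forall \ell \\
  & 0 \le P'_{gxh} \le P'^{\max}_{gx}, \quad 0 \le P'_{dxh} \le P'^{\max}_{dx}, \quad \forall x \\
  & P^{\min}_{dk} \le P''_{dkh} \le P^{\max}_{dk}
\end{aligned}
```

In this form, the equivalent-market prices LMP″_lh are obtained from the dual variables of the balance and line-flow constraints.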

DECENTRALIZED EQUIVALENT RIVAL'S BID REVEALING
In this section, a two-stage ANN-based procedure is devised to reveal the bid parameters of the DER units. The data obtained in the first stage (bid estimation) are used to train the bid predictor in the second stage. Figure 2 shows the conceptual diagram of these stages. Section 4.1 briefly introduces data fitting via ANN; Sections 4.2 and 4.3 then present the two stages.

ANN and data fitting
Artificial neural networks are used to solve complex problems such as fitting problems. In a fitting problem, a neural network maps a data set of numeric inputs to a set of numeric targets. The usual procedure has four steps:
(i) Collect or generate a set of training data. It consists of network inputs and the corresponding outputs, called the input set and the target set, respectively.
(ii) Choose the network architecture. Many neural networks have been introduced for different applications, each with its own advantages and disadvantages. Perceptron, backpropagation, competitive, Grossberg, and Hopfield networks are among the most popular. A two-layer network with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer can fit multi-dimensional mapping problems arbitrarily well, given consistent data and enough neurons in its hidden layer [22]. We use such a network in this paper.
(iii) Choose the number of hidden-layer neurons according to the complexity of the problem and the desired accuracy.
(iv) Train the network. The training algorithm adjusts the weights and biases of the layers by minimizing a specified criterion, so that the network maps inputs to targets.
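The four fitting steps can be sketched in dependency-free Python. This is a minimal sketch under several assumptions: a scalar input, a scalar target, and full-batch gradient descent on the mean squared error. The class name `TwoLayerNet`, the learning rate, and the epoch count are illustrative choices, not the training algorithm used in the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TwoLayerNet:
    """Scalar input -> n_hidden sigmoid neurons -> one linear output neuron."""

    def __init__(self, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: float) -> Tuple[List[float], float]:
        hidden = [sigmoid(w * x + b) for w, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def mse(self, data: Sequence[Tuple[float, float]]) -> float:
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    def train(self, data: Sequence[Tuple[float, float]], lr: float = 0.05,
              epochs: int = 2000) -> None:
        """Full-batch gradient descent on the mean squared error."""
        n, k = len(data), len(self.w1)
        for _ in range(epochs):
            g_w1, g_b1, g_w2 = [0.0] * k, [0.0] * k, [0.0] * k
            g_b2 = 0.0
            for x, t in data:
                hidden, y = self.forward(x)
                d_y = 2.0 * (y - t) / n  # dMSE/dy for this sample
                g_b2 += d_y
                for i in range(k):
                    g_w2[i] += d_y * hidden[i]
                    # Back-propagate through the sigmoid: s'(z) = s * (1 - s).
                    d_z = d_y * self.w2[i] * hidden[i] * (1.0 - hidden[i])
                    g_w1[i] += d_z * x
                    g_b1[i] += d_z
            # Apply the accumulated gradients once per epoch.
            self.b2 -= lr * g_b2
            for i in range(k):
                self.w1[i] -= lr * g_w1[i]
                self.b1[i] -= lr * g_b1[i]
                self.w2[i] -= lr * g_w2[i]
```

A usage example: fit t = x² on 21 points in [-1, 1] with five hidden neurons, then compare `net.mse(data)` before and after `net.train(data)`. The bid estimator and bid predictor use the same structure, with vector inputs and outputs.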

The first stage: Bid estimation
This stage estimates the slopes and intercepts of the bid functions of the DER units for previous hours (h < H). The goal is to make the equivalent market equivalent to the real market, i.e., to satisfy Equation (16). This must be done using data that are practically available to the LA, namely the power allocated to the LA and the LMPs of all buses in the real market. Given these data and Equation (16), the power allocated to the LA and the LMPs of all buses in the equivalent market are also known. By Equation (9), the production/consumption limits of all players in the equivalent market are known. Moreover, as stated earlier, the power system topology and the transmission network constraints are assumed to be known. Therefore, Equations (11)-(15) are completely known to the LA. Using this information and a well-designed ANN, we estimate DER bid parameters such that the equivalent market-clearing results of Equations (11)-(15) satisfy Equation (16).

For this purpose, we design an ANN with the following inputs: the LA's bid parameters (c_kh, d_kh), the power allocated to the LA (P″_dkh ≈ P_dkh), and the LMPs of all buses (LMP″_lh ≈ LMP_lh, ∀l ∈ {1, 2, …, L}). Its outputs are bid parameters for the DER units. Figure 3 shows the input-output diagram of this ANN, which we call the bid estimator. To design the bid estimator, a suitable neural network topology is first chosen. A two-layer feed-forward perceptron with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer is a good choice. The ANN is then trained with appropriate training data.
Figure: The input-output diagram of ANN2 at time H.
To provide the training data, we solve the DC optimal power flow model of the equivalent market (Equations (10)-(15)) for an arbitrary number of random bid sets. We then pair the results with the corresponding bid sets as input-output data for the bid estimator, as shown in Figure 3. After training, the bid estimator can be applied to every previous hour: from the LMPs, the LA's bid parameters, and the power allocated to the LA, it estimates the bid parameters of the DER units. In this way, the bidding history of the DER units is estimated. Section 4.3 then uses this estimated bidding history to design another ANN that predicts the bid parameters of the DER units for the next hour.
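The training-data loop for the bid estimator can be sketched as follows: draw random DER bids and a random LA bid, clear the equivalent market, and record (LA bid, LA power, price) as input and the DER bids as target. This is a minimal sketch that deliberately simplifies the setting and rests on several assumptions:
- A single bus with one DER generation unit and one DER demand unit replaces the L-bus DC optimal power flow.
- The sampling ranges are illustrative.
- The LA is capped at 50 MW.
- Bids follow the linear convention: producer price a·P + b, consumer price d − c·P.

In this single-bus toy, the inverse map from market results to DER bids is not unique. The snippet therefore only shows how samples are produced, not an identifiable estimation problem. `make_estimator_data` and `clear_equivalent_single_bus` are hypothetical names.

```python
import random
from typing import List, Tuple

# (slope, intercept, p_max, is_producer); p_min is taken as 0 for every unit.
UnitSpec = Tuple[float, float, float, bool]
Sample = Tuple[List[float], List[float]]  # (estimator input, estimator target)


def _dispatch(unit: UnitSpec, price: float) -> float:
    slope, intercept, p_max, producer = unit
    p = (price - intercept) / slope if producer else (intercept - price) / slope
    return min(max(p, 0.0), p_max)


def clear_equivalent_single_bus(units: List[UnitSpec], lo: float = 0.0,
                                hi: float = 1000.0, iters: int = 80):
    """Bisection on price; supply minus demand is non-decreasing in price."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        net = sum(_dispatch(u, mid) * (1.0 if u[3] else -1.0) for u in units)
        if net > 0:
            hi = mid
        else:
            lo = mid
    price = 0.5 * (lo + hi)
    return price, [_dispatch(u, price) for u in units]


def make_estimator_data(n_samples: int, der_cap: float = 1000.0,
                        seed: int = 0) -> List[Sample]:
    rng = random.Random(seed)
    data: List[Sample] = []
    for _ in range(n_samples):
        # Random DER bids (illustrative ranges): generator (a', b'), demand (c', d').
        a, b = rng.uniform(0.01, 0.1), rng.uniform(10.0, 30.0)
        c, d = rng.uniform(0.01, 0.1), rng.uniform(40.0, 80.0)
        # Random LA bid (c_k, d_k); the 50 MW LA cap is an assumption.
        c_k, d_k = rng.uniform(0.05, 0.2), rng.uniform(30.0, 70.0)
        units = [(a, b, der_cap, True), (c, d, der_cap, False), (c_k, d_k, 50.0, False)]
        price, powers = clear_equivalent_single_bus(units)
        # Input: what the LA would observe; target: the DER bids that produced it.
        data.append(([c_k, d_k, powers[2], price], [a, b, c, d]))
    return data
```

Because samples come from clearing the equivalent market rather than from the real market, their number and sampling scheme can be chosen freely.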

The second stage: Bid prediction
This stage designs an ANN that predicts the slopes and intercepts of the bid functions of the DER units at hour H. Without loss of generality, we assume that the bidding strategy of each player is a function of the remaining load. The remaining load is the total network load minus the aggregate production of renewable sources (P_Th − P_grnh). Since the DER's behaviour is equivalent to the competitors' behaviour in the real market, the bidding strategies of the DER's units can also be assumed to be functions of the remaining load. Once these bidding strategies are discovered, the next-hour bids can be predicted. Since nothing is known about the form of these functions, each one is treated as a black-box model.
An ANN can approximate these functions if enough input-output samples are available. Figure 4 shows the input-output diagram of this ANN, which we call the bid predictor. To design the bid predictor, a suitable network topology is first chosen. Again, a two-layer feed-forward perceptron with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer is a good choice. The ANN is then trained with input samples and their corresponding outputs. The total network load and the aggregate renewable production for previous hours (P_Th, P_grnh, ∀h < H) are assumed to be known. Training the bid predictor then also requires the corresponding DER bid parameters, i.e., a′_jh, b′_jh, c′_jh, d′_jh, ∀h < H. These are obtained with the bid estimator designed in Section 4.2, so the training data for the bid predictor become available.
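The bid predictor's training set pairs the remaining load P_Th − P_grnh of each past hour h < H with the DER bids estimated for that hour. This is a minimal sketch of assembling that set. To stay dependency-free, it replaces the ANN with a per-output ordinary least-squares line on the remaining load. This is a deliberately simpler stand-in, not the bid predictor described above. `build_predictor_data` and `fit_linear_predictor` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


def build_predictor_data(total_load: Sequence[float], renewable: Sequence[float],
                         der_bids: Sequence[Sequence[float]]
                         ) -> Tuple[List[float], List[List[float]]]:
    """Pair remaining load (P_Th - P_grnh) with the estimated DER bids per past hour."""
    if not (len(total_load) == len(renewable) == len(der_bids)):
        raise ValueError("history lengths differ")
    x = [pt - pr for pt, pr in zip(total_load, renewable)]
    return x, [list(b) for b in der_bids]


def fit_linear_predictor(x: Sequence[float], y: Sequence[Sequence[float]]
                         ) -> Callable[[float], List[float]]:
    """Stand-in for the ANN: one least-squares line per DER bid parameter."""
    n = len(x)
    mean_x = sum(x) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    coefs = []
    for k in range(len(y[0])):
        col = [row[k] for row in y]
        mean_y = sum(col) / n
        slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, col)) / s_xx
        coefs.append((slope, mean_y - slope * mean_x))
    return lambda load: [s * load + c for s, c in coefs]
```

In the actual approach, the same (x, y) pairs would feed the two-layer network. Extra explanatory factors would simply become additional input columns.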

BID OPTIMIZATION
Using the bid predictor designed in Section 4.3, the LA can predict the bids of the DER units for any value of the remaining load (P_Th − P_grnh). After predicting the bids of the DER units at hour H, a bi-level optimization model is solved. It maximizes the LA's profit subject to Equations (10)-(15) and yields the optimal bid of the LA in the equivalent market. This bid is also optimal in the real market to the extent that the equivalent market reproduces the real market outcomes. In general, bi-level optimization problems are not easy to solve. However, the key feature of the proposed approach is that it models the rivals' behaviour as the DER and predicts the market results with adequate precision through the equivalent market. Owing to this feature, finding the optimal bid reduces to clearing the equivalent market for each candidate bid and comparing the resulting profits. Two cases are considered:
(i) In a stepwise pricing market, players must choose their bid from a set of predetermined bids, so each player has a limited number of bid options. In this case, we first predict the bids of the DER's units for the next hour using the bid predictor. At the first level of optimization, we clear the equivalent market for every permitted option of the intended LA. The results (the power allocated to the intended LA and the LMP at its bus) are predictions of the real market results. At the second level, we compute the profit of each option from these predicted values. Finally, the bid with the highest profit is offered in the real market.
(ii) In a non-stepwise market, players may offer any bid within a reasonable range. We discretize this range into an arbitrary number of levels. For example, the range 20 to 60 $/MWh can be divided into 41 bid levels with a step of 1 $/MWh, or 401 levels with a step of 0.1 $/MWh.
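The discretize-then-scan procedure can be sketched as follows. This is a minimal sketch under two assumptions:
- `clear_equivalent` is a hypothetical callable that clears the equivalent market (with the predicted DER bids) for one LA bid and returns (P″_dkh, LMP″_Th).
- The LA's profit is taken as (retail price − LMP) × allocated power, since the exact profit expression is not reproduced here.

`bid_grid` and `best_bid` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def bid_grid(low: float, high: float, step: float) -> List[float]:
    """Discretize [low, high] into equally spaced bid levels, both ends included."""
    n_steps = int(round((high - low) / step))
    return [round(low + i * step, 10) for i in range(n_steps + 1)]


def best_bid(options: Sequence[float],
             clear_equivalent: Callable[[float], Tuple[float, float]],
             retail_price: float) -> Tuple[float, float]:
    """Exhaustive scan: the best bid over the grid, given the predicted DER bids."""
    best: Optional[Tuple[float, float]] = None
    for bid in options:
        power, lmp = clear_equivalent(bid)        # first level: predicted market result
        profit = (retail_price - lmp) * power     # second level: assumed LA profit
        if best is None or profit > best[1]:
            best = (bid, profit)
    if best is None:
        raise ValueError("no bid options given")
    return best
```

`bid_grid(20.0, 60.0, 1.0)` gives the 41 levels and `bid_grid(20.0, 60.0, 0.1)` gives the 401 levels mentioned above. Rounding each level removes floating-point drift from repeated additions of 0.1.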
This yields a finite number of options for the intended LA, and the optimal bid can be found exactly as in a stepwise market. The different bid options are evaluated by clearing the equivalent market, not the real market. The number of market-clearing iterations is therefore not limited: the equivalent market can be cleared for any number of bids before a bid is offered in the real market. Increasing the number of market players does not change the dimensions of the problem. Regardless of how many players the real market has, the equivalent market always contains only two: the intended player and the DER. The number of bid options of the intended LA is therefore the only parameter that determines the problem dimensions. Since this approach scans the entire discretized search space, it cannot get trapped in a local optimum. The obtained bid is the global optimum over the discretized bid set, given the predicted DER bids.
As shown in Figure 1, the approach is kept adaptive to changes in the competitors' behaviour as follows. At the end of a user-defined period, the algorithm compares the predicted values with the real market results announced by the ISO by calculating the MAPE. If the MAPE exceeds a user-defined threshold, a change in the competitors' behaviour is indicated and the current model is invalid. The algorithm then returns to estimating the bids of the DER units (Figure 1) and repeats all the steps of estimating and predicting the competitors' behaviour. The competitors' new behaviour is thus modelled from the latest information about their participation in the real market.

The number of samples required to model and predict the competitors' behaviour depends on the complexity of that behaviour. Modelling should use the fewest samples that provide the required accuracy. If the players have been in the market for a long time, many samples are available and can be used. If sufficient samples are not available for any reason, the competitors' behaviour is first modelled with the existing samples. As new samples arrive, they are continuously added to the training data and the bid predictor is retrained. This continues until the required modelling and prediction accuracy is achieved.
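The adaptivity check can be sketched with the usual MAPE definition, 100/N · Σ|actual − predicted|/|actual|. This is a minimal sketch under two assumptions. First, the LMP and LA-power errors are checked separately and the larger one is compared with the threshold. Second, hours with zero actual values are rejected, because MAPE is undefined there; this matters for the LA's allocated power. `mape` and `behaviour_changed` are hypothetical names.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def behaviour_changed(actual_lmp: Sequence[float], predicted_lmp: Sequence[float],
                      actual_power: Sequence[float], predicted_power: Sequence[float],
                      threshold_pct: float) -> bool:
    """True if either prediction error exceeds the user-defined threshold (assumed rule)."""
    worst = max(mape(actual_lmp, predicted_lmp), mape(actual_power, predicted_power))
    return worst > threshold_pct
```

When `behaviour_changed` returns True, the loop re-runs bid estimation on the latest hours and retrains the bid predictor.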
The proposed approach can also account for other factors that affect bidding strategies, such as demand, fuel prices, weather conditions, and renewable production. The bid estimator needs no change to consider these factors, because it depends only on the market-clearing mechanism and not on the competitors' behaviour. For the bid predictor, these factors are added as ANN inputs, and their historical values are included in its training data. The approach reveals the competitors' behaviour with neural networks, which approximate multi-input multi-output functions efficiently. Additional factors therefore only increase the number of network inputs and do not otherwise change the approach.
The presented revealing procedure is summarized as follows:
(i) Step 1: Develop an equivalent market whose number of buses, power system topology, and line constraints are exactly the same as those of the real market. The equivalent market has only two players: the LA (LA k in this paper) and a decentralized equivalent rival (DER) with one production unit and one demand unit at each bus. Apply the actual constraints of the LA and the minimum and maximum production/consumption limits of the DER units given by Equation (9).

TEST AND RESULTS
The proposed approach is examined on an illustrative example and the IEEE 30-bus test system.

Illustrative example
This section presents a simple example that illustrates how the competitors' behaviour is estimated and modelled. Figure 5 shows a 5-bus power network with seven players: 4 producers (players 1 to 4) and 3 consumers (players 5 to 7). Without loss of generality, assume that the players' bidding strategies depend only on the total network load. We model the behaviour of the other players (the competitors) from the perspective of player 5, the consumer at bus 2. Figure 5(b) shows the 5-bus test system with the corresponding equivalent market from player 5's perspective. The structure and parameters of the power system are exactly the same in the real and equivalent markets: number of buses, transmission lines between buses, and line capacities. However, only the intended player (player 5 in this example) is present in the equivalent market. There it competes with a decentralized equivalent rival (DER) that has a generation unit Gu_i and a demand unit Du_i at each bus i = 1, …, 5. Hence, DER = {Gu1, …, Gu5, Du1, …, Du5}.

Suppose that the real market has been cleared for a certain hour. The market-clearing results, i.e., the power allocated to each player and the LMP of each bus in that hour, are written in purple in Figure 5(c). Player 5 models the behaviour of the other players as a DER. It knows only the total network load, its own bid, the power allocated to it, and the LMPs of the buses in that hour. The bids of the DER units must therefore be chosen for the same total load and the same bid of player 5, such that two quantities are approximately equal in the equivalent and real markets: the power allocated to player 5 and the LMPs of all buses.
Equality of the other quantities in the two markets is not required. The bid estimator finds DER bids that meet this condition from the information available in the real market, i.e., the total network load, the bid of player 5, the power allocated to player 5, and the LMPs of the network buses. To train the bid estimator, we solve the DC optimal power flow model of the equivalent market for an arbitrary number of total network loads and random bids.
For training the bid estimator, the total network load, the bid of player 5, and the LMPs of the network buses serve as inputs, and the bids of the DER units serve as outputs. With this method, the generation of training data is unrestricted in both the number of samples and the sampling method. The trained ANN can determine DER bids that reproduce the LMPs at the different buses and the power allocated to player 5 given at its input.

Now assume that, for a given hour, player 5 knows the total network load, its own bid, the power allocated to it, and the LMPs of all network buses in the real market. This holds for every previous hour for which the real market has been cleared and the results announced. From this information, the trained bid estimator returns DER bids for which the power allocated to player 5 and the LMPs of all buses approximately equal the corresponding real-market values. Figure 5(d) shows the equivalent market-clearing results obtained with these DER bids. Comparing Figures 5(c) and 5(d) shows that these results are close to those of the real market. Therefore, from player 5's perspective, the behaviour of the DER in the equivalent market is equivalent to the behaviour of the competitors in the real market. With exactly the same bid, player 5 wins approximately the same power at the same price in both markets.

Repeating this process for several hours with different total network loads yields the corresponding bid samples of the DER units. From these samples, the bidding pattern of the DER units can be discovered and modelled. To do this, we use the samples of the total network load as training inputs and the corresponding samples of the DER units' bids as training outputs (targets).
Accordingly, an ANN is trained that can predict the behaviour of the DER from its past behaviour. Note that there is no limitation on the number of training samples: for all previous hours for which the real market has been cleared and the results have been announced, the required data (i.e. the network total load, the bid of player 5, the power allocated to player 5, and the LMPs of the network buses) are available and can be fed to the bid estimator. The corresponding bids of the DER units can then be obtained and used to train the bid predictor. Using the bid predictor, player 5 can predict the bids of the DER units for the next hours from the network total load. The best bid among the candidate options can then be determined and offered in the real market.
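The second stage can be sketched in the same toy single-bus setting. Here an ordinary least-squares line for each DER bid intercept stands in for the ANN bid predictor (a deliberate simplification, not the paper's model), and the LA's best bid is found by enumerating candidate intercepts, clearing the toy market with the predicted DER bids, and keeping the most profitable one. The profit expression (retail price minus clearing price, times allocated power), the function names and all numeric values are hypothetical assumptions.

```python
def clear_single_bus(supply, demand, fixed_load, price_cap=1000.0):
    """Toy single-bus clearing: bisection on price for linear bid curves."""
    def supplied(p):
        return [min(max((p - a) / b, 0.0), pm) for a, b, pm in supply]

    def consumed(p):
        return [min(max((a - p) / b, 0.0), pm) for a, b, pm in demand]

    lo, hi = 0.0, price_cap
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(supplied(mid)) - fixed_load - sum(consumed(mid)) < 0.0:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p, supplied(p), consumed(p)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = k*x + c (stand-in for the ANN predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, my - k * mx


def best_bid(load, gen_model, dem_model, candidates, retail_price=50.0):
    """Return (bid, profit) for the most profitable candidate LA intercept."""
    gen_a = gen_model[0] * load + gen_model[1]  # predicted DER production intercept
    dem_a = dem_model[0] * load + dem_model[1]  # predicted DER consumption intercept
    best = None
    for bid in candidates:
        price, _, dispatch = clear_single_bus(
            [(gen_a, 0.05, 700.0)], [(dem_a, 0.1, 700.0), (bid, 0.2, 40.0)], load)
        profit = (retail_price - price) * dispatch[-1]  # hypothetical LA profit
        if best is None or profit > best[1]:
            best = (bid, profit)
    return best
```

As a usage example, fitting `fit_line` to a few historical (load, estimated intercept) pairs and calling `best_bid(200.0, gen_model, dem_model, range(20, 61))` returns the candidate intercept the LA would offer for a 200 MW hour. Enumeration is used only because the toy action space is tiny; the paper instead solves the bi-level model of Equation (17).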

IEEE 30-bus test system
As the second case study, the IEEE 30-bus test system is considered. We consider 6 fossil-fuel generation units (producers), one wind power generation unit located at bus 1, and 20 elastic loads (LAs) participating in an hour-ahead electricity market [23]. The technical characteristics of the producers and elastic loads are given in Tables 1 and 2, respectively. These data are taken from [23] with some changes in the maximum power consumption of the LAs. The total load is assumed to vary between 300 and 340 MW and the wind power production between 50 and 150 MW, both predictable for the next hour, so that the remaining load varies between 150 and 290 MW. All producers and the loads at buses 7 and 14 are assumed to be strategic players. Without loss of generality, the bid function of each non-strategic player is assumed to equal its cost function. For the strategic players, the slope of the bid function equals the slope of the cost function, and only the intercept of the bid function is chosen strategically. To illustrate the effectiveness of the proposed ANN-based bidding strategy, it is compared with a Q-learning-based bidding approach. With learning and exploration rates of 0.1, all strategic players reach an optimal bidding strategy within a 3000-iteration learning process using Q-learning. As an example, Figure 6 shows the LMPs of buses 1, 7, 19, 26, and 29 and the profits of player 2 (producer 2), player 5 (producer 5), player 10 (LA 4), and player 13 (LA 7) over a 48-h period. A difference between the LMPs at the two ends of a transmission line indicates that congestion has occurred. LA 4 (j = 4, bus 7), whose details are given in Table 2, adopts the proposed ANN-based bidding strategy. The equivalent market from the viewpoint of LA 4 is a market with 30 generation units and 30 + 1 demand units.
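For reference, the Q-learning benchmark can be sketched as a stateless (single-state) ε-greedy learner over discrete bid intercepts, using the learning and exploration rates of 0.1 and the 3000 iterations mentioned above. The Q-learning state and action definitions used in the paper are not given in this section, so the single-state formulation, the function name `q_learning_bid` and the toy reward in the usage example are assumptions.

```python
import random


def q_learning_bid(actions, reward_fn, iterations=3000, alpha=0.1, epsilon=0.1, seed=0):
    """Stateless (single-state) Q-learning with epsilon-greedy exploration.

    actions: list of discrete candidate bid intercepts
    reward_fn: callable returning the profit observed for a chosen bid
    Returns the greedy action after learning and the final Q-table.
    """
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}
    for _ in range(iterations):
        if rng.random() < epsilon:
            action = rng.choice(actions)               # explore
        else:
            action = max(actions, key=q.__getitem__)   # exploit (first best on ties)
        reward = reward_fn(action)
        # With a single state the discounted future-value term drops out,
        # so the update reduces to an exponential moving average of rewards.
        q[action] += alpha * (reward - q[action])
    return max(actions, key=q.__getitem__), q
```

For example, `q_learning_bid(list(range(20, 61, 5)), lambda a: 100.0 - (a - 35) ** 2)` uses a noise-free toy reward peaking at an intercept of 35. In a real market each reward would come from a market clearing with the rivals' current bids, which is why thousands of iterations are needed.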
In the equivalent market, the DER has one consumption unit and one production unit at each bus, and LA 4 is located at bus 7, as in the real market. For this case study, Equations (10)-(15) take the form of Equations (21)-(26). According to Equation (9), the maximum limits in Equations (24) and (25) should be set to arbitrary values greater than $\sum_{j=1}^{20} P_{dj}^{\max} = 689.2$ MW. To design the bid estimator, as described in Section 4.2, a two-layer feed-forward perceptron with 12 neurons and a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer is used. To generate training data, we apply 1000 random bid sets to Equations (21)-(26) and save the results together with the corresponding bid sets. We then organize these data as input-output pairs for the bid estimator, as shown in Figure 3, and train the ANN with the Levenberg-Marquardt backpropagation algorithm. To evaluate the bid estimator, we estimate the bids of the DER units for the 48-h period discussed above, solve the equivalent market-clearing model of Equations (21)-(26) with these estimated bids, and compare the results with those of the real market-clearing model. Figure 7 compares the power allocated to LA 4 and the LMP at the corresponding bus (bus 7) with the estimated values (the corresponding values in the equivalent market). Figure 7 shows that the results of the bid estimator are approximately the same as those of the real market. The accuracy is measured with the mean absolute percentage error (MAPE):
$$\mathrm{MAPE}_{P}=\frac{100}{H}\sum_{h=1}^{H}\frac{\left|P_{d4h}-\hat{P}_{d4h}\right|}{P_{d4h}},\qquad \mathrm{MAPE}_{LMP}=\frac{100}{H}\sum_{h=1}^{H}\frac{\left|LMP_{7h}-\widehat{LMP}_{7h}\right|}{LMP_{7h}},$$
where $H$ is the number of hours and $\hat{P}_{d4h}$ and $\widehat{LMP}_{7h}$ are the predicted values of $P_{d4h}$ and $LMP_{7h}$, respectively. According to the simulation results, the MAPE of the power allocated to LA 4 is 3.57% and the MAPE of the LMP of bus 7 is 2.11%. These results corroborate the ability of the DER to model the competitors' behaviour and the effectiveness of the proposed approach.
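The structure of the bid estimator (a two-layer feed-forward perceptron with 12 sigmoid hidden neurons and a linear output layer) and the MAPE check can be sketched in plain Python. Only the forward pass is shown; Levenberg-Marquardt training is omitted. The input size (total load, LA bid and 30 bus LMPs) and output size (one bid intercept for each of the 60 DER units) are illustrative assumptions, as are the function names; inputs should be normalized before use so the sigmoid does not saturate.

```python
import math
import random


def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Random weights for a two-layer perceptron (hidden + output layer)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Sigmoid hidden layer followed by a linear output layer."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

With `params = init_mlp(32, 12, 60)`, `forward(params, x)` maps one normalized observation to 60 estimated DER intercepts; after training, `mape` would be applied to the real and equivalent-market values of $P_{d4h}$ and $LMP_{7h}$ over the evaluation hours.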
To design the bid predictor, we use a two-layer feed-forward perceptron with 75 neurons and a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer, and train it with the estimated bids as the outputs and the corresponding remaining load as the input, as shown in Figure 4. LA 4 optimizes its bidding strategy using the estimation and prediction procedure of the equivalent market by solving the bi-level optimization model of Equation (17). Figure 9 shows the results of the real market clearing when LA 4 offers the optimal bid obtained with the proposed ANN-based approach, while the other competitors optimize their bidding strategies with Q-learning. Comparing Figures 9 and 6 shows that, with the proposed method, the LMP of bus 7 remains approximately unchanged while the profit of LA 4 increases. Figure 10 compares the LMP of bus 7, the power allocated to LA 4, and the profit of LA 4 over a 24-h period when LA 4 uses the proposed method and when it uses the optimal Q-learning-based bidding strategy. As seen from Figure 10, both methods yield the same LMP at bus 7. However, the power allocated to LA 4 and its profit under the proposed ANN-based approach are greater than or equal to those under the Q-learning-based method in every hour. Table 3 compares the average power allocated to LA 4, the average LMP of bus 7, and the average profit of LA 4 over the 24-h period for these two cases. Based on these data, the LMP of bus 7 is approximately equal in both cases, whereas with the proposed ANN-based approach the power allocated to LA 4 increases by 11.27% and the profit of LA 4 by 15.76%.
Note that the Q-learning-based method requires 3000 iterations, whereas the proposed strategy requires only 48 iterations (hours) to provide training data for the bid predictor. For further comparison, Table 4 compares the average profit obtained with the proposed approach over the 24-h period with the average profit obtained with the Q-learning-based approach using 1500, 3000, and 8000 iterations.
To guarantee the fairness of the comparison, we keep all conditions the same for both cases except the number of iterations.
As seen in Table 4, the performance of the Q-learning-based bidding strategy improves as the number of learning iterations increases. However, the proposed ANN-based bidding strategy, with only 48 training samples, yields about 6.8% higher profit for LA 4 than Q-learning with 8000 learning iterations. Therefore, one of the main advantages of the proposed approach is that it needs very little training data to model the competitors' behaviour. In this paper, for example, the proposed method successfully modelled and predicted the competitors' behaviour using 48 training samples; that is, LA 4 modelled its competitors' behaviour and adopted an appropriate bidding strategy after only 48 h (2 days) in the market. In contrast, with other learning-based methods the player must spend a long time in the market to find the optimal bidding strategy. This gives the proposed method two particular advantages. First, it is well suited to players that have recently entered the market, since they do not need to spend much time learning the optimal strategy. Second, if the competitors' behaviour changes for any reason, the intended player quickly learns the new behavioural pattern and adopts an optimal bidding strategy for the new conditions.

Robustness and adaptability analysis
To examine the robustness of the proposed approach, we investigate the effect of the prediction error of the remaining load. For this purpose, Table 5 reports the MAPE of the predicted LMP of bus 7 and power allocated to LA 4 for three levels of remaining-load prediction error: (i) no error, (ii) normally distributed error with a mean of 5% and a variance of 1.5 MW, and (iii) normally distributed error with a mean of 8% and a variance of 3%. As Table 5 shows, the accuracy of the proposed method in predicting the LMP of bus 7 and the power allocated to LA 4 decreases as the remaining-load prediction error increases. However, the accuracy remains sufficient for improving the bid. Based on the simulation results, in case (iii) the profit obtained with the proposed approach is about 11.9% higher than that of the Q-learning approach. Next, the adaptability of the proposed approach is investigated numerically for the case in which the rivals' behaviour changes. To this end, LA 4 uses the proposed method to predict the market results over 30 days. At the end of each 24 h, the MAPE is calculated for the predicted values (LMP of bus 7 and power allocated to LA 4) and compared with predefined thresholds. If the MAPE values are below the thresholds, the model is valid and can be used for prediction in the following days. If the rivals' behaviour changes, the MAPE increases and the algorithm detects the change. In this case, the algorithm uses the data of the last 24 h to retrain the bid predictor, updates the model, and uses it to predict the rivals' behaviour and the market results for the next day.
At the end of the next day, the MAPE is recalculated; if it is still outside the allowable range, the bid predictor is retrained using the data of the last 48 h to obtain a new model. This process continues until the MAPE lies within the allowable range and the model reaches an acceptable accuracy. The result of this evaluation is shown in Figure 11. As Figure 11 shows, until the sixth day the MAPE values for both predicted quantities are below the defined thresholds (2.7% for the LMP and 3.9% for the allocated power). On the seventh day, two of the seven rivals change their strategy, and the MAPE increases because the model no longer matches the rivals' new behaviour. Using the market data of the last 24 h, the corresponding bids of the DER units are estimated and a new model is obtained by repeating the bid predictor training process. This new model is used for the next 24 h. As observed from Figure 11, the new model reduces the MAPE on the eighth day, showing that it is closer to the rivals' new behaviour. However, the MAPE is still above the threshold, indicating that the model is not yet accurate enough because of insufficient training data. The bid predictor is therefore retrained using the new 24-h data together with the previous 24-h data (48 h in total, covering the seventh and eighth days), and the model is updated again. This time the new model is more consistent with the rivals' behaviour, and the MAPE falls within the acceptable range. The rivals' strategies then remain unchanged until the 21st day, and the model stays valid. On the 22nd day, five of the seven rivals change their strategies.
As can be seen from Figure 11, the model-updating process is repeated, and after 2 days a new model consistent with the rivals' new behaviour is obtained.
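The retraining rule described above can be summarized as a small decision function: after each day, if either MAPE reaches its threshold (2.7% for the LMP, 3.9% for the allocated power), the bid predictor is retrained on a window that grows by 24 h for each consecutive day on which the thresholds are exceeded; once both MAPEs are back below the thresholds, no retraining is needed. The function name and the "either index exceeds" rule are assumptions made for illustration.

```python
def retraining_windows(mape_lmp, mape_power, thr_lmp=2.7, thr_power=3.9, hours_per_day=24):
    """Return, for each day, the retraining window in hours (0 = keep the model).

    mape_lmp, mape_power: daily MAPE values (%) of the predicted LMP and power.
    """
    windows = []
    failing_days = 0
    for m_lmp, m_pow in zip(mape_lmp, mape_power):
        if m_lmp < thr_lmp and m_pow < thr_power:
            failing_days = 0          # model still valid
            windows.append(0)
        else:
            failing_days += 1         # widen the window by one more day of data
            windows.append(failing_days * hours_per_day)
    return windows
```

With hypothetical MAPE values that stay below the thresholds for six days, exceed them on days 7 and 8 and recover on day 9, the function returns windows of 24 h and 48 h for days 7 and 8, matching the sequence described for the first strategy change.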

CONCLUSION
This paper proposed an adaptive bidding strategy that reveals and models the behaviour of market competitors from the viewpoint of an intended LA in a two-sided electricity market.
To reduce the complexity of analysing the competitors' behaviour, this paper proposed the DER concept, which, from the intended LA's point of view, replaces an electricity market with several unknown players by an equivalent market with two known players. A two-stage ANN-based procedure was designed to determine the bidding strategies of the DER units and reveal the competitors' behaviour. The proposed approach was examined on an illustrative example and on the IEEE 30-bus test system with 6 fossil-fuel power producers, one wind power producer with variable production, and 20 elastic loads participating in an hour-ahead electricity market. Simulation results showed the effectiveness of the proposed approach in modelling the competitors' behaviour and predicting the power allocated to the LA and the LMP of the corresponding bus. The simulations also demonstrated the high efficiency of the proposed method in reaching the optimal bid: the results show a 15.84% profit increase when the LA uses the proposed approach instead of a Q-learning-based approach. It was also shown that the proposed approach needs very little time to reach the optimal bidding strategy, which is an important advantage, especially for players who are just entering the market. In future work, the approach can be extended to an LA engaged in demand response, i.e. a demand response aggregator, and the effectiveness of the proposed bidding strategy can be examined for such an aggregator in a joint energy and reserve market. The effect on market equilibrium of all players adopting this new bidding strategy can also be studied.