1 Introduction

The smart grid infrastructure enables the integration of renewable energy resources at the individual consumer level [1]. It creates a paradigm where any individual consumer in the grid can also be a power supplier, which facilitates the creation of microgrids. Microgrids are localized grids that can be separated from the larger power grid to operate autonomously and be self-sufficient in power. A microgrid typically consists of renewable (wind turbines, solar panels, etc.) and/or non-renewable (micro-turbines, fuel cells, etc.) energy resources, energy storage devices, and energy consuming devices/appliances, all of which are connected through a power and communication network [2]. A microgrid can operate in either grid-connected or islanded mode. In the islanded mode, it could be connected to other microgrids or operate independently. Therefore, microgrids can provide energy independence to individual communities or entities that intend to manage their own power generation and distribution [3]. Moreover, microgrids can provide resilience against large-scale failures across the grid: they can continue to operate if large-scale blackouts occur [3].

With autonomous energy, microgrids may fully or partially feed their local demand. Numerous microgrids would have great flexibility to utilize their local energy to collaboratively advance the energy management in the power grid, e.g., load balancing [4, 5], energy sharing [6, 7], and load shifting [8]. Thus, it is desirable to discover microgrid communities that can efficiently implement their cooperation in the grid [9,10,11]. For instance, the grid can identify communities for a mixed set of microgrids, some of which request external power supply while the others have excessive electricity, such that the microgrids within each community can supply their demand load by themselves, either regularly or when a power outage occurs in the main grid.

Fig. 1 Energy communities of microgrids on power grid (“+” represents positive NE while “−” represents negative NE)

More specifically, based on every microgrid’s local energy amount (supply) and its local consumption amount (demand load), we can simply derive its net energy (NE) as the amount of supply minus the demand load, which can be either positive or negative at a specific time. If the NE of a microgrid is 0 in \([T_1,T_2]\), we can simply skip it or assign it to the nearest community. Thus, in this paper, we only consider the microgrids whose NE is either positive or negative. Clearly, a microgrid with positive NE at time t has excessive electricity at time t; otherwise, it requests external power supply at time t. In addition, we denote the time series NE of a microgrid \(m_i\) over a period \([T_1, T_2]\), where \(T_1<T_2\), as \(e_i(t), \forall t\in [T_1,T_2]\), which can be either positive or negative. Then, some energy communities with respect to the time interval \([T_1, T_2]\) can be defined as follows.

1.1 Energy communities

1.1.1 Definition 1: homogeneous energy community (HEC)

A group of microgrids whose NE are exclusively positive, or exclusively negative at any time in \([T_1, T_2]\).

In this case, all the microgrids in the community can feed themselves using their local energy, or all the microgrids in the community request external supply. In contrast, if the microgrids in the community have different NE status (positive and negative) at any time over the period \([T_1, T_2]\), we define such a community as follows.

1.1.2 Definition 2: mixed energy community (MEC)

A group of microgrids whose NE are mixed with positive and negative at any time in \([T_1, T_2]\).

Hence, we can categorize the energy community discovery problems [10] based on their inputs (the NE of all the microgrids is homogeneous or mixed between time \(T_1\) and \(T_2\)): \(\textcircled{1}\) HECs discovery; \(\textcircled{2}\) MECs discovery. Figure 1 presents examples of these two types of energy communities in the grid at a specific time. Note that if \(T_1=T_2\), HECs and MECs are obtained for a specific time instead of a time interval.

Furthermore, we define a special form of MEC in which all the microgrids’ local energy can fully supply the overall demand of the community.

1.1.3 Definition 3: self-sufficient energy community (SEC)

A mixed energy community whose total NE is nonnegative at any time in \([T_1, T_2]\).

Since classic clustering algorithms (e.g., K-means, DBSCAN) can be tailored to discover HECs by integrating the NE amounts [10], we focus on the MEC discovery and SEC discovery.

1.2 Related work

As important building blocks of the grid, microgrids have attracted significant interest in both industry and academia in the past decade. In this context, much recent research has been conducted to design microgrids and/or energy management schemes so as to improve the performance of the power grid, such as load management techniques [12], demand response solutions [13], and home automation [14]. More specifically, [15] and [16] propose techniques for establishing microgrids in the power grid based on different criteria such as cost minimization [15] and power flow optimization [16]. In addition, the analysis of data collected from distributed microgrids (e.g., demand load, energy generation and storage) has advanced the energy management of the grid and microgrids [17]. Such applications include short term load forecasting for microgrids [18], load restoration for microgrids [19], load shifting [8], energy trading [20, 21], etc.

Moreover, some cooperative models among distributed microgrids have been investigated in multiple applications, e.g., optimizing the power loss via a unified microgrid voltage profile [22], eliminating the central energy management unit and price coordinator via localized smart devices [23], distributed energy dispatch and demand response [24], privacy preserving energy management among networked microgrids [25], and load management via sharing local electricity [6, 26]. In this paper, we develop techniques to identify communities of microgrids which can directly implement all these cooperative applications within each energy community to further improve the grid performance.

1.3 Contributions

Community discovery problems generally group data objects which share similar characteristics or are close to each other, e.g., detecting communities of individuals who have similar interests on a social network [27], and analyzing spatial datasets to identify geographical communities [28]. The energy community discovery problems are significantly different from these prior community discovery problems studied in other contexts. The key difference is that the criteria for grouping two microgrids into the same energy community should consider not only the spatial distances on the power grid but also their time-series NE amounts. Moreover, additional constraints may apply in the problems: for example, MECs and SECs may require all the microgrids in each community to balance their demand and supply, and to bound the overall NE within a small number or even at 0 [4]; SECs require a nonnegative overall NE for each community. In addition, both energy consumption and generation of microgrids (e.g., wind and solar) are generally stochastic, thus the energy communities (e.g., MECs and SECs) may vary over time. To the best of our knowledge, these issues have not been investigated and tackled in the literature. To address them, this paper makes the following primary contributions.

  1) We define the energy community discovery problems for MECs and SECs, and propose new algorithms to effectively and efficiently generate MECs and SECs.

  2) We discuss how to realize MECs and SECs in the current energy management system, and define some utility metrics to evaluate their performance.

  3) We conduct comprehensive experiments to validate the performance of our approaches using both synthetic and real-world microgrid datasets.

The rest of this paper is organized as follows. Sections 2 and 3 illustrate how to discover MECs and SECs, respectively. Section 4 discusses how to realize the discovered communities in the current energy management system on the power grid. Section 5 demonstrates the experimental results. Finally, Section 6 presents the concluding remarks and discusses the future work.

2 Discovering MECs

Among thousands of microgrids on the power grid, some may have excessive energy while others may request energy from external resources, e.g., the main grid. Therefore, adjacent microgrids can share or trade their locally generated electricity to avoid wasting excessive energy while ensuring better reliability and resilience of power supply [6, 26]. Such microgrids can form an energy community to occasionally feed their local energy demands, e.g., via trading, which is beneficial to both the power grid and themselves. Clearly, the NE of the microgrids in these communities is mixed with negative and positive values, thus they are called MECs.

The ideal case of the discovered MECs is that all the microgrids in the same MEC are geographically close to each other while balancing their demand and supply of each MEC within a tight margin [4, 9, 29] (then microgrids can fully consume their local energy). We now propose an algorithm to identify such MECs on the grid towards this goal.

Specifically, we denote the NE of each microgrid \(m_i\) at time t as \(e_i(t)\), which can be either positive or negative. While grouping two microgrids, e.g., \(m_i\) and \(m_j\), into an MEC, besides the spatial distance between them on the grid \(Dis(m_i,m_j)\), we also have to consider their NE \(e_i\) and \(e_j\) towards the load balancing of their community. The overall demand and supply at different times should be balanced (ideally, equal to each other). For example, if one microgrid has an NE \(e_i\) while the other has an NE demand \(-e_i\), these two microgrids can supply their demands using their local energy. Thus, we define a new measure, the “NE distance” of two microgrids \(m_i\) and \(m_j\) in time interval \([T_1,T_2]\), as:

$$\begin{aligned} NE(m_i,m_j)= \sum _{t\in [T_1,T_2]}|e_i(t)+e_j(t)| \end{aligned}$$
(1)

If \(\forall t\in [T_1,T_2]\), \(e_i(t)+e_j(t)=0\) holds, we have \(NE(m_i,m_j)=0\). If \(\forall t\in [T_1,T_2]\), \(e_i(t)=e_j(t)\) holds, however, we have \(NE(m_i,m_j) = 2\sum \limits _{t\in [T_1,T_2]}|e_i(t)|\). The NE distance differs from other distance measures used in traditional community discovery problems due to its unique feature: two opposite values, e.g., \(e_i\) and \(-e_i\), are measured as “close”.
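As a minimal illustration of Eq. (1), the following Python sketch computes the NE distance of two NE time series given as equal-length lists. The function name `ne_distance` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def ne_distance(e_i: Sequence[float], e_j: Sequence[float]) -> float:
    """NE distance of Eq. (1): sum over timestamps of |e_i(t) + e_j(t)|."""
    if len(e_i) != len(e_j):
        raise ValueError("both NE series must cover the same timestamps")
    return sum(abs(a + b) for a, b in zip(e_i, e_j))


# Hypothetical NE series over three timestamps.
supplier = [3.0, 1.5, 2.0]
consumer = [-3.0, -1.5, -2.0]

print(ne_distance(supplier, consumer))  # opposite series are "close": 0.0
print(ne_distance(supplier, supplier))  # identical series: 2 * sum|e_i(t)| = 13.0
```

The two calls reproduce the two special cases discussed above: exactly opposite series give a distance of 0, while identical series give twice the sum of absolute NE values.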

Algorithm 1 (figure a)

Therefore, the difference of the overall supply and demand of every MEC is bounded/balanced at different time by \(\xi\), and the spatial distance between any microgrid and its MEC centroid is bounded by \(\xi '\).

For the MEC discovery, we define two maximum distance thresholds for the normalized NE distances and the normalized spatial distances, respectively: \(\xi ,\xi '\in [0,1]\). Then, we propose a new agglomerative algorithm [30] to identify MECs by utilizing \(\xi\) and \(\xi '\) to specify the criteria for bounding the differences between the overall supply and demand of each community and the spatial distances between the microgrids in each community. Specifically, we let each microgrid find its nearest microgrid (with an NE distance no more than \(\xi\) and a spatial distance \(Dis(\mu _j, m_k)\) no more than \(\xi '\)) to form an MEC, update the MEC centroid geo-location and NE, and then hierarchically merge “small MECs” to form “large MECs” for better resilience. The merging process terminates if the NE distance between any two MECs’ centroids exceeds \(\xi\) or their spatial distance exceeds \(\xi '\) as shown in Algorithm 1.
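The merging idea can be sketched in Python as follows. This is not a reproduction of Algorithm 1; it is a simplified greedy variant under several assumptions that are ours: the NE of a community is taken as the element-wise sum of its members' NE series, its centroid is the mean of the members' locations, the pair with the smallest NE distance is merged first, and `ne_scale` is a hypothetical constant used to normalize NE distances. All names are illustrative.

```python
from itertools import combinations
from math import dist


def discover_mecs(locs, ne, xi, xi_sp, ne_scale=1.0):
    """Greedy agglomerative sketch: merge communities while both thresholds hold.

    locs[i] is the (normalized) location of microgrid i, ne[i][t] is e_i(t).
    xi bounds the normalized NE distance, xi_sp bounds the spatial distance.
    """
    # Each community: (member indices, centroid location, aggregated NE series).
    comms = [([i], tuple(locs[i]), list(ne[i])) for i in range(len(locs))]
    while True:
        best = None
        for a, b in combinations(range(len(comms)), 2):
            _, loc_a, ne_a = comms[a]
            _, loc_b, ne_b = comms[b]
            d_ne = sum(abs(x + y) for x, y in zip(ne_a, ne_b)) / ne_scale
            if d_ne <= xi and dist(loc_a, loc_b) <= xi_sp:
                if best is None or d_ne < best[0]:
                    best = (d_ne, a, b)
        if best is None:  # no pair satisfies both thresholds: stop merging
            return [sorted(members) for members, _, _ in comms]
        _, a, b = best
        members = comms[a][0] + comms[b][0]
        centroid = tuple(
            sum(c) / len(members) for c in zip(*(locs[i] for i in members))
        )
        total_ne = [x + y for x, y in zip(comms[a][2], comms[b][2])]
        comms = [c for k, c in enumerate(comms) if k not in (a, b)]
        comms.append((members, centroid, total_ne))


# Hypothetical toy input: two spatially separated supplier/consumer pairs.
locs = [(0.0, 0.0), (0.01, 0.0), (0.5, 0.5), (0.51, 0.5)]
ne = [[0.2, 0.1], [-0.2, -0.1], [0.3, 0.3], [-0.25, -0.3]]
print(discover_mecs(locs, ne, xi=0.1, xi_sp=0.05))  # [[0, 1], [2, 3]]
```

In the toy input, each supplier is merged with its nearby consumer, and the two resulting communities are not merged further because their centroids are farther apart than the spatial threshold.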

3 Discovering SECs

Many real-world applications require that the microgrids in each MEC can fully supply their demand with their local energy, e.g., during large-scale blackouts. Therefore, it is also desirable to discover the SECs with nonnegative NE [11].

Specifically, given N microgrids \(m_1,m_2,\dots , m_N\), we denote the number of SECs for the N microgrids as K. Then, denoting K SECs as \(c_1,c_2,\dots , c_K\), we can define binary variables \(\forall i\in [1,N], \forall j\in [1,K], x_{ij}\in \{0,1\}\) to indicate if the microgrid \(m_i\) is included in SEC \(c_j\) or not: if \(x_{ij}=1\), \(m_i\in c_j\); otherwise, \(m_i\notin c_j\).

3.1 Optimization-based SEC discovery

If the aggregated NE of the given microgrids is non-negative in \([T_1,T_2]\), we can formulate an optimization problem to discover SECs. We first consider the clustering constraints. Note that every microgrid must be assigned to exactly one SEC. This creates a group of clustering constraints \(\sum \limits _{j=1}^K x_{ij}=1, \forall i\in [1,N]\).

Secondly, recall that the NE of any SEC should be non-negative at any time \(t\in [T_1,T_2]\). This criterion creates another group of clustering constraints: \(\sum \limits _{i=1}^N[e_i(t)x_{ij}]\ge 0, \forall t\in [T_1,T_2], \forall j\in [1,K]\).

Then, we can summarize clustering constraints of SECs as:

$$\begin{aligned} {\mathrm {s.t.}}\quad \left\{ \begin{array}{ll} \sum \limits _{j=1}^K x_{ij}=1 &{}\quad \forall i\in [1,N]\\ \sum \limits _{i=1}^N[e_i(t)x_{ij}]\ge 0 &{}\quad \forall t\in [T_1,T_2], \forall j\in [1,K]\\ x_{ij}\in \{0,1\} &{}\quad \forall i\in [1,N], \forall j\in [1,K] \end{array}\right. \end{aligned}$$
(2)
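To make the constraints in (2) concrete, the following Python sketch checks whether a candidate assignment forms SECs. Encoding the assignment as a list with `assignment[i] = j` (meaning \(x_{ij}=1\)) is our choice; it satisfies the "exactly one SEC per microgrid" constraint by construction, so only the nonnegative-NE constraint needs to be tested. The function name and sample data are hypothetical.

```python
def is_sec_partition(assignment, ne, k):
    """Check the constraints in (2) for a candidate assignment.

    assignment[i] = j encodes x_ij = 1 (0-based community index j < k);
    ne[i][t] is e_i(t) for each timestamp t in [T1, T2].
    """
    if any(not (0 <= j < k) for j in assignment):
        return False
    n_t = len(ne[0])
    totals = [[0.0] * n_t for _ in range(k)]
    for i, j in enumerate(assignment):
        for t in range(n_t):
            totals[j][t] += ne[i][t]
    # Every community must have nonnegative aggregated NE at every timestamp.
    return all(v >= 0 for row in totals for v in row)


# Hypothetical NE values for four microgrids over two timestamps.
ne = [[2, 1], [-1, -1], [3, 0], [-1, 0]]
print(is_sec_partition([0, 0, 1, 1], ne, k=2))  # True
print(is_sec_partition([0, 1, 1, 1], ne, k=2))  # False
```

With real-valued measurements, a small tolerance in the final comparison may be preferable to an exact test against 0.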

3.1.1 Problem formulation

If all the binary variables satisfy all the constraints in (2), all the output energy communities would be SECs. Thus, we can solve the constraint satisfaction problem (CSP) without an objective function to find feasible solutions for SECs. Note that such a CSP is NP-hard due to the involvement of a large number of binary variables.

More importantly, beyond the constraint satisfaction problem, we can formulate the SEC discovery problem as minimizing the overall load on the transmission lines (energy loss in transmission) in all the SECs. We denote the energy loss rate as \(\theta\). For example, when transmitting 100 W of energy, the load over 1 unit distance is \(100\theta\) W. Given \(m_i\) with positive NE at time t as \(e_i(t)\) and any other microgrid \(m_s\) with negative NE at time t as \(e_s(t)\), we define the amount of energy from \(m_i\) to \(m_s\) at time t as \(y_{is}(t)\). Thus, the overall load on the transmission lines can be represented using the model in [26]:

$$\begin{aligned} \sum \limits _{t=T_1}^{T_2}\sum \limits _{j=1}^{K}\sum \limits _{i=1}^N\sum \limits _{s=1,s\ne i}^N [x_{ij}x_{sj}y_{is}(t) \theta \cdot Dis(m_i,m_s)] \end{aligned}$$
(3)

If \(x_{ij}=1\) and \(x_{sj}=1\) (\(m_i,m_s\in c_j\)), then the load of the power flow from \(m_i\) to \(m_s\) at time t is derived as \(y_{is}(t) \theta \cdot Dis(m_i,m_s)\). If \(x_{ij}=0\) or \(x_{sj}=0\) (they are not in the same community), there is no power transmission from \(m_i\) to \(m_s\), and the load is 0. Then, the overall load on the transmission lines can be aggregated as (3). Meanwhile, there are two additional sets of power flow constraints:

$$\begin{aligned} \left\{ \begin{array}{ll} {\mathrm {s.t.}}\\ \displaystyle \sum \limits _{s=1,s\ne i}^{N}[x_{ij}x_{sj}y_{is}(t)] \le e_i(t) &{}\quad \forall t, \forall i \in [1,N] \\ \displaystyle \sum \limits _{i=1, i\ne s}^N[x_{ij}x_{sj}y_{is}(t)](1-\theta )\ge |e_s(t)| &{}\quad \forall t, \forall s\in [1,N]\\ \displaystyle y_{is}(t)\ge 0 &{}\quad \forall t, \forall i, \forall s\in [1,N] \end{array}\right. \end{aligned}$$
(4)

where the first two sets of constraints ensure that the overall outgoing energy of every microgrid with positive NE is no greater than its current excessive energy, and the overall incoming energy of every microgrid with negative NE is no less than its current demand, respectively [26].

In summary, we consider (3) as the objective function, and combine (2) and (4) as constraints.
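The objective (3) and the flow constraints (4) can be evaluated for a given candidate solution as in the Python sketch below. The data layout is an assumption of ours: `assignment[i]` is the community of \(m_i\), and `flows[t]` is a dictionary mapping a pair `(i, s)` to \(y_{is}(t)\). Following the explanation in the text, the supply bound is applied to microgrids with positive NE and the demand bound to microgrids with negative NE. All names and values are hypothetical.

```python
from math import dist


def transmission_load(assignment, flows, locs, theta):
    """Objective (3): flows only count between members of the same community."""
    total = 0.0
    for flows_t in flows:
        for (i, s), y in flows_t.items():
            if i != s and assignment[i] == assignment[s]:
                total += y * theta * dist(locs[i], locs[s])
    return total


def flows_feasible(assignment, flows, ne, theta, tol=1e-9):
    """Constraints (4): suppliers do not over-send, consumers are fully served."""
    n = len(assignment)
    for t, flows_t in enumerate(flows):
        sent = [0.0] * n
        received = [0.0] * n
        for (i, s), y in flows_t.items():
            if y < 0:
                return False
            if i != s and assignment[i] == assignment[s]:
                sent[i] += y
                received[s] += y
        for i in range(n):
            e = ne[i][t]
            if e > 0 and sent[i] > e + tol:
                return False
            if e < 0 and received[i] * (1 - theta) < -e - tol:
                return False
    return True


# Hypothetical example with one timestamp and a deliberately large loss rate.
locs = [(0, 0), (3, 4), (10, 10)]
assignment = [0, 0, 1]
ne = [[10.0], [-4.0], [1.0]]
flows = [{(0, 1): 5.0}]  # 5 units sent so that 5 * (1 - 0.2) = 4 arrive
print(transmission_load(assignment, flows, locs, theta=0.2))
print(flows_feasible(assignment, flows, ne, theta=0.2))
```

A search procedure could call these two functions to score and validate each candidate assignment.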

3.1.2 Tabu search based algorithm

Due to the NP-hardness of the optimization problem, we propose a Tabu search [31] based meta-heuristic algorithm to solve the problem. Specifically, the algorithm first specifies a range for the number of SECs \(K\in \{K_{\min },K_{\min }+1, \dots , K_{\max }\}\), and arbitrarily partitions all the microgrids into K groups based on their geo-locations. Then, for every \(K\in \{K_{\min },K_{\min }+1,\dots , K_{\max }\}\), the algorithm iteratively searches the neighboring solutions to make the number of SECs reach K, where “moving a microgrid from one group to another nearest group” is defined as one of its neighboring solutions. After obtaining a set of candidate neighboring solutions (different moves), the algorithm selects the neighboring solution that most improves the objective function (reduces the load by the greatest amount), and then replaces the current solution with that neighboring solution. To improve the search performance, the following criteria are integrated in the algorithm.

  1) An initial community assignment should be specified in Tabu search, e.g., assigning all the microgrids to random communities based on their geo-locations.

  2) To avoid the solutions getting stuck in local optimum while searching SECs for every K, a Tabu list is defined with length S which stores S most recent solutions that replaced the previous solution. Then, in the searching process, if any neighboring solution is found in the Tabu list, the searching process continues without visiting such neighboring solution.

  3) Among all the SECs, select the SEC with the highest NE (positive) at most times in \([T_1,T_2]\), and then move each microgrid with the positive NE to the corresponding nearest non-SEC, so that a set of candidate neighboring solutions can be found.

When the load-based objective function cannot be reduced any further for the current K, the algorithm moves to the next \(K\in \{K_{\min },K_{\min }+1,\dots , K_{\max }\}\). Among all the discovered SECs for all \(K\in \{K_{\min },K_{\min }+1,\dots , K_{\max }\}\), the best solution (with the minimum overall load on the transmission lines while satisfying all the constraints) will be selected as the output SECs.
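For readers unfamiliar with Tabu search, the following Python sketch shows the generic loop for a fixed K. It is a simplified illustration rather than the algorithm evaluated in this paper: a neighbor moves one microgrid to any other group (not only the nearest one), infeasible neighbors are discarded, the search runs for a fixed number of iterations, and the toy example uses the spatial SSE in place of the load objective (3). The names `tabu_search`, `cost`, and `feasible` are hypothetical.

```python
from collections import deque


def tabu_search(initial, k, cost, feasible, tabu_len=10, max_iters=100):
    """Generic Tabu search over assignments (tuples of group indices)."""
    current = tuple(initial)
    best = current if feasible(current) else None
    tabu = deque([current], maxlen=tabu_len)  # the S most recent solutions
    for _ in range(max_iters):
        candidates = []
        for i in range(len(current)):
            for j in range(k):
                if j != current[i]:
                    nb = current[:i] + (j,) + current[i + 1:]
                    if nb not in tabu and feasible(nb):
                        candidates.append(nb)
        if not candidates:
            break
        # Take the best non-tabu neighbor, even if it is worse than current.
        current = min(candidates, key=cost)
        tabu.append(current)
        if best is None or cost(current) < cost(best):
            best = current
    return best


# Hypothetical toy data: positions on a line and single-timestamp NE values.
pos = [0.0, 0.1, 1.0, 1.1]
ne = [2, -1, 3, -2]


def sse(assign):
    total = 0.0
    for g in set(assign):
        xs = [pos[i] for i, a in enumerate(assign) if a == g]
        mean = sum(xs) / len(xs)
        total += sum((x - mean) ** 2 for x in xs)
    return total


def self_sufficient(assign):
    return all(
        sum(ne[i] for i, a in enumerate(assign) if a == g) >= 0
        for g in set(assign)
    )


print(tabu_search([0, 0, 0, 0], 2, sse, self_sufficient))
```

In the toy example, the search starts with all microgrids in one group and ends with the two spatially compact groups {0, 1} and {2, 3}, each with nonnegative NE.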

3.2 A two-phase algorithm for discovering SECs

Besides the optimization-based approach which formulates the optimization problem and solves the problem with a Tabu search based algorithm, we present a two-phase algorithm to discover a subset of microgrids to form the SECs. Note that, if the overall NE of all the given microgrids is negative in \([T_1,T_2]\), the constraints in the optimization-based approach cannot be satisfied simultaneously to form the SECs for all the given microgrids. In contrast, the proposed two-phase heuristic algorithm can still effectively discover SECs out of the given microgrids.

Specifically, among all the N microgrids, we denote the set of microgrids with positive NE at any time in \([T_1,T_2]\) as \(M^+\), and the set of microgrids with any negative NE in \([T_1,T_2]\) as \(M^-\). Then, the two phases are illustrated as follows.

Phase 1: the algorithm first clusters all the microgrids in \(M^+\) based on their geo-locations, where each cluster can be considered as a “merged microgrid” with aggregated positive NE. In this phase, we extend the K-means algorithm [32] to cluster such microgrids based on their geo-locations by specifying different \(K\in \{K_{\min },K_{\min }+1,\dots , K_{\max }\}\). Then, the algorithm applies different K values to K-means and chooses the best clustering result, i.e., the one with the minimum sum of squared errors (SSE) of the spatial distances [30] among all the clustering results.

Phase 2: Denote the clustering result of \(M^+\) as \(c_1^*,c_2^*,\dots , c_K^*\); the NE of any cluster \(c_j^*\), \(\forall j\in [1,K]\), at time t can be aggregated as \(\sum \limits _{\forall m_i\in c_j^*}e_i(t)\). Then, each \(c_j^*\), \(\forall j\in [1,K]\), iteratively adds the ungrouped microgrid in \(M^-\) that is nearest to its centroid until its NE drops close to 0 at any time in \([T_1,T_2]\).

Finally, the updated \(c_1^*,c_2^*,\dots , c_K^*\) are identified as K different SECs. The details of the two-phase algorithm are given in Algorithm 2. Note that Algorithm 2 involves all the microgrids in \(M^+\) in the SECs, but may not involve all the microgrids in \(M^-\) (depending on the NE of the microgrids in \(M^+\) and \(M^-\)). Furthermore, the NE of most self-sufficient communities can be well balanced to form “zero NE” communities [9].

Algorithm 2 (figure b)
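The two phases can be sketched in Python as follows. This is an illustration under our own simplifying assumptions, not a reproduction of Algorithm 2: K is fixed rather than selected by SSE, the K-means start is deterministic (the first K points, so K must not exceed the size of \(M^+\)), each cluster keeps its Phase 1 centroid, and a cluster stops growing as soon as adding the nearest ungrouped microgrid would make its NE negative at some timestamp. All function names and data are hypothetical.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with a deterministic start (first k points)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids


def two_phase_secs(locs, ne, k):
    n_t = len(ne[0])
    pos = [i for i in range(len(locs)) if all(v > 0 for v in ne[i])]  # M+
    neg = [i for i in range(len(locs)) if any(v < 0 for v in ne[i])]  # M-
    # Phase 1: cluster the positive-NE microgrids by geo-location.
    labels, centroids = kmeans([locs[i] for i in pos], k)
    secs = [[i for i, lab in zip(pos, labels) if lab == j] for j in range(k)]
    totals = [[sum(ne[i][t] for i in sec) for t in range(n_t)] for sec in secs]
    # Phase 2: each cluster absorbs nearby negative-NE microgrids while its
    # aggregated NE stays nonnegative at every timestamp.
    ungrouped = set(neg)
    for j in range(k):
        while ungrouped:
            nearest = min(ungrouped, key=lambda i: dist(locs[i], centroids[j]))
            new_total = [a + b for a, b in zip(totals[j], ne[nearest])]
            if min(new_total) < 0:
                break
            secs[j].append(nearest)
            totals[j] = new_total
            ungrouped.remove(nearest)
    return secs, sorted(ungrouped)


# Hypothetical toy data: four suppliers and three consumers, two timestamps.
locs = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (9, 10), (5, 5)]
ne = [[3, 2], [2, 2], [4, 4], [1, 1], [-4, -3], [-2, -2], [-9, -9]]
print(two_phase_secs(locs, ne, k=2))  # ([[0, 1, 4], [2, 3, 5]], [6])
```

In the toy data, microgrid 6 remains ungrouped because neither cluster has enough surplus to cover its demand, which mirrors the remark that not all microgrids in \(M^-\) are necessarily included.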

4 Realizing MEC and SEC

After discovering MECs and SECs, microgrids could cooperate with each other by sharing/trading their local energy [6, 20, 26]. Since every microgrid can only be either a power supplier or consumer [6] at any specific time, MECs and SECs are implemented as a bipartite graph on the power grid. In each MEC or SEC, the power might be routed from any microgrid with positive NE to any microgrid with negative NE.

Note that the structure of the bipartite graph may change over time (e.g., \(m_1\) might be a supplier at time \(T_1\) and it may become a consumer at time \(T_2\)). Also, the connection between every pair of microgrids can be available via the power transmission network of the main grid [1, 26]. As illustrated in Section 3.1.1, the optimal energy transmission solution (power flow) within each community can be obtained using the model in [26] (which is simplified from the optimization model in Section 3.1.1):

$$\begin{aligned} {\left\{ \begin{array}{ll} \min \,\, \sum \limits _{\forall i, \forall s} y_{is}(t)\\ {\mathrm {s.t.}}\\ \displaystyle \sum _{\forall s}y_{is}(t) \le e_i(t) \qquad \forall i\\ \displaystyle \sum _{\forall i}y_{is}(t)(1-\theta )\ge |e_s(t)| \qquad \forall s\\ \displaystyle y_{is}(t)\ge 0 \qquad \forall i, \forall s \end{array}\right. } \end{aligned}$$
(5)
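Because the objective in (5) sums the flows \(y_{is}(t)\) without distance weights, each consumer \(m_s\) must be sent at least \(|e_s(t)|/(1-\theta )\) in total, and any allocation that sends exactly this amount attains the minimum whenever the total surplus suffices. The Python sketch below exploits this observation with a simple greedy allocation for one timestamp; it is our illustration, not the solver used in [26], and the function name and values are hypothetical.

```python
def greedy_power_flow(ne_t, theta):
    """Return {(i, s): y_is} for one timestamp, or None if demand exceeds supply.

    ne_t[i] is the NE of microgrid i in one community at the given timestamp.
    """
    surplus = {i: e for i, e in enumerate(ne_t) if e > 0}
    flows = {}
    for s, e in enumerate(ne_t):
        if e >= 0:
            continue
        need = -e / (1 - theta)  # amount to send so that |e_s| arrives
        for i in list(surplus):
            if need <= 1e-12:
                break
            y = min(need, surplus[i])
            if y > 0:
                flows[(i, s)] = y
                surplus[i] -= y
                need -= y
        if need > 1e-9:
            return None  # the community cannot cover this demand by itself
    return flows


# Hypothetical community with two suppliers and two consumers.
flows = greedy_power_flow([10.0, -4.0, -2.0, 1.0], theta=0.2)
print(flows)
print(greedy_power_flow([1.0, -4.0], theta=0.2))  # None
```

A return value of `None` corresponds to the case where the community's demand exceeds its supply and the gap has to be filled from outside the community.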

Note that SECs are always feasible in the above problem (due to their relatively large amounts of excessive energy). If MECs cannot find an optimal solution (overall demand exceeds overall supply in any MEC), the main grid will fill the gap [26]. Similarly, we can also identify some utility measures for evaluating MECs and SECs.

  1) Average distance between every pair of power supplier (positive NE) and consumer (negative NE): shorter distance could reduce the energy loss during transmission from the power supplier to the power consumer. Since the structure of the bipartite graph may change over time, we still use the metric of the (spatial) SSE of all the communities to measure such average distance.

  2) The average NE of each MEC or SEC, taking into account each microgrid’s NE at different times in \([T_1,T_2]\): we denote |t| as the number of timestamps utilized for energy community discovery. Identifying MECs and SECs based on the energy status of microgrids over a longer period \([T_1,T_2]\) (a larger |t|) would yield more accurate communities.

  3) The load on transmission lines: MECs and SECs have better utility if such load is lower.
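As an illustration of the first metric, the following Python sketch computes the spatial SSE of a set of communities, taking the centroid of each community as the mean of its members' locations (our assumption). The function name and sample coordinates are hypothetical.

```python
from math import dist


def spatial_sse(communities, locs):
    """Sum of squared distances between each microgrid and its community centroid."""
    total = 0.0
    for members in communities:
        centroid = [sum(c) / len(members)
                    for c in zip(*(locs[i] for i in members))]
        total += sum(dist(locs[i], centroid) ** 2 for i in members)
    return total


# Hypothetical locations: microgrids 0 and 1 share a community, 2 is alone.
locs = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
print(spatial_sse([[0, 1], [2]], locs))  # 1^2 + 1^2 + 0 = 2.0
```

A lower value indicates more spatially cohesive communities, and therefore shorter supplier-to-consumer distances on average.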

5 Experiments

5.1 Experimental setup

Our experimental simulations are conducted on synthetic data generated from three real-world datasets: a spatial dataset and two power generation and consumption datasets. Firstly, the spatial dataset of 115475 cities/towns in the US was collected by the US Geological Survey on 7 July, 2012 and is available from the National Imagery and Mapping Agency [33]. Secondly, the two power generation and consumption datasets were collected in East Midlands, UK [34], and in Massachusetts, US [35]. Specifically, [34] collects 22 dwellings’ power consumption over 2 years. Reference [35] collects a low resolution dataset (Umass smart* home dataset) with 443 households’ power consumption on 2 April, 2011. It also collects a high resolution dataset (Umass smart* microgrid dataset) with three microgrids’ power generation and consumption over 3 months in 2012. In the Umass smart* microgrid dataset, both solar panels and wind turbines are installed.

In our experiments, we generate synthetic datasets based on the real-world spatial dataset and the time-series generation and consumption datasets: ① we aggregate all the generation and consumption datasets at a frequency of one reading per 15 min; ② to test the MECs, we generate two synthetic datasets by sampling 50000 microgrids’ power generation and consumption over 1 month based on the microgrid dataset in [35], and then randomly assigning geo-locations in the spatial dataset [33] to the 50000 microgrids; ③ to test the SECs, we use the data from ② (generated for MEC discovery) to evaluate the two-phase algorithm. To compare the optimization-based approach and the two-phase algorithm, we select 10000 microgrids with a high percentage of microgrids with positive NE out of the 50000 microgrids with both generation and consumption, ensuring that the optimization-based algorithm can find a feasible solution.

We use Euclidean distance to measure the spatial distance between any two microgrids on the grid. Both the Euclidean distances and the NE distances are normalized into [0, 1] in all the experiments.

5.2 MEC discovery

Recall that the NE of all the 50000 microgrids (overall power generation minus overall power consumption) is negative. To test the effectiveness of Algorithm 1 in two different cases, \(\textcircled{1}\) positive NE and \(\textcircled{2}\) negative NE, we extract two subgroups of microgrids from the 50000 microgrids, each of which includes 20000 microgrids, mixed with positive and negative NE at 2880 different times. For simplicity of notation, these two subsets of microgrids are named “positive” and “negative”, respectively. Note that “positive” means that the microgrids are mixed with positive and negative NE and the overall NE of all the microgrids is positive; “negative” means that the microgrids are also mixed with positive and negative NE but the overall NE of all the microgrids is negative.

Firstly, we implemented Algorithm 1 with \(\xi \in [0.03, 0.3]\), where the normalized spatial distance threshold \(\xi '\) is fixed at a reasonable value of 0.05. Then, Fig. 2a shows the average, maximum and minimum NE of all the communities generated from “positive” where \(\xi \in [0.03,0.3]\). As \(\xi\) increases from 0.03 to 0.3, the allowed maximum differences between the overall demand and overall supply in every MEC increase significantly. The average, maximum and minimum NE then increase as \(\xi\) increases. Conversely, the demand and supply of the MECs become better balanced, with an NE closer to 0, as \(\xi\) decreases. Fig. 2c demonstrates the results for “negative”, which show the reverse trend to “positive” but still tend toward a better balanced load (NE also becomes closer to 0) as \(\xi\) decreases.

Fig. 2 MEC discovery

Secondly, we also had some other findings in the MEC discovery by utilizing microgrid time series NE over different lengths of periods (varying number of timestamps |t|). As shown in Fig. 2b and 2d, as the NE of microgrids over a longer period (larger |t|) is utilized in the MEC discovery, the average NE of the identified MECs can have both increasing and decreasing trends. This is because larger |t| can possibly lead to involving either more or fewer microgrids in every MEC (i.e., the NE distance of two microgrids might be large in the short term but small in the long term, and vice versa). Thus, we cannot determine from Fig. 2b and 2d whether the number of microgrids in each MEC increases or decreases as |t| increases. Furthermore, also in Fig. 2b and 2d, larger \(\xi\) would lead to a higher average NE (positive) and lower average NE (negative). This is because larger \(\xi\) (the threshold of NE distance) allows more microgrids to be clustered in every MEC.

Thirdly, we also measure the geo-locations of the microgrids in the MECs. On one hand, we have examined the (spatial) SSE of the discovered MECs by utilizing microgrid time series NE over different lengths of periods (different |t|). As shown in Fig. 3a, for any |t|, larger \(\xi\) leads to higher SSE of MECs since microgrids in the same MEC would be less cohesive if more microgrids are clustered with a larger \(\xi\). Meanwhile, larger |t| (more timestamps) results in lower SSE of MECs. This means fewer microgrids are clustered in each MEC as |t| increases. Indeed, this fact cannot be observed from Fig. 2b and 2d. Even if larger |t| gives a larger average number of microgrids in each MEC, since such mixed microgrids can have either positive or negative NE, more microgrids in each MEC do not necessarily make the NE of the MECs (positive case) higher or make the NE of the MECs (negative case) lower. This matches the observations in Fig. 2b and 2d.

On the other hand, we fix \(\xi =1\) and \(\xi '=0.05\) in Algorithm 1, which then removes the constraint of NE distances and turns into a traditional agglomerative clustering problem based on geo-locations. Then, we compute the (spatial) SSE in the above case as the benchmark SSE (\(SSE_0\)) and test how the spatial distances (SSE) within each MEC vary for different levels of balanced load (different \(\xi\)). More specifically, we fix \(\xi '=1\) (Algorithm 1 only specifies the maximum NE distance threshold \(\xi\) and removes the constraint of spatial distances), generate the MECs with \(\xi \in [0.03, 0.3]\) for two inputs “positive” and “negative”, respectively, and compute the corresponding (spatial) SSE for each MEC. Then, we define a new measure, the SSE ratio, as \(\frac{SSE}{SSE_0}\) and plot it in Fig. 3b. Clearly, the (spatial) SSE increases as \(\xi\) declines: an MEC with better balanced load includes microgrids that are farther from each other if the spatial distances within each MEC are not bounded (since \(\xi '=1\)).

Fig. 3 Spatial SSE in MECs

Finally, we let \(\theta =0.0001\) per normalized distance of 0.1, randomly simulate five substations, and derive the average distance between each of the 50000 microgrids and its nearest substation. Then, we compare the overall load on transmission lines at 2880 timestamps for 50000 microgrids in two cases (with or without MECs). Table 1 also shows that such energy loss can be greatly reduced with MECs.

Table 1 Load on transmission lines (MEC discovery)
Table 2 SEC discovery (optimization-based approach, 10000 microgrids)
Table 3 SEC discovery (two-phase algorithm, 10000 microgrids)

5.3 SEC discovery

We implement both the optimization-based approach and the two-phase algorithm to discover the SECs. For the optimization-based approach, we solve the optimization problem using the proposed Tabu Search [31] based algorithm (the length of Tabu list was set as \(S=10\)). If the algorithm cannot find a feasible solution within 10000 seconds, the algorithm will be terminated. As mentioned earlier, to compare the two approaches, we have generated a synthetic dataset for 10000 microgrids with mixed NE (more microgrids with positive NE in \([T_1,T_2]\)). Tables 2 and 3 present the experimental results of these two approaches. We have the following observations.

Firstly, both approaches are effective in discovering SECs. The optimization-based approach can assign all the microgrids to the corresponding SECs as long as all the constraints are satisfied. However, as a heuristic algorithm, when \(|t|\ge 900\), the two-phase algorithm cannot involve all the microgrids in the SECs (a feasible solution indeed exists, as found by the optimization-based approach). Among all the microgrids, the two-phase algorithm has missed some microgrids with negative NE in \([T_1,T_2]\) when \(|t|\ge 900\). Consequently, the average NE of all the SECs discovered by the two-phase algorithm is greater than that of the optimization-based approach (when \(|t|\ge 900\)).

Secondly, the SECs discovered by the optimization-based approach are more cohesive than those discovered by the two-phase algorithm (smaller SSE), since the optimization-based approach minimizes the SSE out of all the K values. In addition, we use K-means to simulate five substations of the main grid, and derive the average distance to the main grid (nearest substation) for the 10000 microgrids, which represents the average transmission distance (from the main grid to microgrids). Then, we find that utilizing SECs for sharing local energy can significantly reduce the energy loss in transmission, since the SSE (the average transmission distance using SECs) is far less than the average distance to the main grid (0.097/0.108 vs. 0.247). Also, Table 4 shows that the load on transmission lines can be significantly reduced using the SECs discovered by both approaches.

Table 4 Load on transmission lines (SEC discovery)

Thirdly, for both approaches, K is selected from \(\{50, 60, \dots , 200\}\), which is a reasonable set of values for 10000 microgrids (6588 microgrids in \(M^+\)). Then, the average number of microgrids with positive NE in each community varies from 32.94 (6588/200) to 131.76 (6588/50). Tables 2 and 3 show that the optimization-based approach identifies more SECs than the two-phase algorithm. For any |t|, the number of SECs identified by the two-phase algorithm is fixed (since the best K is determined in the first phase only by the geo-locations of the microgrids with positive NE in \([T_1,T_2]\)). However, the optimization-based approach may identify different numbers of SECs if different |t| are considered.

6 Conclusion and future work

Energy communities formed by distributed energy resources (microgrids) could help the power grid advance energy management and enable microgrids to find peer microgrids to cooperate with (e.g., sharing/trading energy). In this paper, we have proposed a series of approaches to identify different energy communities for the microgrids, such as mixed energy communities and self-sufficient energy communities. We have also validated the effectiveness and efficiency of the approaches using a real-world spatial dataset as well as power generation and consumption datasets.

In the future, we will investigate and solve some other variants of energy community discovery problems for microgrids and we will try to incorporate such preferences into the energy community discovery problems. In addition, besides integrating all the energy generation and consumption over a period into the MECs and SECs discovery, we will explore stochastic optimization models for energy community discovery based on the prediction of future power generation and consumption, which is expected to improve the efficiency of the energy community discovery algorithms. Finally, energy community discovery requires data collection from all the microgrids, which may compromise their privacy [36]. It is also interesting and challenging to propose privacy preserving energy community discovery techniques which enable the cooperation of microgrids while protecting their local information [5, 7].