A Virtual Power Plant Load Curve Clustering Method Based on Improved K-means Algorithm and Its Application

To enable virtual power plants to participate effectively in power grid operation, a load curve clustering method for virtual power plants is proposed that combines principal component analysis (PCA) for dimensionality reduction with agglomerative hierarchical clustering and k-means clustering, and the application of the clustering results is studied. First, using data obtained from the cyber-physical network, principal component analysis is applied to the characteristics of the different loads participating in virtual power plant aggregation, so as to standardize the data and reduce its dimension. Then, an algorithm combining agglomerative hierarchical clustering and k-means clustering is used to cluster all load output curves participating in the aggregation, yielding clusters of similar load curves and their cluster centers. Finally, the clustering results are analyzed and a corresponding evaluation system is established. Through comprehensive evaluation, appropriate load combinations are selected to participate in virtual power plant aggregation.


Introduction
In the context of the construction of the Energy Internet, small and micro smart sensors will be widely distributed throughout the power grid. The volume of data obtained by power terminals is huge, which will promote the transformation of all power and related businesses toward intensive, intelligent and automated operation [1]. The Virtual Power Plant (VPP), as a specific form and basic unit of the Energy Internet, can be aggregated in various ways. This paper analyzes the output curve characteristics of each load, uses artificial intelligence technology to aggregate loads with different characteristics, uses cluster centers to describe their output characteristics, incorporates these characteristics into the aggregation indicators of the virtual power plant, and uses comprehensive evaluation to determine the final aggregation scheme [2].
There are many ways to aggregate virtual power plants. In this paper, the dimension of the resource data participating in virtual power plant aggregation is first reduced, and clustering is then performed. At present, the application of cluster analysis in power systems is mainly focused on load forecasting. With the construction of the ubiquitous power Internet of Things, cluster analysis will be applied at many levels of the power grid. Literature [3] uses the k-means clustering method within a multi-scenario model to describe the uncertainty of the load. After obtaining the cluster centers of the clusters participating in virtual power plant aggregation, this paper establishes a matching index system to determine the optimal aggregation method [4]. The wind power and photovoltaic
IOP Conf. Series: Earth and Environmental Science 619 (2020)
In order to highlight the seasonality of wind power and photovoltaic output, this paper, based on one year of data, aggregates the curves by quarter and finds their cluster centers. The complex set of curves is divided into several specific clusters, the curve characteristics of each cluster are analyzed, and different combination methods are comprehensively evaluated to determine the optimal load combination.

Principal component analysis method
Massive data brings great value to the prediction, aggregation and optimal scheduling of virtual power plants, and also increases the computational burden. Data dimensionality reduction can effectively reduce computational pressure.
Assume that $n$ output curves of wind power, photovoltaic, and energy storage participate in virtual power plant scheduling and that each curve is a $p$-dimensional variable. These curves form the sample matrix $X = (x_{ij})$, where $x_{ij}$ is the i-th variable of the j-th sample. Denote the original variables by $x_1, x_2, \dots, x_p$ and the new variables, that is, the comprehensive indices after dimensionality reduction, by $z_1, z_2, \dots, z_m$ ($m \le p$). Each new variable is a linear combination of the original variables, from which the score of each principal component is obtained and collected in the score matrix $Z$. At this point, the dimensionality reduction of the various types of load curves of the virtual power plant has been achieved. In the subsequent calculations, these $m$ principal components replace the original complex data, so as to simplify the calculations and improve efficiency.
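The dimensionality reduction described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the exact procedure of this paper: the function name `pca_scores`, the row-per-curve layout of the input, the use of the correlation matrix of the standardized data, and the synthetic demo data are all assumptions. The default 85% threshold corresponds to the cumulative contribution rate used later in the case analysis.

```python
import numpy as np

def pca_scores(X, threshold=0.85):
    """Return (Z, contribution) for the fewest principal components whose
    cumulative contribution rate reaches `threshold`.

    Assumed layout: one curve per row, one sampled variable per column.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column; constant columns are left at zero.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / std
    # Correlation matrix of the standardized data.
    R = (Xs.T @ Xs) / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(R)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    # Smallest m whose cumulative contribution reaches the threshold.
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    Z = Xs @ eigvecs[:, :m]                   # principal component scores
    return Z, contribution[:m]

# Hypothetical data: 30 synthetic curves with 96 points each.
rng = np.random.default_rng(0)
X_demo = rng.random((30, 96))
Z_demo, contrib_demo = pca_scores(X_demo)
print(Z_demo.shape, round(float(contrib_demo.sum()), 3))
```

Each row of the returned score matrix then stands in for one original curve in the clustering steps.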

K-means clustering method based on agglomerative hierarchical clustering
Hierarchical clustering methods are divided into two types: agglomerative hierarchical clustering and divisive hierarchical clustering. This paper uses the more commonly used agglomerative hierarchical clustering [5]. First, treat the $n$ objects as $n$ clusters and calculate the Euclidean distance $d_{ij}$ between clusters:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left(z_{ik} - z_{jk}\right)^2}$$

In the formula, $d_{ij}$ represents the distance between the i-th curve and the j-th curve; $z_{ik}$ is the corresponding principal component score in the $Z$ matrix, that is, the score of the i-th curve on the k-th principal component; the $n$ clusters have a total of $n(n-1)/2$ distances. Arrange the distances in ascending order in an array that also records the labels of the $n$ objects.
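The distance computation and sorting can be illustrated with a short sketch. The function name `pairwise_distances` and the four sample score vectors are hypothetical; each tuple keeps the labels of the two objects next to their distance.

```python
import math
from itertools import combinations

def pairwise_distances(Z):
    """Return all (distance, i, j) tuples with i < j, sorted by distance."""
    pairs = [(math.dist(Z[i], Z[j]), i, j)
             for i, j in combinations(range(len(Z)), 2)]
    pairs.sort()
    return pairs

# Hypothetical principal component scores of four curves.
Z_small = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0], [6.0, 8.0]]
pairs = pairwise_distances(Z_small)
print(len(pairs))   # 6, i.e. n(n-1)/2 for n = 4
print(pairs[0])     # (1.0, 0, 2): curves 0 and 2 are the closest pair
```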
Agglomerative hierarchical clustering is used to determine the initial cluster centers for k-means clustering. The specific steps are as follows:
(1) Determine the number of clusters k;
(2) Input the Z matrix and perform agglomerative hierarchical clustering on the data set;
(3) Calculate the mean of the data in each of the k resulting clusters to obtain the initial cluster centers;
(4) Using k-means clustering, calculate the distance from each data point to all cluster centers, and assign the point to the cluster of the nearest center;
(5) After the assignment is completed, recalculate the center of each cluster;
(6) Repeat steps (4) and (5) until the cluster centers no longer change or no data point is reassigned to another cluster.
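The six steps above can be sketched in plain Python. The sketch is illustrative only: the function names are hypothetical, and average linkage is an assumed merging rule, since the linkage criterion is not stated here. The naive merging loop is meant for small examples rather than for efficiency.

```python
import math

def mean_point(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def agglomerative_clusters(Z, k):
    """Steps (1)-(2): merge the two closest clusters until k remain.
    Average linkage is an assumption of this sketch."""
    clusters = [[i] for i in range(len(Z))]

    def linkage(a, b):
        return sum(math.dist(Z[i], Z[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        _, p, q = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        merged = clusters.pop(q)          # q > p, so index p stays valid
        clusters[p].extend(merged)
    return clusters

def kmeans(Z, centers, max_iter=100):
    """Steps (4)-(6): iterate until no point changes cluster."""
    labels = [None] * len(Z)
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda c: math.dist(z, centers[c]))
                      for z in Z]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [z for z, lab in zip(Z, labels) if lab == c]
            if members:                   # keep the old center if a cluster empties
                centers[c] = mean_point(members)
    return labels, centers

def improved_kmeans(Z, k):
    """Step (3): hierarchical cluster means become the initial centers."""
    init = [mean_point([Z[i] for i in idx])
            for idx in agglomerative_clusters(Z, k)]
    return kmeans(Z, init)

# Hypothetical scores forming two well-separated groups.
Z_toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
         [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels_toy, centers_toy = improved_kmeans(Z_toy, 2)
print(labels_toy)
```

Because the initial centers come from a deterministic merging procedure, repeated runs on the same data give the same result, unlike k-means with random initialization.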

Virtual power plant load combination scheme based on comprehensive evaluation
This paper mainly considers economy and environmental protection. Economy considers the wind and photovoltaic power generation revenue of the virtual power plant, energy curtailment costs and other factors. Environmental protection is used to measure the clean energy consumption level of the virtual power plant. The higher the economy and environmental protection, the better the effect of wind and photovoltaic power participating in virtual power plant aggregation. Curve characteristics refer to the several curve clusters obtained by the improved k-means clustering: the curve characteristics of the different clusters are analyzed, and the curve characteristics and scale of each cluster are used to measure the characteristics of the renewable energy itself, mainly its output characteristics and reliability [6]. The output characteristics consider factors such as the daily load rate, the daily peak-valley rate, and the daily load fluctuation rate. Reliability considers the percentage of time each day that the output of the curves in a cluster meets the threshold, and the abnormal rate of the curves of that cluster within a specific period. Better output characteristics and higher reliability are preferred. This paper selects the eight indicators shown in Table 1 as the comprehensive evaluation indicators for guiding load combination, and gives the quantitative calculation method of each indicator. In order to determine the optimal load combination and ensure energy efficiency, this paper evaluates all load combinations for each number of combined clusters. After obtaining all the load combinations that meet the conditions, the entropy weight method is used to determine the index weights, and the final score of each combination is calculated by weighting its indicators accordingly.
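As an illustration of how curve-level indicators of this kind might be computed, the sketch below uses common textbook definitions. These definitions are assumptions and are not taken from Table 1: load rate as mean over peak, peak-valley rate as (peak minus valley) over peak, fluctuation rate as population standard deviation over mean, and reliability as the share of sampled points at or above a threshold. The function name and the sample curve are hypothetical.

```python
from statistics import mean, pstdev

def curve_indicators(curve, threshold):
    """Assumed indicator definitions for one daily output curve."""
    peak, valley, avg = max(curve), min(curve), mean(curve)
    if peak <= 0 or avg <= 0:
        raise ValueError("curve must have positive peak and mean output")
    return {
        "load_rate": avg / peak,
        "peak_valley_rate": (peak - valley) / peak,
        "fluctuation_rate": pstdev(curve) / avg,
        "time_above_threshold": sum(v >= threshold for v in curve) / len(curve),
    }

# Hypothetical normalized curve with four samples.
indicators = curve_indicators([0.0, 0.5, 1.0, 0.5], threshold=0.5)
print(indicators)
```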
For a given number of aggregated clusters, the evaluation scores of the different combinations are compared, and the best load aggregation method for the virtual power plant under the indicators considered in this paper is selected.
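The weighting and scoring step can be sketched with the standard form of the entropy weight method. This is a generic illustration and may differ in detail from the calculation used in this paper: min-max normalization, the benefit/cost flags, the function names, and the three-indicator sample data are all assumptions.

```python
import math

def minmax_normalize(column, benefit=True):
    """Scale a column to [0, 1]; cost-type indicators are reversed."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [1.0] * len(column)
    if benefit:
        return [(v - lo) / (hi - lo) for v in column]
    return [(hi - v) / (hi - lo) for v in column]

def entropy_weights(R):
    """R: rows are combinations, columns are normalized indicators."""
    n, m = len(R), len(R[0])
    if n < 2:
        raise ValueError("at least two combinations are required")
    divergence = []
    for j in range(m):
        col = [row[j] for row in R]
        total = sum(col)
        probs = [v / total for v in col] if total > 0 else [1.0 / n] * n
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergence.append(max(0.0, 1.0 - entropy))
    s = sum(divergence)
    # Fall back to equal weights if no indicator discriminates at all.
    return [d / s for d in divergence] if s > 0 else [1.0 / m] * m

def score_combinations(raw, benefit_flags):
    """Return (weights, scores) for a raw indicator table."""
    cols = [minmax_normalize([row[j] for row in raw], flag)
            for j, flag in enumerate(benefit_flags)]
    R = [list(row) for row in zip(*cols)]
    weights = entropy_weights(R)
    scores = [sum(w * r for w, r in zip(weights, row)) for row in R]
    return weights, scores

# Hypothetical table: revenue, consumption rate (benefit), curtailment cost (cost).
raw_demo = [[10.0, 1.0, 1.0], [5.0, 0.5, 2.0], [0.0, 0.0, 3.0]]
weights_demo, scores_demo = score_combinations(raw_demo, [True, True, False])
best = max(range(len(scores_demo)), key=scores_demo.__getitem__)
print(weights_demo, scores_demo, best)
```

In this toy table the first combination is best on every indicator, so it receives the highest score regardless of the weights.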

Case analysis
This paper is based on the actual power of 7 photovoltaic sites, 8 onshore wind power sites and 4 offshore wind power sites in Europe in 2017. Taking into account the seasonality of wind and solar output, clustering is performed on a quarterly basis. Taking the data of the first quarter as an example, with a 15-minute sampling interval, a total of 1710 curves (19 loads over 90 days) are drawn for clustering to study the output patterns. The normalized output distribution of all loads is shown in Figure 1. Each curve is the normalized output distribution of a certain load in one day. First, principal component analysis is performed with a cumulative contribution rate of 85%, and the eigenvalues that meet this condition and their contributions are shown in Table 2. Then agglomerative hierarchical clustering is carried out. According to the validity test, the clustering effect is best when the number of clusters is 8. After obtaining the 8 initial cluster centers, k-means clustering is performed. The clustering results are shown in Figure 4, where the red curve is the final cluster center of each cluster. The numbers of curves in clusters 1 to 8 are 155, 65, 631, 190, 113, 170, 246 and 140, respectively, and the characteristics of the curves in each cluster are distinct. Next, the performance of the two algorithms is quantitatively compared through the effectiveness index. The improved k-means algorithm has a shorter clustering time and a smaller X than the traditional method, so its clustering effect is better.
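The way the 1710 daily curves arise from the site data can be illustrated as follows. The sketch assumes each site provides one continuous 15-minute series and that normalization divides by a rated capacity. Both are assumptions, as are the function name and the synthetic series.

```python
POINTS_PER_DAY = 24 * 60 // 15   # 96 samples per day at 15-minute resolution

def daily_curves(series, capacity):
    """Split one site's series into normalized daily curves."""
    if len(series) % POINTS_PER_DAY:
        raise ValueError("series must cover whole days")
    return [[v / capacity for v in series[start:start + POINTS_PER_DAY]]
            for start in range(0, len(series), POINTS_PER_DAY)]

# Synthetic placeholder: the same 90-day series reused for 19 sites.
series_demo = [float(t % POINTS_PER_DAY) for t in range(90 * POINTS_PER_DAY)]
curves_demo = [curve
               for _site in range(19)
               for curve in daily_curves(series_demo, capacity=95.0)]
print(len(curves_demo), len(curves_demo[0]))   # 1710 96
```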
After the clustering is completed, the effect of loads from different clusters participating in VPP aggregation is evaluated according to the clustering results, and the optimal aggregation scheme is determined. Different types of resources have different output characteristics, so this paper uses clusters as the aggregation unit and assumes that the number of clusters participating in the aggregation is greater than or equal to 4. As the number of combined clusters increases, the economy and environmental protection of the scheme increase. In order to reduce the number of variables, this paper takes the case where the number of combined clusters equals 7 as an example and compares the advantages and disadvantages of the 8 combinations, as shown in Table 3. Under the index system selected in this paper, the virtual power plant scores highest when the 7th cluster is excluded, and the aggregation effect is the best. It scores lowest when the third cluster is excluded, and the aggregation effect is the worst. It can be seen from Figure 2 that the third cluster consists mainly of the output curves of photovoltaic power plants. When photovoltaic output is excluded, both the power generation revenue and the clean energy consumption rate are significantly reduced, resulting in the lowest comprehensive evaluation score, which is consistent with the actual situation.
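Assuming that each candidate combination is simply a subset of the 8 clusters, the candidate set can be enumerated as in the sketch below. Under that assumption, the 8 combinations of size 7 each leave out exactly one cluster, which matches the comparison above. The function name is hypothetical.

```python
from itertools import combinations
from math import comb

def candidate_combinations(n_clusters=8, min_size=4):
    """All subsets of cluster labels 1..n_clusters with at least min_size members."""
    labels = range(1, n_clusters + 1)
    return [combo
            for size in range(min_size, n_clusters + 1)
            for combo in combinations(labels, size)]

candidates = candidate_combinations()
size_seven = [c for c in candidates if len(c) == 7]
excluded = sorted((set(range(1, 9)) - set(c)).pop() for c in size_seven)

print(len(candidates))   # 163 = C(8,4) + C(8,5) + C(8,6) + C(8,7) + C(8,8)
print(len(size_seven))   # 8
print(excluded)          # [1, 2, 3, 4, 5, 6, 7, 8]
```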
The method proposed in this paper can be used not only for the aggregation and evaluation of virtual power plant load curves, but also for dimensionality reduction and clustering of power curves in other scenarios. At the same time, it provides ideas for the evaluation of virtual power plant aggregation results. In the selection of indicators, the contribution of wind power and photovoltaics is mainly considered, which has certain limitations. Full consideration of the contribution of various types of loads of virtual power plants will make the evaluation results more accurate.

Conclusion
Considering the complex components of the virtual power plant load curve and the special operating conditions, this paper proposes a virtual power plant load curve clustering method based on PCA and an improved k-means algorithm, and discusses its application. Cluster analysis of the load curves participating in virtual power plant aggregation, using principal component analysis for dimensionality reduction and a clustering method that combines agglomerative hierarchical clustering with the k-means algorithm, can help solve the practical problem of virtual power plants participating in the operation of the power grid. The calculation example shows that the method can be applied to the clustering of power load curves and that the clustering effect is clear. For the application of the clustering results, a comprehensive evaluation index system for virtual power plants has been established, which reflects the effect of load participation in virtual power plant aggregation. The system evaluates the set of possible load combinations, and the rationality of the evaluation is verified through calculation examples, which provides ideas for other related applications of clustering results.