Clustering disaggregated load profiles using a Dirichlet process mixture model

The increasing availability of substantial quantities of power-use data in both the residential and commercial sectors raises the possibility of mining the data to the advantage of both consumers and network operations. We present a Bayesian non-parametric model to cluster load profiles from households and business premises. Evaluators show that our model performs as well as other popular clustering methods, but unlike most other methods it does not require the number of clusters to be predetermined by the user. We used the so-called ‘Chinese restaurant process’ method to solve the model, making use of the Dirichlet-multinomial distribution. The number of clusters grew logarithmically with the quantity of data, making the technique suitable for scaling to large data sets. We were able to show that the model could distinguish features such as the nationality, household size, and type of dwelling between the cluster memberships.


Introduction
There are supply and demand-side drivers to better understand power-use patterns to help deliver robust electricity distribution networks. The introduction of advanced metering raises the possibility of exploiting increasing volumes of data with potential benefits (and disadvantages) to customers, retailers, and network operators. Measures to implement demand-side management are becoming technically feasible in both commercial and residential premises, and innovation in deregulated markets is arising from changing customer expectations and usage patterns. Increasing embedded and multi-scale generation, decreasing reliance on base-load generation, and the electrification of transport and heating alongside increasing use of cooling loads are factors introducing uncertainty into network management. The ability to recognise types of customer load and to differentiate between them will become an important tool for the design of tariffs to incentivize load-shifting or other changes in consumption patterns, for fault recognition and detection, and for planning. The clustering of time-series power-use data may provide a useful tool in these respects.
In general, clustering techniques are unsupervised machine learning algorithms that determine the subsets into which a data set can be divided without a priori information. The objective is to group the data elements that are similar and to ensure that the elements of each cluster differ from the elements of the other clusters. An overview of many of the aspects of the clustering of electricity profiles and a review of the techniques is given in [1]. Previous work adopted the frequentist approach and used well-known clustering algorithms such as k-means and hierarchical algorithms [2][3][4][5], fuzzy k-means [6], a Support Vector Clustering model [7], or iterative self-organised maps [8,5]. In [9], the data are simply grouped by environmental characteristics such as the month of the year. The disadvantage of these techniques is the need to declare the number of clusters before beginning computation. In [10], the follow-the-leader algorithm was used to cluster load profiles; instead of defining the number of clusters, the user gives a distance threshold among the clusters. Other approaches segment electricity profiles based on a priori known customer features such as the commercial sector, without applying any clustering method [11].
Our approach uses Bayesian statistics to enable the modelling of unknown parameters that govern the distribution used for explaining the data (which has a distribution itself). Using a distribution of these parameters gives greater flexibility and robustness for managing the uncertainty that data present. In non-parametric algorithms, the number of parameters is not fixed in advance and is potentially infinite. When clustering with a Bayesian non-parametric method, the number of resulting clusters is determined by the model and the data; it is not fixed by the user. The Bayesian model that we use is the Dirichlet process mixture model (DPMM). These models have been used successfully for solving clustering tasks in diverse areas such as computational biology [12], computational linguistics [13] or marketing [14]. In [15], a Bayesian clustering by dynamics method was used to cluster electricity use time series for load forecasting. That method models the data dynamics as Markov chains, then applies an agglomerative clustering procedure (our algorithm is not agglomerative) and employs an entropy-based heuristic search strategy to find the most likely partition, which is not needed in our case.
In this paper, we describe the data sets (Section 2), the development of the new model and clustering process (Section 3), its application and the extraction of various features from the data set (Section 4), and an evaluation against other common clustering techniques. We then conclude and discuss possible further work in this area.

Data sets and pre-processing
Our approach to time-series power-use data exploits two unique data sets. The first is a small high-resolution residential data set (with metadata) produced by a European project. Previous studies have used data of 15-30 min resolution [2,3,7]. We suggest that using a higher data rate should improve the usefulness of clustering of power consumption data. The second data set is larger and comprises 30-min resolution data from commercial users.

Residential data set
This data set [16] comprises electrical power consumption data collected between 30-03-2010 and 24-11-2010 from 135 British and 84 Bulgarian dwellings (a total of 219). The meters recorded at between 6 and 8 s resolution. The metadata (Table 1) were: nationality [UK or Bulgaria], number of occupants, number of bedrooms, and the type of dwelling. There are five categories of dwelling: flat or apartment, terraced (a house that is situated in a row of houses sharing side walls with neighbouring properties), semi-detached house (a house that shares only a single common wall with another house or property), detached house (a dwelling that does not share any walls with any other structure), and other kind of dwelling.
Two pre-processing filters removed anomalous readings and meter faults. The first removed negative and zero values; the second removed dwellings whose readings had five or fewer different values for at least half of the total readings. The clean data were transformed to one-minute resolution by averaging the readings present within each minute. The daily load profile of a dwelling is the averaged one-minute data over a day, aggregated over all working days of that dwelling. Only load profiles that present values for at least 1438 of the 1440 min were used subsequently, with a total of 197 dwellings satisfying all these criteria (125 British and 72 Bulgarian).
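The residential filtering and resampling steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(second_of_day, watts)` input format, and the choice to pool readings from all working days by minute of day are assumptions made for the example.

```python
from collections import defaultdict


def minute_profile(readings):
    """Average raw readings into one value per minute of the day.

    `readings` is an iterable of (second_of_day, watts) pairs pooled over
    all working days of one dwelling (hypothetical input format).
    Negative and zero readings are dropped, mirroring the first filter.
    Returns {minute_index: mean_watts} for the minutes that have data.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for second_of_day, watts in readings:
        if watts <= 0:  # discard negative and zero values
            continue
        minute = int(second_of_day // 60)
        sums[minute] += watts
        counts[minute] += 1
    return {m: sums[m] / counts[m] for m in sums}


def is_complete(profile, min_minutes=1438):
    """Keep a profile only if it has values for at least 1438 of 1440 min."""
    return len(profile) >= min_minutes
```

The second filter (dwellings with five or fewer distinct values for at least half of their readings) is omitted here because the text does not fix how it is evaluated.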

Commercial data set
This data set consists of half-hourly electricity use for 1877 UK businesses from the entertainment sector during 2009 and 2010. These businesses are categorised as restaurants/cafes, hotels/guest houses, pubs/bars, clubs, and cinemas/other leisure.
Nomenclature
$\Pi$: $K + 1$ probabilities of a multinomial distribution that correspond to the mixture components or clusters in the DPMM
$\Theta_i$: d-dimensional parameter of a multinomial distribution, governed by a Dirichlet distribution, that corresponds to the ith cluster

The pre-processing procedure had four steps. First, negative and zero values were removed. Secondly, for each business, we removed readings that are over three times the mean plus three times the standard deviation. Thirdly, businesses that did not have a minimum of ten different values in their total readings were removed. Finally, businesses that did not present a minimum of six months of data, i.e. 8760 readings, were also removed.
Applying these filters produced a data set of 1207 businesses with an average of 20,230 readings each and a standard deviation of 7706. On average, they present around 60% of the readings during the two years of sampling. Each daily profile has 48 readings.
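A sketch of the four commercial filters for a single business is given below. It assumes the population standard deviation and assumes that the distinct-value and six-month checks are applied after outlier removal; the text does not specify either detail, and `filter_business` is a hypothetical name.

```python
import statistics


def filter_business(readings, min_distinct=10, min_readings=8760):
    """Apply the four filters to one business's half-hourly readings.

    Returns the cleaned readings, or None if the business is rejected.
    """
    # Step 1: drop negative and zero values.
    positive = [r for r in readings if r > 0]
    if not positive:
        return None
    # Step 2: drop readings above 3 * mean + 3 * standard deviation.
    # (Population standard deviation is an assumption of this sketch.)
    threshold = 3 * statistics.fmean(positive) + 3 * statistics.pstdev(positive)
    kept = [r for r in positive if r <= threshold]
    # Step 3: require at least ten different values.
    if len(set(kept)) < min_distinct:
        return None
    # Step 4: require six months of data (8760 half-hourly readings).
    if len(kept) < min_readings:
        return None
    return kept
```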

Developing the DPMM
A d-dimensional Dirichlet distribution with concentration parameter $\beta = (\beta_1, \ldots, \beta_d)$ is a continuous distribution that defines a probability measure over the $(d-1)$-simplex, i.e. each point in the domain of the Dirichlet distribution can itself be seen as a d-dimensional discrete distribution:

$$\mathrm{Dir}(\theta \mid \beta) = \frac{\Gamma\left(\sum_{i=1}^{d} \beta_i\right)}{\prod_{i=1}^{d} \Gamma(\beta_i)} \prod_{i=1}^{d} \theta_i^{\beta_i - 1} \qquad (1)$$

where $\theta_i \ge 0$ for all $i$, and $\sum_{i=1}^{d} \theta_i = 1$.
A Dirichlet process (DP) [17] is a distribution over probability measures that can be seen as an extension of a Dirichlet distribution with infinite dimension. It has two parameters, $\mathrm{DP}(G_0, \alpha_0)$, where $G_0$ is the base probability measure and $\alpha_0$ is the precision or scaling parameter. Any draw $G$ from the DP ($G \sim \mathrm{DP}(G_0, \alpha_0)$) is a discrete distribution with probability one. Therefore, such draws can be used as prior probabilities for other discrete distributions or components in a mixture model. There are different representations of the DP such as the stick-breaking construction [17], the Pólya urn scheme [18], and the one used in this work: the Chinese restaurant process (CRP) [19].
The CRP uses the property that if there are $n - 1$ independent variables distributed by a probability measure generated by a DP (i.e. $\Theta_1, \ldots, \Theta_{n-1} \sim G$ and $G \sim \mathrm{DP}(G_0, \alpha_0)$), the next draw from $G$, i.e. $\Theta_n$, has a probability greater than zero of repeating the value of any of the previous draws [18]. In addition, the draws that appear more times are more likely to appear again than those that appear fewer times ('the rich get richer' effect). These two properties have a clustering effect that can be exploited with a method analogous to allocating spaces in a Chinese restaurant. Imagine a Chinese restaurant with a potentially infinite number of tables. Consider the data points to cluster as clients of the restaurant, and the clusters as the tables around which customers sit (i.e. are assigned to the cluster). The CRP works in the following way: the first client sits at the first table, and the nth client sits at:

$$\text{table } k \text{ with probability } \frac{n_k}{\alpha_0 + n - 1}, \quad 1 \le k \le K \qquad (2a)$$

$$\text{a new table with probability } \frac{\alpha_0}{\alpha_0 + n - 1} \qquad (2b)$$

where $n_k$ is the number of customers already sitting at table $k$, and $K$ is the total number of tables (clusters). Following the analogy, the dishes on the table can be seen as the parameters of the distribution that explains all of the data points in the cluster. Eq. (2b) guarantees that there always exists a small probability of creating a new cluster. Once this first allocation of all the customers is carried out, customers can be reallocated between tables. The posterior probabilities of reallocating a customer from one table to another (tables 1 to $K$) or to a new one ($K + 1$) are computed. Computation of these probabilities for our model is described in the following section.
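The seating rule can be illustrated with a short simulation. The sketch assumes the standard CRP probabilities, $n_k / (\alpha_0 + n - 1)$ for an occupied table and $\alpha_0 / (\alpha_0 + n - 1)$ for a new one; the function names are illustrative only, and no data likelihood is involved at this stage.

```python
import random


def crp_probabilities(table_counts, alpha0):
    """Seating probabilities for the next customer.

    Returns one probability per occupied table, followed by the
    probability of opening a new table.
    """
    n = sum(table_counts) + 1          # index of the arriving customer
    denom = alpha0 + n - 1
    probs = [n_k / denom for n_k in table_counts]
    probs.append(alpha0 / denom)       # new-table probability
    return probs


def crp_assign(n_customers, alpha0, rng):
    """Seat customers one by one; return (assignments, table_counts)."""
    counts, assignments = [], []
    for _ in range(n_customers):
        probs = crp_probabilities(counts, alpha0)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)           # open a new table (cluster)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts
```

For example, with tables of sizes 3 and 1 and $\alpha_0 = 1$, the fifth customer joins them with probabilities 0.6 and 0.2 and opens a new table with probability 0.2.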

The model
We used the CRP method to solve the DPMM with a potentially infinite number of components (clusters). We made use of a hierarchical Dirichlet process [20] (the Dirichlet-multinomial distribution). The load profiles to cluster are represented as draws from a multinomial distribution whose parameters are generated by a Dirichlet distribution. Formally, to cluster $n$ load profiles $X = \{x_1, \ldots, x_n\}$, where each profile $x_i$ is formed by $d$ counters $x_i = (c_{i1}, \ldots, c_{id})$, the DPMM that models our problem can be hierarchically expressed as:

$$x_i \mid z_i, \Theta \sim \mathrm{Multinomial}(\Theta_{z_i}) \qquad (3)$$

$$\Theta_j \sim \mathrm{Dirichlet}(\beta), \quad j = 1, \ldots, K \qquad (4)$$

$$z_i \mid \Pi \sim \mathrm{Multinomial}(\Pi) \qquad (5)$$

$$\Pi \sim \mathrm{DP}(G_0, \alpha_0) \qquad (6)$$

where the distributions in Eq. (3) model the probability of generating the values of the load profile as a multinomial distribution whose parameters correspond to a cluster with index $z_i$ (i.e. $z_i = j$ indicates that the ith load profile is assigned to the jth cluster). The Dirichlet distribution that generates the parameters of the multinomial distribution of each cluster (the prior distribution in Bayesian statistics) is given by Eq. (4), where $K$ is the number of clusters ($K$ can vary depending on the observed data). The distribution of Eq. (5) models the selection of a cluster (mixture component) by each of the data points. It corresponds to a draw of one element from a multinomial with parameters $\Pi$. These $\pi$ values are the component probabilities (priors on the mixture model) governed by the distribution in Eq. (6). These draws from a DP were calculated using the CRP based on Eqs. (2a) and (2b), where $\pi_{K+1}$ corresponds to the probability of allocating the data point to a new cluster, i.e. creating a new cluster. In a DPMM the number of clusters obtained grows logarithmically in relation to the number of input data points [21]. This also depends on the parameters of the distributions.
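The two lower levels of the hierarchy (a Dirichlet draw for the cluster parameter, then a multinomial draw for the profile counters) can be sketched generatively. This only illustrates the model structure under the stated distributions; the helper names and the use of Gamma variates to sample the Dirichlet are choices made for the example.

```python
import random


def sample_dirichlet(beta, rng):
    """Draw a cluster parameter from Dirichlet(beta).

    Uses the standard construction from normalised Gamma(beta_j, 1) draws.
    Very small beta_j may underflow to zero in floating point, so this
    sketch is meant for moderate values only.
    """
    draws = [rng.gammavariate(b, 1.0) for b in beta]
    total = sum(draws)
    return [g / total for g in draws]


def sample_profile(theta, n_counts, rng):
    """Draw d counters from a multinomial with parameter theta."""
    counters = [0] * len(theta)
    for j in rng.choices(range(len(theta)), weights=theta, k=n_counts):
        counters[j] += 1
    return counters
```

A profile generated this way has the same form as the input data: a vector of $d$ non-negative integer counters.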

Gibbs sampling algorithm in the DPMM
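It is not feasible to compute exactly the posterior $\Pr(\Pi, \Theta, Z \mid X)$, where $\Theta = \{\Theta_1, \ldots, \Theta_K\}$ and $Z = \{z_1, \ldots, z_n\}$. We used a Gibbs sampling algorithm to approximate iteratively the probability of reallocating the object $x_i$ to a new cluster $z'_i$ [14] (lines 5 to 11 in Fig. 1). The reallocation posterior distribution $(\pi_1, \ldots, \pi_{K+1})$ from Eq. (6) was computed for each iteration and for the ith data point in the following way:

$$\pi_k = \Pr(z_i = k \mid Z_{-i}, X) = B \, \frac{n_k}{\alpha_0 + n - 1} \, \Pr(x_i \mid z_i = k), \quad 1 \le k \le K \qquad (7)$$

$$\pi_{K+1} = \Pr(z_i = K + 1 \mid Z_{-i}, X) = B \, \frac{\alpha_0}{\alpha_0 + n - 1} \, \Pr(x_i \mid z_i = K + 1) \qquad (8)$$

where $k$ is the index of the cluster to which point $x_i$ is reallocated, $B$ is a normalisation factor, $Z_{-i}$ denotes all the indices in $Z$ except $z_i$, and $\Pr(x_i \mid z_i = k)$ is the likelihood of the data point $x_i$ given that it is reallocated to cluster $k$. Computing this marginal is not straightforward [14]. We computed it analytically, integrating out the multinomial parameter $\theta$ (Eq. (4)) in the following way:

$$\Pr(x_i \mid z_i = k) = \int \Pr(x_i \mid \Theta_k) \, \Pr(\Theta_k \mid \beta'_k) \, d\Theta_k \qquad (9)$$

where $\Pr(x_i \mid \Theta_k)$ corresponds to the probability mass function of $x_i$ in the multinomial distribution with parameter $\Theta_k = (\theta_{k1}, \ldots, \theta_{kd})$:

$$\Pr(x_i \mid \Theta_k) = \frac{\left(\sum_{j=1}^{d} c_{ij}\right)!}{\prod_{j=1}^{d} c_{ij}!} \prod_{j=1}^{d} \theta_{kj}^{c_{ij}} \qquad (10)$$

and $\Pr(\Theta_k \mid \beta'_k)$ is the probability density function (pdf) of a Dirichlet distribution with parameter $\beta$ updated with the previous data points in this cluster:

$$\beta'_{kj} = \beta_j + n_{kj} \qquad (11)$$

$$\Pr(\Theta_k \mid \beta'_k) = \frac{\Gamma\left(\sum_{j=1}^{d} \beta'_{kj}\right)}{\prod_{j=1}^{d} \Gamma(\beta'_{kj})} \prod_{j=1}^{d} \theta_{kj}^{\beta'_{kj} - 1} \qquad (12)$$

where $n_{kj}$ is the sum of the jth counter over all the data points in the kth cluster except $x_i$ (i.e. $n_{kj} = \sum_{l \ne i,\, z_l = k} c_{lj}$). This update of the parameters is possible because a Dirichlet distribution is the conjugate prior for a multinomial distribution. Substituting Eqs. (10) and (12) into Eq. (9), and removing the constant term that is equal for all clusters, we obtain:

$$\Pr(x_i \mid z_i = k) \propto \frac{\Gamma\left(\sum_{j} \beta'_{kj}\right)}{\prod_{j} \Gamma(\beta'_{kj})} \int \prod_{j=1}^{d} \theta_{kj}^{\beta'_{kj} + c_{ij} - 1} \, d\Theta_k \qquad (13)$$

Note that the integral of any probability distribution over all possible parameter values is one; therefore, for any Dirichlet distribution:

$$\int \mathrm{Dir}(\theta \mid \beta) \, d\theta = 1 \qquad (14)$$

$$\frac{\Gamma\left(\sum_{j} \beta_j\right)}{\prod_{j} \Gamma(\beta_j)} \int \prod_{j=1}^{d} \theta_j^{\beta_j - 1} \, d\theta = 1 \qquad (15)$$

$$\int \prod_{j=1}^{d} \theta_j^{\beta_j - 1} \, d\theta = \frac{\prod_{j} \Gamma(\beta_j)}{\Gamma\left(\sum_{j} \beta_j\right)} \qquad (16)$$

Applying the result of Eq. (16) to the integral of Eq. (13), we obtain:

$$\Pr(x_i \mid z_i = k) \propto \frac{\Gamma\left(\sum_{j} \beta'_{kj}\right)}{\prod_{j} \Gamma(\beta'_{kj})} \cdot \frac{\prod_{j} \Gamma(\beta'_{kj} + c_{ij})}{\Gamma\left(\sum_{j} (\beta'_{kj} + c_{ij})\right)} \qquad (17)$$

By taking the numerator of the first term of Eq. (17) and the denominator of the second term of the same equation, and applying the property that $x\Gamma(x) = \Gamma(x + 1)$, we obtain, with $C_i = \sum_{j} c_{ij}$:

$$\frac{\Gamma\left(\sum_{j} \beta'_{kj}\right)}{\Gamma\left(\sum_{j} (\beta'_{kj} + c_{ij})\right)} = \prod_{m=0}^{C_i - 1} \frac{1}{\sum_{j} \beta'_{kj} + m} \qquad (18)$$

By taking the denominator of the first term of Eq. (17) and the numerator of the second term of the same equation, and applying the same property of the $\Gamma$ function, we obtain:

$$\frac{\prod_{j} \Gamma(\beta'_{kj} + c_{ij})}{\prod_{j} \Gamma(\beta'_{kj})} = \prod_{j=1}^{d} \prod_{m=0}^{c_{ij} - 1} (\beta'_{kj} + m) \qquad (19)$$

Joining the results from Eqs. (18) and (19), we obtain an expression that does not contain the $\Theta$ parameters to compute the probability of Eq. (9) and that makes use only of simple operations:

$$\Pr(x_i \mid z_i = k) \propto \frac{\prod_{j=1}^{d} \prod_{m=0}^{c_{ij} - 1} (\beta_j + n_{kj} + m)}{\prod_{m=0}^{C_i - 1} \left(\sum_{j=1}^{d} (\beta_j + n_{kj}) + m\right)} \qquad (20)$$

Note that in the case of a new cluster (Eq. (8)) the marginal probability is:

$$\Pr(x_i \mid z_i = K + 1) \propto \frac{\prod_{j=1}^{d} \prod_{m=0}^{c_{ij} - 1} (\beta_j + m)}{\prod_{m=0}^{C_i - 1} \left(\sum_{j=1}^{d} \beta_j + m\right)} \qquad (21)$$

The iterative Gibbs sampling algorithm (line 5 of Fig. 1) stops when a given number of consecutive iterations has passed without any change in the reallocations (i.e. the values of $Z$ are constant), or when a maximum number of iterations is reached.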
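The collapsed marginal can be checked numerically. The sketch below assumes that, once the multinomial parameter is integrated out and the multinomial coefficient (constant across clusters) is dropped, the marginal is the Dirichlet-multinomial ratio $\Gamma(A)/\Gamma(A + C) \cdot \prod_j \Gamma(a_j + c_j)/\Gamma(a_j)$ with $a_j = \beta_j + n_{kj}$, $A = \sum_j a_j$ and $C = \sum_j c_j$. It evaluates this both through log-Gamma functions and as a product of simple terms, then combines the marginals with the CRP weights. All function names are illustrative, not the authors' code.

```python
import math


def log_marginal_product(counts, beta, cluster_counts):
    """Log marginal as a product of simple terms (no Gamma functions)."""
    a = [b + n for b, n in zip(beta, cluster_counts)]
    total_a = sum(a)
    out = 0.0
    for a_j, c_j in zip(a, counts):
        for m in range(c_j):            # rising factorial of a_j
            out += math.log(a_j + m)
    for m in range(sum(counts)):        # rising factorial of sum(a)
        out -= math.log(total_a + m)
    return out


def log_marginal_gamma(counts, beta, cluster_counts):
    """Same quantity written with log-Gamma functions."""
    a = [b + n for b, n in zip(beta, cluster_counts)]
    out = math.lgamma(sum(a)) - math.lgamma(sum(a) + sum(counts))
    for a_j, c_j in zip(a, counts):
        out += math.lgamma(a_j + c_j) - math.lgamma(a_j)
    return out


def reallocation_probs(counts, beta, clusters, alpha0):
    """Normalised reallocation probabilities for one profile.

    `clusters` is a list of (n_k, n_kj) pairs for the existing clusters,
    computed without the profile being reallocated (n_k > 0). The last
    entry of the result is the probability of opening a new cluster.
    """
    logs = [math.log(n_k) + log_marginal_product(counts, beta, n_kj)
            for n_k, n_kj in clusters]
    logs.append(math.log(alpha0)
                + log_marginal_product(counts, beta, [0] * len(beta)))
    top = max(logs)                     # subtract max for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]
```

The common denominator $\alpha_0 + n - 1$ cancels in the normalisation, which is why it does not appear in `reallocation_probs`. Working in log space avoids underflow when profiles contain many counts.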

Parameter estimation
Fig. 1. The Gibbs sampling algorithms used for clustering load profiles and for estimating $\alpha_0$.

The model has only two input parameters: (1) the Dirichlet prior or concentration parameters ($\beta = (\beta_1, \ldots, \beta_d)$ in Eq. (4)) that control the distributions that govern the data points in clusters, and (2) the precision parameter prior in the DPMM ($\alpha_0$ in Eq. (6)) that controls the number of clusters. This second parameter can be computed with its Maximum Likelihood Estimator and practically estimated using a Gibbs sampling algorithm [22]. This algorithm also includes the sampling algorithm used for the CRP with the DPMM (see Fig. 1). The new value of $\alpha_0$ in each iteration of this new sampling algorithm was computed by solving Eq. (22), where $I$ corresponds to the number of iterations of the sampling algorithm used for the CRP (line 10 of Fig. 1) and $k^{(i)}$ is the number of clusters in the ith iteration (line 9 of Fig. 1). The stop criterion in this new sampling algorithm (line 3 of Fig. 1) is that either $\alpha_0$ remains almost constant or a maximum number of iterations is reached. To solve Eq. (22), the Newton-Raphson method was used.
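One way to picture the Newton-Raphson update is sketched below. It assumes the standard DP likelihood for the precision parameter, proportional to $\alpha_0^{K} \, \Gamma(\alpha_0) / \Gamma(\alpha_0 + n)$, whose maximum satisfies $\sum_{j=0}^{n-1} \alpha_0 / (\alpha_0 + j) = \bar{k}$, with $\bar{k}$ the mean number of clusters over the $I$ sampling iterations. Whether this matches Eq. (22) exactly is an assumption; the function names are hypothetical.

```python
def expected_clusters(alpha0, n):
    """Expected number of CRP clusters for n points and precision alpha0."""
    return sum(alpha0 / (alpha0 + j) for j in range(n))


def estimate_alpha0(mean_clusters, n, alpha0=1.0, tol=1e-10, max_iter=100):
    """Solve expected_clusters(alpha0, n) = mean_clusters by Newton-Raphson.

    Requires 1 < mean_clusters < n so that a positive root exists.
    """
    for _ in range(max_iter):
        g = expected_clusters(alpha0, n) - mean_clusters
        dg = sum(j / (alpha0 + j) ** 2 for j in range(n))  # derivative of g
        new = alpha0 - g / dg
        if new <= 0:              # guard: keep the iterate positive
            new = alpha0 / 2
        if abs(new - alpha0) < tol:
            return new
        alpha0 = new
    return alpha0
```

Because `expected_clusters` grows roughly like $\alpha_0 \log(1 + n / \alpha_0)$, the same function also illustrates the logarithmic growth of the number of clusters with the number of data points.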
In addition to the cyclical updates of the precision parameter $\alpha_0$, we also update the parameter of each cluster so that it reflects all members of the cluster. To do this, every 20 iterations the parameter $\Theta_z$ of each cluster (see Eq. (4)) is redrawn from a new Dirichlet distribution whose parameters are updated taking into account the counters of the data points of the cluster (see Eq. (11)).
The time complexity of the whole process depends on the number of iterations for both the Gibbs sampling algorithms and the cost of reallocating all points in each iteration:

$$O(T \cdot I^{*} \cdot n \cdot k^{*} \cdot v) \qquad (23)$$

where $T$ is the number of iterations of the sampling algorithm for estimating $\alpha_0$ (loop starting in line 5 of Fig. 1), $I^{*}$ is the maximum number of iterations in the sampling algorithm for computing the posterior distribution (loop starting in line 5 of Fig. 1), and $k^{*}$ is the maximum number of clusters in all iterations, since the reallocation probability is computed for each data point and cluster. This number cannot exceed the number of data points $n$ and usually it is low (see Section 4). $v$ is the time complexity of computing the reallocation probability for one data point on a cluster, which is given by Eqs. (20) and (21):

$$v = O(d \cdot c^{*}) \qquad (24)$$

where $c^{*}$ is the maximum counter over all data points and dimensions. The final time complexity is O(T ·

Estimating the concentration parameter
Estimating the concentration (prior parameter) of the Dirichlet-multinomial distribution is an open research problem for which there are two basic approaches: first, an informative prior, in which the parameters are estimated taking into account the data composition, i.e. which counters are more likely than others; and secondly, a non-informative prior, in which no a priori information about the data counts is used.
A non-informative prior approach can simply be the direct assignment of the same constant value to each $\beta_i$, $1 \le i \le d$, if it is not known which dimensions should receive more weight. Other solutions [23] propose to divide $\beta_i$ into $\beta_i = q \cdot t_i$, where $t_i$ is the prior mean and $q$ is a constant called the strength of the prior information. Then, they give a value to $t_i$, e.g. $t_i = 1/d$, and set $q$ to a constant such as $q = 1$, $q = d/2$ or $q = 1/d$.
In the case of informative prior estimation, [24] proposed a maximum-likelihood approach, employing fixed-point and Newton methods. However, a common approach [12,25] is again to divide $\beta_i = q \cdot t_i$, computing the prior mean $t_i$, $1 \le i \le d$, as the sample mean and estimating $q$ either in a similar way to the non-informative prior or from a draw from a new finite mixture distribution. In [12], an exponential and a uniform density were combined, and $q$ was estimated with a Metropolis sampling algorithm. The estimation can be performed before starting the Gibbs sampling algorithm shown in Fig. 1. Whether using the informative or non-informative prior, [23] recommended that any prior selection should be reduced by the data dimension. We test different informative and non-informative approaches in the experimental section.
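The informative decomposition $\beta_i = q \cdot t_i$ can be sketched as follows. The sketch assumes that the sample mean $t_i$ is the share of dimension $i$ in the total counts of all profiles, which is one plausible reading of "sample mean"; `informative_prior` is a hypothetical name.

```python
def informative_prior(profiles, q):
    """Return beta with beta_j = q * t_j.

    t_j is the fraction of all counts that falls in dimension j
    (an assumed definition of the sample mean). A dimension with no
    counts would yield beta_j = 0, which is not a valid Dirichlet
    parameter, so such dimensions need separate handling.
    """
    d = len(profiles[0])
    totals = [sum(p[j] for p in profiles) for j in range(d)]
    grand_total = sum(totals)
    return [q * total / grand_total for total in totals]
```

With $q = 1/d$ this corresponds to an informative prior reduced by the data dimension.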

Experiments
The DPMM model was implemented in C++ with all experiments performed over the processed data sets. The clusters obtained with the DPMM algorithm were compared with other well-known algorithms using various validity evaluators. The experiments were performed using an Intel Core2 Quad CPU Q9650 at 3.00 GHz with 4 GB of memory.
Each input data point is represented as an array of $d$ counters ($x_i = (c_{i1}, \ldots, c_{id})$) that correspond to a draw from a multinomial distribution. To obtain these counters, each of the averaged values of the load profiles is normalised and transformed into its closest integer value.
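The conversion to counters might look like the sketch below. The normalisation is not specified in the text, so the example assumes each profile is divided by its own peak and multiplied by a hypothetical `scale` before rounding to the nearest integer.

```python
import math


def to_counters(profile, scale=100):
    """Normalise a load profile and round each value to the nearest integer.

    Normalising by the profile's peak and the `scale` factor are both
    assumptions of this sketch. Requires a positive peak value.
    """
    peak = max(profile)
    # floor(x + 0.5) rounds halves up, avoiding Python's banker's rounding.
    return [int(math.floor(v / peak * scale + 0.5)) for v in profile]
```

A larger `scale` preserves more detail of the profile shape but increases the counters, and therefore the cost of evaluating the marginal for each point and cluster.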
Firstly, we used non-informative priors, performing a scanning process to check sensitivity to the concentration parameter $\beta = (\beta_1, \ldots, \beta_{1440})$ (Eq. (4)). The dependency of the number of clusters obtained with the Gibbs sampling algorithm on the value of $\beta$ is shown in Fig. 2a for the residential data set. The number of clusters increases logarithmically with $\beta$, reaching a constant number of output clusters for a wide range of values of this parameter. Furthermore, the cluster size is reasonably stable with respect to the load profiles that formed them. The number of clusters obtained ranged between three and six for a large part of the $\beta$ parameter space (from $10^{-9}$ to $10^{-4}$). The most repeatable result is four clusters (from $\beta = 8 \cdot 10^{-8}$ to $7 \cdot 10^{-6}$). Two different sets of four clusters were obtained, differing only in the allocation of two profiles. A similar behaviour was observed with the commercial data set (Fig. 2b), but with outputs that contain five and six clusters. To obtain a reduced number of clusters the values of the parameter should be smaller than for the residential data. This is due to the higher number of profiles in the set. This robustness to the variance of the concentration parameter is advantageous with respect to other clustering algorithms such as k-means, which depend strongly on the initialisation conditions and require the user to fix the number of clusters.
Using an informative prior based on the sample mean reduced by the dimension (i.e. $q = 1/d$) to estimate the concentration parameter, the resulting output of the DPMM algorithm is also four clusters (Fig. 3). Applying the same technique to estimate the concentration parameter of the commercial data set, we obtained 36 clusters. However, applying a technique based on [12] and reducing by the dimension as [23] suggested, the number of clusters falls to 16. The parameter estimation techniques proposed in [24] produce an impractical number of clusters for both data sets (> 70 for the residential data set).

Analysis of clusters
Each of the four clusters obtained for the residential data set exhibits distinct behaviour. Of the 197 complete profiles, 94 were in cluster one (Fig. 3a), 57 in cluster two (Fig. 3b), 38 in cluster three (Fig. 3c) and 8 in cluster four (Fig. 3d). The load profile representing the centroid of each cluster is plotted in Fig. 4. Analysing the shape of the profile in each gives us some basic information about the general characteristics of energy use. The majority of the load profiles of cluster one present two peaks: the first around 6 am, which is also a ramping up from the lower overnight load level, and a more sustained second one at around 6 pm. These two peaks can also be seen in the profiles of cluster two. However, the morning peak starts earlier, and the evening peak is usually shorter, than those of cluster one. The peak energy consumption of both peaks in cluster two appears to be similar. The profiles of cluster three present a single peak in the evening. In cluster four, there are only eight load profiles and they show a unique long peak from 6 am to 6 pm. There is a sharp ramp from the overnight low consumption period for clusters 1 and 2; the centroid with the latest peak corresponds to cluster three with its single peak. The centroid of cluster four with its unique long peak has a different shape and is worthy of further examination.
Analysing the clusters using the metadata in Table 1 we observe:
1. There is a clear division of the load profiles according to nationality (Fig. 5a). In cluster one there is a majority of profiles of English houses. Approximately 70% of the profiles in cluster two are Bulgarian, while cluster four is 100% Bulgarian. Cluster three is the only one where there is no clear dominant group.
2. For the number of bedrooms of the property (Fig. 5b), the majority of cluster 1 are three-bedroom properties and cluster 4 are all single-bedroom properties. The other two clusters give no clear differentiation with this feature.
3. For the type of dwelling (Fig. 5c), the majority of cluster one is made up of terraced or semi-detached properties, whereas in cluster two a clear majority are flats. Cluster four is also all flats (or Other).
For the commercial data set, the centroids (for six clusters) show heterogeneous behaviour (Fig. 6); the composition histogram is shown in Fig. 7. The centroid of cluster one (the largest cluster) shows a double peak: the first at midday and the second around 7 pm. This cluster is mostly pubs and restaurants and shows the lunch and dinner time business peaks. In contrast, although cluster four contains a similar number of profiles and a similar composition, the lunchtime peak is clearly smaller. The evening peak is also shifted a little later, which is most likely due to the slightly larger number of clubs in this cluster. This demonstrates that quite subtle differences can be detected. Cluster two, whose centroid has a single large peak at midday, is mainly composed of restaurants and cafes, which we interpret as being open only during normal office hours. The centroid of cluster three shows a double peak around 7 am and 7 pm. As hotels and guest houses are the main categories of this cluster, it indicates the business activities at breakfast and dinner times. The small number of profiles of cluster five are mainly pubs and bars that show a peak at around 10 pm and significant activity into the early hours of the following day. This suggests that they have so-called 'late' licences. Cluster six is very small and shows modest peaks at 3 am and midday.
For the residential data, we examined the statistical differences between the clustered profiles using ANOVA tests (Section 4.2). This analysis suggests that we can make some further observations, although the data set is not large enough to be conclusive.
The cluster four profiles are all one-bedroom dwellings with one occupant (Fig. 5b). Most of the profiles that correspond to houses with four or more occupants are in cluster one. Additionally, clusters two and three are formed by a majority of profiles from dwellings with one or two occupants. Taking into account the type of dwelling (Fig. 5c), the clearest division is that cluster one is mainly formed by load profiles whose house is a terrace or a semi. Profiles from cluster one (majority English) have the morning peak later than profiles from cluster two, where the majority are Bulgarian. Whether this is statistically representative of Bulgaria is not clear, but in this data set the feature is systematic. Cluster four profiles deviate significantly from normal behaviour and their peak load is also high.

Clustering evaluation
The resulting clusters need to be compared with those obtained using other well-known techniques. To assure the validity of the comparison, the output data should be in the same format (granularity and normalisation) and only results with the same number of clusters can be compared.
In the DPMM algorithm, the number of clusters is not one of the input parameters as in most other clustering algorithms. Therefore we cannot compare the results for all the possible numbers of clusters, only the ones obtained after scanning the concentration parameter (Fig. 2). The algorithms selected for comparison were: k-means, single link, complete link, pair group method average (PGMA), pair group method centroid (PGMC), and the Ward or minimum variance algorithm [26]. Most evaluators are based on computing the similarity of the data elements within each cluster, and the difference among elements of the other clusters. We used three evaluators [3,4,7]: the mean index adequacy (MIA), which measures the distance of all the load profiles of a cluster from its cluster centre; the variance ratio criterion (VRC), also called the Calinski-Harabasz index, which is based on a ratio between intra-cluster and inter-cluster factors; and the scatter index (SI), which makes use of the distance of the data points and the cluster centres from the mean of all data points.
For the MIA and SI evaluators, lower values suggest better clustering results; it is the opposite for the VRC. The scores are shown in Fig. 8. Only results below 20 clusters are shown, as above this number the population of each cluster becomes too small to make meaningful comparisons. For the MIA evaluator (Fig. 8a and b) the DPMM performed slightly worse than other techniques as the number of clusters increased. For the VRC evaluator (Fig. 8c and d) the DPMM algorithm performed moderately well, especially over the commercial data set. Using the SI evaluator (Fig. 8e and f) the DPMM algorithm performed well for three or more clusters for the residential data set, and performances converged for all algorithms at higher cluster numbers for both data sets.
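As an illustration of one of the evaluators, the VRC can be computed from cluster labels as sketched below. The code uses the usual Calinski-Harabasz definition, the between-cluster dispersion divided by $K - 1$ over the within-cluster dispersion divided by $n - K$, with squared Euclidean distances; the exact variant used in the experiments is assumed, not confirmed.

```python
def variance_ratio_criterion(points, labels):
    """Calinski-Harabasz index; higher values suggest better separation.

    Requires at least two clusters and a non-zero within-cluster dispersion.
    """
    n, d = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(d)]

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    overall = mean(points)
    between = within = 0.0
    for rows in clusters.values():
        centre = mean(rows)
        between += len(rows) * sqdist(centre, overall)   # inter-cluster term
        within += sum(sqdist(r, centre) for r in rows)   # intra-cluster term
    return (between / (k - 1)) / (within / (n - k))
```

For four points forming two tight pairs far apart, e.g. `[[0, 0], [0, 2], [10, 0], [10, 2]]` with labels `[0, 0, 1, 1]`, the between-cluster dispersion is 100 and the within-cluster dispersion is 4, giving a VRC of 50.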
As an additional test of the statistical difference between the load profiles of the clusters, we conducted ANOVA tests (T-test and F-test with a significance level $\alpha = 0.05$). For the experiments using the residential data set, for each minute we tested the hypothesis that all the clusters have the same mean (i.e. $m_{1i} = m_{2i} = \cdots = m_{Ki}$, where $m_{ji}$ is the mean of the jth cluster at the ith minute, and $K$ is the number of clusters). Fig. 4 shows the means of the four clusters. The failure of the test would imply that there is at least one cluster mean that is statistically different for this particular minute. For the residential data set the results indicate that there is at least one different cluster for almost all the minutes (row 1 of Table 2). This implies that the profiles of the clusters present some degree of separation. However, it does not mean that all the cluster means are different from each other, except for $K = 2$, where 72.9% of means during the 1440 min are statistically different between the two clusters. For this reason, a second, stricter test was performed with the following condition: $\forall j, l:\ m_{ji} \ne m_{li}$ for $1 \le j, l \le K \wedge j \ne l$, for each minute $1 \le i \le 1440$. This condition requires that all cluster means are different from each other. Results show (second row of Table 2) that the number of minutes that fulfil the condition changes with the number of clusters, e.g. six clusters seem to divide the load profiles into more distinct groups than four or five. This is due to the creation of subgroups with more specific behaviour. Nevertheless, the most important fact is that the clusters obtained with the DPMM algorithm present a significant number of minutes that are statistically different among all the clusters, indicating that the division is justified.
Similarly, using the commercial data and the less stringent test (third column of Table 2), all numbers of clusters had 100% of the minutes with at least one cluster with a different mean. Under the stricter condition, where each cluster mean is different from every other (fourth column of Table 2), the scores are higher than the ones obtained for the residential data set. This may be due to: (1) the commercial profiles having greater variety than those of the residential set, and (2) the lower temporal resolution of the commercial profiles (consumption is aggregated over a longer time).
In Fig. 9, we show the execution times of the different algorithms for experiments using the residential data set as reference. As expected from the complexity of both Gibbs sampling algorithms used for the DPMM (Section 3.3), its running times are longer than those of the other clustering methods. The number of iterations ($I^{*}$) needed for the algorithm to converge is the most important element for the DPMM and is the reason for the longer execution times. We need to be aware that these other methods start with the important advantage of knowing the number of clusters, whereas the DPMM algorithm has to converge to the solution that best fits the data given the model with a unique input parameter. If we compare the running time of just one iteration of the DPMM algorithm, we see that the time is not far from that of the PGMC and WARD algorithms for a small number of clusters. The execution time and the number of iterations needed to converge increase with the number of clusters. This is governed by the stop criterion explained in Section 3.2, where the profiles should remain in the same cluster for some consecutive iterations.

Conclusions
We have shown that a clustering algorithm based on a Bayesian non-parametric model, the DPMM, can distinguish between electrical power use profiles. The flexibility and robustness for managing uncertainty in real data of Bayesian statistics enabled us to model the unknown parameters that governed the distribution used for explaining the differences between load profile types. This method has the advantage that the number of clusters does not need to be determined before computation is initiated, as there are techniques to estimate all of the model parameters. These estimation techniques are important for the resulting clusters; their evolution with varying amounts of data should therefore be taken into account to obtain a robust method. Although the computational performance of the DPMM was found to be slower than other techniques, the difference was not significant for this application. Furthermore, it may be possible to reduce computational complexity by parallelising some of the Gibbs sampling algorithm steps or allowing more relaxed convergence conditions.
Our model was tested using two different real data sets. One comprised residential energy consumption data with one-minute resolution and the second comprised 30-min commercial profiles. The DPMM generated four and six clusters for the residential and commercial data sets respectively. In both cases this was a small enough number to be credible, yet sufficient to present meaningful and distinct load profile types. In particular, using the metadata of the dwellings our analysis showed that we could assign statistically significant features such as the nationality, household size, and type of dwelling to the cluster memberships.
From the residential data set it is apparent that the measurement devices used were not of sufficient quality, as they produced unexplained high levels of device failure and data corruption, causing us to discard 10% of the available raw input data. As larger and better quality data become available through widespread deployment of smart meters, our technique can be tested more extensively with potential application to network operations.

Fig. 2. Number of clusters depending on the concentration parameter $\beta$.

Fig. 3. Clusters obtained using the DPMM clustering algorithm for the residential data set.

Fig. 4. Centroids of the clusters of the residential data set.

Fig. 5(b). Number of bedrooms feature.

Table 1
Features and categories of the processed data set.
the creation of a new cluster. $B$ is a normalisation factor that guarantees that probabilities sum to one. $Z_{-i}$ are all indices $Z$ but not including $z_i$. The marginal probabilities