A Data-Driven Approach for Discovery of Heat Load Patterns in District Heating

Understanding the heat use of customers is crucial for effective district heating (DH) operations and management. Unfortunately, existing knowledge about customers and their heat load behaviors is scarce, and very few studies have focused on this aspect. The deployment of smart meters offers a unique opportunity for researchers and DH utilities to analyze large-scale data and discover both typical and atypical patterns in the network. Heat load pattern discovery is a challenging task in DH systems, since a comprehensive analysis needs to involve many customers. Most past studies have relied on analysis of a small number of buildings that have not been shown to be representative. Therefore, the knowledge discovered in such studies cannot be generalized to the entire network. In this work, we propose a data-driven approach that enables automatic discovery of heat load patterns in a complete district heating network. Our method clusters the buildings into groups based on the characteristics of their load profiles, extracts a representative pattern for each group, and detects abnormal profiles, i.e., the ones deviating from the expected behavior. We present the first comprehensive analysis of heat load patterns by conducting a case study on all the buildings, in six customer categories, connected to two district heating networks in the south of Sweden. Our method captured fifteen typical patterns among the heat load profiles of all buildings in our dataset. The results show that control strategies alone are not enough to explain the variability in heat load behaviors. In conclusion, we demonstrate that the proposed approach has great potential to develop knowledge about customers and their heat use habits in practice by automatically analyzing their typical and atypical profiles at large scale.


Introduction
Future energy systems are facing critical challenges such as the steady growth of energy demand, the energy resource depletion, and increasing emissions of carbon dioxide (CO2) and other greenhouse gases. District heating plays a vital role in the implementation of future sustainable energy systems [1,2,3,4,5] by diversely incorporating recycled and renewable heat sources and contributing to a decrease in carbon emission. However, the present generation of district heating technologies must be improved to achieve the target of a 100% renewable energy supply system. The concept of 4th generation district heating [6] discusses how to design efficient and reliable networks, and considers environmentally-friendly heat production units.
The most important factor in increasing the efficiency in such systems is reducing distribution temperatures so that the quality between the energy supply and demand improves [7]. Achieving low temperatures in the network requires intelligent control systems and elaborated strategies for continuous identification of operation errors causing high return temperatures. To design such strategies, it is crucial to have in-depth knowledge of the customers, and a better understanding of their heat use as even a single substation can have a significant impact on the global efficiency of the system.
Heat load patterns represent the most "typical" behaviors in DH networks and provide information about how different customer groups use heat. Analyzing such patterns is essential for effective DH operation and management [8]. DH companies can use this analysis to optimize their operations, implement new control strategies, and personalize demand management for specific customer groups. Furthermore, it can help decision makers to develop energy efficiency policies and roadmaps.
Another important aspect is the analysis of DH customers who exhibit abnormal heat use. Even a single problematic customer can influence the overall performance of the network. Traditionally, DH companies have accepted inefficient heat use and tried to improve the operations on the production side. However, this approach is not going to work in 4th generation DH. Until today, only a few works have focused on the demand side, for example aiming to reduce peak loads. However, most of them rely on naive approaches such as using the total heat demand or building age as a measure of inefficiency. The biggest challenge is the lack of knowledge about customers and how they use heat, since many factors can lead to abnormal heat demand, including poor substation control, unsuitable control strategies, faults, etc. Therefore, it is of great interest to identify customers with abnormal heat load profiles or unsuitable control strategies for further investigation.
Discovering typical and abnormal patterns is a complex task, especially for DH systems involving many customers with different characteristics. Heat demand can be affected by several factors [9] such as activities in the buildings, outside temperature, incident solar radiation, socio-cultural factors, etc. The knowledge discovered in Sweden may not be directly applicable to, e.g., Italy [8]. Furthermore, inspecting the behaviors of all buildings in the entire network is prohibitively time-consuming. For all those reasons, data-driven solutions are required for the large-scale analysis of district heating systems.
In this work, we present a data-driven approach for automatic heat load pattern discovery and perform the first large-scale analysis of the heat load behaviors in two district heating networks. Our contributions are:
1. Heat load pattern discovery: we develop a new method to automatically discover groups of buildings with similar heat load profiles and extract representative patterns showing the characteristics of each group.
2. Customers of interest: we identify customers whose heat load profiles indicate potential problems and require further investigation. In particular, we detect two types of customers, i.e., those significantly deviating from their expected heat load patterns, and those with control strategies determined to be unsuitable for their category.
3. Large-scale evaluation: we present a large-scale analysis of all the buildings, in six different customer categories, connected to two district heating networks in the south of Sweden. This work is the first study analyzing both individual and aggregated behaviors of all the DH customers in the entire system.
The rest of this paper is organized as follows. Section 2 briefly surveys related work on heat load patterns and clustering. Section 3 first introduces some important concepts and gives an overview of the data-driven approach, and then presents all the steps of the proposed method in detail. Section 4 describes the dataset and shows the comprehensive results from the analysis of the real-life case study in Sweden. This is followed by a discussion of the results (Section 5). Finally, we conclude in Section 6.

Related Work
In this section, we present related work in the areas of (i) heat load patterns and (ii) clustering analysis. First, we give an overview of related work in the domain of heat load pattern analysis; then we review the state-of-the-art concepts related to cluster analysis that we use in our data-driven approach.

Heat load patterns
Because of the unavailability of high-resolution, hourly or sub-hourly meter data before the installation of smart meters, the literature on energy analytics in district heating is still in its infancy. Therefore, there are not many studies focusing on the analysis of heat load patterns in district heating systems.
In [10,11,12], the heat load patterns were analyzed in order to estimate heat load capacities for billing purposes. An approach to separate domestic hot water from space heating using existing heat meters is proposed in [13]. Heat loads have been monitored and evaluated in [14] to increase energy efficiency in multi-dwelling buildings.
Energy signature (ES) methods are used for characterizing the heat load behaviors of buildings in multiple studies, for various purposes such as weather correction [8], the estimation of heat loss [15,16,17], or identifying abnormal demand [18]. However, they only reflect individual heat demand as a function of outside temperature over a year. They do not allow profiling of the buildings based on other aspects such as daily behavior, weekend routine, etc.
More recent works [19,20] have targeted applications in peak forecasting and peak shaving. They mainly concern energy conservation by reducing peaks in daily patterns called load curves, which are 24-hour records of the heat loads. However, daily load curves depend heavily on weather effects and mostly reflect temporary behaviors rather than regular ones.
Our work is complementary to the studies by Werner and Gadd, which presented a method to manually analyze heat load patterns in [21] and a set of rules to find unsuitable behaviors of the buildings in [22]. We leverage those works as prior domain knowledge and incorporate some of the concepts that they introduced into the automatic discovery of heat load patterns. Furthermore, we formalize individual heat load behavior and group behavior separately in this study, whereas previous works do not draw a clear distinction between those concepts from a heat load pattern perspective.

Clustering
Clustering is the task of organizing data so that similar objects are placed into related or homogeneous groups without prior knowledge of the group definitions [23]. It is one of the most popular methods in exploratory data analysis, as it identifies structures in an unlabelled dataset by organizing data into groups that are objectively similar [24]. Numerous techniques have been proposed in the literature for finding clusters in different types of data. In this study, we are mostly concerned with time-series clustering techniques, since we represent individual heat load behaviors of the district heating customers as a function of time.
Time-series clustering is a special case of cluster analysis that has been used in many scientific areas to discover interesting patterns in time-series datasets such as smart meter datasets. Many time-series clustering algorithms have been proposed in the literature. There are generally three different approaches to clustering time series: feature-based, model-based, and shape-based [23].
In the feature-based and model-based approaches, the raw time series is either converted into a feature vector of lower dimension or transformed into model parameters so that classical clustering methods can be applied [25]. However, feature-based and model-based techniques can lead to loss of information, and they present drawbacks such as the application-dependence of the feature selection, or problems associated with parametric modelling [26].
The shape-based approach mostly takes traditional clustering algorithms and modifies the similarity measure to match the shapes of two time series as well as possible. This approach has also been labelled a raw-data-based approach because it typically works directly with the raw time-series data, in contrast to the feature-based and model-based approaches. Shape-based algorithms usually employ conventional clustering methods designed for static data, with the distance/similarity measure replaced by one appropriate for time series.
Shape-based methods are highly dependent on the similarity measure [27]. For example, although the Euclidean distance is popular and simple, it is not suitable for time-series data. On the other hand, the Dynamic Time Warping (DTW) [28] distance measure and its variants are more suitable for most time-series data mining tasks due to their improved shape-based alignment [29]. However, those approaches are computationally expensive and inefficient for time-series averaging tasks: the shape of the resulting cluster center does not accurately represent the characteristics of the sequences in the cluster [30,31].
K-shape clustering [32] is a novel centroid-based clustering algorithm that can effectively preserve the shapes of time-series sequences while remaining fast. K-shape introduces two components: (1) the shape-based distance (SBD) as a dissimilarity measure and (2) a time-series shape extraction method for centroid computation. The first allows measuring the similarity of time-series sequences based on their shapes, while the second helps to extract the representative pattern that summarizes the behavior of the cluster.
K-shape applies an iterative refinement procedure similar to the well-known k-means algorithm; every iteration consists of two steps. In the assignment step, the algorithm updates the cluster memberships by comparing each time series with all computed centroids and assigning each time series to the cluster of the closest centroid. In the refinement step, the cluster centroids are updated to reflect the changes in cluster memberships from the previous step. The algorithm outputs the assignment of the buildings to clusters and the centroid of each cluster, which shows the most representative shape of each group.

Concepts
We introduce some important definitions in the problem of heat load pattern discovery.
Definition 1 (Heat Load): Heat load is the quantity of heat per unit time that must be supplied in order to meet the demand in a building. We define the four seasons in a calendar year as winter (12 weeks of December, January and February), early spring and late autumn (18 weeks of March, April, October and November), late spring and early autumn (9 weeks of May and September) and summer (13 weeks of June, July and August).
Intuitively, heat load profiles capture the recurrent behavior of a building over the whole year with the hourly variations during the day, the changes across weekdays and seasonal differences (Figure 2).

Definition 3 (Heat Load Pattern):
A heat load pattern is the representation of the central behavior in a group of buildings.
Let $NP = \{P_1, P_2, \ldots, P_n\}$ be a set of heat load profiles in a district heating network. We divide $NP$ into $k$ different clusters $C = (C_1, C_2, \ldots, C_k)$, where $C_i \subset NP$ and $C_i \cap C_j = \emptyset$ for $i \neq j$. We define a heat load pattern $p_i$ as the centroid of cluster $C_i$.
Intuitively, clustering heat load profiles and extracting cluster centroids provides a set of heat load patterns that capture the most typical behaviors in a district heating network.

Method overview
In this section, we describe the details of our data-driven approach. The approach, shown in Figure 2, involves three major steps: (1) data preprocessing, (2) clustering and pattern discovery, and (3) visual exploration. In the first step, the data is cleaned, transformed and normalized. In the second step, k-shape clustering is performed to group customers with similar heat load behaviors. Abnormal heat load profiles, which do not conform to the behavior of any group, are detected and removed. Clusters are then re-computed without those buildings, and heat load patterns are extracted. Finally, in the third step, heat load patterns are visually inspected and qualitatively evaluated by the expert. Control strategies are assigned to clusters according to the characteristics of their heat load patterns.

Data preprocessing
Since 1 January 2015, Swedish district heating companies have been obliged by the government to measure and charge customers according to their actual district heating consumption. Therefore, all substations in the Swedish district heating systems are today equipped with a smart meter device, which measures the heat used by the customer. In Sweden, the collection of data from the meter devices to the district heating companies is automated and requires no manual readings. The value delivered to the district heating company may deviate from the real value due to connection problems. Such errors appear quite frequently in data generated for analysis and visualization purposes. One of the typical measurement faults in the meter readings is a sudden jump in the heat load. This can occur if the meter device is not working accurately or if it is replaced by a new device on which the consumption has not been reset correctly. We use median absolute deviation (MAD) based estimation to detect such extreme values. Heat loads more than five MADs away from the local median are identified as jumps. Those values are corrected by linear interpolation.
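The jump correction described above can be sketched as follows. This is a minimal NumPy illustration rather than the implementation used in the study: the function name `fix_jumps`, the 25-hour window that defines the "local" median, and the reflected padding at the series edges are all assumptions, since the text only specifies the five-MAD threshold and the linear interpolation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def fix_jumps(load, window=25, n_mad=5.0):
    """Flag readings far from the local median and interpolate over them.

    `window` (odd, in hours) is a hypothetical choice; the text only fixes
    the threshold of five MADs.
    """
    load = np.asarray(load, dtype=float)
    half = window // 2
    # Reflect the edges so that every reading has a full window around it.
    padded = np.pad(load, half, mode="reflect")
    windows = sliding_window_view(padded, window)      # shape (n, window)
    local_median = np.median(windows, axis=1)
    local_mad = np.median(np.abs(windows - local_median[:, None]), axis=1)

    is_jump = np.abs(load - local_median) > n_mad * local_mad

    # Replace flagged readings by linear interpolation of the kept ones.
    idx = np.arange(load.size)
    cleaned = load.copy()
    cleaned[is_jump] = np.interp(idx[is_jump], idx[~is_jump], load[~is_jump])
    return cleaned, is_jump


hours = np.arange(200)
base = 10.0 + np.sin(2 * np.pi * hours / 24)   # synthetic daily cycle
faulty = base.copy()
faulty[50] = 100.0                             # simulated meter jump
cleaned, is_jump = fix_jumps(faulty)
```

In this synthetic example the reading at hour 50 is flagged and replaced by the average of its two neighbours. A window whose MAD is zero (a perfectly flat stretch) would flag any deviation at all, so a real pipeline would likely need a lower bound on the MAD.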
Connection problems in the meter device often result in missing values. Buildings with missing sensor readings for more than two consecutive days or more than thirty days in total are excluded from the analysis. Missing values in other cases are filled by linear interpolation of surrounding values. Poorly functioning meter devices can also cause repeating measurements. In those cases, meter readings do not change over a period. We have identified buildings whose values are identical for two consecutive days and also excluded them from the study.
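The exclusion rules above can be expressed as a small filter. The sketch below assumes hourly readings with missing values encoded as NaN, and it reads "identical for two consecutive days" as 48 identical hourly readings in a row; the function names and this interpretation are illustrative assumptions.

```python
import numpy as np


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def should_exclude(load, gap_limit=48, missing_limit=30 * 24, flat_limit=48):
    """Apply the exclusion rules to one building's hourly series."""
    load = np.asarray(load, dtype=float)
    missing = np.isnan(load)
    # More than two consecutive days, or more than thirty days in total, missing.
    if longest_run(missing) > gap_limit or missing.sum() > missing_limit:
        return True
    # Readings that stay identical for two consecutive days (stuck meter).
    unchanged = np.diff(load) == 0          # comparisons with NaN are False
    return longest_run(unchanged) + 1 >= flat_limit


def fill_gaps(load):
    """Fill the remaining missing values by linear interpolation."""
    load = np.asarray(load, dtype=float).copy()
    missing = np.isnan(load)
    idx = np.arange(load.size)
    load[missing] = np.interp(idx[missing], idx[~missing], load[~missing])
    return load


rng = np.random.default_rng(0)
year = 50.0 + rng.normal(size=365 * 24)     # synthetic hourly readings

short_gap = year.copy()
short_gap[100:110] = np.nan                 # 10 missing hours: kept and filled
long_gap = year.copy()
long_gap[100:149] = np.nan                  # 49 missing hours: excluded
stuck = year.copy()
stuck[200:248] = stuck[200]                 # 48 identical readings: excluded
```

Buildings that pass `should_exclude` would then be passed through `fill_gaps` before profile extraction.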
After data cleaning, we extract heat load profiles of the buildings to model individual heat load behaviors in the network. Figure 1 illustrates the computation of each heat load profile, where $\{M_{w,1}, M_{w,2}, \ldots, M_{w,168}\}$ corresponds to one week of hourly ($24 \times 7$) heat load measurements in week $w$. Every heat load profile is first merged into a single unified sequence to be used in the clustering process. After clustering, the profiles are converted back to their original format for better visualization.
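As an illustration of the profile extraction, the sketch below assumes that a heat load profile is the mean of the weekly 168-hour sequences within each of the four seasons, concatenated into one sequence of 4 × 168 values. The averaging rule, the function names, and the contiguous assignment of weeks to seasons are assumptions made for the example; only the week length and the number of weeks per season come from the text.

```python
import numpy as np

HOURS_PER_WEEK = 24 * 7


def heat_load_profile(hourly_load, week_season, n_seasons=4):
    """Average week per season, flattened into one sequence."""
    hourly_load = np.asarray(hourly_load, dtype=float)
    week_season = np.asarray(week_season)
    n_weeks = len(week_season)
    weeks = hourly_load[: n_weeks * HOURS_PER_WEEK].reshape(n_weeks, HOURS_PER_WEEK)
    seasonal = [weeks[week_season == s].mean(axis=0) for s in range(n_seasons)]
    return np.concatenate(seasonal)


def unflatten(profile, n_seasons=4):
    """Convert a flattened profile back to (season, weekday, hour)."""
    return np.asarray(profile).reshape(n_seasons, 7, 24)


# Weeks per season as given in the text: 12, 18, 9 and 13.
# For simplicity the weeks of each season are contiguous here,
# which is not how they fall in a real calendar year.
weeks_per_season = np.array([12, 18, 9, 13])
week_season = np.repeat([0, 1, 2, 3], weeks_per_season)
hourly_load = np.repeat([4.0, 3.0, 2.0, 1.0], weeks_per_season * HOURS_PER_WEEK)

profile = heat_load_profile(hourly_load, week_season)
```

The flattened form is what a clustering algorithm would consume, while `unflatten` restores the season by weekday by hour layout for plotting.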
The last step of the data preprocessing is normalization of the heat load profiles. This step is essential for the clustering algorithm. As a distance measure, the clustering algorithm uses a normalized version of the cross-correlation measure to compare the shapes of time series. This measure is sensitive to scale and requires appropriate normalization to achieve scale invariance. Therefore, all heat load profiles are z-normalized, $z = (x - \mu)/\sigma$, before the clustering step.
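A minimal sketch of the z-normalization step is given below. The function name and the handling of perfectly flat profiles (returned as zeros) are assumptions; the formula itself is the one stated above.

```python
import numpy as np


def z_normalize(profile):
    """Return (x - mean) / std so that profiles are compared by shape only."""
    profile = np.asarray(profile, dtype=float)
    sigma = profile.std()
    if sigma == 0:
        return np.zeros_like(profile)   # a flat profile has no shape to compare
    return (profile - profile.mean()) / sigma


small = np.array([1.0, 2.0, 4.0, 2.0, 1.0])   # small building
large = 100.0 * small + 500.0                 # same shape, much larger demand
```

After normalization, `small` and `large` map to the same sequence, which is the scale invariance the clustering step relies on.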

Heat load pattern discovery
After data preprocessing, we apply cluster analysis to group similar heat load profiles and extract representative heat load patterns. Heat load profiles reflect how heat is used in an individual building over a year by capturing the changes during the day, the differences among weekdays, and seasonal variations. Therefore, it is essential to consider the shape characteristics of those profiles, i.e., the time and magnitude of their peaks, in the clustering process. For this purpose, we apply k-shape [32], a centroid-based clustering algorithm that can capture the similarity in the shapes of time-series sequences.
To capture the similarity in the shapes of time-series sequences, k-shape proposes a distance measure called shape-based distance (SBD), which is based on normalized cross-correlation (NCC). Considering two sequences $\vec{x} = (x_1, \ldots, x_m)$ and $\vec{y} = (y_1, \ldots, y_m)$, the SBD is calculated by finding the position $w$ where $NCC_w(\vec{x}, \vec{y})$ is maximized:

$$SBD(\vec{x}, \vec{y}) = 1 - \max_{w} NCC_w(\vec{x}, \vec{y})$$

As stated in Definition 3, a heat load pattern is the representative of the heat load profiles in a cluster and is defined as the cluster centroid. The k-shape algorithm treats centroid computation as a Steiner sequence problem [33], where the objective is to find the minimizer of the sum of squared distances to all other data points.
For a given partition $p_j \in P$, the corresponding centroid $\vec{c}_j$ is:

$$\vec{c}_j = \arg\min_{\vec{c}} \sum_{\vec{x}_i \in p_j} dist(\vec{c}, \vec{x}_i)^2$$

However, cross-correlation measures the similarity rather than the dissimilarity of time series. Therefore, the optimization is formulated as follows, where $P_k$ is the $k$th partition and $\vec{\mu}_k$ is the initial centroid for the $k$th partition:

$$\vec{\mu}_k^{*} = \arg\max_{\vec{\mu}_k} \sum_{\vec{x}_i \in P_k} \left( \max_{w} NCC_w(\vec{x}_i, \vec{\mu}_k) \right)^2$$

K-shape is a centroid-based algorithm similar to k-means, which means it is similarly sensitive to outliers. Heat load patterns and cluster qualities can be affected by the presence of outliers, i.e., abnormal heat load profiles. Therefore, after removing profiles that are detected as abnormal, we apply the clustering process again to obtain the final heat load patterns.
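The shape-based distance and the centroid computation described above can be sketched in NumPy as follows. This is an illustrative reading of the k-shape algorithm [32], not the code used in the study: the function names are hypothetical, the cross-correlation is computed directly with `np.correlate` rather than with the FFT, and the centroid is taken as the dominant eigenvector of the centred similarity matrix, following the shape extraction idea of the original algorithm.

```python
import numpy as np


def z_normalize(x):
    sigma = x.std()
    return (x - x.mean()) / sigma if sigma > 0 else np.zeros_like(x)


def ncc(x, y):
    """Normalized cross-correlation of x and y for all 2m - 1 shifts."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return np.zeros(2 * len(x) - 1)
    return np.correlate(x, y, mode="full") / denom


def sbd(x, y):
    """Shape-based distance and the shift of y that best aligns it with x."""
    values = ncc(x, y)
    best = int(np.argmax(values))
    return 1.0 - values[best], best - (len(y) - 1)


def shift(y, s):
    """Shift y by s positions (to the right if s > 0), padding with zeros."""
    out = np.zeros_like(y)
    if s >= 0:
        out[s:] = y[: len(y) - s]
    else:
        out[:s] = y[-s:]
    return out


def extract_shape(members, reference):
    """Centroid that maximizes the summed squared NCC to the members."""
    if reference.any():   # align members to the previous centroid, if any
        members = [shift(x, sbd(reference, x)[1]) for x in members]
    aligned = np.array([z_normalize(x) for x in members])
    m = aligned.shape[1]
    q = np.eye(m) - np.ones((m, m)) / m            # centring matrix
    matrix = q @ (aligned.T @ aligned) @ q
    centroid = np.linalg.eigh(matrix)[1][:, -1]    # eigenvector of largest eigenvalue
    # The eigenvector's sign is arbitrary; keep the orientation closer to the data.
    if np.linalg.norm(aligned[0] - centroid) > np.linalg.norm(aligned[0] + centroid):
        centroid = -centroid
    return z_normalize(centroid)


t = np.arange(48)
x = z_normalize(np.sin(2 * np.pi * t / 24))
dist_scaled, _ = sbd(x, 3.0 * x)                   # same shape, different scale

pulse = np.where((t > 5) & (t < 10), 1.0, 0.0)
dist_shifted, lag = sbd(shift(pulse, 7), pulse)    # same shape, 7 hours later

centroid = extract_shape([x, 2.0 * x + 1.0], np.zeros_like(x))
```

In this example both distances are zero up to rounding, the recovered lag is 7, and the centroid of two rescaled copies of the same sine wave is that sine wave again. A full k-shape run would alternate `extract_shape` (refinement) with nearest-centroid assignment under `sbd`.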

Detecting abnormal heat use
The classical assumption in unsupervised anomaly detection is that anomalies are the samples which deviate so much from the other samples as to arouse suspicions that a different mechanism generated them [34]. Following that, our method detects abnormal heat load profiles based on the distribution of distances to the centroids in each cluster. The further away an example is from its cluster centroid, the more likely it is to be abnormal.
Given a cluster $C$, let $D = (d_1, \ldots, d_{|C|})$ be the vector of distances between its members and the cluster centroid, and let $\mu$ and $\sigma$ be the mean and standard deviation of that vector. A profile $i$ is determined to be abnormal if $n(d_i, a) > 0$, where $n(d_i, a) = d_i - \mu - a\sigma$ and $a = 3$. The parameter $a$ determines an upper bound on the false positive rate, $\frac{1}{1 + a^2}$, based on Cantelli's inequality [35]. By choosing the $3\sigma$ rule, we obtain a theoretical threshold whose false positive rate is at most $1/(1 + 3^2) = 10\%$ for distinguishing abnormal heat load profiles from normal ones.
Abnormal heat load profiles that are detected by this approach are saved for further investigation by the domain expert.
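The thresholding rule can be illustrated with a few lines of NumPy. The function names are hypothetical, and the population standard deviation (NumPy's default) is an assumption, since the text does not say which estimator is used.

```python
import numpy as np


def abnormal_mask(distances, a=3.0):
    """Flag members whose distance to the centroid exceeds mean + a * std."""
    distances = np.asarray(distances, dtype=float)
    threshold = distances.mean() + a * distances.std()
    return distances - threshold > 0          # n(d_i, a) > 0


def cantelli_bound(a):
    """Upper bound on P(d - mean >= a * std) from Cantelli's inequality."""
    return 1.0 / (1.0 + a ** 2)


# Twenty profiles close to their centroid and one far away.
distances = np.array([0.1] * 20 + [1.0])
mask = abnormal_mask(distances)
```

Here only the last profile is flagged, and `cantelli_bound(3)` evaluates to 0.1, the 10% bound quoted in the text.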

Visual Inspection
In this step, all extracted patterns and their profiles are visualized in a fashion that can help the domain expert to examine them. The visualization should help the expert quickly grasp the typical behavior of each group. Moreover, it should reflect the diversity of individual behaviors within the group so that the cluster quality can be quickly validated.
To this end, we visualize clusters by plotting heat load patterns with opaque colours and heat load profiles of the buildings with transparency as shown in Figure 4. With this type of visualization, it is also possible to observe the variation among cluster members and how densely they are populated.
Heat load patterns are affected by control strategies that are applied in the substation of the building. Based on domain knowledge, we assume four different control strategies, i.e., continuous operation control (COC), night setback control (NSB), time clock operation five days (TCO5) and time clock operation seven days (TCO7).
In COC, the ventilation runs 24 hours a day. Therefore, only small variations in the heat load are expected during the day (Figure 3a). NSB lowers the set point for the indoor temperature during the night, leading to lower heat loads at night, which are followed by high peak loads in the mornings that vanish quite fast (Figure 3b). In TCO7 (Figure 3c), ventilation is shut down during nights, resulting in large differences between day and night. TCO5 (Figure 3d) is similar, but the ventilation is also completely off during weekends.
Figure 3: Control strategies according to [21].
In the final step of the visual inspection, the expert assigns control strategies to the clusters. If the heat load pattern of a cluster reflects the characteristics of one of the four control strategies, the cluster and its members are assigned that strategy.

Data
The dataset used in this study is derived from smart meter readings of buildings connected to the district heating systems in Helsingborg and Ängelholm in the south-west of Sweden. The dataset includes one year of hourly measurements of heat, flow, supply, and return temperatures on the primary side of the substations in 2016. In this study, we only use the heat measurements of the buildings in six customer categories, i.e., multi-dwelling buildings, industrial demands, health-care and social services, public administration buildings, commercial buildings, and schools. The total number of buildings is approximately 2200.

Heat load patterns
In this section, we present the clusters and the heat load patterns discovered by our method. For this experiment, the number of clusters (k) is set to fifteen by maximizing the silhouette score, and this choice is confirmed by the domain expert.
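Selecting k by the silhouette score can be sketched as below. The code computes the mean silhouette from a precomputed distance matrix and picks the k with the highest score among candidate clusterings. The function names are hypothetical, and the toy example uses absolute differences between four points on a line as a stand-in for the distances between heat load profiles.

```python
import numpy as np


def mean_silhouette(dist, labels):
    """Mean silhouette for an (n, n) distance matrix and at least two clusters."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            continue                      # singleton clusters score 0 by convention
        a = dist[i, same].sum() / (same.sum() - 1)     # mean distance within own cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != own)
        scores[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return scores.mean()


def pick_k(dist, labelings):
    """`labelings` maps each candidate k to the labels of a clustering run."""
    scores = {k: mean_silhouette(dist, labels) for k, labels in labelings.items()}
    return max(scores, key=scores.get), scores


points = np.array([0.0, 0.1, 10.0, 10.1])
dist = np.abs(points[:, None] - points[None, :])
best_k, scores = pick_k(dist, {2: [0, 0, 1, 1], 3: [0, 1, 2, 2]})
```

In the toy example the two-cluster labeling scores close to 1 and is selected over the three-cluster one.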
According to the expert's validation, eight clusters show the main characteristics of continuous control (COC). The differences among their heat load patterns point to interesting heterogeneity in the behaviors of buildings controlled with the same strategy. For example, Figure 4(a) and Figure 4(b) show two clusters assigned to COC. The first shows low variations during the day, while the second has much higher variation and lunch valleys.
Furthermore, the domain expert identified two clusters with night setback control (NSB), Figure 4(c) and Figure 4(d). Both of them exhibit a reduced night load, which is a typical characteristic of NSB. However, the second pattern clearly exhibits larger morning peaks than the first, which could be attributed to differences in building insulation.
The remaining clusters are identified as time clock operation control with ventilation. Three of them are TCO5 and two of them are TCO7. Both Figure 4(g) and Figure 4(h) show clusters assigned to TCO5, highlighting the different properties of those patterns. For example, the effect of ventilation difference between day and night is much more dramatic in Figure 4(h) when compared to Figure 4(g). Moreover, in the latter, a decrease in heat demand can be observed in the afternoon, most likely due to incident solar radiation.
The clearest distinction between the two TCO7 clusters is the weekend behavior. Figure 4(e) shows typical features of TCO7 while Figure 4(f) has reduced weekend demand. This new pattern is quite interesting. According to previous studies [21], it is expected to observe that either there is no significant activity during the weekend in the building or the weekend behavior is similar to the rest of the week. This new pattern, however, shows that there is a considerable number of buildings with significant but reduced weekend activity.

Abnormal heat load profiles
Our method also identified buildings whose heat load profiles show significantly different characteristics than their expected heat load patterns. Figure 5 presents three examples that are detected as abnormal and further investigated by the expert.
The first building, Figure 5(a), shows a strange trend where demand increases from Monday to Saturday. It also has inconsistent daily variations, where some days have higher night loads while others do not. Further analysis revealed that this is a building with a restaurant and a nightclub. The nightclub is open on Fridays and Saturdays, which explains the high heat demand on those days. The low heat loads on Sundays also indicate that there are not many customers on that day.
The second building (Figure 5(b)), on the other hand, has atypical behavior with increased weekend loads in colder seasons. This building belongs to a graphic design company. The reason behind the high heat load during weekends and nights may be that the building is partly heated by excess heat from machines during the daytime on weekdays. Moreover, return temperature measurements of this building are consistently high over the year [22].
The third building exhibits extremely sharp, irregular afternoon peaks on five days of the week. It is a building next to a football ground that contains locker rooms. The use of domestic hot water for showers could explain the peaks. However, it is still difficult to interpret why no one appears to shower on Wednesdays and Sundays.
In total, our method classified 163 heat load profiles as abnormal. Those profiles are of special interest for two main reasons. First, such strange heat load behaviors can give information about inefficient heat use in DH systems and call for further root cause analysis. Second, not all strange profiles indicate inefficiency: in many cases, they look very different from the others because the activities and operations in those buildings are rare or unique. Buildings with such unique heat demand are also important for developing in-depth knowledge about the demand side (customer side).

Control Strategies
In this section, we present the comprehensive results of the control strategies assigned to each building by the domain expert after visual exploration. Table 1 shows the number of buildings assigned to each of four control strategies, i.e., COC, NSB, TCO5 and TCO7. Across the complete network, the majority of buildings (60%) are assigned with continuous control, while NSB and time clock operation reach 15% and 25%, respectively.
The majority of the multi-dwellings (70%) exhibit COC (Figure 6). It is followed by NSB, at approximately 17%. Time clock operation control strategies are not very common in this customer category (around 6%).
The percentage of buildings assigned to continuous control is lower in the industrial demands category (below 60%). Time [...]
Health-care and social services are similar to industrial demands (52% of buildings have COC). The main difference in this category is the popularity of NSB (21%).
In contrast to the previous categories, less than half of the [...]
TCO7 is the most common strategy among school buildings and public administration buildings, with rates of 73% and 38%, respectively.

Unsuitable Control
In district heating systems, some buildings may be controlled with a strategy that is not appropriate for their building category. According to the domain knowledge [22], we consider the following rules to identify unsuitable control strategies:
• Multi-family dwellings that do not have continuous control
• Commercial and industrial buildings that do not have either time clock operation
• Any building with night setback control
Table 2 shows the number of buildings that have normal heat load profiles, abnormal heat load profiles and unsuitable control strategies. It also shows the number of buildings that are excluded from the analysis during data preprocessing (Section 3.3) because of missing or faulty meter readings.
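Read literally, the three rules above amount to a simple lookup. The sketch below encodes them as a function. The category labels and the function name are hypothetical, and the strategy codes are the four abbreviations used in this paper.

```python
TIME_CLOCK = {"TCO5", "TCO7"}


def has_unsuitable_control(category, strategy):
    """Return True if the strategy is unsuitable for the customer category."""
    if strategy == "NSB":
        return True                          # any building with night setback
    if category == "multi-dwelling":
        return strategy != "COC"             # dwellings should run continuously
    if category in {"commercial", "industrial"}:
        return strategy not in TIME_CLOCK    # should use a time clock operation
    return False                             # no rule for the other categories


examples = [
    ("multi-dwelling", "COC"),
    ("multi-dwelling", "TCO7"),
    ("commercial", "COC"),
    ("industrial", "TCO5"),
    ("school", "NSB"),
]
flags = [has_unsuitable_control(c, s) for c, s in examples]
```

Because the rules are applied per building after the expert has labeled the clusters, the check costs nothing beyond the cluster assignment itself.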
Our analysis reveals that only 36% of buildings connected to the two different district heating networks produce normal heat load behavior. 26% of the buildings have either abnormal heat load profiles or unsuitable control strategies, while the rest of them are excluded from the analysis. The high rate of the excluded buildings, 38%, exposes the low data quality in smart meter readings.
Approximately 7% of the buildings produce abnormal heat load profiles, and 19% of them have unsuitable control strategies (Figure 7). Industrial demands have the highest rate of abnormal profiles, at 14%. They are followed by public administration and commercial buildings, where 10% of the buildings show significantly strange heat load behavior. Multi-dwelling buildings, health-care and social services, and schools, however, remain below the average with 4%, 5% and 3%, respectively.
Similarly, the rate of unsuitable control is high in commercial buildings (32%) in comparison to other customer categories. They are followed by industrial demands and public administration buildings, where the rate of unsuitable control is approximately 25%. Lower rates, around 13%, are observed in both multi-dwelling buildings and health-care and social services.
Among all customer categories, the school category has the highest rate of normal behavior. Only 3% of the school buildings are identified with either abnormal heat load profiles or unsuitable control strategies.

Discussions
According to [21], multi-dwelling buildings are assumed to show relatively homogeneous behavior and are expected to exhibit COC. According to the results of this study, the majority of the multi-dwellings are indeed assigned to COC. Yet, a considerable number of buildings are identified with other control strategies. Multi-dwellings with time clock operation have not been discovered in previous studies. The reason for such behavior is that some multi-dwelling buildings can be rented out as restaurants or offices, which have only daytime activities and therefore show heat load patterns with time clock operation control of ventilation and low domestic hot water use.
Health and social services, as well as commercial buildings, are heterogeneous concerning heat demand behaviors. In the category of health and social services, there are buildings like hospitals that have 24h activity similar to multi-dwellings. On the other hand, there are also offices that only have time-clock operation controls. Commercial buildings also consist of different customers, some of them with 24h activity with domestic water use like hotels, and some customers with only daytime activity like trading companies, restaurants, amusement and recreational services. Even though control strategies of those two categories are much more diverse in comparison to multi-dwellings, both of them are still dominated by COC.
Public administration and school buildings are the only customer categories where COC does not have the highest rate. This is an expected outcome, considering that most of the buildings in these categories are municipal buildings with daytime activities only. The school buildings are strongly consistent: very few of them are not controlled with time clock operation. The share of such buildings is much higher for public administration, because some buildings in this category, such as service buildings for seniors, remain active 24 hours a day.

Conclusions
In this work, we study the problem of automatically discovering heat load patterns in district heating networks. We argue for the need for a data-driven approach and present three contributions for analyzing the heat load behaviors of district heating customers. The first is a method that clusters buildings while preserving the shape similarities in their heat load profiles and extracts patterns summarizing the typical behavior in each group. The second is the detection of buildings with abnormal heat load profiles, i.e., those that look significantly different from their expected heat load patterns. The third is the identification of buildings with control strategies that are unsuitable for their customer category, based on visual inspection and domain expert validation of the discovered heat load patterns.
We conduct a case study on two district heating networks in the south of Sweden. To the best of our knowledge, this is the first large-scale, comprehensive analysis of heat load patterns that gives an insight into the heat load behavior of an entire network. Our method captured fifteen common patterns among the heat load profiles of all buildings in our dataset. The analysis of those patterns revealed that factors other than the existing control strategies impact individual heat load behavior. Moreover, our study showed that buildings in different customer categories often behave quite similarly, while buildings within the same category can behave very differently. Neither the current customer categories (in Sweden, at least) nor the existing control strategies are enough for the categorization of buildings in district heating. We believe that our approach has a high potential to serve this purpose in practice, since it is automatic, can discover previously unknown knowledge, and can deal with large-scale data.