Large-scale monitoring of residential heat pump cycling using smart meter data

Heat pumps play an essential role in decarbonizing the building sector, but their electricity consumption can vary significantly across buildings. This variability is closely related to their cycling behavior (i


Introduction
Alongside increasing energy prices and ambitious targets to reduce greenhouse gas emissions, considerable attention is being paid to the energy needs of the building sector. The International Energy Agency (IEA) states that almost half of buildings' energy demand is used for space and water heating. The associated CO2 emissions reached an all-time high of 2.5 Gt in 2021 [1]. Consequently, a large-scale deployment of clean and efficient heating technologies is needed to reach emission targets. Electric heat pumps (HPs) are a central pillar in this context because they can reduce greenhouse gas emissions when replacing natural gas or oil furnaces [2,3]. The amount of savings depends on the primary energy for the electricity that is consumed by the HP [4]. Accordingly, this favors countries with a high share of electricity generation from renewables, nuclear power and/or hydropower. The authors of [5] show that per TJ of heating energy, HPs in Belgium could save up to 47 t of CO2, whereas in Switzerland it could even be up to 61 t. Consequently, HPs have become a backbone of many policymakers' carbon mitigation plans. As a result, the global stock of HPs has achieved an average annual growth rate of 10% over the past five years, sometimes with the help of subsidies. More than 190 million HPs are in operation worldwide [1], and the IEA estimates that 600 million HP installations are required by 2030 to cover 20% of buildings' heating needs [6].
* Corresponding author. E-mail addresses: tbrudermuell@ethz.ch (T. Brudermueller), mkreft@ethz.ch (M. Kreft), efleisch@ethz.ch (E. Fleisch), thorsten.staake@uni-bamberg.de (T. Staake).
In theory, HPs are highly efficient devices. In practice, however, many HPs have a significantly higher electricity consumption and lower efficiency than stated by their manufacturer [3,7-10]. Hence, there is a large potential to optimize HPs in the field. The authors of [10] showed that half of the 297 Swiss households studied achieved average savings of 1,805 kWh (15.2%) per year after their HPs were optimized by an energy consultant. The large gap in HP performance has multiple reasons. First, the seasonal performance depends on weather and climate conditions [11][12][13]. Second, non-optimal planning in the form of over- or undersized HPs leads to inefficient operation [12,14,15]. Third, heat pumps can show faulty or unwanted behavior [16,17]. Lastly and most commonly, HPs can operate without faults but suffer from misconfigurations. A study estimates that 40% of modern heating, ventilation and air conditioning (HVAC) systems are misconfigured [18]. Such misconfigurations include unsuitable heating curve settings, unfavorable cut-off temperatures and wrong bivalent temperatures, which are all closely linked to HP performance [19,20].
https://doi.org/10.1016/j.apenergy.2023.121734 Received 28 April 2023; Received in revised form 16 June 2023; Accepted 6 August 2023
The consequences of inefficient HP operation are severe. As the energy consumption of an HP accounts for 90% of its carbon dioxide equivalent emissions [21], the emissions savings associated with HPs highly depend on their electricity consumption. In this context, a study [21] finds a 4% decrease of life cycle climate performance when the coefficient of performance of an HP is improved by 5%. Moreover, HPs are typically associated with high upfront costs, so operational costs play a critical role in making HPs financially attractive compared to fossil-based heating solutions such as gas boilers [22]. High operational costs can create mistrust in the technology and can slow down the endeavors to decarbonize the heating sector. Lastly, the inefficiencies have implications for the grid. Already now, utilities need solutions to respond to the increased electricity demand from HPs [23,24]. A study which simulates an HP penetration of 100% in Great Britain [25] found that this scenario could cause the national annual electricity demand to rise to 189 TWh, which corresponds to an increase of around 60%. Another UK-based study found that the peak grid demand increases by 14% if 20% of households use HPs [26], a phenomenon that is amplified when many HPs operate outside of their optimal working conditions.
As a result, many manufacturers develop digital solutions for remotely monitoring the systems of their own customer base. However, these solutions still face several problems. First, existing products focus more on breakdown prevention and less on efficiency improvements [27]. Second, there are no standards for cross-manufacturer data exchange [28], nor is there a unified service, limiting the ability of owners to monitor their HPs if their manufacturer does not offer a service. Moreover, most of the currently installed HPs do not have a network connection [29]. Consequently, these HPs are not covered by digital service offerings even though they will remain in operation for years or decades.
However, a parallel development in recent years can serve as an important transitional technology. The deployment of smart electricity meters (SMs) could form the basis for more widespread efficiency services. A study reports that in 2019, 94.8 million SMs were installed in the United States, and in 2016, China already counted more than 350 million devices [30]. Another recent report from 2022 also shows significant progress in smart meter deployment in Europe [31]. Most EU countries are expected to achieve a deployment rate of at least 80% by 2025. Ten countries have already achieved this and are in the process of adopting second-generation devices. As a result, smart meter data (SMD) is becoming increasingly available at the household level, offering new opportunities for monitoring high-load devices in the grid, such as HPs [27].
In this context, our work is a first step towards a unified service for remotely monitoring HPs in residential buildings using SMD. We focus on evaluating HP cycling behavior (i.e., the frequency of switch-on and switch-off operations) as it affects performance and mean time to failure, and is an indicator of improper sizing and non-optimal settings [32][33][34][35]. Therefore, we analyze and describe the cycling of 503 HPs in the field in Switzerland over a 21-month period. To the best of our knowledge, this is the first study to monitor HP operation on such a large scale with SMD. To this end, our work particularly makes the following contributions:
• We show that cycling behavior can be derived from common smart meter data. We present an algorithm that extracts key indicators from 15-minute resolution data available for the vast majority of smart electricity meter installations.
• We show that operational characteristics related to cycling allow for the detection of outliers with respect to energy consumption and (in)appropriate sizing of HPs. This is advantageous because it does not require contextual information.
• We identify thresholds for atypical cycling behavior of HPs and describe behavioral differences with respect to building and HP characteristics.
The results are valuable to HP manufacturers and energy consultants, who can benefit from decision boundaries that allow them to identify malfunctioning equipment. The work is also relevant for companies developing remote services. The algorithms we propose can serve as a basis for monitoring HPs, even if they are connected not through the internet but through traditional SMs. This makes it possible to include most HPs in modern service offerings, since a large proportion of households is equipped with an SM. Finally, our work presents a promising use-case for utilities, which can use SMD to monitor HP activity and leverage their flexibility for demand response programs or targeted energy efficiency campaigns. A more detailed description of the benefits of our approach in terms of applicability to real-world use-cases is given in Section 5.3.
The remainder of this paper is structured as follows: First, we provide an overview of related work (Section 2). Afterwards, we introduce our data set (Section 3.1), cluster households by their building and HP characteristics (Section 3.2), and evaluate them in terms of energy efficiency and appropriate sizing (Section 3.4). Next, we derive information about each HP's individual cycling behavior (Sections 3.6 and 3.7) before including the effect of outdoor temperature (Section 3.8). This information is then used to detect HPs that cycle atypically (Section 3.9). In Sections 4 and 5, we report and discuss corresponding results considering different building and HP characteristics. Lastly, Section 6 provides a summary and conclusion.

Related work
In the following, an overview of existing work on HP cycling and HPs in the context of SMD is provided.

Heat pump cycling
The heat load of a building increases with decreasing outdoor temperatures. In order to ensure heating comfort, the heating curve defines increasing set-point temperatures as outdoor temperatures decrease [36]. Usually, the heating control monitors the return water temperature to adjust it to the set-point temperature [33]. Therefore, HPs show heating activity while the return temperature is lower than the set-point temperature. This leads to regular switch-on and switch-off events and thus, a cycling behavior. In the following, we refer to the time between consecutive switch-on and switch-off events as a cycle.
Power consumption during standby and off phases is negligible [37]. Instead, a significant amount of energy is consumed during heating cycles. Therefore, a few studies have investigated the relationship between heating cycles and energy consumption and found a strong correlation between short cycles and energy losses [32][33][34]. The study [32] puts these losses at 5%-30%, while [14] reports 12%. The reason for the cycling losses is that during the start-up phase of HPs, the delivered heating power is reduced until a steady state is reached [33]. This time interval depends on the characteristics of an HP. For example, in the study of [32], the HP reaches 90% of the steady-state value after 3 min, which the authors consider relatively short. The same study [32] concludes that a minimum time of 15 min between two consecutive cycles should be ensured. The work of [38] comes to a similar conclusion with a minimum run time of 20 min per cycle. Since a recurring cause of short cycles is inefficient defrosting schedules [35], short cycles occur more frequently in air source heat pumps (ASHPs) than in ground source heat pumps (GSHPs) [34]. Moreover, frequent cycles reduce the lifetime of HPs [39]. However, it is not only start-up losses that reduce performance; long run times [32], which often relate to weather, do as well. As outdoor temperatures decrease, the thermal output of an HP system also decreases, and consequently the run time of the HP increases to compensate for this [34]. It can be concluded that neither short nor long cycles are desirable and that run time monitoring and optimization can improve the overall system performance [40,41]. Further, an evaluation of cycling behavior must take outdoor temperatures into account.
In addition to outdoor temperatures, other factors influence the cycle length and number of cycles; for example, HP sizing, as described in [42,43]. On the one hand, an undersized HP may run almost continuously, sometimes even alongside conventional water heaters, likely offsetting or even reversing the energy savings associated with the smaller HP size. On the other hand, oversized HPs tend to cycle on and off frequently, resulting in shorter cycles and negatively affecting performance. Another factor of influence is the type of HP installed in terms of its modulation capacity [44]. In the work of [33], a 12% drop in performance is observed when a fixed speed HP is used compared to a variable speed HP. A fixed speed HP can only rotate the compressor at a single speed and tends to have more and shorter cycles [29]. In contrast, a variable speed HP can modulate compressor speed to regulate for heat demand, which typically results in higher total run times but fewer on-off transients [29]. Finally, the installation of a buffer tank can also increase the length of a single cycle and decrease the number of cycles [34].
The studies presented give a good indication of the fundamental importance of HP cycling for energy efficiency. However, they all have in common that they analyze only a few individual HPs under laboratory conditions or in test houses, and up to now it has been unclear how HPs operate in different homes. This is a fundamental difference to our study because we analyze 503 HPs in the field to investigate commonalities and differences in terms of cycling behavior. Additionally, none of the existing studies explains which cycling behaviors are typical or atypical for HPs, whereas we identify HPs that cycle atypically. Moreover, our work differs from previous studies in terms of the type of data used. Existing work uses HP sensor data and known contextual information because both are available for test houses or in laboratory environments. However, for real-world applications, methods are needed that can deal with missing contextual information or use publicly available data sources. Therefore, we derive context using a national building register database and a geographic information system. While this information is useful for evaluating energy efficiency and appropriate sizing of an HP, our method to evaluate the HP cycling behavior does not require this contextual information. Further, instead of sensor data, we use smart meter data because it is commonly available, can cover HPs that are not connected to the internet, and is independent of the HP manufacturer, making it suitable for wide-spread energy efficiency services. In this context, below, we briefly review existing related work at the intersection of SMD and HPs.

Heat pumps and smart meter data
The presence of HP systems in residential buildings can be predicted by combining SMD with weather data [45,46]. While [45] uses data at 15-minute resolution and [46] at daily resolution, both have in common that they extract features from time series to use as input for classification algorithms. A similar approach is used in [29], where variable and fixed speed heat pumps are distinguished using SMD. An extension of HP detection with SMD is non-intrusive load monitoring, also known as load disaggregation. In the context of heating, the goal is to isolate heating-related patterns from the SMD when the heating system is measured along with other appliances. Since this area is reviewed in [47][48][49] in general and in [50] with a particular focus on HVAC systems, we do not discuss it in detail here. However, we note that only a few works disaggregate HP patterns from low resolution energy data (e.g., 15 min as in our case). Most studies either use data with higher frequencies and dimensions [51][52][53][54][55], or they focus on other devices with rather constant power consumption such as electric resistance heaters [56][57][58] or electric water heaters [59][60][61]. However, two studies extract HP patterns from low-resolution SMD using clustering algorithms [62,63]. The first work [62] uses SMD with a resolution of 5 minutes and clusters dominant on and off activities found by a peak detection algorithm. This also means that the proposed solution cannot cope with variable speed HPs, where power consumption modulates in small steps proportional to compressor speed [29], and the study lacks an evaluation using SMD with a more common resolution of 15 min. The second work [63] uses hourly data, but does not aim to isolate and obtain a complete HP pattern. Instead, an HP pattern is automatically divided into flexible and non-flexible loads. What both studies have in common is that they are ultimately used to estimate flexibility as part of demand response programs. Other studies also look at HP scheduling in this context [64][65][66][67][68]. For example, a study from the Netherlands investigates the flexibility of HP schedules to reduce operating costs by using dynamic tariffs [69], and another work uses SMD and Wi-Fi data to detect occupancy and propose HVAC schedules in commercial buildings [70].
In summary, previous work on HPs and SMD has focused on detecting HP installations, isolating their patterns from comparably high frequency measurements, or estimating flexibility for demand response programs. There is a lack of work that uses SMD to monitor HPs in operation. More importantly, no study uses SMD to identify HPs that operate energy inefficiently, are inappropriately sized, or cycle atypically. This study addresses that gap.

Methods
In this section, the data set is described, the methods for evaluating the HP cycling behavior are explained, and the baseline models serving as ground truth with respect to HP energy efficiency and appropriate sizing are derived.

The data set
The data set includes 503 single-family houses in Switzerland with an observation period from January 2021 to September 2022 (21 months). Each household has an HP for heating purposes, but no photovoltaic system.
Smart Meter Data: For each household, we use the measurements of HP electricity consumption in kWh in 15-minute resolution. The HP is measured separately from other appliances. Each SMD time series differs in terms of start and end dates and missing data. The average data availability per household over the entire period is 61%. Each HP has an average of 395 days of data without outages.
Temperature Data: To enrich the SMD with data on average outdoor temperature in daily resolution, we use the address of each household and query a paid weather service. It individually finds the nearest weather station and returns the temperature measurements (in °C) as integers without decimal places, e.g., 5 °C.
Meta Data: For 342 households (i.e., 68%), the installed electric power of the HPs in kW is known through the utility company. Additionally, we use the address of each household to look up the building year and heated floor area from the Swiss building register [71]. A study on behalf of the Swiss Federal Office of Energy [72] suggests applying a correction factor of 1.15 to the heated floor area and treating floor areas outside the range of 70-400 m² as outliers. We follow this advice and treat the outlier households as if the information was not available. Accordingly, we do the same in terms of the building year with buildings that were built before 1700. Additionally, for some households, no entry is found in the building register, or the corresponding fields are not filled. In total, this leads to the heated floor area being available for 393 households (i.e., 78%) with a median of [...], and the building year [...] with a median of 2002. Lastly, we again use each household's address to extract the existence of a drilling profile through a publicly available geographic information system [73]. If an entry is found, we define an HP with a drilling profile as a ground source heat pump and one without as an air source heat pump. For 190 households (i.e., 38%), no entry is found. We mark the corresponding HPs to be of unknown type. The other 313 (i.e., 62%) are composed of 125 ground source (i.e., 40%) and 188 air source heat pumps (i.e., 60%).

Categorizing households by characteristics
Next, we form groups of similar households to describe and compare the behavior of their HPs in a later step. For this purpose, we use the contextual information of each household as derived earlier: the building year, the heated floor area, and the installed electric HP power (see the previous section on the meta data of our data set). If this information is unknown for a household, we immediately assign the household to a cluster named unknown. For all other households, we apply K-Means clustering in a one-dimensional manner to each variable individually and set the number of clusters to 3 (the K parameter). The reasons are that the clusters become easily interpretable by their cluster centers and that they can be combined in a flexible way. Hence, a household is clustered multiple times but only once for each variable. The clustered heated floor area refers to the size of the house (small, medium, big, unknown). Similarly, the installed electric HP power serves as a metric for the size of the HP installation (small, medium, big, unknown), and the building year as a metric for the building age (old, medium, new, unknown). Additionally, since the HP type is categorical, we interpret it as already being clustered (air source, ground source, unknown). Table 1 shows the results of this process and provides the final characteristics of each cluster in terms of cluster size, and minimum, mean, median, and maximum of the observations within a cluster.
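The per-variable categorization can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the study: the function names, the deterministic initialization, and the use of `None` for missing values are assumptions made for this example.

```python
from statistics import mean


def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalar data; returns the k cluster centers."""
    data = sorted(values)
    # Assumed initialization: spread the initial centers evenly over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def categorize(values, names=("small", "medium", "big")):
    """Map each value to a named cluster; missing values (None) become 'unknown'."""
    known = [v for v in values if v is not None]
    centers = kmeans_1d(known, k=len(names))
    # Name the clusters by the rank of their centers, lowest center first.
    order = sorted(range(len(centers)), key=lambda c: centers[c])
    name_of = {c: names[rank] for rank, c in enumerate(order)}
    result = []
    for v in values:
        if v is None:
            result.append("unknown")
        else:
            nearest = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            result.append(name_of[nearest])
    return result


# Hypothetical heated floor areas in m²; None marks a missing register entry.
areas = [90, 100, 110, None, 180, 190, 200, 320, 340, 360]
print(categorize(areas))
```

Because each variable is clustered on its own, the same function can be reused for the building year by passing `names=("old", "medium", "new")`.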

Daily observations and heating applications
In what follows, the behavior of HPs is analyzed in daily observations for several reasons: First, by choosing a daily resolution for the extraction of cycling metrics, they can be matched to the weather variables that are available in daily resolution. Second, daily behavior is easier to interpret; for example, behavior on a cold winter day can be compared to behavior on a spring day with mild temperatures. Third, smart meters in our setting report the measurements of an entire day all at once. Hence, if data is missing, it is missing for an entire day and there are no gaps in SMD within a day. Fourth, heating systems have some inertia and do not respond immediately to changes in outdoor temperature because insulation keeps heat in the building. Therefore, interpreting the cycling response with a resolution of a few minutes or hours is not useful without detailed information about the underlying exergy models of individual buildings.
In addition, this work focuses on HPs in heating mode, but not on cooling or hybrid applications. We limit our descriptions and analyses to identify conspicuous HPs on days with average outdoor temperatures of 0-12 °C. We choose the upper limit of 12 °C because according to the Swiss standard, days with an average outdoor temperature below this value are heating days [74]. Accordingly, we choose the limit of 0 °C because below this value, an HP may not operate in monovalent mode but may be supported by an electric auxiliary heater [19]. In addition, days with average temperatures below 0 °C are rare in central Europe, so incorrect conclusions may be drawn from distributions with few observations at cold temperatures.

Evaluating energy efficiency and appropriate sizing of heat pumps
To evaluate the energy efficiency and the appropriate sizing of each HP, we use the corresponding distribution of daily energy consumption and contextual information on heated floor area and electrical power.

Normalizing daily energy by degree day:
Typically, an HP's energy consumption increases with decreasing outdoor temperatures. This means that the distribution of daily energy depends on temperature. To make the HPs comparable without weather effects, we use a common approach that normalizes the daily energy sums by a simple division by degree day, as done for example in [75]. We use a base temperature $T_{b}$ of 20 °C, which is the Swiss norm value of an artificial indoor temperature setting [74]. Then we compute the degree day $DD$ (in °C) from the average outdoor temperature $T_{out}$ as follows:

$$DD = T_{b} - T_{out} \qquad (1)$$

Calculating energy intensity as a measure of energy efficiency
A common approach to assess the energy efficiency of residential buildings is to calculate energy intensity [75]. This approach assumes that a building's energy consumption is proportional to its floor size. Therefore, for each HP, we divide the daily energy values, already normalized by degree day (Eq. (1)), by the heated floor area. The median is then calculated from the resulting distribution. For better readability, we refer to this single value in the following only as energy intensity (in kWh/m²/°C). It can be viewed as a measure of energy efficiency that makes the HPs comparable in energy consumption regardless of building size and temperature. An HP with high energy intensity can be considered less efficient than an HP with low energy intensity.
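As a small worked sketch of this calculation: the snippet below assumes that the degree day is the base temperature of 20 °C minus the average outdoor temperature, and that only days within the 0-12 °C range are used. Both the function names and the day filter are assumptions made for this illustration.

```python
from statistics import median

BASE_TEMP_C = 20.0  # base temperature stated in the text


def degree_day(avg_outdoor_temp_c, base_temp_c=BASE_TEMP_C):
    """Assumed degree-day definition: base temperature minus daily mean outdoor temperature."""
    return base_temp_c - avg_outdoor_temp_c


def energy_intensity(daily_kwh, daily_temp_c, heated_area_m2):
    """Median of daily energy per degree day and per m² (kWh/m²/°C)."""
    values = [
        energy / degree_day(temp) / heated_area_m2
        for energy, temp in zip(daily_kwh, daily_temp_c)
        if 0 <= temp <= 12  # assumed restriction to the analyzed heating days
    ]
    return median(values)


# Hypothetical household: three heating days, 200 m² heated floor area.
print(energy_intensity([40.0, 30.0, 20.0], [0, 5, 10], 200.0))
```

Taking the median rather than the mean keeps single unusual days, for example days with a vacation setback, from dominating the per-household value.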

Calculating utilization as a measure of appropriate sizing
To assess whether an HP is appropriately sized, we calculate its daily utilization $U_{d}$ (in %) using the daily energy $E_{d}$ (in kWh) and the installed electrical heat pump power $P_{el}$ (in kW) as follows:

$$U_{d} = \frac{E_{d}}{24\,\mathrm{h} \cdot P_{el}} \cdot 100$$

Again, we divide the daily utilization values by degree day values (Eq. (1)) to eliminate the effects of outdoor temperature and then calculate the median daily utilization of each HP from the corresponding distribution. In what follows, we refer to this value only as utilization (in %/°C) and consider it a measure of appropriate sizing that is independent of weather and HP size. An HP with very low utilization can be considered rather oversized, while an HP with very high utilization can be considered rather undersized.
(Figure caption fragment: For better understanding, in the first graph the energy measurements (in kWh) are converted into power (in kW) by multiplying them by four.)
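The utilization measure can be illustrated with a short sketch. It assumes that daily utilization relates the measured daily energy to the energy the HP would consume when running at its installed electrical power for 24 hours, and that the degree day is 20 °C minus the average outdoor temperature. Function names and the 0-12 °C filter are assumptions made for this example.

```python
from statistics import median


def daily_utilization(daily_kwh, installed_kw):
    """Share (in %) of the maximum possible daily energy, assumed to be installed power x 24 h."""
    return daily_kwh / (installed_kw * 24.0) * 100.0


def utilization(daily_kwh, daily_temp_c, installed_kw, base_temp_c=20.0):
    """Median of daily utilization per degree day (%/°C)."""
    values = [
        daily_utilization(energy, installed_kw) / (base_temp_c - temp)
        for energy, temp in zip(daily_kwh, daily_temp_c)
        if 0 <= temp <= 12  # assumed restriction to the analyzed heating days
    ]
    return median(values)


# Hypothetical 4 kW heat pump: 48 kWh on a 0 °C day, 24 kWh on a 10 °C day.
print(daily_utilization(48.0, 4.0))                  # 50.0
print(utilization([48.0, 24.0], [0, 10], 4.0))       # 2.5
```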

The role of contextual information
The evaluation of energy efficiency and appropriate sizing of HPs is possible with SMD and contextual information. However, usually such required information (e.g., about the heated floor area and electrical power of each HP) is not known or difficult to obtain [76]. In this study, the address of each household is used to extract these parameters from the Swiss building register. Nevertheless, even in this advantageous scenario, the heated floor area is unknown for about 22% of households, and the electrical power for about 32% of HPs. An even more common scenario is that due to privacy constraints, the address is either unknown or cannot be used, making the use of national building register databases impossible [77]. Also, most countries do not offer a well-maintained building database like Switzerland. Therefore, we develop a solution to assess whether an HP requires special monitoring or exhibits conspicuous behavior by relying solely on SMD at common 15-minute resolution and temperature data at daily resolution measured by a local weather station nearby. Not only the SMD but also the weather data is usually available to utilities because the city or region from which the SMD originates is known. The cycling behavior of an HP is a meaningful indicator of whether it is energy efficient and appropriately sized (see Section 2.1). Therefore, in the following, we develop a novel method to extract HP cycles from SMD, derive key indicators to describe the cycling behavior, and consequently identify HPs that cycle atypically. In this way, our method is independent of additional contextual information and therefore suitable for large-scale real-world applications.

Extracting heat pump cycles
In the following, we explain how we extract heating cycles of the individual HPs from SMD. For better readability, we only briefly describe each step in this section and provide detailed descriptions of the algorithms in the appendix (Appendix A.1). Fig. 1 serves as a supporting graphic, which provides an example of the different steps.
Step 1 - Estimating Baseload: Even in standby mode, an HP has an energy consumption that is slightly above zero [29]. The control unit [...]
Step 2 - Deriving Activity States: We assume that any energy consumption above the estimated baseload is due to the HP performing a cycle. Therefore, we compare the energy consumption at each timestamp to this threshold to derive binary activity states (on-state: HP performs a cycle; off-state: HP is in standby or off). (Algorithm in Appendix A.1.2).
Step 3 - Determining On- and Off-Transients: Next, we determine the changes in activity states to find the time points associated with an on-transient and an off-transient of each cycle. It is possible that at certain times the energy consumption exceeds the threshold, but the previous and subsequent measurements do not. In this case, we mark the corresponding timestamp as both an on-transient and an off-transient and assume that it is caused by a cycle shorter than the 15-minute measurement interval. (Algorithm in Appendix A.1.3).
Step 4 - Calculating Durations of Individual Cycles: From the activity states and transients, we calculate the duration of each cycle in hours. An HP can turn on or off at any time within the 15-minute measurement interval. Therefore, the measured energy of the switch-on and switch-off operations is usually lower than during the cycles [29]. We assume a constant electricity uptake between an on-transient and the subsequent measurement, or an off-transient and the previous measurement. Therefore, by accounting for the difference between the observed energy measurements, we can calculate the exact times within a 15-minute measurement interval that the HP turns on or off. While this assumption holds for fixed-speed HPs, it may be incorrect for variable-speed HPs because they can modulate compressor speed from one measurement to the next. However, we argue that the estimates using our method better reflect the actual cycle durations, since otherwise each on- and off-transient would be counted as a full 15 min, resulting in large overestimates. For better readability, we report all cycle durations in hours rounded to two decimals. (Algorithm in Appendix A.1.4).
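Steps 2 to 4 can be sketched roughly as follows. This is a simplified illustration, not the algorithm from Appendix A.1: the baseload threshold is passed in as a given value, the on-fraction of a transient interval is approximated by the ratio of its energy to the neighboring on-state reading, and an isolated reading is counted as a full interval (an upper bound). All names are hypothetical.

```python
INTERVAL_H = 0.25  # 15-minute smart meter resolution


def _duration_h(energy_kwh, start, end):
    """Estimate the duration of one cycle spanning readings start..end (inclusive)."""
    if start == end:
        # Isolated reading above baseload: no neighboring on-state reading to
        # compare with, so this sketch counts the whole interval.
        return INTERVAL_H
    # Assuming constant electricity uptake, the share of the transient interval
    # during which the HP was on is the energy ratio to the neighboring reading.
    first = min(1.0, energy_kwh[start] / energy_kwh[start + 1])
    last = min(1.0, energy_kwh[end] / energy_kwh[end - 1])
    full_intervals = end - start - 1
    return (first + full_intervals + last) * INTERVAL_H


def extract_cycles(energy_kwh, baseload_kwh):
    """Return (on_transient_index, off_transient_index, duration_h) per cycle."""
    # Step 2: binary activity states relative to the baseload threshold.
    on = [e > baseload_kwh for e in energy_kwh]
    cycles = []
    i, n = 0, len(on)
    while i < n:
        if not on[i]:
            i += 1
            continue
        # Step 3: an on-transient starts a run of on-states, an off-transient ends it.
        start = i
        while i + 1 < n and on[i + 1]:
            i += 1
        end = i
        # Step 4: duration in hours, rounded to two decimals.
        cycles.append((start, end, round(_duration_h(energy_kwh, start, end), 2)))
        i += 1
    return cycles


# Hypothetical readings in kWh per 15 minutes, with a baseload of 0.02 kWh.
readings = [0.01, 0.25, 0.5, 0.5, 0.125, 0.01, 0.3, 0.01]
print(extract_cycles(readings, baseload_kwh=0.02))
```

In this example the first cycle covers half of its first interval, two full intervals, and a quarter of its last interval, which gives 0.69 h instead of the 1.0 h obtained by counting four full intervals.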

Calculating daily cycling metrics
From the procedure described in the previous section, for each HP, a list is obtained, indicating the start and end time and duration of each cycle. Now, we convert this information into aggregated metrics that describe individual days and serve as key indicators for cycling behavior (recall that in Section 3.3, it is explained why a daily resolution is used). The metrics calculated per day are operating hours, number of cycles, ratio of number of cycles to operating hours, and average cycle length. Table 2 provides a description of these metrics, and the algorithm in Appendix A.2 formalizes the calculation.
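A minimal sketch of this aggregation is shown below. It assumes that every cycle is attributed to a single day (for instance the day on which it starts); how cycles that span midnight are handled is defined in Appendix A.2 and is not reproduced here. The function and key names are illustrative.

```python
from collections import defaultdict


def daily_cycling_metrics(cycles):
    """Aggregate (day, duration_h) pairs into the four daily cycling metrics."""
    durations_per_day = defaultdict(list)
    for day, duration_h in cycles:
        durations_per_day[day].append(duration_h)

    metrics = {}
    for day, durations in durations_per_day.items():
        operating_hours = sum(durations)
        num_cycles = len(durations)
        metrics[day] = {
            "operating_hours": operating_hours,
            "num_cycles": num_cycles,
            "cycles_per_operating_hour": num_cycles / operating_hours,
            "avg_cycle_length_h": operating_hours / num_cycles,
        }
    return metrics


# Hypothetical cycles: two on the first day, one on the second.
cycles = [("2021-01-01", 1.5), ("2021-01-01", 0.5), ("2021-01-02", 2.0)]
for day, values in daily_cycling_metrics(cycles).items():
    print(day, values)
```

Days without any cycle do not appear in the result; whether such days should be reported with zero operating hours is a design choice left open in this sketch.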

Deriving temperature curves
For days with the same average outdoor temperatures, a single HP shows similar but slightly varying cycling behavior. Therefore, the daily cycling metrics can be interpreted as distributions that depend on the average outdoor temperature. Below it is explained how to handle this, while Fig. 2 serves as a supporting graph.
First, for each HP and metric, we group the daily observations by mean daily outdoor temperature to derive the distributions. We then reduce the dimensionality and the effects of outliers by calculating the median of each distribution. In this way, one value per temperature and per metric is obtained. When grouping the medians by each metric and then sorting them in descending order by outdoor temperature, we obtain what we refer to as a temperature curve. Note that the analyses of this paper focus on the 0-12 °C temperature range (as previously explained in Section 3.3), where an almost linear behavior can be observed (see descriptions in Section 5.2). Therefore, for each metric, we compute a linear regression on the medians between 0-12 °C. In this way, the cycling behavior of HPs becomes comparable through the slopes and intercepts of the linear approximations and independent of the observation period.
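The construction of a temperature curve and its linear approximation can be sketched as follows for a single HP and a single metric. Ordinary least squares is assumed as the regression method, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def temperature_curve(daily_temps_c, daily_values, t_min=0, t_max=12):
    """Median of a daily metric per integer outdoor temperature within [t_min, t_max]."""
    groups = defaultdict(list)
    for temp, value in zip(daily_temps_c, daily_values):
        if t_min <= temp <= t_max:
            groups[temp].append(value)
    # Sorted in descending order by outdoor temperature, as described in the text.
    return {temp: median(vals) for temp, vals in sorted(groups.items(), reverse=True)}


def fit_line(curve):
    """Ordinary least squares on the medians; returns (slope, intercept)."""
    xs = list(curve)
    ys = [curve[x] for x in xs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical daily operating hours of one HP; the 15 °C day is outside the range.
temps = [0, 0, 0, 6, 6, 12, 12, 12, 15]
hours = [20, 20, 100, 14, 14, 8, 8, 8, 1]
curve = temperature_curve(temps, hours)
print(curve)
print(fit_line(curve))
```

The single extreme value of 100 on a 0 °C day does not affect the curve because the median is taken per temperature, which illustrates why medians are used before fitting.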

Detecting conspicuous heat pumps
We hypothesize that atypical cycling is a good indicator of whether an HP requires special monitoring or system optimization. Therefore, we determine which systems cycle typically and which cycle atypically by examining each metric individually and interpreting the slopes and intercepts of the linear regressions as bivariate distributions which span two-dimensional feature spaces. This allows us to apply an outlier detection method and interpret a system within the normal range as typical and an outlier system as atypical. We determine outliers in an unsupervised manner using the local outlier factor (LOF) [78]. LOF considers the local density of each sample and compares it to the local density of its nearest neighbors. Then, it computes a score for how isolated the object is in the local neighborhood and identifies samples with much lower density as outliers. We use the implementation in [79], which relies on the k-nearest neighbor algorithm and the Minkowski distance to calculate the local density. We also use the default setting of [79] that considers 20 neighboring data points and algorithmically determines the number of outliers. However, note that the parameters can be set to classify a predefined percentage of data points as outliers, making the approach suitable for household selection in campaigns. The fact that an outlier score is returned for each data point also makes it possible to sort the HPs according to how typical or atypical they behave with respect to a single metric.
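To make the LOF idea concrete, the following is a compact plain-Python sketch of the score. It is not the implementation referenced as [79]: it assumes Euclidean distance (Minkowski with p = 2), uses exactly k neighbors without special handling of distance ties, and leaves the final decision threshold to the caller. Scores close to 1 indicate a density similar to the neighbors, while clearly larger scores indicate isolation.

```python
from math import dist


def local_outlier_factor(points, k=20):
    """Return one LOF score per point (simplified: exactly k neighbors, Euclidean distance)."""
    n = len(points)
    k = min(k, n - 1)
    # k nearest neighbors of every point as (distance, index) pairs.
    neighbors = []
    for i, p in enumerate(points):
        distances = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(distances[:k])
    # k-distance: distance to the k-th nearest neighbor.
    k_distance = [nb[-1][0] for nb in neighbors]
    # Local reachability density: inverse of the mean reachability distance.
    # Note: this divides by zero if a point has k exact duplicates.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbors relative to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical (slope, intercept) pairs: a regular 5 x 5 grid plus one isolated point.
points = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(20.0, 20.0)]
scores = local_outlier_factor(points, k=5)
most_isolated = max(range(len(points)), key=lambda i: scores[i])
print(points[most_isolated])
```

Sorting the HPs by these scores gives the ranking mentioned in the text. In practice the slopes and intercepts would likely need to be brought to comparable scales before computing distances, which this sketch does not do.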
We apply LOF individually to each cycling behavior metric and classify an HP as conspicuous if it is determined to be an outlier by any of the cycling metrics. This is a rather sensitive approach that does not allow HPs to deviate in any metric. To be less sensitive, a minimum number of metrics could be introduced for which an HP must be an outlier to be considered atypical. In any case, for each HP it can be listed which metrics cause it to be considered conspicuous. In addition, for each metric, we calculate the means and standard deviations of the slopes and intercepts of the identified inlier points. Based on the position of an outlier point relative to the ranges of mean ± standard deviation, additional clues can be provided as to what makes the system an outlier. Hence, it can be inferred whether the slope, the intercept, or both are atypical (too high, too low, or within the typical range). Fig. 3 shows an example of the method when applied to the ratio of cycles to operating hours. Each point in the figure refers to a single HP. In the first plot, the radius of a circle represents the outlier score, i.e., the larger the circle, the more isolated the point. The color encodes whether an HP has been determined to be an outlier and thus conspicuous. The second graph shows only the identified outliers and indicates whether the slope or intercept is too high compared to the inlier points (indicated by the corresponding range of mean ± standard deviation). The third graph shows a two-dimensional kernel density estimate of outlier and inlier points for further insight.
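The combination of the per-metric results and the comparison against the inlier range can be sketched as follows. The function names, the `min_metrics` parameter (which models the less sensitive variant mentioned above), and the use of the population standard deviation are assumptions for illustration.

```python
import numpy as np


def flag_conspicuous(outlier_flags_by_metric, min_metrics=1):
    """Combine per-metric outlier masks into one mask of conspicuous HPs.

    outlier_flags_by_metric: dict mapping a metric name to a boolean
    sequence with one entry per HP. With min_metrics=1, an HP is
    conspicuous if any metric flags it.
    """
    stacked = np.array(
        [np.asarray(flags, dtype=bool) for flags in outlier_flags_by_metric.values()]
    )
    return stacked.sum(axis=0) >= min_metrics


def position_vs_inliers(values, is_outlier):
    """Describe each outlier's slope (or intercept) relative to the inliers.

    Returns a dict mapping the HP index to 'too high', 'too low' or
    'within range', based on the inlier mean +/- one standard deviation.
    """
    values = np.asarray(values, dtype=float)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    inliers = values[~is_outlier]
    # Assumption: population standard deviation (ddof=0).
    mean, std = inliers.mean(), inliers.std()

    labels = {}
    for idx in np.flatnonzero(is_outlier):
        if values[idx] > mean + std:
            labels[int(idx)] = "too high"
        elif values[idx] < mean - std:
            labels[int(idx)] = "too low"
        else:
            labels[int(idx)] = "within range"
    return labels
```

Calling `position_vs_inliers` once with the slopes and once with the intercepts of a metric indicates whether the slope, the intercept, or both make an HP an outlier.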

Results
In the following, the methods proposed in the previous section are applied to the data set presented in Section 3.1. We compare the results of our outlier detection based on cycling behavior (Section 3.9) with the ground truth obtained from the baseline models (Section 3.4) that evaluate energy efficiency and appropriate sizing of HPs. Because the baseline models require contextual information on heated floor area and electrical power, the assessments in this chapter are based on a subset of 243 HPs for which both parameters are available. For all of these HPs, observations are available for at least 6 different average daily outdoor temperatures of the 13 possible values in the 0-12 °C temperature range.

Evaluating linear fit of temperature curves
To detect HPs that cycle atypically, we use the slopes and intercepts of the linear regressions on temperature curves (see Section 3.8). Therefore, we first assess the fit of these regressions by examining their residuals-versus-fitted plots and calculating the $R^2$ value of each regression. The descriptive statistics of the resulting $R^2$ distributions for each metric are reported in Table 3. The medians of the $R^2$ distributions range from 0.73 to 0.96, which we interpret as an indication that all cycling metrics can be well approximated linearly in the 0-12 °C temperature range. The best approximation is obtained for the hours of operation and the worst for the number of cycles.

Identified conspicuous heat pumps
The algorithm automatically finds a total of 41 conspicuous HPs, which corresponds to 16.9% of the 243 HPs evaluated. Note that the algorithm allows a system to be classified as an outlier with respect to several key indicators simultaneously. In 12 of the 41 cases (i.e., 29.3%), an outlier system is classified as such by more than one key indicator. For average cycle length, 26 outliers are identified, followed by the ratio of number of cycles to operating hours (17 outliers), hours of operation (9 outliers), and number of cycles (5 outliers). Table 4 lists the clusters to which the identified systems belong. To account for the different cluster sizes, not only the absolute numbers are reported but also the percentage of conspicuous HPs in each population. The proportion of ASHPs identified as outliers is more than twice that of GSHPs (24.0% vs. 9.2%). In addition, the proportion of identified systems installed in middle-aged buildings is notably higher than in buildings of other ages (medium: 27.0%; new: 14.7%; old: 14.3%). In Fig. 4, examples of the original SMD of noticeable outlier and inlier HPs by operating hours and number of cycles are shown. The figure shows heat maps, with rows representing time of day and columns representing dates. This makes cycling patterns easier to observe over time. The visualizations show that our algorithms are able to distinguish HPs with short cycles and very frequent on-offs from HPs with longer and healthier cycles.

Fig. 3. Example of determining HPs that cycle atypically by applying outlier detection to the ratio of cycles to hours of operation. We apply this procedure to each cycling metric individually and classify an HP as conspicuous if it is an outlier by any metric.

Table 4
The 41 conspicuous HPs by clusters. The percentages represent how many systems of each population have been identified to cycle atypically.

Evaluating performance
Our method identifies atypical HPs by analyzing cycling behavior without requiring contextual information. To evaluate performance, we compare our results to baseline models that assess energy efficiency and appropriate sizing of HPs but require contextual information (Section 3.4). To this end, we treat the problem as a binary classification with binary labels indicating whether an HP is salient (label: 1; minority class) or not (label: 0; majority class). Consequently, we label the 41 HPs that are conspicuous for cycling behavior as salient and the other 202 HPs as not salient. Note that the LOF-based outlier detection (as introduced in Section 3.9) can automatically select HPs with high or low activity as long as they behave atypically. Therefore, for ground truth, we treat the best and worst 10% of HPs in terms of utilization and energy intensity as atypical to ensure a fair comparison with the baseline models. Fig. 5 shows the resulting confusion matrices with absolute and relative scores. Table 5 shows the corresponding performance scores typically used for classification tasks. In addition, the table includes Cohen's Kappa statistic, which measures the agreement between two annotators rather than comparing a classifier's predictions to ground truth values.
According to [80], the Cohen's Kappas of 0.40 for energy intensity and cycling behavior and 0.37 for utilization and cycling behavior indicate a fair agreement in both cases. In a machine learning setting, the high accuracy values of the model (0.80 and 0.81) combined with the rather low F1-scores (0.46 and 0.48) could be interpreted as an insufficient fit to an unbalanced data set. However, in this case, a more nuanced interpretation is required. The model does not necessarily fail to find conspicuous HPs that require special monitoring. Instead, it focuses on a different aspect of optimization. By identifying HPs that are conspicuous in cycling, it finds a proportion of HPs that are atypical in energy intensity or utilization, but some HPs can also cycle atypically without being inefficient in terms of energy use or inappropriately sized. In this case, cycling behavior could still have an impact on the lifetime of the entire device or its components, and improvements in operation could still be possible. Therefore, we consider special monitoring of HPs that are conspicuous by their cycling behavior to be appropriate in any case.
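To make the relation between accuracy, F1-score, and Cohen's Kappa more tangible, the following Python sketch computes these scores from the four cells of a binary confusion matrix using their standard definitions. The function name is hypothetical, and the counts in the usage note are a toy example, not the values from Fig. 5.

```python
def classification_scores(tp, fp, fn, tn):
    """Compute common scores from a 2x2 confusion matrix.

    tp/fp/fn/tn: counts of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    # Agreement expected by chance, from the marginal label frequencies
    # of the two "raters" (predictions and reference labels).
    p_both_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_both_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_expected = p_both_yes + p_both_no

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected < 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }
```

For the toy counts `classification_scores(tp=5, fp=5, fn=5, tn=85)`, the accuracy is 0.90 while the F1-score is only 0.50 and the Kappa is about 0.44. This shows how a large majority class can yield high accuracy alongside moderate F1 and Kappa values.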

Discussion
In the following sections, additional descriptive analyses are provided as well as insights into factors that influence cycling. Further, limitations and future work are listed. Unlike the evaluation in the previous chapter, in this chapter all analyses are conducted for all 503 HPs rather than for just a subset of HPs.

Analyzing correlations
To investigate how the cycling metrics relate to each other, to temperature, to utilization, and to energy intensity, we calculate the correlations of the daily measures across all households. Fig. 6 shows the resulting correlation matrix.
We find that all measures of cycling are correlated with outdoor temperature, most strongly hours of operation, with a negative correlation of −0.60, followed by number of cycles (−0.43). In contrast, utilization and energy intensity show little correlation with temperature. This confirms that the normalization by degree day introduced in Section 3.4 has the desired effect of removing weather dependency. However, utilization and energy intensity are strongly correlated, with a coefficient of 0.78. In contrast, the metrics for cycling behavior are relatively independent of utilization and energy intensity, as reflected in the low correlation coefficients. It could be that a combination of these metrics (as used in the outlier detection of our approach) has a stronger relationship with energy intensity and utilization, which pairwise correlations may not reflect well. Still, the low correlations support the interpretation of the performance values reported in Section 4.3, where we concluded that HPs can also cycle atypically without a direct link to energy efficiency and sizing. Finally, operating hours and number of cycles are positively correlated (0.25) and so are average cycle length and operating hours (0.60).
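A correlation matrix of this kind can be computed as in the sketch below. The type of correlation coefficient is not stated here, so the sketch assumes Pearson correlations. The function name and the handling of missing values (dropping days on which any measure is missing) are also assumptions.

```python
import numpy as np


def correlation_matrix(columns):
    """Pairwise Pearson correlations between daily measures.

    columns: dict mapping a measure name to a sequence of daily values;
    all sequences must have the same length and be aligned by day.
    Returns the list of names and the correlation matrix.
    """
    names = list(columns)
    data = np.array([columns[name] for name in names], dtype=float)

    # Assumption: a day is dropped if any measure is missing (NaN) on it.
    complete = ~np.isnan(data).any(axis=0)

    # np.corrcoef treats each row as one variable.
    return names, np.corrcoef(data[:, complete])
```

Entry `[i, j]` of the returned matrix is the correlation between `names[i]` and `names[j]`.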

Analyzing temperature curves
In Section 3.2, clusters of HPs are formed. Here, we use these to describe cycling behavior that is typical for each cluster. Since the features are static and categorical, correlations with daily cycling metrics cannot be computed as done previously. Therefore, we derive the temperature curves (see Section 3.8), but with different observations. Instead of calculating the medians for each HP separately, we calculate them using all observations of the HPs that are part of the same cluster. This way, the plots in Fig. 7 are obtained, where each point reflects the median of all observations of a population at a given temperature. In what follows, we explain our observations of typical cycling in all populations using this visualization. We restrict the descriptions to the temperature range of 0-12 °C (see Section 3.3). Note that daily energy intensity and utilization from the baseline models (Section 3.4) are also shown, but without normalization by degree day, as this would eliminate the temperature effect, which is of interest at this point.

Typical pump cycling
From Fig. 7, it can be seen that most of the measured variables behave almost linearly in the 0-12 °C temperature range. Only the number of cycles shows saturation effects, as the behavior is linear between 5 and 12 °C and almost constant between 0 and 5 °C. This also explains why in Table 3, which evaluates the fit of the linear regressions, the number of cycles has the smallest $R^2$ values. The hours of operation increase with decreasing outdoor temperatures (from about 4-7 h at 12 °C to 13-17 h at 0 °C). The number of cycles also increases, but to a lesser extent (from about 5 cycles at 12 °C to about 12 cycles at 0 °C). Also, the average cycle length increases with decreasing outdoor temperatures, while the ratio of cycles to operating hours behaves inversely and decreases. As a rule of thumb, it can be deduced that for days with an average outdoor temperature of 12 °C, the typical ratio of cycles to operating hours of an HP is about 1.7 to 2.0 and decreases linearly until it is 0.8 to 1.2 at an average temperature of 0 °C. Finally, the utilization also increases linearly with decreasing outdoor temperatures, starting at about 20% at 12 °C. A similar behavior can be observed for energy intensity.

Fig. 7. Visualization of HP cycling behavior, energy intensity and utilization of all 503 HPs over mean daily outdoor temperatures by different building and HP characteristics. The points represent the medians of all observations across each population, while the error bars are at 95% intervals. To not remove the temperature effect in the graph, the daily utilization and energy intensity values are not normalized by degree day as otherwise done in Section 3.4.
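The rule of thumb for the ratio of cycles to operating hours can be written as a small Python helper. The function name is hypothetical, and interpolating both ends of the typical band linearly between the values stated for 0 °C and 12 °C is an assumption based on the description above.

```python
def typical_ratio_band(mean_temp_c):
    """Typical range of cycles per operating hour for a mean daily temperature.

    Linearly interpolates between 0.8-1.2 at 0 degrees C and
    1.7-2.0 at 12 degrees C. Only defined for 0 <= mean_temp_c <= 12.
    """
    if not 0.0 <= mean_temp_c <= 12.0:
        raise ValueError("rule of thumb only covers 0 to 12 degrees Celsius")
    fraction = mean_temp_c / 12.0
    lower = 0.8 + fraction * (1.7 - 0.8)
    upper = 1.2 + fraction * (2.0 - 1.2)
    return lower, upper
```

For a day with a mean outdoor temperature of 6 °C, this yields a band of about 1.25 to 1.6 cycles per operating hour. A daily value well outside the band could serve as a first hint of atypical cycling, although the outlier detection of Section 3.9 remains the method proposed in this work.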

Influence of characteristics on cycling
Although in Fig. 7 the HPs behave very similarly across all populations, we describe below some notable differences we find between clusters.
Building Age: For almost all metrics, HPs behave similarly in buildings of different ages. However, since HPs are a technology mainly suited for well-insulated buildings, the clusters are quite unevenly populated with 309 new and 21 old buildings (see Table 1). We observe that energy intensity (i.e., total daily energy per heated floor area) behaves as expected. In general, newer buildings are better insulated, resulting in lower energy consumption. In this context, newer buildings have as little as half the energy intensity of older buildings. The fact that this difference is not more extreme is most likely because a building must have a certain level of insulation for an HP to even be considered as a heating technology. We therefore assume that a comparison of the cycling behavior of different HPs is possible largely independently of the building age.
Building Size: In terms of building size, the diagram shows that HPs in smaller buildings tend to have more cycles with fewer hours of operation at the expense of higher energy intensity. One possible reason for this behavior could be that smaller buildings tend to have oversized HPs, which typically have more but shorter cycles, as described in Section 2.1. This hypothesis cannot be confirmed by looking at the utilization, which shows almost no difference between buildings of different sizes. Another reason could be that larger buildings may use other additional energy sources for heating or domestic hot water production. However, this hypothesis cannot be confirmed with our data set either.
Heat Pump Type: Fig. 7 shows that ASHPs have more operating hours, fewer cycles, and a slightly longer average cycle length than GSHPs. It is known that ASHPs are less energy efficient than GSHPs [43]. Therefore, we conclude that ASHPs must operate more to compensate for the lower heat dissipation. However, we might have expected more cycles, since defrosting is required. Instead, the number of cycles of GSHPs below 5 °C even exceeds that observed for ASHPs. Nevertheless, the known overall efficiency is reflected in the energy intensity curves: the curve of GSHPs is below the curve of ASHPs.
Heat Pump Size: Larger HPs have more cycles with a shorter average cycle length compared to smaller HPs. As explained earlier in the context of building size, larger HPs may be slightly oversized, resulting in more and shorter cycles. Therefore, the striking difference in utilization is as expected. Small heat pumps are up to four times more utilized than large ones, especially at cold temperatures.

Contributions and benefits of our approach
The methods presented here are focused on practical applications to monitor real-world HPs in operation. We have demonstrated how common smart meter data in 15-min resolution can be used to derive key indicators about HP cycling and how HPs that cycle atypically can be identified. Additionally, we have shown that HPs which are conspicuous in terms of cycling behavior are often also outliers with respect to energy efficiency and appropriate sizing. However, while evaluating a heat pump's energy efficiency and sizing requires contextual information, its cycling behavior can serve as an indicator when the context is unknown. This makes our approach suitable for many applications in practice. Further, we designed our methods in such a way that they are robust to common difficulties in practice as listed below.
1. Data Availability: The approach takes advantage of the large amounts of smart meter data that become available through advanced metering infrastructure for residential buildings of all types. In particular, it also covers HPs without network connectivity. Our method is designed to be able to deal with missing data and different observation periods of different households. It only requires measurements of a few cold days.
2. Comprehensive View: By identifying heat pumps that are outliers in terms of energy efficiency, appropriate sizing, and cycling behavior, and comparing them against each other in these categories, the approach incorporates a comprehensive perspective on HP optimization.
3. Contextual Information: To properly evaluate performance and sizing of HPs, contextual information on the heated floor area and electrical power is required. Usually, this relevant additional information is unavailable. In this case, we show that the cycling behavior of an HP alone can serve as a stand-alone indicator to identify and prioritize critical systems. Therefore, it is also suitable for situations where the context is not known, e.g., due to privacy constraints.
4. Interpretability: The information derived from SMD (e.g., cycles per day) is easy to understand and can provide accurate feedback on HP behavior.
5. Benchmarking: Our method makes it easy to sort, compare and benchmark HPs by the metrics of interest. This makes it a versatile approach suitable for any type of campaign.
6. Expandability: The algorithm can be easily adapted by introducing additional key indicators of interest and treating them in the same way as the metrics already introduced. Furthermore, it can be easily adapted to handle other types of data, e.g., sensor data or different resolutions.

Limitations
This study is a first step towards investigating real heat pump cycling on a large scale. However, below we list some limitations. First, the analyses only cover single-family homes in Switzerland that are not equipped with a photovoltaic system, and the results could be different in other geographic regions or for other types of buildings. Second, this study focuses on HPs used for heating and does not further examine whether the HPs under investigation are used in hybrid applications, i.e., also for cooling. Third, it is not distinguished whether or not an HP is responsible for the production of domestic hot water, since this information is unavailable. Finally, the 15-minute resolution limits the scope of what can be observed. For example, two cycles that are shorter than 15 min but run consecutively would be observed as a single cycle. Despite these limitations, this study shows that SMD is a powerful source of data for monitoring HPs, especially given its already high and increasing availability. It may be impossible to obtain detailed information on cycling, and there may be errors in individual cycles, but since these problems are shared by the entire population, the overall trends of individual households can still be observed. These can provide valuable feedback to HP owners, utilities, and other stakeholders.
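The resolution limit can be reproduced with a small Python example. It is a deliberately simplified illustration: the helper names, the minute-level power signal, and the rule that an interval counts as "on" whenever its energy is above zero are assumptions and do not correspond to the cycle extraction procedure of this study.

```python
def interval_energy_kwh(power_kw_per_minute, minutes_per_interval=15):
    """Aggregate a minute-level power signal (kW) into interval energies (kWh)."""
    energies = []
    for start in range(0, len(power_kw_per_minute), minutes_per_interval):
        chunk = power_kw_per_minute[start:start + minutes_per_interval]
        energies.append(sum(chunk) / 60.0)  # kW * (1/60) h per minute
    return energies


def count_runs(flags):
    """Count contiguous runs of True values, i.e., observed cycles."""
    runs = 0
    previous = False
    for flag in flags:
        if flag and not previous:
            runs += 1
        previous = flag
    return runs


# Two 10-minute cycles at 3 kW, separated by a 5-minute pause.
power = [0.0] * 30
for minute in list(range(2, 12)) + list(range(17, 27)):
    power[minute] = 3.0

true_cycles = count_runs([p > 0 for p in power])
observed_cycles = count_runs([e > 0 for e in interval_energy_kwh(power)])
```

In this example, `true_cycles` is 2, whereas `observed_cycles` is 1, because both 15-min intervals contain energy and therefore appear as one continuous run.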

Future work
The results of this work are based on SMD, which measures HPs separately from other devices. A natural extension is to apply the methods presented to aggregate measurements, which requires an additional step of disaggregating the HP load from the total load. Conversely, future work on non-intrusive load monitoring focusing on HP applications may incorporate the typical HP behavior identified in this study as prior information. Additionally, it remains an open task to investigate the impact of daily temperature range on HP cycling. We use only the average outdoor temperature of a day, but do not consider the difference between the maximum and minimum temperature of a day. However, the most important remaining part of the work to be addressed by the research community is to investigate the reasons for differences in cycling behavior. There is a lack of work explaining under which conditions short and long cycling occurs and how it can be optimized. Here, we see a limitation in the use of SMD, but believe that sensor data with higher resolution and more parameters can help. Additionally, more contextual information about the underlying buildings and system components is needed.

Conclusion
In this study, we aim to contribute to a longer lifetime and higher energy efficiency of heat pumps (HPs) in residential buildings. To this end, we use smart meter data (SMD) with a resolution of 15 min to monitor 503 HPs installed in Swiss single-family homes over a period of 21 months. We show how heating cycles can be extracted from the corresponding time series of energy measurements. Then, we use the identified cycles to calculate the following daily key indicators for each HP: operating hours, number of cycles, ratio of cycles to operating hours, and average cycle length. When grouping the daily indicators of each HP by average daily outdoor temperatures and calculating the median for each temperature, it can be found that the behavior is nearly linear in the temperature range of 0-12 °C. Therefore, we calculate a linear regression for each HP and indicator. Using the derived slopes and intercepts as inputs to an outlier detection algorithm, we identify conspicuous HPs that behave atypically and differently from the overall population. In addition to this approach, which does not require contextual information, we also evaluate the energy efficiency and appropriate size of each HP. However, this requires additional information on heated floor area and electrical power, which is typically unavailable and, here, is only available for 243 of the 503 HPs (i.e., 48%). To evaluate the approach, we apply it to this subset and compare the HPs that cycle atypically to the best and worst 10% of the HPs in terms of energy intensity and utilization. The resulting Cohen's Kappas of 0.40 and 0.37, a common measure to assess interrater reliability, indicate a fair agreement between cycling, energy intensity, and utilization [80]. Therefore, the approach to assess cycling is also suitable for identifying conspicuous HPs in situations where the context is unknown, e.g., due to privacy constraints or missing data. In addition to identifying conspicuous HPs, extensive descriptive analyses of cycling across all 503 HPs are provided considering building age, building size, HP type, and HP size. While this work is the first to monitor real-world HP cycling with SMD on a large scale, our analyses are limited to the geographic conditions and heating scenarios of Switzerland. Future work could focus on using HP sensor data with higher resolution and with more parameters to verify the results and investigate the reasons for differences in cycling behavior.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Tobias Brudermueller reports financial support was provided by Swiss Federal Office of Energy.

A.1.4. Step 4: Calculating durations of individual cycles
In the following, we treat $T^{\text{on}}$ and $T^{\text{off}}$ as ordered sets with ascending values, where each element receives an index $i$ according to its position. Hence, with $n = |T^{\text{on}}| = |T^{\text{off}}|$ elements in each set, we can rewrite:

$$T^{\text{on}} = \{\, t^{\text{on}}_i \mid 0 \le i < n \,\} \quad (10)$$

$$T^{\text{off}} = \{\, t^{\text{off}}_i \mid 0 \le i < n \,\} \quad (11)$$

Now, we can interpret $i$ as the index referring to a single cycle and $n$ as the total number of observed cycles. We define a single cycle to have a switch-on timestep $t^{\text{on}}_i$, a switch-off timestep $t^{\text{off}}_i$ and a cycle duration in hours $d_i$. Assuming that a cycle covers all $t$ within the interval $[t^{\text{on}}_i, t^{\text{off}}_i]$, the set of cycles $C^{\text{cycle}}$ is given by Eq. (12). The easiest way to define $d_i$ would be to count each $t$ within the interval $[t^{\text{on}}_i, t^{\text{off}}_i]$ as full 15 min, as formalized in Eq. (13). However, as explained in Section 3.6, this would lead to overestimates because an HP can switch on or off at any time within a 15-minute interval. Therefore, we assume a constant electricity uptake between an on-transient and the subsequent measurement, or an off-transient and the previous measurement, to calculate the exact durations at $t^{\text{on}}_i$ and $t^{\text{off}}_i$. We count all other measurements of a cycle, $t^{\text{on}}_i < t < t^{\text{off}}_i$, as full 15 min. Hence, instead of using Eq. (13), we define $d_i$ to be "more exact" in Eq. (14). Note that the definitions above handle some special cases where calculating exact durations is difficult:
• If $t^{\text{on}}_i$ equals $t^{\text{off}}_i$, a cycle is shorter than 15 min and the exact duration cannot be determined. Then, we count the cycle duration as half a 15-min interval, i.e., 0.125 h.
• If an on-transient is immediately followed by an off-transient, the cycle duration is between 15 and 30 min and cannot be well estimated. However, we treat this case as any other cycle with longer durations by considering the proportions of the two consecutive energy measurements.
• If the energy of the on-transient is higher than the energy of the consecutive measurement, or the energy of the off-transient is higher than the energy of the previous measurement, we cannot determine the exact duration of a switch-on or switch-off activity. In this case, we count it as full 15 min.
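The duration rules described above can be sketched in Python as follows. This is one possible reading of the textual description, not a reproduction of the equations of this appendix. It assumes that timesteps are integer indices into a list of 15-min energy values and that the switched-on share of a transient interval equals the ratio of its energy to that of the neighboring measurement. The second function aggregates durations into the four daily key indicators named in the Conclusion, using definitions that are likewise assumed: ratio as cycles divided by operating hours, and average cycle length as operating hours divided by cycles. All names are hypothetical.

```python
INTERVAL_HOURS = 0.25  # one 15-min measurement interval


def cycle_duration(energy, t_on, t_off):
    """Estimate the duration (in hours) of one cycle.

    energy: sequence of 15-min energy values (kWh), indexed by timestep.
    t_on, t_off: integer indices of the switch-on and switch-off timesteps.
    """
    if t_on == t_off:
        # Cycle shorter than one interval: count half an interval.
        return INTERVAL_HOURS / 2

    def edge_share(edge_energy, neighbor_energy):
        # Share of the transient interval during which the HP was running,
        # assuming it drew the neighbor's (constant) power while on.
        if neighbor_energy <= 0 or edge_energy >= neighbor_energy:
            return 1.0  # share cannot be determined: count the full interval
        return edge_energy / neighbor_energy

    start_share = edge_share(energy[t_on], energy[t_on + 1])
    end_share = edge_share(energy[t_off], energy[t_off - 1])
    full_intervals = t_off - t_on - 1  # all measurements strictly in between
    return INTERVAL_HOURS * (start_share + full_intervals + end_share)


def daily_metrics(durations):
    """Aggregate the cycle durations of one day into daily key indicators."""
    num_cycles = len(durations)
    operating_hours = sum(durations)
    return {
        "operating_hours": operating_hours,
        "num_cycles": num_cycles,
        "cycles_per_operating_hour": (
            num_cycles / operating_hours if operating_hours else 0.0
        ),
        "avg_cycle_length": operating_hours / num_cycles if num_cycles else 0.0,
    }
```

Under these assumptions, an on-transient that is immediately followed by an off-transient always yields a duration between 0.25 h and 0.5 h, because the interval with the higher energy is counted in full and the other one proportionally. This is consistent with the 15 to 30 min stated in the second special case.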

A.2. Calculating daily metrics as key indicators
In Section 3.7, we explain that we calculate daily metrics to match them with average outdoor temperatures. To this end, we apply steps 1-4 of the previous procedure to each day of SMD individually, such that we obtain a set of cycles $C^{\text{cycle}}$ per day. Then, we can calculate the daily metrics as follows: Operating

Fig. 1. Example of the different steps of extracting cycles from smart meter data. For better understanding, in the first graph the energy measurements (in kWh) are converted into power (in kW) by multiplying them by four.

Fig. 2. Example of daily operating hours observed for a single HP. The graph shows the observations grouped by the average daily outdoor temperature to obtain distributions, their medians, and a linear regression on the medians. The latter is limited to the temperature range of 0-12 °C (see explanations in Section 3.3).

Fig. 4. Examples of HPs that the algorithm determines as typical or atypical with respect to operating hours and cycles. The visualizations show original smart meter data in 15-min resolution but transformed to power in kW (multiplication by four), and heat maps are used to display the cycling patterns over time.

Fig. 5. Confusion matrices with absolute and relative scores. The matrices compare the conspicuous HPs identified by our method with the best and worst 10% of HPs in terms of utilization and energy intensity as given by the baseline models.

Fig. 6. Correlations of cycling metrics, energy intensity, utilization, and outdoor temperature at daily resolution for all HPs.

Table 1
Characteristics of the clusters of households when applying K-Means clustering to each of the metadata variables individually.

Table 2
Description of the metrics that serve as key indicators for daily cycling behavior.

Table 3
Statistics describing the distributions of $R^2$ values of the linear regressions on the temperature curves as derived in Section 3.8 and used in Section 3.9. The statistics are reported separately for each cycling metric across the 243 HPs under evaluation. They therefore indicate how well a metric can be linearly approximated in the 0-12 °C temperature range.
Columns: Metric, Median $R^2$, Mean $R^2$, Standard deviation of $R^2$.

Table 5
Performance scores of outlier detection by cycling behavior when compared to baseline models evaluating energy efficiency and appropriate sizing but requiring contextual information.