Does data visualization affect users’ understanding of electricity consumption?

ABSTRACT Different data visualizations are investigated for how they enable occupants to learn about domestic energy consumption. Smart metering can potentially encourage householders to change their behaviour and save energy. However, concerns exist about whether users understand domestic energy feedback. Two challenges are addressed: feedback displays typically show aggregate consumption and they show time-series data visualizations, which are difficult to relate to everyday actions in the household. A laboratory experiment (N = 43) assessed changes in participants’ knowledge of how much electricity everyday actions consume after being exposed to different forms of energy-consumption data visualizations: (1) an aggregated time-series line graph, (2) a disaggregated time-series line graph and (3) a normalized disaggregated visualization that deemphasized time. Participants played an energy game both before and after they saw the simulation. Participants in condition (3) were more accurate and more confident in their post-test judgments about everyday domestic electricity consumption than other participants. These findings suggest that the type of data visualization affects users’ understanding of domestic electricity consumption. The visualization of disaggregated energy feedback at the appliance level should be considered for future generations of technology.


Introduction
Prior research suggests that people have a poor understanding of how much electricity domestic appliances in their home use (Chisik, 2011; Kempton & Montgomery, 1982; Mettler-Meibom & Wichmann, 1982). Attari, DeKay, Davidson, and De Bruin (2010) asked participants in an online survey to indicate the most effective thing they could do to conserve energy and to estimate the energy used by nine household appliances. They found that participants were unreasonably biased towards curtailment strategies (i.e. using appliances less) rather than replacing inefficient appliances, and that they were unaware of large energy differences across appliances and activities. Due to availability heuristics, participants systematically overestimated energy use for low-energy activities that are very salient (such as overestimating the energy consumption of a light bulb) and underestimated energy use for high-energy activities. Overall, the authors found a moderate positive correlation between participants' estimates and actual energy consumption of the appliances; however, they concluded it was too weak to support sound decision-making. In the hope of enabling such sound decision-making, there is a general push to provide homes with smart meters with in-home displays (IHDs) that show real-time and historic energy consumption. The smart meter IHD should be designed to 'enable the information displayed on it to be easily accessed and presented in a form that is clear and easy to understand' (DECC, 2014, p. 97). However, it has been questioned how clear and easy it is to understand domestic energy data on many current-generation smart meter IHDs (Roberts & Baker, 2003; Wever, van Kuijk, & Boks, 2008).
One potential issue is that smart meter IHDs typically report aggregate energy consumption across all appliances in the home rather than providing separate, disaggregated consumption data for different appliances (Froehlich et al., 2011). Separating out, or disaggregating, electricity consumption for different household appliances presents a significant technical challenge. Recent research has shown promising progress using a computational approach of non-intrusive load monitoring (NILM) to differentiate between the electrical signatures of various appliances (Armel, Gupta, Shrimali, & Albert, 2013; Batra et al., 2014; Gonçalves, Ocneanu, Bergés, & Fan, 2011; Reinhardt et al., 2012). Assuming these technical challenges can one day be met, there is a lingering issue concerning how people should be shown disaggregated home electricity data. Will they be able to make sense of it? Will it prompt changes in consumption behaviour?
A second potential issue is that many IHDs use time-series data visualizations. These graphs usually show time on the x-axis and power usage on the y-axis, meaning they show changes in consumption over time rather than summaries of consumption (Costanza, Ramchurn, & Jennings, 2012). The time scale (or temporal granularity) often varies, ranging from minutes to days. A benefit of this approach is that it readily shows the peaks of a household's electricity consumption. For example, it might show consumption peaks in the early morning and throughout the evening for a home in which the residents are out at work and school during the day and sleeping at night. Yet, how useful is it for a household to know that their electricity consumption peaks during the early morning and the evening? How does this information help them to make sense of which appliances are consuming the most electricity and what steps might be taken to reduce consumption? Understanding electricity consumption from time-series data requires understanding the concept of power consumed over time, which is a difficult cognitive task for most people (Kidd & Williams, 2008).
The aim of this paper is to investigate how different data visualizations of residential electricity data enable users to learn about the consumption of everyday activities (e.g. making a cup of tea or running the dishwasher). The paper reports the results of a laboratory experiment that assessed changes in participants' knowledge of how much energy different everyday domestic appliances use after being exposed to different forms of energy consumption data visualizations. To assess this, participants saw a simulated pattern of domestic appliance use. The electricity used by this simulated pattern of appliance usage was fed back to participants using one of three different data visualizations (Figure 1): (1) an aggregated time-series line graph, (2) a disaggregated time-series line graph and (3) a novel disaggregated visualization that deemphasized time by showing the total electricity consumed for a standard usage cycle of the appliance (i.e. a single boil of the kettle or a single run of the dishwasher). This final visualization is somewhat similar to other area-based visualizations that have been used in previous research, e.g. in the study by Costanza et al. (2012), which also highlight the total electricity consumed by various domestic appliances. The central question of concern is whether participants draw different conclusions about electricity consumption depending on which of the data visualizations they are exposed to.
The contribution of this research is an empirical assessment of whether the choice of data visualization has an impact on people's understanding of how much energy everyday domestic appliances use. Before describing the method and results of the experiment in more detail, a review of important related research is presented: the three most relevant fields of work are disaggregation, data visualization and objective measures of energy literacy.

Disaggregation
There is a growing body of research that has investigated how people make sense of their domestic electricity consumption. Prior research has shown that people naturally think of their energy consumption in terms of everyday activities and actions (Álvarez & Vega, 2009; Darby, 2001; Stankovic, Stankovic, Liao, & Wilson, 2016). For example, people will think of making a cup of tea rather than using electricity for the purpose of boiling water in the kettle (Entwistle, Rasmussen, Verdezoto, Brewer, & Andersen, 2015; Rego Teixeira, 2014). This suggests that people need information to be provided at key decision points in activities and that feedback is more actionable for householders when provided on the appliance level (Chetty, Tran, & Grinter, 2008; Darby, 2001; Froehlich et al., 2011; Kelly & Knottenbelt, 2015; Yun et al., 2010).
However, a recent review of the literature shows that behavioural research has failed to provide compelling empirical evidence that disaggregated feedback is superior to aggregated feedback (Kelly, 2016). The lack of empirical evidence does not convincingly show that disaggregation has no advantage, as the studies suffer from methodological biases of varying degrees. For example, in Sokoloski (2015), one group received disaggregated feedback through a website, while another group received aggregated feedback on a smart meter IHD. This means that the sensing, processing and visualization of the data were based on completely different sources and tools, which confounds the results of the study.
In Schwartz, Denef, Stevens, Ramirez, and Wulf (2013), people were only able to ascribe meaning to visualized data and establish relations between energy consumption and specific activities thanks to their knowledge of what they typically do in the home. Equally, Herrmann, Brumby, and Oreszczyn (forthcoming) interviewed several UK households on their residential electricity usage. In the study, households recorded their domestic electricity consumption using a power clamp meter and inspected these data using a web-based tool that showed aggregated consumption using a time-series line graph data visualization. Interviews revealed that participants often explained peaks in electricity consumption with reference to time of day and everyday activities that they could recall happening at that time. Herrmann et al. (forthcoming) found that participants tried to explain the data patterns, even if they could not decode unambiguous information. They drew from memory and routines they had, but failed to reliably identify appliances or activities in the data visualization. When asked how the feedback could be improved, disaggregation was the main theme that emerged across all participants' ideas. The results suggest that it is hard for users to comprehend aggregated energy feedback because it does not relate to their everyday lives. Therefore, disaggregation must be crucial, as comprehension of the feedback is a necessary precondition for behaviour change.

Data visualization
It is known that users' energy data comprehension greatly depends on the design of the interface (Chiang, Natarajan, & Walker, 2012; Wever et al., 2008). Yet, the manner of presentation of the feedback information to consumers is a core consideration, which has been much overlooked in the literature: Roberts and Baker (2003) argue that there is value in thinking about the kind of information that would help inform domestic consumers more about their energy use, and how such information would best be presented to consumers to maximize opportunities for improved understanding and behaviour change.
Line graphs are common representations in residential energy feedback. Horizontal time-series graphs are popular because the visual system is practised in detecting deviations from the horizon (Tufte, 1983). However, a simple line graph can only represent one measure and will miss information due to aggregation (Loorak, Perin, Kamal, Hill, & Carpendale, 2016). Further, Costanza (personal communication) found that time series are useful in a context with experts, but they are difficult for the untrained user. In FigureEnergy (Costanza et al., 2012), users get to tag peaks of consumption in a time-series interface, but then the software sums up the usage per event and feeds it back to the user in boxes, where the size (i.e. the area) of each box is proportional to the consumption. The purpose of this simplified visualization is to make comparisons between events easier for the user.
One suggestion that emerged from the interview study conducted by Herrmann et al. (forthcoming) is that people would greatly value stacked or superimposed line graphs in domestic electricity data visualizations so that they can code the usage of the separate appliances. Such graphs contain several layers or patterns on a shared timeline, adding up to a cohesive chart while still encoding the specific attributes of the data with values encoded on the vertical axis.

Measurement of energy literacy
Brewer (2013) has argued that energy literacy involves the understanding of energy concepts necessary to make informed decisions on energy use at both individual and societal levels. Besides content knowledge, other definitions of energy literacy involve behavioural and affective characteristics such as attitudes and values (DeWaters & Powers, 2011). The focus of the present study is on the cognitive characteristics: if users do not understand the information, they cannot change even if they want to (Mettler-Meibom & Wichmann, 1982). Attitudes, e.g. the belief that it is important to reduce energy consumption to mitigate climate change, and actual behaviour change are pivotal aspects in researching people's energy consumption. Literacy as a concept is defined as knowledge, or a competence in a certain area, such as writing and reading. However, for the operationalization and results of this study, this is reframed as participants' understanding of how much electricity typical appliances in the household consume. The measured construct is not 'energy literacy' the way the term has been defined and used in the literature. Below, the relevant measures for quantifying energy-related understanding are reviewed.
Chiang et al. (2012) used a change detection task: participants were presented with a reference image that showed a display with five components representing five states of energy consumption. Participants were to look at the information and memorize it. After seven seconds, the reference image would be replaced with the test image. The test image, too, showed five components representing five states of energy consumption. The task was to indicate if any of the components had changed, i.e. if the level of consumption had gone up or down. Dependent measures were response accuracy and response time. While accuracy and response time are valid measures, the authors themselves point out that the change detection task itself is very specific. The experiment lacks ecological validity, since information changes unpredictably in a real-world setting and users would not be focused on detecting such changes.
Semantic questionnaires are another tool that has been used to measure data comprehension. Peebles, Ramduny-Ellis, Ellis, and Bonner (2013) investigated how people interpret unfamiliar diagrams. In a laboratory experiment, they asked participants to think aloud as they attempted to comprehend the presented diagrams. To quantify participants' understanding, they were asked to identify and describe 'something interesting' about the given dataset. Further, a set of 21 questions of varying difficulty was produced to test participants' ability to find specific information in the diagram.
Galesic and Garcia-Retamero (2011) use three levels of comprehension questions to measure graph literacy: reading the data (i.e. being able to find specific information in the graph), reading between the data (i.e. making connections), and reading beyond the data (i.e. being able to extrapolate). However, semantic comprehension or literacy questions have to be tailored to the experimental context and are not standardized or validated. Moreover, energy literacy is very broadly defined and we are looking specifically at the understanding of how much electricity everyday actions in the household consume.
One of the challenges for research is to be able to assess people's understanding of how much electricity different everyday domestic appliances use. A playful approach to this problem is the ENLITEN energy game (see http://www.cs.bath.ac.uk/enliten/), which is part of a multidisciplinary collaborative research project between the universities of Bath and Oxford (Lovett, Gabe-Thomas, Natarajan, O'Neill, & Padget, 2013). When playing the game, participants are shown two appliances at a time and asked to click on the appliance they believe typically consumes more energy over a given period of time. Using this approach, it is possible to assess participants' response accuracy: how often do they correctly identify the appliance that uses more energy? This approach of having people make pair-wise decisions between appliances offers a useful and practical method for assessing understanding, especially as people often struggle to use and explain consumption using more formal units of measurement (e.g. kilowatt-hour). Yun et al. (2010) have taken a similar approach. In their study, they had people rank appliances according to their energy consumption. Yun et al. used this ranking approach before and after an experimental manipulation to determine the effect of the manipulation between conditions on participants' understanding. Anderson and White (2009), too, successfully employed a power-rating quiz of different household appliances. Overall, it can be seen that a variety of approaches have been taken to assess people's understanding of domestic electricity consumption. In the current experiment we use a modified version of the ENLITEN energy game.

Purpose of the current study
Many field studies have been conducted to explore how feedback on domestic energy data affects energy usage and saving behaviours in the household (e.g. Sokoloski, 2015; Van Dam, Bakker, & Van Hal, 2012). However, there are far fewer experimental studies that have been carried out on how people read and make sense of domestic energy data visualizations (e.g. McCalley & Midden, 2002; Yun et al., 2010). For the success of smart home technologies, examining the cognitive sense-making process and the suitability of graphic feedback is highly relevant when confronting users with domestic energy data. The present authors have identified a need for direct comparisons between aggregated and disaggregated feedback where potentially confounding variables are strictly controlled. To grant insight into these sense-making processes, an experimental task is deployed that allows for valid quantification of participants' comprehension of the data.
The primary research question of concern in this paper is how different data visualizations affect people's understanding of domestic electricity consumption. The energy data came from the UK-DALE dataset (Kelly & Knottenbelt, 2015) that logged consumption over time for a variety of everyday domestic appliances. Three different but comparable data visualizations are tested. The first data visualization shows aggregated time-series data. The second data visualization shows disaggregated time-series data. The third data visualization shows disaggregated data and deemphasizes time by showing the total electricity consumed for a standard usage cycle of the appliance (i.e. a single boil of the kettle or a single run of the dishwasher). To assess learning, participants were asked to make a series of decisions about which of two everyday activities consumed more electricity. Their response accuracy, response confidence and response time were assessed. These measures were taken both before and after participants were exposed to one of the three different data visualizations, hence allowing for an analysis of a pre-test versus post-test understanding of domestic electricity consumption.
It is expected that the kind of data visualization that a participant is exposed to will affect their level of understanding of how much energy everyday domestic appliances use. Based on the findings of Costanza et al. (2012) and Herrmann et al. (forthcoming), the general expectation is that participants who see the normalized data will perform best (i.e. achieve higher accuracy scores, higher response confidence and shorter response times in the post-test). The cognitive effort in decoding the information is lowest in this condition, as there is no need to estimate the area under the curve. Instead, similar to the boxes in Costanza et al., the usage is summarized in a simpler area shape. Further, based on the research of Darby (2001), Chetty et al. (2008) and Schwartz et al. (2013), we assume that participants who see disaggregated time-series data perform better than participants who see aggregate time-series data.

Method
Participants
A total of 43 participants (12 male) were recruited through the University College London (UCL) Psychology Subject Pool. Ten participants were aged between 18 and 20 years, 31 were between 21 and 35 years, and two were 36 years or older. All were adults with normal or corrected to normal vision who were accustomed to reading from left to right and who pay their utility bills (or do so with the help of their partners or fellow tenants). Participants received course credit or a small payment for taking part in the study.

Materials
The experiment was designed to see whether participants' assessment of electricity consumption of common household appliances is affected by the design of the energy data visualization they use. Three energy data visualizations were used: a line graph with a single aggregated data line (representing total energy usage across multiple appliances), a line graph with multiple disaggregated data lines (representing energy usage for each of the individual appliances), and a novel disaggregated graph that has been normalized over time (representing the total energy usage of an appliance over a single usage of that appliance). In the course of this paper, we refer to the three conditions as aggregated, disaggregated and normalized.
Both line graphs show time-series data. Duration of usage is represented on the x-axis as time (minutes) and electricity consumption is represented on the y-axis as power (Watts). Figure 1(a) is a line graph that shows how the aggregated power consumption of three different appliances (a kettle, a vacuum cleaner and a dishwasher) varies over time. In contrast, Figure 1(b) shows the same data, but here the power consumption of these three appliances is represented as different coloured data lines. The intention of the disaggregated line graph is to make it easier for the user to distinguish how the power consumption of each appliance varies over time throughout a period of usage.
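The relationship between the two line graphs can be made concrete with a short sketch. Assuming, purely for illustration, per-minute power readings for the three appliances, the aggregated line of Figure 1(a) is the minute-by-minute sum of the disaggregated lines of Figure 1(b). The readings below are hypothetical and are not drawn from UK-DALE; the function name `aggregate` is likewise an illustrative choice.

```python
# Hypothetical per-minute power readings in watts (NOT taken from UK-DALE).
disaggregated = {
    "kettle":         [0, 2000, 2000, 0, 0, 0],
    "vacuum cleaner": [0, 0, 1200, 1200, 1200, 0],
    "dishwasher":     [100, 100, 1800, 1800, 100, 100],
}


def aggregate(series_by_appliance):
    """Sum the per-appliance power readings minute by minute."""
    return [sum(readings) for readings in zip(*series_by_appliance.values())]


aggregated = aggregate(disaggregated)
print(aggregated)  # [100, 2100, 5000, 3000, 1300, 100]
```

In the aggregated series, the peak of 5000 W in the third minute cannot be attributed to any single appliance without further information, which is the difficulty the disaggregated graph is intended to remove.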
A novel visualization was developed, which is shown in Figure 1(c). A critical issue with the line graphs is that appliances run for different periods of time. For example, a kettle will run for a short period, using a lot of power per unit of time, whereas a dishwasher will run for a much longer period, using less power per unit of time. When using a line graph visualization that shows energy usage as a function of time, it is difficult for users to determine the cumulative energy usage of a given appliance over time, potentially making it difficult to determine which appliance uses more cumulative energy over a standard usage cycle. The normalized visualization we developed attempts to alleviate this problem by showing cumulative consumption over a single usage of the appliance. This allows the user to see readily which of the appliances is using more energy over a standard usage cycle.
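The underlying difficulty is that cumulative energy corresponds to the area under the power curve. The following minimal sketch, assuming evenly spaced power samples and hypothetical constant power draws (not UK-DALE values), approximates that area as the sum of power multiplied by duration. It illustrates the general idea of a per-cycle total and is not necessarily the procedure used to produce Figure 1(c); the function name `energy_wh` is an assumption made for this example.

```python
def energy_wh(power_watts, step_minutes=1.0):
    """Approximate energy in watt-hours from evenly spaced power samples.

    Each sample is treated as constant over one time step, so the area
    under the curve is sum(power) * step, converted from minutes to hours.
    """
    return sum(power_watts) * step_minutes / 60.0


# Hypothetical usage cycles (illustrative values only).
kettle = [2000.0] * 3        # 3 minutes at 2000 W
dishwasher = [600.0] * 90    # 90 minutes at 600 W

print(energy_wh(kettle))      # 100.0
print(energy_wh(dishwasher))  # 900.0
```

Under these assumed values the kettle draws more than three times the power of the dishwasher at any moment, yet the dishwasher uses nine times the energy over a full cycle. A line graph requires the reader to estimate this by eye, whereas a per-cycle total states it directly.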
To assess participants' judgment of electricity consumption, an energy game was deployed. This was a two-alternative forced-choice task (Figure 2). In this energy game, participants had to indicate which of two appliances consumes more electricity during a standard usage cycle, e.g. making coffee or running the dishwasher. For each pairwise comparison we recorded response accuracy and response time in seconds. In addition, response confidence was assessed by asking participants how confident they were about their decision on a scale from 1 to 5 (1 being low confidence, 5 being high confidence). The pairwise comparison task and the icons that participants click to indicate their answer were based on the ENLITEN energy game (see http://www.cs.bath.ac.uk/enliten/).
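As an illustration of how such trials might be recorded and scored, the sketch below stores each pairwise comparison with its response, confidence rating and response time, and marks a response as correct when the chosen appliance has the higher energy use per cycle. The `Trial` structure, the function names and the energy values are all hypothetical and do not reproduce the ENLITEN game or the software used in the experiment.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    left: str
    right: str
    chosen: str
    confidence: int          # 1 (low confidence) to 5 (high confidence)
    response_time_s: float


# Hypothetical energy per standard usage cycle in Wh (illustrative only).
ENERGY_WH = {"kettle": 100.0, "dishwasher": 900.0, "radio": 5.0}


def is_correct(trial, energy=ENERGY_WH):
    """True if the chosen appliance is the one that uses more energy."""
    higher = max((trial.left, trial.right), key=lambda name: energy[name])
    return trial.chosen == higher


def accuracy_percent(trials, energy=ENERGY_WH):
    """Percentage of trials answered correctly."""
    return 100.0 * sum(is_correct(t, energy) for t in trials) / len(trials)


trials = [
    Trial("kettle", "dishwasher", "dishwasher", 4, 6.2),
    Trial("radio", "kettle", "radio", 2, 9.1),        # incorrect choice
    Trial("dishwasher", "radio", "dishwasher", 5, 3.4),
    Trial("kettle", "radio", "kettle", 4, 5.0),
]

print(accuracy_percent(trials))  # 75.0
```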
For making both the energy visualizations and the pair-wise comparison in the energy game, we used the same set of nine common household appliances: radio, lamp, microwave, toaster, kettle, coffeemaker, vacuum cleaner, washing machine and dishwasher. To model the energy consumption of these appliances, we used data from the UK-DALE dataset (UK domestic appliance-level electricity; Kelly & Knottenbelt, 2015). Specific data are drawn from house 1 in the dataset (a London end-of-terrace house, built c.1905). For each appliance, we identified the typical duration of use and the power usage over time. All materials were presented on a 27-inch iMac (2560 × 1440, graphics: ATI Radeon HD 4850 512 MB).

Research design
The experiment used a single-factor between-subjects design in which the independent variable was the graphic representation of the electricity data feedback (Figure 1). The dependent measure was participants' knowledge about the electricity used to perform a typical behaviour in the household (e.g. making coffee or running the dishwasher). The change in knowledge for the nine appliances from pre- to post-test was measured by response accuracy, response confidence (on a scale from 1 to 5) and response time (seconds) in the energy game (Figure 2).

Procedure
Participants were informed they would be taking part in a study about domestic energy usage. Participants were randomly assigned to one of the three visualization conditions. Participants completed the study in a small private office with a desktop computer placed on a table. The office was quiet and free from external interruptions and distractions. After arriving at the laboratory, participants were first asked to complete a simple questionnaire to gather basic demographic information.
The experiment was separated into three stages: a pre-test assessment of domestic energy usage understanding using the energy game, a period of exposure to energy usage visualizations, and a post-test assessment of domestic energy usage again using the energy game. In the energy game, participants made a series of 36 two-alternative forced choices. The 36 comparisons paired each of the nine appliances with every other appliance. As described above, participants' response accuracy, response time and decision confidence were recorded. In general, the energy used by the different appliances fell into several categories. The dishwasher, washing machine and vacuum cleaner were relatively high-energy consumption appliances. The light and radio were relatively low-energy consumption appliances, while the microwave, toaster, coffee maker and kettle were in between. This meant that some comparisons were relatively easy (e.g. dishwasher versus light) and others were more difficult (e.g. dishwasher versus washing machine). This range in difficulty meant that participants would have a range in decision accuracy and confidence. The focus here was to assess changes in participants' decisions between the different visualization conditions.
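The figure of 36 comparisons follows from pairing nine appliances without repetition, since 9 × 8 / 2 = 36. A brief sketch of how such a set of pairs could be enumerated is given below; the variable names are illustrative, and the order in which pairs were actually presented to participants is not specified here.

```python
from itertools import combinations

# The nine appliances named in the Materials section.
APPLIANCES = [
    "radio", "lamp", "microwave", "toaster", "kettle",
    "coffeemaker", "vacuum cleaner", "washing machine", "dishwasher",
]

# Every unordered pair of distinct appliances.
pairs = list(combinations(APPLIANCES, 2))

print(len(pairs))  # 36
print(pairs[0])    # ('radio', 'lamp')
```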
For the middle part of the experiment, participants saw a simulated pattern of appliance usage and were given feedback about the associated energy usage through the visualization (which varied depending on which condition the participants were assigned to). Each participant saw the same simulated pattern of appliance use, which was divided into 30 frames. A summary of this simulation is given in Table 1. As can be seen, the simulation was designed to give periods in which different appliances were being used, sometimes together, sometimes in isolation. The idea was to give a complex and rich pattern that mimicked domestic appliance use. Participants were free to look at each frame of the simulation for as long as they wanted to, proceeding through the experiment by clicking the continue button. For each given frame of the simulation, the nine household appliances were shown on the left side of the screen. Different combinations of appliances would be switched 'on' and 'off'. Figure 3 gives an example frame in which the dishwasher is 'on' (frame 11), represented by a green background colour change, while all other appliances are 'off'. On the right side of the screen, the data visualization shows the associated energy usage for the appliances that are 'on' in the current frame. The visualization used is dependent on which condition the participant was assigned to. Once they had finished the simulation, participants completed the energy game again as a post-test.
After completing the main part of the experiment, the first author conducted a brief interview with participants to explore how they made their decisions in the energy game and how they made sense of the data visualizations in the simulation. Two open-ended questions were asked and participants had the opportunity to add any further comments:
• How did you estimate in the pre-test of the energy game which action consumes more electricity?
• How did you make sense of the data feedback in the simulation?

Quantitative data
Participants' decisions in the energy game were considered with a focus on their response accuracy (i.e. the proportion of correct decisions out of the 36 pair-wise comparisons in percentages), response time (seconds), and decision confidence estimates (on a scale from 1 to 5, where 1 is low confidence and 5 is high confidence) between the different visualization conditions. For statistical analyses a between-subjects analysis of variance (ANOVA) with a significance level of 0.05 was used for judging the significance of effects. A first check was undertaken to ascertain whether there were any differences in pre-test performance on the energy game between participants assigned to each of the different data visualization conditions. Mean response accuracy was consistent between the different conditions (mean = 77.97%, standard deviation (SD) = 8.56%, mean = 73.22%, SD = 7.03%, and mean = 76.78%, SD = 10.97% for aggregated, disaggregated and normalized respectively). A similar pattern was found for both response confidence (mean = 3.69, SD = 0.51, mean = 3.68, SD = 0.54, and mean = 3.74, SD = 0.52 for aggregated, disaggregated and normalized respectively), and response time (mean = 8 s, SD = 1.9 s, mean = 9.41 s, SD = 3.03 s, and mean = 8.06 s, SD = 1.93 s for aggregated, disaggregated and normalized respectively). Statistical analysis showed that there was no significant effect of assigned condition on any of these pre-test measures of performance on the energy game (all F-tests have p > .05). Given that there is no difference in the base level of knowledge of domestic electricity consumption between participants, the second consideration was whether there were any differences in post-test performance on the energy game after participants were exposed to different data visualizations.
Results show that response accuracy in the post-test stage of the energy game was significantly higher in the normalized condition (mean = 93.86%, SD = 6.08%) than in the aggregated condition (mean = 86.86%, SD = 9.47%) or in the disaggregated condition (mean = 89.69%, SD = 4.67%). Statistical analysis revealed a significant effect of visualization condition on response accuracy, F(1, 41) = 7.14, p < .01. A similar pattern of results was found for response confidence. Participants had higher confidence in their responses in the normalized condition (mean = 4.69, SD = 0.22) than in the aggregated condition (mean = 4.37, SD = 0.34) or in the disaggregated condition (mean = 4.49, SD = 0.34), and there was a significant effect of visualization on response confidence, F(1, 41) = 8, p < .01. While participants were marginally faster at giving responses in the normalized condition (mean = 5.63 s, SD = 0.79 s) than in the aggregated condition (mean = 6.24 s, SD = 1.06 s) or in the disaggregated condition (mean = 6.26 s, SD = 0.86 s), there was no significant effect of visualization condition on response times, F(1, 41) = 3.11, p = .09.
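For readers unfamiliar with the test, the following sketch shows the textbook computation of the F statistic in a one-way between-subjects ANOVA: the variance between group means is divided by the variance within groups. It is a generic illustration using hypothetical scores, not a reproduction of the analysis behind the F(1, 41) values reported above, whose exact specification is not given here. The function name `one_way_anova_f` is an assumption made for this example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical post-test accuracy scores in percent (NOT the study data).
aggregated_scores = [80.0, 85.0, 90.0]
disaggregated_scores = [85.0, 90.0, 95.0]
normalized_scores = [90.0, 95.0, 100.0]

f_stat, df_b, df_w = one_way_anova_f(
    [aggregated_scores, disaggregated_scores, normalized_scores]
)
print(f_stat, df_b, df_w)  # 3.0 2 6
```

The resulting F value is then compared against the F distribution with the corresponding degrees of freedom to obtain a p value, for example with a statistics package.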

Qualitative data
Estimates in the energy game
Participants were asked how they had tried to estimate which actions consume the most electricity in the pre-test of the energy game. Participants often reported using simple heuristics to make their decisions based on the usage duration and the size of the appliance. For example, P39 thought: 'the longer the time the more electricity will be consumed' and 'the bigger the size [of the appliance] the more energy'.
A second simple heuristic that participants used to make their decisions was based on 'how hard the appliance has to work' (P17) or 'what the objects [appliances] actually do, how much movement they involve and stuff like that' (P10). For example, P30 thought that a vacuum cleaner would consume more because 'it uses more electricity to get the dust'. Likewise, P9 thought that washing laundry 'takes a lot more, it does a lot more stuff, if spins and washes'. Ten of the participants thought that generating heat 'uses a lot of energy' and so used this as a simple heuristic for choosing appliances that generated heat over appliances that did not. For example, P8 said that 'Kettles need a lot of energy; they boil water. Versus a radio that doesn't really produce the equivalent of boiled water'.
Third, there were participants who based their estimates on experience and previous knowledge. A couple of participants said they knew the wattage from the appliance labels and power ratings; others remembered information they had heard or learned from their parents. A couple of participants also mentioned that their bills had noticeably increased since they performed a specific activity more frequently and therefore they inferred it must consume a lot of electricity.

Sense-making in the simulation
Participants were then asked how they had made sense of the data visualization in the simulation. The differences between the three conditions are described below.
Participants in the aggregated condition reported looking at how much the separate activities consumed and for how long they lasted. When multiple devices were on at the same time, they tried to 'see how they add up' (P4) and 'how much they consume all together minus individual ones' (P3). To estimate the total consumption of one activity, they 'add[ed] up the energy they use in different periods' (P4) in order to estimate the area under the curve. P10 stated that 'when they were combining, it made it more difficult to see and remember which one is more'. Particular difficulties were reported with activities that were similar in the amount of electricity consumed, such as the coffee maker and the kettle. A couple of participants mentioned they were thinking about the particular patterns of the activities, such as the 'hot cycles' of the washing machine and dishwasher, which are mirrored in 'the peaks and trough of the graph' (P13).
Participants in the disaggregated condition reported 'looking at how the energy level changes. For comparable time, [I] look at the difference in height and kind of estimate the total area' (P18). P25 found that 'Of course many things became clear. […] With the graphs you could estimate how much and the times when they consume. It was accurate in determining the pattern', while P26 found it 'difficult to judge, there are all those spikes'. Just as in the aggregated condition, the difficulty depended on how similar the activities were in the amount of electricity consumed. P27: 'Some things were quite obvious like the radio, it's not consuming anything at all.' P25: 'I was confused between laundry and dishwasher.' P29 described her memorizing strategy as 'trying to think of how it works, how the piece of technology works […] I found [the graph of the dishwasher] interesting 'cause I thought it has two peaks and in the middle it is low so I was thinking okay so what does it do? It sprays water at the beginning; then in the low bit, does it mean that the dishes stay in soap? For whatever, 30 minutes. And then has another peak of rinsing. Maybe it's not true but that's the explanation that I gave myself.'
Participants in condition three did not have to compare visually, as the visualization provided the ranking by consumption. They reported their strategy as 'see the curve and try to remember the sequence' (P37), particularly trying to remember 'which ones took less […] when there where small differences' (P33). P40 thought 'the curves were pretty transparent, it was easy to see which one was higher […] with kettle, lamp, coffee maker and toaster it was easy, they were one above another'. On the other hand, data from participant P31 were excluded from the quantitative data analysis because she reported that the graph 'didn't make sense' and she was unclear 'what the whole thing, the curvy shape was' and admitted she had just clicked through the experiment. P44 was unsure 'if they [the graphs] were cumulative. I think they were not cumulative' and would have liked to see the pattern that the appliances produce over time, yet he 'liked they were standardized over time, that was nice'.

Main findings
The main finding is that people draw different conclusions about electricity consumption depending on the kind of data visualization to which they are exposed. This is in line with previous research results showing that data comprehension greatly depends on the manner of presentation and the design of the display interface (Chiang et al., 2012;Roberts & Baker, 2003). Participants in the present study were more accurate and had greater confidence in their judgments about how much electricity domestic appliances used after seeing a simulation of these appliances. Learning was significantly better in the normalized visualization. It was expected that the normalized condition would yield the best results, as area-based graphs are more suitable for summarizing consumption over time (Costanza et al., 2012) than line graphs (Loorak et al., 2016). The benefit of area-based visualizations is that they make the information about how much electricity an appliance is using over time more readily available and salient to participants, who were therefore able to use it to make more accurate judgments about consumption patterns in the energy game.
Contrary to the expectation from the literature, disaggregated feedback is not per se superior for people to learn from than aggregated feedback (Darby, 2001;Froehlich et al., 2011;Yun et al., 2010). Participants' performance was no different between the aggregated and disaggregated conditions. Only given the simplified (meaning normalized) visualization did disaggregated feedback lead to improved learning. These results show that care is needed when implementing disaggregated energy visualizations. Using a disaggregated time-series line graph is still challenging for many people, as it requires them to integrate visually the area under the curve, which is a fairly difficult cognitive task. The normalized visualization is similar to the ideas that participants brought forward in our preceding field study (Herrmann et al., forthcoming) and increased learning due to its simplified shape (Costanza et al., 2012). Where studies carried out in the field could not generate unambiguous evidence (Kelly, 2016), our experiment has tested aggregated versus disaggregated visualizations, based on the same dataset, free of any confounding variables.
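To make concrete why reading a time-series line graph amounts to integrating the area under the curve, the following minimal sketch approximates energy as the integral of power over time using the trapezoidal rule. The function name and the two power traces are hypothetical illustrations and are not taken from the study's simulation.

```python
def energy_kwh(times_h, power_kw):
    """Approximate energy (kWh) as the area under a power curve.

    times_h: sample times in hours (ascending)
    power_kw: power readings in kW at those times
    """
    if len(times_h) != len(power_kw):
        raise ValueError("times_h and power_kw must have the same length")
    total = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        # Trapezoid between two consecutive samples
        total += 0.5 * (power_kw[i] + power_kw[i - 1]) * dt
    return total


# Hypothetical traces (NOT study data): a short, high peak versus a long, low one
kettle = energy_kwh([0.0, 0.05], [2.0, 2.0])  # 2 kW for 3 minutes
lamp = energy_kwh([0.0, 4.0], [0.06, 0.06])   # 60 W for 4 hours

print(f"kettle: {kettle:.2f} kWh, lamp: {lamp:.2f} kWh")
```

In this illustration the activity with the much lower peak consumes more in total, which is the kind of judgment a viewer has to make by eye when comparing the heights and durations of lines on a graph.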
The qualitative data yielded insights into the cognitive sense-making processes that participants went through in the different conditions. P10 in the aggregated condition said: 'When they were combining, it made it more difficult to see and remember which one is more.' This could be expected and is in line with the quantitative results: it is more difficult to decode the information and learn from it. P29 in the disaggregated condition elaborated on the system status and the technical processes involved. She expressed concern as to whether her explanations were correct. However, this ideation seemed to help her interpret the data and remember it better. While the normalized visualization yielded the best results in the quantitative data, it did not make sense to one participant whom we had to exclude from the analysis. For participants in this condition, the strategy lay in remembering the ranking of the appliances. It is assumed that none of the deeper processing in terms of thinking about what the appliances do took place. This type of sense-making could, however, be relevant to engagement and long-term retention. The qualitative data thus challenge the quantitative findings with regard to the superiority of the normalized condition.

Limitations
One of the study's limitations is the homogeneous sample: 36 out of 43 participants were students (under- and postgraduates). This meant the sample was relatively young and probably more highly educated with better computer literacy than the general population. Moreover, the majority of the sample was female. Locoro, Cabitza, Actis-Grosso, and Batini (2017) found that the ability to understand infographics might be subject to age, gender and educational background. The demographics of the present study's participants imply that caution should be exercised when generalizing the findings to the general population (Sturm et al., 2015). It would be necessary to replicate the experiment with participants showing a wider variation in age and education and a balanced gender distribution.
Also, as Rogers, Yuill, and Marshall (2013) point out, it is unclear to what extent findings from laboratory experiments can transfer to uncontrolled settings in the real world. For example, Chiang et al. (2012) replicated their laboratory energy display study in the field (Chiang, Mevlevioglu, Natarajan, Padget, & Walker, 2013) and found slightly different results. Therefore, in order to test for ecological validity, a replication in the field would be required. A key challenge to deploying the normalized visualization in the field lies in how to deemphasize the representation of time. A major strength of the normalized visualization is that it makes it easier to compare total energy consumption. However, the normalized visualization necessarily deemphasizes information concerning how often and for how long the appliance has been in use. Arguably, a system that preserves all this information, such as the FigureEnergy representation (Costanza et al., 2012), would be most beneficial to end-users. While it is desirable for the user to know that the washing machine uses a lot of electricity, it might not be possible to run it any less when the household uses it sparingly already. Conversely, if a kettle consumes less over a month but runs unreasonably often, the householder might wish to reconsider this.
Potential limitations arise from the use of the energy game. In the experiment participants were given a single specific task, namely that of learning the relative consumption of appliances in comparison with others. Operationalization was limited in that it only assessed changes in performance accuracy in the energy game. In a real-world context, there are various other ways people can learn from domestic energy feedback, e.g. understanding what is contributing towards their baseline consumption (Kidd & Williams, 2008;Van Dam, Bakker, & Van Hal, 2010).
Finally, the focus of this study was on energy-usage comprehension rather than behaviour change. The validity of this study is limited to the impact of different methods of visualization on comprehension only. The experiment does not address if or how householders would go about changing their behaviour in order to reduce domestic energy consumption. However, as Mettler-Meibom and Wichmann (1982) point out, a household that does not even know about its inefficiency in the first place is certain not to reduce consumption. Future research could explore how knowing more about energy consumption potentially affects daily behaviour and the decision on how to use electronic devices.

Conclusions
Comprehension is a key factor that plays a significant role in the extent to which IHDs are likely to result in changes to energy consumption. While field studies have highlighted the cognitive difficulties involved in understanding IHDs, relatively few have conducted controlled experiments. Human-computer interaction research should play a role in the design and evaluation of eco-feedback technology (Froehlich, Findlater, & Landay, 2010). The key contribution of this study is that the efficacy of different graphic visualizations was compared along with their effect on data comprehension.
Although experiments 'might not adequately capture perception in a real-world setting', they do provide 'a useful upper-bound on people's ability' (Chiang et al., 2012, p. 478). Simulations have proven to be an appropriate method to test cognitive abilities and decision-making processes (Gonzalez, Thomas, & Vanyukov, 2005). The strong internal validity of a highly controlled setting allowed a comparison to be made of aggregation and disaggregation free from confounding factors, which makes the findings more rigorous than those from previous field studies.
The findings suggest that the choice of data visualization used in smart meter IHDs or web-portals can affect the kinds of inferences people will make about their domestic electricity consumption. Currently, these are often time-series based and aggregated. First, the experiment shows the importance of choice of visualization. Second, it shows the significance of disaggregation that allows for easy and direct comparisons between appliances. This is crucial for people to learn how much electricity they use for everyday actions. Time series are difficult to interpret and, indeed, disaggregation alone did not yield significantly better results: only disaggregation in combination with a simplified visualization that facilitated comparisons between activities resulted in significant learning advantages.
This paper highlights the impact that the choice of data visualization can have on people's ability to interpret domestic energy usage data. This is important given that many countries around the world are currently deploying smart meters. Many current-generation IHDs use time-series data visualizations. The results of this work suggest this is not the most appropriate data visualization to enable people to understand their domestic energy usage data. Instead, summary overviews are recommended as these better support data comprehension of how different daily tasks relate to their energy use.
This study's findings give new insights into the usability of different visualizations and are therefore relevant to researchers studying eco-feedback, to developers, contractors and utilities providing IHDs or other online eco-feedback, and to government offices working on these issues.