A decision tree method for explaining household gas consumption: The role of building characteristics, socio-demographic variables, psychological factors and household behaviour

This research aims to develop a decision tree model for understanding actual gas consumption in residential buildings. Extending previous studies, this study examined to what extent four different types of factors (building characteristics, socio-demographics, psychological factors and household behaviour) can explain actual gas consumption of Dutch households in 2017 and 2018. Data were collected from 601 households. A novel approach, a decision tree method, revealed that household gas consumption was related to building characteristics, socio-demographics, and psychological factors, while energy-related behaviour in households was not uniquely related to gas consumption. Specifically, house size, building age and residence type (building characteristics), household income and employment status (socio-demographics), and most notably egoistic values, hedonic values, environmental self-identity, perceived corporate environmental responsibility of the energy provider, and social norm (psychological factors) predicted total actual household gas consumption. These results illustrate that the novel integrated framework introduced in the paper yields a better understanding of actual household gas consumption. The results have important practical implications and suggest that it would be important to target these three types of factors in policies aimed at reducing household gas consumption.


Introduction
Energy consumption is a major contributor to climate change and global environmental problems. Natural gas usage causes environmental problems, including global warming caused by the emission of greenhouse gases. In the Netherlands, natural gas is one of the primary energy sources, accounting for up to 41% of total energy consumption [1]. Gas consumption is responsible for 75% of the total energy consumption in Dutch households. Gas is mostly used for heating dwellings, which makes up 80% of total household gas consumption [2]. Household gas consumption fell by 28% between 1990 and 2018 in the Netherlands, as houses became better insulated [3]. Nevertheless, gas remains a major part of household energy consumption. It is therefore important to better understand which factors affect household gas consumption, as this reveals which strategies could be effective in reducing gas consumption and the related CO2 emissions that cause global warming and climate change [4]; interventions will be more effective when they target key antecedents of gas use.
The aim of this paper is to study which factors are related to household gas consumption. In doing so, a wide range of factors is considered that have typically been studied in isolation. Specifically, extending previous research, this paper aims to study the relative importance of four types of factors that may affect actual household gas consumption: building characteristics, socio-demographic variables, gas use behaviour and psychological factors. These factors are explained in more detail below.
Regarding the first type of factor, building characteristics, studies have shown that residence type affects household gas consumption. For example, Dutch studies found that those living in detached and semi-detached houses used more natural gas than those living in other types of dwellings [5,6]. A study in Ireland showed that households living in detached houses use more gas than those in semi-detached houses, probably because detached houses have more external walls and more heat loss [7]. Other studies also showed that semi-detached houses have higher gas consumption levels [8].
Abbreviations: MAE, Mean Absolute Error; RMSE, Root Mean Squared Error.
Furthermore, building age affects gas consumption. Some studies found that residential gas consumption was higher in older buildings, probably because of the lower energy efficiency of older buildings [9-11]. Similarly, gas consumption is higher in houses built before 1981 than in houses built between 1981 and 2000, which could be related to newer houses having higher energy efficiency standards [7]. Another study reported that houses built between 1965 and 1974 have a higher gas consumption than older houses, which could be caused by the poor thermal resistance of the building envelope of houses built in that period [6].
Regarding the impact of dwelling size on household gas consumption, studies found that a larger floor area in dwellings is associated with higher energy consumption for heating [10,12]. However, another study showed that although a larger floor area is related to higher levels of actual gas consumption in Dutch dwellings, the gas consumption per square meter is lower in larger houses [8]. Moreover, there is a positive relationship between the number of rooms heated and gas used for heating [10].
The second relevant type of factor that can influence household gas consumption comprises socio-demographic variables. In general, larger households use more gas [8,13,14]. In Dutch dwellings, family households use the most gas while single households have the lowest gas consumption, and households without children use less gas than those with children [15].
One study showed that older people are likely to use more gas than younger respondents, probably because older people have higher comfort standards and spend more time at home [7,8]. Likewise, a study in the Netherlands, Norway, UK and Sweden reported that elderly people spend more time at home, resulting in a higher gas consumption [16].
Income is another relevant predictor of household gas consumption. Some studies reported a strong positive correlation between income and household gas consumption, presumably because higher income households live in larger dwellings [11,13,14]. In contrast, one study reported that households with lower incomes consume more gas than those with higher incomes, probably because higher income households are less often present in their home [15].
Another relevant socio-demographic variable is education level of household members [17]. Most studies found that there is a positive relationship between household education level and gas consumption [7,18,19]. This may be because households with a higher education level have a higher income and are likely to live in larger dwellings, which is associated with higher gas use.
Studies showed that households with at least one member employed used less gas than those in which no one is employed, probably because the house is less occupied when someone works [15]. Yet, not surprisingly, households whose members work from home are likely to use more gas [12]. Moreover, self-employed households are likely to consume more gas than employed households, probably because self-employed households tend to earn more money than employees, and thus have more resources to afford using more gas [7].
Third, there are many different behaviours that affect gas consumption. In particular, as space heating accounts for a substantial proportion of overall household gas use, the indoor temperature setting is a relevant behaviour that might affect household gas consumption. Indeed, studies have found that a higher average indoor temperature setting resulted in a higher energy use for space heating [8,10].
Little is known about the relationship between psychological factors, the fourth type of factor we consider, and household gas use. It is likely that household gas consumption is influenced by various psychological factors, and particularly motivational factors. One relevant type of motivational factor is values, which reflect general desirable and transsituational goals that vary in importance and serve as guiding principles in an individual's life [20]. Four types of values are particularly related to environmental behaviour, including energy use [21-23]: egoistic values (i.e. focusing on increasing one's personal resources), altruistic values (i.e. reflecting concern for other human beings), hedonic values (i.e. focusing on pleasure and comfort), and biospheric values (i.e. focusing on valuing the environment). These four types of values appear to affect a range of environmental behaviours, and are therefore also likely to affect household gas consumption. Generally, people with stronger altruistic and particularly biospheric values are more likely to engage in pro-environmental behaviour. Therefore, it is likely that strong biospheric and altruistic values are related to lower levels of gas use (cf. [22,24]). Strong hedonic and egoistic values are often negatively related to pro-environmental behaviours, possibly because such behaviours can be associated with less comfort and more costs [25,26]. Therefore, it is likely that strong hedonic and egoistic values are associated with higher levels of gas use.
Another relevant type of psychological factor is environmental self-identity, which reflects the extent to which someone perceives himself or herself as the type of person who acts pro-environmentally [27-29]. Specifically, people with a stronger environmental self-identity are more likely to see themselves as the type of person who engages in pro-environmental actions and will consequently be more likely to act pro-environmentally, as people are motivated to act in line with how they see themselves. Research found that environmental self-identity is indeed related to a lower energy use [27,30-32]. Therefore, it is expected that a stronger environmental self-identity will be associated with lower levels of gas use.
Another relevant motivational factor that can affect gas use is personal norm, which reflects the extent to which one feels a sense of personal obligation to engage in a certain behaviour [33]. Studies have shown that a stronger personal norm to act pro-environmentally encourages pro-environmental actions [34,35]. Therefore, it is expected that a stronger personal norm to save energy is related to lower levels of gas use.
Social norm is another relevant type of motivational factor that can affect energy use [36,37]. Social norm includes injunctive norms, reflecting perceptions of what most people approve or disapprove of, and descriptive norms, reflecting perceptions of what most other people do [38]. People are motivated to act in line with injunctive norms to gain social approval or to prevent social sanctions, while they are motivated to act in line with descriptive norms because they think that what most people do is probably the most sensible thing to do [38]. It is expected that people who think that others try to reduce their gas use, or expect them to reduce their gas use, will use less gas.
Another relevant type of motivational factor is the perceived corporate environmental responsibility of one's energy provider. Corporate environmental responsibility implies that an organisation (in this case the energy provider) has the goal to enhance its environmental performance and reduce its environmental impact [39]. When people think that their energy provider is committed to reducing its environmental impact and has implemented procedures to achieve this goal, they may be more likely to think it is important to protect the environment, and they may be more motivated to engage in pro-environmental actions at home themselves. Therefore, it is expected that the more people think their energy provider endorses corporate environmental responsibility, the less gas they will use.
The aim of the research presented in this paper is to examine to what extent the factors described above are related to actual household gas use in the Netherlands. This paper extends previous research in three ways. First, as yet, most studies have focused on understanding electricity consumption. Far fewer studies have aimed to understand household gas consumption, and in particular actual gas consumption, while gas is a major energy source, particularly in the Netherlands. Second, this paper studies a wide range of factors that are likely to affect household gas use, including building characteristics, socio-demographics, psychological variables, and household gas use behaviour. Notably, until now, studies have typically focused on a limited set of variables, so little is known about the relative importance of these four types of factors in explaining actual household gas consumption. In particular, the effect of psychological variables on gas consumption has been understudied, while these may be relevant, as explained above. To address this gap in the literature, this study aims to test an integrated framework to enhance the understanding of factors influencing household gas consumption. Third, a novel methodological approach is used to examine to what extent these factors are related to household gas consumption: a decision tree method. A decision tree method is a data-driven approach that classifies different groups of households on the basis of the characteristics included in the analysis (i.e., the four types of factors discussed above) to predict household gas consumption. The decision tree method is further explained in Section 2.3. This is a promising approach, as a decision tree provides a graphical illustration of which factors are related to household gas use that is relatively simple and easy to understand and interpret.
A decision tree method reveals how households with different levels of gas consumption can be characterised, and is useful for exploratory purposes. Notably, a decision tree visualises the relationships between variables in large data sets, and identifies not only the most significant variables related to gas consumption, but also interactions between two or more predictor variables included in the model. This is a major advantage over multiple regression analysis, and a decision tree is therefore likely to provide a more comprehensive insight into factors related to household gas use. Moreover, unlike other statistical tools that aim to assess relationships between variables, decision trees are flexible enough to handle both continuous and categorical variables, as well as variables with some missing values. Additionally, unlike linear regression and logistic regression models, a decision tree does not require any assumptions of linearity in the data.

Data collection and pre-processing
A questionnaire study was conducted in the Netherlands in 2017 among households that had a smart meter and had been a customer of a Dutch energy provider, Qurrent, for at least 6 months. Participants were recruited via an email sent by the energy provider, and were asked to fill out a questionnaire and to share their gas consumption data collected via their smart meter with the research team. One person per household filled out the questionnaire. The study was part of the EU-funded project "Psychological, social and financial barriers to energy efficiency". 1 Data were collected on building characteristics, socio-demographic variables, gas use behaviour and psychological variables. Additionally, household gas use data were collected based on smart meter readings, reflecting bimonthly gas use of the years 2017-2018 (from the first of January 2017 to the first of July 2018). A total of 2318 households completed the questionnaire. However, data on gas use, the dependent variable, were available for 1211 households only. Of those 1211 households, 610 were excluded from data analyses based on the following three criteria: (1) When households answered "don't know" to the following questions: "how big is your house in term of square meters?", "in which of the following periods was your house originally built?", "which of the following best describe your household type?", "how much was your household's total monthly gross income in the following years?", "what is the usual temperature in your living room during winter at day time and night time in winter?", as these questions tap into important predictors of household gas use. In total 160 cases were excluded for this reason. (2) When households selected "apartment" as their residence type.
This study focused on "detached house", "semi-detached house" and "terraced house" as these types of residence are the most common dwelling types in the Netherlands. Moreover, people living in an apartment completed some different questions than those living in a detached, semi-detached or terraced house, so including them would have resulted in many missing data on key variables. In total 283 respondents lived in apartments and were therefore excluded. (3) When data on total gas consumption in the years 2017 and 2018 were missing. In total 167 cases were excluded for this reason.
Hence, in total 601 households were included in the analyses. 2

Measures
As explained in the Introduction, variables from four categories of predictor variables were included: building characteristics, socio-demographic variables, gas use behaviour and psychological factors. Tables 1-4 give an overview of all variables included in the analyses.

Building characteristics
Households were asked to indicate the type of dwelling they live in (i.e. detached house, semi-detached house or terraced house), the size of the dwelling (in square meters), the number of rooms in the dwelling, including habitable spaces such as living room, dining room, bedroom and office, and the year of construction of the dwelling. Table 1 shows the building characteristics that were used and their response percentages.

Socio-demographic variables
Table 2 displays descriptives for the socio-demographic variables. In total 183 females and 418 males participated in the study. Age ranged from 19 to 55 (M = 37.99, SD = 9.37). Respondents indicated the highest educational degree they have completed, their employment status and the total household monthly gross income in 2016, as well as their household type. Household type included five categories, namely "single person", "single parent with one or more children", "couple, without children", "couple with one or more children" and "other type of household". To facilitate the interpretation of the results, household type was recoded into two dichotomous variables, namely "single" and "couple". For singles, "single without children" (0) and "single parent with one or more children" (1) were distinguished. Similarly, for couples, "couple without children" (0) and "couple with one or more children" (1) were distinguished. None of the respondents indicated living in an "other type of household".
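The recoding of household type into the two dichotomous variables can be illustrated with a minimal Python sketch. The function name and the use of None for "not applicable" (for example, the "couple" variable for a single-person household) are assumptions made for illustration; the text does not state how non-applicable cases were coded.

```python
# Hypothetical recoding helper; category labels follow the questionnaire text.
HOUSEHOLD_TYPES = {
    "single person": ("single", 0),
    "single parent with one or more children": ("single", 1),
    "couple, without children": ("couple", 0),
    "couple with one or more children": ("couple", 1),
}


def recode_household_type(label):
    """Return (single, couple), each coded 0 = without children, 1 = with children.

    The variable that does not apply to the household is returned as None
    (an assumption; the paper does not specify this coding).
    """
    group, has_children = HOUSEHOLD_TYPES[label]
    single = has_children if group == "single" else None
    couple = has_children if group == "couple" else None
    return single, couple
```

For instance, a "couple with one or more children" household would receive no value on the "single" variable and a 1 on the "couple" variable.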

Gas use behaviour
Respondents were asked to indicate the usual temperature in their living room during day time and night time, respectively, in winter, in degrees Celsius. Table 3 shows descriptives of room temperature settings as an indicator of gas use behaviour. The largest proportion of the sample reported a room temperature setting of 20 °C during day time (35.2%), while the room temperature setting during night time was mostly below 16 °C (33.7%).

Psychological factors
Respondents filled in a brief value questionnaire to measure their biospheric, egoistic, hedonic and altruistic values (see Ref. [22]) consisting of 16 items. A brief explanation was given of the relevant values. The scale included four biospheric values (Respecting the earth: harmony with other species; Unity with nature: fitting into nature; Protecting the environment: preserving nature; Preventing pollution: protecting natural resources), four altruistic values (Equality: equal opportunity for all; A world at peace: free of war and conflict; Social justice: correcting injustice, care for the weak; Helpful: working for the welfare of others), five egoistic values (Social power: control over others, dominance; Wealth: material possessions, money; Authority: the right to lead or command; Influential: having an impact on people and events; Ambitious: hardworking, aspiring), and three hedonic values (Pleasure: joy, gratification of desires; Enjoying life: enjoying food, sex, leisure, etc.; Self-indulgent: doing pleasant things). Respondents were asked to indicate to what extent these values were important to them as a guiding principle in their life, on a 9-point scale ranging from −1 (opposed to my values) through 0 (not important) to 7 (extremely important). Following Schwartz [20,40], respondents were advised to differentiate as much as possible between the items, and to rate no more than two values as extremely important, to ensure that participants distinguished between the importance of the different values. The items of the biospheric value scale formed a reliable scale. 3 Thus, all value scales had sufficient internal consistency. The mean scores indicate that people generally rather strongly endorse egoistic, hedonic and altruistic values, while biospheric values were relatively less important to people.
Corporate environmental responsibility was measured with the following three items reflecting the extent to which people think their energy provider has the aim to improve its environmental performance and to reduce its environmental impact (cf. [39]): I think that my energy provider has the goal to minimise its impact on the environment; I think that my energy provider has implemented policy and procedures to minimalize its impact on the environment; I think that my energy provider has stated in its mission to implement sustainable (pro-environmental) policy. Respondents indicated to what extent they agree with the items on a 7-point scale ranging from 1 (totally disagree) to 7 (totally agree). Cronbach's alpha for this scale was 0.85 (M = 5.53, SD = 1.09), again indicating that the items formed a reliable scale. The mean score indicates that respondents generally believe that their energy provider has a clear mission to reduce its environmental impact.
A validated scale was used to measure pro-environmental self-identity, comprising three items: Acting pro-environmentally is an important part of who I am; I am the type of person who acts pro-environmentally; I see myself as a pro-environmentally person [27,32]. Respondents rated each item on a 7-point scale, ranging from 1 (totally disagree) to 7 (totally agree). The items of this scale formed a reliable scale as well: α = 0.88 (M = 5.48, SD = 1.12). The mean score indicates that respondents generally see themselves as a person who acts pro-environmentally.
Personal norm to save energy was measured with four items: I feel morally obliged to save energy; It is my moral ideal to save energy; I would act according to my principles if I save energy; I feel personal responsible to try to save energy. Respondents indicated to what extent they agree with the items on a 7-point scale ranging from 1 (totally disagree) to 7 (totally agree). Cronbach's alpha was again good: 0.84 (M = 4.56, SD = 1.11). The mean score indicates that respondents generally do not experience a very strong personal norm to save energy.
Social norm was measured with three items which reflect injunctive norms and descriptive norms: Most of the people who are important to me think I should try to use less energy; Most of the people who are important to me will approve that I try to use less energy; Most people who are important to me try to use less energy. Respondents rated each item on a 7-point scale, ranging from 1 (totally disagree) to 7 (totally agree). The items of this scale formed a reliable scale as well: α = 0.72 (M = 5.63, SD = 1.14). The mean score indicates that respondents generally experience a relatively strong social norm to save energy. Table 4 reports descriptives for the psychological variables.
3 Cronbach's alpha (α), a measure that reflects to what extent the items in the scale are likely to reflect the same underlying construct, was used to determine whether the items formed reliable (or: internally consistent) value scales. Higher scores of α indicate that the items included in the scale are likely to measure the same underlying construct [41]. The general rule of thumb is that a Cronbach's alpha of .70 or higher is good.
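To make the reliability measure concrete, the following minimal Python sketch computes Cronbach's alpha from raw item scores using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the summed scale). The function name and the example scores are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per item, with respondents in the same order
    in every list. Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(items)
    item_variance_sum = sum(variance(scores) for scores in items)
    # Summed scale score per respondent.
    totals = [sum(responses) for responses in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents to a three-item, 7-point scale.
example_items = [
    [5, 6, 7, 4, 6],
    [5, 7, 6, 4, 6],
    [6, 6, 7, 3, 5],
]
alpha = cronbach_alpha(example_items)
```

With the rule of thumb mentioned above, a value of `alpha` of .70 or higher would be read as sufficient internal consistency.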

Total gas consumption
Gas consumption data were derived from smart meters. Fig. 1 visualises the distribution of the bimonthly household gas consumption and its probability density between January 2017 and July 2018. It displays the distribution shape of household gas consumption for each month. Wider sections of the violin plot represent a higher probability that households in the sample will use the given amount of gas, while the skinnier sections represent a lower probability that households in the sample have consumed the given amount of gas.
The total household gas consumption from January 1, 2017 to July 1, 2018 in cubic meters (m³) was included as the dependent variable in the analyses. The mean gas consumption in these 1.5 years was 2047.25 (SD = 1465.40). Gas consumption of the participating households in 2017 (M = 1015.68) was very similar to the average gas consumption of Dutch households in 2017 (M = 998.46). 4 Total gas consumption was transformed into its cube root using a Box-Cox transformation 5 to achieve normality of the gas consumption distribution. The mean Box-Cox transformed gas consumption was M = 12.01 with a standard deviation of SD = 2.94.
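As an illustration of this transformation, the minimal Python sketch below shows a plain cube root next to the general Box-Cox formula. That the exponent λ = 1/3 corresponds to the cube root is standard; whether the authors applied the plain cube root or the shifted and scaled Box-Cox form is an assumption left open here, and the function names are hypothetical.

```python
import math


def cube_root(y):
    """Plain cube-root transform of a positive consumption value (in m^3)."""
    return y ** (1.0 / 3.0)


def box_cox(y, lam):
    """General Box-Cox transform of a positive value y with parameter lam."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam
```

For λ = 1/3 the Box-Cox value equals 3 × (cube root − 1), a linear function of the cube root, so both forms yield the same distribution shape and the same tree splits.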

Statistical analysis: decision tree
To identify the most important factors related to household gas consumption, a decision tree model [42] was fitted to the bimonthly household Box-Cox transformed gas consumption (in m³) between January 2017 and July 2018. Building characteristics, socio-demographic variables, household behaviour and psychological factors were included as predictors of total gas consumption.
Decision tree learning methods involving continuous variables, as in the current study, are called regression trees and date back to Ref. [43]. Although this machine learning approach is widely used in other fields, it is relatively unknown in psychological research and energy research despite its clear conceptual advantages over regression models.
Decision trees are tree-shaped diagrams in which the data set is split into a number of subsets through a series of dichotomous classifications. When such classifications are made for continuous predictors, an algorithm finds an optimal threshold value for classifying households based on total gas use: values of the continuous variable below the threshold, and those above the threshold, are classified into two distinct branches. With each further split, the proportion of unexplained variance is reduced, at the cost of a more complicated model. The final result is a tree with so-called decision nodes and leaf nodes. Each decision node (reflecting a predictor variable) has two branches, defining a binary classification of lower versus higher gas consumption on the basis of the relevant predictor variable. The nodes in the final layer of the tree are denoted leaf nodes. Hence, a decision tree reflects which are the most important predictors of gas consumption, and which values of the predictor variable distinguish households with a lower versus higher gas consumption.
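The threshold search for a single continuous predictor can be sketched as follows. This is a minimal CART-style variance-reduction search written in Python for illustration only; it does not reproduce the exact criterion, stopping rules or pruning that rpart applied in the study, and the function names and example data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (unexplained variation)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on predictor x that minimises the summed SSE of y
    in the two resulting branches. Returns (threshold, sse_after_split);
    threshold is None if no split improves on the unsplit data."""
    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, sse(list(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold, best_sse = threshold, total
    return best_threshold, best_sse


# Hypothetical house sizes (m^2) and transformed gas use for six households.
house_size = [90, 110, 130, 170, 190, 210]
gas_use = [10, 11, 10, 14, 15, 14]
threshold, remaining_sse = best_split(house_size, gas_use)
```

Growing a full tree would repeat this search over all predictors at every node and keep the predictor and threshold giving the largest reduction.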
The decision tree reveals the average gas used by households having the specific characteristics (first number below the nodes in Fig. 2), as well as the percentage of households falling into each category (second number below the nodes in Fig. 2). Notably, by following a path from the root node (the top) to a leaf node (the bottom), one obtains a set of decision rules that result in the class of households with the given average gas consumption. Therefore, to understand which factors can explain gas consumption in the decision tree, one needs to go from the top of the decision tree via each branch to the bottom, which reflects how households included in each category can be characterised.
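Reading decision rules off a tree can be sketched in Python as below. The tree fragment is a hypothetical encoding of one path described in the Results (house size, income, environmental self-identity); the field names, the treatment of boundary values (at or above versus below the threshold) and the dictionary layout are assumptions, and leaves whose mean is not used in this illustration are set to None rather than filled in.

```python
# Hypothetical fragment of the tree; only one leaf mean is taken from the text.
TREE = {
    "feature": "house_size_m2", "threshold": 160,
    "below": {"leaf": None},  # sub-tree omitted in this sketch
    "at_or_above": {
        "feature": "monthly_income_eur", "threshold": 12000,
        "below": {"leaf": None},  # sub-tree omitted in this sketch
        "at_or_above": {
            "feature": "env_self_identity", "threshold": 5.5,
            "below": {"leaf": 15.48},  # mean transformed gas use reported in the Results
            "at_or_above": {"leaf": None},
        },
    },
}


def trace(node, household):
    """Follow one household from the root to a leaf.

    Returns the list of decision rules passed and the leaf's mean gas use."""
    rules = []
    while "leaf" not in node:
        value = household[node["feature"]]
        side = "at_or_above" if value >= node["threshold"] else "below"
        operator = ">=" if side == "at_or_above" else "<"
        rules.append(f'{node["feature"]} {operator} {node["threshold"]}')
        node = node[side]
    return rules, node["leaf"]


rules, leaf_mean = trace(
    TREE,
    {"house_size_m2": 180, "monthly_income_eur": 13000, "env_self_identity": 5.0},
)
```

The list `rules` then spells out the path for this household, and `leaf_mean` is the average transformed gas use of the group it ends up in.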
To check whether the results were robust, the dataset was randomly split into a training set and a test set with a ratio of 80:20. Typically, the training set comprises the larger proportion of the sample, while a smaller portion of the data is used for testing. The training set is used to construct the model and the test set is used to validate the model derived from the training set. After the model has been fitted on the training set, its prediction accuracy can be evaluated. As the target variable, total gas consumption, is continuous, Mean Absolute Error (MAE 6) and Root Mean Squared Error (RMSE 7), which provide a reliable indication of the fit of the model, are both used to test the accuracy of the model. Specifically, MAE and RMSE are computed for both the training set and the test set. If the values of MAE and RMSE are very similar for the training set and the test set, it can be concluded that a good model was built [44,45]. In this study, the model was fit and visualised using the rpart and rpart.plot packages [46,47] in R [48].
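The validation procedure can be illustrated with a minimal Python sketch of an 80:20 random split and the two error measures, MAE = (1/n) Σ |Y_obs − Y_pred| and RMSE = sqrt((1/n) Σ (Y_obs − Y_pred)²). The study itself used R; the function names, the seed and the rounding of the test-set size below are assumptions made for illustration.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mae(observed, predicted):
    """Mean Absolute Error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root Mean Squared Error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))
```

Computing both measures on the training set and on the test set, and finding them close to each other, would be read as a sign that the tree does not overfit the training data.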

Results
The decision tree was built on the basis of the training data set, which included 487 respondents. Fig. 2 displays the decision tree model to explain gas consumption. It shows the mean gas consumption of households in each branch (first number in each node), and the percentage of households in the sample that end up in each branch (second number in each node). The decision tree has a total of 33 nodes, of which 17 are leaf nodes. The results indicate that building characteristics, socio-demographic variables and psychological factors are all important predictors in explaining household gas consumption. Specifically, house size, residence type, and building age were the main building characteristics related to household gas use. Income and employment status were the main socio-demographic variables explaining gas use. Egoistic values, hedonic values, environmental self-identity, corporate environmental responsibility and social norm were the main psychological factors related to gas consumption. Interestingly, room temperature setting, as an indicator of household gas use behaviour, was not significantly related to household gas consumption.
House size appears to be the best predictor of gas consumption, as it is the starting point of the decision tree. Specifically, the decision tree splits households into two branches: those living in a house of 160 square meters or larger use more gas than those living in a house smaller than 150 square meters.
For houses larger than 160 square meters, income was the next best predictor of gas consumption: households with an income higher than 12,000 euro use more gas than households with a lower income. In the next step of this branch, for households with a lower income, employment status is the most important predictor: respondents who are either full-time employed, self-employed, seeking work, student, house-wife/man or retired use more gas than respondents who are part-time employed or those who have other types of employment. For respondents who are part-time employed or have other types of employment, hedonic values are the next best predictor of gas use: stronger hedonic values (i.e., score higher than or equal to 3.3) imply that a household uses less gas than when hedonic values are weak.
For respondents with an employment status of full time employed, self-employed, seeking work, student, house-wife/man or retired, the building year of the house is the next best predictor of gas use: houses built before 1940 are associated with a lower gas consumption than houses built later. For households living in a house built before 1940, corporate environmental responsibility is the next best predictor of gas use: a weaker perceived corporate environmental responsibility (i.e., 4 Gas use data in 2018 could not be compared with the gas use of Dutch households in 2018 as the latter data were not available yet. 5 The Box Cox transformation implies that the non-normal total gas consumption distribution was transformed into a normal distribution. ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 n � P ðY obs À Y pred Þ 2 q score lower than 4.8) is associated with a lower gas consumption, while a stronger perceived corporate environmental responsibility is related to a higher gas consumption. For households living in a house built between 1940 and 1970 or later, employment status is the next most important predictor of gas use: respondents who are either full time employed, house-wife/man or retired use less gas than respondents who are part-time employed, self-employed, seeking work, student, or those having other types of employment. For high income households, environmental self-identity is the next best predictor of gas use: a stronger environmental self-identity (i.e., score higher or equal to 5.5) is associated with a lower gas consumption than a weaker environmental selfidentity. For houses smaller than 150 square meter, residence type was the next best predictor of gas consumption: households living in a terraced houses use less gas, compared to households living in a detached houses or semi-detached houses. 
In the next step, for households living in a terraced house, employment status is the next most important predictor of gas use: those who are full-time employed, part-time employed, retired, house-wife/man or seeking work use less gas than respondents who are self-employed, student or have other types of employment. For respondents who are full-time employed, part-time employed, retired, house-wife/man or seeking work, egoistic values are the next best predictor of gas use: stronger egoistic values (i.e., score higher than or equal to 2.5) are associated with a lower gas consumption than weaker egoistic values. For respondents with relatively strong egoistic values (score 2.5 or higher), social norm is the next best predictor of gas use: respondents who report a weaker social norm to save energy (i.e., score lower than 5.8) use less gas than those who report a stronger social norm. For respondents with weaker egoistic values, hedonic values are the next predictor: stronger hedonic values (i.e., score higher than or equal to 4.8) are associated with a lower gas consumption than weaker hedonic values. For households living in detached or semi-detached houses, income is the next best predictor of gas use: households with an income lower than 4500 euro use less gas than households with an income of 4500 euro or higher. For households with a lower income, egoistic values are the next best predictor of gas use: stronger egoistic values (i.e., score higher than or equal to 2.1) are associated with a higher gas consumption than weaker egoistic values. For households with a higher income, the building year of the house is the next best predictor: households living in a house built between 1971 and 2000 have a higher gas consumption than households living in a house built in 2001 or later.
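The phrase "next best predictor" used throughout this walk-through can be illustrated with a minimal sketch of how a regression tree picks a split. The sketch assumes a CART-style criterion (minimising the summed squared error of the two child nodes); this section does not state which splitting criterion was actually used. The data, the function names and the restriction to numeric predictors are all hypothetical and serve illustration only.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_threshold(xs, ys):
    """Return (threshold, total_sse) for the split 'x < threshold' with lowest SSE.

    The threshold is None when no split improves on the unsplit node.
    """
    best = (None, sse(ys))
    for t in sorted(set(xs))[1:]:  # skip the smallest value: left child would be empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best


def best_predictor(rows, predictors, target):
    """Return (name, threshold) of the predictor whose best split has the lowest SSE."""
    ys = [r[target] for r in rows]
    results = {
        name: best_threshold([r[name] for r in rows], ys) for name in predictors
    }
    name = min(results, key=lambda k: results[k][1])
    return name, results[name][0]


# Made-up toy households (NOT the study data): transformed gas use rises with size.
toy_rows = [
    {"house_size": 90, "hedonic": 4.0, "gas": 9.0},
    {"house_size": 120, "hedonic": 2.5, "gas": 10.0},
    {"house_size": 140, "hedonic": 5.0, "gas": 9.5},
    {"house_size": 160, "hedonic": 3.0, "gas": 14.0},
    {"house_size": 200, "hedonic": 4.5, "gas": 15.0},
    {"house_size": 250, "hedonic": 2.0, "gas": 14.5},
]

print(best_predictor(toy_rows, ["house_size", "hedonic"], "gas"))
```

Applying the same search recursively within each child node yields the branch-specific predictors described above, which is why different variables can appear in different branches.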
Notably, the households with the highest gas consumption (mean transformed gas use of 15.62 m³) live in a house of 160 square meter or more, their monthly gross income is less than 12,000 euro, they are full-time employed, self-employed, seeking work, student, house-wife/man or retired, their house is built between 1940 and 1970 or later and they are part-time employed, self-employed, seeking work, student or have other types of employment; they represent 2% of the households in the sample. Households with the second highest gas consumption (mean transformed gas use of 15.48 m³) live in a house of 160 square meter or more, their monthly gross income is more than 12,000 euro, and they have a weaker environmental self-identity (i.e., score lower than 5.5); they represent 4% of the households in the sample. The group of households with the lowest gas consumption (mean transformed gas use of 8.9 m³) live in a house that is 160 square meter or larger, their income is less than 12,000 euro, they are full-time employed, self-employed, seeking work, student, house-wife/man or retired, their house is built before 1940 and they believe their energy provider does not strongly endorse corporate environmental responsibility (i.e., score lower than 4.8); only 1% of the households of the sample ended up in this branch. Appendix A provides a full overview of the decision rules derived from the obtained decision tree, resulting in different classes of gas consumption.
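The decision rules for the large-house branch (160 square meter or more) can be read as nested conditions. The sketch below encodes only what the text reports: the thresholds and the three leaf means (15.62, 15.48 and 8.9). Leaves whose mean is not given here return None. The function name, dictionary keys and category labels are hypothetical, and the treatment of an income of exactly 12,000 euro is an assumption, since the text does not say which side that boundary falls on.

```python
# Employment groups as described in the text for the large-house, lower-income branch.
FIRST_GROUP = {
    "full-time", "self-employed", "seeking work",
    "student", "house-wife/man", "retired",
}
LOWER_USE_IN_NEWER_HOUSES = {"full-time", "house-wife/man", "retired"}


def large_house_leaf(h):
    """Trace a household (dict) through the branch for houses of 160 m2 or more.

    Returns (description, mean transformed gas use), with None for the mean
    when this section does not report it.
    """
    if h["income"] > 12000:  # assumption: exactly 12,000 counts as lower income
        if h["env_self_identity"] < 5.5:
            return ("higher income, weaker environmental self-identity", 15.48)
        return ("higher income, stronger environmental self-identity", None)

    if h["employment"] in FIRST_GROUP:
        if h["built_before_1940"]:
            if h["cer"] < 4.8:  # perceived corporate environmental responsibility
                return ("lower income, pre-1940 house, weaker perceived CER", 8.9)
            return ("lower income, pre-1940 house, stronger perceived CER", None)
        if h["employment"] in LOWER_USE_IN_NEWER_HOUSES:
            return ("lower income, newer house, lower-use employment group", None)
        return ("lower income, newer house, higher-use employment group", 15.62)

    # Remaining respondents: part-time employed or other types of employment.
    if h["hedonic"] >= 3.3:
        return ("lower income, part-time/other, stronger hedonic values", None)
    return ("lower income, part-time/other, weaker hedonic values", None)


print(large_house_leaf({"income": 15000, "env_self_identity": 5.0}))
```

Writing the rules this way makes the interaction structure explicit: environmental self-identity only matters on the higher-income path, and perceived corporate environmental responsibility only on the lower-income, pre-1940 path.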
The results further showed that for the training set the MAE and RMSE were 1.99 and 2.52, respectively. The MAE for the test set was 2.1, while the RMSE was 2.76. As the MAE and RMSE are very similar for the training set and the test set, it can be concluded that the resulting decision tree model is accurate [45].
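As a minimal sketch of the two error measures, the code below implements MAE and RMSE in their standard form and computes the relative gap between the training and test errors reported above. The helper name relative_gap is hypothetical and not part of the original analysis.

```python
import math


def mae(y_obs, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / len(y_obs)


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


def relative_gap(train_error, test_error):
    """Increase of the test error relative to the training error."""
    return (test_error - train_error) / train_error


# Error values as reported in the text (transformed gas use scale).
print(f"MAE gap:  {relative_gap(1.99, 2.1):.1%}")
print(f"RMSE gap: {relative_gap(2.52, 2.76):.1%}")
```

With the reported values, the test errors exceed the training errors by roughly 5.5% (MAE) and 9.5% (RMSE).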
Finally, a multiple regression analysis was conducted for the same data set and these results were compared with the decision tree model reported above. The regression model explained 20% of the variance in cube root-transformed household gas consumption (adjusted R² = 14.7%, F(31, 455) = 3.707, p < .001); residence type and income were the strongest predictors. Table 5 shows that the regression analysis revealed that households living in terraced houses use less gas than households living in a detached house, while higher income groups use more gas. A larger number of rooms in the residence and a weaker environmental self-identity were related to a higher gas consumption too. Self-employed respondents use more gas than those who are employed full-time, while those with a postgraduate qualification use less gas than those with a lower secondary school degree. The results are partly similar to the results of the decision tree, with residence type, income, employment status and environmental self-identity being significant predictors in both models. Number of rooms and educational degree were only significant predictors of gas use in the regression analysis. Yet, the decision tree model yielded a more comprehensive picture of relevant predictors of household gas use, and also identified house size, building age, egoistic values, hedonic values, corporate environmental responsibility and social norm as relevant predictors.
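The relation between the reported R², adjusted R² and F statistic can be checked with the standard formulas. The sketch below assumes the model includes an intercept, so that the F degrees of freedom (31, 455) imply n = 455 + 31 + 1 = 487 observations; this n is inferred here and is not stated in the text. Because the reported R² of 20% is rounded, the recomputed values only approximate the reported 14.7% and 3.707.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F statistic of the regression, with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


p = 31                   # numerator degrees of freedom reported in the text
residual_df = 455        # denominator degrees of freedom reported in the text
n = residual_df + p + 1  # assumption: intercept included, giving n = 487

print(round(adjusted_r2(0.20, n, p), 3))  # about 0.145, close to the reported 14.7%
print(round(f_statistic(0.20, n, p), 2))  # about 3.67, close to the reported 3.707
```

The gap between R² and adjusted R² reflects the penalty for using 31 predictors with a sample of this size.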

Discussion
This paper examined whether the gas consumption of Dutch households could be explained by building characteristics, socio-demographic variables, household behaviour and psychological variables using a novel method, a decision tree model. Extending previous research that typically studied the predictive power of these categories separately [6-8,10,14,19,49-51], the results showed that building characteristics, socio-demographic and psychological variables are all significant and unique predictors of household gas consumption. This is an important and novel finding that clearly signals that an integrated approach is needed to better understand household gas consumption, as taking into account only one type of predictor will provide a limited understanding of household gas consumption.
In terms of building characteristics, house size in particular, and to a lesser degree residence type and building age, were important predictors of household gas consumption. These findings are in line with earlier studies that revealed that these building characteristics are related to gas consumption [5-8,10,12]. House size was the best predictor of household gas consumption. Not surprisingly, larger dwellings (i.e., 160 square meter or more) are more likely to have a higher gas consumption. Furthermore, households living in terraced houses use less gas than households living in detached or semi-detached houses. These findings are in line with previous studies that have found that those living in detached and semi-detached houses used more gas than those living in other types of dwellings [5,6]. This can probably be explained by the fact that detached and semi-detached houses have more external walls and a larger outside wall area, and therefore lose more heat, resulting in a higher gas consumption. Building age was the least important building characteristic predicting gas use. Specifically, households living in larger houses, with a lower income, of which the respondent is either full-time employed, self-employed, seeking work, student, house-wife/man or retired, and who live in an older house use less gas than the same group living in a newer house. However, households living in a smaller, detached or semi-detached house, with a higher income and living in a newer house use less gas than those living in an older house. Interestingly, number of rooms in the dwelling was not a significant predictor of household gas consumption. Importantly, extending previous research, the results show that these building characteristics uniquely predict household gas consumption when all other variables are included in the model as well.
Of the socio-demographic variables, income and employment status were the main predictors of household gas consumption. Income was a relatively important predictor of gas consumption for households living in a larger house (i.e., 160 square meter or larger): households living in larger houses with a higher income use more gas, suggesting that people with a higher income particularly use more gas when they also live in larger dwellings. These findings are in line with previous studies [11,13,14], but extend them by showing that income is also related to household gas consumption when the other predictors are included in the model too. Interestingly, respondent age, household type and education level did not significantly contribute to the explanation of household gas consumption when the other variables were controlled for.
The most notable and novel result of this study is that psychological factors play an important role in explaining household gas consumption. Specifically, egoistic values, hedonic values, environmental self-identity, perceived corporate environmental responsibility of the energy provider and social norm to reduce energy use were all uniquely and significantly related to household gas consumption. Interestingly, households living in larger dwellings, with a higher income and with a weaker environmental self-identity are more likely to be in the category with the second highest level of gas consumption. It may be that people living in larger houses with a higher income are less likely to see themselves as the type of person who engages in pro-environmental actions, which may imply that they are less motivated to act pro-environmentally, which could explain their higher gas use. Moreover, people living in a smaller, detached or semi-detached house, with a lower income and stronger egoistic values are more likely to use more gas than the same group with weaker egoistic values. This is in line with earlier studies that revealed that egoistic values are often negatively related to pro-environmental behaviours [25]. However, households living in a smaller, terraced house, who are either full-time employed, part-time employed, retired, house-wife/man or seeking work and have stronger egoistic values use less gas than the same group with weaker egoistic values. This is contrary to what would be expected. It may be that this finding is specific to the subsample reflected in this branch of the decision tree. Another explanation could be that respondents in this branch of the decision tree cannot afford to use more gas, or are forced by the circumstances to use less gas.
Future research is needed to explore the negative relation of egoistic values with household gas consumption and why some groups with stronger egoistic values use less gas.
Interestingly, this study showed that perceived corporate environmental responsibility of the utility company can explain household gas consumption. This is a novel and interesting finding indicating that households living in larger houses, with a lower income, who are either full-time employed, self-employed, seeking work, student, house-wife/man or retired, living in a house that is built before 1940 and think their energy provider does not strongly endorse corporate environmental responsibility have the lowest gas consumption. Yet, only a very small group of households (1% of the sample) falls into this branch. This finding is contrary to the expectations, as a negative relation was expected between corporate environmental responsibility and gas consumption, with a stronger corporate environmental responsibility being associated with using less gas. To explore this further, the bivariate correlation between corporate environmental responsibility and gas use was inspected for the whole training set; it revealed a similar positive relation between corporate environmental responsibility and gas consumption. Future research is needed to examine why a stronger perception of corporate environmental responsibility of the energy provider is related to using more gas.

Table 5 Results for the multiple regression analysis of the transformed total actual gas use (m³) of Dutch households, including building characteristics, socio-demographic variables, gas use behaviour and psychological factors as predictor variables. *p < .05; **p < .01; ***p < .001.
In contrast to the expectations, it was found that people living in a smaller, terraced house, who are either full-time employed, part-time employed, retired, house-wife/man or seeking work, with stronger egoistic values and a weaker social norm to save energy use less gas than the same group with a stronger social norm to save energy. Besides, people living in a larger house, with a lower income, who are part-time employed or have other types of employment and with stronger hedonic values use less gas than the same group with weaker hedonic values. Moreover, households living in a smaller, terraced house, who are either full-time employed, part-time employed, retired, house-wife/man or seeking work, with weaker egoistic values and with stronger hedonic values use less gas than the same group with weaker hedonic values. Future research could explore why these phenomena occur.
In sum, extending previous research, these results indicate that psychological variables uniquely explain household gas consumption, next to building characteristics and socio-demographics. These findings are interesting and in line with earlier studies that reveal that psychological factors play an important role in explaining household energy use [52], and extend this research by showing that psychological factors are also important to understand household gas use.
Interestingly, indoor temperature settings during day time and night time did not appear in the decision tree, and are thus not uniquely associated with household gas consumption. This may be explained by earlier findings showing that room temperature settings can be predicted by socio-demographic variables, environmental values and building characteristics [53]. These factors were also identified as predictors of household gas consumption in the present study. Thus, perhaps for this reason, indoor temperature settings during day time and night time do not explain unique variance in household gas consumption. Table 6 provides a summary overview of the factors that appeared to be significantly related to household gas consumption.
This study aimed at exploring which factors are associated with actual household gas consumption using a decision tree model for selecting key determinants. The results show that the decision tree is an appropriate method for exploratory analysis, as it detects important variables related to household gas consumption and visualises the relationships between different predictor variables. One of the assets of the decision tree method is that it identifies possible interaction effects between the predictors. This study revealed several novel and interesting interactions between variables predicting gas use, such as households living in (semi-)detached houses having a higher gas consumption when they live in larger dwellings. Likewise, people with a higher income seem to particularly use more gas when they also live in larger dwellings.
Furthermore, the decision tree classified households on the basis of their total actual gas consumption, and revealed that different numbers of predictors were needed for identifying different classes of households differing in gas consumption. Indeed, fewer predictors were needed for explaining household gas consumption in some branches, whereas in other branches more predictors were needed to explain gas use. Therefore, with specific combinations of these predictors, an accurate insight is gained into actual household gas consumption, which would not be attained through a standard regression model. The decision tree model provided novel insights about interactions between predictors, and identified different predictors for different classes of gas consumption. Notably, the decision tree model identified more relevant predictors of household gas use than the multiple regression analysis did. Specifically, the multiple regression analysis including the same predictors and dependent variable revealed that the most important predictors of household gas consumption were residence type and income, while number of rooms in the residence, educational degree, employment status and environmental self-identity were also significantly but weakly related to household gas consumption. Hence, the decision tree model yielded more comprehensive and sophisticated insights into relevant predictors of household gas use than the multiple regression analysis, by also identifying house size, building age, egoistic values, hedonic values, corporate environmental responsibility and social norm as relevant predictors (while number of rooms and educational degree were significant predictors in the multiple regression analysis, but not in the decision tree model). Interestingly, the decision tree model particularly identified more psychological variables as relevant predictors of gas use than the regression model did.
The decision tree model thus provided more nuanced and richer insights into which variables could best be targeted to encourage households to reduce their gas use and therefore become more sustainable.
Although this study included a wider range of factors that may explain household gas use than most earlier studies on household gas use, other factors may still be relevant to understand household gas use. Future studies could examine to what extent other building characteristics (e.g. level of insulation), household characteristics (e.g. amount of time household members are present in the home), psychological variables (e.g. concern about climate change), and occupant behaviour (e.g. number of heated rooms) would be uniquely related to household gas consumption. Furthermore, the sample of this study only included Dutch households that are clients of a specific energy company (i.e., Qurrent). Future research could examine whether the results of this study would be replicated among other samples, including general population samples. In this study, household gas use data were collected via smart meter readings, which are a far more accurate assessment of gas consumption than the self-reports that are oftentimes used in research. As such, the findings reported in this paper provide relevant practitioners, such as utility companies and consumers, with better insights into predictors of actual household gas consumption.
The results of this study have important practical implications, and can support relevant stakeholders, including governments, energy suppliers and companies, in developing interventions that are aimed at reducing household gas use. The results suggest that policy aimed at reducing household gas consumption can best target building characteristics, socio-demographic variables and psychological factors. Specifically, interventions could particularly consider house size, residence type, building age, income, employment status, egoistic values, hedonic values, environmental self-identity, corporate environmental responsibility and social norm, as these appeared to be the main factors that are related to household gas consumption. Particularly, interventions could try to change these predictors, or target groups having the relevant characteristics. For example, information can be provided about the extent to which others find reductions in gas consumption important or that many others try to reduce their gas use, or environmental self-identity can be strengthened, for example by making people aware of their previous sustainable actions [54]. Alternatively, interventions could best target high income groups or people living in large or (semi-)detached houses, as they have a relatively higher gas consumption. As such, the knowledge gained from the analyses reported in this paper supports energy policy making by relevant agents aimed at reducing gas consumption in important ways, resulting in lower greenhouse gas emissions. Hence, the information and implications derived from this study provide essential insights into the design of energy policy that can promote reductions in gas consumption, as policy strategies are more effective when they target key antecedents of gas use.

Conclusions
In this paper, a novel approach, a decision tree method, was used to explain household gas consumption. The results show that household gas consumption was uniquely related to building characteristics, socio-demographics and psychological factors. Specifically, house size, residence type, and building age were the main building characteristics that are related to household gas use. Income and employment status were the main socio-demographic variables related to gas use. Notably, egoistic values, hedonic values, environmental self-identity, corporate environmental responsibility and social norm were the main psychological variables that are uniquely related to gas consumption. Hence, in order to get a comprehensive understanding of gas use, it is important to consider building characteristics, socio-demographic variables and psychological factors, as they all predict unique variance in household gas use.

Funding information
The research in this project is funded by the Netherlands Enterprise Agency, as part of the TKI Urban Energy project 'ENPREGA', grant number TEGB113027. We report data from the PENNY project (see http://www.penny-project.eu/) that was funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 723791.