Do energy efficiency policies save energy? A new approach based on energy policy indicators (in the EU Member States)



Introduction
The recent IPCC Special Report on 1.5 °C has highlighted the role of end-use energy efficiency in climate mitigation aimed at limiting the temperature increase to 1.5 °C by the end of the century (IPCC, 2018; Bertoldi, 2018). In addition to climate change mitigation, energy efficiency makes an important contribution to security of energy supply and to increasing business competitiveness and citizen welfare (Fawcett and Killip, 2019; Mallaburn and Eyre, 2014). During the last three decades, the European Union (EU) and its Member States (MSs) have introduced policies to reduce energy demand and improve energy efficiency (Bertoldi, 2018). In 2007 the EU adopted the EU Climate and Energy targets for 2020. The targets include a GHG emission reduction target of 20% compared to 1990, a renewable energy target of 20% and an energy consumption reduction target of 20% (Carvalho, 2012; Streimikiene et al., 2012). In 2014 the EU adopted Energy and Climate targets for 2030, as follows: a 40% reduction in greenhouse gas emissions compared to 1990 levels; at least a 27% share of renewable energy consumption; and at least a 27% improvement in energy efficiency (deLlano-Paz et al., 2016).
In recent years, the EU has intensified its efforts to improve energy efficiency. The Energy Services Directive (ESD, 2006/32/EC) introduced an indicative end-use efficiency target of at least 9% by 2016 for EU MSs (Apajalahti et al., 2015; Streimikiene et al., 2012; Hull et al., 2009; Thomas et al., 2012; Boonekamp, 2011). Under the ESD, MSs were obliged to prepare National Energy Efficiency Action Plans (NEEAPs) every three years, starting from 2008 (Ringel and Knodt, 2018; Bertoldi and Economidou, 2018). The NEEAPs provided an overview of energy efficiency activities in each MS, including descriptions of national energy efficiency measures and quantification of achieved and forecast energy savings (Bertoldi and Economidou, 2018; Hull et al., 2009).
To reinforce the progress in energy efficiency, the Energy Efficiency Directive (EED, 2012/27/EU) was adopted in December 2012 (Rosenow et al., 2017). The EED contains a set of binding measures such as: legal obligations to establish energy saving schemes in MSs (Malinauskaite et al., 2019; Fawcett et al., 2019), a requirement for the public sector to lead by example, energy audits (Nabitz and Hirzel, 2019), energy services (Bertoldi and Boza-Kiss, 2017), energy efficiency funds, efficient CHP, metering and billing information (Zangheri et al., 2019), consumer behavior, etc.
Other EU energy efficiency policies are: the Energy Performance of Buildings Directive (EPBD, 2010/31/EU), 6 which is the main EU legislative instrument for reducing energy consumption in new and existing buildings (Dascalaki et al., 2012; Burman et al., 2014); the Eco-design Directive (2009/125/EC) to improve efficiency in energy-related products (e.g. domestic appliances) (Bundgaard et al., 2017; Hinchliffe and Akkerman, 2017); the Energy Labelling Regulation (2010/30/EU) for products (Bjerregaard and Møller, 2019); and the Regulations for the Reduction of CO2 Emissions of vehicles (443/2009/EC and 510/2011/EC) (Thiel et al., 2016). It is important to note that the above EU policies are complemented by MS policies, as described in the NEEAPs (2008, 2011 and 2014) and in the MURE database (see www.measures-odyssee-mure.eu). These policies include: taxation, financial incentives, regulation, voluntary programmes for industry, information campaigns, etc. (Bertoldi and Economidou, 2018).
The purpose of this article is to estimate, with an econometric model, the effect of energy efficiency policies on energy consumption in the EU MSs and Norway in the period 1990-2013. The aim of the model is to answer two research questions:
1. Are EU and national energy efficiency policies effective in reducing aggregate energy consumption? Can we estimate the policy-induced energy savings in each year from 1990 to 2013, measured as a percentage of the energy consumption that would have occurred in the absence of energy policies?
2. Are sector-specific (household, services, industry, transport) energy efficiency policies effective in reducing each sector's energy consumption? Can we measure the effectiveness of energy policies in reducing energy consumption in each sector?
The article is structured as follows. Section 2 reviews the existing literature on methods for measuring the energy savings resulting from energy efficiency policies. Section 3 illustrates the methodology for constructing the Energy Policy Indicators. Section 4 outlines the structure of the econometric models. Section 5 illustrates the estimates and the simulation methodology used to isolate the effect of energy policy from the contribution of the other determinants of energy demand, and reports the results of the simulations. Conclusions are drawn in Section 6, including some directions for further research. An Appendix provides details on the database created for this article.

Measuring energy savings induced by policies: a literature review
A key issue in energy efficiency policy analysis is the measurement of energy savings attributed to policies, in order to evaluate the impact of policies (Boonekamp, 2006). The ESD introduced the obligation for MSs to measure energy savings induced by policies (Hull et al., 2009). The European EMEEES project identified two complementary methodologies for evaluating the savings: the bottom-up (BU) and the top-down (TD) approaches (Thomas et al., 2012; Reichl and Kollmann, 2010, 2011; Bukarica and Tomsic, 2017). The BU approach assesses the energy savings in each individual project covered by the policy and then sums the individual savings (Boonekamp, 2006, 2011; Thomas et al., 2012). BU methods do not adequately capture behavioural changes, which may increase or decrease the calculated energy savings (Reichl and Kollmann, 2011). TD methods use an aggregate measure of energy consumption, normalized by an exogenous variable that adjusts for scale across cross-section observations (e.g. kWh/m2), usually derived from national statistical data (Boonekamp, 2011; Thomas et al., 2012). To calculate the energy savings, the aggregate measure is multiplied by the activity level (e.g. total floor area in m2) in different years. TD methods include all the policies covering the sector/equipment, the autonomous effects (e.g. technological improvements not induced by specific policies) and structural effects (e.g. changes in activity) (Boonekamp, 2011). TD methods therefore capture all savings, and correcting them to isolate only the policy-induced savings is difficult (Horowitz, 2008).
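As a minimal sketch of the top-down calculation described above, the snippet below computes savings as the change in a normalized intensity indicator relative to a base year, multiplied by the activity level of the later year. The function name, the base-year convention and all numbers are illustrative assumptions, not values taken from the EMEEES project or from national statistics.

```python
def top_down_savings(intensity_base, intensity_year, activity_year):
    """Top-down savings: change in intensity times current activity.

    intensity_* are in kWh per m2, activity_year is floor area in m2.
    The result (in kWh) mixes policy-induced, autonomous and structural
    effects, which is exactly the limitation noted in the text.
    """
    return (intensity_base - intensity_year) * activity_year


# Hypothetical figures, for illustration only.
savings_kwh = top_down_savings(
    intensity_base=200.0,        # kWh/m2 in the base year (assumed)
    intensity_year=180.0,        # kWh/m2 in the evaluation year (assumed)
    activity_year=1_000_000.0,   # m2 of floor area (assumed)
)
print(savings_kwh)
```

Because the intensity change is observed at aggregate level, the figure returned here cannot by itself be attributed to policies.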
Researchers have proposed econometric models as an alternative that overcomes the limitations of the BU and TD methods (Horowitz, 2011; Filippini and Hunt, 2012; Lang and Siler, 2013). The objective of econometric models is to separate the energy savings induced by policies from the effects of other factors such as economic growth, structural changes, population, production levels, energy prices, etc. Filippini and Hunt (2012) estimated the efficiency of US residential energy consumption by using an econometric energy demand model with a stochastic frontier function. In another article, Filippini et al. (2014) assessed the level of energy efficiency of the EU residential sector against the potential for energy savings and estimated the impact of policies. Laes et al. (2018) reviewed the effectiveness of individual policies or policy packages for CO2 emission reduction and/or energy savings by using a panel econometric model. Aydin and Brounen (2019) assessed the impact of specific policies on electricity and non-electricity energy consumption by focusing on two types of regulatory measures: mandatory energy efficiency labels for household appliances and building standards.
A report by the European Commission JRC (Bertoldi and Hirl, 2013) calculated the policy-induced energy savings using the counterfactual simulation approach proposed in Horowitz (2011). This approach divides the observed time span into a "pre-policy period", where policies are essentially absent, and a "policy period", characterized by the existence of relevant policies; the difference between the forecasted energy demand and the actual energy demand is regarded as the (estimated) savings induced by policy (Horowitz, 2011). Horowitz and Bertoldi (2015) introduced an explicit measure of energy policy as an explanatory variable in the econometric model. In order to evaluate policy effectiveness, the estimated model is analyzed through simulation techniques to isolate the contribution of energy policy from the impact of other determinants (prices, level of activity, technology, etc.).
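The counterfactual logic of the pre-policy/policy split can be sketched as follows. This is a deliberately simplified illustration, assuming a single driver of demand and an ordinary least squares line fitted on the pre-policy years only; the helper names and the data are hypothetical and do not reproduce the specification used by Horowitz (2011) or Bertoldi and Hirl (2013).

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (single regressor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def counterfactual_savings(driver, demand, n_pre):
    """Fit on the first n_pre (pre-policy) years, forecast the remaining
    (policy) years, and return forecast minus actual demand per year."""
    a, b = fit_line(driver[:n_pre], demand[:n_pre])
    forecast = [a + b * xi for xi in driver[n_pre:]]
    return [f - actual for f, actual in zip(forecast, demand[n_pre:])]


# Hypothetical data: three pre-policy years followed by three policy years.
driver = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # e.g. an activity index
demand = [12.0, 14.0, 16.0, 17.0, 18.5, 20.0]  # observed energy demand
savings = counterfactual_savings(driver, demand, n_pre=3)
print(savings)
```

A positive value means that actual demand fell below the no-policy forecast in that year; the attribution to policy rests entirely on the assumption that the pre-policy relationship would otherwise have persisted.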
In this paper we introduce an explicit measure of energy policy, based on the MURE database. The idea of introducing this type of variable in the econometric analysis of the effectiveness of energy policies is not new (Bigano et al., 2011; Saussay et al., 2012; Filippini et al., 2014). In those papers, the adoption of policy measures is essentially measured through dummies. Building on those seminal contributions, Ó Broin et al. (2015) proposed a methodology to construct time series indexes, which increase as more policies are introduced and decrease as policies become obsolete.
The approach in this article shares with Horowitz (2011), Bertoldi and Hirl (2013) and Horowitz and Bertoldi (2015) the idea of using a panel econometric model to evaluate policy effectiveness. While the energy policy indicator used in Horowitz and Bertoldi (2015) is based on a methodology for transforming the ODEX […]

6 The EPBD was recently amended; the amendment entered into force on 9 July 2018.

The energy policy indicators
The purpose of this article is to analyze the effectiveness of energy policies by developing econometric models for energy demand where an indicator of energy policy intensity is introduced as a focal explanatory variable, along with control variables.
All the previous studies in this research line (Bigano et al., 2011; Saussay et al., 2012; Filippini et al., 2014; Ó Broin et al., 2015) are based on the MURE database. Ó Broin et al. (2015) discussed how their indicator relates to previous research: the differences between our indicator and theirs are highlighted below. The MURE database classifies about 2000 energy policy measures adopted by European countries since 1970, reporting the year of adoption and some stylized characteristics (type of measure, expected impact, …), along with a description of each measure. The database is structured by final energy consumption sectors (household, tertiary, industry, transport). In the MURE database, each measure is also classified according to a semi-quantitative impact indicator based on experts' judgement: low, medium or high impact (see below). In the next three subsections, we define several Energy Policy Indicators which will be used in our econometric models, and provide some descriptive statistics.

Base indicator (unweighted, permanent)
The basic version of our indicators simply counts the measures.
These indicators are denoted by $pol^{U,P}_{h,i,t}$, where U stands for "unweighted", P stands for "permanent", and h indicates the sector. Assuming that country i has adopted $N_{h,i}$ measures in sector h over the period $t = 1, \dots, T$, let us number them in increasing order of adoption date by $n = 1, \dots, N_{h,i}$, and define:

• $a^n_{h,i}$: year of adoption of the n-th policy measure in sector h, country i
• $s^n_{h,i,t}$: dummy indicating whether the n-th policy measure in sector h, country i started in year t (i.e. whether $a^n_{h,i} = t$)

Our proposed energy policy indicator is then given by

$$pol^{U,P}_{h,i,t} = \sum_{\tau=1}^{t} \sum_{n=1}^{N_{h,i}} s^n_{h,i,\tau}$$

To understand the formula, notice that the first difference, $\Delta pol^{U,P}_{h,i,t} = \sum_{n=1}^{N_{h,i}} s^n_{h,i,t}$, is a simple measure of the energy policy effort in country i, year t, sector h, obtained by counting the measures implemented in that country, that year, that sector.
The policy indicator $pol^{U,P}_{h,i,t}$ simply cumulates the measures over time. It is important to remark that the implicit assumption underlying the idea of cumulating is that energy policy measures have a permanent effect, so that $pol^{U,P}_{h,i,t}$ is a non-decreasing function of time. This aspect of the indicator is discussed in more detail below.
The methodology has been implemented for all 29 countries, using the energy policy measures listed in the MURE database, from 1975 to 2013. In order to provide some evidence about general tendencies in the EU, we illustrate below some descriptive statistics based on averages of $pol^{U,P}_{h,i,t}$ taken over all 29 countries. Fig. 1 illustrates the dynamics of the cross-country average of the sectoral energy policy indicators in the period 1990-2013.
The household sector has the highest number of energy policy measures: the final value means that, on average, European countries adopted 18 energy policy measures in the household sector between 1975 and 2013. Notice that the series equals about 1 in 1990 and 18 in 2013, meaning that about 1 measure was adopted (on average) between 1975 and 1990, while 17 measures were adopted between 1990 and 2013: this is why the econometric analysis carried out in Section 5 starts from 1990.
In Table 1, we provide a summary of energy policy intensity in each country considering the entire period 1975-2013, given by the last value of $pol^{U,P}_{h,i,t}$. In practice, the numbers in the table correspond to the total number of measures in each country and each sector.

Weighting by policy relevance
In the base indicator, measures are simply counted. This does not mean that we have assumed that all measures have the same effect, but rather that our purpose is to estimate the average effect, without focusing on individual measures. However, given a reliable evaluation of the relevance of each measure, we can create a weighted version of the Energy Policy Index in which such information is embodied. If the weighting system is correct, the weighted indicator should outperform the unweighted one in terms of explanatory power (i.e. goodness of fit) in the econometric model. Let us define the weights as follows.

• $w^n_{h,i}$: weight of the n-th policy measure in sector h, country i

The weighted policy indicator may be obtained as

$$pol^{W,P}_{h,i,t} = \sum_{\tau=1}^{t} \sum_{n=1}^{N_{h,i}} w^n_{h,i}\, s^n_{h,i,\tau}$$

Of course, when $w^n_{h,i} = 1$ for all n, h and i, the weighted indicator collapses into the unweighted one. If instead the weights differ, then different policies contribute differently to the policy indicator. Ideally each policy might have its own weight, although in practice it is more reasonable to group policies into categories of equal weight according to their relevance. 7 Although alternative weighting systems are possible, in this paper we will analyze the weighting scheme adopted by MURE.

P. Bertoldi and R. Mosconi

As mentioned above, MURE provides a semi-quantitative evaluation of the impact of each measure, based on quantitative evaluations or expert estimates; the following limits (in each case in % of the overall final energy consumption of a sector) are defined for the three impact levels: low = less than 0.1%, medium = 0.1%-0.5% and high = greater than 0.5% savings. We have therefore considered $w^n_{h,i} = 0.05\%$, or 0.0005, for all measures whose semi-quantitative impact is low (the same weight is also given to the measures whose semi-quantitative impact is unknown), $w^n_{h,i} = 0.3\%$, or 0.003, for all measures whose semi-quantitative impact is medium, and $w^n_{h,i} = 0.7\%$, or 0.007, for all measures whose semi-quantitative impact is high. In practice, using this weighting scheme, the weighted indicator may be interpreted as the percentage decrease in energy consumption expected to be achieved by the policy measures, according to MURE's impact evaluation, as compared to the energy consumption the sector would have experienced in the absence of policies. We leave for future analysis different alternatives, where, for example, a different weight is given to different "types" of measures, which are instead equally weighted here. 8

It is important to remark that the weighting scheme may alter the "within" and "between" variability of the EP indicators substantially. As an illustration of this point, Fig. 2 compares the Energy Policy Indicators $pol^{U,P}_{h,i,t}$ and $pol^{W,P}_{h,i,t}$ for Italy and Germany for the household sector. The time series are illustrated from 1990 to 2013, even though they are computed starting from 1975: notice that in Germany energy policies started before 1990, which is why the German series do not start at zero. Using the unweighted indicator (left), where the measures are simply counted and cumulated over time, the number of measures adopted up to 2013 in the two countries is not so different (22 in Italy, 27 in Germany). Conversely, based on the weighted indicator (right), where the MURE semi-quantitative impact indicator is used for weighting the measures, energy policy in the household sector in Germany in 2013 seems to be about three times more relevant than in Italy. The difference is due to the fact that the impact of most of the policies adopted in Italy before 2005 is classified by MURE as low, so that their weight is very close to zero, whereas the impact of most of the policies in Germany is classified as high. To clarify the metric on the vertical axis of the right plot of Fig. 2, it should be noted that using the "MURE inspired" weights, the weight is 0.05% for all measures whose semi-quantitative impact is low. Similarly, the weight is 0.3% for all measures whose semi-quantitative impact is medium, and 0.7% for all measures whose semi-quantitative impact is high. Cumulating the weights of all 27 measures adopted in Germany in the household sector by 2013, one gets 11%, whereas the 22 measures adopted in Italy sum to 3.5%.
In other words, trusting MURE's assessment of the impact of the policies, and assuming that policies have a permanent effect on energy consumption, one might argue based on the right plot of Fig. 2 that energy consumption in the household sector in Germany would have been 11% higher in 2013 in the absence of policies (3.5% higher in Italy). Our idea is not to use the indicator this way, but rather to introduce it as an explanatory variable in an econometric model where energy consumption is the dependent variable (see Sections 4 and 5). For example, finding a coefficient of zero would imply that energy policies are not effective, whereas a negative and significant coefficient with an absolute value different from 1 would imply that energy policies are effective, but that MURE underrates or overrates their impact on energy consumption.

Permanent versus transitory effects
As remarked above, in the $pol^{U,P}_{h,i,t}$ indicator, policy measures are assumed to have a permanent effect, since they are counted in every year from MURE's "Starting date" onwards, even after the "Ending date". This is one of the differences with respect to Ó Broin et al. (2015), where the measures are assumed to have a transitory effect, since their "policy counter" decreases when a measure is discontinued. To compare the approaches, let us define.

Table 1
Value of the Energy Policy Indicator in each sector and each country in 2013. + = high intensity (more than 1.5 times the average), − = low intensity (less than 0.5 times the average).

7 Notice that giving all policies the same non-unitary weight (for example 0.5 instead of 1) would only multiply the energy policy indicator by a constant, with no consequence other than dividing the corresponding coefficient by the same constant when the indicator is used within a linear econometric model. Therefore, it is the heterogeneity of weights, rather than their absolute scale, that makes a difference.
normative, information, …). In Bertoldi and Mosconi (2015), the energy policy indexes illustrated here are further disaggregated into five sub-indexes, measuring the intensity of each "type", but we do not present this here since the econometric models illustrated in the following do not use these sub-indexes. Notice that the classification of policies by type is very detailed in the MURE database, so different aggregations of MURE "types" could be considered. For example, Ó Broin et al. (2015), building on Yearwood-Travezan et al. (2013), categorize the MURE types into three groups, according to their "degree of authoritative force" (in decreasing order of authoritative force: Regulatory, Financial, Informative). We refer to the MURE database for a more detailed illustration of the classification. We are currently exploring the usefulness of sub-indexes in an ongoing research project focused on the household sector.
In practice, $pol^{U,T}_{h,i,t}$ excludes the ended policies and represents the count of the measures in force in year t. It therefore embodies the implicit assumption that when a measure ceases, energy consumption immediately returns to the level it had before the measure started: the effect of energy policies is assumed to be transitory, whereas $pol^{U,P}_{h,i,t}$ assumes that a measure has a permanent effect even after it is discontinued.
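The four variants of the indicator (unweighted or weighted, permanent or transitory) can be sketched in a few lines. The `Measure` record, the `policy_indicator` function and the sample measures below are hypothetical illustrations rather than the authors' implementation or actual MURE entries; the weights are the "MURE inspired" values quoted in the text, and the sketch assumes that a measure remains in force up to and including its ending year.

```python
from dataclasses import dataclass
from typing import Optional

# Weights quoted in the text for the MURE semi-quantitative impact classes.
MURE_WEIGHTS = {"low": 0.0005, "unknown": 0.0005, "medium": 0.003, "high": 0.007}


@dataclass
class Measure:
    start: int                  # year of adoption
    end: Optional[int] = None   # last year in force (assumed inclusive); None if active
    impact: str = "unknown"     # MURE impact class


def policy_indicator(measures, year, weighted=False, permanent=True):
    """Value of the indicator in `year` for one country and one sector.

    permanent=True counts every measure adopted up to `year`;
    permanent=False drops measures whose ending year is before `year`.
    """
    total = 0.0
    for m in measures:
        if m.start > year:
            continue  # not yet adopted
        if not permanent and m.end is not None and m.end < year:
            continue  # discontinued: excluded from the transitory variant
        total += MURE_WEIGHTS[m.impact] if weighted else 1.0
    return total


# Hypothetical measures for one country and sector.
measures = [
    Measure(start=1995, impact="high"),
    Measure(start=2000, end=2004, impact="low"),
    Measure(start=2010, impact="medium"),
]
print(policy_indicator(measures, 2013))                   # unweighted, permanent
print(policy_indicator(measures, 2013, permanent=False))  # unweighted, transitory
print(policy_indicator(measures, 2013, weighted=True))    # weighted, permanent
```

In this toy example the permanent count in 2013 is 3 while the transitory count is 2, because the measure that ended in 2004 is dropped from the latter.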
Perhaps neither of the extreme assumptions is always completely valid, and different policies may have different levels of persistence. For example, grants or subsidies for the installation of heat pumps might not last forever, but by the time they are discontinued some old and inefficient equipment has been replaced, so that energy consumption has been reduced. One might argue that the new equipment will also eventually become obsolete, and that in the absence of incentives consumers might switch back to the old and inefficient technology. However, this rebound might be partly or completely offset since, while incentives existed, the increased demand for efficient products might have reduced the price of the new efficient technologies, so that they remain competitive even when grants or subsidies are discontinued. Similarly, if an information campaign is successful in modifying consumers' behavior, it is not obvious that people will return to previous habits when the campaign ends.
It is important to remark that different assumptions about the degree of persistence may alter the "within" and "between" variability of the EP indicators substantially. As an illustration of this point, Fig. 3 compares the Energy Policy Indicators $pol^{U,P}_{h,i,t}$ and $pol^{U,T}_{h,i,t}$ for Finland and the Netherlands for the household sector. According to MURE, 10 out of the 25 measures adopted in the Netherlands have been discontinued, mainly between 2000 and 2005, whereas almost all of the 23 measures adopted in Finland were still active in 2013. As a result, the ranking of the two countries in 2013 differs depending on which of the two versions of the indicator is used.
In this article we will only analyze the two polar cases of infinite persistence and no persistence, to check whether either of the two alternatives outperforms the other in terms of goodness of fit in the econometric model: a detailed analysis of the issue is left for further research.

The econometric model: structure of the model and methodological aspects
Previous studies using explicit measures of energy policy intensity have mainly focused on the household (residential) sector; see Bigano et al. (2011), Saussay et al. (2012), Filippini et al. (2014) and Ó Broin et al. (2015). In this article, in order to get a more general picture of the effectiveness of energy policies across the whole economy, we consider a model with 4 equations, one per sector (Households, Services, Industry, Transport, labeled by h = 1, …, 4 respectively). Our approach allows us to measure the magnitude of policy-induced energy savings in each sector and hence, by aggregation, for the whole economy.
Below we illustrate the variables introduced in each equation, the mathematical structure of the econometric models, and some important methodological issues leading to the choice of the appropriate estimator.

The variables involved in the model
The dependent variable in each equation is (the natural logarithm of) $q^3_{h,i,t}$, i.e. the total energy consumption in sector h, country i, year t, measured in TJ, where the superscript 3 means that we consider the total consumption for 3 energy sources, namely electricity, oil and gas. Details of this variable are given in the Appendix along with the data sources and construction methodology. As illustrated in the Appendix, the average coverage of $q^3_{h,i,t}$ in 1990-2013 is more than 80% in most countries, but in a few countries it is as low as 50% (especially in former planned economies and Nordic countries). Moreover, the share of other sources has changed dramatically over the period considered in our study (usually decreasing). Of course, countries where the consumption of energy from other sources was initially large, and then decreased, have experienced a corresponding growth in consumption of oil, gas and electricity, not accounted for by other covariates. To overcome this problem, we have introduced in the model the control variable $other^3_{h,i,t}$; see below for a discussion.
As for the other explanatory variables, a short description is provided in Table 2, along with the expected sign in each sectoral equation.Details on the sources and the methodology are provided in the appendix.
The focal explanatory variable in each equation of the model is the sectoral Energy Policy Indicator, constructed along the lines illustrated in Section 3. We have considered alternative versions of the EP indicators in the model, as discussed in Subsections 3.1, 3.2 and 3.3. It is important to remark that the indicator introduced in the equation for each sector h is based exclusively on the measures that are relevant for the sector. Another key variable when the model is used to analyze the effect of energy policies is the lagged dependent variable, whose role and interpretation are discussed in Subsections 4.2, 4.3, 5.2 and 5.3. The other variables introduced in the model play the role of control variables, essentially chosen among the classical variables introduced in energy demand studies. These variables are used in the model essentially to avoid omitted variables bias potentially arising from correlation with the focal variable, i.e. the energy policy indicator. We are therefore not primarily interested in estimating or interpreting their coefficients, although it is clearly reassuring that the sign and magnitude of the coefficients are in line with theoretical expectations and the typical findings of other studies.

For the residential sector, an extensive literature review and a comprehensive list of the variables may be found, for example, in Tsemekidi Tzeiranaki et al. (2019): the typical variables are sectoral energy prices, socio-economic indicators (e.g. population, income, etc.), climate, and more specific sectoral variables such as dwelling characteristics and household attributes. For the industry sector, models are based on production theory; a review of the econometric literature on energy demand with the list of explanatory variables may be found in Adeyemi and Hunt (2007) and Khayyat (2015). The key variables are the level of activity and the sectoral price of energy; most studies add some indicator of capital formation (investment) and the price of the other production factors (capital and labour). For the services sector, we did not find many econometric studies on energy demand, although given the nature of the sector, one can draw inspiration from the studies focusing on industry (since the economic theory of production is central to both sectors) and to some extent on households (given the relevance of climate for both sectors). Finally, for the transport sector, a rather complete list of variables may be found, for example, in Gupta et al. (2019). The variables in this sector are sectoral energy prices, GDP and population, along with sector-specific variables measuring the amount of transported passengers and freight. A linear or quadratic trend is usually introduced in all sectors to account for hard-to-measure variables which can be assumed to be smooth functions of time (technology, habits, …).

Fig. 2. Energy Policy Indicators in Germany and Italy, comparing $pol^{U,P}_{h,i,t}$ (left) and $pol^{W,P}_{h,i,t}$ (right).

Structure of the econometric model
The structure of the four equations is identical, namely:

The variables are introduced and discussed in the previous subsection (see Table 2). $SSV_{h,i,t}$ is a vector of sector-specific variables. Specifically, as seen in Table 2, for sector 1 (household) SSV includes: $\ln(area^{s1}_{i,t})$, $percequip^{s1}_{i,t}$ and $\ln(rcons^{s1}_{i,t})$. For sector 2 (services) SSV includes: $\ln(rva^{s2}_{i,t})$ and $\ln(empl^{s2}_{i,t})$. For sector 3 (industry) SSV includes: $\ln(rva^{s3}_{i,t})$ and $\ln(rginv^{s3}_{i,t})$. Finally, for sector 4 (transport) SSV includes: $\ln(cars^{s4}_{i,t})$ and $\ln(goods^{s4}_{i,t})$.
Each equation has a country effect $\beta^h_{0,i}$, accounting for country-specific unobservable time-invariant characteristics. This country effect is not handled by using fixed or random effects, but rather by differencing, using the Arellano-Bond estimator (see below).
The dependent variable $q^3_{h,i,t}$ is log-transformed to account for heteroscedasticity (clearly visible in preliminary estimates where the variable was not log-transformed). We also introduced the lagged dependent variable $\ln(q^3_{h,i,t-1})$ as a right-hand-side variable in the model. This is an important difference with respect to previous papers introducing energy policy indicators in panel data models for EU countries; in fact, most of the aforementioned literature has used static models, mainly the fixed effects model or frontier techniques. Preliminary analysis carried out with our data using static models has shown clear evidence of residual autocorrelation, and this is indeed one of the motivations for considering dynamic models. However, the role of the lagged dependent variable is not only to obtain uncorrelated residuals. It also represents the idea that the adjustment of energy consumption to changes in the policies (and in the other variables) is not instantaneous, but takes time. Dynamic panel models were proposed by Balestra and Nerlove (1966) in the framework of gas demand analysis. They developed a panel model for total gas demand, where the key equation is

$$G_{it} = G^*_{it} + (1 - r)\, G_{i,t-1}$$

Here $G_{it}$ represents the current total demand in year t and country i, which is decomposed into new demand, $G^*_{it}$, originated by new appliances, plus old demand, which is a proportion of the previous period's demand, r being the depreciation rate for gas-using appliances (r is therefore assumed to be a real number between zero and one, so that $\rho = 1 - r$ is also between zero and one). New demand $G^*_{it}$ is then modelled as a linear function of a suitable vector of covariates x. The presence of the lagged dependent variable induces inertia in the adjustment process, so that a permanent change in the x's takes time to transfer entirely to G, since only a proportion r of the appliances is replaced every year. A similar argument can be applied to buildings or, more generally, to habits. For more details and further references on the interpretation of the adjustment process in dynamic panel models see, for example, Greene (2012), Section 11.11.3, and Baltagi (2005), Chapter 8.

In order to estimate the possible effect of energy policies on energy consumption using the model, the parameter of interest is $\gamma_h$. This parameter is expected to be negative. The parameter $\rho_h$ also plays an important role, discussed in Subsection 5.2, where the adjustment process implied by (5) is analyzed through a simulation exercise.
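The inertia induced by the lagged dependent variable can be illustrated with a small simulation. The sketch below assumes a stylized partial adjustment equation y_t = rho * y_{t-1} + beta * x_t with all other terms held fixed; the function names and the parameter values (beta = -0.2, rho = 0.5) are hypothetical and are not estimates from this article. Under this assumption a permanent unit increase in x moves y by beta in the first year and by beta / (1 - rho) in the long run.

```python
def adjustment_path(beta, rho, periods):
    """Response of y to a permanent unit increase in x starting at t = 0,
    in y_t = rho * y_{t-1} + beta * x_t, starting from y = 0."""
    path, y = [], 0.0
    for _ in range(periods):
        y = rho * y + beta  # x stays at 1 in every period
        path.append(y)
    return path


def long_run_effect(beta, rho):
    """Limit of the adjustment path, valid for 0 <= rho < 1."""
    return beta / (1.0 - rho)


# Hypothetical values: beta plays the role of the policy coefficient,
# rho the role of the coefficient on the lagged dependent variable.
path = adjustment_path(beta=-0.2, rho=0.5, periods=50)
print(path[:4])
print(long_run_effect(beta=-0.2, rho=0.5))
```

With these illustrative values, half of the remaining gap to the long-run effect is closed each year, which is why the impact of a policy is spread over several years rather than appearing at once.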

Econometric issues and estimation technique
Evaluating the effect of policies on macro aggregates is problematic (see, for example, Athey and Imbens, 2017), since the policy variable is potentially endogenous, i.e. correlated with the error term. In our framework, endogeneity might originate from omitted variables possibly affecting both energy policy and energy demand, and from reverse causality (policies might depend on past energy consumption). In our opinion simultaneity, which is another potential source of endogeneity bias, is less of a concern here. Using the right inferential tool is therefore crucial in order to avoid biased estimates, which would only identify correlations rather than causal effects. The paradigm of the statistical analysis of causal links, namely randomized controlled experiments, is not viable in macro settings (Cooper, 2018). Other tools like regression discontinuity design are also essentially designed for evaluating policy effectiveness using micro data. Conversely, the difference-in-differences method can be used in macroeconomic analysis (Giavazzi and Tabellini, 2005; Papaioannou and Siourounis, 2004; Persson, 2005; Persson and Tabellini, 2006; Rodrik and Wacziarg, 2005). This method has been used for assessing the impact of specific carbon and energy policies (Filippini and Zhang, 2019; Meng et al., 2017; Lin and Zhu, 2019). However, difference-in-differences assumes that the policy intervention may be represented as a binary variable (on/off), which is not the case here.
Therefore, the appropriate tool in our case is the Instrumental Variables (IV) estimator, whose importance for causal inference is illustrated, for example, in Angrist and Pischke (2009), Section 4.1. Before introducing IV in our case, let us first remark that in general, and even more so in macro settings, an appropriate and complete set of control variables is the best antidote against omitted variable bias. The panel setting is to some extent helpful in this sense, since unobserved heterogeneity potentially correlated with the policy variable can be dealt with either by fixed effects or by differencing: therefore, any problem originating from omitted time-fixed variables is neutralized. However, introducing appropriate time-varying control variables is extremely important. In the present research, we have included all the variables which are usually introduced in the energy demand literature. We also added two control variables that are not so usual, namely the lagged dependent variable (i.e. energy consumption in the focal sector in the previous year), and the share of other sources in total energy consumption. We believe that including these two variables is extremely important, since:
(i) they might have an impact on energy policy, which can be expected to be more intense in a given year, everything else being fixed, if energy consumption in the previous year increased (due to reverse causality) and if the share of other sources increased (since our energy policy indicator is sector specific, but not source specific);
(ii) they also might have an impact on the current consumption of energy: like most aggregate economic variables, energy consumption is likely to adjust slowly, and is therefore likely to be correlated with its own past; moreover, everything else being fixed, consumption of a given group of energy sources is likely to be instantaneously negatively correlated with the share of other sources.
Excluding these variables from the analysis might therefore potentially bias the coefficient of energy policy.
10 Ó Broin et al. (2015) also address the issue of the delay needed for a policy measure to induce a reduction in energy consumption. However, their approach is different, since they introduce a lag in the energy policy indicator (up to 7 years) in their econometric model, selecting the best fitting lag based on testing. Their approach implies that the policy has absolutely no effect before the selected lag, after which it suddenly becomes fully effective. Conversely, with our approach we assume a gradual effect, starting in the same year in which the policy is implemented: our empirical analysis suggests that this is very effective in capturing the "slow adjustment" features.

Even when the set of control variables is generous, it is still important to test the null hypothesis ($H_0$) that the policy variable is exogenous (i.e. uncorrelated with the error term), versus the alternative ($H_1$) that it is endogenous (i.e. correlated with the error term). This can be done through the t version of the Hausman specification test (see Hausman, 1978; Wooldridge, 2002, p. 290). The test is based on the difference between the two estimates, $\hat{q} = \hat\gamma^h_1 - \hat\gamma^h_0$, where $\hat\gamma^h_0$ is obtained using the policy indicators as instruments for themselves, and is therefore efficient under $H_0$ but inconsistent under $H_1$, while $\hat\gamma^h_1$ is obtained by introducing suitable instrumental variables for the policy indicators, and therefore, provided that the instruments are appropriate, it is a consistent estimator under both $H_0$ and $H_1$, although inefficient under $H_0$. If no misspecification is present, the probability limit of $\hat{q}$ is zero: hence, when the two estimates are similar, the evidence is in favour of $H_0$, and the practical implication is that $\hat\gamma^h_0$ is consistent and efficient, and therefore its causal interpretation makes sense. Conversely, if the two estimates are too different, we reject $H_0$, and the practical implication is that only $\hat\gamma^h_1$ is consistent and can therefore be given a causal interpretation. It is important to remark that this procedure rests upon an "appropriate" choice of the instrumental variables (see Angrist and Pischke, 2009); instruments should be:
(i) correlated with the potentially endogenous regressor (i.e. the energy policy indicator), otherwise we say that the instruments are "weak", and IV estimates will be very unstable;
(ii) uncorrelated with any omitted variable in the equation of interest (this assumption is sometimes called the exclusion restriction or orthogonality condition), otherwise the IV estimates will be biased.

Following Keane (2010), an appropriate set of instruments can be determined based on an economic interpretation of the determinants of energy policy. In fact, instruments may be selected among the determinants of energy policy (this would avoid weak instruments) which economic theory would exclude from the focal equation (so that the exclusion restriction is fulfilled). Energy policies have multiple objectives, not just energy savings (see for example Haydt et al., 2014). The goals of energy policy include: reducing dependence on imported energy, preserving natural resources and minimizing environmental impacts (reducing CO2 emissions and possible climate change), reducing dependence on non-renewable sources, diversifying sources to reduce dependence on suppliers, increasing national production of energy, and improving efficiency. Based on data availability, we have therefore introduced as instruments an indicator of energy dependence and an indicator of greenhouse gas emissions, both evaluated in the previous year, $ed_{i,t-1}$ and $ghge_{i,t-1}$ (see the appendix for details on these variables). One might argue that energy consumption in a given year might be correlated with energy dependence and greenhouse gas emissions in the previous year: however, it is reasonable to assume that such dependence vanishes once energy consumption in the previous year is controlled for, as we do.
Taking into account the previous argument, in this article all equations have been estimated separately using the Arellano and Bond (1991) estimator for dynamic panel models, in short AB;¹¹ the AB estimator is specifically designed to deal with the econometric problems arising from the presence of the lagged dependent variable. In the panel data framework this estimation technique has several advantages over alternative candidates (see for example Wintoki et al., 2012):
- Since AB is based on differencing, it is not affected by bias due to omitted country effects. In other words, (time-fixed) omitted country characteristics influencing both energy policy and energy demand are not expected to create any bias. In fact, estimators based on differencing, exactly like the standard fixed effects estimator, make use of the within variability only, neglecting the between variability which is the source of bias in this case.
- If the model includes the lagged dependent variable to account for partial adjustment, which is our case, GMM-based estimators using "internal" instruments (i.e. lagged differences of the dependent variable), such as AB, are preferable to OLS-based estimators, such as fixed effects. In fact, fixed effects estimates of the lagged dependent variable parameter would be biased towards zero (even if the bias vanishes as T goes to infinity, it can still be substantial for T = 23 as in our case).¹²
- We strongly believe that estimators based on static models, such as CCEMG (Pesaran, 2006) or AMG (Eberhardt and Teal, 2010), should be avoided in this case: as clearly illustrated in Wooldridge (2002), Roodman (2008) and Wintoki et al. (2012), when some of the regressors are potentially influenced by past values of the dependent variable (reverse causality), the lagged dependent variable also plays the role of a control variable, and excluding it might severely bias the coefficient of the policy indicator. This is likely to be the case for the policy variable, which is the focal variable in our analysis.
- Although possibly omitted time-fixed determinants do not create any problem, since they are dealt with by differencing, the standard version of AB has the problem of not being robust to omitted time-varying variables possibly correlated with the policy variables (the motivation which stands behind CCEMG and AMG). However, we remark that, on top of the "GMM-style" instruments used for the lagged dependent variable, the Stata command xtabond2 allows the introduction of ordinary "IV-style" instrumental variables, to avoid inconsistency of the parameter estimates for the variables suspected of endogeneity. It is therefore possible to perform the Hausman specification test as illustrated above.
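As an illustration of the t version of the Hausman test described in this subsection, the following minimal Python sketch computes the statistic from two estimates of the policy coefficient and their standard errors. The function names and the numerical inputs in the usage line are hypothetical and are not taken from the tables of this paper. The sketch assumes the usual asymptotic result that, under the null, the variance of the difference equals the difference of the variances, and it returns no statistic when that estimated difference is not positive.

```python
import math


def two_sided_normal_p(t_stat):
    """Two-sided p-value of a statistic under the standard normal distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def hausman_t(gamma_iv, se_iv, gamma_exog, se_exog):
    """t version of the Hausman test for a single coefficient.

    gamma_exog, se_exog: estimate and standard error when the regressor is
        treated as exogenous (efficient under H0).
    gamma_iv, se_iv: estimate and standard error when the regressor is
        instrumented (consistent under both H0 and H1).
    Returns (t, p), or None when the estimated variance of the difference
    is not positive, so that the statistic cannot be computed.
    """
    var_diff = se_iv ** 2 - se_exog ** 2
    if var_diff <= 0.0:
        return None
    t_stat = (gamma_iv - gamma_exog) / math.sqrt(var_diff)
    return t_stat, two_sided_normal_p(t_stat)


# Hypothetical inputs, for illustration only.
result = hausman_t(gamma_iv=-0.0021, se_iv=0.0020,
                   gamma_exog=-0.0017, se_exog=0.0012)
print(result)
```

A small absolute value of the statistic (large p-value) is evidence in favour of exogeneity. As a plausibility check, `two_sided_normal_p(2.15)` gives a p-value of roughly 0.03.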

Parameter estimates and evaluation of energy policy effectiveness
In this Section we illustrate the estimates of the parameters in model (5), and discuss how the estimated dynamic econometric model can be used to evaluate the effectiveness of energy policy in each Sector and to estimate the total energy saving induced by energy policy in each country and in Europe over the period 1990-2013.

Parameter estimates
The AB estimates of model (5), using the unweighted permanent Energy Policy Indicator $pol^{U,P}_{h,i,t}$, are summarized in Table 3.¹³ The table also reports the results of the usual misspecification tests: the robust $m_2$ statistic (see Arellano and Bond, 1991) supports the assumption of serially uncorrelated errors in (5) for Industry and Transport, whereas for Household and Services a moderate correlation is detected (although lack of correlation is accepted at the 1% significance level). The robust Hansen J statistics (Hansen, 1982) confirm that the instruments, when jointly analyzed for over-identifying orthogonality restrictions, may be regarded as exogenous. Moreover, the difference-in-Hansen statistic,¹⁴ which is designed to test the null hypothesis that the supposedly exogenous regressors are uncorrelated with the error term, accepts the null.

11 For a concise illustration of the Arellano-Bond estimator, the assumptions behind this estimation technique, and how the estimator addresses the endogeneity problems see, for example, Greene (2012), Section 11.8.3. Computation has been performed using xtabond2 in Stata® 15, see Roodman (2009).
12 It is sometimes claimed that AB has been developed for large N and short T and not for moderate N and large T as in the data set used in this paper, and that therefore AB would not be ideal in this setting. However, the only problem of AB with large T is that the number of instruments is quadratic in T, so that, when T is large, the default number of instruments might be extremely large, possibly overfitting the endogenous regressors. However, xtabond2 (Roodman, 2009) allows for the specification of the particular lags to be included in estimation, rather than relying on the default strategy of including them all: therefore, large T is not a problem in our case.
13 We have used the "difference GMM" estimator, introducing all available lagged differences of $\Delta\ln(q^3_{h,i,t})$ as instruments for the lagged dependent variable (the results actually do not change if we use just a few lags as instruments). All other regressors have been treated as exogenous, an assumption which seems to be reasonable in the light of the results of the "Difference-in-Hansen statistic", whose null hypothesis is that the variables which are assumed to be exogenous are in fact uncorrelated with the error term (see below for more discussion on this point). The "robust" option of xtabond2 has been used to get standard errors that are robust to heteroskedasticity and arbitrary patterns of autocorrelation within individuals.
Before discussing the estimates of the key parameters, let us briefly observe that the coefficients of the control variables are all correctly signed, with the only exception of the negative sign on $\ln(pop_{i,t})$ and $\ln(rgdp_{i,t})$ in the household equation, although these negative parameters are compensated by the positive parameter on the other "scale" variable $\ln(rcons^{s1}_{i,t})$. This suggests that the scale effect in the household sector is small. In the other sectors the scale effect (essentially measured by the sum of all the parameters of the scale-related variables) is higher, especially for services and transport. The price elasticities are always negative, ranging from -0.054 (transport) to -0.40 (services). These figures might, at first sight, appear low as compared to other studies, but it has to be remarked that $\beta^h_1$ is the impact elasticity, whereas the long-run elasticity is given by $\beta^h_1/(1-\rho^h)$ (see the discussion in Subsection 5.2), and it is therefore almost twice as large, which is in line with usual results. The sector-specific variables for services and industry are correctly signed but far from significant, as is floor area for household. The number of cars also appears insignificant, with a wrong sign and very small magnitude: we also tried to include passenger-km (source Enerdata) as an alternative way to account for passenger transport, but the result is similar; conversely, freight transport seems much more important and significant.
Focusing on the Energy Policy Indicator, the coefficient is negative in all four equations, which is consistent with effectiveness of energy efficiency policies. Before discussing these estimates in detail, as anticipated in Section 4.3, we wish to test for exogeneity of the Energy Policy Indicators by re-estimating the model treating this variable as endogenous, and therefore introducing suitable instrumental variables. As illustrated in Section 4.3, we use as instruments an indicator of energy dependence and an indicator of greenhouse gas emissions, both evaluated in the previous year, $ed_{i,t-1}$ and $ghge_{i,t-1}$ (see the appendix for details). Table 4 illustrates the result of the Hausman t-test (see Section 4.3 for details and references on the test); the estimates when $pol^{U,P}_{h,i,t}$ is treated as endogenous (and therefore instrumented) are very similar to the estimates reported in Table 3, and the Hausman test does not reject exogeneity:¹⁵ therefore, in the following we will consider $pol^{U,P}_{h,i,t}$ as exogenous, and analyze the estimates in Table 3, since they are theoretically more efficient.
The estimated coefficient for the energy policy indicator is negatively signed, as expected, in all sectors, although with different magnitude and significance. In particular, it is insignificant for Services (p-value 0.846), almost significant for Household (p-value 0.162), and weakly significant for Industry and Transport (p-values 0.058 and 0.048 respectively). Low significance was expected, since clearly the current version of our energy policy indicator is a poorly measured proxy of the "true" policy effort. However, the fact that the sign of the coefficient is in line with expectations is very encouraging. As discussed in Section 3, the quality of the indicators can be improved, and our preliminary results in a follow-up study limited to the household sector and a subset of countries seem to suggest that a better proxy increases the magnitude and significance of the estimated coefficient, without altering its sign. Moreover, as discussed in Wasserstein and Lazar (2016), a p-value > 0.05 does not imply that the null hypothesis is true, and "scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold" (see also McShane and Gal (2017), Amrhein et al. (2019), and most of the contributions to the 2019 special issue of The American Statistician "Statistical Inference in the 21st Century: A World Beyond p < 0.05").
We also checked whether the alternative versions of the Energy Policy Indicators discussed in Section 3, namely $pol^{W,P}_{h,i,t}$ and $pol^{U,T}_{h,i,t}$, improve the fit of the model. Due to space constraints we do not report the results (available upon request), but essentially the performance of the two alternative indexes is quite similar in terms of sign, magnitude and significance as compared to the base version of the indicator.¹⁶

Interpretation of the policy coefficients
The EP indicators have a negative sign, which supports the effectiveness of energy policies in all sectors except services, where the coefficient is very close to zero and insignificant. To interpret the magnitude of the coefficients, notice that the EP indicators in this version of the model are the unweighted ones. Therefore, a unit increase in $pol_{h,i,t}$ corresponds to the adoption of a single policy measure. In the household sector, the estimated coefficient $\hat\gamma^1 = -0.0017$ implies that when $pol_{1,i,t}$ is increased by 1 (i.e. a new measure is adopted in the household sector), the logarithm of energy consumption is reduced on average by 0.0017, which implies that energy consumption is reduced by about 1.7 per thousand in the same year (significant at about the 80% confidence level). $\hat\gamma^1$ is sometimes referred to as the short-run elasticity. Due to the autoregressive component, the change in $pol_{1,i,t}$ induces a dynamic adjustment leading, in the long run, to a change in the logarithm of energy consumption in the household sector equal to $\hat\gamma^1/(1-\hat\rho^1) = -0.00354$, i.e. a reduction of about 3.5 per thousand, so that on average it takes about 3 measures, and some time, to reduce consumption by 1%. A similar argument may be made for each sector, leading to the short- and long-run saving associated with one (typical) measure, reported in Table 5.
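The arithmetic behind the household figures can be checked with a short Python sketch. It uses only the two values quoted in the text (the impact coefficient -0.0017 and the long-run effect -0.00354); the autoregressive parameter is not quoted directly, so the value computed below is merely the one implied by these two numbers, and the variable names are illustrative.

```python
import math

# Values quoted in the text for the household sector.
gamma_short_run = -0.0017   # impact effect of one additional measure (log points)
gamma_long_run = -0.00354   # long-run effect, gamma / (1 - rho)

# Autoregressive parameter implied by the two figures above (not quoted in the text).
rho_implied = 1.0 - gamma_short_run / gamma_long_run

# Number of measures whose long-run effect lowers consumption by 1%,
# i.e. changes log consumption by ln(0.99).
measures_for_one_percent = math.log(0.99) / gamma_long_run

print(round(rho_implied, 2), round(measures_for_one_percent, 2))
```

The implied autoregressive parameter is close to one half, which is consistent with long-run effects being roughly twice the impact effects, and the number of measures needed for a 1% long-run reduction is slightly below three.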
The estimates suggest a much stronger effectiveness of energy policies adopted in the industrial sector, where the percentage saving associated with each measure is estimated to be, in the long run, about 2%.

14 This test is also called the C statistic in the literature, see Baum et al. (2003) and Roodman (2009). As Roodman (2009) points out, "the Hansen test should not be relied upon too faithfully, because it is prone to weakness … the test actually grows weaker the more moment conditions there are". Since we have as many as 253 instruments, we tried to severely reduce the instrument count by reducing the number of lags of the differenced dependent variable to be used as instruments: the results remain essentially the same.
15 The Hausman test could not be computed for the Industry sector, since in that case the standard error associated with the difference of the estimates turns out to be negative (even using the option sigmamore in the Stata command hausman). This circumstance is usually referred to as a "Heywood case". This is usually interpreted as evidence in favour of $H_0$. However, following Schreiber (2008), who is sceptical about this recommendation, we also considered the absolute value of the statistic, which is equal to 2.15 (p-value 0.03): this would marginally reject $H_0$ for the Industry sector. Notice however that the estimates based on Instrumental Variables for the Industry sector would indicate that the impact of the policy indicator is higher and more significant than what emerges from Table 3, therefore our conclusion would be unchanged.

Estimating policy-induced energy saving in each country and in Europe
Estimation of the policy-induced energy saving in each country in 1990-2013 may be obtained through a simulation exercise.¹⁷ To isolate the effect of the policy variable, we consider the estimated counterpart of equation (5), dropping all other regressors, the intercept and the error term. This gives the following simple equation, which is all we need to discuss the dynamic effect of $pol_{h,i,t}$ when the other variables are fixed:

$$\ln q^3_{h,i,t} = \hat\rho^h \ln q^3_{h,i,t-1} + \hat\gamma^h\, pol^{U,P}_{h,i,t} \qquad (6)$$

Based on (6) and assuming $0 < \hat\rho^h < 1$, it can be easily seen that when $pol^{U,P}_{h,i,t-1}$ takes on some value, and […]; then, the entire historical path of the policy variable from 1990 to 2013 is given as an input, and the response of energy consumption is measured and analyzed. Assuming that the future values of all regressors, as well as the error term, are not influenced by the current value of the energy policy (otherwise neglecting them would not be appropriate), the final step of the simulation is a measure of the energy saving in 2013 induced by all energy policies introduced from the beginning to 2013. Moreover, the simulation provides a measure, in each year, of the percentage energy saving induced by energy policies adopted up to that year. This approach is like a negative image of the counterfactual simulation approach adopted, for example, in Horowitz (2011): there, the model is estimated without the policy variable using the pre-policy period, and then the energy policy is set to zero in the simulated period, leaving the other variables at their historical level.
Conversely, here we estimate the model using the entire period (we do not need a "policy free" period for estimation), and then we simulate the entire period as if the other variables are fixed, allowing only the policy variable to change.
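The simulation logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recursion is a first-order partial adjustment in which the policy-induced change in log consumption depends on its own lag and on the current value of the policy indicator; the starting value is zero (no policy effect before the first year); the coefficient values are the household impact coefficient quoted in the text and an autoregressive parameter of 0.52 implied by the quoted long-run effect; and the policy path is hypothetical, not taken from the MURE database.

```python
import math


def simulate_policy_effect(pol_path, gamma, rho):
    """Policy-induced change in log consumption, year by year.

    Assumed recursion: y_t = rho * y_{t-1} + gamma * pol_t, with y = 0
    before the first simulated year and all other regressors held fixed.
    """
    effects = []
    previous = 0.0
    for pol in pol_path:
        current = rho * previous + gamma * pol
        effects.append(current)
        previous = current
    return effects


gamma_hat = -0.0017  # household impact coefficient quoted in the text
rho_hat = 0.52       # assumption: value implied by the quoted long-run effect

# Hypothetical path: the number of measures in force grows by 2 per year, 1990-2013.
pol_path = [2 * (year - 1989) for year in range(1990, 2014)]

log_effects = simulate_policy_effect(pol_path, gamma_hat, rho_hat)
final_log_effect = log_effects[-1]
# Exact percentage saving relative to the no-policy path.
final_pct_saving = 100.0 * (1.0 - math.exp(final_log_effect))
print(final_log_effect, final_pct_saving)
```

If the indicator is held constant at one measure, the simulated effect converges to the long-run value, i.e. the impact coefficient divided by one minus the autoregressive parameter, which is the property used in the interpretation of the coefficients.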
As an example, Table 6 provides the details of the dynamic simulation exercise for the Household sector in France, Germany and Italy; the simulations are then illustrated in Fig. 4. The point estimates suggest that the consumption of energy in the household sector in France in 2013, due to all policies adopted since the nineties, was 15.9% less than what it would have been in the absence of policies. In Germany, the percentage of energy savings was 9.4% in 2013, whereas in Italy it was 7.0%. Of course, confidence intervals around these figures would be useful, but the computation is not easy, and we leave it for future research. The differences between France, Germany and Italy are due to the fact that, according to the MURE database, the number of measures adopted in the household sector differs across the three countries (47 in France, 27 in Germany and 22 in Italy).
It may be worth noticing that these figures are likely to slightly underestimate the effect of policies. It is well known that measurement error in the independent variables determines an "attenuation bias", i.e. a bias towards zero, in the estimates (see for example Fuller, 1987), and clearly the Energy Policy indicators proposed here suffer from measurement errors. Investing in the indicators to improve their quality might reduce the problem.
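The attenuation argument can be illustrated with a small simulation in Python. The sketch is deliberately simpler than the dynamic panel model used in this paper: it considers a single regressor in a static regression, with classical (independent, zero-mean) measurement error, and all numerical values are hypothetical. Under these assumptions the OLS slope on the mismeasured regressor shrinks towards zero by the factor var(x) / (var(x) + var(error)).

```python
import random


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


rng = random.Random(0)   # fixed seed, for reproducibility
true_slope = -0.5        # hypothetical "true" effect
n_obs = 20000

x_true = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
y = [true_slope * x + rng.gauss(0.0, 0.5) for x in x_true]
# Observed regressor = true regressor + classical measurement error (variance 1).
x_observed = [x + rng.gauss(0.0, 1.0) for x in x_true]

slope_without_error = ols_slope(x_true, y)
slope_with_error = ols_slope(x_observed, y)
print(slope_without_error, slope_with_error)
```

With equal variances of the true regressor and of the measurement error, the expected shrinkage factor is one half, so the estimated slope on the mismeasured regressor should be close to half the true one in absolute value while keeping the correct sign.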
Table 7 reports the final value of the simulation for each country and each sector, i.e. the estimated percentage policy-induced energy savings in 2013 (we have changed the sign from negative to positive for readability).By multiplying the percentage savings by the actual consumption in 2013 one gets the absolute saving (in TJ), which may then be aggregated across countries to obtain the total saving in 2013 for EU28 plus Norway, and across sectors, to obtain the estimated savings in each country for all sectors.
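The aggregation step just described can be sketched as follows in Python. The function name and the consumption figures are hypothetical and serve only to show the computation; the percentage savings used in the example are the household figures for France, Germany and Italy quoted in the text.

```python
def aggregate_savings(pct_saving, consumption_tj):
    """Aggregate policy-induced savings across countries (or sectors).

    pct_saving: mapping from unit to fractional saving (e.g. 0.159 for 15.9%).
    consumption_tj: mapping from unit to actual consumption in TJ.
    Returns the total absolute saving (TJ) and the aggregate fractional saving.
    """
    total_saving = sum(pct_saving[k] * consumption_tj[k] for k in pct_saving)
    total_consumption = sum(consumption_tj[k] for k in pct_saving)
    return total_saving, total_saving / total_consumption


# Household savings quoted in the text; consumption levels are hypothetical.
household_saving = {"France": 0.159, "Germany": 0.094, "Italy": 0.070}
hypothetical_consumption_tj = {"France": 1.7e6, "Germany": 2.4e6, "Italy": 1.3e6}

total_tj, aggregate_share = aggregate_savings(household_saving,
                                              hypothetical_consumption_tj)
print(total_tj, aggregate_share)
```

The aggregate percentage is a consumption-weighted average of the individual percentages, so it always lies between the smallest and the largest of them.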
The estimated percentage policy-induced energy savings aggregated over EU28 plus Norway are about 10% for Households and Transport, about 20% for Industry, and about zero for Services, which corresponds to 12.1% when we aggregate all sectors, and is equivalent to about 4.9 million TJ. The aggregate figure for the EU is consistent with other studies (Horowitz and Bertoldi, 2015), whereas the comparison among MSs sometimes seems to contradict the common perception of the level of commitment of national governments with respect to energy policies. As we already pointed out, the results reported in Table 7 are mainly related to the number of measures reported in the MURE database for each country, which deserves further investigation. For example, according to the MURE database the average number of energy policy measures adopted by European countries between 1990 and 2013 was 66; Spain is reported to have taken 139 measures while Denmark only 27. The discrepancy is even larger if we focus on "high impact" measures: according to the MURE database, the average number of "high impact" energy policy measures between 1990 and 2013 was 21; Spain is reported to have taken 91 "high impact" measures while Denmark only 3.
This discussion underlines the problem of treating a policy (as defined in MURE) as equivalent to another: how does this affect our model results and thus our simulations reported in Table 7? To answer this, let us first restate that the purpose of our model is not to analyze which policy is more effective: what we try to do is to estimate the average effect of one policy, being aware that individual policies are different, and therefore might have different effects. Clearly, what we mean by "one policy" has to be defined properly, and our paper leaves this burden to MURE and their guidelines (Schlomann et al., 2016): our attempt is to stimulate the discussion on this, pointing out that, quite likely, national MURE teams interpret the guidelines, and the notion of "one measure", differently. Although this is not the ideal situation, we think that the results in Table 7 are interesting, although preliminary. It is worth remarking that systematic differences in the interpretation of the guidelines among countries are to some extent accounted for by the "country effect" $\beta^h_{0,i}$ in equation (5). In other words, countries interpreting the notion of "one measure" more restrictively, hence reporting fewer measures, are likely to have (everything else being fixed) a lower country effect. The opposite is true for countries adopting a more extensive notion of measure. Unfortunately, country effects also take into account other omitted variables, therefore disentangling the specific component of $\beta^h_{0,i}$ associated with country-specific mis-measurement of the policy indicator is difficult (although country-specific heterogeneity of the parameter $\gamma^h$ might be considered in the future). However, assuming that extensive and restrictive interpretations of the notion of measure balance across countries, the estimate of the overall European policy-induced energy savings (12.1%) can be regarded as plausible (although possibly underestimating the real effect due to attenuation), even if the contribution of the individual countries to the overall result has to be taken with caution. Indeed, a more careful analysis of the MURE database, possibly correcting this imbalance using the IEA database, might lead to a better and more reliable energy policy database, resulting in a more precise assessment of the amount of policy-induced energy savings in the individual MSs.

Conclusion and policy implications
In this article, we have developed new econometric models aimed at providing some suggestive evidence of the effect of energy efficiency policies on energy consumption. The models are then used to try to estimate a quantitative measure of the policy-induced energy savings from 1990 to 2013, measured as a percentage of the energy consumption as it would have been in the absence of energy policies.
In order to avoid any bias in the estimates, we have introduced a suitable set of control variables and, following a standard approach in the macro-econometric literature, we have used suitable Instrumental Variables (IV) for the energy policy indicator (an indicator of energy dependence and an indicator of greenhouse gas emissions). IV might fail to deliver unbiased estimates if the instruments are correlated with the error term in the focal equation, and if the instruments are weak: however, proper tests suggest that neither of these problems is likely to affect the proposed model. Therefore, the figures given in Table 7 can be tentatively interpreted as estimates of policy-induced energy saving, confirming suggestive evidence of the effect of policy measures on energy demand. This paper will hopefully stimulate further research providing additional evidence on the issue.
Although the energy policy indicator is very aggregate, being given by the total number of policies adopted by each country in each year, the estimated parameters suggest that the "average" energy policy measure is associated with a non-negligible percentage decrease in energy consumption. In order to assess the effectiveness of different policies we have replaced the basic policy intensity indicator with a weighted version, where the weights are given by the MURE "impact indicator"; however, this did not prove superior to the basic one in estimating energy savings. This highlights that more research is needed to find rigorous and operational impact evaluation criteria. Alternative weighting schemes may be tested (for example based on the "policy type").
In summary, the results are the following:
(i) Energy policies contribute to reducing energy consumption. In the absence of energy policies, consumption in EU28 plus Norway would have been approximately 12% higher in 2013, in line with findings by other authors.
(ii) The energy savings induced by policies seem to be higher in Industry (20.4% savings in 2013 for EU28 plus Norway), intermediate for Household (8.5%) and Transport (11.9%),¹⁸ while for Services the magnitude and significance of the effect seem negligible.
(iii) For most MSs the ranking based on policy-induced energy saving seems in line with previous findings (Filippini et al., 2014), while in some cases the results diverge. The possible explanation is linked to the number of policy measures reported in the MURE database, which for some countries seems too low or too high.
The present research points to some preliminary policy conclusions: the policies adopted by EU MSs have been more effective in the industrial sector (Malinauskaite et al., 2019), composed of a relatively small number of operators and dominated by large energy-intensive plants. Active collaboration between private organizations and public authorities, e.g. voluntary agreements in the Netherlands and Finland (Rezessy and Bertoldi, 2011), and clear competitiveness advantages for companies have contributed to the successful implementation of policies. The service sector results confirm the difficulty of designing effective policies, due to the large number of SMEs in the sector, the low economic benefits for private operators, the landlord-tenant dilemma and the large number of public buildings (Schlomann and Schleich, 2015). In the residential and transport sectors, the analysis confirms the important contribution of policies to the mitigation of energy demand despite the large number and fragmentation of decision makers. However, there are some limitations on the policy implications derived from these preliminary results, given the aggregate nature of the indicators containing all policy measures; for example, the impact of different types of policies (e.g. market-based instruments, regulation, information, etc.) cannot be derived.

18 The average impact of one policy measure in the transport sector is estimated to be higher than in the household sector (see Table 5), but the number of policies is higher in the household sector in most countries (see Table 1): this results in a similar energy saving in the two sectors.
In this article, we have identified possible areas for future research focusing on energy policy indicators and models:
1. Improving Energy Policy Indicators: this is a key issue, since a more reliable and accurate database might reduce the attenuation effect, resulting in a more precise assessment of the amount of policy-induced energy savings in the individual countries. The results obtained in this research are encouraging, since the sign and magnitude of the indicators based on the MURE database "as is" are coherent. Improvements might come from using the information about the type of measure (financial, regulatory, information, etc.).
2. Adapting the model to address further issues: it would be interesting to analyze whether the impact of sector-specific energy efficiency policies changes according to the type of energy source. Ideally, it would be possible to introduce "energy source specific" EP Indicators. With these indicators, it would be possible to estimate source-specific equations (e.g. electricity consumption in the household sector), and within this class of disaggregate models one might also explore the empirical relevance of inter-fuel substitution between coal, oil, gas, and electricity, see Hall (1986), Urga and Walters (2003), Stern (2009). Another interesting aspect to be considered for the industrial sector is the presence of composition effects. The change in energy intensity in the industrial sector is also due to the relative decline of energy-intensive manufacturing. This shift has also increased the use of electricity with respect to oil, coal and gas. Estimated input demand functions at an aggregate level should therefore include an indicator accounting for changes in the weight of energy-intensive industry over time.
3. Improving the econometric methodology: one interesting issue in the dynamic analysis of energy panels, pointed out for example in Madlener et al. (2011), is the non-stationarity of most of the variables involved in energy studies. To exclude possible inconsistency in the estimates and spurious regression problems, it would be important to carry out appropriate unit root and cointegration analysis within the panel framework (see among others Baltagi et al., 2001). It would also be possible to explore possible heterogeneity of the coefficients across countries, as in the case of Greene (2012). The evidence from single-country energy demand studies suggests, for example, that price elasticities might differ substantially across countries. Hsiao and Pesaran (2008) illustrate how panel models may allow for heterogeneous coefficients. Allowing for parameter heterogeneity would enable researchers to explore whether energy policy effectiveness is equal in all countries. Finally, it would be interesting to explore the use of additional or alternative instrumental variables to the ones used in this paper, such as political variables (see for example Datta and Filippini, 2016).
• $q^4_{i,t} = \sum_{h=1}^{4} q^4_{h,i,t}$: total energy consumption in the four sectors mentioned above from the four sources mentioned above in country $i$ and year $t$; it is not total energy consumption in the country, since it excludes other sources and other sectors
• $q^3_{i,t} = \sum_{h=1}^{4} q^3_{h,i,t}$: (excludes solid fuels also)
• $q_{h,i,t}$: total energy demand (all sources) for sector $h$, provided by Eurostat
• $q_{i,t}$: total energy demand (all sources, all sectors), provided by Eurostat
The average coverage of $q^4_{i,t}$ and $q^3_{i,t}$ on total (including all sectors and all sources) energy consumption $q_{i,t}$ in each country is given in Table 8.

Table 8
Average coverage (1990-2013) of $q^4_{i,t}$ and $q^3_{i,t}$ on total energy consumption $q_{i,t}$. "-" = low coverage (less than 70%)
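The coverage figures summarised in Table 8 could be computed along the following lines. This is only a minimal sketch: the function names, the dictionary layout (year to quantity) and the numbers are illustrative assumptions, and it further assumes that the table prints "-" in place of any average coverage below 70%.

```python
from statistics import mean

LOW_COVERAGE_THRESHOLD = 0.70  # Table 8 marks coverage below 70% with "-"


def average_coverage(q_sub, q_total):
    """Average over years of q_sub[year] / q_total[year].

    Both arguments are dicts mapping year -> quantity (hypothetical layout).
    Years with a missing or zero total, or a missing sub-aggregate, are skipped.
    """
    ratios = [
        q_sub[year] / q_total[year]
        for year in q_sub
        if q_sub[year] is not None and q_total.get(year)
    ]
    return mean(ratios) if ratios else None


def coverage_label(coverage):
    """Return "-" for low or unavailable coverage, otherwise a rounded percentage."""
    if coverage is None or coverage < LOW_COVERAGE_THRESHOLD:
        return "-"
    return f"{round(coverage * 100)}%"


# Illustrative numbers only, not taken from the paper.
q4_example = {1990: 800.0, 1991: 820.0}
q_example = {1990: 1000.0, 1991: 1000.0}
print(coverage_label(average_coverage(q4_example, q_example)))
```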

Price variables
The source is Enerdata, and all prices have been converted to KEuro/TJ from the original unit. Despite the effort made to reconstruct many data points based on reasonable assumptions, many missing values remain (mainly in the 1990s in formerly planned economies).

• $p_{h,s,i,t}$
More specifically, we have chosen the following series from the Enerdata database:
○ $p_{1,1,i,t}$: "Price per toe in € of electricity for households (taxes incl.)", divided by 0.01163 to convert to KEuro/TJ. This series from Enerdata is very similar to the Eurostat series for household consumer band DC, which is the median band with the highest number of electricity and gas consumers in the majority of Member States. 19 We have opted for the Enerdata series since it has fewer missing values.
○ $p_{1,2,i,t}$: "Price per toe in € of natural gas for households (taxes incl.) NCV", divided by 0.01163 to convert to KEuro/TJ. This series from Enerdata is very similar to the Eurostat series for household consumer band D2, which is the median band with the highest number of gas consumers in the majority of Member States. 20 We have opted for the Enerdata series since it has fewer missing values.
○ $p_{1,3,i,t}$: "Price per toe in € of light fuel oil for households (taxes incl.)", divided by 0.9 × 0.01163 to convert to KEuro/TJ.
○ $p_{1,4,i,t}$: "Price per toe in € of bituminous coal for households (taxes incl.)", divided by 0.01163 to convert to KEuro/TJ. We have not considered the price of other solid fuels since the series are too incomplete. The share of bituminous coal in solid fuels appears to be high. The series has many missing values, especially in those countries where the weight of solid fuels for households is low.
○ $p_{2,1,i,t}$: There is no official time series for the price of electricity for services. Therefore, we use the (equally weighted) average of the price for households ($p_{1,1,i,t}$) and the price for industry ($p_{3,1,i,t}$).
○ $p_{2,2,i,t}$: There is no official time series for the price of gas for services. Therefore, we use the (equally weighted) average of the price for households ($p_{1,2,i,t}$) and the price for industry ($p_{3,2,i,t}$).
○ $p_{2,3,i,t}$: There is no official time series for the price of oil products for services. Therefore, we use the (equally weighted) average of the price for households ($p_{1,3,i,t}$) and the price for industry ($p_{3,3,i,t}$).
○ $p_{2,4,i,t}$: There is no official time series for the price of solid fuels for services. Therefore, we use the (equally weighted) average of the price for households ($p_{1,4,i,t}$) and the price for industry ($p_{3,4,i,t}$).
○ $p_{3,1,i,t}$: "Price per toe in € of electricity in industry (taxes incl.)", divided by 0.01163 to convert to KEuro/TJ. This series from Enerdata is very similar to the Eurostat series for industrial sector band IC, which typically represents medium-sized enterprises. 21 We have opted for the Enerdata series since it has fewer missing values.
○ $p_{3,2,i,t}$: "Price per toe in € of natural gas in industry (taxes incl.) NCV", divided by 0.01163 to convert to KEuro/TJ. This series from Enerdata is very similar to the Eurostat series for industrial sector band I3, which typically represents medium-sized enterprises. 22 We have opted for the Enerdata series since it has fewer missing values.
○ $p_{3,3,i,t}$: We use the (equally weighted) average of "Price per toe in € of heavy fuel oil in industry (taxes incl.)" and "Price per toe in € of light fuel oil in industry (taxes incl.)". We have not considered the price of other oil products since the series are too incomplete. The coverage of these two products seems high, and the weight, although varying across countries and years, is similar. In most countries, the price of light fuel is approximately twice the price of heavy fuel. We have then divided by 0.01163 to convert to KEuro/TJ.
○ $p_{3,4,i,t}$: "Price per toe in € of bituminous coal in industry (taxes incl.)", divided by 0.01163 to convert to KEuro/TJ. We have not considered the price of other solid fuels since the series are too incomplete. The share of bituminous coal in solid fuels in industry appears to be high. The series has many missing values, especially in those countries where the weight of solid fuels for industry is low.
○ $p_{4,1,i,t}$: We have not collected any price, since the weight of electricity for transport is extremely low.
○ $p_{4,2,i,t}$: We have not collected any price, since the weight of gas for transport is extremely low.
○ $p_{4,3,i,t}$: "Price per toe in € of premium gasoline (taxes incl.)", divided by 0.01163 to convert to KEuro/TJ. We have not considered the price of other fuels since the series are too incomplete. The coverage of premium gasoline on oil products for transport is high, and the price of other fuels, when available, is highly correlated.
○ $p_{4,4,i,t}$: We have not collected any price, since the weight of solid fuel for transport is extremely low.
If the weight $\alpha^4_{h,s,i,t} < 0.1$, then source $s$ is excluded for that year and that country, and $p^4_{h,i,t}$ is computed as a weighted average of just the other prices (the small weight is set to zero and the others are readjusted to sum to one). The reason for excluding sources with a small weight is that, in many countries, the prices of such sources are missing (or unreliable). If the price of any source whose weight is larger than 0.1 is missing, $p^4_{h,i,t}$ is also considered missing.
• $p^3_{i,t} = \sum_{h=1}^{4} \omega^3_{h,i,t} \, p^3_{h,i,t}$: reference price for $q^3_{i,t}$, obtained as a (time-varying weighted) average of sectoral energy prices. The weights represent the relevance of sector $h$ in "the whole economy" (meant as 4 sectors, 3 sources), and are therefore given by $\omega^3_{h,i,t} = q^3_{h,i,t} / \sum_{j=1}^{4} q^3_{j,i,t}$.
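The weighting and exclusion rule for the sectoral reference price, together with the equally weighted average used for the services sector, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names, the dictionary layout and the example numbers are hypothetical, a weight of exactly 0.1 is treated as retained, and a services price is treated as missing when either input price is missing (the text does not specify these two cases).

```python
def reference_price(quantities, prices, min_weight=0.1):
    """Quantity-weighted average price for one sector, country and year.

    quantities: dict source -> energy quantity (hypothetical layout)
    prices:     dict source -> price, or None when the price is missing
    Returns None when the price of any retained source is missing.
    """
    total = sum(quantities.values())
    if total <= 0:
        return None
    weights = {s: q / total for s, q in quantities.items()}
    # Sources with a weight below the threshold are excluded (weight set to zero).
    kept = {s: w for s, w in weights.items() if w >= min_weight}
    if not kept:
        return None
    # A missing price for a retained source makes the reference price missing.
    if any(prices.get(s) is None for s in kept):
        return None
    # Remaining weights are rescaled so that they sum to one.
    kept_total = sum(kept.values())
    return sum(w / kept_total * prices[s] for s, w in kept.items())


def services_price(household_price, industry_price):
    """Equally weighted average of household and industry prices."""
    if household_price is None or industry_price is None:
        return None  # assumption: missing input implies missing output
    return 0.5 * (household_price + industry_price)


# Illustrative numbers only: solid fuels have weight 0.05 and are dropped,
# so their missing price does not affect the result.
example_q = {"electricity": 50.0, "gas": 30.0, "oil": 15.0, "solid": 5.0}
example_p = {"electricity": 40.0, "gas": 12.0, "oil": 15.0, "solid": None}
print(reference_price(example_q, example_p))
```

Dropping small-weight sources before checking for missing prices is what lets a sector keep a valid reference price when only a marginal fuel lacks data.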

Policy variables
The source is the MURE database, and the methodology is illustrated in Section 3.

Other control variables
We divide the other control variables into 5 groups.
1. Control variables included in all models
• $pop_{i,t}$: population, source Eurostat, no missing values. For France, we have considered metropolitan France only (i.e. excluding overseas territories).
• $rgdp_{i,t}$, $ngdp_{i,t}$, $def_{i,t}$: real GDP, nominal GDP, GDP deflator, source Eurostat, available for all 29 countries from 1995 with few missing values, and for a subset of countries before 1995. For some countries (Bulgaria, Hungary, Latvia, Lithuania, Luxembourg, Malta, Romania, Slovenia) the initial part of the time series is available for either RGDP or NGDP, but not both. For these countries, we have reconstructed DEF by applying the average inflation in Europe, and then used DEF and the available time series to work out the other. Finally, RGDP and NGDP are unavailable for Greece in 2013, and have been reconstructed by applying the average growth rate of Greek RGDP and NGDP in the period 2006-2012 to the 2012 value.
• $hdd_{i,t}$: source Eurostat, availability 1980-2009. Heating degree days (HDD) is a measurement designed to reflect the demand for energy needed to heat buildings. Eurostat calculates heating degree days as $(18\,^{\circ}\mathrm{C} - T_{mean})$ if $T_{mean}$ is lower than 15 °C (heating threshold) and zero if $T_{mean}$ is greater than or equal to 15 °C; $T_{mean}$ is the mean daily outdoor temperature, calculated as $T_{mean} = (T_{min} + T_{max})/2$. Unfortunately, at the time this research was concluded, Eurostat did not provide cooling degree days, which would be useful for the regression analysis for countries in Southern Europe. According to the European Environment Agency (www.eea.europa.eu/data-and-maps/indicators/heating-degree-days-1/assessment), "the number of heating degree days has decreased by 13% over the last 3 decades, yet with substantial inter-annual variation. The decrease in HDD has not been homogeneous across Europe: the absolute decrease has been largest in the cool regions in northern Europe where heating demand is highest. Temperatures in Europe are projected to continue to increase. Hence, the trend of decreasing numbers of HDD is very likely to continue, and most likely to accelerate. For example, the heat demand for space heating in 2050 was projected to decrease by 25% in the UK, and by 9% in the EU". Therefore, since HDD is available at Eurostat for the period 1980 until 2009, the series has been extrapolated up to 2013 using an ARMA(1,1) model with constant and trend, estimated for each country.
The following variables are used as instruments for the policy indicator in Table 4:
• $ed_{i,t}$: energy dependence, source Eurostat. The indicator is calculated as energy imports minus energy exports, divided by gross inland energy consumption. 23 Energy dependence may be negative in the case of net exporter countries, while positive values over 100% indicate the accumulation of stocks during the reference year.
• $ghge_{i,t}$: greenhouse gas emission intensity, source Eurostat and United Nations. The indicator is computed by dividing greenhouse gas emissions by real GDP. Eurostat provides an index representing annual total emissions in relation to 1990 emissions; the absolute values in 1990 (and every 5 years) are provided by the UN (the data appear to be coherent, since applying the growth rate derived from the Eurostat index to the UN 1990 starting points yields almost exactly the subsequent UN values). The "Kyoto basket" of greenhouse gases includes: carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), and the so-called F-gases (hydrofluorocarbons, perfluorocarbons and sulphur hexafluoride (SF6)). These gases are aggregated into a single unit using gas-specific global warming potential (GWP) factors. The aggregated greenhouse gas emissions are expressed in units of CO2 equivalents.
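The heating degree day rule described above for the HDD control variable can be illustrated with a short sketch. The function names and the example temperatures are hypothetical, and summing daily values to obtain an annual figure is an assumption about the aggregation step, which the text does not spell out.

```python
def mean_temperature(t_min, t_max):
    """Mean daily outdoor temperature: (T_min + T_max) / 2."""
    return (t_min + t_max) / 2.0


def daily_hdd(t_min, t_max, base=18.0, threshold=15.0):
    """Heating degree days for one day, in degrees Celsius.

    Equals (18 - T_mean) when T_mean is below the 15 degree heating
    threshold, and zero otherwise.
    """
    t_mean = mean_temperature(t_min, t_max)
    return base - t_mean if t_mean < threshold else 0.0


def annual_hdd(daily_min_max):
    """Sum of daily HDD over an iterable of (t_min, t_max) pairs (assumed aggregation)."""
    return sum(daily_hdd(t_min, t_max) for t_min, t_max in daily_min_max)


# Illustrative temperatures only: a cold day, a day exactly at the threshold,
# and a mild day just below it.
example_days = [(0.0, 10.0), (12.0, 18.0), (10.0, 19.0)]
print(annual_hdd(example_days))
```

Note the discontinuity built into the rule: a day with a mean of 14.9 °C contributes 3.1 degree days, while a day with a mean of 15 °C contributes none.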

bottom-up energy efficiency indicators (Cahill and Ó Gallachóir, 2010) into empirical estimates of policy impacts, in this paper we develop a direct indicator based on the MURE database of energy policy measures, as in Ó Broin et al. (2015).

Fig. 3. Energy Policy Indicators in Finland and the Netherlands, comparing $pol^{U,P}_{h,i,t}$ (left) and $pol^{U,T}_{h,i,t}$ (right).

Table 2
Short description of the explanatory variables introduced in the model (more details in the Appendix).
trend, capturing omitted variables behaving as a smooth function of time (e.g. technology, culture, …)
where $h = 1$,

Table 4
Hausman t-test for the exogeneity of the Energy Policy Indicator.

Table 5
Estimated short and long run elasticities of policy measures on energy consumption in each sector.

Table 6
Dynamic simulation of the estimated effect of energy policies in the household sector in France, Germany and Italy.

Table 7
Estimated Policy-Induced Energy Savings based on the model (percentage and absolute value, in TJ).
• $p^4_{h,i,t}$: reference price for $q^4_{h,i,t}$, obtained as a (time-varying weighted) average of energy prices for sector $h$. The weights represent the relevance of source $s$ in sector $h$ (4 sources, excluding the others), and are therefore given by $\alpha^4_{h,s,i,t} = q_{h,s,i,t} / \sum_{j=1}^{4} q_{h,j,i,t}$. If the weight $\alpha^4_{h,s,i,t}$ is below 0.1, source $s$ is excluded for that year and that country.
• $p^3_{h,i,t}$: reference price for $q^3_{h,i,t}$, where solid fuels are also excluded. The weights represent the relevance of source $s$ in sector $h$ (3 sources, excluding the others), and are therefore given by $\alpha^3_{h,s,i,t} = q_{h,s,i,t} / \sum_{j=1}^{3} q_{h,j,i,t}$.
• $p^4_{i,t}$: reference price for $q^4_{i,t}$, obtained as a (time-varying weighted) average of sectoral energy prices. The weights represent the relevance of sector $h$ in "the whole economy" (meant as 4 sectors, 4 sources), and are therefore given by $\omega^4_{h,i,t} = q^4_{h,i,t} / \sum_{j=1}^{4} q^4_{j,i,t}$.
• share of sources other than (1 = electricity, 2 = gas, 3 = oil, 4 = solid fuel) for sector $h$, based on Eurostat quantities
• share of sources other than (1 = electricity, 2 = gas, 3 = oil) for sector $h$, based on Eurostat quantities
2. Other control variables (household):
• $dwell_{i,t}$: stock of dwellings (thousand), source Enerdata
• $floor_{i,t}$: average floor area of dwellings (m²), source Enerdata
• $area^{s1}_{i,t} = dwell_{i,t} \times floor_{i,t} / 1000$: total floor area of dwellings (km²)
• $percfreez_{i,t}$: rate of equipment ownership for freezers (%), source Enerdata (interpolated)
• $percwash_{i,t}$: rate of equipment ownership for washing machines (%), source Enerdata (interpolated)
• $percdish_{i,t}$: rate of equipment ownership for dishwashers (%), source Enerdata (interpolated)
• $percequip^{s1}_{i,t} = (percfreez_{i,t} + percwash_{i,t} + percdish_{i,t}) / 3$: average rate of equipment ownership (%)
• $rcons^{s1}_{i,t}$: real private consumption (M€2005), source Enerdata
3. Other control variables (services):
• $rva^{s2}_{i,t}$: real value added of the tertiary sector (M€2005), source Enerdata
• $empl^{s2}_{i,t}$: employment in the tertiary sector (thousand), source Enerdata
4. Other control variables (industry):
• $rva^{s3}_{i,t}$: real value added of industry (M€2005), source Enerdata
• $rginv^{s3}_{i,t}$: real gross investment of industry (M€2005), source Enerdata
5. Other control variables (transport):
• $cars^{s4}_{i,t}$: stock of cars (millions), source Enerdata
• $goods^{s4}_{i,t}$: traffic of goods (tkm), source Enerdata
Instrumental variables