Moving from final to useful stage in energy-economy analysis: A critical assessment

Given the climate change emergency, reducing energy consumption, which is responsible for most greenhouse gas emissions worldwide, is a priority. However, the strong historical link between energy consumption and economic growth raises the question of whether continued economic growth is compatible with energy conservation targets. Conventional final energy analysis (common analysis methods applied at the final energy stage) has provided limited insights into this nexus. In response, this paper explores the extent to which useful stage analysis provides additional insights using three common methods: aggregate energy-economy analysis (growth rates, energy intensities, and Index Decomposition Analysis), energy-GDP causality testing, and Aggregate Production Function modelling, using Spain (1960–2016) as an empirical case study. The results reveal that, of the three methods investigated, aggregate energy-economy analysis provides the greatest insights, including that Spain is far from achieving absolute energy-GDP decoupling. Further, moving to the useful stage indicates that the extent of decoupling is even less than suggested at the final energy stage, and that increasing final energy consumption has historically fully offset efficiency gains. In contrast, whether applied at the final energy or useful stage, energy-GDP causality testing and Aggregate Production Function modelling reveal little about the energy-economy nexus; the results even suggest that these tools are not appropriate and may mislead. Thus, useful stage analysis is necessary but not sufficient for delivering further energy-economy insights; there is also a need to explore alternative, reliable energy-economy analysis methods. Indeed, the lack of robustness of Aggregate Production Function modelling and energy-GDP causality testing is worrisome.


Energy conservation and economic growth, compatible targets?
Energy conservation has been a crucial element of energy policy since the oil crisis in the 1970s, as energy prices soared and energy availability decreased in industrialised countries [1][2][3]. Currently, energy conservation is predominantly pursued through energy efficiency policies, and is particularly driven by the climate change emergency, as energy consumption is responsible for most greenhouse gas emissions worldwide [4]. In addition, future energy availability constraints, notably due to fossil fuel depletion [5][6][7][8] and declining energy returns of energy industries [9][10][11], also support the need for energy conservation measures [12].
Concurrently with energy conservation targets (see for instance [13,14]), economic targets are aimed at more and faster economic growth. Examples can be found with the UK's ''Clean Growth Strategy'' [15], the French ''law for energy transition and green growth'' [16], or
E. Aramendia et al.
[Figure caption] Economic data from the World Bank [21], and energy data from the International Energy Agency (IEA) [22]. Computed R squared between primary energy and real GDP is 0.9940, and between final energy and real GDP is 0.9935.
not [28] - in which case the economic growth mantra may need to shift towards a post-growth economics paradigm [29].

Moving from the final to useful stage for energy-economy analysis
Considering the apparently contradictory energy and economic targets, and the possible implications, there is a need for an in-depth understanding of the energy-economy interplay. The energy-economy nexus can be studied at the primary energy stage (i.e. the energy extracted from nature), at the final energy stage (i.e. the energy purchased by the end user), or at the useful energy stage (i.e. the energy actually exchanged for energy services, which depends on the efficiency of the end-use device utilised). Conventional energy-economy studies typically adopt a final energy perspective, which presents two main limitations. First, the boundary of final energy analysis prevents assessment of end-use efficiencies, and therefore fails to quantify the useful energy exchanged for energy services [30]. Second, using the final energy stage fails to account for the thermodynamic quality of energy, thereby implicitly assuming that all forms of energy are equally productive. In response, the exergy economics community [32] bases the analysis on the useful stage of the Energy Conversion Chain (ECC), which is closer to delivered energy services and thus to the economic productivity of energy [33,34], and on the exergy quantification of energy, which is a thermodynamically consistent measure of the quality of energy [35,36]. Fig. 2 shows the ECC adopted by exergy economics studies.
Appendix A introduces each of the three energy-economy analysis methods explored in this paper (aggregate energy-economy analysis, energy-GDP causality testing, and energy extended Aggregate Production Function (APF) modelling) in terms of rationale, objectives, as well as historical background. Next, insights gained from each of the three methods are presented, both when applied at the final and useful stage.

Aggregate energy-economy analysis
Final stage: decreasing final energy intensities. Conventional final stage studies tend to find results supporting a decreasing final energy intensity (i.e. the ratio of final energy consumption to economic output) over time, and therefore a decreasing dependence of the economy on energy consumption [38][39][40][41], as well as converging final energy intensities across countries [38,42]. Such studies investigate the influence of different factors on energy intensities, such as the effect of energy prices or taxes [41] and the effect of structural and technological change through Index Decomposition Analysis (IDA) [38][39][40].
Useful stage: relatively constant useful stage intensities and levelling off efficiency gains. The exergy economics literature finds that, in contrast to final energy intensities, useful stage intensities have been remarkably constant over time, although most studies focus on EU countries [43][44][45]. Note that Heun and Brockway also find constant intensities for Ghana [28], and that Guevara et al. find increasing intensities for Mexico [46]; the only known case of clearly decreasing useful exergy intensity so far is China [47]. Such results suggest that national economies may be, in most cases, more reliant on energy consumption than final stage studies suggest. In addition, the exergy economics literature, by extending IDA studies to the useful stage, found that the aggregate thermodynamic efficiency is levelling off in industrialised countries [28,48,49], which suggests that the decreasing trend in primary and final energy intensities may not persist. However, studies regarding developing countries remain rare, which prevents generalisation of these findings.

Testing for energy-GDP causality
Energy-GDP causality testing at the final stage: an extensive, and yet inconclusive, literature. The literature regarding the causal relationship between energy consumption and economic growth has considerably expanded since the initial controversy opposing Kraft and Kraft [50] and Akarca and Long [51] (see Table A.1). Forty years later, and despite numerous studies published, the literature remains inconclusive [52][53][54]. Indeed, causality tests have proved to be very sensitive to the parameters of the test carried out, including time span covered, econometric technique used, and energy consumption metric [54,55].
Energy-GDP causality testing at the useful stage: an unexplored area. Only one of 158 studies reviewed by Kalimeris et al. [54] adopts a useful stage perspective, more precisely, using useful exergy as energy metric [56]. Warr and Ayres find a unidirectional causality running from useful exergy consumption to economic output, thereby supporting a situation where energy has a key role in driving economic growth [56]. This study opens up the prospect that extending causality testing to the useful stage may have the potential to unlock new insights and contribute to the so-far inconclusive causality literature.

Energy extended Aggregate Production Function modelling
Final stage in APFs: a popular approach. Energy extended APFs are currently found at the core of numerous models informing energy and climate policy [57,58], particularly when coupled with mainstream Computable General Equilibrium (CGE) models [59,60]. They are notably used in academia, but also in governmental agencies [61,62] and central banks [63,64]. Despite the ubiquity of energy extended APFs, their specification often lacks empirical foundations, and APFs may be modelled as suits the analyst best [57,65]. It is worth noting that, to account for economic growth, APFs usually rely to a large extent on an exogenous Total Factor Productivity (TFP) that represents technological progress. Recent efforts have focused on developing quality adjusted metrics for Factors of Production (FoPs) to reduce the TFP [66][67][68][69], but ''there is still a TFP [...] that cannot be 'explained away''' [70].
Useful stage in APFs: rare studies. The useful stage (in terms of useful exergy) has been adopted in APF modelling on rare occasions. First, Ayres and Warr [34,71] were able to account for economic growth using the Linex APF throughout the last century in the US and Japan without relying on a TFP, thereby suggesting that technological improvements can be understood in terms of increasing useful exergy available to society. Second, Santos et al. [72] identified, with a cointegration-based approach, economically plausible Cobb-Douglas APFs for Portugal, within which useful exergy exerts a key hold over economic output. Third, Heun et al. [65] found little change when moving from final energy to useful exergy with Constant Elasticity of Substitution (CES) functions for the UK and Ghana. Fourth, Court et al. [73] used a Leontief-type APF within a long-term endogenous growth model, and found that exergy efficiency exerts a crucial hold on economic growth. Overall, however, it is rare for studies to discuss the useful stage in APFs.

Research objectives, novelty, and case study
Conventional final energy analysis has provided limited insights into the energy-economy nexus. There is therefore a simultaneous, urgent need both to investigate the limitations of conventional final energy analysis and to provide additional insights using the useful stage. In response, this paper sets two objectives:
• Compare the merits of conventional energy-economy analysis methods: aggregate energy-economy analysis (growth rates, energy intensities, IDA), energy-GDP causality testing, and energy extended APF modelling;
• Identify the merits of moving from the conventional final energy perspective to a useful stage perspective on the three energy-economy analysis methods.
Following the exergy economics community, useful exergy is adopted as the quality adjusted useful stage quantification of energy; however, results obtained using useful energy are also presented throughout the paper.
Spain (1960-2016) is selected as case study, for three reasons. First is the availability of aggregate datasets including useful energy and exergy. Second, Spain industrialised late compared to other Western European countries, and this industrialisation is fully captured in the covered period 1960-2016 [74,75], during which its real GDP increased sevenfold. Third, the Spanish economy suffered an acute and long economic recession (approximately 2008-2014, designated as the 2008 economic crisis in the rest of the paper) in the wake of the 2008 global economic crisis. Most energy-economy studies so far focus on growing economies; the Spanish case therefore broadens this literature to cases of economies facing recession. Table 1 outlines the specificities of the recent Spanish economic history and energy system development.
This study contains several novelties. First, the paper presents the first critical assessment of applying energy-economy analysis methods at the final stage versus at the useful stage. Second, it provides the second (after Warr and Ayres [56]) energy-GDP causality test that uses the useful exergy metric as a quality adjusted measure of energy consumption. Third, instead of performing a single causality test, an innovative methodology for conducting a wide range of causality tests (varying many parameters) is introduced, and a meta-analysis of the results is conducted. Fourth, the exogenous TFP, which is usually included in APFs, is removed from the specification of APFs, as its inclusion seems a priori unjustified. Fifth, an original method for assessing the robustness and predictive capacity of fitted APFs is introduced and tested.

Table 1 Outline of Spain's recent economic history and energy systems development specificities.

Recent economic history
The Spanish industrialisation and liberalisation of foreign trade during the 1960s and 1970s was accompanied by strong economic growth as well as a surge in industrial production, so that the period is sometimes referred to as the ''Spanish Miracle'' [74][75][76]. In the mid-1970s, Spain entered a period of economic stagnation, due to the combination of the oil shocks causing global stagflation and changes in the Spanish political regime, which shifted from a dictatorship towards a representative democracy in the late 1970s [77]. Economic growth eventually came back in the mid-1980s as a result of both the 1980s oil glut and the integration of Spain within the European Union (EU) in 1986 [78]. Disregarding the short early 1990s recession, this period of growth continued roughly until the 2008 economic crisis, which lasted approximately until 2014.

Recent energy system development
The Spanish energy transition from organic energy sources (water, firewood, muscle work) to mineral resources (particularly coal and oil) was nearly completed by 1960, although organic energy sources remained non-negligible until the early 1970s, with the decline of traditional agriculture [79]. During the whole covered time period, fossil fuel sources are dominant, with oil becoming the prevailing energy source at the expense of coal in the 1960s, despite the uptick in coal use in the aftermath of the oil shocks [79]. It was only from 1997, when the liberalisation of the electricity sector began as a consequence of EU policy, that effective support for modern renewable energy (particularly wind and solar technologies, hydroelectric plants having a long history in Spain) was implemented [80]. The period of favourable legislation entailed a rapid surge of renewable energy supply [81], until it was reversed in the aftermath of the 2008 economic crisis, which resulted in a renewable energy development paralysis [82]. This paralysis did not prevent new actors, such as renewable electricity cooperatives, from emerging throughout the country [80].

Data
All data introduced hereafter are indexed to 1960, which enables display of time series for different variables in a single graph, without altering results. The R language is used as programming environment for conducting all calculations and statistical tests [83]. Input data as well as R code are available in the Data Repository associated with the paper [84].

Economic data
Economic data are taken from the Penn World Table (PWT) 9.1 [85].
GDP data. Three different GDP metrics are used in this study. Firstly, the real GDP (rgdpna) metric is used for APF modelling as it is appropriate when studying economic output over time in a single country: ''if the sole object is to compare the growth performance of economies, we would recommend using the rgdpna series'' [85, p. 3157]. Secondly, the expenditure-side real GDP at chained Purchasing Power Parities (PPPs) (rgdpe) and output-side real GDP at chained PPPs (rgdpo) are used alongside rgdpna for causality testing. rgdpe and rgdpo are both expressed in real GDP in PPP and are therefore relevant for cross-country as well as over-time studies [85].
Capital data. Several options can be used for quantifying capital data. The Perpetual Inventory Method (PIM) calculates capital stocks by adding new capital stocks and subtracting obsolete stocks. When out-of-date stocks are defined as the existing assets retired, a Gross Capital Stocks (GCS) metric is adopted. Conversely, when the depreciation of current assets is taken into consideration, one can define the Net Capital Stock (NCS) metric, which equals GCS minus depreciation of existing stocks [58]. The NCS metric is adopted as a quality unadjusted metric for capital, using directly the values of the capital stock (rnna) metric.
However, NCS does not account for the heterogeneity of capital or for the differences in its economic productivity. The concept of NCS has therefore been extended to capital services in order to account for the productivity of capital. Capital services are defined as a ''flow of productive services from the cumulative stock of past investments'' [68, p. 7]. Capital services (rkna) is consequently adopted as the quality adjusted metric for capital.
Labour data. Likewise, total hours worked is defined as the quality unadjusted metric for labour, which is calculated by multiplying the emp and avh variables (see PWT), where emp is the variable for people engaged in labour, and avh is the variable for the average yearly hours worked by a labourer. The quality adjusted metric for labour is defined by multiplying the quality unadjusted metric by the human capital index variable hc. The quality adjusted metric accounts for skills and knowledge of labourers.

Energy and exergy data
Primary and final energy data. These data are taken from the International Energy Agency (IEA) Extended Energy Balances dataset [22].
Useful energy and exergy data. These data are taken from the Societal Exergy Analysis (SEA) conducted by Aramendia [86].

Energy intensities calculation
The primary energy intensity in each year $t$ is defined as follows:
$$I_{PE,t} = \frac{E_{PE,t}}{Y_t}$$
where $Y_t$ stands for GDP, and $E_{PE,t}$ for primary energy. The final energy, useful energy, and useful exergy intensities of the Spanish economy (noted $I_{FE,t}$, $I_{UE,t}$, and $I_{UX,t}$) are defined in the same way.
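As an illustration of the intensity calculation on series indexed to 1960, a minimal Python sketch follows. The paper's own calculations are in R; the function names `index_to_base` and `intensity` and the numbers used are hypothetical and serve only as an example.

```python
def index_to_base(series, base_year):
    """Index a {year: value} series so that the base year equals 1."""
    base = series[base_year]
    return {year: value / base for year, value in series.items()}


def intensity(energy, gdp):
    """Energy intensity per year: energy consumption divided by economic output."""
    return {year: energy[year] / gdp[year] for year in energy if year in gdp}


# Hypothetical numbers for illustration only (not data from the paper).
energy = {1960: 100.0, 1961: 110.0, 1962: 125.0}
gdp = {1960: 50.0, 1961: 54.0, 1962: 60.0}

indexed_intensity = intensity(index_to_base(energy, 1960), index_to_base(gdp, 1960))
```

With both series indexed to the base year, the intensity equals 1 in 1960 by construction, so later values read directly as changes relative to 1960.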

Growth rates calculation
An average $n$-year growth rate over time is calculated for each variable $X$. Following Heun and Brockway [28], growth rates refer to the average 5-year growth rates ($n = 5$) throughout the paper, unless stated otherwise. Sensitivity tests are available in Appendix B.
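One possible way to compute an average growth rate over a window of years is sketched below in Python. It assumes compound (geometric) averaging over the window ending in the year of interest, which may differ from the exact definition used in the paper; the function name `average_growth_rate` and the series are hypothetical.

```python
def average_growth_rate(series, year, n=5):
    """Average annual growth rate over the n years ending in `year`.

    Assumption: compound (geometric) averaging; the paper's exact
    formula is not reproduced here.
    """
    start, end = series[year - n], series[year]
    return (end / start) ** (1.0 / n) - 1.0


# Hypothetical series growing at a constant 3% per year.
gdp = {1960 + k: 100.0 * 1.03 ** k for k in range(11)}
rate = average_growth_rate(gdp, 1970, n=5)
```

Changing `n` reproduces the kind of sensitivity test on the window length that the text refers to.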

LMDI decomposition
LMDI decomposition is a Divisia-based IDA method that was introduced by Ang in 1998 [87] and has been further developed since then [88][89][90]. An LMDI-I decomposition analysis of the Spanish useful exergy supply is conducted in order to shed light on the evolution of the economic useful exergy intensity. The LMDI-I method is selected, as recommended by Ang [88,89], due to its simpler formulation as well as two mathematical properties: consistency in aggregation and perfect decomposition at the subcategory level. Following previous studies [28,47], the Spanish useful exergy supply $X$ is specified as a sum of different quantities:
$$X = \sum_{i \in \mathcal{S}} \sum_{j \in \mathcal{U}} X_{i,j}$$
where $\mathcal{S}$ is the subset of economic sectors considered, and $\mathcal{U}$ refers to the subset of end-uses considered in the Spanish SEA conducted (subsets are defined in Supplemental Information (SI), Section 1). Hence, $X_{i,j}$ is the useful exergy consumed by a particular end-use $j \in \mathcal{U}$ within a particular economic sector $i \in \mathcal{S}$. The LMDI-I decomposition is conducted in multiplicative terms [89], which makes it possible to write, at any time $t$:
$$\frac{X^{t}}{X^{0}} = D_{F} \, D_{str} \, D_{use} \, D_{eff}$$
where $X^{0}$ is the useful exergy supply in the base year, $D_{F}$ refers to the influence of the aggregate final energy supply, $D_{str}$ to the effect of variations in the share of final energy consumed by each economic subsector $i \in \mathcal{S}$ (i.e. structural changes), $D_{use}$ refers to end-use changes within each economic sector, and $D_{eff}$ refers to efficiency gains in the conversion of final energy into useful exergy. The formulae for these coefficients are presented in SI, Section 1.

Testing for energy-GDP causality
From the broad family of causality tests, the Granger causality test [91] is selected for our study (see footnote 6). The methodology described hereafter makes it possible to identify three mutually exclusive types of Granger causality: causality running from economic output to energy consumption (conservation hypothesis), causality flowing from energy consumption to economic output (growth hypothesis), and bidirectional causality (feedback hypothesis), which is identified when the first two causalities coexist. The energy consumption and economic output 1-year growth rates are computed for conducting causality tests.
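The mapping from two one-directional test results to the mutually exclusive hypotheses above can be sketched as follows. The function name `classify_causality`, its arguments, and the 5% significance threshold are assumptions made for illustration; the p-values would come from whichever Granger implementation is used.

```python
def classify_causality(p_energy_to_gdp, p_gdp_to_energy, alpha=0.05):
    """Map the p-values of the two one-directional tests to a hypothesis.

    `alpha` is an assumed significance threshold (5%).
    """
    energy_causes_gdp = p_energy_to_gdp < alpha
    gdp_causes_energy = p_gdp_to_energy < alpha
    if energy_causes_gdp and gdp_causes_energy:
        return "feedback"
    if energy_causes_gdp:
        return "growth"
    if gdp_causes_energy:
        return "conservation"
    return "no evidence"


# Hypothetical p-values for illustration only.
outcome = classify_causality(p_energy_to_gdp=0.30, p_gdp_to_energy=0.01)
```

Because the four outcomes are mutually exclusive, each valid test contributes exactly one label to any later tally of results.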

Selected Granger tests
Three different implementations of the Granger causality test are used: firstly, the causality function from the vars package [93], secondly, the grangertest function from the lmtest package [94], and thirdly, the Toda-Yamamoto (T-Y) procedure [95], for which the implementation described by Pfeiffer [96] is adapted.

Obtaining stationary time series
Unlike the T-Y implementation, the causality and grangertest functions require stationary time series; otherwise they may lead to spurious results [97]. The energy consumption and GDP growth rate time series were made stationary by differencing once before using the causality and grangertest functions, while the time series are used as such for the T-Y procedure. (See SI, Section 2.)
Footnote 6: See for instance Sims for a different econometric technique [92].

Choice of metrics
Three energy metrics are used for conducting causality tests: final energy, useful energy, and useful exergy. Likewise, three economic output metrics are used: rgdpna, rgdpe and rgdpo. Each test is conducted for each of the nine possible (energy metric, economic output metric) combinations.

Considered time period
The covered time period 1960-2016 corresponds to 56 data points once the growth rates are calculated. The time period has been varied in order to cover each possible time period of at least 30 years. The total number of time periods possible for the T-Y procedure ($N_{\text{T-Y}}$) and for the direct implementations of the Granger causality test ($N_{\text{Granger}}$) are therefore defined as:
$$N_{\text{T-Y}} = \sum_{k=1}^{27} k = 378, \qquad N_{\text{Granger}} = \sum_{k=1}^{26} k = 351$$
The three different implementations of the Granger causality test, for each of the nine combinations of energy and economic output metrics, are conducted for each possible time period.
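The number of possible time periods can be checked by direct enumeration, as in the Python sketch below. The function name `count_periods` is hypothetical. The sketch assumes that a period is a contiguous run of at least 30 data points, and that differencing the series once leaves 55 rather than 56 points for the direct Granger implementations.

```python
def count_periods(n_points, min_length=30):
    """Count contiguous sub-periods containing at least `min_length` points."""
    return sum(
        1
        for start in range(n_points)
        for end in range(start + min_length, n_points + 1)
    )


# 56 growth-rate points (series used as such, as in the T-Y procedure).
periods_ty = count_periods(56)
# Assumption: one point is lost when the series are differenced once.
periods_granger = count_periods(55)
```

Under these assumptions the enumeration gives 378 and 351 periods, the two values quoted in the text for the number of possible time periods.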

Specification and validity of the Vector Autoregression (VAR) model
The VARselect function of the vars package was used in order to determine the number of lags to be included in the VAR model. This function returns the optimal number of lags according to four different criteria. (See SI, Section 2.) For each of the three procedures, of the nine possible combinations of metrics, and of the possible time periods (351 or 378 depending on the procedure), a VAR model is defined for each optimal lag returned (the number of which is between one and four depending on the optimal number of lags specified by the four criteria). Thereafter, each VAR model specified was tested for both misspecification and stability. (See SI, Section 2.) Only VAR models which were found stable and non serially correlated were deemed valid and included in the results.

Summary of the tests conducted
A summary of the number of causality tests carried out for each of three procedures is presented in Table 2. A total number of 16,849 tests have been carried out, of which 14,821 are regarded as valid and 2,028 as void. Only tests regarded as valid are included in the meta-analysis conducted.

Formulating APF specifications
Two functional forms were selected for conducting our APF modelling. Firstly, the CES, which is increasingly used in the energy-economy modelling field [58], and secondly the Linex function, which has been used in key studies on economic growth [34,71,98]. The Cobb-Douglas functional form was disregarded, as it imposes elasticities of substitution between the FoPs equal to unity, which seems to be an illegitimate assumption. Next, the neoclassical assumption (cost share theorem) that output elasticities of the FoPs equate their cost shares in the national accounts was a priori rejected. Instead, we determine output elasticities empirically by fitting APFs to historical data.
In addition, the use of an exogenous TFP for technological progress was dismissed. Indeed, it was considered that technological progress can be accounted for by the estimates of the FoPs alone. This paper argues that technological progress can be captured independently by using quality adjusted FoPs that account for technological change.
Indeed, by using useful exergy, technological improvements within the energy system and the ECC are accounted for, while by using capital services, the increase in productivity of capital entailed by technological improvements and new technologies is accounted for. To conclude, these assumptions lead to two different functional forms for the APFs. Firstly, the three-FoP CES APF is defined over $(x_1; x_2; x_3)$, which stands for the triplet of the FoPs. Three different configurations, or nesting structures, can therefore be defined [58,65]. These will be referred to as the $(KL)E$, $(LE)K$, and $(EK)L$ configurations, where $E$ stands for energy consumption, $K$ for capital inputs, and $L$ for labour inputs. Therefore, three different nesting structures are tested for the single CES APF. The fitted parameters are the four coefficients of the nested CES function. Then, the Linex APF is defined, in which the FoPs are not interchangeable. The fitted parameters are the three coefficients of the Linex function.
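For readers unfamiliar with these functional forms, the Python sketch below writes out one common parametrisation of a nested three-input CES function without a TFP term, and of the Linex function. These are textbook forms given as an assumption; the exact parametrisation, symbol names (`delta`, `rho`, `gamma`, `a_0`, `a_1`, `theta`) and fitted coefficients used in the paper may differ.

```python
import math


def ces_nested(x1, x2, x3, delta_1, rho_1, delta, rho, gamma=1.0):
    """Nested CES: x1 and x2 form the inner nest, x3 enters the outer nest.

    Assumed form with constant returns to scale and no exogenous TFP term.
    """
    inner = delta_1 * x1 ** (-rho_1) + (1.0 - delta_1) * x2 ** (-rho_1)
    outer = delta * inner ** (rho / rho_1) + (1.0 - delta) * x3 ** (-rho)
    return gamma * outer ** (-1.0 / rho)


def linex(energy, capital, labour, a_0, a_1, theta=1.0):
    """Linex function in an assumed standard form, without a TFP term."""
    exponent = (a_0 * (2.0 - (labour + energy) / capital)
                + a_0 * a_1 * (labour / energy - 1.0))
    return theta * energy * math.exp(exponent)


# Hypothetical indexed inputs (base year = 1) and parameter values.
K, L, E = 1.8, 1.2, 1.5
output_kl_e = ces_nested(K, L, E, delta_1=0.4, rho_1=0.5, delta=0.6, rho=0.3)  # (KL)E
output_le_k = ces_nested(L, E, K, delta_1=0.4, rho_1=0.5, delta=0.6, rho=0.3)  # (LE)K
output_linex = linex(E, K, L, a_0=0.3, a_1=1.0)
```

The three nesting structures differ only in which pair of FoPs is passed as the inner nest. In both forms, inputs indexed to 1 in the base year return the scale parameter, which is convenient when all series are indexed to the same year.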

Fitting to historical data
For each functional form, one fit with both quality unadjusted and adjusted FoPs is performed. An Ordinary Least Squares (OLS) fitting technique is used, which is described in Heun and Brockway [65]. This approach seeks to find the parameters that minimise the Sum of Squared Errors (SSE), defined as:
$$SSE = \sum_{t} \left( y_t - \tilde{y}_t \right)^2$$
where $y_t$ stands for actual and $\tilde{y}_t$ for fitted economic output in year $t$. These fits are conducted using the micEconCES package [103] for the CES APFs and the MacroGrowth package [104] for the Linex APFs. A structural break is identified in the Spanish real GDP time series with the beginning of the 2008 economic crisis. Therefore, two fits are conducted for both quality unadjusted and adjusted FoPs. Firstly, APFs are fitted over the whole time period 1960-2016. Secondly, APFs are fitted over the time period 1960-2008 predating the structural break, and thereafter extrapolated for the period 2009-2016. The reason is twofold: first, to explore whether the fitted econometric parameters vary significantly depending on the fitting time period, and second, to test the ability of fitted APFs both to cope with structural breaks and to account for the 2008 economic crisis.
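The Sum of Squared Errors over a chosen set of years can be sketched in Python as below. The fitting itself is not reproduced: the `fitted` series is a hypothetical stand-in for the output of a fitted APF, and the function and variable names are illustrative only.

```python
def sse(actual, fitted, years):
    """Sum of squared errors between actual and fitted output over `years`."""
    return sum((actual[year] - fitted[year]) ** 2 for year in years)


FIT_YEARS = range(1960, 2009)            # 1960-2008, used for fitting
EXTRAPOLATION_YEARS = range(2009, 2017)  # 2009-2016, extrapolated

# Hypothetical series for illustration only (not data from the paper):
# the stand-in model is exact before 2009 and overshoots by 0.5 afterwards.
actual = {year: 1.0 + 0.1 * (year - 1960) for year in range(1960, 2017)}
fitted = {year: value + (0.5 if year >= 2009 else 0.0)
          for year, value in actual.items()}

sse_fit = sse(actual, fitted, FIT_YEARS)
sse_extrapolated = sse(actual, fitted, EXTRAPOLATION_YEARS)
```

Restricting the sum to the extrapolated years isolates how well a model fitted before the structural break tracks output after it, independently of its in-sample fit.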

Comparing fitted APFs
Two different values are used as indicators of the quality of the obtained fit. First, when the fit is conducted over the whole time period, the $SSE_1$ is defined as
$$SSE_1 = \sum_{t=1960}^{2016} \left( y_t - \tilde{y}_t \right)^2$$
where $y_t$ and $\tilde{y}_t$ stand for actual and fitted economic output in year $t$. Conversely, when the fit is conducted over the time period 1960-2008 and thereafter extrapolated, the focus is on the ability of the extrapolated values to fit actual GDP. Therefore, the $SSE_2$ value is defined as
$$SSE_2 = \sum_{t=2009}^{2016} \left( y_t - \tilde{y}_t \right)^2$$

Fig. 3 presents the aggregate measures of energy consumption alongside real GDP. Aggregate energy consumption (whether analysed at the primary, final or useful stage) and economic output tended to evolve together over the whole time period (see Table 1). A strong increase in energy consumption is observed in the 1960s and 1970s (during the so-called ''Spanish Miracle''), which lasted until the period of economic stagnation in the 1970s. Growth in energy consumption and in economic output resumed simultaneously in the mid-1980s as a result of the oil glut and of European integration [78], until the serious 2008 economic crisis (roughly 2008-2014), during which both aggregates dropped.
Fig. 4 shows the tight correlation between energy consumption and economic output growth rates. Most data points correspond to a situation of hypercoupling, whether final energy or useful exergy is used. Only a few data points correspond to a situation of absolute decoupling; their number decreases from six when using final energy (Fig. 4.a) to four when using useful exergy (Fig. 4.b). Thus, absolute decoupling is rare in the Spanish case, and, leaving aside a few odd data points, further economic growth has always meant further energy consumption. In addition, the number of data points corresponding to a relative decoupling situation drops from twelve when using final energy to nine when using useful exergy. Hence, useful exergy analysis provides less optimistic insights on the extent of historical decoupling than final energy analysis.
Appendix B shows that observations hold when varying the calculation of the average growth rate, Appendix C shows similar graphs using primary and useful energy, Appendix D shows the minor influence of the GDP metric, and SI (Section 3) shows energy and GDP growth rates time series.
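The classification of data points into coupling regimes can be sketched in Python as follows. The thresholds are assumptions made for illustration (the boundaries of the zones drawn in Fig. 4 are not reproduced here), and the function name and the growth rates are hypothetical.

```python
def classify_decoupling(gdp_growth, energy_growth):
    """Classify one (GDP growth, energy growth) data point.

    Assumed definitions, for growing economies only:
    - absolute decoupling: GDP grows while energy consumption falls;
    - relative decoupling: energy grows, but more slowly than GDP;
    - hypercoupling: energy grows at least as fast as GDP.
    """
    if gdp_growth <= 0:
        return "no growth"
    if energy_growth < 0:
        return "absolute decoupling"
    if energy_growth < gdp_growth:
        return "relative decoupling"
    return "hypercoupling"


# Hypothetical (GDP growth, energy growth) pairs for illustration only.
points = [(0.04, 0.06), (0.03, 0.01), (0.02, -0.01), (-0.01, -0.03)]
labels = [classify_decoupling(g, e) for g, e in points]
```

Counting the labels separately for final energy and for useful exergy growth rates gives the kind of comparison between stages discussed above.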

Final energy consumption and thermodynamic efficiency growth rates: evidence of high energy rebound?
Fig. 5 shows final energy consumption growth rates versus final-to-useful national energy (Fig. 5.a) and exergy (Fig. 5.b) efficiency improvements, equally measured in growth rates. Firstly, Fig. 5 shows that final energy consumption tends to evolve in positive correlation with national thermodynamic efficiency. Secondly, Fig. 5 can be analysed through an energy rebound lens. Most of the data points correspond to a situation in which energy efficiency improvements have been fully offset by increasing final energy consumption. Such data points are identified in the ''Backfire'' area in Fig. 5, and suggest a historically high energy rebound in Spain. Very few data points (eight when using final energy, three when using useful exergy) support a situation of no energy rebound (or hyperconservation), i.e. a situation where efficiency gains result in at least the final energy savings to be expected if only efficiency gains were at play. It is worth noting that the only data points corresponding to a situation of no energy rebound are those of the 2008 economic crisis, during which both final energy consumption and economic activity dropped.

Bell-shaped energy intensities
Fig. 6 shows that the economic output metric has a key influence on the Spanish energy intensities obtained. Indeed, an rgdpna based intensity suggests that the useful energy and exergy intensities of the Spanish economy increased until a peak in 2005, while an rgdpe based intensity suggests that these intensities peaked in 1985. Both charts show, to some extent, a bell-shaped curve, as intensities first rise markedly during the ''Spanish Miracle'' period of high industrialisation until reaching a peak, and then begin (albeit at different dates) a decrease which can be interpreted as a period of structural changes that reduces the economy's reliance on energy intensive industrial processes.
In both graphs, a sharp decrease in all intensities can be observed from 2004 onwards, although this decrease halts in the aftermath of the 2008 economic crisis, from 2009 to 2012, and then resumes.
Last, primary and final energy intensities provide insights very different from useful stage intensities. Indeed, primary and final energy intensities in 2016 are comparable to their initial values when using rgdpna, while they are considerably lower than their initial values when using rgdpe. This suggests that the Spanish economy is effectively reducing its reliance on energy consumption. Conversely, useful stage intensities qualify this interpretation, for they remain at high levels corresponding to those of the industrialising period of the 1970s-1980s (rgdpna) and of the late 1960s (rgdpe). Thus, both useful energy and exergy intensities suggest that the Spanish economy remains fairly dependent on energy consumption, and provide less optimistic insights than primary and final energy intensities.

LMDI decomposition and levelling off thermodynamic efficiencies
The calculated decomposition coefficients are presented in Fig. 7. Final energy supply has been the most influential factor in the supply of useful exergy over the studied period. Table 3 highlights the evolution of each decomposition factor over time.
The sectoral structure factor first increased until the mid-1970s, then decreased until the early 1990s, and has remained roughly constant ever since. The first trend corresponds to the industrialisation of the Spanish economy, during which more energy efficient economic sectors, e.g. industrial activities, increased their share of energy consumption. The second trend suggests a period of structural change towards less energy efficient economic sectors in the period 1975-1990, which corresponds to a rise of the service sector and tourism industry [105]. Then, the sub-sectoral structure factor has increased over the whole considered time period, which means that overall, energy consumption has shifted towards increasingly energy efficient end-uses within each economic subsector. Finally, the efficiency factor was crucial in increasing the useful exergy supply until roughly 2000; since then it has stagnated. This suggests that efficiency gains are slowing down in the Spanish economy (see Appendix E for national final-to-useful and primary-to-useful exergy efficiencies).
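To illustrate how multiplicative decomposition factors of this kind can be computed, the following minimal sketch applies a multiplicative LMDI-I decomposition to a simplified three-factor identity (useful exergy = final energy × sectoral share × efficiency, summed over sectors). The LMDI-I variant, the identity, the two-sector data, and all function and variable names are assumptions made for illustration; the decomposition identity and data used in this paper are not reproduced here.

```python
import math


def log_mean(a, b):
    """Logarithmic mean L(a, b) of two positive numbers."""
    if a == b:
        return a
    return (a - b) / (math.log(a) - math.log(b))


def lmdi_multiplicative(start, end):
    """Multiplicative LMDI-I effects for V = sum_i prod_k x[k][i].

    `start` and `end` map each factor name to a list of strictly positive
    per-sector values. The returned effects multiply to V_end / V_start.
    """
    names = list(start)
    n = len(start[names[0]])
    v0 = [math.prod(start[k][i] for k in names) for i in range(n)]
    v1 = [math.prod(end[k][i] for k in names) for i in range(n)]
    total = log_mean(sum(v1), sum(v0))
    weights = [log_mean(v1[i], v0[i]) / total for i in range(n)]
    return {
        k: math.exp(sum(w * math.log(end[k][i] / start[k][i])
                        for i, w in enumerate(weights)))
        for k in names
    }


# Hypothetical two-sector economy (illustrative numbers, not Spanish data).
start = {"final_energy": [100.0, 100.0], "share": [0.6, 0.4], "efficiency": [0.20, 0.10]}
end = {"final_energy": [150.0, 150.0], "share": [0.5, 0.5], "efficiency": [0.25, 0.12]}
effects = lmdi_multiplicative(start, end)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

In this form, a factor above one increased the aggregate over the period and a factor below one decreased it, which is the same reading as the end-of-decade to start-of-decade ratios reported in Table 3.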

Testing for causality
Granger causality is reported when it is detected with 95% confidence. Otherwise, the result of a test is ascribed to the ''No evidence'' outcome. Fig. 8.a presents an overview of our results, from which two main conclusions can be drawn. Firstly, in 80% of the cases, Granger causality is not identified.¹⁰ Secondly, when causality is identified, the conservation hypothesis is the most supported hypothesis (12% of cases), although the number of tests supporting the growth hypothesis is not negligible (6% of cases).

Influence of the chosen procedure
Consistent with the literature, Fig. 8.b shows that the procedure employed for conducting causality tests influences the outcomes of our tests. The T-Y procedure supports the conservation hypothesis more often (25% of cases) than the two other procedures, which provide more balanced results. In most cases, none of the three procedures identifies Granger causality, although the T-Y procedure identifies causality in more cases (33%) than the two other procedures.

Influence of the energy metric
The energy metric was also found to influence the outcome of our tests. Fig. 8.c shows that for each energy metric, most tests fail to identify Granger causality, and that when causality is identified, the most backed up hypothesis is the conservation hypothesis. However, the percentage of tests backing up the growth hypothesis increases when moving from final energy (3%) to useful energy (4%), and more notably, when moving to useful exergy (10%). Yet, even when useful exergy is used as energy metric, results remain inconclusive.
Fig. 4. Energy consumption versus economic output growth rates. On the left, energy consumption is measured as final energy, and on the right, as useful exergy. The linear best fit is displayed for both energy metrics, and the R² value is 0.84 when using final energy and 0.86 when using useful exergy. Graphical areas corresponding to hypercoupling, relative decoupling, and absolute decoupling, are identified.
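The three areas named in the caption of Fig. 4 can be sketched as a simple classification of each pair of growth rates. The thresholds below are assumptions chosen for illustration, not the exact boundaries drawn in the figure: in particular, the sketch labels any point with falling energy consumption as absolute decoupling, without checking whether economic output itself is growing. The function name is hypothetical.

```python
def classify_decoupling(energy_growth, gdp_growth):
    """Classify one (energy, GDP) growth-rate pair.

    Assumed thresholds (illustrative only):
      - absolute decoupling: energy consumption falls;
      - relative decoupling: energy grows, but more slowly than GDP;
      - hypercoupling: energy grows at least as fast as GDP.
    """
    if energy_growth < 0:
        return "absolute decoupling"
    if energy_growth < gdp_growth:
        return "relative decoupling"
    return "hypercoupling"


# Hypothetical growth rates (fractions per year), not Spanish data.
for energy, gdp in [(0.04, 0.03), (0.01, 0.03), (-0.01, 0.02)]:
    print(energy, gdp, classify_decoupling(energy, gdp))
```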

Fig. 5. On the left, final energy growth rate as a function of the final-to-useful energy efficiency growth rate. On the right, final energy growth rate as a function of the final-to-useful exergy efficiency growth rate. Graphical areas corresponding to backfire, partial rebound, and no rebound, are identified.
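The rebound areas of Fig. 5 can be sketched in the same way. The sketch assumes that, if only efficiency gains were at play, useful energy would stay constant, so that final energy would be expected to change by 1/(1 + g) - 1 for an efficiency growth rate g. This benchmark, the treatment of the boundaries, and the function name are assumptions made for illustration.

```python
def classify_rebound(final_energy_growth, efficiency_growth):
    """Classify one data point of final energy growth versus efficiency growth.

    Assumed benchmark: with constant useful energy, an efficiency gain g
    alone would change final energy by 1 / (1 + g) - 1 (a negative number).
    """
    if efficiency_growth <= 0:
        return "no efficiency gain"
    expected_change = 1.0 / (1.0 + efficiency_growth) - 1.0
    if final_energy_growth >= 0:
        return "backfire"          # savings fully offset
    if final_energy_growth > expected_change:
        return "partial rebound"   # some savings, fewer than expected
    return "no rebound"            # at least the expected savings


# Hypothetical growth rates (fractions per year), not Spanish data.
for final, eff in [(0.02, 0.01), (-0.005, 0.01), (-0.02, 0.01)]:
    print(final, eff, classify_rebound(final, eff))
```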

Table 3
Decomposition factors evolution during each decade. For each decomposition factor, the reported number is not the actual decomposition factor, but the ratio between its value at the end of the decade and its value at the beginning of the decade, in order to highlight the relative evolution of decomposition factors occurring during each specific decade.

Influence of the economic output metric
Similarly, for each economic output metric used, Fig. 8.d shows that in most cases, tests fail to identify causality, and that when causality is identified, the conservation hypothesis is the most supported. The numbers of tests supporting the conservation and growth hypotheses are more balanced when using the rgdpo and rgdpe metrics than when using the rgdpna metric. Overall, results remain inconclusive.
Fig. 9 presents the results obtained with different numbers of lags when useful exergy is used as the energy metric. With one or two lags, results correspond to the general trend introduced in Fig. 8.a. Conversely, when more than two lags are included in the VAR model (neglecting results obtained with four lags, for which only five tests are conducted), causality is identified in considerably greater proportions. In addition, when causality is identified with more than two lags, results supporting the growth hypothesis are markedly more numerous than those supporting the conservation hypothesis, with the exception of models including eight lags. SI (Section 4) shows that the number of lags has a weaker influence when using final energy or useful energy.

Obtained fits
The focus here is on the results obtained when fitting the CES APFs. Fits obtained with the Linex APF are available in Appendix F, and the parameters obtained for all conducted fits are available in Appendix G. (Output elasticities for the CES APFs are available in SI, Section 5.)
Fig. 10 presents the fits obtained with all the CES APFs, using both quality unadjusted and quality adjusted FoPs. The first fit, on the whole time period 1960-2016, is displayed on the left; the second fit, on the time period 1960-2008 and thereafter extrapolated in dotted lines, is displayed on the right.¹¹ Remarkably, all the CES APFs are able to account for economic output when fitted over the whole time period 1960-2016, as evidenced by the SSE₁ values available in Appendix G, and this without relying on any exogenous TFP. The SSE₁ and SSE₂ values in Appendix G show no significant difference in fit quality between APFs with quality adjusted and quality unadjusted FoPs; in particular, quality adjusted FoPs do not necessarily provide better fits. However, none of the CES APFs is fully able to follow the 2008 economic crisis when fitted over the period 1960-2008 and thereafter extrapolated. Indeed, the extrapolations reproduce to some extent a decrease in economic output during the crisis, but a significant mismatch appears, as evidenced by the SSE₂ values in Appendix G. Amongst the CES APFs, the best extrapolation according to the SSE₂ values is obtained with quality unadjusted FoPs.

Fitted economic parameters and economic interpretation
Although the obtained fits are very similar, it is noteworthy that the underlying fitted APFs differ greatly in terms of fitted parameters (see Appendix G), which impedes any sensible economic interpretation. For instance, all CES APFs with quality adjusted FoPs collapse to a two-FoP CES neglecting energy consumption when fitted for the period 1960-2008, as shown by the fitted parameter values in Appendix G. Two examples of marked changes in economic interpretation for one CES nesting structure are shown in Table 4. In the first example, when the fit is performed on the period 1960-2008 with quality unadjusted FoPs, fitted parameters indicate a key role of energy as a factor of production. Indeed, the output elasticity of energy varies between 0 and 0.4 (suggesting a high productivity of energy), and both elasticities of substitution are relatively low (below unity), suggesting a low substitutability of the FoPs. However, if the same fit is conducted with quality adjusted FoPs, the fitted parameters reject energy as a factor of production altogether. Then, the second example shows that when the APF is fitted with quality adjusted FoPs for a slightly longer period of time (adding 2008-2016), energy recovers a key role as a factor of production, with both elasticities of substitution indicating a situation of low substitutability between FoPs.
¹¹ Only the time period 1990-2016 is displayed when extrapolating, in order to focus on the latest time span, where a mismatch can be observed; the fits perform well during the period 1960-1990, which is therefore not particularly relevant.
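The collapse to a two-FoP CES described above can be made concrete with a minimal sketch of a nested CES function. The functional form below is a commonly used nested CES with capital and labour in the inner nest and energy in the outer nest, without any time-dependent TFP term. The form, the nesting structure, the parameter names (`gamma`, `delta_1`, `delta`, `rho_1`, `rho`) and the input values are assumptions made for illustration, and may differ from the specification fitted in this paper.

```python
def ces_nested(k, l, e, gamma, delta_1, delta, rho_1, rho):
    """Nested CES output with a (capital, labour) inner nest and energy outside.

    y = gamma * [delta * (delta_1 * k**-rho_1 + (1 - delta_1) * l**-rho_1)**(rho / rho_1)
                 + (1 - delta) * e**-rho]**(-1 / rho)
    Requires positive inputs and non-zero rho_1 and rho.
    """
    inner = delta_1 * k ** (-rho_1) + (1.0 - delta_1) * l ** (-rho_1)
    outer = delta * inner ** (rho / rho_1) + (1.0 - delta) * e ** (-rho)
    return gamma * outer ** (-1.0 / rho)


def elasticity_of_substitution(rho):
    """Elasticity of substitution implied by a CES exponent rho."""
    return 1.0 / (1.0 + rho)


# With delta = 1 the energy term vanishes: output no longer depends on energy.
params = dict(gamma=1.0, delta_1=0.4, delta=1.0, rho_1=0.5, rho=2.0)
y_low_energy = ces_nested(2.0, 3.0, 1.0, **params)
y_high_energy = ces_nested(2.0, 3.0, 100.0, **params)
print(y_low_energy == y_high_energy)
print(elasticity_of_substitution(params["rho"]))  # below unity: low substitutability
```

Under this assumed form, an outer share parameter fitted at its upper bound removes energy from the function entirely, while an exponent above zero corresponds to an elasticity of substitution below unity. Two fits of similar quality can therefore carry opposite economic readings.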

Aggregate energy-economy analysis: valuable insights into the energy-economy nexus
An almost nonexistent historical absolute energy-GDP decoupling in Spain. Most data points were identified in the hypercoupling space in Fig. 4, which refutes both a long-term relative and a long-term absolute energy-GDP decoupling. It is noteworthy that the number of data points supporting either absolute or relative decoupling decreased when using useful exergy; the useful stage therefore provides less optimistic insights than the conventional final energy stage. In short, Spain is nowhere near achieving either relative or absolute energy-GDP decoupling, and the only economic period during which energy consumption was found to decrease is the economic downturn following the 2008 economic crisis. Thus, the ''green growth'' strategy is at odds with historical data in Spain, and consideration of an alternative post-growth strategy appears urgently needed.
A growing final energy consumption fully offsetting efficiency gains. Most data points in Fig. 5 support a situation of ''backfire'', i.e. of increasing final energy consumption fully offsetting energy efficiency gains. Thus, there is evidence that in the Spanish case, historical efficiency gains have not led to actual energy savings. In addition, the only data points corresponding to a situation of no rebound are found during the post-2008 economic downturn, i.e. when economic activity was dropping. While current efforts to reduce energy consumption focus on energy efficiency, the Spanish case study suggests that energy efficiency, within a growing economy, may not lead to the desired outcome. Energy efficiency appears therefore insufficient to reduce energy consumption, and coupling energy efficiency with energy sufficiency (i.e. reducing energy consumption by changing the quality or quantity of energy services provided) seems to be a sounder approach than relying solely on energy efficiency [106][107][108].
Energy intensities following a bell-shaped curve. While results regarding decreasing primary and final energy intensities (Fig. 6) in recent years align with the literature (for instance [38][39][40][41][44]), results regarding the useful exergy intensity of the Spanish economy diverge from most published studies. Indeed, while useful exergy intensities have generally been found to be somewhat constant over time for industrialised countries [28][43][44][45], the Spanish useful exergy intensity has varied substantially over time. However, like previous useful exergy intensity studies, it was found that useful exergy (as well as useful energy) intensities tend to support a considerably stronger energy-economy link than primary or final energy intensities do.
Levelling off thermodynamic efficiencies. Fig. 7 shows that efficiency gains have been an important driver of increasing useful exergy consumption in Spain, although such gains are currently levelling off. This slowdown in efficiency gains was already identified for several industrialised countries, including Japan [48], the US [49], the UK [28] and Austria [43]. A levelling off of thermodynamic efficiency suggests that primary and final energy intensities may neither decrease as fast as expected, nor carry on decreasing indefinitely.

Inconclusiveness of energy-GDP causality testing and APF modelling
Inconclusive energy-GDP causality testing. Whether using the useful or final stage perspective, no statistical evidence of causality between Spain's economic output and energy consumption growth rates was found. In accordance with the literature, the econometric procedure, the energy and economic output metrics, and the number of lags of the VAR model were found to considerably influence the outcomes of causality tests [54,55]. Consequently, the added value of using a useful stage perspective for energy-GDP causality testing was found inconclusive.
Unsatisfactory APF modelling. No notable difference was identified when using APFs with final energy (quality unadjusted FoPs) and useful exergy (quality adjusted FoPs). Indeed, fits performed similarly, both when conducted over the whole time period, and when conducted over the period 1960-2008 and thereafter extrapolated. In contrast to what is suggested in the literature [58], fits were even found to perform better in terms of SSE₁ and SSE₂ values (see Appendix G) when using final energy in APFs (quality unadjusted). The added value of using a useful stage perspective with APF modelling was therefore found inconclusive.
Fig. 6 showed the high influence of the economic output metric chosen for computing energy intensity time series in the Spanish case. This contrasts with a previous study conducted for Ghana and the UK, where the influence of the economic output metric was found to be negligible [28]. Hence, energy intensities should be handled carefully, and sensitivity tests should be systematically conducted by varying the economic output metric before drawing any economic interpretation. Most former studies did not include such a sensitivity test, but recent methodological developments of datasets such as the PWT [85] should from now on enable systematic sensitivity tests.

Energy-GDP causality testing: a dubious validity, and a need for systematic sensitivity tests
Results raise, on empirical grounds, serious concerns regarding the validity of energy-GDP causality tests. Considering the high sensitivity of causality test results to the chosen parameters, any hypothesis on causality could have been backed up by particular tests. It seems therefore absolutely crucial that further causality studies include a systematic sensitivity test. Unfortunately, previous studies often lack a systematic sensitivity analysis, and it is therefore difficult to evaluate the validity of formerly published results. In addition, there are, on theoretical grounds, crucial limitations related to the VAR models underlying causality tests, which seriously question the validity of such tests. (See Lütkepohl [109] for a description of VAR models.) Indeed, VAR models are linear and static, assuming firstly that the influence of a variable is always proportional to its growth rate, and secondly that the coefficients describing the energy-economy interactions in the model are constant over time [37]. Such assumptions are dubious for statistical tests that need to be conducted for periods longer than 30 years, during which the economy evolves considerably. The inappropriateness of VAR models underlying causality tests may explain the inconclusiveness of both the causality literature and of the Granger causality tests conducted in this paper.
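The two assumptions criticised above can be read directly from a generic bivariate VAR written in growth rates. The notation below is illustrative and is not taken from this paper: y_t and e_t denote the growth rates of economic output and energy consumption, and p is the number of lags.

```latex
\begin{aligned}
y_t &= c_1 + \sum_{i=1}^{p} a_i\, y_{t-i} + \sum_{i=1}^{p} b_i\, e_{t-i} + \varepsilon_{1,t},\\
e_t &= c_2 + \sum_{i=1}^{p} \alpha_i\, y_{t-i} + \sum_{i=1}^{p} \beta_i\, e_{t-i} + \varepsilon_{2,t}.
\end{aligned}
```

In this notation, each lagged growth rate enters linearly, and the coefficients carry no time index, so a single set of values is assumed to hold over the whole sample. Testing whether energy Granger-causes output (growth hypothesis) amounts to testing b_1 = ... = b_p = 0, and testing whether output Granger-causes energy (conservation hypothesis) amounts to testing α_1 = ... = α_p = 0.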

APF modelling: compelling caveats on empirical and theoretical grounds
Likewise, results raise serious concerns regarding the validity of APF modelling. First, although the conducted fits closely follow actual economic output when fitted for the whole time period, fitted parameters greatly differ (Section 3.3.2). Thus, the quality of the fits obtained may disguise the considerable sensitivity of fitted parameters, and therefore of economic interpretations, to modelling assumptions [65]. Second, APFs providing the best fits may do so by returning unreasonable values. Examples can be found in Tables G.1 and G.2, where specific elasticities of substitution tend to infinity, thereby suggesting that one factor of production can perfectly and indefinitely substitute for another, which is hardly realistic. Third, most of the fitted APFs were found unable to account for the 2008 economic crisis when fitted on the period 1960-2008 and thereafter extrapolated. Therefore, obtaining a high quality fit does not ensure that the fitted APF will be robust when facing a structural break, which seriously questions the predictive capacity of APFs, as well as their validity as energy-economy modelling tools. Fourth, it is noteworthy that satisfactory fits were achieved while disregarding the mainstream and ubiquitous TFP, which was a priori deemed unjustified. Considering the implications of modelling assumptions on policy recommendations [65], the systematic inclusion of a TFP in APFs, without questioning the extent to which it is needed for obtaining a plausible fit, may systematically overestimate the role of technological progress in delivering economic output, and thus unjustifiably sway policy recommendations towards innovation and technology.
Finally, it should be recalled that, on theoretical grounds, serious caveats have been raised against APFs since their early formulations. Firstly, aggregating the whole economy's heterogeneous production in a single metric is questionable, as is aggregating heterogeneous capital within a single metric [110,111]. Secondly, numerous authors claim that APFs are ''not even wrong'' to the extent that an underlying mathematical identity is responsible for the statistically very good fit observed [112][113][114][115].

International perspectives
Although this paper has focused on Spain as a significant case study, far-reaching international implications can be drawn from the findings. First, it was shown that, at least for some methods, applying energy-economy analysis methods at the useful stage does bring significant advantages compared to applying them at the final stage. While findings may differ from country to country, and results detailed in this paper for Spain are not generalisable, conducting the analysis at the useful stage is likely to enhance final stage analysis results. Second, this paper shows that two conventional and widely used methods (energy-GDP causality testing and APF modelling) are not robust enough to conduct reliable energy-economy analysis. Showing that such methods are not applicable for Spain does invalidate the methods to the extent that, at best, their general validity is poor. Consequently, studies applying any of these methods to another country or region need to bear the burden of proof and justify that the method is robust for their case study, by providing significant sensitivity analysis.

Limitations
Last, there are important limits to this study. The paper adopts a perspective stemming from the Societal Exergy Analysis literature, where the technological development of energy systems is studied primarily in terms of the final-to-useful energy and exergy efficiencies. These are crucial elements of the energy system, but are not sufficient to capture its whole structure. Other important aspects of the energy system, such as the development of local capacities and microgrids, whether the energy system is centralised or decentralised, operated under an open market regime or not, are regrettably outside the scope of this work. Such aspects are particularly important for future studies that attempt to model the future of energy grids, as numerous countries are increasingly turning to decentralised energy systems operated in an open market regime.

Conclusion
A critical assessment of insights gained on the Spanish energy-economy interplay when moving from the final to the useful stage was provided. Using aggregate energy-economy analysis, a very tight energy-GDP coupling was found for Spain (and even tighter when moving to the useful stage), with only rare (post-2008 recession) years of absolute energy-GDP decoupling. Regarding energy intensities, useful stage intensities are found to suggest a stronger connection between Spain's economy and energy consumption than final energy intensities. Furthermore, the useful stage analysis shows that thermodynamic efficiency gains are levelling off, and that increasing energy consumption has historically fully offset thermodynamic efficiency gains, which evokes a high energy rebound. Such findings are policy-relevant as they suggest that it is urgent to consider a post-growth economic strategy, and to supplement energy efficiency policies with energy sufficiency policies.
However, the attempt to gain insights on the energy-economy interplay has proved unsuccessful when turning to energy-GDP causality testing and Aggregate Production Function modelling, for which the paper identified crucial caveats. Whether applied at the final or useful stage, it was shown for both methods that arbitrary modelling choices may result in large changes in results, and, therefore, economic interpretations. Hence, findings highlight that such methods lack robustness and are not appropriate for studying and modelling the energy-economy nexus. Despite their sophistication, they are likely to provide misleading results, and basing policy recommendations on these two techniques seems unsound. This concurs with Heun et al. [65, p. 27], who suggest ''the possibility that energy-economy modelling with APFs [...] may tell us more about theory and modelling approaches than about the economy''.
Thus, findings suggest that moving to the useful stage is crucial in order to unlock new energy-economy insights, but is not sufficient. In addition, useful stage analysis should be applied in combination with alternative, robust, energy-economy analysis methods. Moving away from energy-GDP causality testing and Aggregate Production Function modelling, towards alternative energy-economy modelling such as system dynamics modelling [129][130][131][132], agent-based modelling [133], stock-flow consistent modelling [133,134], and econometrics [135] appears desirable. The energy-economy community is encouraged to develop further research on such alternative modelling methods, and to apply them at the useful stage of energy use.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Aggregate energy-economy analysis encompasses different tools such as decomposition analysis, the study of energy intensities (ratio of energy consumption per economic output), or econometric techniques, that attempt to find relationships between macro variables such as energy consumption, energy efficiency, and economic output, either as aggregate quantities or as growth rates. Studies may look at historical data to gain an understanding of past energy transitions and derive future trends, or at cross-country analysis to identify the spatial variation of relationships.
Aggregate energy-economy analysis has roots in the aftermath of the first oil crisis, at a time (mid 1970s-mid 1980s) during which concerns regarding the finiteness of the world's natural resources led energy scientists to move away from energy accounting towards energy analysis (see for instance Slesser [116], Cleveland et al. [117], or Costanza [118]). This trend was not without resistance from the established economics field, which regarded energy analysis as a red herring [119]. Currently, many studies applying aggregate energy-economy analysis techniques at the final stage can be found [38][39][40][41]. Such studies usually find that final energy intensities are decreasing over time, and converging across countries. Conversely, studies adopting a useful stage perspective were recently pioneered by Serrenho et al. [44] and tend to find that useful stage intensities are relatively constant over time [28,45].

Energy-GDP causality testing
Granger defines statistical causality (sometimes referred to as Granger causality) from a variable X to a variable Y when the knowledge of the previous values of X adds information when forecasting Y, in comparison to the reference situation in which only the past values of Y are known [91,120]. In energy-economy analysis, energy-GDP causality testing has been used to determine whether energy consumption entails economic growth, or whether economic growth implies rising energy consumption, with the purpose of deriving policy implications.
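The definition above can be turned into a minimal sketch: a regression of a series on its own past (restricted model) is compared with a regression that also includes the past of the candidate cause (unrestricted model), and the improvement is summarised by an F statistic. This is a plain bivariate illustration on synthetic data, written with the standard library only; it is not one of the procedures applied in this paper, and the function names and the data are hypothetical.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    p = len(rows[0])
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
           + [sum(r[a] * t for r, t in zip(rows, targets))] for a in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    beta = [aug[a][p] / aug[a][a] for a in range(p)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f_stat(cause, effect, lags=1):
    """F statistic for H0: past values of `cause` do not help forecast `effect`."""
    restricted, unrestricted, targets = [], [], []
    for t in range(lags, len(effect)):
        own = [effect[t - i] for i in range(1, lags + 1)]
        other = [cause[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
        targets.append(effect[t])
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic example: y follows x with a one-period delay, not the reverse.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]
f_xy = granger_f_stat(x, y, lags=1)  # large: past x helps forecast y
f_yx = granger_f_stat(y, x, lags=1)  # small: past y does not help forecast x
print(f_xy, f_yx)
```

In practice the F statistic is compared with a critical value at the chosen confidence level, and the outcome depends on the number of lags and on the series used, which is the sensitivity discussed in the main text.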
Statistical procedures for identifying statistical causality between variables were introduced in the late 1960s and early 1970s, notably by Granger [91] and Sims [92]. At the final stage, causality testing was first used for energy-economy analysis in 1978 by Kraft and Kraft [50], who found a unidirectional relationship suggesting that economic growth entailed further energy consumption in the US. These results were soon called into question (1980) by Akarca and Long [51], who argued that Kraft and Kraft's results were spurious and due to ''the inclusion of two additional years in the data sample.'' Since then, the energy-GDP causality literature has considerably expanded, with dozens of studies now available, while remaining inconclusive [54]. At the useful stage, however, Warr and Ayres (2010) [56] produced the only study currently available, and found a unidirectional causality running from useful exergy consumption to economic output.

Energy Extended Aggregate Production Function modelling
APF modelling attempts to model economic output (often quantified by GDP) as a function of factors of production (i.e. capital, labour, and sometimes other factors, such as energy). Such APF modelling claims to isolate the effect of each factor of production on economic growth (e.g. its marginal productivities and elasticities of substitution), and to derive policy implications. Empirical applications of APFs include the analysis of technological change and exogenous growth [70], the estimation and analysis of elasticities of substitution [65,121,122] and the contribution of factors of production (FoPs) to economic growth [123]. The popularity of APFs is due to their alleged capacity to represent the complexity of a whole economy within a simple functional form.
The APF approach stems from the first formulation by Solow [124,125] and Swan [126] of the well-known neoclassical Solow-Swan growth model (1956-1957). In his early formulation of a Cobb-Douglas APF, Solow [125] found that exogenous technological progress accounted for 87.5% of economic growth in the US. Currently, technological progress usually remains a crucial parameter to explain economic growth when adopting a neoclassical APF approach [70], even when energy is included [57]. Indeed, the neoclassical cost share theorem, although dubious [99], requires the role of energy in APFs to be marginal [127]. In contrast, ecological economists adopted the final energy perspective as early as 1985 with Kümmel et al.'s seminal work, which introduced the Linex APF [102,128] and defended a crucial role for energy. At the useful stage, the APF approach was first introduced by Ayres and Warr in 2005, who accounted for growth over a century in the US [34], and later in Japan [71], without relying on exogenous technological progress.

Appendix B. Sensitivity test: calculation of the average growth rates
Growth rates displayed in the main article (Figs. 4 and 5) are computed as 5-year average growth rates. Here, the influence of the averaging period, varied from 1 to 10 years, is presented for these two figures.
[...] calculation of the average growth rate, and that increasing the number enables discarding outliers.

Appendix C. Primary energy and useful energy growth rates versus GDP growth rates
[...] Fig. 4 can equally be observed. Thus, the results presented in Section 3.1.2 are robust to the choice of energy metric.

Appendix D. Influence of the GDP metric on the study of growth rates
The influence of the GDP metric on the study of growth rates is shown in Fig. D.1. The obtained charts are similar to the ones presented in Fig. 4. Hence the influence of the GDP metric on the study of growth rates is negligible.
Energy consumption versus economic output growth rates. On the left, energy consumption is measured as primary energy, and on the right, as useful energy. Economic output values are measured as the rgdpna measure. The linear best fit is displayed for both energy metrics, and the R² value is 0.82 when using primary energy and 0.84 when using useful energy. Graphical areas corresponding to hypercoupling, relative decoupling, and absolute decoupling, are identified.
Fig. D.1. Energy consumption and economic output growth rates (measured as rgdpe) over time. Energy consumption measured as final energy on the left, and as useful exergy on the right. R² value is 0.66 when using final energy and 0.60 when using useful exergy. Graphical areas corresponding to hypercoupling, relative decoupling, and absolute decoupling, are identified.
[...] that APFs with quality adjusted FoPs do not necessarily perform better than APFs with quality unadjusted FoPs.

Appendix G. APFs: fitted parameters
The fitted parameters and SSE indicators as defined in Section 2.5.3 are introduced in the following tables.

Supplemental Information And Data Repository
Supplemental information related to this article can be found online at https://doi.org/10.1016/j.apenergy.2020.116194. Input data and R code for energy-GDP causality testing and Aggregate Production Function modelling are available under a CC-BY-4.0 license at the University of Leeds Data Repository, at http://dx.doi.org/10.5518/931.