Is working less really good for the environment? A systematic review of the empirical evidence for resource use, greenhouse gas emissions and the ecological footprint

Is reducing paid working time (WT) a potential win-win climate change mitigation strategy, which may simultaneously serve environmental sustainability and human well-being? While some researchers and commentators frequently refer to such ‘double-dividends’, most climate and environmental discussions ignore this topic. The societal relevance of paid WT and the potential role of its reduction as a demand-side measure for mitigating the climate and ecological crisis call for a critical review of the evidence. Here we systematically review the empirical, quantitative literature on the relationships between paid WT and a number of environmental indicators: resource use (incl. energy), greenhouse gas emissions and the ecological footprint. We applied two comprehensive search queries in two scientific databases, screened ∼2500 articles published until December 2019, and used citation snowballing to identify relevant research. However, we only found 15 fully relevant studies, as well as a number of partially relevant ones. This literature employs substantially different scopes, indicators and statistical methods, each with important caveats, which inhibits a formal quantitative evidence synthesis but usefully informs a critical discussion of the research frontier. Most studies conclude that reductions in paid WT reduce environmental pressures, primarily by decreasing incomes and consumption expenditures. However, existing research does not provide reliable guidance beyond the established link between expenditures and environmental impacts. Quantifying the effects of time use changes and macro-economic feedbacks through productivity, employment, and the complementarity or substitution between human labour and natural resources in production processes has proven to be difficult. To better understand the environmental impacts of specific types of WT reductions, new forms of data collection as well as studies at different scales and scopes are required.
The critical discussion of the existing literature helps to conceptually map the pathways investigated so far and to identify crucial next steps towards more robust insights.


Introduction
Supply-side technological measures alone are insufficient to safely avoid catastrophic climate change, making complementary demand-side measures increasingly necessary (Anderson and Peters 2016, Creutzig et al 2016). These measures promise to curb societal resource use (energy, materials and land) and the resulting emissions, which are key drivers of environmental unsustainability (Krausmann et al 2018, UNEP-IRP 2019, Marques et al 2019). In the case of climate change, the global community committed itself to ambitious and rapid action to achieve net-zero greenhouse gas (GHG) emissions over the next few decades (UNFCCC 2015).
A growing number of public intellectuals argue that working time reduction (WTR) could be an environmentally beneficial demand-side strategy for reducing resource use and GHG emissions (Schor 2010, Kopatz 2012, Raworth 2017, Suzuki 2017, Bregman 2018, Frey 2019). This, and WTR more generally, attracts substantial media attention (Hierländer 2018, Harper 2019, Taylor 2019, Semdley 2019, Kahn 2020, Spicer 2020). Some even believe that WTR is inevitable because automation and artificial intelligence will soon outperform humans in many jobs (Frey and Osborne 2017). Others see WTR as a way to substantially increase labour productivity (Kleinman 2019). Especially in the affluent Global North, WTR could (again) become a politically more relevant strategy, as the successes of unions fighting for it already show (Reiter et al 2018). Because decent work is such a central societal issue, it is also directly included in Sustainable Development Goal No. 8 'economic growth and decent work' (United Nations 2019). Post-growth visions also emphasize the relation of work and well-being and explicitly mention WTR (The Guardian 2018).
Nevertheless, the topic of WTR is almost completely absent from most IPCC documents, climate strategies at the international, national and subnational levels, and high-level climate policy discussions (Antal 2018). Both this neglect, which may be due to the dearth of information on effects of WTR, and the enthusiasm of its proponents call for a deeper understanding of the climate and environmental impacts of this strategy. In other words, whether and how WTR schemes should be considered in climate discourse and policy is a controversial and open question.
Previous research, at least since Schor (1992) wrote about the 'work-spend' cycle in 'The Overworked American', has tried to highlight the necessity, display the variety, and study the role of WTR as a strategy to reduce resource use and emissions (e.g. Jackson and Victor 2011, Pullinger 2014, Antal 2014). In its simplest form, the idea is that people who already have enough could work less, therefore earn and consume less, which could increase well-being and reduce environmental impacts without harming employment. Of course, there are many complexities not captured by this first simple picture. For example, consumption patterns may shift towards more resource- and emission-intensive activities such as travel (Hanbury et al 2019). WTR may also trigger production-side changes over time, whose consequences for resource use and emissions depend on the substitutability or complementarity of labour, capital and resources across different sectors (e.g. Apostolakis 1990). Reflecting this complexity, interest in exploring the environmental impacts of WTR from various perspectives seems to be on the rise (Hanbury et al ...).
However, in terms of empirical evidence on the quantitative relationship between paid working time (WT) and the environment (ENV), the existing literature is scattered across different approaches and methods, making studies hard to compare and synthesize. For WT, different measures, definitions and databases are used. The environmental indicators range from production-based energy and emission accounts to consumption-based carbon, material and ecological footprints. While all these indicators have climate-relevance because they include energy or emissions, their differences and underlying assumptions complicate comparisons. While many existing papers have short discussions and selective summaries of previous work, there is no comprehensive and critical review of the literature.
Here we conduct a systematic review of the empirical evidence on the environmental impacts of WTR, aiming for a critical understanding of the knowledge base and necessary next steps to assess WTR as a demand-side mitigation strategy. For this purpose, we employ a reproducible, systematic method to identify the relevant scientific literature on the quantitative relationship between paid WT and resource use, GHG emissions and the ecological footprint (section 2). We summarize the literature (section 3) and critically discuss its strengths and weaknesses (section 4). Then we draw together theoretical arguments and perspectives from the literature and explore promising directions for future research (section 5). Finally, we draw conclusions regarding WTR as a demand-side mitigation measure (section 6).

Systematic review method and criteria
To uncover all the relevant literature, a systematic search process has been conducted (figure 1).
In step 1, two search queries were developed and used to find relevant documents in Scopus and the Web of Science Core Collections, covering all entries until 2 December 2019. The search was limited to English-language titles, keywords and abstracts, yielding 2494 records.
In step 2, titles and abstracts of each record were screened for relevance by the first author, leaving 66 studies. Between-reviewer consistency was tested by assigning 100 records to the second author, with 100% inclusion/exclusion agreement. Studies were deemed fully relevant if they investigated the relationship between paid working time and an environmental indicator (such as energy use, materials use, emissions and the ecological footprint), both indicators were quantitative, referred to the same system (e.g. a household, a state or a country), and the relationship was studied on the basis of measured values. A number of studies were partially relevant, as they dealt with the same relationship but did not satisfy all conditions. These were moved to step 6. In step 3, full texts were screened, leaving ten references as fully relevant.
In step 4, reference lists of all relevant papers were checked ('citation snowballing'), yielding another ten potentially relevant papers. Some of these were not included in the two scientific databases we used. No further studies were identified as potentially relevant from their reference lists (2nd level snowballing). After a full-text screening, five studies were included as fully relevant.
In step 5, the 15 full-texts were coded based on the pre-developed criteria, critically appraised, and synthesized (see table 1). Additionally, the first author corresponded with some of the first authors of the assessed literature (Jared Fitzgerald, Qing-long Shao, Anders Fremstad, Jonas Nässén).
In step 6, the partially relevant studies were collected and utilized for contextualization and discussion of the evidence, as well as to inform next steps. These studies are, for example: macro-economic scenario modelling exercises; models of GHG emissions in which time use had an important role; articles in which environmental behaviour was linked to time use but not WT specifically; research on the environmental intensity of time-uses; and a number of theoretical and qualitative papers.
An important limitation of our approach is that it does not necessarily find broader descriptions of WTR schemes where ENV impacts are not specifically highlighted. Such papers may still be relevant if they discuss selected environmentally important lifestyle changes or strategies of stakeholders that matter for income, expenditures or environmental behaviour 5 . However, these papers do not provide comparable quantitative environmental indicators and would therefore require a completely different review design.

The empirical literature on WT, resource use and emissions
The 15 research articles identified as fully relevant are summarized in table 2 (and the supplementary information, tables S1 and S2, which is available online at stacks.iop.org/ERL/16/013002/mmedia).
Country-level studies investigate the relationship between aggregate WT and ENV indicators using econometric approaches. Besides simple regression models that try to connect the average WT of employed workers with an ENV indicator using a cross-sectional sample of countries, most papers use panel data and build on the STIRPAT approach ('stochastic impacts by regression on population, affluence and technology'). This is a multivariate non-linear model extending the classical IPAT model (York et al 2003), in which environmental impacts (I) were conceptualized as the product of population size (P), affluence (A: per capita GDP), and technology (T: environmental impacts per unit of GDP). STIRPAT can be written as
$$I_i = a \, P_i^{b} A_i^{c} T_i^{d} e_i$$
where a is a scaling constant, while exponents b, c, d and the error term e describe the relationship for unit i (e.g. a country in a year). WT enters these models in two ways: through the scale effect (Schor 1992) and the composition effect (Nässén et al 2009).
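In applied work, a multiplicative model of this kind is typically estimated after taking logarithms, ln I = ln a + b ln P + c ln A + d ln T + ln e, so that the exponents become regression coefficients. The following is a minimal sketch of that idea on synthetic data. It is not the specification of any reviewed study: the function name `fit_stirpat`, the sample size and all numerical values are illustrative assumptions.

```python
import numpy as np

def fit_stirpat(I, P, A, T):
    """Estimate ln I = ln a + b ln P + c ln A + d ln T by ordinary least squares."""
    X = np.column_stack([np.ones(len(I)), np.log(P), np.log(A), np.log(T)])
    coef, *_ = np.linalg.lstsq(X, np.log(I), rcond=None)
    ln_a, b, c, d = coef
    return {"a": float(np.exp(ln_a)), "b": float(b), "c": float(c), "d": float(d)}

# Synthetic units (e.g. country-years) with known exponents; values are assumptions.
rng = np.random.default_rng(0)
n = 500
P = rng.lognormal(mean=2.0, sigma=0.5, size=n)   # population
A = rng.lognormal(mean=3.0, sigma=0.5, size=n)   # affluence
T = rng.lognormal(mean=0.0, sigma=0.3, size=n)   # technology
e = rng.lognormal(mean=0.0, sigma=0.05, size=n)  # multiplicative error term
I = 1.5 * P**1.0 * A**0.8 * T**0.6 * e           # impacts generated by the model

estimates = fit_stirpat(I, P, A, T)
print(estimates)
```

Because the data are generated from the model itself, the estimated exponents should land close to the values used to create them. With real country data, the concerns discussed in the critical appraisal (omitted drivers, noisy WT and ENV series) apply.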
The scale effect means that higher work hours yield a higher level of economic output, i.e. it refers to the contribution of WT to GDP. Most studies in our sample test for the scale effect by disaggregating per capita GDP into three parts (the employed population ratio, the average WT of employees, and labour productivity, i.e. GDP per hour) and isolate WT as the variable of interest while using the other two as control variables in the econometric model. The composition effect is the impact of WT apart from its contribution to GDP, which stems from household-level consumption decisions influenced by the availability and uses of non-work time and household incomes 6 . In econometric analyses, it is isolated net of GDP (WT is the variable of interest and GDP per capita is a control variable). Additional control variables (urbanization, manufacturing/service ratio of GDP, energy production, etc.) are also included.
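The three-part disaggregation of per capita GDP used to test for the scale effect can be written as an accounting identity. The symbols Pop (population), Emp (employed persons) and H (total hours worked) are notation introduced here for illustration and do not come from the reviewed studies:

```latex
\frac{\mathrm{GDP}}{\mathrm{Pop}}
  = \underbrace{\frac{\mathrm{Emp}}{\mathrm{Pop}}}_{\text{employed population ratio}}
    \times
    \underbrace{\frac{H}{\mathrm{Emp}}}_{\text{average WT per employee}}
    \times
    \underbrace{\frac{\mathrm{GDP}}{H}}_{\text{labour productivity}}
```

Taking logarithms turns the product into a sum, which is why the three components can enter a log-linear regression as separate terms, with average WT as the variable of interest and the other two as controls.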
Household-level studies investigate the WT-ENV relationship using econometric methods, but with WT and ENV indicators referring to individual households, not entire economies. Two main methods have been used so far. The less comprehensive option is to define WT brackets (e.g. low, medium, high) and use logit regressions to analyse the prevalence of selected environmentally relevant types of consumption in each (Devetter and Rousseau 2011). The other, more common approach is to study the relationship between total household WT and a consumption-based energy/CO2/material footprint indicator, which is calculated from all expenditures of the household (Nässén and Larsson 2015, Buhl and Acosta 2016, Fremstad et al 2019). In both cases, a number of control variables can be used (e.g. household characteristics). Figure 2 gives a summary of the main channels through which WT and ENV indicators have been connected in the literature.

Critical appraisal of the empirical literature
We address four groups of questions to discuss the challenges and limitations of understanding the ∆WT-∆ENV relationship: 7 (a) Which system is studied? (b) How is WT measured? (c) How is ENV measured? (d) How are conclusions drawn? Figure 3 gives an overview of the relevant methodological choices and the associated concerns, which are covered in the following subsections.

Which system is studied?
The ∆WT-∆ENV relationship cannot be analysed at the global level because WT is not measured consistently. Studies focusing on lower levels will not capture some impacts of ∆WT. One potentially important effect is the sufficiency rebound (Alcott 2008), i.e. that lower demand by those who implement a sufficiency strategy (e.g. by reducing WT and consumption) may reduce prices and increase consumption by others. If consumption grows in units excluded from the analysis, then the ENV impacts of WTR may be overestimated 8 . More generally, the smaller the studied system, the more impacts will occur beyond its boundaries. This is an important limitation for all household-level studies.
On the other hand, the larger the studied system, the more impacts not related to ∆WT can be expected to influence the ENV indicator. A key question is how important WT (or ∆WT) is compared to drivers of ENV (or ∆ENV) that are excluded from the analysis. In the reviewed country-level studies, the answer is not encouraging: the importance of unobserved variables seems to be enormous. Changes in resource use and emissions are strongly driven by technical and structural changes in sectors like electricity, transport, buildings, and business/industry (Jackson et al 2019). We tested how changes in the electricity mix affect the results of Fitzgerald et al (2018) and found a very substantial influence 9 . All other reviewed studies may be similarly affected as none of them control for changes in the efficiency of power plants or the changing fuel mix. Similar issues will arise for transport, buildings and industrial production. Because WT usually changes by less than 2% per year (and often <1%), changes due to unobserved variables are likely to have substantially larger effects than ∆WT. Therefore, there is a great risk of finding spurious relationships.
7 ∆ refers to changes of variables. Many studies talk about both a static WT-ENV and a dynamic ∆WT-∆ENV relationship.
8 The price interactions will be stronger if the entities whose ENV indicator changes are more strongly connected through markets, so it might be a more serious limitation for household level studies.
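How an unobserved driver can produce an apparent ∆WT-∆ENV relationship can be illustrated with a toy simulation. This is a sketch under strong assumptions, none of which come from the reviewed studies: WT is given no effect on ENV at all, an unobserved driver (standing in for something like fuel-mix change) has a large effect, and that driver happens to be correlated with ∆WT. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_beta = 0.0   # assumption: WT has no effect on ENV in this toy world
gamma = 1.0       # assumption: effect of the unobserved driver on ENV

d_wt = rng.normal(0.0, 1.0, n)              # annual % change in WT (small)
d_z = 0.5 * d_wt + rng.normal(0.0, 3.0, n)  # unobserved driver, correlated with d_wt
d_env = true_beta * d_wt + gamma * d_z + rng.normal(0.0, 1.0, n)

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

naive_beta = ols(d_env, d_wt)[1]            # driver omitted from the model
controlled_beta = ols(d_env, d_wt, d_z)[1]  # driver included as a control

print(f"naive estimate: {naive_beta:.2f}, controlled estimate: {controlled_beta:.2f}")
```

In this setup the regression that omits the driver attributes part of its influence to ∆WT, while the regression that includes it recovers an estimate near zero. Real data differ in many ways, so this only shows the mechanism, not its size.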
Regarding the temporal dimensions of the system under study, how much of ∆ENV appears immediately matters. If there are substantial lagged effects, e.g. because everyday practices slowly adapt to lower WT and incomes, or because of macro-economic feedbacks, then studies assuming a contemporaneous relationship give an incomplete or inaccurate picture. This problem is generally ignored in the empirical literature (table 2).

How is WT measured?
Country-level studies usually use average annual hours per employed worker obtained from datasets based on national accounts (table 2). Calculation methods of these WT values are not internationally standardized, so they are only suitable for comparisons of trends over time and not between countries for a given year (de Vries and Erumban 2017, OECD 2020). WT data from labour force surveys differ from these values by up to 10%-20%, with large variations between countries (Bick et al 2019b). Unfortunately, none of the reviewed studies have used adjusted, consistent WT values, which have only become available recently for a subset of countries (Wingender 2018). In some of the reviewed papers, data availability was also a reason to use WT indicators that excluded certain groups of workers 10 .
The reliability of WT values is a further question. To test this, we compared average annual WT from the databases of The Conference Board (TCB) and the OECD, both of which are widely used in country-level studies (supplementary information, table S1). The results 11 show that annual changes in the differences between the datasets are sometimes larger than changes of the WT values themselves, potentially undermining statistical analyses (supplementary information, figure S1). Similarly, trends in the differences between databases are concerning as the strength of any effect of ∆WT on ∆ENV will appear to be different depending on the database. In addition, Bick et al (2019a) report substantial revisions of WT data in the OECD and TCB databases: e.g. the difference between US and EU values has changed by 40% between the 2003 and 2016 releases of the same database.
Obtaining appropriate WT data at the household level is also difficult. Using WT data of individuals is inadequate because changes of individuals' WT in a household are often strongly coupled, e.g. through the unpaid work of other household members (Jacobs and Gerson 2001, Lewis et al 2008, Wielers et al 2014, De Spiegelaere and Piasna 2017). Total household WT depends on the number of household members who work as well as the working hours of each. As ENV may depend on how total WT is shared, the ideal dataset contains information on the number of individuals as well as their WT values, as in Fremstad et al (2019). Without such data, the other reviewed studies analyse the effects of hypothetical WTR schemes that are assumed to affect WT, incomes or expenditures in specific ways (e.g. proportionally reduce all three).
At the household level, the source of WT data matters for reliability. When WT values are self-reported, which is usually the case when no diary-based time use dataset is applied, then the known biases of self-reporting should be kept in mind. For instance, long hours are usually overreported while short hours are underreported (Frazis and Stewart 2014), increasing the width of measured WT distributions. Besides, WT estimates are less reliable if many workers are self-employed or have irregular schedules (Niemi 1993, Robinson and Bostrom 1994, Bonke 2005, Walthery and Gershuny 2019), which makes statistical estimates more noisy.
9 They study territorial CO2 emissions from fossil fuels for US states. The data they used (EPA 2017) is broken down by main sectors, showing that in 31 out of the 50 states CO2 emissions were mainly driven by changes in the electricity sector. Taking total power generation values from the EIA (2020) reveals that in 23 of these states, the main reason was the change of the fuel mix, not the change of total power generation. (An additional remark is that North Dakota, one of their outliers, had 10%, not 20% increase in overall emissions.)
10 For example, Fitzgerald et al (2018) exclude public and farm employees, i.e. 15%-20% of the total labour force from their analysis of US states. Whether this makes a significant difference is unknown, but the study period is 2007-2013 when the economic turmoil may have had different effects in different sectors. Furthermore, some of their outlier states had radical changes in the private sector (a fracking boom) which may not have affected the excluded sectors.
11 The values for many European countries are exactly the same. However, for other countries, such as the USA, Mexico, Japan or Korea (randomly chosen non-EU countries), which are all included in several WT-ENV studies (Rosnick and Weisbrot 2007, Hayden and Shandra 2009, Fitzgerald et al 2015), there are significant differences.

How is ENV measured?
Most country-level studies use territorial/production-based ENV indicators (table 2). Total primary energy supply (TPES) denotes the total amount of primary energy that a country has at its disposal 12 . It includes domestic energy production and energy imports and excludes energy exports and fuels used for international shipping and aviation 13 . Territorial indicators for CO2 emissions represent either only the domestic combustion of fossil fuels, or in some reviewed studies also industrial processes such as cement and steel production. None of the studies use complete emissions accounts from agriculture, forestry and other land uses, as well as land use changes (AFOLU emissions) (Smith et al 2014).
From a WT-ENV perspective, these production-based indicators have important weaknesses. Excluding emissions from international flights and shipping can cause an error of 10% or more, especially for small and affluent countries (e.g. the Netherlands), with substantial trends over time (van Goeverden et al 2016, Eurostat 2020). If industrial emissions from cement production are excluded (as in Fitzgerald et al 2018), then changes in the building sector may be lost. Excluding emissions from land use change and forestry underestimates impacts of food and other biomass products (Houghton 2020), which are crucial for tropical countries 14 and for certain impacts through lifestyle changes (e.g. diet shifts and the use of biofuels). Not accounting for other GHGs means that potentially relevant emissions (e.g. methane from food production) are not considered. Even more critical is that none of these production-based ENV indicators reflect the resource/energy/carbon impacts of imports and exports, which make up 10%-30% of emissions attributable to consumption (Wiebe et al 2012), or even more (∼50%) for small and open economies like US states (Erickson et al 2012) or European countries (Tukker et al 2016). Trends of territorial ENV indicators have been strongly influenced by changes in energy/emissions embodied in trade, which grew very significantly in the early 2000s and flattened out around 2010 after the global financial crisis (Pan et al 2017, Wood et al 2019a). In addition, even for the ENV indicators for which official reporting is most accurate, data reliability remains a concern. To show this, we compared territorial CO2 emissions from fossil fuels per capita from the databases of the Global Carbon Project and the World Bank (supplementary information, figure S2) 15 . The analysis shows that differences in reported emissions can be substantial, up to 20% in the chosen case. These differences can have structural breaks (1989-1990 in figure S2). Moreover, changes in the difference between the databases are very often comparable to, and often larger than, the changes of the values themselves (supplementary information, table S3). In addition, sometimes there are substantial revisions of ENV values, especially for countries with weaker institutions or lower commitment to transparency (e.g. Liu et al 2015).
12 It measures energy contents before transformation to other end-use fuels, e.g. the energy content of coal or gas used in a power plant, not the electricity.
13 Changes of fuel stocks are also included, but these are usually small.
14 E.g. Brazil or Colombia, which are included in Fitzgerald et al (2015) and Shao and Rodríguez-Labajos (2016).
15 We chose four European countries (France, Denmark, Germany and the Netherlands), for which Shao and Shen (2017) suggest a negative WT-ENV relationship, i.e. increasing ENV as a result of decreasing WT.
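The kind of check described here, comparing year-on-year changes in the gap between two databases with year-on-year changes in the reported values, can be sketched as follows. The function name `share_of_years_dominated_by_gap` and the two series are hypothetical and purely illustrative. They are not taken from the Global Carbon Project, the World Bank, TCB or the OECD.

```python
import numpy as np

def share_of_years_dominated_by_gap(series_a, series_b):
    """Share of year-on-year steps in which the change in the gap between
    two databases is at least as large as the change in series_a itself."""
    a = np.asarray(series_a, dtype=float)
    b = np.asarray(series_b, dtype=float)
    change_in_values = np.abs(np.diff(a))
    change_in_gap = np.abs(np.diff(a - b))
    return float(np.mean(change_in_gap >= change_in_values))

# Hypothetical annual values of the same indicator from two databases.
db_a = [1800, 1790, 1785, 1770, 1768]
db_b = [1795, 1790, 1775, 1772, 1760]

share = share_of_years_dominated_by_gap(db_a, db_b)
print(share)
```

A high share indicates that measurement disagreement between sources is of the same order as the signal being analysed, which is the concern raised for both WT and ENV data.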
More comprehensive ENV indicators are even less reliable, e.g. because of the uncertainties of AFOLU emissions (Petrescu et al 2020) or non-CO2 GHG emissions (IPCC 2014). Further complexities and uncertainties characterize consumption-based footprint indicators that attribute all resources/energy/emissions occurring along international supply chains to final consumption. The ecological footprint is conceptually the most comprehensive consumption-based ENV indicator, which provides a measure of global environmental pressures due to food consumption, housing, transportation, consumer goods, and services 16 . This indicator, used in several reviewed papers (table 2), has been heavily criticized, not least because of the arbitrary weighting of the various environmental problems (e.g. van den Bergh and Verbruggen 1999, Ayres 2000, Wiedmann and Barrett 2010, van den Bergh and Grazi 2014). Besides the complicated interpretation, inaccuracies in the data used to calculate the ecological footprint and its carbon component may also make its statistical use misleading (Jóhannesson et al 2020).
At the household level, ENV indicators are not measured directly. Two main types of indicators have been used so far. Devetter and Rousseau (2011) considered selected types of consumption as binary ENV indicators, e.g. 'electricity bills belonging to the highest quartile' or 'having a large house (more than 6 rooms)'. The justification for this approach is that certain types of consumption have substantially higher resource/emission intensities than others (Jalas 2002), and are easier to measure. However, picking consumption categories may bias evaluations towards directly accessible information and risks ignoring other effects.
The more comprehensive, but also more difficult, approach is to account for the environmental impacts of all types of consumption corresponding to the lifestyle(s) of household members using footprint estimation methods (Nässén and Larsson 2015, Buhl and Acosta 2016, Fremstad et al 2019). Footprint indicators are usually derived from expenditure survey data using input-output (IO) analysis, which maps the structure of economies by quantifying supply chain interactions 17 .
16 Environmental pressures from these are converted into a common metric: global hectares, representing the amount of productive land and water areas at average world productivity required to continuously produce the resources and assimilate the wastes associated with consumption (Wackernagel and Beyers 2016). A large fraction of it is due to the theoretical land area required to hypothetically sequester annual emissions. International trade flows are simplified into 'national yield factors'. The National Footprint Accounts are the most widely used EF dataset, providing data for most countries and the world for 1961-2014, based primarily on publicly available UN datasets (Lin et al 2018).
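The core calculation behind an expenditure-based footprint can be sketched with the standard Leontief relation, in which total output x needed to satisfy final demand y is x = (I - A)^-1 y and the footprint is the emission intensity vector applied to x. The example below is a minimal single-region, two-sector sketch. The coefficient matrix, intensities and expenditures are invented for illustration, and the function name `household_footprint` is hypothetical. The reviewed studies use far more detailed models.

```python
import numpy as np

def household_footprint(A, intensities, expenditure):
    """Emissions embodied in household expenditure via a single-region IO model.

    A           : technical coefficient matrix (inputs per unit of sector output)
    intensities : direct emissions per unit of sector output
    expenditure : household final demand by sector
    """
    A = np.asarray(A, dtype=float)
    f = np.asarray(intensities, dtype=float)
    y = np.asarray(expenditure, dtype=float)
    total_output = np.linalg.solve(np.eye(len(y)) - A, y)  # x = (I - A)^-1 y
    return float(f @ total_output)

# Illustrative assumptions: two sectors, kg CO2 per currency unit, annual spending.
A = [[0.1, 0.2],
     [0.3, 0.1]]
intensities = [0.5, 0.1]
expenditure = [1000.0, 2000.0]

total = household_footprint(A, intensities, expenditure)
direct_only = float(np.dot(intensities, expenditure))  # ignores supply chains
print(round(total, 1), round(direct_only, 1))
```

The gap between the two numbers is the contribution of upstream supply chains, which is what IO analysis adds over a simple multiplication of expenditures by direct intensities.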
A number of limitations follow. First, household expenditure surveys do not capture all consumption. For instance, some employers provide direct assistance for transport or housing. More importantly, government consumption and investments are not attributed to households, so-among many other effects-emissions linked to large construction projects are not visible in household-level footprints (Chen et al 2018).
Second, IO modelling assumes homogenous prices for each product group (Miller and Blair 2009). This may cause problems, e.g. for energy in countries with liberalized energy retail markets (all case studies reviewed here). In Germany, price differences are up to 20%-30% (Gugler et al 2018) and household energy use is responsible for 25%-30% of CO2 footprints (Gill and Möller 2018), resulting in 5%-10% uncertainty of household-level emissions. Sufficient sectoral/product-detail is thus crucial: IO analysts usually suggest differentiating at least 30-50 expenditure categories. Studies with substantially fewer categories, such as the reviewed paper by Buhl and Acosta (2016), are therefore limited.
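One way to read the German example is as a simple product of the two ranges. This assumes, as a simplification not spelled out in the source, that the relative uncertainty of the household footprint is approximately the price spread multiplied by the share of the footprint due to household energy use:

```latex
\begin{aligned}
\text{lower end:}\quad & 0.20 \times 0.25 = 0.05 \\
\text{upper end:}\quad & 0.30 \times 0.30 = 0.09
\end{aligned}
```

That is roughly 5% to 9%, in line with the 5%-10% range quoted in the text.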
Third, IO modelling attributes resource use/emissions in proportion to monetary flows (Weisz and Duchin 2006). This likely overestimates ENV impacts at high levels of expenditure in consumption categories like housing (or cars) where costs are driven by location (or brand names) without proportionally changing the resource use/emissions associated with their production. Conversely, at low levels of expenditure ENV impacts may be underestimated (Girod and De Haan 2010).
Fourth, results are sensitive to the allocation of resources and emissions to economic sectors and to errors in trade data (Owen et al 2017) 18 . Studies using national IO models, such as Nässén and Larsson (2015) and Fremstad et al (2019), may also be problematic if imports are substantially cleaner/dirtier than domestic products (e.g. for small, open economies with clean electricity like Sweden) (Lenzen et al 2004). This is especially important if the composition of household consumption is systematically related to WT.
17 Officially reported data from national accounts, trade statistics, energy, materials and emissions reporting and official surveys are combined into a consistent modelling framework (Miller and Blair 2009).
18 For comparisons across studies, another complicating factor is that different IO-models exist and the underlying datasets for emissions and resource use (Owen et al 2014, Wood et al 2019b) as well as expenditure surveys, national accounts and multi-regional IO tables (Min and Rao 2018) can all differ slightly, affecting footprint estimates.

How are conclusions drawn?
Before discussing how conclusions are drawn at different levels, a general comment is relevant for all reviewed studies. Changes of WT can take place in various ways (Pullinger 2014, De Spiegelaere and Piasna 2017) with potentially divergent effects on WT and ENV indicators, as well as their relationship. Table 3 lists important differences between ∆WTs and refers to their implications regarding the studied relationship.
If different types of ∆WTs simultaneously affect the indicators, then their effects will be confounded in the analysis, preventing conclusions regarding specific types of ∆WTs. This limitation is crucial for country/state-level studies using aggregate WT indicators, and likely relevant for household level studies based on nationally representative surveys.

Country-level studies
WTR is usually discussed as an option for high-income countries to translate productivity growth into lower WT instead of higher consumption (Jackson and Victor 2011, Knight et al 2013, Antal and van den Bergh 2016). Including low- and middle-income countries in cross-country comparisons, as several reviewed studies do (table 2), is questionable because not only the drivers of WT and ENV, but also their definitions can be completely different (e.g. in countries with prevalent subsistence agriculture and informal work relations) 19 .
Several studies use country groups to reduce heterogeneity. However, it is unclear whether grouping is enough to avoid all problematic types of heterogeneity. For instance, if WT increases in some countries and decreases in others within a group 20 , then a potentially asymmetrical ∆WT-∆ENV relationship may confuse results. Furthermore, in some studies, country groups change over time. If countries jump from one group to another, then specific trends affecting WT or ENV in a country can distort the results, or some groups may have too few countries for statistical analysis (as in Shao and Shen 2017).
With a relatively limited set of countries, the next question is how long the time period should be. There are cross-sectional studies that refer to a single year (Schor 2005, Rosnick and Weisbrot 2007, Hayden and Shandra 2009). This is inadequate because neither WT values nor ENV values are comparable 21. For longitudinal studies, the goal is to identify a time period with enough data points and a stable ∆WT-∆ENV relationship. The relationship may change due to cultural transitions, demographic changes influencing the number of people in different life-stages, or technological innovations that affect both the ENV intensity of given time uses and the lifestyle choices reflected in time use patterns (Jalas and Juntunen 2015) 22. Therefore, it seems necessary to avoid structural breaks when WT or ENV change abruptly for external reasons; e.g. 1990 for ex-socialist countries 23, the great recession around 2009 for most countries, and the fracking boom around 2010 in the USA 24.
While a full assessment of the statistical methods of each study is beyond the scope of this paper, we make a few observations about questionable statistical methods and results that are very difficult to reconcile with theory or common sense. First, as changes of WT can affect both productivity (Golden 2012) and the employed population ratio (Zwickl et al 2016, De Spiegelaere and Piasna 2017), it is surprising that not all country-level studies check multicollinearity between the three factors that make up GDP per capita. Second, some studies use methods that look inadequate, e.g. Fitzgerald et al (2015) apply the Prais-Winsten method even though the number of countries exceeds the number of time periods and the time period is short 25. Third, many papers get very strange results, which should serve as (further) cautionary signs. Shao and Shen (2017) find significantly negative effects of population size and GDP per capita on carbon emissions 26. For population, Shao (2015) and Knight et al (2013) get suspiciously low and high values, respectively (less than 0.3 vs. more than 2). Both studies get very different results for closely related ENV indicators (TPES & territorial CO2, and carbon footprint & territorial CO2). Fitzgerald et al (2015) find negative effects of the employed population ratio on TPES, whereas Fitzgerald et al (2018) find no significant effect of GDP per hour and GDP per capita on territorial CO2 emissions.
Despite all uncertainties, Shao and Rodríguez-Labajos (2016) draw counterintuitive conclusions regarding the ∆WT-∆ENV relationship on the basis of weak statistical results, without sufficient sensitivity analysis.
A summary of the main concerns for each country-level study is given below (table 4).
19 From this perspective, a state-level analysis of the US (Fitzgerald et al 2018) is better because of the uniform definitions and methodology.
20 E.g. in the USA vs. continental Europe since the 1980s.
21 At least no feasible method has been shown so far to control for all country-specific drivers of ENV.
22 However, splitting up the time period in non-arbitrary ways is very difficult because of the various factors that may influence the strength of the ∆WT-∆ENV relationship and the very noisy data that makes statistical methods used for this separation questionable.
23 1990 is in the middle of the first study period for Shao and Rodríguez-Labajos (2016) who include Bulgaria, the Czech Republic, Estonia, Germany, Hungary, Latvia, Lithuania, Poland, Romania, Russia, Slovakia, and Slovenia, as well as the study period of Knight et al (2013) who include the Czech Republic, Estonia, Germany, Hungary, Slovakia, and Slovenia.
24 Without the fracking boom, there would be no apparent relationship on the ∆WT-∆ENV figure in Fitzgerald et al (2018). Model runs without these states are said to be 'substantively similar' to the findings including them, but it would have been useful to numerically report these findings to illustrate the (in)significance of this methodological decision. The results of this study are also very questionable because the study period 2007-2013 includes the financial crisis and the great recession, which drastically changed the US economy, affecting CO2 emissions in various ways in the different states.
25 This was pointed out by one of the reviewers.
26 In both cases they refer to unobserved variables to suggest potential explanations. However, the same unobserved variables may have affected their findings regarding the WT-ENV relationship, potentially invalidating their headline findings.
Table 3 (fragment).
Changes from any given initial level of WT | Influences effects on the intensity and productivity of work (reductions from 55 h per week likely increase productivity more than reductions from 15 h per week) as well as changes in patterns of consumption (forced reductions of part-time work differ from voluntary reductions of overtime)
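The three factors that make up GDP per capita, referred to in the multicollinearity remark above, can be made explicit with an accounting identity. The symbols are notation assumed here for illustration (Y for GDP, H for total hours worked, E for the number of employed persons, P for population); they are not taken from the reviewed studies.

```latex
\frac{Y}{P} \;=\; \frac{Y}{H}\cdot\frac{H}{E}\cdot\frac{E}{P}
\qquad\Longrightarrow\qquad
\ln\frac{Y}{P} \;=\; \ln\frac{Y}{H} \;+\; \ln\frac{H}{E} \;+\; \ln\frac{E}{P}
```

Here Y/H is labour productivity, H/E is WT per employed person and E/P is the employed population ratio. Since a change in H/E may itself shift Y/H and E/P, regressions that include these terms jointly are natural candidates for a collinearity check.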

Household-level studies
Not all household level studies try to draw conclusions regarding the ENV impacts of the same type of WTR. To illustrate the differences, we go through the chain of impacts from ∆WT to ∆ENV.
Each step in the ∆WT-∆(income)-∆(expenditure)-∆ENV causal chain is complex. First, the statistical relationship between ∆WT and ∆(income) tends to be weak because of large differences between hourly wages 27. Second, annual income and expenditure indicators can be quite different, e.g. because of changes in net savings, income that is not (properly) measured (e.g. capital income 28), and transfers between households. Third, the relationship between total expenditures and ENV may depend on personal values and choices regarding the composition of consumption.
Which of these indicators is supposed to change due to WTR has important implications for the ∆WT-∆ENV relationship. Some articles (Nässén and Larsson 2015, Buhl and Acosta 2016) assume that WTR changes WT and expenditures simultaneously and proportionately (for a discussion of representing the financial impacts of WTR as a change of expenditures vs. a change of incomes, see the supplementary information). The aim of these studies is to calculate two separate effects: an 'income effect', which stems from the financial impacts of WTR, and a 'time use effect', which stems from the changes in time use patterns (figure 2). Separating these would ideally require detailed longitudinal information on both the expenditures of a large sample of households and the time uses of household members. Then the consequences of changes in either just expenditures or time use could be studied using subsamples in which the other variable is unchanged.
However, such data is generally not available. In both studies using this approach, WT and expenditure data come from different datasets. The problem is that WT and income are correlated (Devetter and Rousseau 2011), so time use surveys contain some income effects (e.g. people with higher WT are more likely to have higher incomes and more expensive time uses) and expenditure surveys include some time use effects (e.g. people with higher expenditures likely have higher WT and some expenditures resulting from a time squeeze). It is not enough to consider only one of these effects because the WT-income relationship that is implicit in time use and expenditure surveys is almost certainly different from the WT-income relationship that characterises the studied type of WTR (e.g. a linear relationship) 29. On the other hand, adding the two effects together without double counting is complicated. Nässén and Larsson (2015) recognized this 30 and attempted to calculate a pure time use effect by not only allocating different types of expenditures to time uses, but also adjusting some of the expenditure intensities to keep total expenditures constant in this part of the calculation. As the original matching between time uses and expenditures is already somewhat arbitrary (Schipper et al 1989, Jalas 2002, Druckman et al 2012), the adjusted version is bound to be very uncertain. What's more, calculating an income effect that does not include any time use effects has not been attempted, which draws into question the usefulness of efforts based on separate datasets. Perhaps the best available data source, which contains both WT and expenditures for the same households, is used by Fremstad et al (2019).
As detailed time use data is not included in this dataset, time use information cannot be used to better understand the pathways of ENV impacts (i.e. why certain expenditures change), but a straightforward WT-ENV regression is possible. By doing this, Fremstad et al (2019) make no assumptions about how incomes or expenditures change. (They separately calculate income and expenditure elasticities of ENV, which would correspond to assumptions of WTR changing these instead of WT. A discussion of these is included in the supplementary information.) However, they use a cross-sectional WT-ENV relationship to draw conclusions regarding the longitudinal effects of a WTR scheme, which is very questionable. The relationship between WT and ENV is so indirect that it would be very difficult to theoretically justify this approach. In the case of expenditure elasticities, one could argue that longitudinal changes of ENV may be approximated using cross-sectional data because the expenditure-ENV relationship is 'tight' (at any level of total expenditure the variation of ENV is not very wide) and because characteristic expenditure patterns are followed by specific groups of people. In contrast, the WT-ENV relationship is 'weak' (at any level of WT the variation of ENV is wide) and there are no characteristic lifestyles according to WT 31. The cross-sectional WT-income and WT-expenditure relationships in the sample used by Fremstad et al (2019) are very different from those in the collective WTR assumed by Nässén and Larsson (2015), in which hourly wages are kept constant. It is also incorrect to draw conclusions about the role of WT in explaining differences in ENV at the country level, not only because the cross-sectional assumption would imply a very specific (and not necessarily realistic) type of WTR, but also because effects at the macro level are neglected. This makes it unclear whether taking annual WT and expenditure data from the same dataset is better than relying on expenditures alone to predict the impacts of a WTR scheme. It is likely that approximating income by expenditures leads to an overestimation of the ENV impacts of WTR while using annual WT or income data leads to underestimation (for the explanation see the supplementary information). Using both approaches is useful for a sensitivity analysis. Differences will depend on the datasets applied, the methods used and the type of WTR regarding which conclusions are sought.
27 There may be orders of magnitude between the incomes of households working the same number of hours, see e.g. data from Fremstad et al (2019).
28 Which can be particularly important in the group that is most likely to prefer WTR.
29 The same limitation is true even if expenditure or time use data is longitudinal because it is unlikely that past changes have occurred under the conditions that characterize any given WTR scheme.
30 Buhl and Acosta (2016) did not.
31 Fremstad et al (2019) tried to reduce heterogeneity by analysing subsamples and using a number of control variables, but the proportion of the variance in the dependent variable that is predictable from the independent variables is still quite low in all of their models (R² < 0.5). Controlling for all potentially relevant factors that influence how expenditures would change as a result of WTR looks very difficult (Hanbury et al 2019). If there are systematic differences between unobserved drivers of ENV according to WT, which is fully possible, then longitudinal changes can be expected to differ from the cross-sectional correlation. Path-dependence (e.g. rigidities of consumption patterns) can also invalidate cross-sectional results. A further comment is that preferences to change WT under given conditions may differ substantially between different groups in the same sample, so conclusions regarding concrete WTR schemes (collective or individual, compensated or uncompensated, weekly or annual, etc.) may be different even if the same dataset is used.
Table 4 (fragment).
Study | Main concerns | Further comment
Schor (2005) | Cross-sectional, no control variables, small sample, etc. | Self-describes as thought provoking, not as serious analysis, no conclusions possible
Rosnick and Weisbrot (2007) | Cross [...] | [...] Results without outliers not shown
A summary of the main concerns for each household-level study is given below (table 5).
Study | Main concerns | Further comment
Devetter and Rousseau (2011) | WT indicator very crude, ENV indicator not comprehensive. | Only qualitative conclusions possible regarding the WT-ENV relationship.
Nässén and Larsson (2015) | Two non-independent effects added together; part of the problem is not recognized. | No conclusions beyond the expenditure-ENV relationship possible.
Buhl and Acosta (2016) | Two non-independent effects added together; the problem is not recognized. ENV indicator based on too few expenditure categories. | Unclear reporting of results. No conclusions possible.
Fremstad et al (2019) | Cross-sectional WT-ENV analysis not informative about ENV impacts of WTR in general and about country differences. | Conclusions beyond the expenditure-ENV relationship very questionable.

Towards a robust understanding of WT (reductions) and the potential effects on resource use and emissions
In section 3, figure 2 gave a technical overview of selected variables connecting ∆WT and ∆ENV, but did not account for all important causal mechanisms. To propose directions for further research, deeper theoretical understanding is helpful. We start from economic changes associated with WTR and proceed towards the social structures that shape the economic system and its dynamics, drive WT related decisions, and influence the impacts of given WTR schemes. Finally, we discuss potentials and limitations of various research directions.

Economic pathways between ∆WT and ∆ENV
Any type of WTR has two immediate impacts: some labour input (working hours) and some labour output (e.g. products or services) disappear from the economic system. A number of indirect impacts can be expected both on the production and the consumption side. When labour input is reduced, production-side impacts may occur through labour productivity, production processes, and labour markets. Impacts on labour productivity are due to individual level changes such as less fatigue and organizational effects like the changing effectiveness of work groups (Golden 2012, Collewet and Sauermann 2017). The magnitude of these effects depends on the types of jobs as well as the type and participants of WTR. Production processes change because different factors of production (labour, capital, energy, and resources) can substitute or complement each other (Berndt and Wood 1979). Even if the direct labour-energy relationship is close to neutral (Cox et al 2014), longer term effects through substitution or complementarity with capital can be relevant (Fallon and Layard 1975, Apostolakis 1990, Frey and Osborne 2017). These relationships depend on production processes and characteristics of producers like skill levels. Effects through labour markets may include a variety of changes. Total employment may change, e.g. if new workers are hired to keep up production despite WTR, thereby reducing un(der)employment. The strength of this effect depends on broader economic conditions like the level of un(der)employment among workers suitable for the jobs. Besides, sectoral shifts may occur if the organizations where WTR takes place attract [...]. Moreover, the previous three effects interact, e.g. changes in productivity and substitution by capital may influence how much labour is sought in the labour market. In addition to production-side effects, the time use of all impacted producers will change, with effects on the consumption side.
The impacts of changing production on revenues/incomes at indirectly affected organizations should also not be forgotten.
Effects due to the loss of labour output on the production-side may propagate through supply chains and markets. If some organizations produce less, others may have to scale back too (e.g. buyers) or may see new opportunities for growth (e.g. competitors). On the consumption side, the total exchange value of labour is lost, which includes the compensation of workers as well as the net profits of the organizations 32. As discussed earlier, income reductions have complex effects on consumption.
Finally, production-side and consumption-side impacts interact. Which effects dominate depends on the markets through which they are connected. Elasticities of supply and demand play important roles here. Supply-side and demand-side economists will likely see the overall impacts differently. Given that impacts have different time scales and no equilibrium is guaranteed, the least one can say is that overall impacts are very complex (figure 4).

Societal structuring of work, time uses and environmental effects
Before turning to the question of what this complexity means for future research, we look at the larger societal structures in which these economic pathways are embedded. We begin with broader theoretical perspectives, then mention theories that are relevant for specific causal effects in figure 4. Given the large amount of literature on the sociology of time use, we only select a few strands that are particularly focused on how people view time and how this is changing.
32 Plus there is a shift of taxes and contributions between the state and the organization.
One broad narrative that focuses on our changing relationship with time is that of Hartmut Rosa (2013). He identifies three types of acceleration. Technical acceleration is about the increasing speed of goal-directed processes, following the economic idea that 'time is money'. Social acceleration means that social beliefs and actions have shorter periods of validity and are co-existing with often radically different other beliefs and actions. The acceleration of the pace of life is a response of modernity to cope with the ultimate limitedness of the human lifetime, which drives people to exhaust as many options as possible. These trends influence production processes, people's aspirations, and time uses. However, such general philosophical approaches have to be further specified to reach quantitative conclusions regarding the ENV impacts of WTR.
A somewhat more concrete strand in the literature talks about treadmill effects (Binswanger 2006). The 'positional treadmill' refers to the constant striving for social status relative to others (Frank 1985), the 'hedonic treadmill' describes the adaptation to higher levels of income and consumption (Stutzer 2004), and the 'time-saving treadmill' means that time-saving innovations tend to have large rebound effects and do not mitigate time pressure. A concrete example for the latter is that faster travel tends to result in longer distances covered instead of travel-time savings. Between 1965 and 2000, average travel-time budgets in affluent countries were invariant with both income and other time uses, including WT (Schäfer et al 2009). If travel-time budgets are still stable, then travel emissions are more strongly influenced by the costs of different modes of transportation than by WT, so this theoretical prediction is informative for the ENV impacts of WTR.
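The arithmetic behind a constant travel-time budget can be sketched in a few lines. The budget and the speeds below are invented values for illustration, not data from Schäfer et al (2009).

```python
# Hypothetical illustration of a constant travel-time budget.
# The budget (hours per day) and mean speeds (km/h) are invented values.

def daily_distance_km(time_budget_h: float, mean_speed_kmh: float) -> float:
    """Distance covered if the whole travel-time budget is used at a given mean speed."""
    return time_budget_h * mean_speed_kmh

budget_h = 1.0                             # assumed to stay fixed
slow = daily_distance_km(budget_h, 20.0)   # slower modes of transport
fast = daily_distance_km(budget_h, 60.0)   # faster modes of transport

# With a fixed budget, a threefold speed increase is converted entirely
# into distance: no travel time is saved.
distance_ratio = fast / slow
```

Under this assumption the distance travelled, and the emissions that go with it, depend on speed and mode rather than on how much time WT leaves over.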
However, not all time use categories and environmentally relevant behaviours show consistent patterns, so building up a general theory regarding the ENV impacts of various types of WTRs currently seems unfeasible even at the level of households. Working hours structure everyday lives and strongly influence income, so they likely shape the nature of non-work activities and associated environmental impacts, which occur within specific material settings and social obligations (e.g. Wiedenhofer et al 2018). Yet these behaviours can be expected to be dependent on the context, including family circumstances, worldviews, money matters, living environments, etc. (Hanbury et al 2019, Lindsay et al 2020). The diversity of these influences may be a reason why so little empirical evidence has been found so far for direct impacts of work-life balance on environmental behaviour (Kennedy et al 2013, Melo et al 2018). We also note that a number of environmentally important decisions, e.g. on larger investments, have not been investigated as a function of WT so far. In addition, various rebound effects may cancel out significant parts of initial environmental benefits. At the systemic level, effects are even more diverse. Seemingly beneficial lifestyles like telework have ambiguous ENV effects (Hook et al 2020) and WTR schemes may substantially differ from each other (King and van den Bergh 2017). Therefore, instead of general theories we suggest a series of context-specific investigations.

Implications for further empirical research
Considering the limitations of the approaches used so far, we suggest several research strategies that may help to better understand the environmental and climate change mitigation potentials of WTRs.
At the level of countries, the most important step forward is the assessment of the reliability of statistical approaches using aggregate data. One main problem is that many types of ∆WTs occur simultaneously, and only the cumulative effects of these changes appear in aggregate data. Therefore, both the patterns of ∆WT (whose WT changes and how) and the extent to which these ∆WTs differ from each other in terms of ENV impacts must be understood better. The former could be achieved through labour market investigations focusing on the types and conditions of ∆WT, while the latter requires smaller-scale case studies (see below). Another main problem is that current statistical studies do not control for key drivers of ENV indicators. To assess how this affects reliability, the role of ∆WT in ∆ENV could be studied in individual countries. Substantial changes in WT over relatively short periods could be analysed first, considering the various sectoral drivers of ∆ENV and assessing the role of ∆WT in the overall change. Natural experiments in which different regions of a country have different regulations could also be useful to study if appropriate disaggregated data is available (Chemin and Wasmer 2009).
One possible outcome is that case studies find mechanisms that completely invalidate country-level statistical approaches. For example, simultaneous ∆WTs with opposite ENV effects would make aggregate indicators useless. Country-level statistical comparisons are also inadequate if uncertainties in sectoral drivers of ∆ENV turn out to be substantially larger than the contribution of ∆WT. The other possible outcome is that no such prohibitive difficulties are identified. Then the limitations discussed in section 4 should be addressed as much as possible. A longitudinal approach is necessary and structural breaks should be avoided. Countries should be sufficiently similar in terms of ∆WT and ∆ENV in the sample: checking and arguing these will be easier after the case studies. Internationally comparable datasets should be used for WT (Bick et al 2018, Fuchs-Schündeln 2019), and various ENV indicators should be tested in the same study. Changes of public and private debt should be monitored.
At the household level, a main reason for the lack of reliable studies is the limited availability of consistent data. Too often, expenditure surveys and time-use surveys are conducted separately and matching these requires many uncertain assumptions. Therefore, collecting and using data on the same households for WT and expenditures is strongly preferable. Such data must be longitudinal to be informative regarding impacts of WTR. In particular, a series of 'before-after' case studies from different contexts would be extremely useful. Building on early efforts (Promberger et al 1999), information should be collected on expenditures and time uses simultaneously, considering savings and loans, as well as changes of aspirations. Using relatively small samples, it might be possible to explore the WT-income-expenditure-ENV chain in detail. Like in the country-level case, this would also help to estimate the reliability of results based on larger samples but less detailed information.
One could assess the importance of various neglected factors like capital incomes, a safety net through inter-household relations, etc.
In these suggested case studies, key behaviours and associated expenditure categories need special attention. One example is travel, which may have become an important leisure activity, attracting a disproportionate share of discretionary income (Oswald et al 2020). Better understanding these categories and their ENV impacts using both monetary and physical terms may be worth separate studies. Similarly, using physical terms for household energy consumption is advisable. These could then feed into more comprehensive assessments using a sufficient number of consumption categories.
Another useful direction is to study cases where WT is reduced while incomes are preserved, like in a Swedish trial a few years ago (Oltermann 2017). These may reveal a pure time use effect. If income is registered in time use diaries, that may also help modelling the effect. For a first step towards estimating a pure income effect, it could be useful to use a combined WT and expenditure dataset (e.g. from the USA) and compare the income-expenditure relationship at given WTs (this is feasible because there are many households at standard levels of WT) with the income-expenditure relationship of the whole sample. To collect longitudinal data, case studies of WT-neutral salary increases and decreases could be beneficial. Whether pure time use and income effects can be combined to understand overall effects at households is a further topic to be explored.
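The comparison proposed above for a first step towards a pure income effect can be sketched as follows. The household records, the 40 h 'standard' WT and the plain least-squares slope are illustrative assumptions; a real analysis would use survey microdata, suitable functional forms and control variables.

```python
# Hypothetical sketch: compare the income-expenditure slope among households
# at a standard WT with the slope in the whole sample.
# All records are invented; units are arbitrary.

households = [
    # (weekly WT in hours, income, expenditure)
    (20, 20, 18), (20, 30, 24),
    (40, 30, 23), (40, 40, 29), (40, 50, 35), (40, 60, 41),
    (60, 60, 47), (60, 80, 59),
]

def ols_slope(pairs):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return s_xy / s_xx

STANDARD_WT = 40  # assumed 'standard' level of WT

whole_sample = [(inc, exp) for _, inc, exp in households]
at_standard_wt = [(inc, exp) for wt, inc, exp in households if wt == STANDARD_WT]

slope_all = ols_slope(whole_sample)    # mixes income and WT-related differences
slope_std = ols_slope(at_standard_wt)  # income varies while WT is held fixed
```

In this invented sample the whole-sample slope exceeds the slope at fixed WT because households with longer WT also spend more at a given income. The gap between the two slopes is the kind of time use contamination that an estimate of a pure income effect would need to exclude.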
Before jumping to conclusions, household level studies must be complemented by estimates of impacts at the company and macro-economic levels. Whether WTR changes productivity in given sectors is one question. Combining quantitative productivity indicators with qualitative data looks suitable to explore this. Another question is about indirect impacts on other workers' WT (including overtime hours) and employment, which could help to understand how different WTR schemes change WT at the aggregate level. Recent modelling efforts represent a useful step towards understanding this (Cieplinski et al 2021). Analysing the substitution or complementarity of labour, capital and resources in various production processes is not only useful to develop such models, but also to investigate production-side effects of WTR. In each of these cases, specifying the types of WTR under study will be important.

Conclusions
Current research on the potentials of WTR as a demand-side climate change mitigation measure remains inconclusive. While the positive relationship between income/expenditure on the one hand and resource use/emissions on the other is established (e.g. Hubacek et al 2017, Oswald et al 2020), the complexity of the ∆WT-∆ENV relationship and the lack of sufficiently detailed data on both time use effects and systemic feedbacks (including production-side effects) preclude strong conclusions. This review summarized the existing literature, uncovered a number of methodological limitations, and suggested ways forward.
In the case of all reviewed country-level studies, fundamental questions were raised about the validity of conclusions. Some studies used cross-sectional samples with severe comparability issues, which makes them inadequate. Longitudinal studies based their results on very noisy data without controlling for the main confounding drivers of environmental indicators, also potentially confusing different types of ∆WTs whose ENV impacts may differ. To test the usefulness of country-level statistical approaches, we proposed case studies focusing on individual countries.
The reviewed household-level articles that aimed to comprehensively assess ENV impacts are also questionable. They either calculated total ENV impacts as a sum of two non-independent effects (the income effect and the time use effect), or used a cross-sectional WT-ENV relationship to draw conclusions regarding longitudinal effects of WTR, even though the respective WT-income-expenditure-ENV relationships are likely to be different. The most important step towards rectifying these problems would be the simultaneous, longitudinal collection of time use (or WT) and expenditure data. With this approach, a series of context-specific investigations could provide very valuable insights. Studies on particularly important behaviours and impacts, like those related to mobility, could help to make assessments more reliable, especially if they also collect physical, not just monetary, data. In addition, separate investigations of production-side effects on productivity, the substitution or complementarity between factors of production, and labour market impacts, as well as investigations of macro-level feedbacks should complement these analyses. Taken together, these suggestions represent a significant shift of research directions in this field.
One should keep in mind that understanding current WT-ENV relationships is not all that is needed to estimate the potential environmental impacts of WTRs. At least at the household level, environmentally beneficial WTR looks possible (Hanbury et al 2019), so the other main question regards the conditions under which such benefits are achievable. ENV impacts of WTR schemes depend on various factors that may be changed, such as social norms and urban structures (Kennedy et al 2013, Kallis et al 2013).
Research on these drivers is as important as research on the past or current relationship itself.
Due to the Covid-19 pandemic, both working lives and environmentally relevant behaviours are changing. Exploring how, under these new circumstances, different types of WTRs may affect environmental and climate-relevant indicators is a very important task for the future. Unlike currently dominant supply-side and technological solutions to environmental problems, WTR may provide a new vision of more socially and environmentally sustainable societies.

Data availability
No new data were created or analysed in this study.
[...] paper writing. JM participated in coding and statistical discussions, provided technical help. DW participated in research design, analysis and paper writing.