Comprehensiveness of circular economy assessments of regions: a systematic review at the macro-level

The circular economy (CE) is emerging as a solution for a thriving economy within regional and planetary boundaries for environment and social justice. CE is multifaceted with interconnected processes and therefore rather difficult to assess comprehensively. This paper reviewed the corpus of macro-level CE assessments, to find the best practices in CE assessments of regions at scales ranging from neighborhood to planetary. The extensive content analysis on the corpus of 165 studies used a novel mixed-methods approach combining meta-analysis, taxonomy and integrative review. This review investigates the comprehensiveness of CE assessments. Findings include three types of CE performance monitoring, four types of resource clustering, five scales, and a 5-step procedure to evaluate CE. CE can be monitored on: (a) absolute performance, quantifying economic resource-input, stock and waste-output; (b) efficiency performance, monitoring the optimization of CE processes such as recycling, reuse, or even sharing and virtualizing; (c) policy performance, monitoring strategies from regional stakeholders. Resource clustering can create hierarchies by metrics, uses, system-boundaries, or emergy. Identified scales are: XL for the planet; L for continents; M for large provinces, states and smaller countries; S for cities; and XS for neighborhoods. Scales assist in comparing and benchmarking, but are also required for a proposed policy of localizing CE. This review found the ReSOLVE-framework to be relatively comprehensive on CE processes. Also, multiple knowledge gaps were identified among resources, processes and regions. This review aids CE knowledge accumulation across regions and scales, to accelerate implementing the CE.


Introduction
A global and local challenge is to keep the economy within an environmentally safe and socially-just zone, with upper boundaries set by the ecosystem capacity and lower boundaries to ensure quality of life for society [1][2][3]. There are governments and others who respond to this triple bottom line (TBL) challenge by transitioning towards a circular economy (CE), as an approach to decouple anthropogenic pollutants and resource consumption from economic development [4][5][6].
There is no clear and unified definition of CE [7]. Narrow definitions are about optimizing resource consumption through recycling, but mainstream definitions lean towards a broad sustainability paradigm with comprehensively optimizing the TBL dimensions of environment, society and economy [7,8].
A CE that comprehensively optimizes the TBL is more ideal, but this requires a complex systems approach to manage the interconnectedness of processes and resources within the TBL-dimensions [9][10][11]. When this systems approach is insufficiently comprehensive, it may result in managing towards suboptimal solutions, or in problem shifting to processes and resources that are missing in the system [12]. Optimization also requires evaluations for evidence-based decision-making through measurement, assessment, comparison, and benchmarking [13][14][15].
These evaluations for CE are done at three different system levels: macro-level for regions (here used synonymously to geographic area and territory); meso-level for symbiosis between industries; and, micro-level for firms [16]. Each level requires a different systems approach. Comprehensive assessments are particularly a concern for the macro-level, as all processes, resources, and TBL-dimensions come together here [17][18][19][20][21][22].
There are already hundreds of CE assessment studies [23][24][25]. Most of these are intended as comprehensive; however, their heterogeneous approaches suggest the absence of a successful comprehensive systems approach. Many authors have recognized this need for a comprehensive assessment approach for the macro-level. For instance, Geng and Doberstein [17] argue that 'an information system adopting a systems approach is required if decision-makers are to find more environmentally and financially beneficial ways to plan and manage their resources.' Haupt et al [20] question macro-level studies: 'Do we have the right performance indicators for the CE?' Silvestri et al [22] state that the clearest gaps at macro-level are: 'inadequate monitoring and evaluation of CE implementations through the use of composite indicators (…) the construction of a comprehensive CE Index to capture CE performance as multidimensional phenomenon seems to be missing.' This paper therefore aims to understand how to assess CE scoped as a comprehensive system.
This paper systematically reviews current CE macro-level assessments, to find which processes, resources, and TBL dimensions are assessed. A review specifically targeting the macro-level is still noteworthy among the many existing CE literature reviews. This review aims to amalgamate studies towards a comprehensive assessment approach for the macro-level, by exposing the gaps in current approaches. The gaps result from exploring sub-questions where, when and why, and foremost what and how these studies assess CE. This is followed by synthesizing the studies for comprehensive approaches. All these steps are followed to answer the research question: what is needed for a comprehensive systems approach assessing CE at the macro-level?
Contributions of this study are threefold. First, a methodological contribution for extra rigour through a mixed-methods review, finding gaps and previously hidden patterns in the CE literature. Second, structuring CE assessments into three distinct types of performance monitoring, five scales, four types of resource clustering, and a 5-step procedure to evaluate CE; all expand knowledge on the comprehensiveness of CE assessments. Third, revealing gaps in literature, particularly in self-proclaimed comprehensive frameworks, denoting the fluid conceptual foundations of CE.

Materials and methods
A corpus on 'assessing CE for macro-level' was created to review what this entire body of literature has and has not covered. Quantitative evidence from this corpus is created by a meta-analysis and taxonomy. Both review methods are powerful to review a large set of literature on coverage, focus, goals, perspective, research methods, theories and more [26,27]. Yet, both fail to find richer qualitative evidence as they involve little nuanced reading [28]. Therefore, the third review method applied is a qualitative integrative review. An integrative approach 'reviews, critiques, and synthesizes representative literature on a topic in an integrated way such that new frameworks and perspectives on the topic are generated' [29]. Elsbach and van Knippenberg [28] argue this method to be among the most useful for advancing knowledge and furthering research. The quantitative methods guide the integrative review to elicit a deeper qualitative review. A mixed-methods review is novel as it addresses existing methodological gaps, providing an all-round content analysis with rigour and validation of evidence. The success of the mixed-methods review is discussed in section 5.6. The methods are assisted by the software tools Scopus, Google search, NVivo, Wordclouds and Excel.

Corpus creation
To create a complete corpus, a systematic procedure was used similar to PRISMA. PRISMA [30] is the 'Preferred Reporting Items for Systematic reviews and Meta-analyses' for medical sciences. PRISMA is applied with a flexible rationale, because CE is not as organized in terminology and methods as medical sciences are. This resulted in a flow-diagram with six different batches collecting 165 papers and reports (see figure 1). The corpus is available in Appendix 1 (available online at stacks.iop.org/ERL/16/103001/mmedia).
Batches 1 and 2 come from a Scopus search on papers published after 2009 with keywords: circular, economy, indicator, index, metric, assess and measuring, and their conjugations. The results were filtered manually on relevance. Batch 3 extends the corpus via snowballing by checking citations in papers from the search results to identify additional papers [31]. Batches 1-3 include 30 papers by their English abstract only, as the full paper was either unavailable or not in the English language. Batch 4 expands the corpus by hand-picking papers that are not about CE but have a similar comprehensive systems approach, e.g. on sustainability and the doughnut economy. This batch is kept small, as our focus is on CE, which has gained a lot of traction in recent years. Batches 5 and 6 are from Google searches on the original keywords, with selectively handpicked grey literature. Grey literature makes an evidential contribution to CE via reports from governments, and from consultants like the Ellen MacArthur Foundation (EMF) and Metabolic.

Meta-analysis
The meta-analysis builds from salient keywords in the corpus. These were obtained with the bibliometric-analysis software NVivo, by searching for the 1000 most frequently used lexical words, including their conjugations. From analyzing the keywords, six themes emerged that relate to the sub-questions. The keywords were then categorized into these themes, or disregarded for not having a relevant meaning. This resulted in 408 relevant keywords, each with over 160 references, and 350 414 references combined. The six keyword themes are: regions, TBL, resources, wastes, strategies, and methodologies. Examples of keywords, conjugations, and manual processing are presented in table 1; the full list can be found in appendix 2.
Next, wordclouds are created to further analyze each theme and their keywords. In a wordcloud the font-size of each keyword is weighted by the number of references. The weighting used here is on a logarithmic (square-rooted) scale, as the distribution of references per keyword follows a Gaussian curve (see table 1). This creates smaller differences in font-sizes, improving the readability of keywords in wordclouds. The keywords are also color-coded by subthemes, as presented in the results section.
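As a minimal sketch of such a weighting, the Python snippet below scales font sizes by the square root of the reference count. It assumes the square-root reading of the scale described above. The function name, the font-size range of 10 to 60 points, and the linear mapping onto that range are illustrative assumptions, not the settings used for the figures. The reference counts are those reported for a few keywords in the results sections.

```python
import math

def font_sizes(references, min_pt=10.0, max_pt=60.0):
    """Map keyword reference counts to font sizes on a square-root scale.

    min_pt and max_pt are hypothetical display limits, not values from the study.
    """
    roots = {kw: math.sqrt(n) for kw, n in references.items()}
    lo, hi = min(roots.values()), max(roots.values())
    span = hi - lo
    if span == 0:
        # All keywords equally frequent: give them the same size.
        return {kw: max_pt for kw in roots}
    return {kw: min_pt + (r - lo) / span * (max_pt - min_pt) for kw, r in roots.items()}

# Reference counts as reported in the results sections of this review.
references = {"waste": 10363, "recycling": 4559, "regulate": 475, "regenerate": 250, "renew": 77}

sizes = font_sizes(references)
for kw, pt in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{kw:<12}{pt:5.1f} pt")

# Raw counts differ by more than 100x between the largest and smallest keyword,
# whereas the square-rooted weights differ by roughly 12x.
raw_ratio = references["waste"] / references["renew"]
root_ratio = math.sqrt(raw_ratio)
print(f"raw ratio {raw_ratio:.0f}x, square-root ratio {root_ratio:.1f}x")
```

The compression is what keeps low-frequency keywords legible next to a dominant keyword such as waste.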
The meta-analysis on salience of keywords creates strong quantitative evidence from the complete corpus. However, the composition of the corpus made the meta-analysis slightly Western-centric. This is because 30 studies, mostly from China, are under-represented by only an abstract. Furthermore, searches were done in the English language, resulting in a grey literature collection predominantly from Europe, of which some reports are over-represented due to their length; e.g. [32] on Buiksloterham is the lengthiest with 40 000 words.

Taxonomy
The taxonomy categorized the 165 studies of the corpus into a datasheet on multiple topics (see appendix 1). The topics are: author, publication date, journal, comprehensiveness, main method, measurement type, data input, and results presentation, as well as the location(s), scale(s), and period of their assessments. The taxonomy is used secondary to the meta-analysis for answering the sub-questions of this research.

Integrative review
The integrative review takes a deep-dive into the quantitative results from the meta-analysis and taxonomy by more nuanced reading, reviewing, critiquing and synthesizing. Through this, new insights and perspectives arise from the review [28]. The integrative review aims to create new guiding frameworks to advance theory development on what is needed for a comprehensive systems approach assessing CE at macro-level.

Results meta-analysis and taxonomy
The combined results of the taxonomy and meta-analysis contribute to answering the explorative sub-questions on comprehensive CE assessments at macro-level: where, when, and why CE is assessed; what resources are assessed; and how resource use is optimized.

Where?-introducing scales
Keywords on the theme of regions account for 14.1% of all keyword references. Two distinctive subthemes are identified: location of region and scale of region (see figure 2). Figure 2 shows that most assessments are done in either Europe or (People's Republic of) China, but there are also other locations. Many keywords imply a scale of region (keywords in green), but they are rather vague, e.g. a country does not define an actual scale. A clear classification of scales allows better 'like to like' comparing and benchmarking, supports the taxonomy, and adds clarity in general. The EU [33] introduces such a classification, but this is not usable outside the EU. Therefore, our paper proposes a classification with: XL-scale for planetary studies; L-scale for continents, supranational regions and large countries; M-scale for large provinces, states, smaller countries and mega-cities; S-scale for cities, small provinces and municipalities; and XS-scale for neighborhoods (see figure 3). This classification is chosen mostly because many studies already implicitly use it for comparisons (see appendix 1).
Scales help with 'like to like' comparing. For example, it is not cogent to compare China with the Netherlands (country to country), but China can be compared with Europe (L-scale to L-scale), as both size (in km²) and population are comparable [34]. Benchmarks are also bound to scales, as each scale should set different CE targets. For example, L-scale may target a full CE within its scale-boundaries, but S-scale would reasonably target only certain resources. This concept of 'localizing CE' is discussed in section 5.3.
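The proposed classification and the 'like to like' rule can be sketched in a few lines of Python. This is only an illustration: the enum, the lookup table and the helper function are hypothetical names. The scale assigned to each example region follows the examples in the text, with the Netherlands assumed to count as a smaller country.

```python
from enum import Enum

class Scale(Enum):
    XL = "planet"
    L = "continents, supranational regions and large countries"
    M = "large provinces, states, smaller countries and mega-cities"
    S = "cities, small provinces and municipalities"
    XS = "neighborhoods"

# Example regions; the assignments are illustrative assumptions.
REGION_SCALE = {
    "World": Scale.XL,
    "China": Scale.L,
    "Europe": Scale.L,
    "Netherlands": Scale.M,
    "Napoli": Scale.S,
}

def like_to_like(region_a, region_b):
    """Return True when two regions sit on the same scale and can be benchmarked directly."""
    return REGION_SCALE[region_a] is REGION_SCALE[region_b]

print(like_to_like("China", "Europe"))       # same scale (L to L)
print(like_to_like("China", "Netherlands"))  # different scales (L to M)
```

Making the scale an explicit attribute of each assessed region is what allows a taxonomy to filter for valid benchmark pairs.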
The taxonomy classified both locations and scales of all studies (see figure 4). It shows that 94 studies (57%) include European regions, 67 studies (41%) include Chinese regions, and only 42 studies (25%) include other regions. The number of academic studies in Europe and China is similar (65 and 63 respectively); the difference appears greater only because English-language grey literature is mostly about Europe. Only 12 studies (7%) focus solely on regions outside Europe and China. Studies are fairly distributed over all scales, except for XS-scale with just 8 studies (5%).
The majority of studies make comparisons between regions for benchmarking, and they mostly compare regions within the same scale. As this cannot be done on the XL-scale, 13 out of 22 XL-scale studies also include L- or M-scales with comparisons on these scales. Comparisons between different scales are only made in ten studies (6%). Comparisons between different scales are somewhat misleading, but interscale studies can explore top-down and bottom-up effects between a region and its own sub-regions.

When?-increases per region
The when-question is explored by taxonomy. Figure 5 presents a graph with the number of publications per year, specified by assessed location and scale. It presents a steep increase of total studies from 2016 onwards, which is also the inflection point for European regions, followed by 2017 for other regions. Studies on Chinese regions are published more gradually over time, with a peak in 2011. The number of XL-scale studies has slightly declined over the years, and S-scale studies have risen since 2016.
To delve deeper into the 'when-question', the taxonomy results were compared with release dates of CE policies, as found in the studies. It was found that CE policies were followed by assessments in their region (e.g. [6,35] for China, and [4] for EU), but also in their scale (e.g. [36] for S-scale).
Concerning the assessed timeframes, it was found that 38% (63 studies) assess only a single timeframe with data from one year; 36% (60 studies) assess longitudinal progress in regions; and 10% (16 studies) only focus on future targets or an approach without measurements of a baseline year. For the remaining 6% (10 studies) there was no timeframe found, as the full study was not available.

Why?-for TBL: the environment, economy or social equity?
The TBL is often the main argument for governments to aim towards a CE. Keywords that relate to the theme of TBL account for 11.0% of all keywords, yet the keyword TBL (synonymous with SDG, as shown in table 1) is only mentioned in 45 studies, with 548 references in total. TBL subthemes are social equity, environment, and economy (see figure 6). Figure 6 reveals a fairly even distribution of references on the TBL subthemes: 40% on environment, 33% on economy, and 27% on social equity. Note that this result is from quantitative salience of keywords; it did not analyze which dimensions of the TBL are assessed. Social equity is evidently less assessed, because it is more difficult to monitor.
This distribution is more balanced than the observation of Geissdoerfer et al [8]: 'CE clearly seems to prioritise the economic systems with primary benefits for the environment, and only implicit gains for social aspects'. This is probably because the macro-level has a more intrinsically comprehensive approach, e.g. [2,37].
Additional searches were made for keywords (including conjugations) that describe CE for smaller scales: resource independence, no references; autarky, 1 reference [32]; self-sufficiency, 53 references, but only six in academic publications, by [9,42]. Combined with CE, each of these non-salient keywords also scored low in a general Scopus search.

What?-resources and wastes
Managing resources and their associated wastes is a key topic in the studies. 26.1% of all keywords are on the theme of resources and 6.4% on wastes. Figure 7 presents them in two wordclouds with subthemes bio-based and technical, along with their measurement units. These subthemes to cluster resources were introduced with the butterfly-framework from EMF (2013). The meta-analysis found 21 bio-based resources with 12 679 references together, and 21 technical resources with 9022 references together, indicating a focus on bio-based resources. The most common material measurement unit is weight (mass) in tons or kg, with 2791 references in total. The keyword waste is found with 10 363 references, making it the most salient keyword overall.
The meta-analysis and further analysis also identified resources that receive little attention. Indium has 250 references from 6 studies. Other rare-earth elements receive fewer references: 16 studies (10%) mention any, and five studies also assess them [43][44][45][46][47][48]. Rare-earth elements are in very low natural supply and indispensable for certain key technologies [49]. It is therefore important to assess their circularity and availability in regions.
The data on the resources are mostly available from existing databases. The databases used were recorded in the taxonomy, showing that about 25% of studies use data from the National Bureau of Statistics in China (NBSC) [50] and another 25% use data from EuroStat [51]. A more complete overview of databases and their data is provided in appendix 3.
Reviewed studies cluster and aggregate resources in comprehensive assessments, e.g. clustering on bio-based and technical, deepened with sub-clusters, or broadened with clusters of non-material resources, such as energy, population, and labor. Clustering by emergy is applied in [70][71][72]; e.g. [71], on S-scale for Napoli, assesses food, building materials, metals, additional materials, fuels, renewables, labor & services and others. Emergy as measurement unit for CE at macro-level was introduced by [70] for Taipei, but application is still low due to the complexity of applying emergy at this level [73].
Identifying these types of clustering aids understanding that resources can be 'comprehensively' assessed in many ways. Each way leads to different results as resources get placed in different hierarchies. The purpose for clustering relates to monitoring CE performance, which will be further discussed in section 4.1.

How?-strategies and processes towards a CE
The strategies theme counts 107 different keywords that make 25.4% (89 027 references) of all keywords (see figure 8, top). This paper will use the term processes for strategies in a context where they are quantifiable resource actions (e.g. recycling as recycle-rate).
R-frameworks include a number of strategies starting with 're': refuse, reduce, reuse/resell, repair, refurbish, remanufacture, repurpose, recycle, recover and re-mine. These strategies have a hierarchy of importance with refuse as most preferred, going down the list to re-mine as least preferred [84,85]. However, the meta-analysis shows that recycling, a lower-ranked strategy, receives by far the most attention with 4559 references. Refuse is ranked highest in the hierarchy, but as a keyword, even with its synonyms prevent and avoid, it receives only 693 references combined. Figure 8, bottom left, compares keyword references from the corpus with the theoretical hierarchy of R-frameworks.
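The mismatch between the theoretical hierarchy and the attention in the corpus can be made explicit with a small sketch. The list below encodes the R-hierarchy as given above. Only the two reference counts reported in the text are included, and the variable and function names are illustrative assumptions.

```python
# R-strategies ordered from most preferred (refuse) to least preferred (re-mine).
R_HIERARCHY = [
    "refuse", "reduce", "reuse/resell", "repair", "refurbish",
    "remanufacture", "repurpose", "recycle", "recover", "re-mine",
]

# Keyword references reported in the meta-analysis
# (the count for refuse includes its synonyms prevent and avoid).
REFERENCES = {"refuse": 693, "recycle": 4559}

def hierarchy_rank(strategy):
    """Return the position in the R-hierarchy, where 1 is the most preferred strategy."""
    return R_HIERARCHY.index(strategy) + 1

for strategy, refs in REFERENCES.items():
    print(f"{strategy}: rank {hierarchy_rank(strategy)} of {len(R_HIERARCHY)}, {refs} references")

attention_ratio = REFERENCES["recycle"] / REFERENCES["refuse"]
print(f"recycle receives about {attention_ratio:.1f} times the references of refuse")
```

With reference counts for all ten strategies, the same structure could be extended to a rank comparison across the full hierarchy.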
The strategies wordcloud also reveals other keywords starting with 're' that are not included in R-frameworks: regenerate, renew, replace, regulate and resilience (with 250, 77, 227, 475, 234 references respectively). It seems that R-frameworks are rather limited to 'waste management', which is only one facet of CE [6,86,87]. Besides, selecting keywords starting with 're' is rather whimsical to build a framework upon.
The ReSOLVE-framework has no hierarchy in strategies and is an acronym of the strategies: Regenerate, Share, Optimize, Loop, Virtualize and Exchange. ReSOLVE from EMF [80] also explicitly mentions other strategies as related to aforementioned strategies. Figure 8 bottom right, presents ReSOLVE strategies as wordcloud with keywords weighted from the corpus. This 'ReSOLVE wordcloud' reveals that the categories are fairly equally weighted, except for Regenerate and Virtualize.
R-frameworks cover 11 542 references, and ReSOLVE covers 19 226 references; all of these could be interpreted as processes. From the remaining keywords, 24 relate to governance strategies as policies to stimulate CE, which cover 18 720 references. The remaining non-specific subtheme seems related to the micro- and meso-level (see figure 8 top).
The meta-analysis presents ReSOLVE as a self-proclaimed comprehensive framework that is actually reasonably comprehensive and fairly weighted for CE processes. Note here that this meta-analysis only creates quantitative evidence grounded in the existing body of literature on 'assessing CE at macro-level'. This could also mean that the lower weighted Regenerate and Virtualize are not less important, but actually insufficiently studied, as they are arguably very impactful. Despite our positive review of ReSOLVE, it is only applied in four studies. Section 5.4 discusses in more depth which processes to include for a comprehensive assessment.

How?-methodologies to evaluate CE
The remaining keywords from the meta-analysis are the methodologies theme (59 636 references; 17.6%). The meta-analysis and taxonomy on this theme confirmed findings from earlier literature reviews by Saidani et al [24] and Sassanelli et al [25]. No significant new findings were made here; therefore these results were left out of the paper. Results are available in appendix 1 (taxonomy) and appendix 2 (meta-analysis). A more qualitative method for further answering 'how to evaluate CE' is explored in section 4. Sections 5.1 and 5.2 will discuss our results in relation to other literature reviews.

Results integrative review
The integrative review builds further on the meta-analysis and taxonomy. Now with a more qualitative approach, the research question is reiterated: 'What is needed for a comprehensive systems approach assessing CE at macro-level?'. This integrative review aims to provide guidance and advance knowledge through synthesis.

Reviewing differences in monitoring CE performance
The review by meta-analysis and taxonomy revealed limitations in studies not only on comprehensiveness and clarity, but also in their heterogeneous approaches towards comprehensively assessing CE. A more nuanced review identified three types of CE monitoring: absolute performance, efficiency performance, and policy performance.
(a) Absolute performance: monitors resources to find the biggest contributors to prioritize. Clustering resources (as covered in 3.4) creates a hierarchy of priorities for regional stakeholders. Indicators include input entering the economic system, stocks accumulated, and/or output exiting as waste. Studies hardly monitor stock accumulation, which is further discussed in section 5.5. Absolute performance can include economic or social dimensions by comparing resource consumption with economic indicators (e.g. Gross Domestic Product, GDP) and social indicators (e.g. Human Development Index, HDI). Monitoring absolute performance builds on the conceptual framework from industrial ecology, as described by Boulding [94] and Daly [95]. Here the economic system is depicted as a subsystem of the eco system (see figure 9): a CE is accomplished in a region when the (socio-)economic consumption as throughput does not exceed the regeneration capacity of its eco system. A few studies therefore offset the economic impact against the eco system regeneration capacity, e.g. [40,59][74][75][76]; however, these studies assess sustainability and do not use the term CE. CE assessments have been limited to only assessing the economic system, but this misses the point that the boundaries of the economic system are set by the eco system.
(b) Efficiency performance: monitors CE processes in regions to understand capacities and where/what/how to optimize. It is challenging to assess CE performance comprehensively, because there are so many CE processes (as shown in figure 8). Many processes can be combined for monitoring and measuring as resource flows in an input-output framework, as is done in figure 9. However, this would still exclude e.g. share, optimize, virtualize, exchange and the TBL. Every efficiency performance assessment is currently unique, which prevents comparisons between results.
(c) Policy performance: monitors strategies used by the region to improve CE. Policies are governance-related (see figure 8) and include qualitatively assessing CE processes. Reviewed studies do not present a comprehensive framework to monitor strategies, but rather give a non-exhaustive list of examples, e.g. [80,82,89]. It seems that CE policy frameworks still need to develop more (also confirmed by [21,88,96]). Assessing policies may be rather subjective, but it is indispensable for introducing strategies to different regions [97].
Figure 9. A conceptual framework from industrial ecology, based on [94,95]. The economic system is depicted as sub-system of the eco system, with input and output as their interaction, resulting in a 'shrinking parent' as the economic system devours the eco system. Regeneration is the capacity of the eco system to restore the waste created from throughput of the economic system. Loop are the CE processes that make a reversed resource flow, such as recycle, remanufacture, refurbish and compost. This systems approach only depicts material flows; it is not comprehensive for assessing processes like share, optimize, virtualize, exchange and the TBL.
Studies in the corpus often combine absolute, efficiency and policy indicators in one assessment. Arguably, this is not a good systems approach and it reduces the meaning of the results. Efficiency is the means to an end; absolute CE is that end, therefore they should not be aggregated. A 100% CE performance in absolute terms could be defined by zero-input or output from the economic system (see figure 9). CE efficiency could be defined by 100% efficiency of loop-processes, supported with high rates in other CE processes. Efficiency performance may have a discrepancy with absolute performance due to increasing population, increasing affluence, and rebound-effects. The corpus does not monitor them as part of a CE strategy, but they are also factors in the system. Ideally, policies should create sufficient efficiency to stay within the absolute boundaries of the planet.
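The difference between absolute and efficiency performance can be illustrated with a worked example. The sketch below is not taken from any study in the corpus. The indicator definitions (virgin input as the absolute measure, and the looped share of throughput as the efficiency measure), the function names and all numbers are hypothetical assumptions, chosen to show how efficiency can improve while absolute performance worsens as population and affluence grow.

```python
def loop_efficiency(looped, throughput):
    """Share of economic throughput that is returned via loop processes (0 to 1)."""
    return looped / throughput

def virgin_input(throughput, looped):
    """Absolute resource input (tonnes) still drawn from the eco system."""
    return throughput - looped

# Hypothetical region in two years; all values are made up for illustration.
years = {
    "year 1": {"population": 1_000_000, "throughput_per_capita": 10.0, "looped": 2_000_000},
    "year 2": {"population": 1_200_000, "throughput_per_capita": 12.0, "looped": 4_320_000},
}

results = {}
for label, data in years.items():
    throughput = data["population"] * data["throughput_per_capita"]
    efficiency = loop_efficiency(data["looped"], throughput)
    absolute = virgin_input(throughput, data["looped"])
    results[label] = (efficiency, absolute)
    print(f"{label}: loop efficiency {efficiency:.0%}, virgin input {absolute:,.0f} t")
```

In this made-up case the loop efficiency rises from 20% to 30%, yet the virgin input also rises, which is why the two indicator types are kept separate rather than aggregated into one score.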

Synthesizing the procedure for comprehensive CE evaluation at macro-level
Heretofore overlooked is the procedure to create a comprehensive assessment. No study in the corpus refers to any conceptual or tested procedural steps, despite this being a general challenge for all measurement studies. We synthesized the studies to find procedural steps that build a reliable and valid assessment. Findings from section 3.6 guided this procedure, and studies were reviewed again to explore recurring steps and the rigour in their procedure. The synthesis identified five interdependent procedural steps (see figure 10). These steps coincide with the measurement theory of measuring an abstract construct with observable measurements [98][99][100].
The steps of this 5-step procedure are:

1. Conceptual input: A study needs to refer to a clear definition of CE, as CE is a rather abstract construct without a unified definition. The definition should provide its dimensions and specify implications for the region. This could be developed with the help of a conceptual framework. Reviewed studies make very different interpretations of CE and its many definitions, partially because current definitions are ambiguous on the implications of CE at macro-level.
2. Observable input: Resource flows and processes can be measured because they are observable in regions and at any time-frame. These measurements form data input for assessments. Studies in this corpus mostly use quantitative measurements readily available in databases as listed in appendix 3. Just 13% (21 studies) did not use databases; instead they conducted interviews to seek primary data or made estimations as data input.
3. Assess: Conceptual input and observable input are conditional steps for assessments. As CE and its dimensions are not directly observable, proxy indicators that reflect them need to be aggregated from multiple observable measurements. The assessment framework is composed of proxy indicators as a practical interpretation of the conceptual framework. Compromises will be needed, as some conceptual dimensions are not practically assessable due to availability, reliability and the like. These limitations should be stated clearly, preferably even presented in the assessment framework. However, most self-proclaimed comprehensive studies are rather hazy on what is included and excluded. Studies seem to create assessment frameworks formatively from available measurements that do not cover all conceptual input. Monitoring absolute, efficiency or policy performance requires different frameworks and different methods, e.g. MFA (Material Flow Analysis), or a fuzzy method with multi-criteria decision making. It was found that many methods are unable to sufficiently assess all CE dimensions.
4. Compare & benchmark: Assessment results can be compared and benchmarked with other regions or time-frames. Studies assess their most recent observable input to compare with: past years of longitudinal progress; a future target or expectation; a region of the same scale; or its supra-region. The contribution of a study increases when multiple comparisons are made, as this creates more results, but also better benchmarking. Comparisons are usually presented in a chart or table, ideally in a widely applicable dashboard or index with an overview of score and rank that can be reproduced for more assessments [101]. 29 studies attempted to create a CE index as universal as GDP or HDI; however, only the 'Circularity gap report' [52][53][54][55], footprints, emergy, and eco-efficiency have been applied multiple times.
5. Output: Comparing and benchmarking gives a prognosis on the region, which should be followed by an action plan with CE targets. The success of the action can be assessed by iteration.
The 5-step procedure helps in creating structure, transparency, reliability and validity for comprehensive assessments, leading to stronger comparisons and benchmarking with the right CE targets. The steps in most reviewed studies are disconnected, as they lack consideration of procedural steps in creating an assessment. CE dimensions, measurements, proxy indicators and performance indicators are mixed up, and little reasoning goes into weighting and hierarchy. Studies end up as a 'mashup index' [101], with erroneous and obscure results in terms of reliability and validity, and therefore also of comprehensiveness.
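The five steps can be read as a simple pipeline. The following Python sketch is one hypothetical way to make the chain from conceptual input to output explicit. The class and function names, the unweighted mean used for aggregation, the two example dimensions and all numbers are assumptions made for illustration and are not drawn from a study in the corpus.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Concept:
    """Step 1, conceptual input: a CE definition and its dimensions."""
    definition: str
    dimensions: dict  # dimension name -> names of measurements used as proxies

def assess(concept, measurements):
    """Step 3: aggregate observable measurements into one proxy score per dimension.

    Dimensions without any available measurement are returned separately,
    so that exclusions are stated rather than hidden.
    """
    scores, excluded = {}, []
    for dimension, names in concept.dimensions.items():
        values = [measurements[n] for n in names if n in measurements]
        if values:
            scores[dimension] = mean(values)
        else:
            excluded.append(dimension)
    return scores, excluded

def benchmark(scores, reference):
    """Step 4: difference to a reference (another region, year or target)."""
    return {d: scores[d] - reference[d] for d in scores if d in reference}

concept = Concept(
    definition="hypothetical working definition of CE for the region",
    dimensions={"loop": ["recycling_rate", "composting_rate"], "share": ["shared_mobility_rate"]},
)

# Step 2, observable input: made-up measurements normalized to the range 0..1.
measurements = {"recycling_rate": 0.40, "composting_rate": 0.20}

scores, excluded = assess(concept, measurements)
gaps = benchmark(scores, reference={"loop": 0.50})

# Step 5, output: dimensions that fall short of the benchmark and need action.
actions = [d for d, gap in gaps.items() if gap < 0]
print(scores, excluded, actions)
```

Returning the excluded dimensions alongside the scores mirrors the recommendation in step 3 to state limitations of the assessment framework explicitly.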

Discussion
The discussion section covers six points: Placement of findings in a broader context by (a) comparing similar review-studies, and (b) transferability of findings. Gaps on (c) localizing CE, (d) strategies and processes, and (e) slow-flowing resources. Final point is (f) the methodological contribution.

Literature reviews compared
This corpus overlaps with corpora in literature reviews from [23][24][25]. This paper selected 165 macro-level studies as corpus, whereas [23] selected 37 meso-level studies; [24], 19 macro-level, 16 meso-level, and 20 micro-level studies; and [25], 45 meso- plus micro-level studies. Their taxonomies on methodologies and indicators in particular overlap with our findings. These findings are therefore not in section 3.6, but presented in appendix 1. Our corpus is notably larger, but not because macro-level is more popular than other levels; batch 2 selected only 51 out of 295 studies as macro-level (see figure 1). Our corpus is larger because it includes: 74 papers published after the creation of their inventories; 21 papers that do not mention CE (but are relatable to CE); and, 30 abstracts without available (English) full paper.
Figure 11. Proposed classification of scales as sublevels from macro-, meso-, and micro-levels.

Transferability of findings to meso- and micro-level
Certain findings from our macro-level review may also be transferable to meso- and micro-level as comprehensive assessments are explored on all system levels. The rationale for the 5-step procedure, monitoring different performances (absolute, efficiency or policy), and scales seem valid for all levels.
Meso- and micro-level need scales for like-to-like comparisons and benchmarking just as macro-level does. Figure 11 drafts scales for meso- and micro-level, in addition to the macro-level scales introduced earlier in this paper. The additional scales were identified through a quick review of meso- and micro-level studies from the inventories of [23][24][25].
Three meso-level scales are proposed. Eco-park-scale represents (manufacturing) companies in physical proximity to each other and engaging in a symbiosis of mutually beneficial exchange of waste and by-products. Supply-chain-scale is for the symbiosis process applied on an industry sector. Ecosystem-scale is between multiple (or all) sectors in a region as urban industrial symbiosis. Four scales for micro-level are proposed. Business-scale represents all negative and positive impacts a company makes. Product/service-scale is for the impact of a single product or service. Process-scale is for the impact of a single (manufacturing) process. Consumer-scale is for the contribution to CE by an individual, or group, through a specific firm, or in general.

Gap on localizing the CE: benchmarking region targets and boundaries
A gap was found that reviewed studies do not explore self-sufficiency, autarky, resource-independence, resilience, or stability (see section 3.3). Only a single study [102] assesses localizing the CE with 'proximity' as an indicator. Indeed, it seems that current studies and policies have not set the right targets to work towards an absolute CE [21,87,103].
Therefore, this paper introduces localizing CE as a new strategy for what other studies [72,88] have called the ideal CE of urban industrial symbiosis.
Local supply-chains should be created to close the loop on the smallest possible scale. Localizing CE sets benchmarks on scales smaller than planetary boundaries. CE targets for full circularity of certain resources and waste can be set on smaller scales; certain CE processes and strategies should be managed within the scale-boundaries of L, M, S and XS-scales. Small scales may not be able to achieve a fully self-sufficient and closed CE, but high levels of CE within scale-boundaries are imaginable. Regions with high self-sufficiency become building-blocks that, when combined, stay within planetary boundaries. This paper argues for localizing CE as a new policy by setting local CE targets and boundaries. Any region should strive for a full CE, with only the resources, wastes and processes that cannot be managed within the region to be carried by its supra-region. These policies may be more successful than top-down approaches as local policymakers are more agile and more in control to implement a CE, especially when supply-chains localize and create urban industrial symbiosis. Policymakers can learn quickly through assessing, comparing, benchmarking, and sharing with peer-regions; building capacity together.
Another argument for localizing CE is enhancing local resilience and stability (keywords that also have only a few references). History has shown many examples where inter-regional dependency caused problems: garbage piling up when regions could not export their waste; lack of local manufacturing and repair facilities due to cheaper labour elsewhere; exploitation of humans and nature in economically underdeveloped regions; abandoning of towns after mines closed; droughts after diversion of water upstream; and (trade) wars due to local resource scarcity. Localizing the CE supports regional jobs, and resources become resilient to external influences like geopolitical tensions and pandemics.
A barrier against localizing is the convenience of globalized supply-chains and exports of waste. Despite some initial economic benefits, globalized supply-chains create economic dependency and large (unaccounted) social and environmental impacts [104]. Opportunities towards localizing CE and countering the economy of scale are innovations such as photovoltaics, 3D-printing, internet, peer-to-peer networks and urban farming. Table 2 introduces 'resource & waste'-targets, and 'strategy & process'-boundaries for localizing CE. The targets and boundaries are listed on their ideal smallest scale for self-sufficiency. This is to manage the CE more locally and reduce problem shifting to other or larger scales. For example, S-scale should consider food-waste composting and energy self-sufficiency (e.g. by solar), but e-waste may be better processed on the L-scale. The corpus did not provide these targets, but it guided their development. Assessments can be made to find the benchmark-region that successfully meets the best CE targets. Any type of performance monitoring can be applied on any region-scale, each with their own local benchmarks and CE targets.

Comprehensive assessments of an incomprehensive definition on CE
Through this review it became evident that different conceptual input of CE definitions and frameworks lead to very different assessments. None of the assessments have a comprehensive systems approach. Particularly studies that monitor CE efficiency performance focus too much on resource flows that loop. There are many more strategies and processes that are more impactful. Monitoring efficiency comprehensively should arguably include more processes: (a) Reuse, repair, and remanufacturing; they lack macro-level indicators (also according to [86,96]). (b) Share, virtualize and longevity; these are also key to CE [5,8], and their impact may actually exceed that of the loop-processes. Longevity is further discussed in section 5.5. (c) Consumer behaviour; according to various conceptual studies, e.g. [8,19], these processes are also part of CE. Yet, only two studies in the corpus address this [83,105]. Only one assessment indicator was found on this: the basket of products. This indicator was introduced by the EU [106] to assess consumption patterns and their impact. (d) Culture, behaviour change and family planning; these impactful processes are not part of the economic system, but concern society as a whole. Maybe this is why the Japanese government uses the term 'Material-Cycle Society' instead [107]. (e) Regeneration, reforestation and carbon-capture; these processes occur outside the economic system but do affect the balance with the natural ecosystem. They are related to the (economic) resource of space and land-use. (f) Negative processes occurring outside the economic system but affecting the balance (e.g. methane-emissions from melting permafrost).
All these processes for efficiency do significantly contribute to the absolute CE performance of balancing the economic system within the boundaries of the ecosystem (as visualized in figure 9). There are also the processes affecting social equality as part of the TBL; these are nearly completely missing in assessments.
The focus on processes that loop is partially due to a literal interpretation of 'circular economy', whereas mainstream definitions present CE as a comprehensive systems approach that optimizes the TBL dimensions. However, these definitions are fuzzy on what needs to be (comprehensively) assessed to achieve this without problem shifting. This ontological issue is the biggest gap across all reviewed literature.

Gap on slow flows: longevity and accumulated stock
Longevity of resources is a key element of CE [5,8,108] but the corpus has overlooked this; keywords longevity, durable, durability, long-life and life extension have only 192 references combined. Only three studies [52,55,81] evaluated longevity by assessing building and infrastructure stock.
Two 'Circularity gap reports' [52,55] assessed the significance of stock accumulation in gigatonnes (Gt) on the XL-scale. Study [52] found that in 2015: 'Accumulated material stocks are almost 10 times larger in total than annual material throughput-890Gt versus 92.8Gt, respectively'. The studies also found that stock accumulation grew by 36.0Gt in 2015 and 48.0Gt in 2017, which is nearly half of all material input into the economy. Dramatic regional differences were found by a comparison on the L-scale: Europe had 96Gt accumulated in 2015, predicted to grow to 107Gt by 2050, whereas China had 239Gt accumulated in 2015, predicted to grow to 562Gt by 2050. Study [52] is also apprehensive about the longevity of China's stocks.
Christis et al [81] assessed the loss of long-term stock on S-scale for Brussels and found it accounts for 10% of carbon-dioxide (equivalent) emissions, and ca. 25% of waste mass. Many studies assessed waste on S-scale, but only looked at fast-flowing municipal solid wastes (MSWs).
Longevity and accumulated stock are of significance in a CE, and thus require policy and indicators in comprehensive assessments. Longevity of stocks indicates the success of CE strategies such as design, maintenance, renovation, repair, and resell. Accumulated stock (shared) per capita reflects resource efficiency, e.g. roads (in km) per capita. Accumulated stock is a CE opportunity advantaging developed regions.
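As a quick check on the magnitudes quoted above, the ratios can be recomputed directly from the reported figures. The sketch below uses only the numbers cited in the text (with 2050 as the projection year for both regions); the variable names are illustrative.

```python
# Figures as quoted in the text (Gt).
stock_gt = 890.0        # accumulated material stock, 2015
throughput_gt = 92.8    # annual material throughput, 2015

# Stock-to-throughput ratio ("almost 10 times larger").
ratio = stock_gt / throughput_gt

# Accumulated stock per region: (2015 value, predicted 2050 value).
accumulated = {"Europe": (96.0, 107.0), "China": (239.0, 562.0)}

# Growth factor between 2015 and the 2050 prediction.
growth = {region: end / start for region, (start, end) in accumulated.items()}

print(round(ratio, 2))
print({region: round(factor, 2) for region, factor in growth.items()})
```

The stock is roughly 9.6 times the annual throughput, and the predicted growth factor is about 1.11 for Europe against about 2.35 for China, which is the regional contrast the comparison highlights.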

Methodological contribution
This mixed-method approach is novel, but this paper demonstrated that a meta-analysis, taxonomy, and an integrative review make a powerful team in covering a large corpus. The meta-analysis by keyword salience presents itself as a laudable, objective and systematic method to quickly cover a corpus. With 165 studies, 408 keywords with 350 414 references combined, the meta-analysis is robust against subjective decisions, manual mistakes, and misinterpreting keywords from their context. Its weakness is the rough data aggregation, which overlooks certain patterns or categories. However, this is where the taxonomy showed its strength, e.g. in identifying macro-level scales, and publication numbers per year and region.
The taxonomy was only used when the meta-analysis was insufficient, as a taxonomy is rather labour-intensive in classifying a large corpus.
Meta-analysis and taxonomy created strong quantitative evidence (as noted by [27]), but are blind to qualitative findings hidden in the corpus. This is where the qualitative integrative review contributes; sculpting rich details on top of the raw carving from meta-analysis and taxonomy. The integrative review synthesizing and critiquing literature advanced knowledge and research (as noted by [28]). The meta-analysis and taxonomy were a precondition for the integrative review. This review succeeded in all-round grounded rigour for identifying gaps and patterns within the corpus.
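The idea of a meta-analysis by keyword salience can be sketched as counting how often each keyword is referenced across the corpus. The sketch below is an assumption about how such counting might be done, not the tooling used in this review; the function name `keyword_salience`, the whole-word matching rule and the two sample texts are hypothetical.

```python
import re
from collections import Counter


def keyword_salience(corpus, keywords):
    """Count case-insensitive whole-word references to each keyword across all texts."""
    counts = Counter()
    for text in corpus:
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            counts[keyword] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Two made-up snippets standing in for full-text studies.
corpus = [
    "Longevity and reuse extend product life. Reuse reduces waste.",
    "Recycling dominates; longevity is rarely assessed.",
]
keywords = ["longevity", "reuse", "recycling", "sharing"]

salience = keyword_salience(corpus, keywords)
print(salience.most_common())
```

A keyword with few or no references (here the hypothetical 'sharing') flags a potentially understudied topic, although, as noted above, such counts say nothing about the context in which a keyword is used.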

Conclusion
'One cannot manage what we do not measure' [15], but many regions are not short on measurements that relate to multiple facets of CE; the challenge now is assessing CE comprehensively, by the right combination and weighted hierarchy of measurements. This will enable management of CE more successfully, for a thriving economy within regional and planetary boundaries for environment, and social justice.
This paper reviewed the heterogeneous corpus of assessing CE at macro-level, to find best practice and what is needed for comprehensive CE assessments. To create grounded rigour, a PRISMA-inspired method created a corpus of 165 studies, which was reviewed by the mixed-methods combination of meta-analysis; taxonomy; and integrative review. Findings expand the understanding of comprehensiveness of CE assessments, aiding CE knowledge accumulation across regions and scales.
Synthesizing the body of literature revealed a 5-step procedure (see figure 11) for creating assessments with better comparisons, benchmarking and CE targets. The 5 steps include: (a) a clear definition of CE and (b) measurements corresponding to CE dimensions as proxy-indicators. Both (a) and (b) are combined into (c) an assessment framework, that allows (d) comparisons and benchmarking between regions. This results in (e) a reliable prognosis with recommendations and CE targets to improve CE at macro-level. A pressing problem found from current assessments is the absence of a widely accepted CE definition that clarifies how to comprehensively assess the macro-level. This paper also found three types of monitoring CE performance. Absolute performance monitors resources to set priorities. Efficiency performance monitors CE processes on capacity and optimization. Lastly, policy performance identifies and compares strategies improving CE; arguably this may include cultural aspects.
A policy on localizing the CE seems to be missing. Such a policy requires local CE resource-and-waste targets and CE processes managed within scale-boundaries (as drafted in table 2). Scale boundaries are set on five macro-level scales (XL, L, M, S and XS). This classification also helps to compare and benchmark between regions. Comparing and benchmarking is not exclusively for macro-level, therefore scales were also proposed for meso-level (ecosystem, supply-chain, and eco-park) and micro-level (business, single-process, object, and consumer).
While this paper did not review micro- and meso-level studies, it is presumed that the 5-step procedure, three types of performance monitoring, and classifying scales are also applicable at these levels.
The review found different self-proclaimed comprehensive assessment frameworks. Resource assessments cluster and aggregate to assess by metrics, system-boundaries, uses, or emergy. Although their comprehensiveness is sometimes dubious, they create insightful hierarchies of resources. The meta-analysis revealed the ReSOLVE-framework as relatively comprehensive and weighted in monitoring CE processes; whereas R-frameworks seemed rather limited to waste-management.
Many knowledge gaps were still found in the large corpus of macro-level CE assessments. Understudied resources are accumulated stock and non-solids (e.g. manure, wastewater with micro-plastics, and methane). Understudied CE processes are refuse, virtualize, share, reuse, longevity and regeneration; but the ReSOLVE-framework includes them. There are arguably even more relevant CE strategies, such as consumer behaviour and culture. Measurement data, and therefore indicators, seem to be missing for these CE processes. Understudied regions are found outside Europe and China, and at the nano-scale. Regional self-sufficiency, resilience and stability are understudied due to a lack of CE targets to localize CE, while this would create building blocks towards a global CE.
A limitation of literature reviews is that they only expand knowledge through inductive reasoning. Further study beyond this review could extend knowledge and help society with better comprehensive CE assessments. This starts with conceptually redefining CE for regions, and translating its dimensions into a comprehensive assessment framework (steps (a) and (c) of the 5-step procedure). Then, this study and the new framework could be empirically validated through steps (b)-(e). From here, regional stakeholders can take appropriate actions, set CE targets and iterate to improve CE in the region.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).