Mapping forward-looking mitigation studies at country level

We provide the first survey of the rapidly expanding literature on country-level mitigation pathways using systematic mapping techniques. We build a database of 4691 relevant papers from the Web of Science and Scopus. We analyze their abstracts and metadata using text mining and natural language processing techniques. To discover common topics within the abstracts, we use an innovative and fully reproducible topic modeling approach based on two machine learning models. We find that the number of papers per country is well correlated with current levels of greenhouse gas (GHG) emissions, with few papers for (current) low emitters, notably in Africa. Time horizons of 2030 and 2050 each account for one-third of the papers, with the former actually more frequent in recent years, spurred by interest in the (Intended) Nationally Determined Contributions. Topic modeling analysis of the data set reveals that forward-looking mitigation papers encompass all dimensions of mitigation, save for financial issues, which are lacking. However, energy and, to a lesser degree, land use, land-use change and forestry are very dominant relative to other sectors. Topics are unevenly addressed across countries, reflecting national circumstances and priorities, but also pointing to gaps in the literature. The limited number of forward-looking papers in (currently) low-emitting countries raises questions about the lack of research capacity in support of the construction of domestic climate policies.


Introduction
The Paris Agreement, adopted in 2015, emphasizes nationally determined contributions (NDCs) as the building block of global action against climate change, both today and in the future, as countries are expected to ramp up their ambition in subsequent NDCs. An increasing number of countries have also communicated long-term low greenhouse gas emission development strategies under Article 4 of the Agreement and/or adopted mid-century mitigation goals.
Although principles exist for mitigation that are general enough to apply everywhere (e.g. decarbonizing electricity, electrifying end-uses, promoting energy efficiency, enhancing carbon sinks), building effective mitigation strategies at country level requires one to take into account local economic, social, technological, institutional and cultural circumstances. This is even more so when mitigation objectives are ambitious. To inform such a process, country-specific analysis is required (Fragkos et al 2021).
While country-level assessments of ambitious climate objectives have been conducted for many countries, via both individual exercises and multicountry projects such as the Deep Decarbonization Pathways project (Waisman et al 2019), CD-LINKS (2019), COMMIT (2019) or ENGAGE (www.engageclimate.org/), to our knowledge no comprehensive survey of this literature exists despite its relevance for policy-making. This may be explained by the large number of countries, the large number of research teams with diverse backgrounds (energy, macroeconomics, environment, etc) working on national mitigation pathways, and the lack of institutions that would bring them together. In contrast, the literature on mitigation pathways at global level originates from a limited number of research teams worldwide and benefits from well-developed institutions, such as the global mitigation scenario databases hosted by the International Institute for Applied Systems Analysis or the Integrated Assessment Modeling Consortium (IAMC). It has been extensively surveyed, in particular in the IPCC 4th and 5th Assessment Reports.
In this paper, we fill this gap by providing a comprehensive overview of the literature on mitigation pathways at country level. Specifically, we ask four questions. How comprehensive is the geographical coverage of this literature? Up to what time horizons does it consider mitigation strategies? What models do these analyses use? What are the main aspects of mitigation it addresses?
To do so, we harvest forward-looking mitigation papers at country level from the Web of Science (WoS) and Scopus databases, resulting in a data set of 4691 abstracts and other paper metadata. We use language processing techniques to extract additional information from the abstracts, such as country, time horizon of the analysis or name of the model used. Finally, we use topic modeling techniques to identify the main issues discussed in each of the papers in the database. We improve the method proposed by Lamb et al (2019) by reducing subjectivity bias and by optimizing the parameters of the method to maximize the explanatory power of the topics.
Overall, our paper contributes to a growing literature that mobilizes big data and machine learning techniques to analyze the academic literature on climate change (Belter and Seidel 2013, Wang et al 2014, Li and Zhao 2015, Haunschild et al 2016, Aleixandre-Benavent et al 2017, Lamb et al 2019, Callaghan et al 2020).
In addition to our main findings presented below, the data set of papers and related topics produced in this research is of interest on its own as it allows researchers, policymakers and stakeholders to 'zoom in' on particular topics and/or countries of interest to inform policy processes and/or identify research gaps. We strive here to provide the method and results in a clear, transparent and fully reproducible way, with a view to making the results easier to communicate (Minx et al 2017, Donnelly et al 2018, Haddaway and Macura 2018).
The rest of this article is structured as follows: section 2 details our methodology, section 3 presents and discusses our results, and section 4 provides the conclusion.

Database construction
To find mitigation pathways at national level, we search the academic databases WoS and Scopus for papers that meet the following three conditions: (i) include the name of a country in the title¹, (ii) include 'mitigation' or a synonym in the title, abstract or keywords², and (iii) include a year within a specified period in the title, abstract or keywords.
The two selections are then merged into one database, and duplicates are eliminated. Due to differences in the coverage of peer-reviewed journals, the results of the searches from WoS and Scopus differ significantly, with 944 references that appear only in Scopus and 574 only in WoS. The search expressions and the resulting database of 4691 papers, obtained on 14 November 2020, can be found in the supplementary material (available online at stacks.iop.org/ERL/16/083001/mmedia).
Limiting the search for country names to the title of the reference is based on the observation that papers focusing on national mitigation pathways typically have the name of the country in the title. Conversely, attempts to use search equations with country names in the abstract harvested too many irrelevant papers. Finally, adding a year is critical to restricting the search to papers devoted to future pathways. Without this condition, the search would also capture the vast literature on current mitigation policies, for instance.
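The merging step can be illustrated with a minimal sketch. The text does not say how duplicates are detected, so the matching rule below (DOI when available, otherwise a normalized title) and the field names `doi` and `title` are assumptions made for illustration only.

```python
import re

def record_key(record):
    """Return a key identifying a paper: its DOI if present, else its normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Fallback (assumption): lower-case the title and collapse non-alphanumeric runs.
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title)

def merge_sources(wos_records, scopus_records):
    """Merge two lists of records, keeping one entry per paper and tracking its sources."""
    merged = {}
    for source, records in (("WoS", wos_records), ("Scopus", scopus_records)):
        for record in records:
            key = record_key(record)
            if key in merged:
                merged[key]["sources"].add(source)
            else:
                merged[key] = {**record, "sources": {source}}
    return list(merged.values())
```

Counting the entries whose `sources` set contains a single database gives the kind of "only in Scopus" and "only in WoS" tallies reported above. A record that carries a DOI in one database but not in the other would not be matched by this simple rule.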
While the search string is precise and encompasses all countries, our identification strategy has two shortcomings. First, we focus on papers in English only, although there are relevant papers on forward-looking mitigation at country level in other languages. Second, we search two major databases (WoS and Scopus), while other relevant papers may be indexed elsewhere. Overall, however, we believe that our approach provides an extensive view of the available literature.

Additional treatments
The database is post-processed using the Pandas library (McKinney 2010) for Python. We search for country names, demonyms and acronyms in the title to associate each paper with a country. When the title of a paper contains more than one country name (which occurs for 153 papers), one entry is created for each. The resulting extended database contains 4884 rows. We use it to analyze the geographical coverage of our data set.
Two additional parameters are added. First, we search the title, keywords and abstract of each paper for a horizon year in the [2025, 2100] range. If more than one year is found, we retain the largest one. Second, we search for model names, again duplicating entries if several models are identified.
We use a combined list of models from the comparative review of scenario modeling tools for national pathways to the Sustainable Development Goals (SDGs) by Allen et al (2016), the list of models documented by the IAMC, as well as the generic expressions 'computable general equilibrium (CGE)' and 'integrated assessment model'. The database with one entry for each combination of paper, country and model has 4996 rows.
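As a rough illustration of this step, the sketch below tags each title with the countries it mentions and creates one row per (paper, country) pair. It uses plain Python rather than Pandas to stay self-contained, and the lookup table `COUNTRY_PATTERNS` is a hypothetical, heavily truncated stand-in for the full list of names, demonyms and acronyms.

```python
import re

# Hypothetical, truncated lookup: regular expression -> canonical country name.
COUNTRY_PATTERNS = {
    r"\bchina\b|\bchinese\b": "China",
    r"\bindia\b|\bindian\b": "India",
    r"\bunited kingdom\b|\buk\b|\bbritish\b": "United Kingdom",
}

def countries_in_title(title):
    """Return the canonical names of all countries detected in a title."""
    lowered = title.lower()
    return [name for pattern, name in COUNTRY_PATTERNS.items()
            if re.search(pattern, lowered)]

def expand_by_country(papers):
    """Create one row per (paper, country) pair, duplicating multi-country papers."""
    rows = []
    for paper in papers:
        for country in countries_in_title(paper["title"]):
            rows.append({**paper, "country": country})
    return rows
```

Word boundaries avoid matching a name inside a longer word, but a real list would still need care with ambiguous demonyms (for example 'Indian' in 'Indian Ocean').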
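The horizon-year rule (largest year found in the [2025, 2100] range) can be sketched as follows. The regular expression and the function name `horizon_year` are illustrative assumptions, not the authors' code.

```python
import re

# Four-digit years from 2000 to 2100 that are not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(20\d{2}|2100)(?!\d)")

def horizon_year(text, low=2025, high=2100):
    """Return the largest year of `text` within [low, high], or None if there is none."""
    years = [int(match) for match in YEAR_PATTERN.findall(text)]
    in_range = [year for year in years if low <= year <= high]
    return max(in_range) if in_range else None
```

In practice the function would be applied to the concatenation of title, keywords and abstract of each paper.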

Topic modeling

Overview
Topic modeling is a machine learning method aimed at discovering common topics within a corpus of documents, here the abstracts of the papers selected above. Specifically, we use the non-negative matrix factorization (NMF) classification method (Lee and Seung 1999). The starting point is to build the so-called term frequency-inverse document frequency (TF-IDF) matrix, in which each row corresponds to a paper and each column to a word, and in which coefficients measure the frequency of a given word in the abstract of a given paper, weighted by the inverse of the frequency of that word in the whole corpus.
The next step is to decompose the TF-IDF matrix, i.e. to search for a combination of matrices W and H so that their product W × H best approximates TF-IDF. The columns of matrix W, as well as the rows of matrix H, can then be interpreted as topics. Matrix H indicates the weight of each word in each topic, while matrix W indicates the weight of each topic in each abstract.
The outcome of the method is sensitive to the number of topics (the number of columns of matrix W and the number of rows of matrix H) as well as to other parameters of the optimization process. Previous studies using the NMF classification method have explored several numbers of topics and selected the value based on expert judgment (Lamb et al 2019, Callaghan et al 2020). Here, we reduce the risk of arbitrariness and subjectivity bias by selecting exogenous parameters using a topic coherence measure (O'Callaghan et al 2015), based on the Word2vec word embedding algorithm (Mikolov et al 2013a, 2013b). The following subsections detail each step.

Corpus identification
To identify the corpus, abstracts are pre-processed. All characters are put in lower case; punctuation signs, connectors and commonly used words are deleted; and words are grouped according to their common stem. Since the country scope and time horizon of each paper are already identified through the search equation, country names and time horizons are deleted. Terms related to mitigation listed in the search equation are also deleted, since by construction of the database each abstract contains at least one of them. Finally, we exclude terms that are either too rare (i.e. that appear in fewer than 1% of the abstracts) or too frequent (i.e. that appear in more than 95% of the abstracts). The final corpus contains 1300 terms.
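The last filtering step can be sketched as below, assuming abstracts have already been lower-cased, cleaned and stemmed into lists of tokens. The function name and the toy thresholds used when testing it are illustrative; only the 1% and 95% defaults come from the text.

```python
from collections import Counter

def filter_vocabulary(tokenized_abstracts, min_share=0.01, max_share=0.95):
    """Keep terms whose share of abstracts containing them lies in [min_share, max_share]."""
    n_docs = len(tokenized_abstracts)
    doc_freq = Counter()
    for tokens in tokenized_abstracts:
        doc_freq.update(set(tokens))  # count each term once per abstract
    return sorted(term for term, df in doc_freq.items()
                  if min_share <= df / n_docs <= max_share)
```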

TF-IDF matrix construction
We measure the weight of each term using the TF-IDF index (Salton and Buckley 1988). For each abstract a and each term t, the index combines tf(a, t), the number of occurrences of term t in abstract a, with an inverse document frequency factor that decreases with df(t), the number of abstracts containing term t. The TF-IDF index thus gives a high weight to a particular term in a particular abstract if it appears frequently in that abstract but not frequently in the rest of the corpus.
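To make the index concrete, here is a minimal sketch under a stated assumption: the inverse document frequency factor is taken to be log(A / df(t)), with A the number of abstracts. This is one classic variant and may differ from the exact weighting used in the paper (library implementations often add smoothing and row normalization).

```python
import math
from collections import Counter

def tfidf_matrix(tokenized_abstracts, vocabulary):
    """Return a list of rows (one per abstract) of tf(a, t) * log(A / df(t)) weights."""
    n_docs = len(tokenized_abstracts)
    vocab_set = set(vocabulary)
    doc_freq = Counter()
    for tokens in tokenized_abstracts:
        doc_freq.update(set(tokens) & vocab_set)
    matrix = []
    for tokens in tokenized_abstracts:
        counts = Counter(tokens)
        row = [counts[t] * math.log(n_docs / doc_freq[t]) if doc_freq[t] else 0.0
               for t in vocabulary]
        matrix.append(row)
    return matrix
```

With this variant, a term present in every abstract gets a weight of zero, which is consistent with the intuition that such a term does not help distinguish abstracts.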

Topic identification
We use the NMF method to identify relevant clusters of words (hereafter topics). The algorithm searches for a set of K topics so that the product of the non-negative abstract-topic matrix $W_{A \times K}$ and topic-term matrix $H_{K \times T}$ best approximates $\mathrm{TFIDF}_{A \times T}$, where A is the number of abstracts and T the number of terms. The W matrix can be interpreted as the weight of each topic in each abstract, while the H matrix represents the weight of each term in each topic.
Since the selected number of topics is small relative to the total number of abstracts (typically less than 5%), there is no algorithm of polynomial complexity that converges to a unique solution³. However, one can iteratively converge to local solutions by solving the optimization problem (3), in which $\lVert \cdot \rVert_{\mathrm{Fro}}$ and $\lVert \cdot \rVert_{1}$ are the Frobenius and $L_1$ norms, respectively, and where $\alpha \ge 0$ and $0 \le l_1 \le 1$ are coefficients.
The first term in equation (3) ensures the convergence of the WH product towards TF-IDF, while the second and third impose additional constraints on the structure of W and H. The $L_1$ regularization (second term) favors the presence of null coefficients in the matrices, thereby limiting the number of topics each abstract is related to, and limiting the number of terms each topic contains. The minimization of the $L_2$ regularization (third term), on the other hand, tends to limit differences across coefficients in the matrices.
We initialize the NMF algorithm using the Non-Negative Double Singular Value Decomposition method (Boutsidis and Gallopoulos 2008, Belford et al 2018). To ensure that our results are reproducible, we set the random seed of the algorithm to 1511.
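Equation (3) itself is not reproduced in this text. A form consistent with the three terms described here, and with the elastic-net-regularized objective commonly used for NMF (for instance in scikit-learn), would be the following. The factors of 1/2 and the grouping of W and H inside each penalty are assumptions.

```latex
\min_{W \ge 0,\; H \ge 0}\;
\tfrac{1}{2}\,\lVert \mathrm{TFIDF} - WH \rVert_{\mathrm{Fro}}^{2}
\;+\; \alpha\, l_1 \left( \lVert W \rVert_{1} + \lVert H \rVert_{1} \right)
\;+\; \tfrac{1}{2}\,\alpha\,(1 - l_1) \left( \lVert W \rVert_{\mathrm{Fro}}^{2} + \lVert H \rVert_{\mathrm{Fro}}^{2} \right)
```

In this form, α = 0 removes all regularization, while l_1 moves the penalty from purely quadratic (l_1 = 0) to purely sparsity-inducing (l_1 = 1).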

Parameter optimization in the NMF method
The set of topics identified with the NMF method is contingent on the choice of K (number of topics), α (intensity of regularization relative to the optimization criterion) and $l_1$ (regularization parameter). We thus build a performance measure for each set of topics, and then select the triplet (K, α, $l_1$) that produces the highest-ranking set. To our knowledge, this is the first time that the algorithm below has been used to select the best triplet (it has previously been used to select K only (O'Callaghan et al 2015)).
The index we use is called 'coherence'. It measures how similar the semantic environments of the terms that compose a given topic are. The higher the index, the more consistent the words that compose the topic. The coherence of a set of topics is the average of the coherence of each individual topic.
Following O'Callaghan et al (2015), we produce a vector representation of the semantic environment of each term within the corpus of abstracts using the Word2vec word-embedding algorithm (Mikolov et al 2013a, 2013b). Word2vec is a two-layer neural network that maps words into vectors that account for the semantic environment of the word. Words that share similar contexts are characterized by similar multi-dimensional vectors.
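The selection of the triplet amounts to an exhaustive search over a grid of candidate values. The sketch below shows that logic only: `score` is a hypothetical callable that would fit the NMF model for a given triplet and return the coherence of the resulting set of topics, and the candidate grids are left to the caller.

```python
from itertools import product

def select_best_triplet(k_values, alpha_values, l1_values, score):
    """Return the (K, alpha, l1) triplet with the highest score, and that score."""
    best_triplet, best_score = None, float("-inf")
    for triplet in product(k_values, alpha_values, l1_values):
        value = score(*triplet)  # e.g. mean topic coherence for this triplet
        if value > best_score:
            best_triplet, best_score = triplet, value
    return best_triplet, best_score
```

Because every combination is evaluated, the cost grows with the product of the three grid sizes, so coarse grids are a natural starting point.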
We use the Skip-Gram method to train the neural network. This approach seeks to predict the semantic context of a term. The error is computed based on how correctly the words surrounding this term are predicted. The coherence of each topic is then the mean of the pairwise cosine similarities between the terms that characterize the topic. Precisely, for a topic k, the coherence index $\mathrm{TCW2V}_k$ is computed as follows:

$$\mathrm{TCW2V}_k = \frac{1}{\binom{N}{2}} \sum_{j=2}^{N} \sum_{i=1}^{j-1} \mathrm{similarity}(wv_{kj}, wv_{ki})$$

where N is the number of terms that we choose to characterize each topic⁴, $wv_{kj}$ is the vector associated with term j characterizing topic k, $wv_{ki}$ is the vector associated with term i characterizing topic k and similarity(A, B) is the cosine similarity of vectors A and B, defined as:

$$\mathrm{similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

We assign to the set of topics resulting from each triplet (K, α, $l_1$) the mean of the scores obtained for each individual topic.
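The coherence index described above (mean pairwise cosine similarity within a topic, then mean over topics) can be sketched in a few lines. The word vectors would come from the trained Word2vec model; here they are simply passed in as lists of numbers, and the function names are illustrative.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors given as sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def topic_coherence(term_vectors):
    """Mean cosine similarity over all pairs of the N vectors characterizing a topic."""
    pairs = list(combinations(term_vectors, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)

def mean_coherence(topics):
    """Score of a set of topics: the mean of the individual topic coherences."""
    return sum(topic_coherence(vectors) for vectors in topics) / len(topics)
```

Each topic needs at least two characterizing terms for the pairwise mean to be defined.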

Relationship across topics
In order to visualize how topics relate to each other (figure A6), we use LDAvis (Sievert and Shirley 2014), a system initially developed to explore topic-term relationships in a fitted latent Dirichlet allocation model. The intertopic distance is based on the Jensen-Shannon divergence calculated from the H matrix coefficients characterizing the topic-term relationships. Principal component analysis then projects the set of intertopic distances onto two dimensions. An interactive visualization is available in the online supplementary material; it shows the individual terms that are most useful for interpreting each topic. In particular, it enables one to look at the corpus-wide frequency of a given term as well as the topic-specific frequency of the term.
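For reference, the Jensen-Shannon divergence between two topics can be computed from their rows of the H matrix as sketched below. Normalizing each row to sum to one and using natural logarithms are assumptions; the internals of LDAvis may differ in such details.

```python
import math

def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(row_a, row_b):
    """Jensen-Shannon divergence between two topic-term weight rows."""
    p, q = normalize(row_a), normalize(row_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

The divergence is symmetric, equals zero for identical distributions and reaches log 2 for topics that share no term.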

Relationship between abstracts and topics
The W matrix links topics to abstracts. However, it has too many non-zero coefficients, and a threshold is required to ascribe a topic to an abstract. This, in turn, requires that the weights of each topic in each abstract be comparable across abstracts.
We normalize the W matrix so that the sum of the coefficients of each row is equal to one. In this way, each row of the matrix can be interpreted as the share of each topic in a given abstract. To do so, we transform each coefficient in the W matrix as:

$$W^{*}_{ak} = \frac{W_{ak}}{\sum_{k'=1}^{K} W_{ak'}}$$

We then ascribe topic k to abstract a if $W^{*}_{ak} > 0.02$, as per Lamb et al (2019).
To check how relevant the resulting mapping is, we build another mapping based on titles. Specifically, we ascribe topic k to abstract a if $W^{*}_{ak} > 0.02$, and if at least one of the five terms best characterizing topic k appears in the title of the paper. Figure A5 presents the number of papers per topic in each mapping. As the figure illustrates, these distributions are similar: the bottom one is a scaled-down version of the top one. Since the presence of a word characterizing a topic in the title of a paper is a strong indication that the paper is indeed related to that particular topic, the comparison between the two mappings is a good indication that our initial mapping is relevant.
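Both mappings can be sketched as follows. The W matrix is represented as a list of rows, and `top_terms[k]` is a hypothetical list holding the terms that best characterize topic k. Matching stems as substrings of the lower-cased title is a simplifying assumption; a faithful version would stem the title first.

```python
def normalize_rows(W):
    """Divide each row by its sum so that topic weights become shares."""
    normalized = []
    for row in W:
        total = sum(row)
        normalized.append([w / total if total else 0.0 for w in row])
    return normalized

def assign_topics(W, threshold=0.02):
    """For each abstract, list the topics whose normalized share exceeds the threshold."""
    return [[k for k, share in enumerate(row) if share > threshold]
            for row in normalize_rows(W)]

def assign_topics_with_title(W, titles, top_terms, threshold=0.02):
    """Same mapping, additionally requiring a characteristic term in the title."""
    assignments = []
    for topics, title in zip(assign_topics(W, threshold), titles):
        lowered = title.lower()
        assignments.append([k for k in topics
                            if any(term in lowered for term in top_terms[k])])
    return assignments
```

By construction, the title-based mapping can only remove topics from the first mapping, which is why its distribution is expected to be a scaled-down version of the original one.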

Papers are distributed in proportion to countries' greenhouse gas (GHG) emissions
Overall, 136 countries (plus the European Union) appear in the database (figure 1). However, the geographical distribution of papers is particularly skewed. China accounts for 24.3% of all papers. A distant second is the US (9.0%), followed by the UK (6.0%), the EU as a region (5.3%) (this figure excludes papers related to individual EU Member States), and India (4.9%). Region-wise⁵, almost half of the papers (46.5%) focus on Asia, a little more than a quarter on Europe (28.4%) and a sixth on the Americas (17.3%, of which 7.0% on Latin America and 10.3% on North America). Each of the other regions accounts for less than 5% of the papers. Africa, in particular, is very poorly represented (4.4%), with all but seven countries in the region having fewer than ten papers, and almost half having none at all.
The representation of each country in the database appears to be well correlated with its GHG emissions (figure 2). This is not surprising since the larger the problem, the more likely it is to attract the attention of the (domestic and foreign) research community, either suo motu or at the request of governments or other interested parties. A prominent exception is the UK, which features much more frequently in the database than its GHG emissions would suggest. This may reflect the strength of the UK research community on mitigation, and/or the fact that with the adoption of the Climate Change Act in 2008, the UK has a longer history of national climate policies than most high-emission countries.
Another exception is Russia, with significantly fewer papers than its share of emissions would suggest. This might reflect the fact that our search is confined to papers written in English. It might also point to a research community that has invested more in other priorities than mitigation. Among countries with medium or low emissions, OECD countries, particularly in Europe (e.g. Finland, Switzerland, Sweden), have more papers in the database than their share of GHG emissions would suggest. Developing countries, on the other hand, tend to be closer to the line or below.
The tail of the distribution is also relevant for policy-making. Of a total of 197 Parties to the UN Framework Convention on Climate Change, 143 have fewer than ten papers in the database, 127 fewer than five and 65 have none. In their survey of urban climate mitigation case studies, Lamb et al (2019) similarly find a very uneven distribution of papers by country. While it is difficult to determine a threshold below which the number of forward-looking publications on mitigation would be 'insufficient' to inform policies, ten papers or fewer (to be compared with 39 major topics in the database, see below) leaves little chance that even the different sectoral aspects of mitigation are adequately covered. Policymakers and stakeholders in Africa, in particular, have for the most part scant scientific literature to rely on, despite rapidly increasing emissions. Informing strategies to limit growth in GHG emissions (and ultimately start reducing them) while continuing with other development goals requires a major shift in the focus of research towards this continent.

Paris Agreement has spurred increased attention to the 2030 time horizon
The distribution of the time horizons of papers (figure 3(b)) presents two very clear peaks in 2030 and 2050, respectively, each accounting for 34% of all papers. Only 14% of all papers have a time horizon beyond 2050, a major difference with the literature on mitigation at the global level, in which 2100 is the norm. This reflects a difference in research questions. Forward-looking mitigation studies at global level are typically conducted to assess mitigation scenarios against long-term temperature goals, whereas forward-looking studies at national level typically have the objective of assessing more detailed policy packages. For this purpose, 2050 is already a long time horizon.
This hypothesis is further supported by the fact that 2014 also marks an inflexion in the distribution of time horizons across papers. As can be seen in figure 3(a), the share of papers with a time horizon
Finally, figure 3(b) shows that the distribution of time horizons differs by region. Europe and North America represent more than 62% of the literature with a 2050 time horizon, against 25% of the literature with a 2030 time horizon. Conversely, Asia represents 68% of the literature with a 2030 time horizon, against 40% up to 2050. This suggests that research in Europe and North America is already focused on mid-century time horizons, consistent with the mid-century mitigation strategies that several European countries and the EU have adopted. In Asia, by contrast, the focus appears to be more on the conditions under which NDCs can be achieved by 2030. If this explanation is correct, then we should soon see an increase in the share of papers with 2050 and 2060 time horizons in Asia following the recent announcement of the long-term mitigation objective by China.

Studies offer a comprehensive but uneven coverage of major mitigation issues
Table 1 presents the outcome of the topic modeling analysis described in Methods. The topics are ranked by number of papers attached (column T0.02), and characterized by their five most relevant words (column Terms)⁸. The 'title' attached to each topic (column Topic) is our work.
⁸ Topics are characterized by word stems rather than full words. For example, the word stem corresponding to 'country' or 'countries' is 'countri'. We use the stemming algorithm from the library stemming.porter2 (https://pypi.org/project/stemming/1.0/).
Almost all of the papers in the database (4687 out of 4691) are related to topic No. 1, characterized by the words 'policy-develop-econom-countri-use'. This is not surprising since papers on mitigation scenarios at country level typically discuss policy implications, including in the abstract. More interesting is the fact that the corpus is then split nearly in half between papers related to topic No. 2 (Climate Change) and those related to topic No. 3 (Energy Efficiency). The two ensembles are largely disjoint, as can be seen from the mapping of the strength of the pairwise combinations of topics (figure A7). Papers associated with topic No. 2 (Climate Change) tend to be also associated with topics such as Drought, Flood, Water, Crop Yield, Forest, Land Use, Agriculture or Air Pollution. By contrast, papers associated with topic No. 3 (Energy Efficiency) tend to be associated with topics such as Hydrogen, Steel/Iron, Nuclear, Peak, Oil, CCS, Wind/Solar or Buildings. The other topics can be organized into five groups: (i) methods (Scenarios and Systems); (ii) policies (e.g. Costs or Targets/INDC); (iii) sectors; (iv) air pollution; and (v) climate change impacts (Drought, Flood and Crop Yield). The latter are not all primarily about mitigation, as the search equation also picks up forward-looking impact assessments or adaptation studies at national level that have the word 'mitigation' or a synonym in the abstract.
Using the outline of the IPCC Working Group III 6th Assessment Report as a rough mapping of the topics associated with mitigation (table 1, column 5), one can see that the forward-looking mitigation papers at national level cover all IPCC WGIII AR6 sectoral chapters (6-11) as well as issues related to demand (5), policies (13) and innovation (16). The absence of international policies (Chapter 14) is understandable since the search equation focuses on mitigation at national level. The absence of a topic related to finance (Chapter 15), on the other hand, confirms anecdotal evidence that few forward-looking national mitigation pathways have been analyzed along that line so far. Finally, the lack of a standalone topic dedicated to SDGs (Chapter 17) may be related to the fact that if individual SDGs are discussed in the abstracts, it may be in a diffuse way that does not get picked up in a topic (except for Air Pollution). Among the sectors that are represented, there is a considerable imbalance: energy is the one with the largest number of related papers (27%), followed by land use, land-use change and forestry (LULUCF) (9%), while the other sectors are far less represented. Although the attribution of topics to particular sectors may be debatable in some cases (for example, bioenergy could also be related to LULUCF), and though sector-specific information may also be present in the body of the paper, the observation of an imbalance between sectors appears robust.
Figure caption (partially recovered): the representation of a topic (%) is the number of country studies associated with the topic divided by the total number of country studies. Since individual papers can be associated with several topics, the sum of the topic representations is not equal to 100. The Policy-Devlpt-Eco topic is not included, since it is present in almost every paper. Scenario and System topics are also not included since they refer to the method of the paper rather than to the issues that the paper addresses.

Topics reflect country circumstances
The distribution of countries for individual topics mostly reflects the overall distribution of countries in the database (see figures A3 and A4). At one end of the spectrum, China has the largest number of papers for all topics except Heat Pump (preceded by the UK), Nuclear (preceded by Japan, the UK and South Korea) and Hydrogen (preceded by Germany, Japan and the UK). At the other end of the scale, African countries appear only once in the top five for a topic (Ethiopia for Drought). There are, however, differences across topics. Forward-looking mitigation studies of industrial sectors (Cement or Steel-Iron) have been conducted predominantly for China, while the distribution of papers across countries is much more balanced for topics such as Renewable Energy or Buildings. The imbalance in research across countries in the Urban topic is particularly surprising since urban development issues are not confined to China. It is, however, consistent with the finding of Lamb et al (2019) that urban case studies in China overwhelmingly dominate the literature.
Figure 4 maps the distribution of topics for the 55 most represented countries in the database. The number in each cell is the number of papers devoted to country x and related to topic k. The shade of the cell indicates the share of the topic in the total number of papers devoted to the country. Reading the figure vertically provides a view of the relative importance of a given topic across countries.
Patterns emerge. Some topics appear in 10% or more of the papers in nearly all 55 countries, such as Energy Efficiency, Electricity or Power. Others stand out only in a limited set of countries, such as Oil in Saudi Arabia, Kuwait, Canada, Malaysia, Indonesia, Ecuador, Mexico and Austria; Coal in China, India, Australia, Malaysia, Poland, South Africa, Vietnam, Chile and Czechia; or Nuclear in Japan, Korea, France and Romania. It is not surprising that topics that are related to country-specific circumstances (e.g. fossil fuel endowments or share of nuclear in electricity mix) appear in a smaller set of countries than topics that relate to broadly shared elements of mitigation (e.g. the electricity grid or the power sector). However, the list of countries where 'specialized' topics appear suggests gaps in the literature. For example, the Oil topic does not stand out in major oil-exporting countries such as Nigeria, Russia or Norway. Regarding the 'policy' topics, Costs and Target/INDC appear evenly distributed across countries. Permit market is often present, notably in papers devoted to the EU. On the other hand, Tax is poorly represented (five countries present this topic in more than 10% of related publications), reflecting at least a higher degree of attention in the academic literature to the former relative to the latter.
Reading the figure horizontally provides a country-by-country snapshot of the issues identified by the academic literature as most important in the context of future mitigation. Some countries have a balanced literature that covers nearly every topic, while others have a much more 'specialized' literature. For instance, the literature on Indonesia is balanced, with three major topics (Energy Efficiency, Target/INDC and Land Use) plus eight other topics including power generation, forest and oil. The literature on Poland, on the other hand, is more focused on Power, Costs, Renewable Energies and Coal. Although the former are mostly countries with a large number of papers attached (China, the US, the EU) and the latter are by construction mostly countries with a smaller number of papers, the relationship does not necessarily hold everywhere. Portugal or Ireland, for example, have a smaller yet more balanced literature than Japan (with a higher-than-average number of papers devoted to Nuclear and Hydrogen) or Brazil (Forest, Land Use, Agriculture).
Patterns of countries also emerge, based on natural resource endowments (e.g. Brazil, Canada, Indonesia, Finland, New Zealand, Norway and Ghana all having higher-than-average shares of papers on Forest and LULUCF-related topics), technology (e.g. Japan, Germany, Italy, France, Denmark and Norway on Hydrogen) or specific policies (e.g. Tax in South Africa and Switzerland). Forward-looking studies related to the impacts of climate change are particularly frequent (in relative terms) in some countries, mostly in the global south (e.g. Ethiopia, Pakistan, Bangladesh and Nigeria for Drought). Finally, for each country, it is also interesting to examine topics that are not addressed. Some may just be less relevant in that particular context (e.g. Coal in France). Others may point to gaps in the literature. For instance, one may argue that given their importance for the French 2050 net zero target, Forest and Bioenergy are currently underrepresented in the literature on France.

Models are mainly identified in studies devoted to Asia
Finally, we attempt to analyze the methods used in the papers to study mitigation at country level. This is not easy given the limited amount of information present in the metadata. We focus on models, checking metadata against a database of 80 scenario modeling tools for national pathways to the SDGs (Allen et al 2016) and the list of 48 models documented by the IAMC. We identify the model names in only 16% of the abstracts (734). Compared with the country coverage of the overall data set (figure 1), Asia is even more strongly represented (60.3%) in this subset. For example, Thailand is three times more represented than in the general database (6.2% against 2.2%) (see figure F8). At the other end of the spectrum, Africa is scarcely present (4.7%), with only 13 countries represented. Model-wise, CGE models are the most common category of models in the corpus. The three individual models that dominate, LEAP, TIMES and MARKAL, are all bottom-up. They are widely used for Asia and Europe (figure A9). Unsurprisingly, these three models are mainly used for energy-related questions (top three for topics Energy Efficiency, Electricity, Power, Fuel, Transport, Vehicle, Renewable Energies) (figure A10). Finally, it is interesting to note that the Japanese AIM model is present in 62 publications (of which 59 are in Asia, given its many regional spinoffs AIM Korea, AIM Vietnam, etc.), illustrating the importance of regional clusters.
However, these findings are limited to an arguably small sample of papers that name their model (or model type) in the abstract and whose models are state-of-the-art tools in our reference list. The term 'model' is actually present in 52% (2446) of the abstracts and, together with the terms 'scenario', 'bau', 'refer' and 'three', mainly characterizes the topic Scenario. This topic is represented in 30% (1428) of the publications in the database and is associated with the 'Method' category, as it characterizes papers detailing the methodology in the abstract. To better identify the modeling tools used in these studies, a deeper analysis of the publications based on the full text is needed. Although limited to a small sample of papers, these findings nonetheless emphasize again the inequalities between countries.
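The model-identification step described above can be illustrated with a minimal sketch. It is not the code from our repository: the function names, the abbreviated reference list and the case-sensitive whole-word matching rule are all assumptions made for illustration. Case-sensitive matching is chosen here so that common words such as 'times' or 'aim' are not counted as the TIMES or AIM models.

```python
import re
from collections import Counter

# Hypothetical, abbreviated reference list (assumption): the actual analysis
# relies on much longer lists (Allen et al 2016 and the IAMC documentation).
MODEL_NAMES = ["LEAP", "TIMES", "MARKAL", "AIM"]


def find_models(abstract, names=MODEL_NAMES):
    """Return the set of model names that appear as whole words in an abstract."""
    found = set()
    for name in names:
        # Case-sensitive, whole-word match (assumption), so that 'three times'
        # is not mistaken for the TIMES model.
        if re.search(r"\b" + re.escape(name) + r"\b", abstract):
            found.add(name)
    return found


def model_share(abstracts, names=MODEL_NAMES):
    """Return (share of abstracts naming at least one model, count per model)."""
    counts = Counter()
    n_with_model = 0
    for text in abstracts:
        hits = find_models(text, names)
        if hits:
            n_with_model += 1
        counts.update(hits)  # each model counted at most once per abstract
    share = n_with_model / len(abstracts) if abstracts else 0.0
    return share, counts


if __name__ == "__main__":
    # Toy abstracts written for this example, not taken from the database.
    toy = [
        "We use the LEAP model to assess the power sector of Thailand.",
        "Emissions could rise three times by 2050 under business as usual.",
        "A TIMES and MARKAL comparison for the Italian energy system.",
    ]
    share, counts = model_share(toy)
    print(f"{share:.1%} of abstracts name a model: {dict(counts)}")
```

Such a rule only detects models that are explicitly named in the abstract, which is the limitation discussed above.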

Conclusion
In this paper, we provide the first mapping of the forward-looking mitigation literature at country level, using systematic mapping techniques. We find the number of papers per country is well correlated with current levels of GHG emissions, with few papers for (current) low emitters. Time horizons of 2030 and 2050 each account for one-third of the papers, with the former more frequent in recent years, spurred by interest in the (I)NDCs. Topic modeling analysis of the data set reveals that forward-looking mitigation papers encompass all dimensions of mitigation, save for financial issues, which are lacking. However, energy and to a lesser degree LULUCF are very dominant relative to other sectors. These topics are unevenly addressed across countries, reflecting national circumstances and priorities, but also pointing to gaps in the literature.
From a methodological point of view, this paper builds upon and improves on Lamb et al (2019) by providing a systematic way to maximize the accuracy of topic modeling. It also illustrates how topic modeling can complement traditional methods of evidence synthesis. Specifically, most systematic reviews are based on a search query that yields thousands of publications. These are then screened to set irrelevant papers aside and scale down the number of papers to a manageable level. Here, topic modeling is used to aid the screening process and provide an overview of the publications identified along all the steps of the database construction.
This paper has three main limitations. First, the term mitigation (or its demonyms) that we use in the search equation harvests too broad a set of papers, since papers about impacts and adaptation to climate change may still refer to mitigation in the abstract. Deep interactions between mitigation and adaptation make this limitation difficult to overcome. Second, despite instructions by journals, abstracts remain written in very different ways across papers. For the purpose of textual analysis, abstracts that are as close as possible to the method and key findings of the paper are preferable, though that may come at the expense of readability. General sentences providing context about climate change may be easily recognizable as such in a full paper (of which they would only represent a tiny fraction), whereas in an abstract they may be confused with a substantive result of the paper. The ubiquity of the Climate Change topic is a demonstration of that risk. Third, as all analysis is based on metadata (abstract, title and keywords), we may miss relevant material that does not make it to the abstract. For example, we cannot rule out that non-energy sectors (e.g. industry or transport) are discussed more frequently in the body of the paper than the abstract suggests.
Our attempt to survey the methods used for conducting forward-looking mitigation studies is limited by the fact that detailed methods, let alone model names, are not systematically presented in the abstract. An overview of national mitigation models, similar to the overview of global mitigation models supported by the Integrated Assessment Modeling Consortium, would be helpful to complement this first attempt.
Finally, our paper has policy implications, as forward-looking mitigation studies typically aim at informing decisions, notably in the context of the Paris Agreement. Where we find such papers scarce, policymakers and stakeholders do not benefit from this source of insight. This is all the more regrettable as these countries, typically with low emissions at present, may still have options to avoid getting into high emission paths. Possible explanations for our findings include a lack of domestic research capacity, lack of data or lack of interest or incentive for foreign research teams to work on other national contexts. In any case, our paper adds quantitative evidence to existing qualitative analysis of the capacity to prepare forward-looking climate policies (e.g. UNFCCC (2019)). It also provides a basis to further explore the reasons for discrepancies across countries.

Data availability statement
The data that support the findings of this study are openly available at https://github.com/ClaireLepault/Mapping-country-mitig-pathways, with the exception of the abstracts from Scopus and Web of Science, which cannot be published for copyright reasons. However, the code and explanation are fully provided to reproduce the analysis and obtain complete databases. In particular, a tutorial on the systematic search on Scopus and Web of Science at https://github.com/ClaireLepault/systematicsearch-wos-scopus shows how to download the initial metadata databases from WoS and Scopus. The analysis itself is explained and clearly reproducible using Python and R at https://github.com/ClaireLepault/Mapping-country-mitig-pathways.