Data, concepts and methods for large‐n comparative climate change adaptation policy research: A systematic literature review

Climate change adaptation research is dominated by in‐depth, qualitative, single‐ or small‐n case studies that have resulted in a rich understanding of adaptation processes and decision making in specific locations. Recently, the number of comparative adaptation policy cases has increased, focusing on examining, describing, and/or explaining how countries, regions, and vulnerable groups are adapting across a larger sample of contexts and over time. There are, however, critical empirical, conceptual and methodological choices and challenges for comparative adaptation research. This article systematically captures and assesses the current state of larger‐n (n ≥ 20 cases) comparative adaptation policy literature. We systematically analyze 72 peer‐reviewed articles to identify the key choices and challenges authors face when conducting their research. We find, among other things, that almost all studies use nonprobability sampling methods, few comparative adaptation datasets exist, most studies use easily accessible data that might not be the most appropriate for the research question, many struggle to disentangle rhetoric from reality in adaptation, and very few studies critically reflect on their conceptual, data and methodological choices and the implications for their findings. We conclude that efforts to increase data availability and use of more rigorous methodologies are necessary to advance comparative adaptation research.


| SYSTEMATIC REVIEW METHODOLOGY
Systematic reviews have gained considerable attention in the environmental change literature in recent years. In contrast to traditional literature reviews, a systematic review combines a systematic and transparent data collection process with rigorous analysis (Gough, Oliver, & Thomas, 2012; Petticrew & Roberts, 2006). In this article, we use a systematic approach to collect data on the conceptual, methodological, and empirical challenges that articles report, followed by a qualitative analysis of the collected data that combines inductive and deductive methods to capture the richness of what is being reported (Dixon-Woods, Agarwal, Jones, Young, & Sutton, 2005). This section describes our review methods, which follow the main steps of the PRISMA protocol frequently used for systematic reviews (Moher et al., 2009), see Figure 1.

| Step 1: Data collection
To ensure a broad scope and to capture all relevant articles that fit within the frame of this study, we first scoped the literature to identify appropriate search terms. The following search string was then generated: (climate change adapt*) AND ("progress" OR "change" OR "compar*") AND ("polic*" OR "govern*"). The search query consists of three parts. We used the first part of the search string to identify articles focusing on intentional policy efforts on climate change adaptation only. Following Dupuis and Biesbroek (2013), we argue that adaptation is "The process leading to the production of outputs in forms of activities and decisions taken by purposeful public and private actors at different administrative levels and in different sectors, which deals intentionally with climate change impacts, and whose outcomes attempt to substantially impact actor groups, sectors, or geographical areas that are vulnerable to climate change" (p. 1471). As such, our framing of adaptation is rather narrow, as in other systematic adaptation literature reviews (Biesbroek, Klostermann, Termeer, & Kabat, 2013; Vink, Dewulf, & Termeer, 2013). Adopting this definition meant excluding articles that are not framed as adaptation but that reduce vulnerability, such as those using the related concepts and approaches of disaster risk reduction, resilience and sustainable development. The second part of the search string was used to identify comparative articles that include either longitudinal ("progress" OR "change") or cross-sectional ("compar*") studies in different parts of the world. Finally, "polic*" and "govern*" were included as key search terms to capture our interest in policy-orientated studies.

[Figure 1. Flow diagram of the review steps. Scoping: Scopus search for "climate change adaptation" AND "compar*", with a check of abstracts for relevant search terms and synonyms (September 2017). Search: Scopus and Web of Science search in keywords and abstracts using the search string; results merged and overlap removed (November 2017/January 2018). Screening: exclusion of abstracts without explicit reference to comparative climate change adaptation; exclusion of articles with no comparative research objective(s)/aim(s), articles that are not empirical, and articles not on climate change adaptation (e.g., mitigation, disaster risk reduction, hazards); exclusion of empirical articles with fewer than 20 cases or without a geographical scale (i.e., events, perception, behaviour). Literature included: English, empirical study, explicit methods, climate change adaptation, policy orientated. Forward checking: others that have cited the study (Google Scholar) (n = 6). Backward checking: based on the reference lists of included articles (n = 3). Analysis categories, deductive: general characteristics, sample size, level of analysis, geographical area, study design, sampling frame, type of comparison, data sources; inductive: subthemes in the coding categories "Journal," "Sector," "Limitations," and "Future."]
The two largest online databases, Elsevier Scopus and Thomson Reuters Web of Science Core Collections, were selected to cover both topical and nontopical journals. Including Scopus and Web of Science in our analysis corrects for possible European or Northern-American bias in inclusion of journals. Articles were selected for the time period January 2010 to January 2018 as previous studies suggest that few large-n comparative studies on adaptation were conducted before 2010 (Berrang-Ford et al., 2014).
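The three-part logic of the search string can be illustrated with a small Python sketch that screens free text against the same term groups. This is only an approximation under stated assumptions: the function name `matches_query` and the regular expressions are hypothetical, and Scopus and Web of Science apply their own tokenization and wildcard rules, so the sketch will not reproduce database results exactly.

```python
import re

# Each group mirrors one part of the search string. A trailing * is treated
# as "any word ending", which is an assumption about wildcard behaviour.
QUERY_GROUPS = [
    [r"climate change adapt\w*"],            # part 1: adaptation focus
    [r"progress", r"change", r"compar\w*"],  # part 2: longitudinal or cross-sectional
    [r"polic\w*", r"govern\w*"],             # part 3: policy orientation
]


def matches_query(text: str) -> bool:
    """Return True if at least one term from every group occurs in the text."""
    lowered = text.lower()
    return all(
        any(re.search(rf"\b{term}\b", lowered) for term in group)
        for group in QUERY_GROUPS
    )
```

A call such as `matches_query(abstract)` could be used to pre-screen exported abstracts before manual reading; it does not replace the manual assessment described in Step 2.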

| Step 2: Eligibility and exclusion
Inclusion was limited to English-language scientific and empirical articles by using the search filters in both online databases. Conceptual and review articles were excluded as we are particularly interested in the way comparative policy research is conducted. We did not restrict by sectoral focus or field of research, to ensure breadth in the articles included in this review. The Scopus database search was limited to "abstract only" to ensure articles were identified based on content rather than buzzwords in the title or keyword list. The search in Web of Science was implemented using "topics." The Boolean search was implemented in November 2017 and updated in January 2018 to capture the latest articles. After removing duplicates (562), a final set of 2,347 eligible articles was compiled. The title, keyword, and abstract information was exported to EndNote X7.
Next, we refined our selection of included articles manually, which is a useful way to ensure breadth when relevant literature is difficult to find (Dixon-Woods et al., 2005). The abstracts of all articles were read and assessed on whether or not they had a comparative adaptation component. This allowed progressive focusing and the exclusion of a large number of nonrelevant articles. The full texts of the remaining 476 articles were downloaded and screened against the following inclusion criteria: (a) explicit inclusion of a comparative policy or governance objective; (b) climate change adaptation was an explicit focus of the article. Articles that covered both mitigation and adaptation were included, as long as adaptation was substantively considered; (c) at least some reference to data collection methods, methods for analysis, or size of the sample; (d) geographical or political scale (e.g., national, federal, river basin, province, county, municipality, city) as unit of analysis. Behavioral and public perception studies on climate change risk and adaptation, for example, were excluded from the sample; (e) inclusion of at least 20 cases in the main analysis. Although 20 is not a "large-n" in many fields of research, we follow Landman and Carvalho (2013) and classify comparative policy studies into three meta-types: single-n cases that aim to make larger inferences beyond the single case (n = 1); few-case comparisons (n < 20 cases); and many-case, large-n, or variable-oriented case comparisons (n ≥ 20 cases).
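The three meta-types and the threshold in criterion (e) can be written as a simple rule. The sketch below is illustrative only; the function names and the label strings are hypothetical and are not taken from Landman and Carvalho (2013).

```python
def classify_design(n_cases: int) -> str:
    """Map a number of cases to one of the three comparative meta-types."""
    if n_cases < 1:
        raise ValueError("a study needs at least one case")
    if n_cases == 1:
        return "single-n"
    if n_cases < 20:
        return "few-n"
    return "large-n"


def meets_case_threshold(n_cases: int) -> bool:
    """Criterion (e): at least 20 cases in the main analysis."""
    return classify_design(n_cases) == "large-n"
```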
Applying these criteria stepwise resulted in a set of 63 articles. To ensure we did not miss relevant articles, we used forward and backward reference checking (Gough et al., 2012). Forward reference checking, or chain searching, is the process of identifying and examining articles that cite the articles in our sample. We used Google Scholar to capture relevant citing articles based on their title. Using backward reference checking, we identified relevant articles in the reference lists of the 63 articles included in our sample. Reference checking added another nine articles (see SUPPL Part III), which brings the final set of articles included in this study to 72.
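The counts reported for the selection process can be checked with simple arithmetic. The sketch below uses only numbers stated in the text and in Figure 1; the class and attribute names are hypothetical, and the number of records before deduplication is an assumption (it presumes the 562 duplicates were removed from a larger initial set rather than already being excluded from a different total).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewFlow:
    after_deduplication: int = 2347    # eligible articles after removing duplicates
    duplicates_removed: int = 562
    full_text_screened: int = 476      # articles remaining after abstract screening
    included_after_screening: int = 63
    forward_checking: int = 6          # from Figure 1
    backward_checking: int = 3         # from Figure 1

    @property
    def records_before_deduplication(self) -> int:
        # Assumption: duplicates were counted on top of the 2,347 records.
        return self.after_deduplication + self.duplicates_removed

    @property
    def excluded_at_abstract(self) -> int:
        return self.after_deduplication - self.full_text_screened

    @property
    def excluded_at_full_text(self) -> int:
        return self.full_text_screened - self.included_after_screening

    @property
    def final_sample(self) -> int:
        return (self.included_after_screening
                + self.forward_checking
                + self.backward_checking)
```

Under these figures, `ReviewFlow().final_sample` reproduces the 72 articles analyzed in the review.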

| Step 3: Deductive and inductive coding
To analyze the sample of articles, we developed a coding scheme and data extraction table to synthesize the literature. The main categories included: "Descriptive information" (year, journal, scale, location), "Study Design" (sampling frame, study design, data sources, type of comparison), "Limitations" and "Future steps." We used classifications for certain codes, leaving other codes open given the broad scope of possible answers (see SUPPL 1 Part I). The code book was piloted using six randomly selected articles that were coded by two of the authors. This resulted in minor adjustments of the code book to further increase consistency in the coding process. SUPPL 1 (Part II) provides the codebook developed for this article. After all the articles were coded, several codes were merged to create new higher order coding that was included in the final codebook.
The deductive information from the individual articles was extracted using a data extraction table, and excerpts of text were included or synthesized for the categories "Limitations" and "Future steps," see SUPPL 1 (Part IV and Part V). Using ATLAS.ti 7, we conducted a thematic analysis of the empirical material collected on the categories "Limitations" and "Future steps." We inductively clustered the main findings based on recurring topics, focusing particularly on conceptual, empirical and methodological limitations and next steps indicated by the authors. Codes were assigned only when the authors of the article made explicit reference to a certain coding category. No codes were assigned if we had to make strong inferences from the text. This means our findings present only what the literature reports as the main empirical, conceptual and methodological issues, rather than our interpretation. The next section presents our interpretive synthesis of the literature.

| RESULTS
Our database of 72 articles shows that comparative studies on climate change adaptation policy have increased from an average of five articles in 2010 to approximately 12 papers per year since 2013. This parallels trends found for adaptation research more generally. However, the number of large-n comparative studies remains low in the context of the rapidly growing body of adaptation literature. Table 1 presents the descriptive statistics of our review. It shows that half of the studies focus on the urban or local level (36/72), followed by national-level (17/72) and project-level studies (12/72). The largest group of studies has fewer than 50 cases in its sample (29/72). Urban- and project-level studies account for all studies that include more than 200 cases (15/72). Geographically, cases are centred on high-income regions, with Europe (21/72), Northern America (20/72), and global studies (19/72) dominating the comparative adaptation literature. Low- and middle-income countries are underrepresented in our sample. This might be because of the way in which we sampled the articles (i.e., focusing on intentional adaptation actions, thereby excluding autonomous adaptation, disaster risk reduction and related approaches) or because adaptation is simply less frequently studied and reported on in scientific studies in these locations. We find that more than half of the studies (46/72) did not take a specific sectoral focus but were more generally interested in how cities, regions, and countries are developing and implementing adaptation measures. This supports the idea that adaptation is gradually emerging across the globe as a new policy field (Massey & Huitema, 2016).

[Table 1 (fragment recovered from the page layout). Cross-sectional: 68 (94%). Notes: *The number of regions is larger than the number of articles included in our sample as some articles cover multiple regions or data sources. **Multiple levels; public sector organisations.]
To get a sense of which scientific fields are most engaged in comparative studies, we categorized the articles using the ISI Web of Knowledge Journal Citation Reports classifications. Figure 2 shows that the vast majority of articles are published in the categories of environmental studies and environmental science. Studies on comparative adaptation policy are most frequently found in Global Environmental Change (7), Climatic Change (7), and Mitigation and Adaptation Strategies for Global Change (5). This observation is in line with other studies (Javeline, 2014; Swart et al., 2014) suggesting that adaptation scholarship remains in the environmental domains, without extensive debate on adaptation policy in, for example, the policy sciences, law, or economics, or in more domain-specific journals related to health, the built environment, water or nature.

| Research design and type of comparison
The vast majority of studies adopted an explorative (22/72) or descriptive (44/72) research design, either to explore what kind of adaptation is taking place and which variables could be responsible for it, or to describe whether adaptation is taking place within a specific context. These studies recognize that adaptation is relatively new and that little is known about certain processes and variables, so that efforts are required to shed light on these issues. For example, Castán Broto and Bulkeley (2013) [...] adaptation and what might explain this. We found very few studies explicitly adopting an explanatory design, in which an explicit research question is posed and hypotheses are formulated with the intent to explain how certain variables influence an observed outcome. One such example is the work of Shi, Chu, and Debats (2015), who aim to explain progress among 156 municipalities in the United States. Although we did not systematically code for the reasons the comparative studies were conducted, it became clear that studies generally aim to move beyond individual cases in order to make more generalizable statements and get a better sense of what is going on across scales and contexts. In doing so, most articles recognize that this ultimately means "sacrificing depth over breadth" (Hanger, Haug, Lung, & Bouwer, 2015; Kamperman & Biesbroek, 2017). Paradoxically, many articles recommend that more "ground-truthing" is needed through small-n, qualitative case research to get a better sense of what is happening in reality (Holvoet & Inberg, 2014; Kongsager, Locatelli, & Chazarin, 2016; Massey, Biesbroek, Huitema, & Jordan, 2014; Milman, Bunclark, Conway, & Adger, 2013; Tang, Dai, Fu, & Li, 2013).
The studies included in our review are almost exclusively cross-sectional in nature, providing a snapshot of the state of adaptation for a specific period in time. Only two studies were found with an explicit longitudinal focus, comparing the same cases over time to assess progress: Lesnikowski et al. (2016) analyze changes in reported climate change adaptation actions in National Communications 5 and 6 for Annex I nations, and Kamperman and Biesbroek (2017) analyze how adaptation is integrated in regional water management plans in the Netherlands for three time periods (2005–2021). Interestingly, many cross-sectional studies recognize the limitations of their cross-sectional design and propose follow-up studies to go beyond snapshots and to assess changes over time (Feola & Nunes, 2014; Lesnikowski et al., 2011; Preston, Westaway, & Yuen, 2011). However, one of the key reported challenges for moving towards longitudinal studies is that data allowing longitudinal analysis are available in only a few instances (Lesnikowski et al., 2011; Preston et al., 2011), see also Section 3.4. The cross-sectional studies are often already collecting new data through either content analysis (53/72) or surveys (20/72), see Table 1, which is a costly and time-consuming process. Continuing to collect data for longitudinal studies requires even more data collection activities. Since several studies are the result of specific project funding, often with a clear project duration and deliverables, sequential and long-term data collection activities are often not part of the research agenda.

| Sampling of cases
Our findings show that our sample is dominated by studies that adopt a nonprobability sampling method (68/72), mostly choosing the most relevant cases (purposive sampling) or the cases closest to hand (convenience sampling). Such sampling methods mirror the explorative and descriptive research designs discussed above, without incurring the significant cost or time required to select a probability-based representative or random sample. Such approaches, however, are not well suited to testing hypotheses, applying inferential statistics, or inferring results and insights beyond the study sample. Despite this, many studies used convenience sampling and proposed general insights from their results, yet did not discuss sample bias or the validity of their results.
For studies seeking to use the results from a sample to infer general insights beyond that sample, selecting and justifying the method of sampling is crucial in determining the internal and external validity of the results. It is therefore important that articles are clear about the appropriateness of inferring generalized insights from trends in their sample, and that they discuss whether their results are representative of some larger population, what biases might arise, and to what extent their results are generalizable. In several instances, the sample was drawn from existing networks to allow for easy access to data; for example, Shi et al. (2015) use ICLEI's network of 1,200 municipalities in 86 countries as a sampling frame and extract data on adaptation for 156 cities in the United States to analyze the influence of 13 indicators for climate change adaptation planning. Although insightful, such sampling methods are designed to generate insights about adaptation in the sampled cities only. Inferring insights about cities globally or in general requires justification of the extent to which the sampled cities are representative of cities in general. Various authors have noted that ease of access to a comprehensive dataset facilitates testing (Stadelmann, Persson, Ratajczak-Juszko, & Michaelowa, 2014). However, several studies note that their sample is biased towards larger and pro-environment municipalities, which complicates and biases the upscaling of the research findings. Other studies confirm that membership in these networks, for example, significantly influences the likelihood that adaptation planning is taking place (Reckien, Flacke, Olazabal, & Heidrich, 2015).
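For contrast with purposive and convenience sampling, a probability-based alternative is a simple random sample drawn from an explicit sampling frame. The sketch below is a minimal illustration, not a method used by any of the reviewed studies; the frame of 1,200 placeholder identifiers, the sample size, and the function name are all hypothetical.

```python
import random


def draw_simple_random_sample(frame, k, seed=None):
    """Draw k units without replacement, each unit having equal probability."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(frame), k)


# Hypothetical sampling frame: placeholder identifiers, not real municipalities.
frame = [f"municipality_{i:04d}" for i in range(1, 1201)]
sample = draw_simple_random_sample(frame, k=156, seed=42)
```

Reporting the frame, the sample size, and the seed alongside the results would allow readers to judge how far findings can be generalized to the frame, although generalizing beyond the frame still requires separate justification.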

| Data sources for comparative adaptation policy research
We find that content analysis of systematically and nonsystematically collected policy documents (53/72), surveys among policy experts (20/72), and existing databases such as project repositories (19/72) were the three main methods of collecting primary data and creating new datasets for comparative research purposes. Typically, collecting and coding policy documents was the preferred option for datasets of up to roughly 100 cases. As the number of cases increases further, it becomes too time consuming and resource intensive to systematically collect and analyze data from documents. In these instances, survey instruments were most frequently used; they are the preferred option for city-level studies (14/36), where sample sizes tend to be higher. Existing project databases were used for analyses with a low number of cases as well as for larger analyses. For example, Robinson and Dornan (2017) use the bilateral finance data from the OECD DAC database, which contains 30,794 projects, to analyze financial flows for climate change adaptation in small island states. Rather than using manual coding, Robinson and Dornan use a multistep machine coding algorithm to cluster and analyze the data.

| Reported limitations
Of the 72 studies included, 14 studies did not report any conceptual, methodological or data limitations. Half of the studies (38/72) did not mention any suggestions for next steps to navigate the possible limitations. In this section we cluster the limitations reported by making a distinction between: (a) the data that are available for analysis and (b) the limitations reported for the analytical approaches using this data, referring to both conceptual and methodological limitations.

| Limitations associated with the data
We already noted that the key challenge reported is access to comparable, consistent, comprehensive and coherent data. Different types of studies have reported different types of limitations associated with the data for the dependent variable.
First, studies using content analysis (that is, systematic coding and analysis of policy or project documents) frequently reported challenges related to accessing documents. Most studies using content analysis methods recognized some of the ambiguities of coding (Heidrich et al., 2013), but did create a transparent codebook and/or included multiple coders to calculate inter-coder reliability scores to address these issues. However, many of the limitations were associated with the documents used. Overall, existing repositories where relevant documents are collected are scarce or are still under construction. Several studies use, for example, the UNFCCC database of National Communications (self-reported progress on mitigation and adaptation for a defined time period). Although these provide a comprehensive overview, they are indicative of the kind of limitations often reported for similar databases. Several studies have noted, for example, that the poor reporting requirements provided by the UNFCCC leave much room for member states to cherry-pick what they want to report. Government documents tend to over-report government-driven and planned adaptation, leaving less room for private and autonomous adaptation, thus portraying a skewed picture of the type of adaptation taking place (Fleig, Schmidt, & Tosun, 2017). Burch, Mitchell, Berbes-Blazquez, and Wandel (2017) note that reports in the IDRC database were of significantly varying quality, which made analysis and comparison difficult. Moreover, studies note the potential for "greenwashing," or governments relabelling existing policies as new adaptation initiatives, between reporting periods. Successful adaptation efforts are more likely to be reported, and the less successful adaptation efforts are (purposefully) overlooked given the (political) nature of these reports (Bizikova, Parry, Karami, & Echeverria, 2015).
Moreover, successful or best-practice adaptation measures are frequently reported in multiple reports, thereby increasing the chances of "double counting" adaptation actions for multiple time periods. Betzold and Weiler (2017) argue that developing countries specifically might over-report the climate relevance of their aid, in the hope of receiving more aid funding.
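The reviewed studies do not all state which inter-coder reliability statistic they used. As one common choice for two coders assigning nominal codes, Cohen's kappa corrects the observed agreement for the agreement expected by chance. The sketch below is a minimal, self-contained illustration; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who coded the same items with nominal labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal share, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical code throughout; by convention return 1.0.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four document excerpts.
coder_a = ["adaptation", "adaptation", "other", "other"]
coder_b = ["adaptation", "other", "other", "other"]
kappa = cohens_kappa(coder_a, coder_b)  # observed 0.75, expected 0.5, kappa 0.5
```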
Several studies have therefore moved away from relying on a single database of project or policy documents and instead collect data more broadly, using web searches in a more or less systematic fashion. Some studies, for example, develop a comprehensive data collection protocol consisting of a set of steps to systematically go through governmental websites and online search engines to collect as much relevant data as possible. The search stops when a saturation point has been reached (no more "new" hits or relevant documents surface). To distribute the search time equally, these studies often use an upper limit on the time spent per city for data collection, for example, 2 days maximum. Some authors have moved even further by incorporating expert opinions through surveys, interviews and workshops to get a comprehensive account. This, however, significantly increases the time needed for data collection.
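The stopping logic of such protocols, saturation combined with an upper limit on effort, can be sketched as follows. This is an illustration under stated assumptions rather than any study's actual protocol: the function name is hypothetical, each "batch" stands for the documents returned by one search configuration, and the time limit per city is approximated by a maximum number of batches.

```python
def collect_until_saturation(result_batches, max_batches):
    """Collect document identifiers until a batch adds nothing new or the budget ends.

    Returns the set of collected identifiers and the reason the search stopped.
    """
    seen = set()
    for batch_number, batch in enumerate(result_batches, start=1):
        new_documents = set(batch) - seen
        if not new_documents:
            return seen, "saturation"   # no new hits surfaced in this batch
        seen |= new_documents
        if batch_number >= max_batches:
            return seen, "budget"       # proxy for the per-city time limit
    return seen, "exhausted"            # ran out of search configurations


# Hypothetical search results for one city.
batches = [["plan_a", "plan_b"], ["plan_b", "plan_c"], ["plan_a", "plan_c"], ["plan_d"]]
documents, reason = collect_until_saturation(batches, max_batches=10)
```

In this example the search stops at the third batch because it returns no new documents, so `plan_d` is never found. This illustrates that a saturation rule trades completeness for a bounded search effort.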
Whether using existing repositories or comprehensive web searches, authors refer to the challenges of dealing with spelling errors in texts (addressed by using multiple search configurations), ambiguity in the phrasing of key concepts, such as when something constitutes adaptation (addressed by adopting a narrow definition of "planned adaptation" and taking either a conservative or progressive approach to including and excluding excerpts), and navigating multiple languages (addressed by restricting eligible cases to, for example, official UN languages, setting thresholds for the minimum number of cases or countries that use a language, using Google Translate or similar software, or hiring translators to search and code documents) (Bassett & Shandas, 2010; Fallon & Sullivan, 2014; Lesnikowski et al., 2016). Several authors also note that the documents report on past activities and that, given the time needed to write and approve certain documents, many of the adaptation actions captured do not represent current adaptation efforts (Lesnikowski et al., 2011).
Second, studies frequently use (online) survey instruments as their main method for collecting data, particularly for city-level data. Studies mostly use new surveys, as existing large surveys, such as the European Social Survey and census surveys, do not systematically inquire about climate change adaptation, and where they do, the questions most frequently concern public perception of climate change. The articles in this review frequently refer to the challenge of finding appropriate survey respondents (Gurran, Norman, & Hamin, 2013; Lee & Hughes, 2017). Although large cities often have dedicated sustainability or climate change officers who can be targeted, many note that medium to small cities have no such dedicated position within municipal departments, making it difficult to find the relevant population. Given the nature of adaptation, urban planners and environmental officials are often surveyed, although getting access to these respondents is difficult and time consuming (addressed through snowball sampling or by using municipalities' general contact details via phone or email). It is perhaps unsurprising, then, that several studies use ICLEI survey data for their analysis. In some cases, politicians such as mayors are sampled as the highest decision-making authority in the city. However, as Kalafatis (2017) notes, some respondents (city mayors) were not willing to respond to the survey invitation due to the political nature of adaptation, particularly in the United States (Wood et al., 2014). As with content analysis methods, survey respondents are more likely to report adaptation successes than failures, thereby biasing the dataset. Moreover, authors stress the difficulties of specifying complex policy issues in closed survey questions, and therefore feel forced to ask simple questions (Shi et al., 2015).
Survey studies with explorative ambitions that focus on the city level, in particular, refer to poor data availability for the independent variables, especially city-level data on impacts, vulnerabilities, and sociopolitical variables (Tang et al., 2010; Wang, 2013; Wood et al., 2014).
Finally, studies using existing databases, particularly project databases, primarily argue that it is challenging to infer if the original data were intended in the way they are used in the study (Boyd & Juhola, 2015). Such ambiguity makes it challenging for researchers to ensure that the findings correspond to the data collected. Another reported limitation is the limited specificity of the available data, and that not all data are relevant or in the right format for the study.
Clearly, all data collection methods have certain limitations, and authors often justify their data and methods by identifying the weaknesses of the alternatives. Authors who use systematic searches to collect project and policy documents for content analysis, for example, argue that survey instruments provide insights into perceptions about policy, rather than the reliable data needed to conduct meaningful comparative research. For example, Reckien et al. (2014) argue that "… many studies rely exclusively on self-report measures such as questionnaires and interviewing of city representatives… which might incorporate bias" (p. 333). In contrast, authors using surveys argue that policy documents suffer from a lack of specificity and that what is reported in documents does not translate well to what is happening on the ground.
The lack of (sufficient) data means that exploring, describing, or explaining adaptation comparatively is challenging. Several more general points were raised in the literature. First, the absence of evidence on adaptation does not mean evidence of the absence of adaptation. One study, for example, uses systematic web searches and content analysis to examine 401 cities globally and finds that several large cities, including London and New York, are well adapted, and that in many countries in the Global South there is a lack of progress. However, in these contexts many adaptation actions might not be labeled as adaptation, or those cities might lack reporting capacity. As one study notes, "...we are not comparing actions themselves, but rather comparing reporting of action" (p. 55). Second, several authors also note that over the past decade small-n and qualitative studies have advanced our understanding of what adaptation (could) mean in different places and contexts. However, "...not all factors found in the literature are easily transferable, quantifiable or measurable, or available in official statistics for all cities studied," as noted by Reckien et al. (2015, p. 6). This is not purely a limitation of data availability on adaptation but also a limitation of comparative policy research in general. Third, whilst a wealth of data is becoming available on project and policy actions, there is very limited, if any, data on policy performance, quality of proposed policy, or policy success/failure (Amundsen et al., 2010). Authors recognize that identifying the key policy instruments is important, but in many cases this has resulted in "bean counting" that provides limited value in terms of policy advice. Collecting more specific data on performance, quality, and implementation is therefore a frequently reported future research pathway (Berrang-Ford et al., 2014; Fleig et al., 2017). Fourth, (access to) data is unequally distributed across the globe. Several studies note that they exclude developing countries in particular, as these have limited data available. Moreover, data at the national level, for example, is in general more easily accessible and more comprehensive than data at the city level (Koski & Siulagi, 2016). Finally, the validity of the available data is also challenged by some authors, as most of these documents, plans and survey datasets are outdated the moment they become publicly available, particularly given the rapid development of adaptation. In addition, changing contextual conditions, such as new elections or new legislation, were reported to potentially influence findings (Boeckmann & Zeeb, 2014; Milman et al., 2013).

| Limitations reported for analytical approaches
Capturing adaptation actions is, however, not merely a data issue. The way in which "adaptation" is conceptualized and operationalized, as well as the methods used to analyze the data, are of critical importance. Overall, we find far fewer references to these types of limitations than to limitations in data availability.
We find few studies that mention that "adaptation" as a concept remains rather vague, or that large variation exists in how adaptation is perceived, understood, and operationalized, and that this affects comparative research. In most cases, the researchers are not explicit about what adaptation is (i.e., have not defined it) or refer to an existing definition, mostly the one provided by the IPCC. When operationalizing their approach, the studies included in the review often assume that whatever is reported as adaptation by policy documents, survey respondents, or project reports is adaptation. Several studies acknowledge that this is not necessarily accurate from a conceptual and methodological perspective, but there are very few, if any, alternatives for capturing adaptation for comparative purposes (Donner, Kandlikar, & Webber, 2016). Articles that study both mitigation and adaptation in particular mention the challenge of clearly distinguishing adaptation from mitigation, as adaptation is the more difficult of the two to define (Kalafatis, 2018b). Similarly, Donner et al. (2016) note the challenge of discerning adaptation finance from development aid.
Another important topic discussed is the limits of a "policy output approach." Policy output studies generally capture and count the number of adaptation actions reported, using classifications that allow comparison across cases and over time (Chen, Hellmann, Berrang-Ford, Noble, & Regan, 2018; Lesnikowski et al., 2016). Whilst this might be helpful to capture what governments are doing, several studies note that measuring policy outcomes and impacts to assess policy efficiency and effectiveness is an important next step to go beyond such "bean counting." This would require connecting policy output data with policy impact data. Two complications are noted here. First, authors note that the outcome data needed to conduct such analyses are currently lacking, particularly at city level (Wang, 2013). Second, whilst these studies mention the importance of connecting actions to outcomes such as reduced vulnerability and increased adaptive capacity, it is challenging to attribute outcomes to specific policies. We found no studies that were able to perform such an analysis, and none that provided concrete recommendations on how to connect policy outputs to policy impacts in practice.
The studies included in this review article predominantly focussed on one level of analysis, but recognize that understanding adaptation policy requires attention to the influence of other levels, and vice versa. The multilevel dynamics are well recognized in in-depth cases, but clearly these theoretical and conceptual considerations are not easily translated into comparative contexts (Hanger et al., 2015; Stadelmann et al., 2014). Moreover, multilevel governance dynamics differ across sociopolitical systems, thereby further complicating the analysis. Whilst some articles included in this review focussed on multiple levels (Heidrich et al., 2016; Preston et al., 2011), these authors recognize the limitations of their approach and have called for more in-depth cases to investigate the multilevel governance dimensions of adaptation.
The descriptive articles generally aim to classify types of adaptation measures (Biagini, Bierbaum, Stults, Dobardzic, & McNeeley, 2014; Bizikova et al., 2015), to rank countries, projects or plans, or to evaluate progress over time (Woodruff & Stults, 2016). This is not a simple task: finding transparent metrics and suitable indicators that reconcile what one wants to measure from a theoretical perspective with what is feasible from a data perspective is challenging and in some cases impossible. Constructing formulae to standardize, weight, and aggregate certain indicators is often a normative process, which researchers are generally well aware of. Kamperman and Biesbroek (2017), for example, discuss the influence of weighting different types of adaptation actions for assessing progress made by Dutch water boards.
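To illustrate why standardizing, weighting, and aggregating indicators is a normative exercise, the following minimal sketch builds a composite score from two indicators and shows that the ranking of cases changes with the weights. The case names, indicator names (`plans`, `budget`), values, and weights are hypothetical, and min-max standardization with a weighted sum is only one of several possible constructions; it is not the formula used by any of the reviewed studies.

```python
def min_max(values):
    """Rescale a list of numbers to the 0-1 range (min-max standardization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No variation across cases: the indicator cannot discriminate.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(cases, weights):
    """Weighted sum of standardized indicators for each case.

    cases:   {case_name: {indicator_name: raw_value}}
    weights: {indicator_name: weight}; weights are normalized to sum to 1.
    """
    names = list(cases)
    total_weight = sum(weights.values())
    standardized = {
        ind: min_max([cases[name][ind] for name in names]) for ind in weights
    }
    return {
        name: sum(
            weights[ind] / total_weight * standardized[ind][i] for ind in weights
        )
        for i, name in enumerate(names)
    }


def ranking(scores):
    """Case names ordered from highest to lowest composite score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: number of adaptation plans and adaptation budget per case.
CASES = {
    "A": {"plans": 10, "budget": 1.0},
    "B": {"plans": 4, "budget": 5.0},
    "C": {"plans": 2, "budget": 2.0},
}

if __name__ == "__main__":
    equal = composite_scores(CASES, {"plans": 1, "budget": 1})
    plans_heavy = composite_scores(CASES, {"plans": 3, "budget": 1})
    print(ranking(equal))        # ranking under equal weights
    print(ranking(plans_heavy))  # ranking when plans count three times as much
```

With these hypothetical numbers, case B ranks first under equal weights, whereas case A ranks first once the number of plans is weighted more heavily, even though the underlying data are identical.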

| DISCUSSION AND CONCLUSION
This review shows that there is a small but steadily growing body of literature that aims to compare climate change adaptation policy efforts across time and contexts. Although mostly explorative and descriptive in nature, this literature has raised relevant questions that cannot be addressed by adopting the more traditional single or small-n comparative methods. There are a number of insights from the inventory of challenges reported in the reviewed studies that deserve further discussion.
An important reason for conducting this review is the trend of increasing methodological rigor in social science in general and comparative policy research in particular. This is driving adaptation researchers to be more explicit and transparent in all aspects of their research: in the choices made when setting up their research design, sampling cases, defining reliable and valid measurements of the dependent (and independent) variables, ensuring replicability of their research, and reflecting on the limitations that affect the research findings. Whilst some studies have done a poor job of recognizing the research limitations and the consequences these have for interpreting their findings, we find that most studies did discuss, to some extent, the strengths and weaknesses of their work. Some studies did not frame this discussion in terms of limitations, but were transparent about their choices and the justification for those choices. There are, however, significant steps still to be made to further the precision of comparative adaptation policy research.
The results of our review show that finding conceptually and methodologically appropriate data is the most critical challenge facing comparative adaptation studies. Comparative policy studies operate in a data-imperfect world, but for adaptation this is particularly pertinent. Our recommendation would therefore be to invest in collecting new adaptation-relevant data. When it comes to climate change research in general, significant amounts of (public) money are being spent to support initiatives to systematically collect data on (changes in) the bio-physical system. Whilst the costs are huge, the value of collecting such data is undisputed (Balstad, 2011; Fawcett, Pearce, Ford, & Archer, 2017). This is different for social science research that aims to monitor climate policy and actions across the globe, where there are few examples of systematic, rigorous, transparent, or coordinated efforts. This is surprising, as in the social sciences more generally extensive data collection efforts (for example, the General Social Survey or the European Social Survey) are funded and result in valuable insights (Kolarz et al., 2017). The few comprehensive data collection efforts on climate policy that do exist are mitigation and energy centred rather than orientated towards adaptation, with some notable exceptions such as the Grantham Climate Change Laws of the World and the Climate Change Performance Index. This might reflect the fact that adaptation has only recently attracted scientific and policy attention. It is compounded by a limited push from the adaptation research community to fund data-driven research, as the community itself is still characterized by small-n qualitative research. Research institutions and national funding agencies invest primarily in short-term research projects that are context specific and deliver immediate results such as tools, guidebooks, and context-specific recommendations, even though the effects of these efforts are not always convincing (Burch et al., 2017; Clar & Steurer, 2018).
What could such data for comparative adaptation research look like? Clearly, creating custom datasets (i.e., data that specifically target adaptation initiatives and actions) is preferred over using readily available data. Specifically, if we want to move from policy output data to more advanced comparative studies that allow us to capture not only the density (i.e., frequency of actions) but also the intensity of policy action (i.e., the quality of those actions), new data need to be collected on issues like adaptation leadership, budgets, objectives, and processes of integration (Ford & King, 2015; Schaffrin, Sewerin, & Seubert, 2015). Some have argued that global reporting efforts such as the National Communications could be strengthened to create such a database, for example, through stricter guidelines and assessment tools. Whilst this would provide a source that is more comprehensive and comparable, significant limitations of self-reported datasets would persist. Also, collecting more customized data on adaptation should not be a burden placed on the countries, cities, or regions that currently have the least data available, as these are often developing countries or small cities with limited policy capacity in the first place. Instead, we see an important role for national funding agencies, UN and multilateral organizations such as the World Bank, private funds, and donors in supporting more comprehensive adaptation data collection efforts.
Such data collection might be considered too labour intensive at first sight, but alternative ways of collecting, processing, and analyzing large volumes of data exist and are hardly used in adaptation research. In policy sciences, for example, there are increasing efforts to use automated and real-time web scraping tools to collect massive amounts of data for analysis (Blei, 2012; Wilkerson & Casas, 2017). Machine learning algorithms are becoming more popular for filtering huge publicly available datasets (e.g., social media or legal repositories) in search of relevant data and for auto-coding large parts of these data (Kirilenko & Stepchenkova, 2014). We found just one study that used these methods (Donner et al., 2016). Although their potential for adaptation studies has not been proven, there is no reason to assume such methods would not work for adaptation research.
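As a rough illustration of what auto-coding text could involve, the sketch below trains a small multinomial naive Bayes classifier (with Laplace smoothing) to label short policy texts as "adaptation" or "mitigation". It is a toy example under stated assumptions: the training sentences and labels are invented for illustration, the class name `NaiveBayesCoder` is hypothetical, and this is not the method used by any of the studies cited above. A real application would need a much larger hand-coded training set and a validation step.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Minimal multinomial naive Bayes text classifier (hypothetical helper)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            # Log prior plus log likelihood of each token, Laplace-smoothed.
            score = math.log(self.doc_counts[label] / n_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented, hand-labelled training sentences (illustration only).
TRAIN_TEXTS = [
    "city builds flood defences and raises dikes against sea level rise",
    "heat wave early warning system protects vulnerable residents",
    "new wind farm reduces carbon emissions from electricity",
    "carbon tax cuts greenhouse gas emissions from transport",
]
TRAIN_LABELS = ["adaptation", "adaptation", "mitigation", "mitigation"]

if __name__ == "__main__":
    coder = NaiveBayesCoder().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(coder.predict("plan to raise dikes and improve flood warning"))
    print(coder.predict("subsidy for solar power to cut carbon emissions"))
```

Such a classifier only reproduces the labels it was trained on, so the conceptual problem of deciding what counts as adaptation remains with the researchers who code the training data.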
Next to data-related issues, we identified several other analytical challenges that were frequently reported, some of which are more easily addressed than others. For example, it remains difficult to define a reliable and valid measurement of adaptation that allows for meaningful comparative work and makes it possible to distinguish adaptation from greenwashing or to identify maladaptation. Since it is unlikely that there will be one universal measurement of what adaptation is, making researchers' assumptions explicit is an important first step towards reconciling this dependent variable problem and continuing to learn what constitutes adaptation. Another important and frequently reported challenge is that of policy attribution: can we causally link specific adaptation policy interventions to desired outcomes (e.g., reduced vulnerability or increased adaptive capacity) to test the effectiveness and efficiency of those policy interventions? It will be hard, if not impossible, at this stage to address the problem of attribution due to the complexity of adaptation itself, the many confounding factors that influence the outcome, and the limited outcome data on climate impacts, vulnerability, and adaptive capacity. The latter seems to be specifically the case for local-level studies, where such data are particularly scarce. Rather than trying to causally attribute outcomes to adaptation policy interventions, it might be more productive to ask questions of policy alignment: are the adaptation policy goals aligned to the climate risks? Is the proposed set of policy instruments likely to achieve those goals? What evidence is there to support any (mis)alignment? We argue that such evaluative questions are more realistic and equally informative at this stage. Over time there might be more evidence that allows us to try to attribute outcomes to adaptation policy interventions and to ask questions about the efficiency and effectiveness of adaptation policy.
Comparative adaptation policy research has the potential to address some of the pertinent questions that have emerged in recent years, for example, under the Paris Agreement and the Agenda 2030 on sustainable development (UNEP, 2017). Answering important big-picture questions, such as whether we are adapting fast enough or whether climate investments are distributed in an equitable way, requires methodologically rigorous comparative research. Some of these questions may require purely quantitative approaches, whereas others allow for combining qualitative with quantitative and small-n with large-n studies. In any case, this requires researchers to address the conceptual, empirical, and methodological challenges that comparative adaptation policy research faces.