Comparing open data benchmarks

An understanding of the similar and divergent metrics and methodologies underlying open government data benchmarks can reduce the risks of the potential misinterpretation and misuse of benchmarking outcomes by policymakers, politicians, and researchers. Hence, this study aims to compare the metrics and methodologies used to measure, benchmark, and rank governments' progress in open government data initiatives. Using a critical meta-analysis approach, we compare nine benchmarks with reference to meta-data, meta-methods, and meta-theories. This study finds that both existing open government data benchmarks and academic open data progress models use a great variety of metrics and methodologies, although open data impact is not usually measured. While several benchmarks' methods have changed over time, and variables measured have been adjusted, we did not identify a similar pattern for academic open data progress models. This study contributes to open data research in three ways: 1) it reveals the strengths and weaknesses of existing open government data benchmarks and academic open data progress models; 2) it reveals that the selected open data benchmarks employ relatively similar measures as the theoretical open data progress models; and 3) it provides an updated overview of the different approaches used to measure open government data initiatives' progress. Finally, this study offers two practical contributions: 1) it provides the basis for combining the strengths of benchmarks to create more comprehensive approaches for measuring governments' progress in open data initiatives; and 2) it explains why particular countries are ranked in a certain way. This information is essential for governments and researchers to identify and propose effective measures to improve their open data initiatives.


Introduction
Various benchmarks have been developed to compare governments' progress in Open Government Data (OGD) initiatives. Examples of such benchmarks include the Open Data Readiness Assessment (Global Delivery Initiative, 2020; The World Bank Group, findings of our study 1) to better understand how the strengths of benchmarks can be combined to create more comprehensive approaches for measuring governments' progress in open data initiatives; and 2) to understand why particular countries are ranked in a certain way. This information is essential for governments and researchers to identify and propose effective measures to improve their open data initiatives. Ultimately, this should lead to more value creation from open government data, including increased transparency, trust, innovation, and economic growth.

Research background
This section provides background information related to our research's main topics: the benchmarking process, including measurements, benchmarking, benchmarks, and ranking lists (section 2.1) and previous research on the benefits and criticisms of benchmarking (section 2.2). Maheshwari and Janssen (2014) describe benchmarking as part of a process that involves multiple steps (see Fig. 1). This process starts by determining benchmark indicators, i.e., defining or updating the indicators of progress measurements (step 1 in Fig. 1). Measurements may already be available, and they can be used to develop a new or integrated tool to measure progress. Benchmarking indicators are typically quantitative in nature (Rorissa et al., 2011). Schellong (2009) refers to three types of measures: natural, proxy and constructed measures. Natural measures can easily be used in benchmarks since these are already in use, such as the amount of money spent on particular investments for a specific country or organization (idem). Proxy measures can indirectly be connected to the objective of a benchmark, such as the number of broadband connections when measuring the concept of 'information society' (idem). Constructed measures usually combine multiple measures when there is no clear understanding of how a concept should be measured. Constructed measures combine various achievement levels and assign values to each of them to eventually derive a final score (Schellong, 2009). An example of a constructed measure is the measurement of 'citizen-centric public service delivery' using various indicators related to the quality of public services from the perspective of citizens and public administrators (World Bank Group, 2018).

The benchmarking process
After defining progress measurement indicators, the measurement itself is performed (step 2 in Fig. 1) by collecting data from various sources, such as social media data, research questionnaires, and organizational reports (Maheshwari and Janssen, 2014). The activity of 'measuring' is the basis of each benchmarking process. In the measurement phase, researchers, citizens, governments, and other actors collect data about various aspects of the measured phenomenon. A concrete example of a measuring activity in the context of open government data is the collection of data about the number of datasets on various topics openly shared by governments through open data platforms all over the world, such as data provided through the American, Australian, South African, Brazilian, Chinese and French open government data portals. Benchmarking of countries' open government data initiatives could then encompass a comparison of the number of datasets provided per topic, an analysis to interpret the similarities and differences between these numbers, and countries' ranking on their progress.
Subsequently, the benchmarking is performed (step 3 in Fig. 1), making a comparison using specific yardsticks. Maheshwari and Janssen (2014) make a distinction between internally-based, expert-based, and crowd-based benchmarking. Internally-based benchmarking refers to measuring and benchmarking within a particular organization or part of an organization, where data is not openly shared outside the organization (idem). Expert-based benchmarking involves experts, such as consultancy companies or expert panels, who carry out the measurement and benchmarking. Crowd-based benchmarking refers to measuring and benchmarking in a system where the entire measurement and benchmarking system, the collected data, and the results are openly shared with the public. In such a system, the crowd may be asked to provide input for the measurement activity (Maheshwari and Janssen, 2014). Open government data benchmarks often combine expert-based benchmarking and crowd-based benchmarking and sometimes also integrate internally-based benchmarking.
Fig. 1. The five steps of the benchmarking process, based on Maheshwari and Janssen (2014) and Schellong (2009).
A. Zuiderwijk et al. Telematics and Informatics 62 (2021)
The benchmarks' outcomes can be used to create ranking lists of countries or organizations with different final scores (step 4 in Fig. 1). The main audience for open government data benchmarks is open government data policymakers, who can use benchmarks to identify the strengths and weaknesses of a government's open data policy and define measures that could improve the provision and use of open government data. For instance, in the example mentioned above, the benchmarking activity may reveal that governments in certain countries do not openly share data on air quality, procurement, and government budgets, whereas they do share data on other topics. Implementing measures to improve the provision and use of open government data could increase value creation from open government data, such as transparency, trust, economic growth, and innovation.
Finally, the benchmarking process ends with the step of taking the outcomes of the benchmark activity and ranking lists to identify areas of improvement (step 5 in Fig. 1) (Hong et al., 2012), which requires interpretation (Maheshwari and Janssen, 2014). In an ideal situation, the stakeholders would implement the identified improvements in practice or within their organization (Skargren, 2020). Some scholars refer to this activity as 'benchlearning' and 'benchaction' (Freytag and Hollensen, 2001). The evaluation of the improvements then leads to a feedback loop where this cyclical process repeats.
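The dataset-count example used in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the countries, topics, and counts are invented, the function names are hypothetical, and ranking by the total number of datasets is one possible yardstick rather than the method of any benchmark discussed in this study.

```python
# Hypothetical measurement results (step 2): number of open datasets per topic.
# All countries and numbers are invented for illustration.
dataset_counts = {
    "Country A": {"air quality": 12, "procurement": 40, "budget": 8},
    "Country B": {"air quality": 0, "procurement": 25, "budget": 30},
    "Country C": {"air quality": 5, "procurement": 0, "budget": 0},
}


def total_datasets(counts):
    """Yardstick assumed here: total number of datasets a country publishes."""
    return sum(counts.values())


def rank_countries(data):
    """Steps 3 and 4: compare countries on the yardstick and order them, highest first."""
    return sorted(data, key=lambda country: total_datasets(data[country]), reverse=True)


def missing_topics(data):
    """Step 5: list, per country, the topics for which no datasets are shared."""
    return {
        country: sorted(topic for topic, n in counts.items() if n == 0)
        for country, counts in data.items()
    }


ranking = rank_countries(dataset_counts)
gaps = missing_topics(dataset_counts)
print(ranking)
print(gaps)
```

The ranking alone hides which topics a country does not cover; the per-topic gaps are what a policymaker would need in order to identify areas of improvement.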

Benchmarking benefits and criticisms
Benchmarking has been mentioned in the context of e-government as a useful tool for "learning, information sharing, goal setting or supporting performance management" (Schellong, 2009, p. 4). When government organizations have a better understanding of their current progress, they can identify which steps to take to improve their progress in open government data publication and use processes. They can also compare their progress to that of other countries and learn from countries which have progressed more (Susha et al., 2015). For example, country A could explore best practices implemented by countries B, C, and D which have progressed more. Similar approaches are used in benchmarking by firms, for example, to improve their services and products as well as their competitiveness and performance (Hong et al., 2012; Kyrö, 2004). Furthermore, benchmarking tools and the rankings they produce can be used by decision makers to develop information and communication policies and ensure the allocation of sufficient resources to implement such policies (Rorissa et al., 2011).
While benchmarks can be useful for many purposes, they also create considerable ambiguity regarding the interpretation of results (Bannister, 2007; Janssen et al., 2004). Certain variables may be relatively easy to measure and benchmark, such as the number of datasets downloaded from an open data portal or the number of users registered on an open data portal. However, it is much more complicated to measure and benchmark less concrete variables, such as 'the provision of open government data' and 'the use of open government data', because these are concepts that cannot be computed using a single score. It then "becomes necessary to use proxy variables and/or psychometric type tools" for these types of concepts, which "raises the question of what these should be" (Bannister, 2007, p. 173). Benchmarks in the area of e-government in general need to consider the context and purpose of public administration (Skargren, 2020). Moreover, for concepts that cannot be measured using a single number or assessment, benchmark developers need to compute a scale to create a score composed of multiple scores (Bannister, 2007). This implies that benchmark developers need to decide which methods and approaches to use to arrive at such a score, while fixed or commonly agreed rules for doing this are often lacking (Bannister, 2007). In addition, scoring methods vary with context (Bannister, 2007; Charalabidis et al., 2018a). What is seen as progress or success strongly depends on the benchmarking study (Janssen et al., 2004). Besides, repeating benchmarks over time is even more problematic because definitions of variables included in the measurement may change, the context may change, or the data needed may not be available anymore (Bannister, 2007).
The number of available open data benchmarks has increased rapidly in recent years (Máchová and Lnénicka, 2017;Sayogo et al., 2014;Susha et al., 2015). While Susha et al. (2015)  Both for open data provision and use, it may concern different types of data (e.g., statistics or not), in different formats (e.g., machine-readable or not), from various fields (e.g., agriculture, transport, or energy), involving multiple types of actors (e.g., governments, researchers, companies or citizens), at different levels (local, regional, national, international, global), from different countries or continents. Besides, researchers and practitioners have applied different perspectives on open data, such as economic, technical, operational, legal, social, political, and institutional perspectives (Zuiderwijk et al., 2014). In addition, progress in the area of open data can be measured in many different ways, for example, through surveys, case studies, experiments, and log data analysis (Purwanto et al., 2020). Covering all these different open data dimensions is possible but it might be too much for a single benchmark. Consequently, open data benchmarks often focus on some of these different dimensions and leave out others.
It is unclear whether the new benchmarks claim to be more comprehensive and address the older ones' shortcomings or whether they have different foci or coverage. Moreover, there is a lack of information about whether the relatively older open data benchmarks changed over time and how they were adapted and developed. It is therefore unclear whether the findings concerning countries' open government data initiatives of several years ago still hold. This lack of information creates uncertainty about the extent to which existing benchmarks are useful to continuously track the progress of countries over time (as opposed to their position in rankings this year). While reducing the risks of the misuse and misinterpretation of open government data benchmarks requires policymakers to clarify the similar and divergent metrics and methodologies used to measure and rank governments' performance in open government data benchmarks, this clarification is currently lacking.

Research approach and methods
This section provides information about the critical research paradigm adopted for this study (section 3.1), the qualitative meta-analysis that functions as the basis for our comparison of open government data benchmarks (section 3.2), the selection of open government data benchmarks included in our analysis (section 3.3), and the approach used to assess the benchmarks (section 3.4).

Critical research paradigm
This study adopts a critical research approach. While various paradigms are possible in benchmarking research, including the positivist and interpretivist research paradigms, critical research has been acknowledged as a useful paradigm for benchmarking research (Kyrö, 2004). Critical research uses a critical theoretical orientation, which means that the research's aim is framed in the context of theoretical issues (Cecez-Kecmanovic, 2005). The critical research approach explores if and how "institutions, ideologies, discourses […] and forms of consciousness in terms of representation and domination" constrain human decision-making, imagination, and autonomy (Alvesson and Deetz, 2000, p. 8). Critical research seeks to challenge established conceptions of truth and norms of knowledge creation and achieve social change (Cecez-Kecmanovic, 2005). Critical research thus seeks to challenge rather than confirm what has been established (Alvesson and Deetz, 2000). Critical research is appropriate for studies that drive activity, change, and empowerment (Kyrö, 2004). To quote from Cecez-Kecmanovic (2005, p. 22), "the purpose of critical social research is to change the world - actors, information systems, organizations, and society, including their dynamic, complex and emergent interrelationships." By identifying the factors behind subjective conceptions, including factors related to values, experiences or expectations, critical research seeks to "empower participants by liberating them from old modes of thinking" (Kyrö, 2004, p. 60). Interaction between theory and practice plays a relatively important role in critical research (idem).
The critical research paradigm is appropriate for attaining our research objective. Critical research is suitable for studies aiming to answer and understand "why" questions (Kyrö, 2004). This paradigm is compatible with the questions asked in this study, such as why certain countries are ranked differently in the ranking lists following open government data benchmarking activities. In this study, we challenge the outcomes of existing open government data benchmarks by comparing their metrics and methodologies that currently result in different, poorly understood ranking list outcomes. While various ranking lists comparing governments' achievements in open government data publishing and use already exist, we argue that these lists may not represent the 'truth'. Policymakers, politicians, and researchers need to be aware of the processes underlying open government data ranking lists so that they can act upon them. This study seeks to drive action to improve existing benchmarks and expose some of their weaknesses. These are the main motivations for adopting the critical research paradigm in this study.

Qualitative meta-analysis
We apply a qualitative meta-analysis to open government data benchmarks in this study. Qualitative meta-analysis can be used to "provide a concise and comprehensive picture of findings across qualitative studies that investigate the same general research topic" (Timulak, 2009, p. 591). It is useful for research that develops new interpretations from the analysis of multiple studies without having a priori concepts to test (Given, 2008). Qualitative meta-analysis has two main objectives: first, "to provide a more comprehensive description of a phenomenon researched by a group of studies, including its ambiguities and differences found in primary studies" (Timulak, 2009, p. 592) and second, "to provide an assessment of the influence of the method of investigation on findings." Qualitative meta-analysis has been found to be useful for the comparison of open data benchmarks in previous research (Susha et al., 2015) and the comparison of e-government stage models and maturity models in general (Almuftah et al., 2016; Dekker and Bekkers, 2015; Lee, 2010; Siau and Long, 2005). We argue that it is also useful for this study, since we seek to compare the differentiating elements of existing benchmarks in measuring open government data progress.
The meta-study method, one form of qualitative meta-analysis, is a research approach that seeks to analyze the theory, methods, and findings of qualitative research and to synthesize the findings from these activities into new ways of thinking about phenomena (Paterson et al., 2001). Drawing on research by Ritzer (1990), Zhao (1991) states that meta-analysis has three main components: meta-data-analysis (the analysis of findings), meta-method analysis (the analysis of methods), and meta-theory analysis (the analysis of theory). These three types of analysis should be undertaken prior to synthesis (Barnett-Page and Thomas, 2009). Fig. 2 shows how we apply the meta-study approach to our study of open data benchmarks. The meta-data analysis is carried out in section four and includes comparing the metrics used in open government data benchmarks. The meta-method analysis is performed in section five and compares the methodologies underlying open government data benchmarks. Finally, the meta-theory analysis described in section six compares the theoretical models on benchmarking open government data. We discuss the overall meta-analysis in section seven of this article.

Selection of open government data benchmarks
Based on our research objective to compare the metrics and methodologies used to measure governments' progress in open government data initiatives, we defined the following five criteria to select benchmarks for our open government data benchmarks comparison. First, the benchmarks should focus on open government data since this is our study's focus. Second, the benchmarks should assess the progress of governments, to remain consistent with our research objective. Third, the benchmarks should assess governments' progress in multiple countries or organizations since we are interested in differences in ranking lists resulting from the benchmarking activity of different benchmarks. Fourth, the benchmarks should assess countries or organizations based on one or more aspects of open government data sharing or use. Some benchmarks focus on a particular part of open government data initiatives, either the data sharing aspect or the data use aspect, whereas others include indicators and measurements of both perspectives. Fifth, the information about the open government data benchmarks should be available and accessible, which is essential for comparing the metrics and methodologies used in existing open government data benchmarks.
Applying these criteria, we searched Google using combinations of the keywords 'open data', 'benchmark', 'rank', 'index', 'maturity' and 'assessment'. This led to the identification and selection of nine relevant open government data benchmarks, as depicted in Table 1. Most of the selected benchmarks are global, while one focuses on European countries and EFTA countries (OD Maturity) and one focuses on OECD member countries and OECD partner countries (OECD report). Susha et al. (2015) compared the first four benchmarks in this list and the PSI Scoreboard. We did not include the PSI Scoreboard since it no longer exists, and no recent information is available. Benchmarks five to nine in Table 1 were developed after the study by Susha et al. (2015). By comparing the more recently developed benchmarks to the benchmarks that have been in existence for longer, we can also examine the development of benchmarks over time.

Benchmark assessment approach
We used the following approach to assess the benchmarks. The first author of this paper began by analyzing the benchmarks using the information sources mentioned in Table 1. These information sources were identified by searching for the benchmark on Google and subsequently analyzing all possibly relevant documents available through the benchmark's website. Afterwards, the second and third authors of this paper checked and validated the results using the same approach to search for the information sources. This did not lead to additional information sources. The three authors discussed questions and doubts, such as when they were unable to identify information about the 'amount of data collected' by the OECD Report (#7). The second and third authors' checks led to minor changes in the benchmark assessment, but not to any fundamental changes. While all the analysis results were double-checked and discussed by multiple authors of this paper, these findings have not been checked with the creators of open government data benchmarks or other actors involved in open government data benchmarking.
[Figure caption, beginning missing: ... Paterson et al., 2001; Susha et al., 2015; Zhao, 1991).]
For the temporal analysis of how the selected open government data benchmarks developed over time, we examined the methods used in every year in which the measurement was carried out and listed these in a document. For each benchmark, we examined changes in the metrics and methodologies used over time. We then compared the metrics and methodologies used from year to year and looked for patterns. This information was used as the basis of our conclusions on the evolution of the benchmarks over time.

Meta-data: Comparing the metrics used in open government data benchmarks
The first step of this research compares each benchmark's purpose, the main variables, the themes covered, and the underlying rationales (see Table A-1 in Appendix A). Based on this comparison, we identify similarities, discrepancies, and gaps, and we identify the assumptions underlying the selected open data benchmarks. When a benchmark has multiple measurement moments, we only report the methodology used in the last edition of that benchmark. We sometimes refer to individual benchmarks in the text below; the abbreviations correspond to those mentioned in Table 1.
Comparing the nine benchmarks from Table A-1, we see that they have a different focus, and some have multiple focus areas. The OD Readiness benchmark (#1) and the OECD Report (#7) aim to assist in planning and to function as a decision-making instrument for open data policymakers. In contrast, the OD Barometer (#2), OD Maturity (#5), WJP Index (#6), ODIN (#8), and EIU (#9) focus on providing insight into and a better understanding of the current situation and existing gaps. The OD Index (#3) and ODIN (#8) both aim to be a tool for advocacy and question governments' progress. OD Economy (#4) and OD Maturity (#5) seek to go beyond these objectives by deriving guidelines and best practices from benchmarking and bench-learning. Revealing progress made (OD Maturity, #5), encouraging dialogue between stakeholders (ODIN, #8), and promoting open data policies (ODIN, #8) are purposes mentioned by a single benchmark only.
The readiness of a particular country, region, or organization for an open data program is measured by four benchmarks (OD Readiness #1, OD Barometer #2, OD Maturity #5, OECD Report #7). The benchmark used by the World Bank Group (OD Readiness #1) explicitly focuses on open data readiness. It sheds light on whether a government organization (at any administrative level) is ready to implement an open data program. OD Barometer (#2), OD Maturity (#5), and the OECD Report (#7) also evaluate the actual implementation of open data initiatives, in addition to the readiness for such an initiative. Four benchmarks (OD Barometer #2, OD Maturity #5, WJP Index #6 and OECD Report #7) evaluate the impact of OGD initiatives, and three of them (OD Barometer #2, OD Maturity #5 and OECD Report #7) evaluate the full combination of readiness, implementation, and impact. Open data policymakers need information in all of these phases to decide whether an open data initiative should be started, adjusted, or terminated. The benchmarks each have a different focus and complement each other.
We also studied the scope of the nine benchmarks from the perspective of development over time. One finding by Susha et al. (2015) was that, at the time of their study, open data benchmarks mainly focused on readiness and implementation, rather than the impact of open data initiatives. After more than a decade of open data movement, we now see that the impact of open data is becoming more topical in the open data literature (e.g. see Charalabidis et al., 2018b), and the newer benchmarks reflect this. Of the four relatively older benchmarks, only one included impact measurement (OD Barometer, #2). Of the five relatively newer benchmarks, four indicate they measure the impact of open data (OD Maturity #5, WJP Index #6, OECD Report #7, EIU #9). The first three of these four encompass readiness, implementation and impact. Implementation was already measured by three out of four relatively older benchmarks and the same applies to all five relatively newer benchmarks (OD Maturity #5, WJP Index #6, OECD Report #7, ODIN #8, EIU #9). Thus, over time we see a shift towards more measurement of impact in the newer benchmarks.
The selected benchmarks cover a large variety of topics. Although several benchmarks have a similar focus, they measure different aspects of open data initiatives' progress. The majority of the variables are measured by a single benchmark only. None of the variables used by open government data benchmarks is measured by more than three benchmarks. One could argue that the analyzed benchmarks complement each other. Policymakers can select the variables to evaluate their open data initiative and combine the ones they find most relevant. This considerable fragmentation of variables creates a risk that the users of open data benchmarks can 'pick and choose' the benchmarks that make it easier to gain a higher score and show a better picture.
The rationales of the different benchmarks show their similar and differing perspectives on open data progress. As regards differences, some benchmarks define significant progress in open government data initiatives as initiatives that have a dynamic ecosystem (OD Readiness #1), in which open data portals are developed (OD Maturity #5) to support the rich supply of high-quality data (OD Readiness #1, OD Index #3, OD Economy #4, OD Maturity #5), in which the data is extensively used (OD Economy #4, EIU #9), many different stakeholders are involved (OD Readiness #1), and an impact is achieved (OD Maturity #5). Successfully progressing open data initiatives have a policy in place (OD Maturity #5), profit from political support (OD Economy #4), and have limited barriers to accessing and using OGD (EIU #9). Some benchmarks emphasize society's involvement and engagement with open government data users, or the combination of government, private sector, and civil society (OD Barometer #2). Progress in the context of open government data benchmarks is also understood to be open government data initiatives that are effective (EIU #9) or that are positively evaluated from the perspective of citizens (WJP Index #6). One benchmark (ODIN #8) defines progress in the context of open government data initiatives as initiatives that have great openness and coverage of national open statistical data (ODIN #8), as an important category of open government data.
Regarding the similarities in rationales, four out of nine benchmarks see the publication of government data as one of the most important characteristics of open data progress and look exclusively at open government data publication (OD Index #3, OD Economy #4, OECD Report #7, ODIN #8). These are both relatively older and newer benchmarks. Two relatively newer benchmarks exclusively focus on the use or potential use of open government data (WJP Index #6 and EIU #9). Three benchmarks look into both aspects (OD Readiness #1, OD Barometer #2, OD Maturity #5). Two benchmarks focus on open government data from citizens' perspective (WJP Index #6 and EIU #9). In contrast, three others explicitly mention that they look into the involvement of multiple stakeholders (OD Readiness #1, OD Barometer #2, OD Economy #4). Two benchmarks make a distinction between countries with different open data progress levels, namely OD Economy (#4) and OD Maturity (#5). They divide countries into beginners, followers and trend-setters. OD Maturity (#5), a more recent benchmark than OD Economy (#4), adds fast-trackers to this division, which is a group that has emerged more recently.
Three benchmarks (OD Barometer #2, OD Index #3 and OECD Report #7) explicitly relate the progress of open government data initiatives to the G8 Open Data Charter (2015) and the G20 Anti-Corruption Open Data Principles (G20's Anti-corruption Working Group, 2015) in defining open government data progress. These charters advocate for data to be open by default, timely and comprehensive, accessible and usable, comparable, and interoperable. Moreover, open data should be useful for improved governance and citizen engagement and for inclusive development and innovation.
The preceding leads us to conclude that the benchmarks paint an inconsistent picture of what defines open data progress. The selected benchmarks have very different purposes and cover a large variety of variables. The benchmarks' scope differs, although over time, we see a shift towards more measurement of impact in the newer benchmarks. Since most of the benchmarks include different variables, their findings may complement each other.

Meta-methods: Comparing the methodologies underlying open government data benchmarks
In this section, we evaluate the methodologies applied in open data benchmarks. We analyze the influence of the investigation method used in the open data benchmarks on the benchmarks' findings, and we analyze the development of open data benchmarks over time. The approach used for this meta-methods analysis has been described in Section 3.3. Table B-1 in Appendix B provides the results from our meta-methods analysis. The table shows that the geographical coverage of the selected benchmarks ranges from 10 to 178 countries. Out of the nine benchmarks, four provide results for 2018, 2019 or 2020 (OD Maturity #5, WJP Index #6, OECD Report #7, and ODIN #8). The other benchmarks provide results for one or more years in the period 2011-2017. Some benchmarks have been used only once (OD Economy #4 and EIU #9), and one is only used on demand (OD Readiness #1). The most long-standing benchmarks are the OD Barometer (#2) and the WJP Index (#6), which have been used consistently since 2013 and 2015, respectively.
Most of the selected open government data benchmarks have a validity check (OD Readiness #1, OD Barometer #2, OD Index #3, OD Maturity #5, WJP Index #6, OECD Report #7, ODIN #8). For some open government data benchmarks, there is no mechanism or check to validate the findings (OD Economy #4), or it is unclear whether a validity check is being applied (EIU #9). Validation mechanisms that are applied make data and / or the methodology available as living data or living documents (OD Readiness #1, OD Barometer #2), comprise peer review by experts or expert teams (OD Barometer #2, OD Index #3, ODIN #8), provide justifications and confidence levels (OD Barometer #2), perform cross-checks with those responsible for open data projects at the national level (OD Maturity #5, OECD Report #7), perform result validation through desk research (OD Maturity #5) and a cross-check against qualitative and quantitative third-party sources (unclear which ones) (WJP Index #6). Some benchmarks include validity checks on the reputation and professionalism of the organization conducting the assessment (e.g. OD Maturity #5, OECD Report #7). In contrast, other benchmarks (e.g. OD Barometer #2, OD Index #3) use a crowdsourced approach and foster trustworthiness by inviting feedback on the results from the community.
Different weights are applied to components and a variety of scales are used in the benchmarks. In some benchmarks, all dimensions have equal value (OD Barometer #2, OD Economy #4, OECD Report #7), whereas in others, different dimensions have different weights (OD Readiness #1, OD Index #3, OD Maturity #5, ODIN #8) or averages are calculated (OD Barometer #2, WJP Index #6). Scales vary from yes/no questions to Likert scale questions and from ten-point scales to three-point scales, usually combined in a single benchmark.
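To illustrate why the choice of weights and scales matters, the following minimal sketch combines a yes/no item, a five-point Likert item, and a ten-point item into one composite score. It is not the scoring procedure of any of the nine benchmarks; the indicator names, the answers, the min-max rescaling, and the weights are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str         # hypothetical indicator label
    value: float      # raw answer on the indicator's own scale
    scale_min: float  # lowest value the scale allows
    scale_max: float  # highest value the scale allows
    weight: float     # relative importance within the composite


def rescale(ind: Indicator) -> float:
    """Map a raw answer onto the 0-1 range of its own scale (min-max rescaling)."""
    return (ind.value - ind.scale_min) / (ind.scale_max - ind.scale_min)


def composite_score(indicators: list[Indicator]) -> float:
    """Weighted average of the rescaled indicators, expressed on a 0-100 scale."""
    total_weight = sum(ind.weight for ind in indicators)
    weighted_sum = sum(ind.weight * rescale(ind) for ind in indicators)
    return 100 * weighted_sum / total_weight


# Hypothetical answers for one country (not taken from any benchmark).
answers = [
    Indicator("open_licence", 1, 0, 1, weight=1.0),       # yes/no item
    Indicator("portal_usability", 4, 1, 5, weight=1.0),   # five-point Likert item
    Indicator("dataset_coverage", 7, 1, 10, weight=1.0),  # ten-point item
]

equal_weight_score = composite_score(answers)

# Same answers, but the third indicator now counts three times as much.
answers[2].weight = 3.0
unequal_weight_score = composite_score(answers)
```

With identical answers, the composite score in this sketch changes once the weights change, which is one reason why scores from benchmarks with different weighting schemes cannot be compared directly.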
In some cases, benchmark developers adapted their benchmark methodologies throughout the measurement period. For instance, the number of countries was reduced for the OD Barometer (World Wide Web Foundation, 2019). The international edition, which was last conducted in 2017, included 115 countries. In contrast, the latest edition covers 30 countries which have publicly committed to adopting the International Open Data Charter Principles (2015) or the equivalent G20 Anti-Corruption Open Data Principles (G20's Anti-corruption Working Group, 2015). Another methodological change concerns the change in scale. While previous editions of the OD Barometer used scaled values, the latest version uses absolute values on a 0-100 scale for scores (World Wide Web Foundation, 2019). The methodology of the OD Index has also changed over time, so results of multiple years are not directly comparable. Significant changes were applied between 2015 and 2016, including revisions of the set of datasets used, changes to dataset definitions, an increase in entries to the index, and changes of the review process from peer review to thematic review (Open Data Charter, 2015). The OECD Report also changed its methodology (and its name), as a different approach was used in 2016/2017 compared to 2014, although it is unclear what exactly changed. Sometimes this lack of clarity is caused by the lack of metadata. For instance, the surveys used to create the OECD report are not shared openly; only the report and underlying data are available online. Methodological changes make it difficult to measure the progress of countries consistently.
We also analyzed the information in Table B-1 using Schellong's (2009) types of measures: natural, proxy and constructed measures (see Section 2.1). We found that all of the examined benchmarks use at least constructed measures, which means that they combine multiple progress levels. They attribute values to each progress level to eventually derive a final score (Schellong, 2009). None of the benchmarks is solely based on natural measures, i.e., measures already in use. Some benchmarks (e.g., OD Index #3, ODIN #8, and EIU #9) use proxy measures in addition to constructed measures, such as the number of datasets published by an organization or country, as one of their measures. Proxy measures can only indirectly be connected to the benchmark's objective, and they are always used in combination with other measures. The findings that open data benchmarks combine various achievement levels, and that their measures can only indirectly be connected to the benchmark objectives, are consistent with the multidimensional and multifaceted nature of the open data concept that we referred to in Section 2.2. Since many dimensions and facets need to be considered in measuring open data progress, it is impossible to rely on a single, direct indicator.
The benchmarks and their methodologies reflect some of the key criticisms that scholars have raised about the open data literature. A first criticism is that open data research is, generally, less focused on impact and more on data provision (Gascó-Hernández et al., 2018; Safarov et al., 2017; Sieber and Johnson, 2015; Zhu and Freeman, 2019). This is also reflected in the examined benchmarks. Most open data benchmarks only address implementation and impact from the perspective of data provision or capability (EIU, benchmark #9, is an exception), despite referring to terms such as open data use and value generation. Merely focusing on this data provision while ignoring the required commitment, resource investment, and sustained efforts from the data providers' side reduces the possibility to attain economic and social value (Krishnamurthy and Awazu, 2016). Second, literature on the various open data adoption levels and user interaction, participation, and engagement is scarce (Hossain et al., 2016). This is similar to our findings concerning open data benchmarks. These terms are excluded from most of the investigated open data benchmarks. Although the WJP Index (#6) states it measures 'civic participation', in fact, it only measures the possibility for citizens to participate in open data processes. A third criticism of the open data literature is that economic and business-related aspects are often ignored (Hossain et al., 2016), although it is complicated for open data scholars to obtain information concerning applications and businesses developed based on open data (Corrales-Garay et al., 2020). Some of the benchmarks in our selection do address the economic aspects (OD Barometer #2, OD Maturity #5, ODIN #8, EIU #9). However, specific information concerning, for example, the number of developed applications or businesses building on open data is lacking in these benchmarks. Quantifying the economic impact of open data in benchmarks is complex, since this impact is difficult to measure and mostly indirect and multidirectional.
In Section 4, we concluded that the benchmarks paint an inconsistent picture in defining the metrics to determine open government data progress. In this section, we found that open government data progress is also measured in divergent ways. The benchmarks use different methodologies for their sampling, data collection period, frequency of measurement, government level addressed, type and amount of data collected, data collectors and data providers involved, validity checks, scales, and weights of components. Additionally, several benchmarks changed their underlying methodology or aspects of it over time. Finally, we found that open data benchmarks mainly use constructed, indirect measures, which is consistent with our characterization of open data as a multidimensional and multifaceted concept.

Meta-theory: Comparing the theoretical models for benchmarking open government data
To better understand the metrics used in open government data benchmarks, this section discusses the existing open government data progress models identified in the literature. Academic literature often refers to progressing open government data initiatives as initiatives with high levels of performance or maturity (Charalabidis et al., 2018a; Veljković et al., 2014). To identify open government data progress models, we searched Google Scholar, Web of Science, and Scopus. We used the following combination of terms in the title, abstract and keywords: "open data" AND (benchmark OR rank OR assessment OR evaluation OR growth model OR stage model OR maturity OR progress OR framework). We let the database sort the search results based on relevance and limited the search results to the period 2011-2021. In the first assessment phase, we examined each database's first 30 search results (so 90 results in total) and then manually determined the papers' relevance by looking at each item's title, keywords, and abstract. In case of doubt, we included the manuscript in our selection. This led to an initial selection of 26 papers. In the second assessment phase, we read the full manuscripts and removed three types of studies from our selection (eleven studies in total):
- studies with irrelevant results, such as studies that remain at the conceptual level without developing a specific model or framework (four studies) or that focus on data in general rather than open government data in particular (two studies);
- studies of which the full text was not accessible (one study) or not available in the English language (two studies);
- studies that adopted open data progress models developed in other studies (two studies).
Then we added three papers identified through snowballing from these papers. Eventually, we selected the eighteen most relevant search results, which contained fifteen distinct open data progress models. The underlying OGD model data derived from our literature review can be found through the 4TU.ResearchData portal (DOI: 10.4121/14604330).
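The selection steps described above can be checked with simple arithmetic. The sketch below only restates the counts reported in the text; the variable names and the dictionary labels are our own shorthand.

```python
# Counts as reported in the description of the literature selection.
screened = 3 * 30        # first 30 results from each of the three databases
initial_selection = 26   # papers kept after screening title, keywords, and abstract

excluded = {
    "conceptual only, no specific model or framework": 4,
    "data in general, not open government data": 2,
    "full text not accessible": 1,
    "not available in English": 2,
    "adopts a model developed in another study": 2,
}
snowballed = 3           # papers added through snowballing

total_excluded = sum(excluded.values())
final_papers = initial_selection - total_excluded + snowballed

print(screened, total_excluded, final_papers)
```

The totals match the figures given in the text: 90 screened results, eleven excluded studies, and eighteen papers in the final selection.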
This section explains how the fifteen selected open government data models define progress, and we compare the characteristics of these models (see Table C-1 in Appendix C). The selected models have different foci and different levels of analysis. For example, Kalampokis et al. (2011) and Sayogo et al. (2014) focus specifically on governments' open data. Solar et al. (2012) and Welle Donker and van Loenen (2017) study open data in general without focusing on a specific actor or group involved. Ham et al. (2015) focus on open data progress through open innovation by governments, where governments assume an intermediary role.
The models also differ in terms of taking a data provider's or user's perspective. One model (#15 in Table C-1) exclusively adopts the data providers' perspective on OGD progress, meaning that it evaluates the readiness of government agencies to openly share their data with the public. Various models (# 2, 4, 5, 8, 13, 14) exclusively take a data users' perspective in their evaluation of OGD progress, in which they study open data progress from the perspective of what data is publicly available and how external actors are engaged in governments' data provision. Most of the selected models include both a data provider and user perspective (# 1, 3, 6, 7, 9, 10, 11, 12). For example, they focus on both governments' data supply and how users can make use of this data. However, none of the fifteen selected models refers to impact, and only three refer to value creation as a critical theme (#1, 7, and 13).
The identified models are ordered chronologically, meaning that lower numbers concern relatively older models, and higher numbers refer to the more recently developed models. Considering this information, one cannot identify an obvious pattern or apparent differences in the adoption of data providers' or users' perspectives over time. The most recently developed models are those by Dahbi et al. (2019) and Osorio-Sanabria et al. (2020). Compared to the older models, these models are not necessarily more comprehensive or more impact-oriented. It is also not clear whether the newer models build on the older models, or combine the outcomes from models that appeared to be useful in the past.
Our analysis shows that each of the selected open data progress models differs in terms of the number of levels or the terms used to refer to them. Several scholars argue that pursuing higher levels of progress requires some prerequisites. For example, Solar et al. (2012) maintain that higher levels of progress can be achieved by introducing perspectives on establishing public services, legal aspects, technological aspects, and citizen and entrepreneurial aspects. They conclude that attaining higher levels of progress requires introducing proper rules, technology, knowledge, and skills. As the level of performance increases, public participation and engagement become topics with higher priority in some models of open data progress (Ham et al., 2015; Sayogo et al., 2014). Higher levels of open data progress then go hand in hand with governments increasing the public's open participation in their work and decision-making through various methods and technologies, such as social media and applications. Various terms are used to refer to 'participation' in the different models, including citizens' perspective (Solar et al., 2012) and user characteristics (Welle Donker and van Loenen, 2017). In addition to facilitating public participation in open data projects, some researchers refer to other steps required to attain higher levels of open data progress, including data governance (Welle Donker and van Loenen, 2017) and the integration of government data with non-governmental formal and social data (Kalampokis et al., 2011).
In comparison to the open data benchmarks discussed in Sections 4 and 5, the theoretical models for benchmarking OGD are relatively similar in terms of measures used. Similar to the benchmarks, the theoretical models mainly focus on constructed measures and some of them additionally contain proxy measures. For example, Hjalmarsson et al. (2015) first scan the number of available data sources (a proxy measure) and then combine this information with a qualitative assessment of various quality dimensions of these data sources (a constructed measure). None of the selected models uses natural measures, which again can be explained by the multidimensional nature of the open data process (see Section 2.2). Furthermore, just like the benchmarks, some models contain a limited set of measures while others are more comprehensive. For instance, Dahbi et al. (2019) evaluate five themes (i.e., the discoverability and richness of information, data quality, reusability, and interactivity), where each theme is composed of different indicators, consisting of various possible scores. Other theoretical OGD progress models have a narrower scope. For example, they are focused on specific countries (Osorio-Sanabria et al., 2020; Srimuang et al., 2017) or do not define the different stages of the themes they evaluate (e.g., Osorio-Sanabria et al., 2020).
In sum, the models reviewed paint a complex picture of what constitutes high progress levels of open government data initiatives. The authors of most models agree that the critical element is the generation of value, but they emphasize different mechanisms and processes to achieve this. Some of the newer models seem more comprehensive as they include a wider variety of themes and perspectives.

Discussion: A qualitative meta-analysis of open government data benchmarks
This section discusses the findings from our qualitative meta-analysis: the comparison of open government data benchmarks. First, we compare the definitions of open government data progress according to theoretical models in the literature with existing open government data benchmarks (section 7.1). We then discuss the metrics and methodologies shaping the variation between open government data benchmarks (section 7.2), followed by a discussion of the development of open government data benchmarks over time (section 7.3).

Comparing open data progress definitions between benchmarks and literature models
We compared the way that progress is defined in the literature on open data to the progress levels according to the nine open data benchmarks we analyzed in the previous sections. As in the nine open data benchmarks, the selected fifteen open data progress models from academic literature reflect a distinction between progress stages. The benchmarks refer to differences in terms of open data readiness, implementation, and impact. Although we did not find this exact distinction in the academic literature, some benchmarks have a similar logic to specific open data progress models from the literature. For example, the OD Readiness benchmark (#1) exclusively focuses on readiness and shares the sense of the progress model by Solar et al. (2012), which focuses on various organizational capacities essential in preparing for an OGD initiative. Similarly, the model by Sayogo et al. (2014) echoes the OD Economy benchmark of Capgemini Consulting, as both emphasize quality data publishing and user participation opportunities. It is also noticeable that specific models (e.g., Kalampokis et al., 2011; Sayogo et al., 2014) and benchmarks share the data-driven focus of the OD Index (#3). The legal dimension, one of the many dimensions in OD Readiness (#1), OD Index (#3), and WJP Index (#6), is only present in the progress model presented by Solar et al. (2012).
Most benchmarks only address open data implementation and impact from the perspective of data provision or capability (EIU, benchmark #9, is an exception). For example, the term 'civic participation' as measured by the WJP Index (#6) suggests that citizens' actual participation is measured. In fact, only the possibility for citizens to participate is measured.
All nine open data benchmarks focus on governments, mainly at the national level. Only OD Readiness (#1) includes both national and sub-national levels, and the EIU (#9) most probably includes governments at multiple levels. This is not completely clear, however, because of missing information. Eight of the nine open data benchmarks focus on countries, while only the OD Index (#3) concentrates on both countries and regions. This means that none of the analyzed benchmarks concentrates on the local government level, while the literature also calls for monitoring strategies to address open government data use at the local level (Wilson and Cong, 2020). When it comes to the open data progress models from the literature, nearly all models measure open data progress at the organizational level (e.g., Kalampokis et al., 2011; Solar et al., 2012; Welle Donker and van Loenen, 2017). Some of the identified open data progress models are not organization-specific but can be applied to multiple organizations (e.g., Ham et al., 2015; Sayogo et al., 2014), countries or data platforms (e.g., Máchová et al., 2018). In general, this reveals a different measurement level for the open government data benchmarks used in practice and the progress models used in academic research.
In sum, while open data use, participation, and user engagement are important elements of several open data progress models, these models do not specify exactly how practitioners should measure these elements of open data progress. While several open data benchmarks include open data use, participation, and user engagement, these benchmarks mainly look at whether there is a possibility for open data use, participation, and engagement, rather than measuring the actual use of data. This is probably the result of the complexity of measuring open data use, participation, and engagement. Consequently, the actual use of open data is measured only superficially and mainly at country level. The findings of open government data benchmarks only paint part of the picture. Users of open government data benchmarks may not always be aware of this limitation.

Analyzing the metrics and methodologies affecting the variation between open data benchmarks
We found that the nine selected open data benchmarks and the fifteen selected open data models use different metrics and methodologies to assess open government data progress. The differences in sampling used in the identified benchmarks can often be explained by looking at their objectives and scope (i.e., the meta-data). For example, the OD Index (#3) presents itself as a global index, which explains why this benchmark covers a large variety of countries and places. Regarding the methodology, a standardized questionnaire is used that can be applied to many countries and places worldwide. As another example, OD Maturity (#5) is a benchmark developed by the European Data Portal and hence focuses specifically on Europe. However, methodological differences in, for instance, the amount of data collected, the specific data collectors and providers, and the applied validity checks cannot be explained using the collected meta-data.
The same holds for the differences identified in the academic open data models. The differences in level of analysis in the open data models can often be explained by the type of model and its themes. For example, the model developed by Solar et al. (2012) focuses on open data maturity in public agencies. Therefore, it is organization-specific. The model by Welle Donker and van Loenen (2017) concerns the open data ecosystem, which explains why it covers the themes of data supply, use and governance. Nevertheless, not all aspects identified through the meta-theory analysis can be explained in this way. For example, the focus and scope of the models do not provide arguments for the different stages used in the models and for the different functions that progress and maturity have in them.
The differences between the methodologies and metrics used in the open data benchmarks and the open data models are not necessarily bad. The different approaches used may very well complement each other. They can also be used as a way to investigate whether one methodology is more effective or efficient than another, and whether they lead to similar or different results. In this way, benchmark developers can learn from each other. Researchers in general can use comparisons between open data benchmarks to better understand how open data progress may be measured.
Six of the nine benchmarks provide a rank based on their evaluation of countries, regions or organizations. The fact that the open government data benchmarks and literature models include different variables explains why they produce different results in these rankings. Open data policymakers can use the ranks to obtain more information about the position their country, region, or organization holds in terms of open government data progress. However, when they do so, it is critical that they examine the details of why a particular country, region, or organization has been ranked in a certain way. Table 2 shows the ten highest-ranked countries according to the most recent edition of the selected benchmarks and confirms that the results are very different. The highest-ranked countries differ considerably, a common finding for countries' comparisons in e-government research (Janssen et al., 2004). The table does not cover OD Readiness (#1), the OD Economy (#4), and the EIU (#9) because these indices only evaluate open government data initiatives for the selected countries without comparing and ranking them. Furthermore, while the table shows the data derived from the most recent editions of the open government data benchmarks, they are based on data collected in the years before, usually data collected in 2018 or 2019.
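One way to make the extent of such ranking differences explicit is to compute how much two top-ten lists overlap and how far shared entries shift in position. The following sketch is not part of our analysis and is offered only as an illustration; the function names are hypothetical, and the two lists contain placeholder labels rather than real countries or real benchmark results.

```python
def top_k_overlap(rank_a: list[str], rank_b: list[str], k: int = 10) -> float:
    """Jaccard similarity between the top-k entries of two ranked lists."""
    top_a, top_b = set(rank_a[:k]), set(rank_b[:k])
    return len(top_a & top_b) / len(top_a | top_b)


def rank_shifts(rank_a: list[str], rank_b: list[str]) -> dict[str, int]:
    """Position in rank_b minus position in rank_a, for entries in both lists."""
    position_b = {name: pos for pos, name in enumerate(rank_b, start=1)}
    return {
        name: position_b[name] - pos
        for pos, name in enumerate(rank_a, start=1)
        if name in position_b
    }


# Placeholder labels only: these are NOT real countries or benchmark outcomes.
benchmark_x = ["A", "B", "C", "D", "E"]
benchmark_y = ["C", "F", "A", "G", "B"]

overlap = top_k_overlap(benchmark_x, benchmark_y, k=5)
shifts = rank_shifts(benchmark_x, benchmark_y)

print(f"overlap of top five: {overlap:.2f}")
print(f"position shifts for shared entries: {shifts}")
```

A low overlap or large shifts would signal that the two benchmarks measure different things, which is the cue to inspect the underlying variables rather than the rank itself.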
Extraordinarily high and low rankings may be misinterpreted since most benchmarks focus on one particular aspect of open government data progress while leaving many other relevant factors out of scope. The ranking itself therefore does not provide much information, whereas the motivation behind it and an investigation of the underlying variables does. This information could be beneficial in decision-making about further developing a specific open government data initiative, especially when the combination of multiple benchmarks is considered. For example, decision makers can combine understandings of the benchmarks of the World Bank Group (#1) and the European Data Portal (#5) of why a country is ready for an open government data program with findings obtained from the benchmarks of the World Justice Project (#6), the Open Knowledge Foundation (#3) and Open Data Watch (#8) about the available data and its quality. These findings can then also be combined with findings obtained from the Economist Intelligence Unit (#9) about open data usage and the effectiveness of open government data initiatives.
Susha et al. (2015) studied five open government data benchmarks. Various new benchmarks have been developed since 2015. We updated the open government data benchmarks described by Susha et al. (2015) by adding the new benchmarks developed since 2015 and updating the information about benchmarks that already existed at that time. Our analysis showed that the methodology used by most open government data benchmarks has been adapted to some extent over recent years. This is understandable because open data is a rapidly developing practice and research area; open government data guidelines are constantly under development. However, users of open government data benchmarks are often unaware of this when they endeavor to compare their country's performance throughout the years. Furthermore, the changes to some open government data benchmarks seem purely practical. For example, the OD Barometer reduced its scope from 115 to 30 countries, which makes it easier to compare countries (World Wide Web Foundation, 2019).

Analyzing the development of open government data benchmarks over time
While analyzing many articles, websites, and reports to compare the benchmarks, we found a big gap in information provision regarding the metrics and methodologies used across the open data benchmarks. Some benchmarks provide relatively plentiful information about the methodology they use to assess open data progress in different countries, whereas others provide only limited information. For example, Open Data Watch (2019a) (#8) and the Global Open Data Index (2019b) (#3) provided an extensive report describing all variables included in the study and the way they were assessed. Conversely, The Economist (#9) only provides limited information about its data collection method. Several benchmarks neither provide access to the survey questions used nor to the raw data collected (e.g. OD Maturity, #5, and EIU, #9). Items of information about the OECD benchmark are provided in many different places, and they need to be combined to create a better picture of the main characteristics of this benchmark. The metrics and the underlying methodologies adopted need to be very transparent to help open data policymakers to correctly interpret the findings from open government data benchmarks. While most open government data benchmarks have adapted their methodology to some extent over the years, our analysis shows that the scope and variables included in these benchmarks have barely changed over time. Only a few benchmarks have added or adjusted the variables measured. This suggests that open government data benchmarks are not adapted according to new insights derived from the latest research and practice. Changes have only occasionally been applied to improve the quality of the benchmarks. For instance, the definitions of datasets used by the OD Index have been improved to create better consistency of the index (Open Data Charter, 2015).
Our study also aimed to explore the synergies between the benchmarks and academic models and whether they are more in accord with one another than before. Previous research by Susha et al. (2015) found that open government data benchmarks are less focused on addressing open data use and impact than open data progress models from the literature. Our study found considerable variety among benchmarks and models for assessing the progress of open government data initiatives. The evaluated benchmarks together address the readiness, implementation, and impact of open data initiatives. As to the difference between older and newer benchmarks, we found that newer benchmarks (benchmarks #5-9) all include the assessment of impact in one form or another and thus are more focused on assessing the outcomes of open data initiatives, rather than successful implementation, which was the focus of previous benchmarks. Moreover, two open government data benchmarks (benchmarks #6 and #9) include citizens' perceptions and views as one of the variables. This suggests that, although there is still considerable variety in variables covered, the benchmarks and academic models are more in accord with one another than before, albeit only slightly.
However, when we critically examined the way that implementation and impact are measured, we found that most of the evaluated benchmarks do not measure the actual use of open government data (EIU, benchmark #9 is an exception). Most benchmarks that address implementation and impact focus on the supplier's perspective on open government data use and impact. For example, the dimension of 'civic participation' as measured by the WJP Index (#6) measures "whether people can voice concerns to various government officers and members of the legislature, and whether government officials provide sufficient information and notice about decisions affecting the community, including opportunities for citizen feedback" (WJP, no year, p. 1). While the term 'civic participation' suggests that it measures actual participation by citizens, it actually measures whether citizens have the possibility to participate. As another example, the Global Open Data Index (#3) assesses the legal openness of data (openly licensed), technical openness of data (open and machine-readable format), and practical openness of data (immediately downloadable, up-to-date, publicly available, available free of charge). A high score in the Global Open Data Index's rank reflects that the datasets provided are legally, technically, and practically open, rather than measuring the level of support given for open data use or engagement with open data users (Nikiforova and McBride, 2021).
Although openness and transparency aspects and the supply side of open data are necessary conditions for open government data use, scholars and open data policymakers should be aware that these open government data benchmarks do not measure other essential aspects. For example, it is essential that data is machine-readable and has an open license (Opendatacharter.net, 2015; G20's Anti-corruption Working Group, 2015). However, if the user cannot contact the data provider with specific questions about the methodology used to collect the data, does not trust the data provider, or believes the quality of the data is inadequate, this data may still not be used. This same observation was made in an earlier analysis by Susha et al. (2015), and implies that newer benchmarks for open government data have not overcome this shortcoming.

Conclusions
This study aims to compare the metrics and methodologies used to measure, benchmark, and rank governments' progress in open government data initiatives. Using a critical meta-analysis approach, we compared the metrics of nine open government data benchmarks in terms of key concepts, themes, and metaphors and the methodologies underlying the benchmarks. We found that four out of nine benchmarks consider the publication of government data to be one of the most important characteristics of open data progress and look exclusively at open data publication (OD Index #3, OD Economy #4, OECD Report #7, ODIN #8). Two benchmarks exclusively focus on the use or potential use of OGD (WJP Index #6 and EIU #9). Three benchmarks look into both aspects (OD Readiness #1, OD Barometer #2, OD Maturity #5). Moreover, the variables that open government data benchmarks measure are very different. Most of the identified variables are only measured by one or two benchmarks. This inconsistency concerning what defines open data progress is also visible among the fifteen open data progress models found in the literature. The diversity of variables shows that open data is a multifaceted concept, and its accurate evaluation requires adopting a comprehensive approach to this concept.
Another important finding is that although several open government data benchmarks (mostly the relatively newer ones) claim that they measure open data impact, this is often done from the perspective that a certain impact is possible and the required conditions exist, rather than the actual establishment of this type of impact. Most open government data benchmarks neither measure citizens' and other actors' actual participation in the specific use of open data nor established collaborations between open data providers and open data users. In contrast, various open data progress models from the literature refer to participation and user engagement as key characteristics of more progressed open government data initiatives, although the number of open data progress models studying value-creation as a key theme is still very limited. None of the identified theoretical models specifically refers to measuring impact dimensions.
The methods used to collect information about open data progress are diverse: the benchmarks differ in the number of countries covered, the sources of the information collected, the frequency with which they are carried out, and the validity checks applied. The methodology of several open government data benchmarks has changed over time, and variables have been adjusted. Methodology changes tend to be practical (e.g., reducing the number of countries because this makes it easier to conduct the measurements) rather than based on new findings from the open data literature. On the one hand, the methodological diversity of open data benchmarks may lead to different ranking outcomes and hence puzzle policymakers and other benchmark users. On the other hand, benchmarks using different methodologies may complement each other. They may also allow researchers to better understand how open data progress can be measured in different ways and whether the different methodologies lead to different outcomes. Since many dimensions and facets need to be considered in measuring open data progress, it is impossible to use a single, direct indicator.
Regarding the limitations of this study, our comparison of open government data benchmarks and open government data progress models from the literature required the collection of information from many different sources. It was sometimes difficult to find specific information. Furthermore, the information we collected was sometimes open to multiple interpretations. While all the analysis results were double-checked and discussed by multiple authors of this paper, these findings have not been checked with the creators of open government data benchmarks or other actors involved in open government data benchmarking. In addition, because the open data concept is multidimensional and multifaceted (see Section 2.2), it may not be possible to address all these dimensions and facets in a single benchmark. Consequently, the comprehensiveness of open data benchmarks may need to be addressed by a combination of open data benchmarks that build on and complement each other. Thus, there may be no such thing as an 'ideal' or 'best' open data benchmark, because they all fulfill different purposes and together lead to a comprehensive evaluation of open data efforts.
We recommend that future research addresses these issues and repeats this study in a couple of years to closely monitor
This study contributes to open data research in three ways. First, our study offers an overview of the differences in opinion on how researchers and practitioners believe the progress of open government data initiatives can be measured. Second, our study connects research and practice by analyzing how benchmarks (practice) reflect or build on academic open data progress models (research). In terms of the measures used, our study revealed that the selected open data benchmarks are relatively similar to the theoretical models for benchmarking OGD. Like the benchmarks, the theoretical models mainly focus on constructed measures, and some of them additionally contain proxy measures. Third, scholars researching the topic of open government data in general may consult open government data ranking lists to justify certain choices in their research design. For example, for case study research they may purposefully select cases of countries where open government data progress is high or low, following a certain logic. Before our study, these scholars lacked an up-to-date overview of the different and similar metrics and methodologies used to measure and rank the progress of governments in open government data initiatives, as well as an overview of the differences in the ranking lists. Our study now provides scholars with this updated overview so that they can more effectively assess the strengths and weaknesses of each benchmark a) in itself, b) compared to other benchmarks, and c) over time. We updated the open government data benchmarks described by Susha et al. (2015) by adding the new benchmarks developed since 2015 and updating the information about benchmarks that already existed.
This study also contributes to practice. First, practitioners employing open government data benchmarks can take our findings concerning existing benchmarks' strengths and combine them to develop a more comprehensive benchmark or create more comprehensive approaches for measuring governments' progress in existing open data initiatives. Our study showed that most benchmarks only address one specific aspect of the open data lifecycle (e.g., only the provision of open government data, only the use of it, or only the potential value that might be created with the data). Moreover, the actual use of open data is measured only superficially and mainly at the country level. The findings of open government data benchmarks only paint part of the picture, and developers of open data benchmarks need to be aware of this. Second, policymakers concerned with open government data can use our results to understand better why a particular country is ranked in a certain way and which metrics and methodologies have been used to arrive at the final benchmark score and ranking lists. This information is essential to identify the measures that could be taken to improve governments' progress in open data initiatives and propose appropriate measures for implementation. These findings should ultimately lead to more value creation from open government data, including increased transparency, trust, innovation, and economic growth.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.