A methodology to measure the quality of tax avoidance case studies: Findings from the Netherlands

In recent years, there have been substantial efforts to combat corporate tax avoidance. These efforts have been propelled in part by mediatized case studies, conducted by non-governmental organizations (NGOs) and other groups, on the tax avoidance practices of multinational enterprises. These case studies have been criticized because they allegedly lack quality, but this criticism has not been assessed academically. This research seeks to address that gap. It proposes a new methodology to analyze the quality of these case studies systematically. We construct ten indicators related to alleged weaknesses and use these to assess 14 case studies involving Dutch corporate entities. We find that the quality of these case studies is affected negatively by a lack of adequate data. Thus, if companies and governments enhance transparency, this could increase the quality of case studies by NGOs and other groups. In addition to this, we find that the NGOs and other groups themselves can sometimes increase the quality of their case studies by investing in technical expertise and adopting practices that foster objectivity. The methodology developed for this study could also be of use for other topical debates, such as disclosure requirements and corporate social responsibility reporting. Whereas we were able to address challenges related to internal validity, the external validity of the findings could still be improved by extending the selection of case studies. © 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC license


Introduction
In recent years, the area of international taxation has become increasingly prominent in political and public discourse, both on the national and international stage. Much attention has been given to the corporate tax position of multinational entities (MNEs) and the broad and often rather abstract notions of "aggressive tax planning" and "tax avoidance". The debate on international tax is not static and can differ by country. Furthermore, it tends to center around specific instances regarding aspects of the tax system or the behavior of specific taxpayers. In the Netherlands, issues such as the use of mailbox companies, confidential tax rulings, and unlawful state aid are extremely topical. The public debate has been shaped in part by the influence of non-governmental organizations (NGOs), such as Oxfam Novib, ActionAid, and the Tax Justice Network (TJN).
Many of these NGOs entered the tax debate from the perspective of developing countries. Tax avoidance in developing countries was considered to be especially problematic in light of the high levels of poverty there and the need for these countries to invest in sectors such as infrastructure, education, and health care. NGOs have contributed to this debate in several ways; crucially, this includes research into concrete instances of (alleged) tax avoidance by a specific MNE. Such research tends to focus on specific aspects of the worldwide tax structure of an MNE, considering in particular instances where income is subjected to a very low effective tax rate. Case studies are a key ingredient for the public debate. They illustrate the impact of tax rules, which are frequently complex and abstract, and in doing so make it possible for people who do not have a specialist background in tax law to participate in the debate. NGOs provide simplified yet relevant examples, sometimes involving a well-known brand (e.g. Grolsch beer), which provide a clear starting point for further discussion. This dynamic allows NGOs to exert a degree of influence over the subject matter of the debate and over the framing of the issue of tax avoidance. The case study approach by NGOs was developed as a way of circumventing problems in the availability of data (Finér & Ylönen, 2017).
As far as we are aware, no previous academic research has tried to systematically answer the question: what is the quality of these case studies? In light of the significance of case studies for the debate on tax avoidance, the absence of such research is noticeable. This last point is underlined by the fact that case studies have, in the past, been criticized for their (alleged) lack of quality, e.g. with respect to their (alleged) bias and lack of expertise on the part of the authors who carried out the case study. However, as far as the present authors are aware, case studies have not yet been analyzed from an academic perspective. This may, in part, be due to the absence in the international tax and accounting literature of a methodology that could be used to complete such an analysis in a meaningful manner.
The principal aim of this article is to assess the quality of case studies (also referred to as 'reports'). We seek to contribute to the existing international tax and accounting literature by providing a new methodology for assessing the quality of tax avoidance case studies. To enable sufficient depth, we have limited the testing of the methodology to case studies with tax avoidance structures involving Dutch corporate entities. This study also aims to provide a positive stimulus for the quality of future case studies by NGOs. It is to this end that we include a number of concrete recommendations in section 5.
The structure of this paper is as follows. Section 2 outlines the analytical starting points. Starting with the definition of the term "tax avoidance", it then goes on to describe the current international tax debate and the role of case studies in that debate, as well as the relevance of the quality of these case studies to NGOs' effectiveness. Section 2 closes with an overview of the debate on the quality of these case studies. Based on this overview, the participative, inclusive, and iterative quantitative case study methodology is developed in section 3. Using that methodology, section 4 systematically analyzes the 14 case studies that involve the Netherlands. The quality of the case studies is appraised in detail from the perspective of three elements: the lack of adequate data, the lack of expertise, and the lack of objectivity. Section 5 discusses the findings and provides several recommendations to increase the quality of the case studies. One of the main recommendations concerns the availability of data, in which companies and governments have a role to play. Section 5 concludes with remarks on the role of these case studies in the future. Interestingly, since the methodology was developed and applied to the case studies in the Netherlands, some of the quality standards that were within the sphere of control of the NGOs appear to have been partially adopted by NGOs.

The controversial concept of "tax avoidance"
There is currently no consensus among tax experts on the meaning of the term "tax avoidance" and several related terms, such as "aggressive tax planning" and "tax dodging" (Heitzman & Hanlon, 2010; Kovermann & Velte, 2019; Wang, Xu, Sun, & Cullinan, 2020, forthcoming). In its Glossary of tax terms, the Organisation for Economic Cooperation and Development (OECD) described tax avoidance as follows (OECD, 2020): a term that is difficult to define but which is generally used to describe the arrangement of a taxpayer's affairs that is intended to reduce his tax liability and that although the arrangement could be strictly legal it is usually in contradiction with the intent of the law it purports to follow.
We consider this to be a useful starting point for this paper, although it should be made clear that the OECD description is not comprehensive, universally accepted, or indeed unambiguous. For example, if a country intentionally adopts tax legislation that enables MNEs to reduce their tax liability abroad, such arrangements may still be regarded as tax avoidance from the perspective of other countries. Sometimes alternative terms with a similar meaning are used, such as "aggressive tax planning". This paper does not aim to define the concept of "tax avoidance" or similar terms. Consequently, no claims are made regarding whether the tax structures explored by NGOs in the various case studies constitute "tax avoidance" in the eyes of the present authors. For present purposes (i.e. developing a methodology to assess the quality of case studies), it suffices to establish whether a particular case study defines key terms and concepts (such as "tax avoidance" or "aggressive tax planning") and whether that study includes a legal analysis of the relevant tax rules. The objective is to assess to what extent the case studies are supported by relevant legal reasoning, taking into account the claims made by NGOs and mindful of the risk of judging ex-post with reference to legal developments, including terminology and definitions, postdating the publication of a case study.

The international tax debate
In a relatively short period, issues that were previously mainly of interest to specialists have become "topics of public interest and popular media reporting", which have resulted in "new political initiatives [which] have also progressed at an unexpected high pace" (Forstater & Christensen, 2017, p. 3). This is evidenced by the OECD's ongoing work on base erosion and profit shifting (BEPS), which was launched in 2013 and which seeks to combat certain forms of harmful or otherwise undesirable international tax planning by MNEs.
The European Union (EU) provides another interesting example. In 2012, the European Commission announced a clampdown on tax evasion and tax avoidance (European Commission (EC), 2012). To date, this has resulted in a series of new measures, including the automatic exchange of tax rulings between EU Member States and the Anti-Tax Avoidance Directives, requiring the phased introduction of a selection of anti-abuse measures from 2019 onwards. Since 2013, the European Commission has also been countering tax avoidance via the EU state aid rules.
There has been considerable media attention around the issue of tax avoidance. A series of data leaks (e.g. LuxLeaks, Panama Papers, and Paradise Papers) based on research by the International Consortium of Investigative Journalists (ICIJ) has been vital in this respect, revealing the scale of offshore tax planning and providing concrete cases involving high-profile persons and companies (Oei & Ring, 2018).

Significance of case studies in the international tax debate
NGOs play a role in the international tax debate. Examples of this include participation in public consultations (Seabrooke & Wigan, 2016, p. 367) as well as other types of lobbying and advocacy activities. NGOs also have a degree of media presence, regularly as a result of the research (case studies) that they publish. The rise in the number of times that the Tax Justice Network is cited in the press has been substantial (Dallyn, 2017). Although such outwardly observable facts do not reveal the exact impact and influence of NGOs on the tax debate, it is clear that their role in, for example, the BEPS Project has been significant. It is also worth noting that two of the European Commission's recent investigations into unlawful state aid and MNEs were the direct result of case studies by trade unions and the Greens (European Commission, 2015a; The Greens/European Free Alliance, 2016c).

Three areas of criticism of case studies
The effectiveness of the case studies as a means of furthering the position of NGOs will depend to a great extent on whether other actors in the international tax debate regard the case studies as being sufficiently authoritative (cf. Seabrooke & Wigan, 2016). Consequently, the quality of the case studies is extremely important. If other actors in the debate, such as policy-makers or the media, do not accept the validity of the claims, this reduces the effectiveness of the case studies in furthering the position of NGOs. While the quality of particular case studies has received praise (Gunn, 2015), there are also concerns about the quality of case studies in general. In the next section, we provide an overview of these criticisms. While there are potentially other yardsticks with which to assess the quality of case studies, we focus primarily on these criticisms. This approach aims to avoid developing criteria which seem relevant from the vantage point of the 'ivory tower', but that are not deemed to be significant by the actors involved. Based on a high-level analysis of the Dutch media coverage of the international tax debate, the present authors distinguish three main areas of criticism of case studies. The present authors do not claim that this is a comprehensive overview, as other types of criticism are voiced, albeit less often and less coherently. For the present purpose (i.e. the development of an assessment methodology), we hold the view that the identification of three types of criticism is sufficiently detailed to allow for a meaningful analysis of case studies. The first point of criticism concerns the availability of in-depth information and data needed to carry out a case study. More specifically, the point of criticism is that the available information and data are insufficient as a basis for the analysis made in the case study (area 1). The second point of criticism concerns the alleged lack of objectivity of the authors of each case study (area 2), whereas the third point pertains to their alleged lack of expertise (area 3).
The three areas of criticism are distinct, yet sometimes overlap, and can be mutually reinforcing. For instance, a lack of objectivity on the part of the authors of case studies could potentially contribute to complacency on the point of technical rigor. In turn, a lack of relevant data could complicate a proper technical analysis, as the authors may not be aware of all facets of a particular case. In some instances, the boundaries between the different points of criticism may be blurred. For example, if the representativeness of a case study is not explained, should this qualify as a lack of technical expertise or instead as a lack of objectivity? For the analysis in section 3 of this study, we aim to make our choices in this regard as transparent as possible.

Lack of in-depth information and data about the case (area 1)
The first area of criticism involves claims regarding the researchers' allegedly limited understanding of the case as a result of a lack of in-depth information. Taxpayers do not typically publish their (entire) tax structure, and relevant information in this regard is often not made public. In the Netherlands, agreements for prior certainty ("tax rulings") provided by the Tax Authorities to taxpayers are confidential, although summaries of these agreements have been made public since mid-2019 (Snel, 2018).
Rules on confidentiality and privacy can prevent researchers from obtaining the information needed to form a comprehensive understanding of a particular case (Finér & Ylönen, 2017). Although researchers may be able to identify a certain level of tax planning, it is generally extremely hard (or even impossible) to establish the exact tax position of an MNE. Large data leaks in recent years have shed light on certain aspects of the tax planning of a range of companies (Oei & Ring, 2018), but they do not alter the fact that in-depth information can be lacking.

Lack of objectivity on the side of the researchers (area 2)
The second area of criticism concerns the objectivity of researchers. The persuasiveness of the case studies is tied to the (perceived) credibility of the researchers, which in turn is linked to notions such as objectivity and expertise (see section 2.4.3). Case studies by NGOs are written in a specific political context and have a particular purpose. It has been suggested that researchers can lack objectivity and may potentially try to find evidence that matches their preconceived conclusion (such as the idea that the Netherlands is a "tax haven"). Instead of promoting "evidence-based policy making", these researchers are thought to be engaged in "policy-based evidence making" (Seay, 2015).
In some cases, the criticism that researchers lack objectivity is made by companies targeted by a particular case study (e.g. by Fleurette Group (2016)). These companies have an interest in the outcomes of the case studies. Indeed, they may even have an interest in discrediting the findings of a case study. Companies have a fundamentally different perspective on their tax position, as they have access to all the relevant facts and circumstances (often in contrast to the researchers who wrote the case study). It is conceivable that honest mistakes and misunderstandings, which are due to a lack of in-depth information on the side of researchers, are viewed by companies as a lack of objectivity.

Lack of expertise (area 3)
The third area of criticism concerns the (perceived) methodological shortcomings of case studies, in particular the (perceived) lack of expertise on the part of the researchers who conducted the case study. Tax law is a specialized field of law, characterized by highly complicated national legislation, tax treaties and other international agreements, case law, and jurisprudence, as well as unwritten principles of law and national practice.
International tax law is a specialist field that requires training and experience to navigate. It is often difficult to establish to what extent a researcher of a particular report grasps the legal nuances of the case at hand. The educational background of researchers does not provide a definite answer to this question, if only because the researchers may seek advice from technical experts to supplement their knowledge. The perceived lack of technical skills on the part of the researchers is flagged as an area of criticism (e.g. Paladin, 2015). When discussing a research article by Sikka (2010) on tax avoidance, Hasseldine and Morris (2013, p. 10) stated: "We also suspect that the researcher does not fully understand some of the tax arrangements that are discussed."

Methodology
The first section of the methodology focuses on the selection and grouping of the cases. This methodological section subsequently explains why and how we developed a participatory quantitative case study methodology. The third section highlights the development of generic quality criteria. This section is followed by an explanation of how we developed and applied a scoring methodology. This methodological section closes by indicating how the feedback of the researchers and NGOs was included to strengthen the quality of the appraisal methodology and outcomes.

Selection of the case studies
To be included in this review, a case study needs to fulfill the following four criteria. Firstly, it needs to be a standalone case study focusing on tax avoidance or aggressive tax planning. Secondly, the company structure addressed in the case study must include a Dutch entity contributing to a reduction in taxes. Thirdly, the study must have been published between 2013 and 2017. Fourthly, the study must include a quantitative estimate of the amount of revenue at stake.
Our study focuses on the Netherlands. There is widespread concern about the problematic role that the Netherlands plays in aggressive tax planning and tax avoidance. Furthermore, the European Commission issued a negative decision on state aid given by the Netherlands to a subsidiary of Starbucks (European Commission, 2015b), which was annulled by the General Court in 2019. More recently, the European Commission opened an investigation into possible state aid granted by the Netherlands to Inter IKEA (European Commission, 2017) and Nike (European Commission, 2019). Dutch legal entities also featured prominently in the tax planning structures made public by LuxLeaks in 2014, the Panama Papers in 2016, and the Paradise Papers in 2017. Studies by NGOs have furthermore put the Netherlands in the top three countries contributing to tax avoidance (Oxfam, 2016; Publish What You Pay Norway, 2011).
The Dutch government has vigorously denied claims that the Netherlands engaged in harmful tax practice and the facilitation of aggressive tax structures (Accountantweek, 2015). Over the past ten years, the Netherlands has taken various measures aimed at combatting tax avoidance. These include unilateral measures as well as the implementation of measures following from international agreements. The Netherlands has formally stated that it seeks to combat its reputation as a 'tax haven', as the bad press was negatively affecting the national investment climate (Kahn, 2018).
For the present paper, 41 case studies were identified on companies that were allegedly using the Netherlands because of its tax climate. The search for these case studies started by verifying the publications by major NGOs which focused on international taxation issues in their lobby or campaigns (e.g. ActionAid). Next, an academic keyword search was completed using terms such as 'case studies' and 'tax avoidance' to identify additional academic case studies. A more general search was also completed of the websites of the Dutch and European parliaments, to determine if there were any other case studies mentioned in the parliamentary proceedings that were neither published on the websites of NGOs nor in academic journals. Finally, we checked for additional cases featured in the Dutch media around key media moments (e.g. LuxLeaks). After identifying 41 cases in this way, we applied the four selection criteria mentioned above (see Table 1; a score of 1 means that a criterion is met). This last step resulted in the selection of 14 case studies.
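The selection step described above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, assuming that each candidate is recorded with four binary scores (1 means the criterion is met, as in Table 1). The class name, field names, and the three example reports are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate case study with its four binary selection scores (hypothetical layout)."""
    name: str
    standalone: int    # criterion 1: standalone case study on tax avoidance
    dutch_entity: int  # criterion 2: structure includes a Dutch entity reducing taxes
    in_period: int     # criterion 3: published between 2013 and 2017
    quantified: int    # criterion 4: includes a quantitative revenue estimate

    def is_selected(self) -> bool:
        # A case study is selected only if all four criteria score 1.
        return all((self.standalone, self.dutch_entity, self.in_period, self.quantified))


def select(candidates):
    """Return the candidates that meet all four selection criteria."""
    return [c for c in candidates if c.is_selected()]


# Hypothetical example data, not taken from Table 1.
candidates = [
    Candidate("Hypothetical report A", 1, 1, 1, 1),
    Candidate("Hypothetical report B", 1, 1, 0, 1),
    Candidate("Hypothetical report C", 1, 0, 1, 0),
]

selected = select(candidates)
print([c.name for c in selected])
```

Applied to the 41 identified cases, a filter of this kind would return the 14 case studies retained for the analysis.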
The 14 case studies are clustered into three subgroups based on their type of authors, where they were published, and their (implicit) objectives (see Table 2 below). Three of the case studies were commissioned by a political party, namely the Greens/European Free Alliance (EFA) (this is a relatively small political party in the EU and has tax justice as one of its priorities). These case studies were presented on the website of the political party. The main objectives of these case studies appear to have been to promote a debate in the European Parliament and to obtain stricter rules against tax avoidance at the EU level.
The second subgroup consisted of three case studies eventually published as articles in Critical Perspectives on Accounting. We included the case studies in the second subgroup because they are based on reports written for NGOs. Although the objective of these articles was to contribute academically to the research field on tax avoidance, the objective of the underlying case studies was comparable with the objective of the other case studies by NGOs included in the scope of our study. The assessment of the case studies included in the second group was based on the combined publications, i.e. the academic articles were regarded as complementary to the original case studies written by the NGOs.
The third and largest subgroup consisted of the case studies written for an NGO and then published on the website of that NGO, without an accompanying academic article. The main objective of these case studies was to create public awareness of the issue of tax avoidance. As will become apparent, a detailed analysis of the quality of the case studies shows substantial differences between the three different subgroups of case studies. Table 2 shows the key elements of the 14 selected case studies.

Development of a participatory quantitative case study methodology
One of the main challenges of the present paper was to develop a method to assess case studies. Most of the case studies analyzed in the present context claimed to be intended as either policy documents for advocacy or a form of investigative journalism, and were not intended as academic studies. This raises a number of questions. Is it even possible to develop a uniform set of quality criteria? Is quality not socially constructed in context-specific ways? After all, the intended functions and audiences of the case studies vary. The differences in intended functions of the selected case studies are relevant when considering the quality of case studies. For example, an NGO might select its case not on academic grounds, such as representativeness, but because of the scope to propel "emancipatory action". We are, however, of the opinion that it is justified and important to develop quality control. These case studies had a substantial impact (see section 2.3) on the public debate and aimed to influence policy. Seabrooke and Wigan (2016) demonstrated that international tax debate experts, such as NGOs, can be effective in this regard if they can combine claims to moral authority with demonstrations of expertise. The aim of developing this methodology for assessing the quality of case studies is to enable experts to strengthen their (demonstrations of) expertise further and contribute more effectively to emancipatory action.
The methodology developed in this study combines a quantitative and a qualitative approach to assess the quality of case studies. It is quantitative in the sense that it critically examines quantitative statements in reports from NGOs, but does not involve interviews or participant observation as a research method. The methodology is qualitative since it takes case studies as the starting point and then aims for an in-depth understanding thereof.

Development of generic quality criteria
It is necessary to assess each case study in the appropriate light, while at the same time providing an analytical framework that allows for a comparison between these case studies. In order to achieve this delicate balance, the present authors designed a set of quality criteria which are, firstly, based on (A) the standards that the NGOs themselves have set in their codes of conduct (Partos, 2012; Stichting Onderzoek Multinationale Ondernemingen (SOMO), 2006). Secondly, these standards are combined with (B) some more general scientific research norms based among other things on Popper's falsifiability premise (Popper, 1959), Kuhn's notion on the importance of expert reviews (Kuhn, 1996), and the notion of 'construct validity' (Trochim, 2006).

Development of detailed appraisal guidelines
The three areas of criticism mentioned above are broken down into ten indicators. The ten indicators are subdivided into 23 questions (see Table 3). Detailed descriptions were developed for each possible score (0, 0.5, or 1 point for every question) to facilitate an equitable assessment method for all the reports and are found in the Appendix. The 14 case studies were analyzed using these quality criteria.
Once the first version of the research methodology had been developed, it was tested and further developed in three ways. First, the methodology and the appraisal guidelines were shared with the researchers and the organizations that wrote the case studies. Based on their responses, the methodology was refined.
Second, the evaluative questions were answered by two of the present authors independently for all the case studies to determine whether there were any discrepancies between the two authors. As some discrepancies were found between the appraisals, the scoring guidelines were made more detailed, and it was decided that all the remaining studies would continue to be assessed by the two authors independently from each other. The differences were reconciled afterward.
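The comparison of two independent appraisals can be illustrated with a short Python sketch. It assumes that each author's appraisal of one case study is stored as a mapping from question label to score (0, 0.5, or 1). The function names and the example scores are hypothetical, and the simple agreement rate is our illustration rather than a statistic reported in the study.

```python
VALID_SCORES = {0.0, 0.5, 1.0}


def find_discrepancies(rater_a, rater_b):
    """Return the sorted question labels on which the two raters gave different scores."""
    if rater_a.keys() != rater_b.keys():
        raise ValueError("both raters must score the same set of questions")
    for scores in (rater_a, rater_b):
        for question, score in scores.items():
            if score not in VALID_SCORES:
                raise ValueError(f"invalid score {score!r} for question {question}")
    return sorted(q for q in rater_a if rater_a[q] != rater_b[q])


def agreement_rate(rater_a, rater_b):
    """Share of questions on which both raters gave the same score."""
    return 1 - len(find_discrepancies(rater_a, rater_b)) / len(rater_a)


# Hypothetical scores for four of the 23 questions on one case study.
author_1 = {"4b": 0.5, "7a": 1.0, "8c": 0.0, "9c": 1.0}
author_2 = {"4b": 1.0, "7a": 1.0, "8c": 0.0, "9c": 0.5}

print(find_discrepancies(author_1, author_2))  # questions to reconcile
print(agreement_rate(author_1, author_2))
```

The questions returned by `find_discrepancies` are the ones that would need to be reconciled, or whose scoring guidance might need more detail.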
Third, a round-table discussion was organized with fiscal experts with diverse backgrounds (e.g. academics, tax professionals working in a corporate environment, civil servants, and experts affiliated to NGOs) to discuss the methodology of the present paper and our initial findings, and to develop meaningful recommendations, which are presented in section 5. Based on this round-table discussion, the possibility of an intermediate score of 0.5, with corresponding guidance, was added for certain questions (4b, 7a, and 9c) which were initially scored as 1 or 0. Furthermore, a question was added about the legal analysis of the applicable tax rules in the case study (8c).

Appraisal of the quality of case studies and verification of the findings
After the initial appraisals, the findings were shared with the researchers and the organizations, who were asked to provide a response. Sending draft findings to the subjects of research and requesting their feedback on these findings is included as a quality assurance step, in line with Malsch and Salterio (2016, p. 13). Nearly all the researchers responded.
In some instances, the researchers nuanced some elements of their appraisal. One researcher also responded by sharing a comprehensive methodological background report and all the underlying company accounts. When such additional information was verifiable or credible, and the quality appraisal guidance did not require that the information be publicly accessible or included in the case study itself, the scoring was reviewed. In a few cases, this had a substantial impact on the assessment of the case study, and this proved an important finding in itself: sometimes the quality of a case study is higher than it seems at first sight, since some case studies omit details about the methodology and research process.
Table 4 provides two types of information. On the left-hand side, it provides the average scores for each of the ten quality criteria, broken down per subgroup of research. The highlighted elements are those that score 0.5 or below (when rounded to one decimal), and which therefore indicate that there are concerns about the quality of the research on these elements. If a criterion consisted of multiple questions, the scores on the questions were averaged (equal weight). For example, the average score for the seven case studies written by NGOs (listed in the central column) for the criterion 'access to data' (first row) is 0.3. For this criterion, which consisted of just one question, none of the NGO case studies scored a 1, four scored a 0.5, and three scored a 0, which comes to an average of 2/7, or 0.3 after rounding. This low outcome is in our view problematic. The second type of information is on the right-hand side of the table, namely the average scores for each of the 23 questions. Each of these averages is the mean score of the 14 case studies on the question concerned. Below are the most salient findings per quality criterion.
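The averaging behind Table 4 can be reproduced with a minimal Python sketch. It assumes equal weights throughout and takes the 'access to data' scores of the seven NGO case studies from the text (four scores of 0.5 and three of 0). The function names are ours, and the flagging threshold simply restates the highlighting rule described for the table.

```python
from statistics import mean


def criterion_score(question_scores):
    """Score of one case study on one criterion: equal-weight mean of its question scores."""
    return mean(question_scores)


def subgroup_average(criterion_scores):
    """Average criterion score across the case studies in a subgroup."""
    return mean(criterion_scores)


def is_flagged(average, threshold=0.5):
    """Flag averages that are 0.5 or below after rounding to one decimal.

    Note: Python's round() rounds exact ties to the nearest even digit,
    which may differ from the rounding convention used for Table 4.
    """
    return round(average, 1) <= threshold


# 'Access to data' for the seven NGO case studies: four scored 0.5, three scored 0.
ngo_access_to_data = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0]

average = subgroup_average(ngo_access_to_data)
print(round(average, 1), is_flagged(average))
```

Because 'access to data' consists of a single question, `criterion_score` is not needed for this example; for a criterion with several questions it would be applied per case study before taking the subgroup average.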

Access to data
Regarding access to data, most of the cases surveyed signaled limitations in terms of access to relevant data. An important source of information is the annual accounts of individual subsidiaries of MNEs. Such accounts are more readily available in some jurisdictions (e.g. the Netherlands and Luxembourg) than in others (e.g. Switzerland and Curaçao). Therefore, an NGO may have access to information for one part of the structure, but not for another part.

Quality of data
As regards the gathering of information, in most of the cases surveyed, the authors of the case studies indicated difficulties regarding the quality of the information available. A recurrent theme is the level of detail available regarding the accounts of subsidiary companies within the MNE. Consequently, for most studies the score with respect to data quality and quantity is 0.5 or below, indicating that data limitations are a major factor negatively influencing the quality of these studies. Looking at case studies that were not selected, the data limitations reported are even more substantial, because cases with the most severe data constraints typically did not allow for quantitative estimates (see Table 1). Moreover, the authors of case studies written by NGOs indicated that they had sometimes explored the possibility of researching companies other than the MNE ultimately analyzed in the case studies, but that this had not resulted in case studies for publication as they did not have sufficient data to carry out such an analysis. In our view, this merits the conclusion that the issues regarding the lack of access to data cannot be blamed on the researchers, as they aimed to find case studies with enough data. Instead, the data constraints limit the possibilities for high-quality case studies. Based on our research, the authors of case studies seem to have pursued cases despite the existence of data issues, for the simple reason that it is exceptional to find examples of tax avoidance without any problems regarding data access and quality.

Lack of objectivity
The second set of indicators used to analyze the quality of the case studies concerns the issue of objectivity in a broad sense, including openness to external scrutiny. Overall, the scores are relatively high. Of the three constraining factors identified in our methodology, a potential lack of objectivity has the smallest impact on the research quality of the selected case studies. Still, some elements leave room for improvement, especially as regards the right of reply for case studies published by a political party.

Transparency on methods, techniques, and data
The first indicator of objectivity concerns the transparency of the research, allowing readers to ascertain whether the data was used and interpreted in an unbiased way. It consists of two questions: (1) Are the methods and techniques made explicit? (2) Are the (underlying) data available in an accessible format? The latter question is about the primary data used and whether these are in the public domain or obtainable on request. All of the case studies contained at least an indication of the methods, techniques, and data used, although these were incorporated in various ways. Examples include incorporation in the main text of the case study report, as a table, in footnotes, or as a special section. Sometimes the full calculations could be obtained on request. The level of detail varied considerably. Most reports included a full description of methods and techniques. Some reports included an explanation of only some specific aspects of the calculation. All of the case studies surveyed documented their sources. Sometimes the sources (primary materials) were available directly, which greatly simplified the process of understanding the case studies. Sometimes the data could be provided on request. Access to information can be limited in cases in which the NGO provides a link to a website requiring payment.

Right to reply
The second indicator of a lack of objectivity is the right to reply for the subjects of case studies. The "right to reply" is understood to mean the possibility for a company to have insights into a case study before publication and to have the possibility of submitting additional information. The right to reply is assessed using the following four questions: (1) Did the company have the right to reply? (2) If a company had the right to reply, did it have at least two weeks to reply? (3) Was a reply integrated into the case study (if this is possible)? (4) Is the reply publicly available? Eleven of the 14 case studies referred to the right to reply or indicated that the companies were approached for a reaction. Interestingly, the three instances in which a case study did not refer to the right to reply were all the case studies published by the Greens/EFA. It is possible that the authors working for this political party did not ask for a reply because its reports were only intended to draw attention to a broader problem, illustrated by the case studies, and to move that problem up the political agenda. It was noticeable that none of the studies included a clear deadline within which a company should respond. We identified two cases in which the amount of time allowed for a reaction seemed too short. The accessibility of company responses to third parties varied among the studies. In seven cases, the reports referred to a website containing the replies of the companies. By 2018, however, the links included for three of those seven studies were not operational or did not provide the relevant information. The extent to which replies from companies were included in the case studies depended on their nature and level of detail. The most important observation in this regard is that the majority of studies that expressly dealt with the replies of companies did not address all of the issues raised in the replies.

Review by independent experts
The third indicator is the extent to which the case studies were reviewed by independent tax experts. In the case of academic studies, this is likely to be through peer review. In other cases, the indicator is applied with reference to a suitable third party. Ten of the case studies explicitly referred to some form of expert review. In two other cases, information on an expert review was provided later by the researchers in response to the draft findings of our study. Not surprisingly, the cluster of case studies that were also published as an academic paper scored higher than the other clusters.

Conclusions and recommendations are based on the research
The fourth and final indicator regarding the issue of objectivity concerns the conclusions and recommendations of the case studies. It consists of three questions: (1) Are the conclusions based on the research itself? (2) Are the limitations to the research made explicit? (3) Are the recommendations a logical follow-up from the findings? In four cases, the conclusions were indeed limited to the research. However, in the remaining ten cases claims were made that were not substantiated by the actual research. On the second question, almost all of the case studies mentioned the limitations of their research into tax avoidance. The case studies, or the broader reports of which they are a part, contained a broad variety of recommendations directed at host country governments, other governments, the companies themselves, and international organizations. In many studies, most recommendations followed logically from the case, but were generalized to other comparable situations. However, some studies also contained unrelated recommendations.

Explanation of the choice for a particular case study
This indicator consists of two questions: (1) Was the selection of the case study explained? (2) Was the representativeness of the case study explained? This indicator sheds light on the external validity of the case study. The majority of the case studies surveyed contained some explanation regarding the selection of the case. In some cases, the trigger was a specific public statement about a company (e.g. a statement in the European Parliament). In other instances, the selection was linked to a more general public outcry over tax avoidance. Some of the cases were chosen as a follow-up to a previous investigation into a particular company. The availability of information was also cited as a reason for selection. Our review included four case studies that form part of broader publications. Such broader publications provided a good overall explanation of how the various case studies were selected, thus leading to maximum scores for this question. By contrast, some studies did not provide any explanation for why a certain case was chosen. All academic articles included a detailed discussion of the representativeness of the cases, whereas all other publications contained only a brief discussion or, more often, nothing. On the specific point of the explanation of representativeness, the different nature of academic publications is conducive to higher quality.

Definition of concepts and legal analysis
The second indicator concerns the use of concepts and legal analysis in the case studies. It consists of three questions: (1) Are the key concepts defined? (2) Are they used in accordance with the current terminology? (3) Does the study include a legal analysis of the applicable tax rules? The term "current terminology" refers to international usage, for example, in the context of the OECD's work on taxation. This question was included to flag instances where usage in case studies appears not to be in line with the usage which might be expected from an international tax perspective. For all three questions, the reports published by the Greens/EFA overall scored much lower than the other studies. Almost all case studies paid specific attention to the usage of technical terms. Most studies defined or explained the key concepts in the main text, boxes, or footnotes, sometimes with illustrations. A few NGO reports also included a separate glossary. In the majority of cases, the concepts used in the case studies surveyed are fully in line with the common usage of these terms in the field of tax law. Most case studies provided either a comprehensive analysis of the applicable tax legislation or none at all.

Calculations and counterfactual of tax avoidance estimates
The third indicator concerns the techniques that the authors of the case studies applied to quantify tax avoidance. Whereas the previous indicator focused on legal aspects, this indicator concentrates on economic aspects. The quality of the quantitative techniques was assessed using the following four questions: (1) Are the calculations correct and sufficiently substantiated? (2) Is the counterfactual made explicit? (3) Is the counterfactual reasonable and complete? (4) Do the researchers have an explicit and correct strategy for dealing with missing data? There was variation between the case studies regarding the methods used for quantifying tax avoidance. It was noticeable that in a limited number of cases, it was not possible to reconstruct the calculations made by the researchers. Certain case studies contained minor errors in calculation or followed a questionable methodology for assessing tax avoidance. In some cases, assumptions were made that can be considered untenable. Surprisingly, in one case, the researchers made a calculation error that suggested a less beneficial tax outcome for the company. Nearly all the case studies contained a clear counterfactual for each tax avoidance element. Only in two cases was the counterfactual not presented clearly or clear only for part of the calculations. Overall, for the cluster of reports published by the Greens/EFA, the quality of the counterfactuals is much lower. Almost all case studies surveyed had an explicit and correct strategy for dealing with missing data.

Quality control and robustness
The fourth and final indicator of the expertise of the researchers is the quality control of the case studies. This indicator was considered with reference to the degree of uncertainty mentioned in the case studies and the triangulation of data with other sources. The degree of uncertainty can be based on alternative assumptions or calculation methods, thus involving robustness checks that use different specifications to estimate the amount of tax avoidance. Although the case studies did not use econometric models that could be run with alternative specifications, some studies contained calculations that could be performed using alternative assumptions. However, six case studies presented an amount without any such qualifications. None of the case studies sought to quantify an exact margin of error. The efforts to triangulate data for one case were rather impressive, involving interviews with government officials in various countries and even undercover research to investigate the provision of intra-group services. Some researchers described their attempts to collect additional information from other sources, but stated that they were unsuccessful. Only a few case studies did not mention any attempts to obtain other sources.

Discussion on and recommendations for the quality of the case studies
Whereas all the factors had an impact on the overall quality, data availability and data quality together form the only factor that scored an average below 0.5 when all the studies were combined (0.45). Objectivity scored the highest, with an average of 0.68 and none of the individual indicators scoring on average below the 0.5 mark. Technical expertise lies between the scores for data and objectivity (0.63). The high score for objectivity does not mean that no improvements can be made in this respect. It is recommended that all researchers provide companies with the opportunity to respond and give them enough time to do so. It is also suggested that researchers make the responses from companies available and show how they have incorporated them into their case studies.
The individual indicators that scored below 0.5 on average across all the types of studies are the data availability, the discussion on the selection of case studies, and the indication of a degree of uncertainty. More scores would be below 0.5 if they were based on only public information. Therefore, it is recommended that researchers make more of the data and methodology available to the public, such as in separate background papers. The three scores that are below 0.5 are the focus of our recommendations.
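The aggregation behind such averages can be illustrated with a short sketch. It assumes, for illustration only, that each indicator is scored between 0 and 1 per case study and that a factor score is the unweighted mean of its indicator averages; the exact weighting used in our appraisal guidelines is not restated here. The two case studies and all scores below are invented and are not the study's data, and the indicator keys are shorthand labels chosen for this sketch.

```python
from statistics import mean

# Three factors and their ten indicators, as named in the text.
FACTORS = {
    "data": ["data_availability", "data_quality"],
    "objectivity": ["transparency", "right_to_reply", "expert_review", "conclusions"],
    "expertise": ["case_selection", "concepts_legal", "calculations", "quality_control"],
}

# Hypothetical scores (0 to 1) for two invented case studies; NOT the study's data.
CASE_SCORES = {
    "case_A": {
        "data_availability": 0.25, "data_quality": 0.5,
        "transparency": 0.75, "right_to_reply": 0.5,
        "expert_review": 1.0, "conclusions": 0.75,
        "case_selection": 0.25, "concepts_legal": 0.75,
        "calculations": 0.75, "quality_control": 0.5,
    },
    "case_B": {
        "data_availability": 0.5, "data_quality": 0.5,
        "transparency": 1.0, "right_to_reply": 0.0,
        "expert_review": 0.5, "conclusions": 0.5,
        "case_selection": 0.5, "concepts_legal": 0.5,
        "calculations": 0.5, "quality_control": 0.25,
    },
}


def indicator_averages(case_scores):
    """Average each indicator across all case studies."""
    indicators = [name for group in FACTORS.values() for name in group]
    return {name: mean(case[name] for case in case_scores.values())
            for name in indicators}


def factor_averages(ind_avg):
    """Assumed aggregation: unweighted mean of the indicator averages per factor."""
    return {factor: mean(ind_avg[name] for name in names)
            for factor, names in FACTORS.items()}


def below_threshold(ind_avg, threshold=0.5):
    """List the indicators whose average falls below the threshold."""
    return [name for name, score in ind_avg.items() if score < threshold]


ind_avg = indicator_averages(CASE_SCORES)
print(factor_averages(ind_avg))
print(below_threshold(ind_avg))
```

Flagging indicators against a fixed threshold in this way mirrors how the 0.5 mark is used above to select the focus of the recommendations.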
Regarding the availability and accessibility of data, both governments and companies have a role to play. The quality of the case studies could likely be strengthened if more and better information were available and more easily obtainable by researchers. Crucially, governments could require the filing of unconsolidated annual accounts of individual entities, with explanatory notes, and ensure that these can be obtained easily for free or at a low cost (Koch, 2017). This applies to all countries, including developing countries as well as countries used in corporate tax planning structures. For countries where this is usually the case, such as the Netherlands and Luxembourg, the data available for the case studies is relatively good. For countries where such filings are not available, such as Switzerland and Curacao (which surfaced regularly in the selected case studies), the data availability is highly problematic.
Furthermore, companies themselves could publish key data, such as relevant financial figures on a country-by-country basis. The Global Reporting Initiative (GRI), which includes sustainability reporting standards that are widely used, provides some guidance for this in the guise of a new tax standard. Moreover, recent examples such as Shell's publication of comprehensive country-by-country data for 2018 demonstrate that it is feasible. Companies could also provide relevant explanations about transfer pricing, group financing, intellectual property (IP), the use of tax treaties, the level of transparency with foreign tax authorities, and the use of tax holidays, tax rulings, and advice by tax advisors (e.g. position papers, memoranda, and legal opinions). In all cases, the publication of such information needs to be done in a format that is accessible and comprehensible to researchers without a specific background in international taxation. We note that the nature of the information needed will depend on the research objectives of a particular study. Taken together with companies' legitimate concerns about confidentiality and issues of efficiency, this suggests that a good approach to increasing the availability of data is for companies to provide information on request, allowing the researchers to ask questions. This would also increase the efficiency of the research process. All case studies would have benefitted, to varying degrees, from more information that was also more readily available.
Regarding the representativeness of the case studies, researchers can improve their approach. While it can be logical for researchers to select cases based on their emancipatory potential, instead of their representativeness, they could make more efforts to describe and document the degree to which the selected case is exceptional or could be exemplary of a broader situation. For example, this could be achieved by systematically comparing the global effective tax rate (ETR) with that of peer companies. A comparison of the total taxes borne by an MNE could also contribute to a better understanding of the tax position of a particular MNE relative to its peers. Furthermore, we note that a breakdown of different types of relevant taxes could contribute to a more nuanced analysis. For example, in the case of the extractive industry, it would be helpful to distinguish corporate taxes on the annual profit from extractive taxes levied over the exploitation of finite natural resources. This could enhance the external validity of the case study.
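A peer comparison of this kind can be sketched in a few lines. The sketch assumes that the ETR is computed as tax expense divided by pre-tax profit and that the peer benchmark is the median peer ETR; both are simplifying assumptions made for illustration, and other definitions (for example, based on cash taxes paid) are possible. The function names and any figures passed to them are hypothetical.

```python
from statistics import median


def effective_tax_rate(tax_expense, pre_tax_profit):
    """ETR as tax expense over pre-tax profit (assumed definition)."""
    if pre_tax_profit <= 0:
        raise ValueError("ETR is not meaningful for non-positive pre-tax profit")
    return tax_expense / pre_tax_profit


def etr_gap_to_peers(company, peers):
    """Company ETR minus the median peer ETR.

    `company` and each element of `peers` are (tax_expense, pre_tax_profit)
    pairs. A clearly negative gap suggests that the selected case may be
    exceptional relative to its peer group.
    """
    company_etr = effective_tax_rate(*company)
    peer_median = median(effective_tax_rate(tax, profit) for tax, profit in peers)
    return company_etr - peer_median


# Invented figures for illustration only.
gap = etr_gap_to_peers((50, 1000), [(200, 1000), (250, 1000), (150, 1000)])
print(f"ETR gap to peer median: {gap:.2%}")
```

The median is used instead of the mean so that a single outlier in a small peer group does not dominate the benchmark.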
Lastly, regarding the degree of uncertainty of case study findings, more use could be made of alternative estimates to highlight the sources of uncertainty and indicate a range of tax revenues forgone rather than a single figure. This would also reduce the risk of misinterpretation or exaggeration from treating an estimate as a hard figure.
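Reporting a range instead of a point estimate can be illustrated as follows. The sketch assumes a deliberately simple estimate, namely the profit attributed to the avoidance structure multiplied by the tax rate that would have applied in the counterfactual, minus any tax actually paid on that profit. This formula, the scenario names, and all figures are assumptions made for illustration and do not reproduce the method of any of the case studies assessed.

```python
def forgone_tax(shifted_profit, counterfactual_rate, tax_paid=0.0):
    """Assumed estimate: counterfactual tax on the shifted profit minus tax paid."""
    return shifted_profit * counterfactual_rate - tax_paid


def estimate_range(scenarios):
    """Evaluate every scenario and return (lowest, highest, all estimates)."""
    estimates = {name: forgone_tax(**params) for name, params in scenarios.items()}
    return min(estimates.values()), max(estimates.values()), estimates


# Invented scenarios (amounts in millions) reflecting alternative assumptions
# about how much profit was shifted and how much tax was actually paid on it.
SCENARIOS = {
    "low": {"shifted_profit": 80, "counterfactual_rate": 0.25, "tax_paid": 4},
    "central": {"shifted_profit": 100, "counterfactual_rate": 0.25, "tax_paid": 2},
    "high": {"shifted_profit": 120, "counterfactual_rate": 0.25, "tax_paid": 0},
}

low, high, detail = estimate_range(SCENARIOS)
print(f"Estimated tax revenue forgone: between {low:.0f} and {high:.0f} million")
```

Keeping each scenario's assumptions explicit also shows readers which assumption drives the width of the range.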
In short, for the quality of the case studies to be improved, the companies and the governments are in the driving seat. Nevertheless, NGOs could themselves be more accountable and transparent about, for instance, the methods and data they used. Instead of always being publicly available, sometimes this information was only available on request.

Limitations of the current research
In our study, we were able to reach several conclusions. For instance, on average, the quality of the case studies by NGOs is better if collaboration is sought with academics. Furthermore, the quality of case studies by NGOs is higher than studies of political organizations. Of course, certain limitations apply with regard to the internal and external validity of our findings (Trochim, 2006). We sought to minimize the adverse impact of these limitations in a variety of ways.
Because of the iterative methodology of our research, the internal validity of the research was strengthened. The 'feedback loops' with both a wider group of experts in the tax field and the research subjects reduced the possibility that the outcomes could be attributed to alternative explanations that are not explored in the study (Girden, 2001). Using 10 indicators and 23 questions, we sought to develop an encompassing definition of quality. The risk of potential biases was further reduced because the present authors come from very different professional backgrounds (including a former employee of an NGO, an international tax specialist with a background at a Big Four firm, and a senior civil servant with expertise in the field of international development).
To ascertain whether a study has external validity, the question which must be answered is whether the findings apply to other research subjects whose place, times, and circumstances differ from those of the subjects who participated in the study (Research Connections, 2016). A study's external validity is closely related to the generalizability of the findings. In this regard, we face larger problems as our sample size (N) is relatively low (14, and for two subsets of case studies, those performed by political parties and by academics, N is just 3), even though, to the best of our knowledge, our selection covers all tax avoidance case studies with a quantitative estimate published during 2013-2017 and including a Dutch element. In 2018 and 2019, new case studies by NGOs with tax avoidance estimates were published, for example by Oxfam about US and Australian firms. Not all of these involve Dutch entities. This raises the question of whether and to what extent the current findings can be confidently generalized to such later case studies. Furthermore, the question arises as to the degree of certainty with which our findings regarding the (lack of) quality of the studies by the political parties can be transferred to other studies by political parties. We would argue that modesty is required on the point of external validity. For instance, all three studies by political foundations that we analyzed were produced by the same organization, which reduces the robustness of the findings.
One key way to enhance the external validity would be to increase the number of observations. For example, this could be achieved by assessing more recent case studies, including some without Dutch entities. Lack of data might then emerge as an even larger quality constraint because the availability of detailed unconsolidated financial accounts was relatively good for Dutch entities. Most of the assessment framework (indicators 1-8) could also be applied to case studies without a quantitative estimate. In any case, the quality guidelines that implicitly follow from the assessment framework also seem valid for new case studies.

Potential for scale-up of current methodology to NGO research focusing on other domains
NGOs publish case studies on many different aspects of (alleged) corporate malpractice. This can range from the impact on deforestation to issues involving labor conditions, and from human rights abuses to irregularities in corporate social responsibility reporting. We note that there is little systematic quality appraisal for research, even though our study shows that much could be learned from such an appraisal. Can the appraisal methodology developed by us also be used to appraise the quality of case studies outside the field of tax avoidance?
Let us first recap the key steps and features of the present methodology. The first step was the development of generic quality criteria (section 3.3), the next step was the development of detailed appraisal guidelines (section 3.4), and the last step was the appraisal of the quality of case studies and verification of the findings (section 3.5). Throughout all steps, a participative, inclusive, and iterative approach was taken. It was participative due to the extensive use of quality criteria proposed by the participants of the study (the researchers). Inclusiveness was guaranteed by the diversity of experts involved in the quality control system: fiscal experts from all key domains participated (e.g. academics, tax professionals working in a corporate environment, civil servants, and experts affiliated to NGOs). The approach can be regarded as iterative as the scoring methodology was adapted after trial appraisals.
We think that the current steps and features of our methodology could also be helpful to appraise the quality of case study research on other business practices. They could also be linked to the field of audits and (social and environmental) accounting where there is substantial academic debate on disclosure (e.g. Elkins & Entwistle, 2018) and on corporate social responsibility reporting (e.g. Cai, Lee, Xu, & Zeng, 2019). The methodology could also potentially be modified to assess company case studies in other domains, such as human rights and environmental issues, while preserving its participatory and inclusive features. For other domains, it may be necessary to verify the key factors that affect the (perceived) research quality of case studies. However, some factors identified appear to have broader applicability and could, therefore, form a useful starting point.
The degree to which our generic quality criteria and detailed appraisal guidelines can be transposed and/or require adjustment to apply to non-fiscal case studies differs per criterion. The four indicators with respect to objectivity (i.e. transparency on methods, techniques and data; the right of reply for the subject of the case study and the integration of the reply in the report; peer review; and findings which are limited to the actual research) appear important in all domains. It seems reasonable to expect that the two criteria relating to data (i.e. data quantity and data quality) are also crucial for other types of case studies. However, for issues of a more qualitative nature, such as harassment of workers, it may be the case that appraisal guidelines would need to put more emphasis on the reliability and interpretation of the data. The four indicators for 'expertise' (i.e. the explanation of the selection of case studies; the inclusion of a definition of key concepts and legal analysis; the quality of calculations and counterfactuals; and quality control) are more closely linked to specifics of the tax avoidance debate. For the appraisal of case studies about human rights, the use of a counterfactual may be less relevant depending on the research method involved. By contrast, it is important that such case studies properly discuss to what extent human rights issues are structural and directly result from company policies and practices. Thus, the appraisal guidelines can be tailored to specific expectations with respect to research quality in other domains.

The future of case-study based tax avoidance research
We aimed to contribute to the international tax and accounting literature by providing a tool to answer questions relating to the quality of tax avoidance case studies. We found that the use of case studies frequently raises questions regarding the (lack of) available data. Indeed, the use of case studies as a way of exploring tax avoidance seems to be linked precisely to the need to circumvent data problems (Finér & Ylönen, 2017). Does this mean that case studies on tax avoidance are a "dead-end street" with little future relevance? We would argue against this. While we are in certain respects quite critical of the quality of the case studies, we are hopeful that their quality will improve. Amongst other things, this is because more data is becoming available online, for instance, data related to the issue of beneficial ownership of companies. To stimulate the uptake of our findings, we organized various presentations on our methodology and the findings to potential users. Interestingly, one organization whose reports were assessed has started integrating some of the recommendations resulting from this research (SOMO, 2018). For example, the researched company received a month to respond to the report, and its response letter was published on the website of SOMO.
In the present paper, we developed a methodology and subsequently applied that methodology to a set of tax avoidance case studies carried out by NGOs and other groups. While there was already criticism of such case studies by the affected companies and some journalists, there was not yet academic research into the quality of these studies, in part because a suitable methodology was lacking. By developing criteria and guidelines which can potentially be used to appraise case studies in other areas of corporate behavior, this study has contributed to filling that gap. The findings on case studies of tax avoidance show ways to enhance the quality of research, while recognizing that governments, companies, and the researchers themselves all have a role to play.