1 Introduction

Social coexistence is based on a common agreement about the correctness or at least the validity of practical norms. The justification of norms is the core task of practical philosophy. However, there is no consensus in the scientific debate about how norms should be justified. Within the debate on the justification of norms, coherentism and the method of reflective equilibrium (RE) have come to play a significant role (Sayre-McCord 2007: 141).

The main idea of the RE is to achieve justification by bringing various considerations that support each other into a coherent view. The original RE idea goes back to Goodman and his proposed solution to the problem of induction. Goodman argues that the principles of inductive logic can be justified by a mutual adjustment of rules and accepted inferences (Goodman 1983: 64). His well-known circular argument states that no rule of inference is acceptable as a logical principle unless it is in agreement with what is considered an acceptable instance of inference. At the same time, no view of a particular inference is acceptable if it is inconsistent with rules that are generally accepted as best explaining a wide range of other acceptable inferences (Goodman 1983: 64).

In A Theory of Justice, Rawls adopts this idea of justification and transfers the method to the field of ethics. He uses the method to construct and justify his theory of justice. Here, justification is achieved by bringing considered moral judgments and normative principles into agreement (Rawls 1971). The term “judgments” refers to statements “at all levels of generality” that a person finds acceptable in light of the object of inquiry (Rawls 1974: 8). “Principles” refer to general statements or rules that are normative components of ethical theories. The goal of RE is to achieve a state of equilibrium by bringing judgments and/or principles into agreement. The agreement required for the state of equilibrium is known as coherence. Following the RE idea, the process of justification starts with a search for moral judgments about a subject matter. The next step is to reject the judgments that were not developed under the condition of objectivity and impartiality. Then, the principles that are valid for the case are taken into account. The systematic principles should fit the judgments and be applicable to other cases. Assuming that the initially selected judgments and principles do not fit, the judgments and principles that conflict are adjusted or discarded, and new elements may be considered until there is agreement among the judgments and principles. In this process of mutual adjustment, no element is immune to revision (Rawls 1974: 8). The situation of agreement between judgments and principles is what Rawls calls “reflective equilibrium.” The reached equilibrium “is reflective since we know to what principles our judgments conform and the premises of their derivation. At the moment everything is in order” (Rawls 1971: 18). The term “equilibrium” refers not only to the balanced status achieved through mutual adjustment, but also to its temporary nature insofar as new information can disrupt the equilibrium at any time.

There are essentially two main distinguishing characteristics of RE. First, the method can be divided into individual and collective equilibria. Equilibria are considered individual when the elements of the levels are determined by a single person. Collective equilibria result from a process carried out by a collective, for example, in the context of a public deliberation (Hahn 2013: 438). Second, there is a common distinction between a narrow reflective equilibrium (NRE) and a wide reflective equilibrium (WRE). The difference, according to Rawls, is:

[…] whether one is to be presented with only those descriptions which more or less match one’s existing judgments except for minor discrepancies, or whether one is to be presented with all possible descriptions to which one might plausibly conform one’s judgments together with all relevant philosophical arguments for them (Rawls 1971: 49).

Here, the first form describes the NRE method and the second and more comprehensive variant describes the WRE method.

Since the introduction of the idea of RE, an intense debate has developed in the literature. First, the discussion is about the advantages and disadvantages of the method, and views differ widely. However, numerous works defend the method and recommend its application (DePaul 2011; Knight 2006; Tersman 2018). It is argued that RE is the standard and most common method in contemporary ethics (Brandstedt and Brännmark 2020: 358; Varner 2012: 11). Some go so far as to evaluate this method as without alternative (Räikkä 1996: 173) or as the only defensible method (Scanlon 2003: 149). Conversely, there is a lot of criticism regarding its feasibility and usefulness (Singer 2005). Critical voices point out the risks of relativism and subjectivism and argue that coherence is not a guarantee of moral truth (Blackburn 1993; Brandt 1979; Hare 1972).

In the discussion about the usefulness of the method, various works try to make the idea of RE more precise, develop it methodically, and transfer it to new research fields. The best-known extension of the idea of RE comes from Daniels. According to him, RE is a method of justification. With reference to the work of Rawls, Daniels specifies the distinction between the NRE and WRE. He favors the WRE, which he defines as follows:

The method of wide reflective equilibrium is an attempt to produce coherence in an ordered triple of sets of beliefs held by a particular person, namely, (a) a set of considered moral judgments, (b) a set of moral principles, and (c) a set of relevant background theories (Daniels 1996b: 22).

According to this understanding, the WRE includes the NRE and adds a third level. Daniels considers this extension of relevant background theories to be necessary. This is because background theories can support ethical principles independently of existing moral judgments and justify their acceptance in the face of alternatives (Daniels 1996a: 49).

Concerning other prominent examples of conceptual extensions of the idea of RE, particular mention should be made of the work of Catherine Z. Elgin in the context of epistemology (Elgin 1996) and the network model of Robert Heeger and Theo van Willigenburg (Heeger and Van Willigenburg 1991). More recent developments of the method can be found, for example, in the approach of Hahn, the model of Beisbart, Betz and Brun or in the concept of coherentist business ethics by Rogowski and Rechnitzer (Beisbart et al. 2021; Hahn 2000; Rogowski and Rechnitzer 2023).

Overall, the state of RE research presents a mixed picture, characterized not only by different interpretations and ideas about methodology but also by many open questions. Although the method is discussed in various disciplines such as bioethics, business ethics or law and legal ethics, it is considered unclear in which disciplines and to which subjects the method can be applied. It is also an open question as to which purposes can be adequately addressed by the method. There is an argument that there is no universal RE method. Instead, different versions of the method must be developed for different purposes (Van der Burg and Van Willigenburg 1998: 4). Accordingly, it is questionable which beliefs or which categories should be used as levels. Furthermore, there seems to be no agreement as to which requirements an equilibrium must fulfill. It is discussed whether further criteria should be considered besides the criterion of coherence (Baumberger and Brun 2020; Van Thiel and Van Delden 2009). Also, whether the method can be used purposefully for fictitious as well as real problems is a subject of discussion (Stoner and Swartwood 2017). Based on these numerous ambiguities, the RE method is sometimes not understood as a structural procedure of justification but rather a metaphor of justification (Hahn 2000: 18).

In addition to these open questions, established distinctions, such as the differentiation into NRE and WRE, are also criticized and questioned (Hahn 2000: 242; Holmgren 1989: 59). When RE variants are discussed, the WRE method is usually favored (Hahn 2000: 116). The WRE is considered the more interesting justification method and, according to the literature, is also used more widely (Knight 2017: 49; Räikkä 2009: 51). Given the potentially large number of applications and that the WRE is usually considered the preferred method, this work focuses on the WRE version.

The debate on WRE rarely discusses the successes or the problems and challenges of the applications to date. Consequently, the state of research on the application of the WRE method is surprisingly unclear. So far, there is only one review paper from 2010 by Doorn that deals with applications of the Rawlsian approaches (Doorn 2010a: 132). Doorn identifies 12 papers that use a Rawlsian method, and of these, three applications use the WRE method. Accordingly, Doorn notes that despite the considerable attention Rawlsian approaches have attracted, actual applications are relatively rare (Doorn 2010: 128). There are several reasons, some of which overlap, that make it difficult to provide an overview of the applications of WRE to date. First, the debate is interdisciplinary, and ambiguities exist regarding in which discipline or in which field of ethics the method is applied. Second, there are different versions of RE and the different versions are not clearly defined (Hahn 2000: 242). Third, many philosophical studies omit the method section and, accordingly, the description of the applied method. Due to the vague definition and the different interpretations of the method, it is difficult to determine when an application of the method can be assumed. It is unclear to date how many applications of WRE exist, for what purposes and on what specific topics the method has been applied, how the authors proceeded in the application, and whether they have been successful in their efforts to reach an equilibrium.

To address these open questions, a systematized approach should be selected that identifies transparent inclusion criteria to distinguish between studies that explicitly apply WRE and further work on WRE. In this paper, the method used is that of a critical review.

This critical review aims to conduct a systematized search to identify applications of WRE, provide an overview of the characteristics and methodological details of the applications, and critically contextualize the results within the existing debate. It should also be shown how often, in which disciplines, and to which questions the WRE method has already been applied. This review is intended to help understand the extent to which the WRE has been elaborated and established as a justification method in practice. This work may be of interest to those seeking guidance in the discussion of WRE as a justificatory method, whether to assess the justificatory power of WRE as a coherent justificatory method or to gain insights for their own applications. In addition, the paper may be of interest to ethicists who want to improve the method by drawing on previous applications. Of course, it might also be relevant to those who want to investigate whether the criticisms raised against WRE are valid for previous applications.

In this paper, the term “application” is of central importance, so the understanding of the term is briefly described. The term “applications” here refers to studies that not only state the result of the WRE, but also explicitly describe the concrete application steps of the WRE. Accordingly, the more specific term “explicit applications” seems to be necessary in order to obtain a secure database and to avoid pure guesswork about the concrete application steps. Moreover, this concretization is consistent with the understanding of WRE as a method. A method is understood here as a regular or planned procedure to arrive at findings within a science (Apel 2020: 190).

The second section describes the search strategy, the criteria for abstract screening, and the full-text search. The third section “Results” documents the searches based on the flowchart recommended by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement (Page et al. 2021). It also reports on the particular study characteristics, the underlying interpretations of the WRE method, and the specific applications. The fourth section “Discussion” focuses on identifying the potential opportunities and challenges of the method so that possible theoretical extensions can be identified. The final section presents the conclusions.

2 Methodology

This paper is a critical review of the applications of WRE to date. However, the scientific platform for reporting guidelines, the EQUATOR Network (Enhancing the QUAlity and Transparency Of health Research Network), commonly used for systematic reviews, does not have standardized reporting guidelines for the application of the WRE. Therefore, a separate framework was developed for this paper.

Since there are no generally accepted criteria that allow clearly distinguishing between the different variants of the RE idea, different exclusion criteria are conceivable. In this work, the common definition of Daniels is followed and it is assumed that an explicit WRE application exists if the following three aspects are fulfilled: first, the application names at least three levels; second, the category of the levels and the elements taken into account are named; third, a connection between the levels is established. A departure from Daniels’ definition is that this paper does not specify which categories are to be considered on each level. By considering the three levels, a distinction between NRE and WRE is possible. Naming the category and the respective elements ensures that all three levels are included in the process. And the criterion of connection ensures that the search for equilibrium is also given, whereby it is irrelevant whether equilibrium has been reached or not.
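The three inclusion criteria can be read as a simple conjunction. The following is a minimal sketch of that reading, not the procedure actually used in this review; the record types `Level` and `Candidate` and the function name are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One level of a candidate application (hypothetical record type)."""
    category: str = ""                              # e.g. "considered judgments"
    elements: list = field(default_factory=list)    # elements named at this level


@dataclass
class Candidate:
    """A study assessed for inclusion (hypothetical record type)."""
    levels: list            # list of Level objects
    connects_levels: bool   # is a connection between the levels established?


def is_explicit_wre_application(c: Candidate) -> bool:
    # Criterion 1: at least three levels are named.
    has_three_levels = len(c.levels) >= 3
    # Criterion 2: the category and the elements of every level are named.
    levels_specified = all(bool(lv.category) and bool(lv.elements) for lv in c.levels)
    # Criterion 3: a connection between the levels is established.
    # Whether an equilibrium was actually reached is deliberately not checked.
    return has_three_levels and levels_specified and c.connects_levels
```

Under this reading, a two-level study is classified as NRE and fails the first criterion regardless of how carefully its elements are documented.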

2.1 Search Strategy

The search aimed to identify studies that used the WRE method. Since it cannot be assumed that all authors accept the distinction between NRE and WRE, all studies dealing with the RE method were searched. The search strategy included different variants of the term RE. The search string was as follows: (“reflective equilibrium” OR “reflective equilibria” OR “Überlegungsgleichgewicht” OR “reflektiertes Gleichgewicht” OR “reflektierendes Gleichgewicht” OR “reflektierte Gleichgewichte”). The search concentrated on studies containing these terms in the title or abstract. Studies in German or English were included. All topics, disciplines, publication years, and publication forms were considered. The databases Scopus, Web of Science, PubMed, ProQuest, and Philosopher’s Index (Ovid) were searched to ensure an interdisciplinary overview.
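For illustration, the search string can be assembled from its term variants and applied as a local title and abstract filter. This is a sketch under assumptions: the function names are hypothetical, straight quotation marks replace the typographic ones, and case-insensitive substring matching is assumed, whereas each database applies its own field tags and matching rules.

```python
import re

# Term variants copied from the search string reported in the text.
TERMS = [
    "reflective equilibrium",
    "reflective equilibria",
    "Überlegungsgleichgewicht",
    "reflektiertes Gleichgewicht",
    "reflektierendes Gleichgewicht",
    "reflektierte Gleichgewichte",
]


def build_query(terms):
    """Join the quoted terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


# Assumption: case-insensitive substring matching for the local check.
PATTERN = re.compile("|".join(re.escape(t) for t in TERMS), re.IGNORECASE)


def matches_title_or_abstract(title, abstract):
    """True if any term variant occurs in the title or in the abstract."""
    return bool(PATTERN.search(title) or PATTERN.search(abstract))
```

A record whose title or abstract contains “wide reflective equilibrium” is matched as well, because the term “reflective equilibrium” is a substring of it.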

The database search was conducted on 17 December 2020. Additional literature was added through recourse to personal contacts and an iterative reference tracking process as part of the full-text assessment.

2.2 Screening

The screening process involved two steps: title-abstract screening and full-text screening.

In the first step, all titles and abstracts were screened for eligibility. A study was included in the further investigation if it was an original study, written in English or German, and included a possible application of RE. A study was judged to be a possible application of RE if any of the following criteria were fulfilled: first, the author claims to use the RE method; second, the author describes the major components of an application; third, the author argues that the object of study is or is not in a state of equilibrium; or fourth, the aim of the study is to reach equilibrium, establish coherence, bring different positions together, provide a justification, or structure an argument or debate.

In a second step, the full texts of the included studies were examined for their suitability. The full texts were rechecked to ensure that they were original studies written in English or German. The subsequent evaluation of the full texts was based on whether the three conditions of WRE application (following the understanding of WRE in this paper) could be identified. First, it was established whether the authors referred to their model as RE or WRE and assumed at least three levels. NRE studies with two levels were excluded. Second, studies in which the specific elements of each level were not clearly designated were excluded. Third, studies that did not show a connection between the different levels were excluded.
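The two screening steps can be sketched as two small functions. This is an illustrative reading only: the function names, the language codes, and the wording of the reasons are hypothetical, and recording only the first applicable exclusion reason per study is an assumption.

```python
ACCEPTED_LANGUAGES = {"en", "de"}  # English or German


def passes_title_abstract(is_original, language, signals):
    """Step 1: include if original, in an accepted language, and at least one
    of the four 'possible application of RE' indicators is present.
    `signals` is a list of booleans, one per indicator."""
    return bool(is_original and language in ACCEPTED_LANGUAGES and any(signals))


def full_text_exclusion_reason(is_original, language, n_levels,
                               elements_named, levels_connected):
    """Step 2: return the first exclusion reason that applies, or None if the
    study is included. Checks follow the order described in the text."""
    if not is_original:
        return "not an original study"
    if language not in ACCEPTED_LANGUAGES:
        return "not in English or German"
    if n_levels < 3:
        return "fewer than three levels"
    if not elements_named:
        return "categories and elements not clearly named"
    if not levels_connected:
        return "no connection between the levels"
    return None
```

Step 1 is deliberately permissive (any one indicator suffices), whereas step 2 requires all three WRE conditions to hold.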

2.3 Data Extraction

Information on study characteristics, the general understanding of the WRE, and the specific application itself were extracted. Regarding the study characteristics, the year of publication and the purpose of the application were extracted. The purpose of the WRE was differentiated between four types of possible purposes: first, to structure an argument or discussion; second, to analyze and justify a specific moral problem; third, to construct a moral principle or theory; and fourth, to develop guidelines for concrete decision-making. The type of statement was also extracted, namely whether the statements are positive or normative. An application is positive when the WRE is used to describe a particular topic, for example, how a debate is structured or how actors make their decisions. An application is normative when the WRE is used, for example, to justify an object or to argue how a decision should be made. The subject of the application and the disciplines of the application were also extracted. The subject and the discipline of the publishing journal were used as a reference to determine the discipline. It was also determined whether the study was method- or application-oriented. A study was classified as method-oriented if the focus was on the description of the WRE method and the application was used to illustrate the method. A study was classified as application-oriented if the focus was on the application itself.

Regarding the general understanding of WRE, the following pieces of information were extracted: how the authors name the equilibrium, whether a distinction is made between a narrow and a wide reflective equilibrium, and finally, if the equilibrium is used to achieve an overlapping consensus (OC).

When determining the application of the studies, this research examined whether the case study was fictitious or real. Consideration was also given to which sets of beliefs were used as categories for the levels and which elements were considered within the categories. It was also established by whom this content was determined and whether an adjustment process was carried out. Concerning the determination of the elements, a distinction was made between three options. Either they were determined by the author independently, or by the author with recourse to external data such as already existing surveys, or by data collection, for example, when the author conducts a survey. The question of adjustment was limited to checking whether an adjustment occurred regardless of its scope. As soon as it was reported in the text that, for example, an element was adjusted or rejected due to incoherence, when alternative elements were discussed, or when a change of position and opinion was described, this was counted as an adjustment. It was also established whether the authors considered other criteria apart from coherence, for example, to counter criticism of relativism and subjectivism. Here, a distinction was made between the following types of criteria: criteria for the user or the judge, criteria for the way the elements were collected, criteria for the characteristics of the elements, criteria for the assessment of coherence, and criteria for the procedure of the adjustment process. It was also determined whether more than one equilibrium was discussed in the applications and how many equilibria were reached. The achievement of an equilibrium is not determined on the basis of objective criteria, but rather is based on the statements of the authors, who may report whether an equilibrium has been reached or not.

3 Results

The following section first describes the study selection process. Then, the results of the extraction regarding the studies, the underlying understanding, and the respective applications are presented.

3.1 Literature Searches

Figure 1 shows the study selection process. A total of 1,331 studies were identified through the systematic search. After removing 638 duplicates, 693 studies were included for title-abstract screening. During the title-abstract screening, 497 records were excluded, leaving 196 records for retrieval and eligibility assessment. Of the 196 records, 194 could be retrieved. Two records could not be retrieved through libraries, document delivery services, internet searches, or personal contact. Of the 194 studies, 179 records were excluded: four studies because they were not original studies, nine studies because they were not in English or German, 108 studies because they did not include at least three levels in their methodology, 51 because they did not clearly name the categories and elements of each level, and seven because they did not name the connections between the levels. In addition to the database search, five studies were identified through further internet searches, four through reference tracking, and three through personal contact. Of these twelve records, seven studies were excluded because they did not consider three levels in the methodology section, and five studies were included.
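The reported counts can be checked for internal consistency with simple arithmetic. The sketch below only recomputes the numbers given in the text; the variable names are illustrative.

```python
# Database search
identified = 1331
duplicates = 638
screened = identified - duplicates             # 693 records screened
excluded_at_screening = 497
sought = screened - excluded_at_screening      # 196 records sought for retrieval
not_retrieved = 2
assessed = sought - not_retrieved              # 194 full texts assessed

# Full-text exclusions by reason, as reported in the text
full_text_exclusions = {
    "not an original study": 4,
    "not in English or German": 9,
    "fewer than three levels": 108,
    "categories and elements not clearly named": 51,
    "no connection between the levels": 7,
}
excluded_full_text = sum(full_text_exclusions.values())    # 179
included_from_databases = assessed - excluded_full_text    # 15

# Other sources
other_sources = {"internet searches": 5, "reference tracking": 4, "personal contact": 3}
other_total = sum(other_sources.values())                  # 12
other_excluded = 7
included_from_other = other_total - other_excluded         # 5

total_included = included_from_databases + included_from_other  # 20
```

The database route thus yields 15 included studies, which together with the five from other sources gives the total of 20 studies reported in this section.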

Fig. 1 Study selection process as a PRISMA flow chart

From: Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71

Fig. 2 Characteristics of the studies and the underlying interpretations

A total of 20 studies (articles, book chapters, monographs, and one anthology) were considered. The anthology contained four articles with independent applications. These articles are taken as independent studies in the following. This added three to the original 20 hits. Therefore, 23 studies are referred to in the following. Figure 1 shows the study selection process in the form of a PRISMA flowchart.

Some studies were excluded that seem to meet the inclusion criteria. This is true, for example, for Rawls himself, but he directly indicates in his book A Theory of Justice that he will not work through the process of WRE (Rawls 1971: 21). Even in the well-known studies by Goodman and Daniels, there are no explicit WRE applications that meet the stated inclusion criteria. Also excluded were Hahn’s frequently cited applications to the foundational crisis in mathematics and the justification of no-smoking norms (Hahn 2000, 2016). The reason for this is that Hahn follows Goodman’s model and accordingly works with two levels. Consequently, Hahn’s applications here were evaluated as NRE and excluded.

3.2 Main Characteristics of the Studies

Table 1 provides an overview of the included studies. With the exception of McCann’s 1990 study and the four applications from the 1998 anthology “Reflective Equilibrium - Essays in Honour of Robert Heeger,” all studies were published after the turn of the millennium. The included studies apply the WRE method to a wide range of topics and for different purposes.

Table 1 Overview of the included studies

Ten studies used the WRE method to analyze and resolve a specific decision problem. Of these 10 studies, half addressed moral decision problems in health care. The studies by McCann, Van Delden and Van Thiel and Willems examine decisions made by healthcare professionals regarding the ethical treatment of their patients (McCann 1990; J. J. M. van Delden and G. J. M. W. van Thiel 1998; Willems 2001). The work by Collste examines whether there can be a moral justification for infanticide, and the work by Van Thiel and Van Delden analyzes the extent to which the medical intervention referred to as the “Ashley treatment” is justified (Collste 1998; Van Thiel and Van Delden 2016). The other five studies deal with different questions from different fields. Rutgers’ work addresses two questions in the area of animal ethics. The first is the removal of claws from cats, and the second is the routine performance of cesarean sections on cattle (Rutgers 1998). Fung examines two governance problems, rule-making for political structure and the tyranny of minorities (Fung 2007). The work of Berman examines the legal eligibility of Senator John McCain for the office of President under the Natural-Born Citizenship Clause of the United States Constitution (Berman 2011). Meyers’ study analyzes whether the use of cognitive neuroenhancement can be legitimate in certain contexts, and Tillson argues that simulating wrongdoing is morally reprehensible (Meyers 2014; Tillson 2018).

In six studies, the purpose of the application is to construct theories or moral principles. Of these studies, three focus on theory building and three on the construction of principles. Green develops a theory of legitimate expectations, Harris Jr a theory of moral change, and Tiberius and Swartwood a theory of wisdom (Green 2017; Harris Jr 2005; Tiberius and Swartwood 2011). Melin analyzes ethical principles for environmental impact assessment, Neiman examines principles for international corporations, and Rechnitzer attempts to justify a moral principle applicable to the subject of precaution and precautionary decision-making (Melin 2001; Neiman 2013; Rechnitzer 2022).

Four studies aim to structure arguments or debates. Two studies by Doorn deal with the distribution of responsibilities with regard to research and development (Doorn 2010b, 2012). Schroten uses the Hermann case study to analyze public policymaking by ethics committees, and Takahashi analyzes the structure of bioethical reasoning with regard to human embryonic stem cell research (Schroten 1998; Takahashi 2011).

Three studies use the WRE method to develop guidelines for concrete decision-making. The work of Lindenau and Kressig aims to develop ethical guidelines for decision-making in the context of the child and adult protection authority, Preusche establishes quality criteria to assess reform proposals of the German welfare state according to their preferability, and Sheridan focuses on criteria for the seeding of tennis players (Lindenau and Kressig 2019; Preusche 2017; Sheridan 2007).

Figure 2 shows the further characteristics of the included studies. Seven studies in which the WRE is applied can be assigned to the fields of medical ethics and bioethics. Seven studies can be classified as political ethics, three as technology ethics, and two as legal ethics. Four studies can be assigned to other areas, such as business ethics or veterinary ethics. Of the 23 studies, eleven studies have an application-oriented research focus. The focus of these studies is on the justification itself. Twelve studies have a method-oriented research focus. These studies apply the method, but the goal is primarily to illustrate the utility of the method. For example, the WRE method is proposed as a decision-making method for a professional group or as a method with which normative and empirical data can be combined purposefully. In these instances, the application itself has more of an exemplary character. Regarding the type of statements, it was found that a total of four studies, namely those that attempt to structure an argument or discussion, have a positive statement type. The other 19 studies have a normative orientation.

Regarding the underlying understanding of the WRE, it can be noted that 13 studies distinguish between versions of the WRE and the NRE. The other ten studies do not address this distinction. In 13 studies, the method is called the “wide reflective equilibrium” or a variant of the “wide reflective equilibrium.” Five studies use the general term “reflective equilibrium.” The German-language studies use the terms “Überlegungsgleichgewicht” and “weites Überlegungsgleichgewicht,” which correspond to the English terms “reflective equilibrium” and “wide reflective equilibrium” (Lindenau and Kressig 2019: 129; Preusche 2017: 21). There are also some variations. Harris Jr uses the term “full reflective equilibrium” to focus on public justification (Harris Jr 2005: 74). Fung speaks of a “pragmatic equilibrium” and Van Thiel and Van Delden refer to their model as a “normative-empirical-reflective equilibrium” (Fung 2007: 444; Van Thiel and Van Delden 2016: 159–160). Both aim at giving greater weight to empirical evidence in their models. Takahashi uses the term “three levels structure analysis” (Takahashi 2011: 1). Another criterion of the underlying interpretation is the focus on OC. Four studies address the goal of achieving an OC. According to the authors, OC was reached in the studies “Exploring Responsibility Rationales in Research and Development (R&D)” by Doorn and “Wir gehen hin und her” by Lindenau and Kressig (Doorn 2012; Lindenau and Kressig 2019). In the dissertation “Judgements in Equilibrium? An Ethical Analysis of Environmental Impact Assessment” by Melin, an OC is reached in the sense that the author proposes well-founded and consensus-based principles for environmental impact assessment (Melin 2001). In the study “A Rawlsian approach to distribute responsibilities in networks” also by Doorn, an OC was not reached (Doorn 2010b).

3.3 Main Characteristics of the Applications

Figure 3 shows the characteristics of the included WRE applications. In total, 50 equilibria were sought in the 23 studies: 19 equilibria were reached, 22 were not reached, and for nine the studies did not indicate whether an equilibrium was found. Specifically, 16 studies focus on a single equilibrium, two studies address two equilibria, two studies examine three equilibria, one study analyzes four equilibria, another study examines eight equilibria, and finally, one study discusses 12 equilibria. The study discussing eight equilibria is “A Rawlsian approach to distribute responsibilities in networks” by Doorn. The background is that in this study, eight individual equilibria are assessed by participants with the previously described aim of determining whether an OC can be reached (Doorn 2010b). The study examining 12 equilibria is “Judgements in Equilibrium? An Ethical Analysis of Environmental Impact Assessment” by Melin. Here, twelve background theories are considered and it is examined whether a wide reflective equilibrium can be found for each theory (Melin 2001). In 16 applications, there is evidence that an adjustment process took place to achieve equilibrium, and in seven studies, there is no evidence of an adjustment process. The extent of the adjustment process was not determined.
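The distribution of equilibria across studies can be tallied to confirm that it matches the reported totals. The sketch below only recomputes the figures stated in the text; the variable names are illustrative.

```python
# Number of equilibria per study -> number of studies with that count
equilibria_per_study = {1: 16, 2: 2, 3: 2, 4: 1, 8: 1, 12: 1}

n_studies = sum(equilibria_per_study.values())                       # 23
n_equilibria = sum(k * n for k, n in equilibria_per_study.items())   # 50

# Reported outcomes of the equilibria sought
outcomes = {"reached": 19, "not reached": 22, "not indicated": 9}
n_outcomes = sum(outcomes.values())                                  # 50

# Studies with and without evidence of an adjustment process
adjustment = {"evidence of adjustment": 16, "no evidence": 7}
n_adjustment = sum(adjustment.values())                              # 23
```

Both tallies agree with the text: 23 studies and 50 equilibria, with the three outcome groups also summing to 50.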

Fig. 3 Characteristics of the WRE applications

In 13 applications, the object of analysis was based on real circumstances, and in eight applications on fictitious scenarios. In two applications, the ontological status of the object of analysis could not be determined. For the different objects of analysis, different categories were considered at the respective levels of the applications.

At the first level, 17 applications use “judgments” as a category. These are specified in various ways, such as “initial judgments,” “considered judgments,” or “general moral judgments.” Six applications use “intuitions” as a category, and eight applications choose other categories such as “values”, “actions”, “commitments”, or “practices”. Some applications use different categories at the same level. For example, Takahashi mentions the categories “moral judgments,” “moral sense,” and “practices” for the first level (Takahashi 2011: 2). In five applications, the first level is further subdivided with regard to the adjustment process. In this way, the applications of Collste, Meyers and Schroten distinguish between “intuitions” and “considered moral judgments,” Rechnitzer differentiates between “input commitments” and “resulting commitments” and Sheridan distinguishes between “initial moral judgments” and “considered moral judgments” (Collste 1998; Meyers 2014; Rechnitzer 2022; Schroten 1998; Sheridan 2007). On the second level, 21 applications use a category that can be clustered under the term “principles”. In addition to “principles”, “norms”, “moral standards”, “theories” or “rules” are also frequently mentioned as terms. Fung uses “institutions” in his application and in the study by Tiberius and Swartwood “policies” are used as second level categories (Fung 2007; Tiberius and Swartwood 2011). At the third level, “background theories” are used as categories in 14 cases. Five applications use categories that can be grouped under the term “background beliefs”. In six applications “facts and information”, in one application “values” and in another application “practices” are mentioned as categories. In the applications of Collste and Van Delden and Van Thiel four levels are used and in the study of Schroten five levels are considered (Collste 1998; J. J. van Delden and G. J. van Thiel 1998). The categories of the fourth and fifth level have been integrated into the categories of the third level in this review.

The number of elements considered within the levels and categories varies. In twelve applications, 10 or fewer elements were considered. Seven applications identified between 11 and 20 elements. Four applications considered more than 20 elements. These include the two applications by Fung and Melin, which each consider between 21 and 30 elements (Fung 2007; Melin 2001). In addition, there is the application by Van Thiel and Van Delden, which includes between 31 and 40 elements, and the application by Rechnitzer, in which over 100 elements are considered (Rechnitzer 2022; Van Thiel and Van Delden 2016). In 14 studies, the elements were determined by the authors independently, and in seven studies, some elements were collected by the authors with recourse to external data. For example, the elements for the judgment level were collected through surveys, while the elements for the other levels were determined by the author alone. In two studies, all data were collected as part of a case study.

In 13 applications, further criteria were considered in addition to the coherence criterion. In various studies, reference is made to Rawls’ criteria for a “competent judge,” but these criteria are often not addressed in the application itself (Rawls 1951: 183). Exceptions here are, for example, the applications by Collste, Harris, and Lindenau and Kressig. Collste discusses whether the professional personnel in a clinic can meet the criterion of “impartiality” vis-à-vis the patients (Collste 1998: 245). Harris refers to Rawls’ criteria and states that the practitioner must be “normally intelligent,” so that they know the things that affect the world around them and know the “consequences of frequently performed actions,” must be “reasonable,” and have “sympathetic knowledge” (Harris Jr 2005: 72). From Harris’s point of view, the criterion of “sympathetic knowledge” is especially important for normative questions. In his opinion, users should study the literature relevant to the topic in order to acquire this form of knowledge (Harris Jr 2005: 78). Lindenau and Kressig focus on a “competent professional” and emphasize the necessity of “professional virtues” in the moral decision-making process (Lindenau and Kressig 2019: 128).

Lindenau and Kressig also refer to criteria for data collection by stating that it is imperative to consider the interests and rights of the individuals involved (Lindenau and Kressig 2019: 131). Criteria for data collection are further addressed by Neiman. Here, it is noted that various sources for determining weighed judgments have been used to counter the critique of relativism (Neiman 2013: 87–88).

Criteria for the characteristics of the elements considered are named in nine applications. Green focuses on the determinacy criterion and argues that a theory of legitimate expectations must contain four elements: first, it must specify kinds of agents who have moral standing to make claims for transitional relief or incur transitional obligations; second, it must contain a concept of relevant transitional effects; third, it must specify the available responses to a transition claim; and fourth, it must include a set of principles that determine when an agent is entitled to a recognized transitional response (Green 2017: 182–183). Harris, Van Thiel and Van Delden, and Rechnitzer address the criteria of “independent validity” and “independent credibility” of judgments or commitments. Harris connects the criterion of “independent validity” with the criterion of “sympathetic understanding” mentioned earlier (Harris Jr 2005: 78). Van Thiel and Van Delden identify sources of independent credibility such as “durability” (Van Thiel and Van Delden 2016: 163). Drawing on Van Thiel and Van Delden, Rechnitzer also takes up these sources of independent credibility. In addition to the criterion of “independent credibility” of commitments, Rechnitzer considers “theoretical virtues” as criteria for the elements or candidates she subsumes under the term “system” (Rechnitzer 2022: 19). Here, she refers to the virtues of “Determinacy, Practicability, Broad Scope and Simplicity” (Rechnitzer 2022: 114). Similar criteria are found in Tillson, who argues that judgments and principles that are “simplest, least extensive, and most explanatorily powerful” are preferred (Tillson 2018: 209). Lindenau and Kressig list criteria that facts must meet in order to be integrated into the WRE process, such as being “intersubjectively verifiable”, and Takahashi’s study states the criterion that the elements of equilibrium must be “compatible with existing law” (Lindenau and Kressig 2019: 129; Takahashi 2011: 2). Melin emphasizes in his thesis that prescriptivity above all should be seen as a necessary criterion for moral beliefs (Melin 2001: 40). The study by Tiberius and Swartwood lists four criteria that the elements of their WRE must meet: they must be rationally compelling, empirically adequate, action guiding, and neutral with respect to moral theories (Tiberius and Swartwood 2011: 279–280).

The criteria for the characteristics of the elements can also be used as criteria for the adjustment. Beyond that, however, two applications can be found that explicitly refer to adjustment criteria. Willems refers in his application to the six criteria of Beauchamp and Childress to justify an adjustment and especially an infringement of principles. These criteria are as follows: first, there should be stronger reasons backing the overriding norm; second, the moral aim warranting the infringement is realistically achievable; third, no morally superior alternative courses of action exist; fourth, the chosen mode of infringement is the least excessive required to attain the primary goal of the action; fifth, the adverse consequences of the infringement are reduced to a minimum; and sixth, a dialogue regarding the infringement has occurred among the involved actors (Willems 2001: 28). Rechnitzer cites the criterion that the adjustment process must ensure that “the resulting commitments respect the input commitments” (Rechnitzer 2022: 24). Here she refers, first, to the criterion of “independent credibility” already mentioned and, second, to the point that the “resulting commitments do not constitute a radical change of subject when compared with the input commitments” (Rechnitzer 2022: 37–38).

Criteria to evaluate the coherence of each triple set of beliefs are included in the applications of Preusche, Rechnitzer, and Harris. Preusche argues that the system of beliefs should be free of contradictions, mutually explanatory and affirmative. Being free of contradictions expresses the prohibition of inconsistencies and anomalies between elements of the system. The terms “affirmative” and “explicative” are meant to refer to the mutually justifying dynamics of the elements in the system and the explanatory capacity of added elements (Preusche 2017: 77). Rechnitzer argues that it must be possible to extract commitments from the system via inference so that agreement can be measured in the form of an account. In this context, “inferences can be specified as deductive or also as non-deductive inferences and they will typically require that the system is applied to relevant background information” (Rechnitzer 2022: 50). Harris focuses on two requirements, the “interrelationship requirement” and the “strictness-of-relation requirement,” to determine the degree of coherence of the moral system (Harris Jr 2005: 76).

4 Discussion

In the following section, the results are classified in terms of the open questions that exist in the discussion about WRE. A focus is placed on how the WRE method has become established in research and the challenges in its application.

4.1 Results in Terms of Frequency and Feasibility

The results of this systematized search show that the WRE method is used in different areas of applied ethics, for a variety of purposes and in diverse subjects.

In terms of frequency, the results illustrate that WRE is rarely applied explicitly as a method in publications. This is especially visible when we consider that, for the 50 years since Rawls transferred the method to ethics, only 23 publications with explicit applications were found. This assertion is supported by the fact that 52% of the studies had a method-oriented research focus, so the applications were often only exemplary. According to these research findings, the claim of the widespread application of WRE in practical philosophy does not apply to the explicit applications. If it is true that WRE is the central method of practical philosophy, then the applications seem to be mostly implicit. In other words, philosophers use the WRE method to develop their positions and arguments, but merely present their results without explicitly explaining how they obtained them. Regarding explicit applications, there is a small but increasing number, especially in the fields of bioethics, medical ethics, and political ethics. These results are consistent with the previously cited review by Doorn (2010a).

In terms of feasibility, the situation is ambivalent. Wide reflective equilibrium was not achieved in 44% of the identified applications. The reasons for this vary. For example, Berman argues that the task is daunting and that he cannot take the project very far, so he limits himself to two steps (Berman 2011: 261). In contrast, Sheridan does not reach equilibrium because she uses the method only to criticize and reject it. She argues that the method has neither descriptive nor justificatory power because the WRE causes individuals to move too far away from their practices (Sheridan 2007: 187). The fact that 18% of the applications do not make clear whether equilibrium was reached can also be understood as a critical limitation of feasibility. However, equilibrium was achieved in 38% of the applications, at least from the authors’ point of view. Among these studies, there are real and fictitious application examples, as well as four different application purposes, indicating that the method is feasible. Nevertheless, there seem to be barriers and problems that hamper the feasibility of the WRE method.

4.2 Results Regarding the WRE Procedure

Despite some similarities, the results indicate that the WRE method is applied differently by different authors. These differences cannot be attributed solely to the different purposes or subjects of the applications. Even applications with the same purpose and comparable subjects differ from each other in the application procedure. First, there are differences in how detailed and comprehensive the respective applications are. These differences are not particularly surprising, although it should be noted that the number of elements considered does vary considerably.

Second, and also not particularly surprising, the results show differences in application arising from the different interpretations and concepts of WRE. These include the different categories used in the respective applications. One particularly striking difference is how “facts and information” are treated. Authors who consider facts as an independent level usually do it to emphasize the active role of facts. This point is especially evident in the context of Heeger’s network model, which some applications follow (Collste 1998; Rutgers 1998; Schroten 1998; J. J. M. van Delden and G. J. M. W. van Thiel 1998). The consideration of facts and information as an independent level is also understandable from the perspective of empirical ethics. However, this makes questions of adjustment relevant, as it is disputed whether facts can be adjusted in the same way as judgments or principles. Accordingly, the implications for the adjustment process must be carefully considered. Deciding at which point facts and information should be integrated into the WRE requires further analysis.

Third, there are also significant differences concerning the process steps themselves. Contrary to what might be expected from a method, certain process steps are not only carried out differently but in some cases do not occur or are not reported. This becomes most obvious with regard to adjustment. Although adjustment is a fundamental part of the WRE process, and most authors explicitly distinguish between WRE as a process and the equilibrium state, little attention is paid to this process step itself. 70% of the studies refer to adjustment in varying degrees of detail. Generally, reference is only made to the adjustment of one exemplary element. This scarce mention of adjustment can be explained by the fact that the research questions are usually directed at the equilibrium state rather than the process itself. Moreover, it is common in philosophical studies to omit the methods section in publications. However, a transparent explanation of the adjustment process increases the degree of justification by justifying the inclusion of the final elements and the exclusion of alternative thinking concepts. Although a description of the entire, and possibly extensive, adjustment process may be beyond the scope of an application, it seems reasonable to at least indicate that an adjustment has occurred. In addition, it would probably be beneficial to explain the main adjustments and make a comparison between the initial and final elements. The five studies in which the first level elements were subdivided in terms of adjustment can serve as an example. Future applications could benefit from a more intensive discussion of the transparent presentation of the adjustment process and, if possible, the formulation of appropriate guidelines and standards. Taken together, these points raise the question of whether more emphasis should be placed in philosophy on a more transparent description of the methods used.

Another relevant point is the criteria used apart from the criterion of coherence. The fact that criteria were only addressed in 57% of the applications can be assessed as a low application of criteria. Since coherence itself is an ambiguous and theory-based term (Hoffmann 2008: 430), there are few criteria for adjustment and no “metric for coherence” (Tomlinson 2012: 76). However, the fact that even well-known criteria such as independent credibility are rarely addressed shows that there is clear ambiguity in the application of such criteria. Van Thiel and Van Delden point this out regarding the independence constraint: “Currently, more-specific guidance on criteria for selection that allow the Thinker to defend his choice of background theories against critics is lacking” (Van Thiel and Van Delden 2016: 169).

Where the criteria themselves are concerned, it is also important to examine the extent to which the criteria are measurable and comparable. This focus is taken in particular in the application of Rechnitzer, which, for example, sets the standard that criteria such as the theoretical virtues should be measurable at least on an ordinal scale (Rechnitzer 2022: 37). Such concretizations could be first steps toward standardization and seem helpful in increasing the degree of justification and transparency of the applications. Overall, an examination of the criteria indicates that they have the potential to relativize the criticisms of WRE, such as the accusations of subjectivism and relativism. However, care should be taken that the criteria do not limit the practicality of the WRE. Instead, the criteria should provide orientation and guidance to users. The extent to which the criteria identified in this work fulfill this requirement cannot be answered within the scope of this work. The number of identified applications and the experiences with the criteria seem to be too small to draw general conclusions.

4.3 Limitations of this Study

This study is limited to applications that consider at least three levels, that name the categories and elements of the levels, and that identify a connection between the levels. These criteria are based on Daniels’ definition with the variation that it is not specified which categories should be located on the levels. These criteria made it possible to distinguish between the WRE and the NRE and to justify the inclusion of studies. However, these criteria are also associated with inaccuracies that can lead to bias. One point to consider is when exactly the categories and elements of the levels count as clearly designated. Here, there remains some scope for interpretation. Based on this criterion, studies were excluded that could well be understood as applications, such as “Disputes in just war theory and meta-theory” by Long, “Grounding Rights and a Method of Reflective Equilibrium” by Nielsen, or “Liberalism and Public Health Ethics” by Rajczi (Long 2012; Nielsen 1982; Rajczi 2016).

It is also the case that by softening Daniels’ definition, studies that are considered as WRE applications in this review could be understood as NRE. This applies to “A Rawlsian approach to distributing responsibilities in networks” by Doorn and “Democratic Theory and Political Science: A Pragmatic Method of Constructive Engagement” by Fung, which tend to assign their work to the version of NRE (Doorn 2010b; Fung 2007). In general, it should be noted that there is no established reporting scheme for the WRE topic area examined, so the categories used to describe the studies sometimes remain fuzzy.

Further, studies that do not contain the term reflective equilibrium in the title or abstract and, therefore, were not found in the systematized search may not have been considered. For example, studies seeking OC using the WRE method may not have been systematically included. The same applies to studies that refer to Heeger and Van Willigenburg, for example, and speak of the “network model” instead of WRE. In particular, older studies, which often do not include an abstract, may have been overlooked by the search strategy.

As with any literature search, this review cannot be 100% sensitive. For example, it is possible that philosophical literature such as dissertations published as books and not listed in the specific philosophical series of the Philosopher’s Index may have been overlooked. In addition, it is possible that a bias exists with respect to specific academic disciplines or country-specific publications despite the consideration of various databases.

The method of systematized and critical review allows an overview of the explicit applications of the WRE method published so far. Since the review has to be limited to publications, no statement can be made about whether philosophers use the WRE in their daily work, for example when they set up their arguments. The implicit applications are thus beyond the scope of this review.

A systematic review is usually prepared by two independent reviewers. A limitation of this study is that a dual independent review was not conducted. However, this limitation is probably less relevant, as study selection errors or extraction errors only lead to relatively small biases in the results. This situation would be quite different for typical topics of health economic reviews, such as a clinical trial for reimbursement decisions, where individual studies could significantly bias the results of the review.

4.4 Implications for Further Research

The results of this review seem to support Hahn’s thesis that the WRE should be regarded as a metaphor rather than a standardized method. To further develop the WRE into a method, the focus should be on developing standards and guidelines. This is especially the case for the description of the adjustment process and the use of WRE criteria. With regard to the adjustment process, it is necessary to clarify which steps the adjustment process specifically comprises and which steps should be reported. Furthermore, the point at which the adjustment process concludes and the precise meaning of “agreement” need clarification. The question of which criteria are helpful for which purposes of application should also be answered. It should additionally be examined whether there are differences in the respective disciplines with regard to these questions. For example, it is to be assumed that the application of the WRE in the context of business ethics requires different standards than the application in the field of technology ethics. Further research is also needed to distinguish between the different types of reflective equilibria.

Answering these questions requires further applications so that the advantages and disadvantages can be examined in an evidence-based manner. At the same time, further reviews could be helpful in determining the frequency of a characteristic and its expression so that a comparison of the different approaches would be possible. Such reviews could be helpful for the development of standards and guidelines. The low number of explicit WRE applications and the relatively high number of studies excluded because of the absence of three levels raise the question of whether NRE is more widespread than expected. A systematic review of NRE seems an interesting topic for further research.

5 Conclusion

The aim of this review was to provide an overview of the applications of WRE to date, focusing on the characteristics and methodological details and placing them in context with the WRE debate. The results of the systematized search show that WRE is applied in different ethical fields, for different purposes, and to different topics. First, this indicates that WRE is quite relevant and has the potential to be established as a justification method. At the same time, it was found that the number of applications is relatively low and that applications differ considerably even when they have similar purposes. Although different approaches are not fundamentally problematic, it seems that future applications would benefit from specific application steps being more standardized. This is especially true for the adjustment process and the criteria used. Here, further research is needed to determine which aspects of the adjustment should be reported and which criteria prove to be particularly purposeful. By answering these questions, the degree of justification and transparency of the WRE should be increased so that the criticisms of relativism and subjectivism can be countered more effectively.