INTRODUCTION

The rich contextual and interdisciplinary nature of international business (IB) allows us to better understand topics that are often neglected by other fields because they do not fit their main paradigm. Still, most of the advances in IB are based on a single study or on a series of studies that build on assumptions made in the seminal original. What is missing in IB, as well as in many other social sciences, is the kind of replication that can confirm or counter the findings of extant empirical work (Aguinis, Cascio, & Ramani, 2017; Bettis, Helfat, & Shaver, 2014, 2016; van Witteloostuijn, 2016, 2020; van Witteloostuijn, Dejardin, & Pollack, 2018; Walker, Brewer, Lee, Petrovsky, & van Witteloostuijn, 2019). The reason for this lacuna is that replication studies are often dismissed as “not original”. This considerably lowers their chances of being published in leading journals, obviously discouraging anyone from conducting them (Easley, Madden, & Gray, 2013; Evanschitzky, Baumgarth, Hubbard, & Armstrong, 2007; McKubre, 2008; Reid, Soley, & Winner, 1981; Rosenthal, 1990). This is harmful to the field. Much of our foundational knowledge is from work that has never been “tried and tested” by replication, whilst we well know that, methodologically, a single study seldom generates robust evidence.

We base our argument on the Popperian tradition, and adopt the definition of replication given in A Dictionary of Social Sciences (Gould & Kolb, 1964, p. 748): a scientific method of verifying research findings through the “(…) repetition of a research procedure to check the accuracy or truth of the findings reported.” Replication studies are vitally important as a means of (a) filtering out false positives, (b) producing robust evidence regarding the size of an effect, (c) providing greater generalizability of findings, (d) setting boundary conditions for findings, and (e) suggesting factors that may harmonize findings with those of other related studies. Systematic replication can lend credence to extant knowledge, and thereby advance theory. Far from being mere duplication, replication adds a dimension to prior work. It can provide support for prior findings and, where results differ, open the door to productive debate. It is therefore misguided to frame a replication as evidence of “failure or success”: the objective is a more solid foundation for our collective work. In that sense, irrespective of its findings, any well-executed replication must be seen as a success.

IB especially would benefit from well-developed replication studies. The array of topics and the extensive geographical scope covered in our work can exacerbate the data limitations inherent in empirical studies. Many primary and secondary datasets are limited to a single industry or subset of industries, to a single country or subset of countries, and so on – be it firms, regions, consumers, managers, or any other entity. It may or may not be possible to generalize the ensuing findings to other samples or settings. Indeed, we routinely acknowledge this in our articles as a limitation, often adding that future research might examine our findings in different settings, but such studies are very rarely conducted – neither by the original authors nor by others. As a scholarly community, we simply fail to work on the very limitations we acknowledge.

The contributions of this editorial lie in showing how replication studies can add to our body of knowledge and fine-tune theory. First, replication studies, like all good research, can be an instrumental part of knowledge-building by addressing important what, how, and why questions: what variables are used, how they are related, and why these connections exist. Second, replications can fine-tune theory by considering its where, who, and when aspects, establishing boundary conditions in terms of time and space. We note that Bacharach (1989: 498) cautions that “Theories cannot be compared on the basis of their underlying values, because these tend to be the idiosyncratic product of the theorist’s creative imagination and ideological orientation of life experience”. If replication studies are to add to and fine-tune theory, they must be done well. Thus, we provide a step-by-step template with specific guidance on how to conduct them. We also introduce and discuss the possibility of adopting pre-registration as a way to increase credibility – of replication studies, and of any other empirical study for that matter. Finally, we propose best practice guidelines on how IB journals can see to it that replication studies become part of the research mainstream.

The remainder of the paper is organized as follows. The focus of the next section is on why replication is important in general, and in particular for IB. The third section documents the rarity of replication in IB, and explains why. The fourth section turns to the types of replication studies, and the fifth provides a replication study template with specific steps and best practices, including a discussion on the possibility of pre-registration as a means of increasing the credibility of studies. The sixth section includes a number of suggestions on how journals can encourage the submission of replication studies and facilitate their publication. We end with a Discussion and Conclusion section.

WHY WE NEED TO REPLICATE

Replication studies are crucial to any scientific field because they ensure that research is not built on the biased findings of a large stock of single studies (Walker et al., 2019). This is commonly acknowledged across many disciplines, as is the fact that a failure to conduct replication studies will eventually culminate in a ‘replication crisis’ (see, e.g., Loken & Gelman, 2017; Schooler, 2014). It should be obvious that trying to develop a field of research based on single studies piled one upon the other is like trying to erect a house of cards. One might even argue that a single observation is tantamount to no observation at all in any research tradition that involves contextual and probabilistic elements. Yet, running counter to this is a widespread publication bias in favor of “positives” (Ioannidis, 2005). We will argue that, for a number of reasons, IB is a field for which replication is particularly important.

The grounds for carrying out replication studies in social science can be found in the Popperian philosophy of science (cf. Popper, 1935/1959; see also Lakatos, 1976), according to which evidence in favor of a specific hypothesis should not be regarded as confirmation that the hypothesis is true, but only that it has not (yet) been falsified. Popper’s falsification principle holds that hypotheses survive only until outcompeted in later battles. In contradiction to this principle, scholars in IB and in other fields that rarely – or never – practice systematic replication (Starbuck, 2016; van Witteloostuijn, 2020) have fallen into the habit of declaring a hypothesis ‘confirmed’ or ‘supported’ relative to a theory-less null simply because an arbitrary p value threshold is met (normally, p < .05). In an earlier editorial, JIBS set out why IB researchers should stop p-hunting for asterisks (Meyer, van Witteloostuijn, & Beugelsdijk, 2017). In this complementary editorial, we hope to establish the case for systematic replication work.

Indeed, the reasons for replication are manifold and well established in the methodological literature. One of them is the need to accumulate findings to establish their robustness. Empirical research in IB, as in the other social sciences, is largely probabilistic and hence intrinsically error-prone (as are many of our measures). It is also design- and context-specific. In quantitative work, the focus of this editorial, an estimate is but a first approximation of the “true” effect, both in terms of significance and size. Any estimate is conditional on the sample, context, period, measures, methods, analyses, and much more. Anyone who has ever run a regression knows that the effect size and significance level of any coefficient are sensitive to the choices made along the way – to the addition of controls or the removal of outliers, for instance. This is why it has become common for scholars to provide ever more sophisticated robustness checks and sensitivity analyses. Editors and reviewers have come to expect these, as well as a transparent discussion of the boundaries of the analysis and hence of the findings. This is good as far as it goes, but as we explain in greater detail below, robustness checks and sensitivity analyses do not make up for the lack of systematic replication studies.

Replication is anything but a way of merely adding spice to research, and IB is no exception (Aguinis et al., 2017). Many scholars in other fields have recognized the need to replicate, for example in strategy (Bettis et al., 2016; Hubbard, Vetter, & Little, 1998), in management (Singh, Ang, & Leong, 2003; Tsang & Kwan, 1999; Uncles & Kwok, 2013), in advertising (Kerr, Shultz, & Lings, 2016), in information systems (Berthon, Pitt, Ewing, & Carr, 2002), in sociology (Freese & Peterson, 2017), in social psychology (Brandt, Ijzerman, Dijksterhuis, Farach, Geller, Giner-Sorolla, Grange, Perugini, Spies, & Van’t Veer, 2014; Fabrigar & Wegener, 2016; Schmidt, 2009), in education (Makel & Plucker, 2014), and in public administration (Jilke, Petrovsky, Meuleman, & James, 2017; Pederson & Stritch, 2018; Walker, Brewer, & James, 2017a; Walker, Lee, & James, 2017b, 2019). The list may be long, but recognition alone is not enough to turn ambition into action. In practice, replications remain scarce in these and many other disciplines.

In some fields, replication studies have long been seen as little more than “copies” that entail no original work. Scholars in management fields, including IB, have been socialized by their training to think that way. Those who believe that replication work should be an integral part of their research find themselves stymied by a deeply ingrained bias against replication. Social science journals rarely publish replication studies, and those that do give them such a limited platform that they represent a tiny fraction of the published output (see below). To turn things around demands a sea change – beginning with academic journals. There needs to be a clear path for them to follow if replication studies are ever to enter the mainstream. It is worth the effort. After all, replication makes it possible to generalize findings, to identify their critical boundary conditions, and to uncover missing variables that might harmonize findings across studies. In the end, our field will become stronger and more mature, and will serve as an example for others to emulate.

WHY WE HAVE FAILED TO REPLICATE

The lack of receptiveness on the part of journals goes hand-in-glove with reluctance on the part of researchers. It is simply human nature: there is little incentive to devote time and effort to conducting a replication study. Intrinsically, a researcher may be motivated by the desire to make a novel contribution rather than to repeat the work of someone else. There may also be some discomfort about the possibility of being seen as part of some kind of truth police – something not likely to contribute positively to one’s reputation (Koole & Lakens, 2012). Extrinsically, academic institutions often have in place a wide variety of practices that effectively act as disincentives to replicate (van Witteloostuijn, 2016), chief among them the rewards associated with publishing in top journals – and top journals in the social sciences, including IB, rarely publish replications. Hence, concentrating on replication is very unlikely to boost an academic career. What might academic institutions do to change this? Given the lack of intrinsic motivation, it makes sense to concentrate on institutional incentives. Even though “good intentions” are not enough, it is a good sign that some top journals have begun to promote replication.

Incentives to replicate are much needed. The social sciences as a whole have a very weak record. Van Witteloostuijn and van Hugten (2021) looked at 18 top journals in six social science disciplines, including business and management. Not one of the 148 quantitative empirical studies in their sample is a replication study, and only three report replication-like results. While there is no IB journal included in the van Witteloostuijn and van Hugten sample, in a 2017 JIBS editorial, Meyer, van Witteloostuijn, and Beugelsdijk conclude that the situation is very likely to be no different in IB (Aguinis et al., 2017). In a bid to see where the IB community stands in 2020, we conducted a follow-up study.1 We first used the following search string in Scopus:

TITLE-ABS-KEY (replicat*) AND EXACTSRCTITLE ("international business" OR "international management") AND (LIMIT-TO (DOCTYPE, "ar"))

Thus, we asked the Scopus algorithm to search, in journals having “international business” or “international management” in their title, for articles in which the title, abstract, or keywords include the word “replication” or a variant thereof.2
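
For readers who wish to reproduce or extend this literature search programmatically, the sketch below illustrates how such a query could be submitted to the Scopus Search API from Python. It is a minimal illustration rather than a record of our procedure: it assumes access to an Elsevier API key, and the response field names follow Elsevier’s public documentation. Note that the interactive refinement LIMIT-TO(DOCTYPE, “ar”) is expressed here with the advanced-search field DOCTYPE(ar), which likewise restricts results to journal articles.

    # Minimal sketch: submitting the replication search string to the Scopus
    # Search API (https://dev.elsevier.com). Assumes a valid API key; response
    # field names follow Elsevier's documented JSON format.
    import requests

    QUERY = ('TITLE-ABS-KEY(replicat*) AND '
             'EXACTSRCTITLE("international business" OR "international management") '
             'AND DOCTYPE(ar)')

    response = requests.get(
        "https://api.elsevier.com/content/search/scopus",
        headers={"X-ELS-APIKey": "YOUR_API_KEY"},  # placeholder key
        params={"query": QUERY, "count": 100},
    )
    response.raise_for_status()

    entries = response.json()["search-results"].get("entry", [])
    print(len(entries), "candidate articles returned")
    for e in entries:
        # Every hit still requires manual screening to verify that it
        # actually reports a replication study.
        print(e.get("prism:coverDate"), e.get("prism:publicationName"),
              e.get("dc:title"), sep=" | ")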

The IB journals thus included are the Journal of International Business Studies (JIBS), International Business Review (IBR), Management International Review (MIR), and the Journal of International Management (JIM) (Tüselmann, Sinkovics, & Pishchulov, 2016).3 We manually added the Global Strategy Journal (GSJ) and the Journal of World Business (JWB). Our count includes all articles ever published in these and other IB journals, from their origins to October 2020. Our search string produced 59 hits in total. Among the well-known journals listed above, after manual screening, only eight articles remained that actually reported findings from replication studies. Another six appeared in other IB journals. JIBS, arguably the leading IB journal, published only two replication studies, both more than two decades ago, in 1999 and 2000. JWB also published two, albeit more recently, in 2006 and 2013. The other journals published only one replication study each: GSJ in 2015, IBR in 2005, JIM in 2016, and MIR in 2020. The remainder were published in other IB journals: Critical Perspectives on International Business, European Journal of International Management, International Business Management, and the Journal of Teaching in International Business – one each in the first three, in 1999, 2005, and 2010, respectively, and three in the last, very recently, in 2020.4 In Table 1 we provide the number of replication studies in given years and the journals in which they appeared, as well as the complete references.

Table 1 Replication studies published in IB journals

We reach two conclusions. First, 14 replication studies published across the broad landscape of IB journals over the roughly five decades since the first IB journal was established is an extremely low number. Second, the number of replication studies does not reveal a clear time trend in any direction. We cannot but conclude that the field of IB still has a long way to go. With this editorial, we hope to stimulate change, just as JWB has started to do with its recent launch of a replication special issue.

TYPES OF REPLICATIONS

There are multiple interpretations of what constitutes a replication. One can simply distinguish between replication and replication with extension (Brown & Coney, 1976). Bettis et al. (2016) offer a more complete classification. Perhaps the best-known classification is that proposed by Tsang and Kwan (1999), recognized now as a landmark in the management field. Their classification includes six different types based on two dimensions, (i) research design and (ii) source of data. Researchers may choose to keep the same research design (i.e., variable measurement and method of analysis) as that of the original study, or they may change one or both of these research design components. They may also choose to replicate the study on exactly the same sample as the original study, or with a different sample within the same population, or with a different population altogether (population referring to location, industry, time period, and subjects). Table 2 maps the six types of replication studies set out by Tsang and Kwan (1999), although we rename some of the categories to make them more intuitive, as described below.

Table 2 Types of replications: theory-checking vs. theory fine-tuning

Data source                          Same research design        Different research design
Same data set                        Reproduction                Re-analysis of data
Different sample, same population    Direct replication          Conceptual extension
Different population                 Empirical generalization    Generalization and extension

Theory-checking types (in shaded boxes in the original table): reproduction, re-analysis of data, and direct replication. Theory fine-tuning types (in bold in the original table): conceptual extension, empirical generalization, and generalization and extension.

The narrowest type of replication uses the same data and research design as that of the original study – that is, the exact same analyses are conducted on the same sample. The intention is to check for possible errors in the original study. Tsang and Kwan (1999) refer to this as “checking of analysis”. We rename it reproduction to make clear that this involves an effort to reproduce as closely as possible the original study. This type of analysis is generally used by journals, in economics and finance for example, as part of their data transparency policy, and has more recently been adopted as an option by some business and management journals, including JIBS (Beugelsdijk, van Witteloostuijn, & Meyer, 2020).

A less narrow type, referred to by Tsang and Kwan (1999) as re-analysis of data, a term we adopt, uses the same data but a different research design, either in terms of variable measurement or data analysis. This type serves as a sensitivity analysis of the research design, and is usually performed by authors to prove the robustness of their baseline findings. Take for example a study of firm internationalization that uses different measures of internationalization, such as exports and foreign sales over total sales. Methodologies and computing have advanced considerably in the past few decades, so an older study using a straightforward linear regression analysis might be replicated with a multi-level methodology that may be more appropriate for testing the same data, so as to see if changing the measurements and/or the analyses would yield the same results.
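
To make the re-analysis idea concrete, the sketch below contrasts a pooled OLS specification with a multi-level re-analysis of the same (hypothetical) dataset using Python’s statsmodels. The file and variable names – innovation, foreign_sales_ratio, firm_size, country – are ours, purely for illustration, and the point is only to show the mechanics of changing the method of analysis while holding the data constant.

    # Sketch of a re-analysis of data: one (hypothetical) dataset, two research
    # designs. All column names are illustrative, not taken from any actual study.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("firm_internationalization.csv")  # hypothetical data file

    # Original design: pooled OLS that ignores the nesting of firms in countries.
    ols_fit = smf.ols("innovation ~ foreign_sales_ratio + firm_size", data=df).fit()

    # Re-analysis: random-intercept multi-level model, arguably more appropriate
    # for cross-country IB data with firms nested in countries.
    ml_fit = smf.mixedlm("innovation ~ foreign_sales_ratio + firm_size",
                         data=df, groups=df["country"]).fit()

    # Does the focal estimate survive the change of research design?
    print("OLS:        ", ols_fit.params["foreign_sales_ratio"],
          ols_fit.bse["foreign_sales_ratio"])
    print("Multi-level:", ml_fit.params["foreign_sales_ratio"],
          ml_fit.bse["foreign_sales_ratio"])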

From the perspective of the Popperian philosophy of science, neither reproduction nor re-analysis of data can be seen as ‘real’ replication, as the underlying data – and hence the analyzed observations – are identical (Schmidt, 2009). Popper (1959: 45) argued that “[o]nly when certain events recur in accordance with rules or regularities, as in the case of repeatable experiments, can our observation be tested – in principle – by anyone. (…) Only by such repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence’, but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable.” So, for replication to solidify knowledge, it must add new observations to the extant stock of findings. The other replication types of Tsang and Kwan (1999) do this, offering complementary methodological tools that allow us to come closer to the truth. In this editorial, we argue for these remaining four types of replication to be promoted in IB (see Table 2).

Replication can also be used to assess the robustness of the analysis in terms of the reliability and representativeness of the sample. In this case, the analysis is re-run on a different sample from the same population, using the same measurements of constructs and methods of analysis as the baseline. Tsang and Kwan (1999) call this type “exact replication”. Along with Schmidt (2009) and Walker et al. (2019), we call it direct replication. An example would be a study originally conducted on a random sample of workers in a company that is replicated using a different random sample of workers in the same company, while still using the same variables and analyses as the original study. Clearly, direct replications add new observations to the extant stock of findings, but without sampling from another population.

The three types of replications we have described so far are clustered in Table 2 under the heading of theory-checking replications (in shaded boxes). Different types of sensitivity analyses (i.e., re-analysis of data) are increasingly required by journals, especially the top ones, and reproduction is more and more established as a means of checking analyses as a core part of the data transparency policy of many journals. Hence, in a Popperian optic, these two types of replication make only a limited contribution, if any at all. It is fair to say that direct replications do contribute, but they are limited to examining sample reliability and representativeness. That being the case, they do not bring anything to the table in terms of fine-tuning theory. This is why in Table 2 we set these three types of replication apart from the other three. The conceptual extension, empirical generalization, and generalization and extension types can be fruitfully used to fine-tune theory; we therefore name this set theory fine-tuning replications (in bold in Table 2).

Conceptual extension is a type of replication that does not use the original study sample, but nonetheless uses one within the same population in combination with different construct measurements or methods of analysis. This may or may not contribute to fine-tuning extant theory, depending upon the precise nature of the extension and the results. This type of replication has two forms: (i) ex ante fine-tuning of theory (i.e., an extension of theory), usually based on the same theory as the original study with an a priori extension through deduction; (ii) ex post fine-tuning of theory, which does not involve its ex ante extension, but has the potential to extend and revise it through ex post abduction if the replicated findings turn out to be different from those in the original study.

The conceptual extension type of replication might, either ex ante or ex post, (a) question what variables are involved in a causal model, (b) shed light on how these variables relate to one another by, for example, identifying confounding factors, and (c) help improve the theory as to why these variables are connected (i.e., elucidates the causal mechanisms connecting these variables). The theory in the original study may have to be adapted as to (a) the relative importance of the different hypothesized independent variables and their expected signs, (b) the relevance of moderation effects, and/or (c) the signs in the hypothesized relationships. In an ex ante approach, hypotheses regarding (a), (b), and/or (c) are deductively formulated before conducting the replication; in an ex post approach, they are developed abductively if the results of the replication study suggest that an alternative explanation is required. A hybrid approach can be adopted as well, with some new hypotheses provided ex ante (for example, regarding point (a) only), and some ex post.

Consider a study, based on survey data and standard ordinary least squares regression, suggesting that reverse knowledge transfer from US subsidiaries to their UK headquarters improves the innovation performance of the MNE. A conceptual extension could replicate the study by (a) using patent citations as a proxy for subsidiary-to-parent knowledge flows, (b) re-assessing the confounding role of subsidiary capabilities or of the similarity of the knowledge bases of subsidiary and parent, and (c) further exploring the modularization of R&D projects across the multinational network as an alternative causal mechanism. A full ex ante approach requires developing new hypotheses before conducting the replication. If the replication provides convincing evidence that the originally hypothesized positive relation between reverse knowledge transfer and MNE innovation performance is in fact significantly negative with a large effect size, researchers using an ex post approach would then search for a plausible explanation.

Next, empirical generalizations are replications that use the same research design as the original study, but draw data from a different population. They make it possible to test the generalizability of the original study to a different cultural, geographical, or institutional context, as well as to different industries, organizations, time periods, and levels of analysis. Replications that come up with findings different from those of the original study can help establish the boundaries of the original theory. Indeed, they can lead to the development of an improved theory that explains (i) why the original theory does not seem to work outside its boundaries and (ii) what happens outside these boundaries. As in the case of conceptual extensions, the approach can be one of ex ante deduction or ex post abduction.

Consider a study of the internationalization of state-owned Chinese companies in the early years after the opening up of the Chinese economy that uses data on acquisitions. The generalizability of the “where” dimension could be examined by looking at state-owned companies from a different emerging country in the same period. To test the generalizability of the “who” aspect one might look at private Chinese companies in the same period. The generalizability of the temporal dimension could be probed by looking at state-owned Chinese companies in a more recent period. If they produced results at variance with those of the original study, replications would make it possible to extend and improve the theory on emerging market multinationals. Suppose, for instance, that a replication with recent data provides evidence for a switch away from acquisitions and towards greenfield investments. Then, the original theory would need to be adapted in order to find a plausible explanation for the shift.

Finally, generalization and extension is a type of replication in which both the data source and the research design differ from the original study. If the results of the original study are confirmed, then they are generalizable in terms of context and research design; if not, the reasons may be difficult to identify (Lindsay & Ehrenberg, 1993). Thus, this type of replication is best used as the final step in a sequence that runs from direct replication, through conceptual extension and empirical generalization, to generalization and extension (again, using either an ex ante deductive or ex post abductive approach). In such a sequence, the empirical foundation of a theory will have been firmly established in the first three steps, making it easier to identify the reason for disconfirmation in the last.

For example, one could do a direct replication of the study of the internationalization of Chinese state-owned companies using a different sample of the same population but with the same variables. The original study could be conceptually extended by taking, for example, exports as a proxy for internationalization. Further, it could be empirically generalized, for example along the where dimension, as illustrated above. Finally, the study could be generalized and extended by simultaneously considering all the variations each type of replication has sequentially made to the original study. The set of results coming from such a sequence of replications might either provide robust evidence regarding effect sizes and significance levels or, if they are at variance with the original results, lead to a modification of the original theory.

Were time and other resources limitless, and had the authors of a replication study full access to the original datasets and analyses, they would ideally conduct each of the six different types of replication in turn, as part of a systematic replication effort. See Figure 1 for a list of ‘Replications by Increased Complexity and Distance from Replicated Study’, adapted and extended from Tsang and Kwan (1999) and Walker et al. (2019). The latter provide a similar list, but only for four of the six types and in a different order.

Figure 1 Replications by increased complexity and distance from replicated study. Source: Adapted and extended from Tsang and Kwan (1999) and Walker et al. (2019).

Conducting different replication types in sequence would allow researchers to examine the potential generalizability of findings in different contexts, especially if they move beyond the reproduction and re-analysis types, as it is then that the replication cycle starts to add new observations to the extant stock of findings. Researchers might report the findings of their reproduction and re-analysis replications before turning to those of more advanced types. All theory fine-tuning types of replication extend and improve theory by stimulating the development of hypotheses on the “what”, “how”, “why”, “where”, “who”, and “when” of the original study.

We end this section with two remarks. First, our distinction between ex ante (deductive) and ex post (abductive) types of replication is crucial given the research practice of HARKing (Hypothesizing After the Results are Known). HARKing is a questionable research practice if it is not done in the open – that is, when ex post hypotheses that fit with the findings are presented as if they were developed ex ante (Kerr, 1998; see the general discussion in an IB context, not limited to replication, in Meyer et al., 2017). In other words, abduction is a credible theory-developing practice as long as it is not deduction in disguise.5 With that caveat, it has a place in the IB methodological toolkit along with deduction and induction (Dikova, Parker, & van Witteloostuijn, 2017; Schurz, 2008).

Second, while the number of replication studies in IB is stagnating at a very low level, that of meta-analyses is on the rise (Steel, Beugelsdijk, & Aguinis, 2021). The studies that are input to meta-analyses have similar controls or dependent variables, but are based on different theories and use different specifications. Very few of them are replication studies in the Popperian sense as defined in this editorial. Replication studies are very different from meta-analyses in design and philosophy, aiming to explicitly compare findings vis-à-vis an original study. The next section provides practical guidelines for their conduct.

A REPLICATION PROCESS TEMPLATE

To help increase the number of replication studies in IB, we propose a template based on Walker et al. (2019), which we adapt, extend, and break down into additional steps. The template has three parts, which one might see as a to-do list: (i) determine replication desirability and viability, (ii) determine replication study design specifics, and (iii) conduct the replication study and report findings (see Figure 2).

Figure 2 Replication template. Source: Adapted and extended from Walker et al. (2019).

Each of the parts is made up of a number of decisions to be made and subsequent steps to be taken. We discuss them in turn. Note that although our primary focus is replication as a means of fine-tuning theory, much of what we write applies to replication as a way of theory-checking as well.

Part 1: Determine Replication Desirability and Viability

What might be gained by conducting a replication study? Is it feasible to conduct one? Answering these questions requires evaluating the design of the original study and determining its statistical power. Three crucial decisions must be made at this stage.

Step 1. Desirability of the replication. First and foremost, it must be decided whether it is desirable to replicate a study. The object of replication should be prior work that is both highly influential and in need of theory improvement. Thus, before embarking on a replication study, the original study should be examined with a careful eye to its theoretical contribution. Would a replication study be likely to further it?

Next, what relevant aspects of the original study might be changed? Would the goal be to shore up the extant body of knowledge, or to fine-tune theory? The “what”, “how”, “why”, “where”, “who”, and/or “when” questions should be asked and answered. Another decision then comes to the fore: What type of replication study should be conducted – a direct replication, a conceptual extension, an empirical generalization, or a generalization and extension? This depends on the suspected theoretical weaknesses of the prior work, and on the availability of other data and methods. If alternative measures of key constructs or methods of analysis are available, a conceptual extension may yield an advance in knowledge. If the generalizability of the original article is in doubt, then an empirical generalization replication would be preferable. If both the measures and the methods of the original article might be improved, and its external validity is doubtful, then a generalization and extension type of replication is the way forward.

Step 2. Viability of the replication. It is imperative that there be a careful evaluation of the design of the original study – no replication should be attempted without first having a sufficient grasp of the original. Also crucially important is determining whether there is access to the data originally used, and whether insight into the design of the original study is sufficient. Access and insight may be facilitated by involving the original author(s), but this must of course be carefully weighed against the possibility of compromising the impartiality of the replication.

Step 3. Internal validity. Having come to this point, there needs to be an in-depth analysis of the internal validity of the original study – that is, its procedures, measures, and analyses. This needs to be done not only to further assess the feasibility of replication, but also to determine whether improvements to its internal validity can be made. Does the original study advance a convincing causal chain and rule out alternative explanations; and if not, how might a replication?

Part 2: Determine Replication Study Design Specifics

In this second stage, authors decide how to design their study.

Step 4. Type of replication. Which of the six replication types would be most appropriate? We believe that direct replication, conceptual extension, empirical generalization, and generalization and extension are best suited to Popperian knowledge validation and theoretical fine-tuning. Reproduction or re-analysis of data can be part of a replication sequence that leads to knowledge-validating and/or theory fine-tuning replication types, as discussed above. Although oftentimes a full sequential replication series is desirable, the reality is that resource availability can be a constraint. Nonetheless, even without a full sequence, the baseline in any theory fine-tuning replication is likely to be a theory-checking analysis (particularly direct replication).

Which of the three theory fine-tuning replication types should one choose? The best way forward is to proceed sequentially. First, if superior measures and methods are available, one might conduct a conceptual extension by adopting them; this will either yield similar results or different ones, the latter requiring fine-tuning of the theory. Second, if little is to be expected from alternative measures or methods, one might probe the degree of external validity of the original study by conducting an empirical generalization replication. In which contexts are the original findings expected to be replicable, and in which not? To answer that question, one must identify the boundary conditions of the theory, and test original and extended theories both in samples where the boundary conditions are expected to hold and in samples where they are not (cf. John Stuart Mill’s method of agreement versus difference in Boone, Meuwissen, & van Witteloostuijn, 2009). Third, if the scope of application of the original theory is well established, one might opt for a generalization and extension replication, using a different population, different variable measurement, and a different methodology in order to examine the value added of theoretical extensions.

Step 5. Determine variations. Depending on the type of replication chosen, one must determine whether a different data source and research design (including different measures and methods) should be used rather than those of the original study. The examples given above of what this may imply for each replication type serve here as well.

Step 6. Statistical power. By statistical power we mean the ability of a statistical estimation to detect meaningful effect sizes. If optimization of the power of a replication study is a goal, one can pursue different avenues depending on the type of replication, balancing the cost of extra data collection against the benefit of larger sample size. Given the sample and effect size of the original study, one must calculate the power – and hence sample size – needed for a meaningful replication.
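
As a concrete illustration of this step, the sketch below uses Python’s statsmodels to compute the sample size a replication of a (hypothetical) two-group comparison would need in order to detect the originally reported effect size with 80% power, and, conversely, the power the original design would offer if the true effect were smaller than the published one. All numbers are placeholders.

    # Sketch of a power calculation for a replication (two-group comparison).
    # The effect size is a placeholder; substitute the value reported in, or
    # recomputed from, the study being replicated.
    from statsmodels.stats.power import TTestIndPower

    original_effect_size = 0.30  # hypothetical Cohen's d from the original study
    analysis = TTestIndPower()

    # Sample size per group needed to detect the original effect at 80% power.
    n_required = analysis.solve_power(effect_size=original_effect_size,
                                      alpha=0.05, power=0.80,
                                      alternative="two-sided")
    print("Observations needed per group:", round(n_required))

    # Power of the original design if the true effect were half the published
    # one -- a common concern given publication bias.
    power_check = analysis.solve_power(effect_size=original_effect_size / 2,
                                       alpha=0.05, nobs1=175)
    print("Power at n = 175 per group:", round(power_check, 2))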

Step 7. Boundary conditions. This step entails establishing the boundary conditions of the theory by exploring its “where”, “who”, and “when” aspects. This is especially relevant in empirical generalization and generalization and extension replications, because these replication types are based on selecting a different population than that in the original study. Establishing the boundary conditions of the original theory is likely to require a thorough grasp of where, when, and for whom we expect the originally proposed causal mechanisms to hold and not to hold. Depending on the answers to these critical questions, a decision must be made regarding samples and measures, and perhaps methods, that might throw light on the boundary conditions – in the case of empirical generalization – and/or suggest alternative explanations – in the case of generalization and extension.

Step 8. Content validity. In this step, there needs to be an assessment of alternative methods. What is most appropriate, given the research question and the data? The reliability and validity of new variable measurements needs to be assessed. This is especially important in the case of conceptual extension and of generalization and extension replications, which involve changes in the research design.

Step 9. Pre-registration considerations (optional). Pre-registering the front end of the replication before the data is collected and analyzed increases the credibility of findings. This is a well-established practice in some fields (e.g., Nosek & Lindsay, 2018; Nosek, Ebersole, DeHaven, & Mellor, 2018), although it has received very little attention in IB (Meyer et al., 2017; van Witteloostuijn, 2016). With pre-registration, the front end of an empirical paper – theory and a detailed description of planned data collection and analysis – is submitted or otherwise made public before the data is collected or analyzed. This ensures that authors will not be tempted to non-transparently change the theory, methodology, or both, depending on the results.

There are two types of pre-registration: unreviewed and reviewed (van’t Veer & Giner-Sorolla, 2016). With unreviewed pre-registration, authors submit their proposal to a repository before collecting the data and running analyses, thus committing to making no changes to the front end and research design. Once the data are collected and the analyses done, these are added to the paper, which is then submitted to a journal, with mention that the work was pre-registered. With reviewed pre-registration, the front end of the paper is subject to a full review process, like that of any journal submission. If accepted, the authors proceed with collecting the data and running the analyses. The subsequent full paper is guaranteed publication regardless of the results, as long as the empirics are state of the art.

Part 3: Conduct Replication Study and Report Findings

In the final five steps, we outline how to conduct a replication study, and how to report and evaluate its findings.

Step 10. Develop front end. Before collecting data and conducting analyses, the methodological and theoretical motivations for conducting the replication study and its expected contribution to the field must be determined and clearly spelled out.

Step 11. Submit pre-registration (optional). If the pre-registration option is to be acted upon, this is the point at which the front end of the paper should be pre-registered. We stress again the value of pre-registration in avoiding conscious or unconscious HARKing, thereby conferring credibility (Bem, 2004; Kerr, 1998; Yamada, 2018).

Step 12. Data collection and analyses. Now, with just a few steps remaining, data is collected and analyses performed. The brevity of the description of what is done in this step belies the time and effort that must be expended.

Step 13. Report findings. Here, any differences between the findings of the original study and those of the replication – in terms of sign, significance, and estimate size, taking potential power differences into account – must be carefully and explicitly assessed and interpreted in light of the “what”, “how”, “why”, “where”, “who”, and “when” questions.
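
One way to formalize this comparison is the standard large-sample test for the equality of two independent regression coefficients (e.g., Paternoster, Brame, Mazerolle, & Piquero, 1998), sketched below with placeholder numbers.

    # Sketch: z-test for whether a replication estimate differs from the original,
    # assuming the two samples are independent. All numbers are placeholders.
    import math

    b_orig, se_orig = 0.42, 0.10  # hypothetical original coefficient and its SE
    b_rep, se_rep = 0.15, 0.08    # hypothetical replication coefficient and SE

    se_diff = math.sqrt(se_orig**2 + se_rep**2)
    z = (b_rep - b_orig) / se_diff
    print("z =", round(z, 2))  # |z| > 1.96 suggests a difference at p < .05

    # Beyond significance, report the difference in estimates and its 95% CI,
    # so readers can judge the size of the discrepancy, not just its existence.
    diff = b_rep - b_orig
    print("difference =", round(diff, 2),
          "95% CI = (", round(diff - 1.96 * se_diff, 2), ",",
          round(diff + 1.96 * se_diff, 2), ")")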

Step 14. Develop final paper. The write-up of a replication study is slightly different from that of the original one insofar as the goal of the replication must be explained. In the case of theory fine-tuning replications, it is essential to include an explicit discussion of what the replication adds to the original in terms of theory.

Step 15. Reflect on a next replication. This may be the end of the sequence. Depending on the nature of the replication, follow-up – that is, still another replication – may be required, in which case we are back to step one.

BEST PRACTICE GUIDELINES FOR IB JOURNALS

What measures can IB journals take to promote the submission and publication of replication studies? In what order should these measures be implemented? To avoid some of the pitfalls that have plagued IB and other disciplines in the past, we believe that the measures we suggest below need to be implemented in a well-planned and rigorous way. In Figure 3, we propose a sequence, starting with measures that are quite easy to implement and moving on to more radical reforms that will require considerable time and effort. We stress that what we propose is not currently JIBS policy; it is, rather, based on our observation of how other fields and journals have progressively developed pro-replication policies.

Figure 3 IB replications: proposed trajectory.

Special Issue

A first challenge is to get IB journals to encourage scholars to undertake replication studies – and, of course, to publish them. Looking to other fields, the natural first step is to launch a special issue on replication studies, as JWB recently did. A special issue signals that the journal welcomes such submissions. It also improves a submission’s chances of acceptance, as an editor knowledgeable about replication studies can be selected, who in turn appoints qualified reviewers. Good submissions not selected for the special issue might later appear in regular issues.

Reviewer Selection

Contingent on reviewer availability and manuscript characteristics, reviewers – preferably established scholars – should be selected from the following three categories.

Subject expert. To see to it that the theoretical aspects of the original paper are faithfully replicated, and possibly appropriately extended, at least one reviewer should be an expert on the original submission topic, phenomenon, or theoretical lens.

Empirical expert. To guarantee that the methodology of the original study is properly replicated, and that replication authors have clearly explained whether deviations from the original study are due to limitations or are purposeful improvements, at least one reviewer (preferably two) should have expertise in the methodology or empirical setting of the submission.

Original author (optional). An editor might have reasons for wanting to involve an original study author (or someone else closely familiar with the study such as an author’s doctoral student who further developed the argument in their own work) in the review process, perhaps to ensure that the replication is thorough and rigorous. That said, one should not overlook the possibility that such reviewers may not be able to be totally objective.

Appoint an Area Editor for Replication Studies

Once it has been established that a journal is interested in publishing replication studies, the next step should be to assign a dedicated replication editor whose focus is exclusively on replication studies. This would ensure consistent application of the guidelines we outline above, as well as of any new guidelines that may be established in the future. Since no single replication editor is likely to be an expert in all of the topics and areas addressed by submitted replication studies, there should also be co-editors, perhaps even acting editors, who between them have a wide range of expertise.

Introduce a Process for Pre-registration Submissions

Next, IB journals might introduce pre-registration. A first step, which requires neither developing a new portal nor changing the current review process, would be for IB journals to announce that they will consider and encourage non-reviewed pre-registration on an external portal with public access (such as the Open Science Framework).6 Authors submitting their completed manuscripts would then check a box indicating that the work had been externally pre-registered, and where. Editors and reviewers would be able to access the file to confirm that the front end was not changed along the way.

A more costly option – but one providing an even greater incentive to carry out a replication study – would entail developing a journal internal portal for pre-registrations. The advantages would be similar to those of external pre-registration, but the journal would have more control over what was submitted and when. Given the cost of an internal portal, perhaps a number of IB journals could jointly set up and run one.

Pre-registration has pros and cons (see Simmons, Nelson, & Simonsohn, 2021). We have already touched on how it might reduce HARKing and p-hacking. There are two potential drawbacks. The first applies to reviewed pre-registrations: having two review processes almost doubles the workload of editors and referees. The second is that separating the publication of theory and method from that of analysis and interpretation may lead to a flood of submissions – many of which may well never result in a finished study. How that might be dealt with is unclear. Pre-registration is not a panacea; nonetheless, its advantages make it worth considering.

Replication Corner

It would almost certainly encourage the submission of replication studies if they were published in regular issues of journals. They could be highlighted by placing them in a dedicated “replication corner”. It might be enough to feature one or two replications every few issues, or even once a year to start. As confidence in their being published grows, so will the number submitted. When that happens, there will be enough high-quality submissions in the field to allow for one or two replications in every issue.

Replications Recurrent Special Issue

JIBS publishes a yearly special issue that features review articles. The same could be done with replications. Such an issue need not be published every year: it might be less frequent than that, or start out being less frequent and then become an annual issue.

Point/Counter-Point with Original Authors

Another interesting initiative to be considered is to follow the publication of a replication with a response written by the author of the original study, a point/counterpoint exchange. This assumes that the original study and its replication are on the same level. While we argue that they go hand-in-glove in advancing our knowledge and in fine-tuning theory, this has yet to be established in some social sciences. We believe that this will come; and we predict that when it does, the kind of exchange we describe here will follow.

New IB Replications Journal

Once replication studies are fully established in IB, it might be possible to launch a journal dedicated to them. We have seen this kind of spinoff with the Journal of International Business Policy (JIBP).

DISCUSSION AND CONCLUSION

Replication studies are essential not only to come closer to the “true” significance and size of effects, but also to further fine-tune theory (Tsang & Kwan, 1999; Walker et al., 2019; Yamada, 2018). This is especially the case for a context-bound field such as IB. In this editorial, we have systematically made the case for replication in IB. We began with why we need to replicate in the first place, and then analyzed why we have failed to do so. We looked in detail at the six main types of replications, distinguishing between those used for theory-checking and those for theory fine-tuning, and then provided a template of the replication process. We suggested that IB journals might adopt pre-registration, an important research practice that has gained momentum in many disciplines (Bem, 2004; Kerr, 1998; Yamada, 2018), and proposed best practice guidelines for IB journals to make replication studies an integral part of research in the field. This editorial makes several important contributions.

First, it emphasizes the importance of replication studies to the field of IB, where replication is not yet an established practice. Limited training on replication in PhD programs has unsurprisingly resulted in an incomplete understanding of its importance and of its different types, with some thinking that replication refers only to the simplest same-sample, same-research-design type shown in the top-left box of Table 2. In this editorial, we attempt to rectify this by naming the different types of replication studies and outlining how they differ. Specifically, a distinction is drawn between theory-checking and theory fine-tuning replication. As the latter type makes a theoretical contribution, it meets the publication requirement of many IB journals.

Second, this editorial provides a step-by-step template on how to conduct a replication study. In so doing, we carefully spell out the sequence of decisions and actions to be taken to come up with a good replication study, one that contributes to solidifying extant knowledge.

Third, we lay out a case in this editorial for adopting pre-registration, a possibility that has received little attention in the field of IB. As we have discussed, pre-registration can help increase the credibility of research by curbing data-mining in disguise – the conscious or unconscious development or alteration of hypotheses and arguments, presented as ex ante, to fit the findings ex post, that is, HARKing (Bem, 2004; Kerr, 1998; Meyer et al., 2017; Yamada, 2018). Pre-registration can clearly be useful for any type of empirical work, but especially for replication studies, as it provides an additional layer of credibility to findings. At the same time, before a publication adopts a pre-registration policy, its potential advantages and disadvantages need to be carefully weighed.

Fourth, we offer best practice guidelines for IB journals to implement replication studies as part of the publishing mainstream. Some apply to the near term and some to the more distant future. We give a rough order of implementation because establishing a culture of replication is likely to take a long time; in the meantime, it is important that replication come to be seen as an appropriate and desirable form of research that makes a meaningful empirical and theoretical contribution to the field. Among other guidelines, we posit that journals should name replication champions, for instance a special issue editor and/or a replication area editor. That step especially is likely to advance the field significantly.

We end with a call – and a plea – for replication studies. As a research community, we need to add to the very small number of replication studies published in JIBS and other IB journals: an extensive search yielded just 14 over the roughly five decades since the leading IB journal was inaugurated. We suggest that journals spark interest by launching a replication special issue. Our plea is that, over time, steps be taken to deepen that interest through the appointment of a dedicated editor to manage these kinds of manuscripts, and through featuring them in a regular ‘replications corner’.

Finally, as we touched on above, changing academic routines, rooted in and reinforced by an ingrained cultural and institutional environment, will be anything but easy. Our academic community has adopted practices that are far removed from Popper’s methodological principles – truth is what cannot be shown to be false. Instead, our journals have norms that implicitly stimulate questionable research practices such as HARKing and p-hacking (Ioannidis, 2005). Key stakeholders – from journal editors to deans and PhD advisors – socialize novices into a tradition that values only novelty and for which anything that is not “groundbreaking” is of no interest (van Witteloostuijn, 2016). As one of the key stakeholders in the IB community, JIBS tries to lead by example, promoting new reporting practices (Meyer et al., 2017), open science principles (Beugelsdijk et al., 2020), and meta-analyses (Steel et al., 2021). We hope that this editorial adds to that list by making the case for replication.

Notes

  1. We gratefully acknowledge the assistance of Joeri van Hugten.

  2. We thus exclude conference proceedings and book chapters.

  3. Tüselmann et al. (2016) also include Management and Organization Review (MOR) as a major IB outlet, but MOR does not position itself as a ‘pure’ IB journal, so we do not include it in the count exercise. See Judge, Fainshmidt, and Brown (2020) for a replication published in MOR.

  4. Of course, we could have broadened our search scope by including IB-related outlets specialized in specific functional disciplines (e.g., accounting, HRM, finance, or marketing), but we decided to concentrate on broad IB journals with a multidisciplinary focus. For instance, our search string also picked up one replication study in Research in International Business and Finance (in 2020), not included in our table because this outlet specializes in finance.

  5. We introduce this distinction in the context of theory fine-tuning replication only, as theory-checking replication is not aimed at theory extension anyway. Of course, abductively, the latter can potentially lead to theory improvement, too, should the replication’s results be so different from those of the original study that an alternative explanation is needed ex post.

  6. https://osf.io/.