Using single impact metrics to assess research in business and economics: why institutions should use multi-criteria systems for assessing research

Purpose – Despite the general recommendation to use a combination of multiple criteria for research assessment and faculty promotion decisions, the rise of quantitative indicators is generating an emerging trend in Business Schools to use single journal impact factors (IFs) as key (unique) drivers for those relevant school decisions. This paper aims to investigate the effects of using single Web of Science (WoS)-based journal impact metrics when assessing research from two related disciplines, Business and Economics, and the potential impact on the strategic sustainability of a Business School.
Design/methodology/approach – This study collected impact indicators data for Business and Economics journals from the Clarivate Web of Science database. We concentrated on the IF indicators, the Eigenfactor and the article influence score (AIS). This study examined the correlations between these indicators and then ranked disciplines and journals using these different impact metrics.
Findings – Consistent with previous findings, this study finds positive correlations among these metrics. Then this study ranks the disciplines and journals using each impact metric, finding relevant and substantial differences, depending on the metric used. It is found that using AIS instead of the IF raises the relative ranking of Economics, while Business remains basically with the same rank.
Research limitations/implications – This study contributes to the research assessment literature by adding substantial evidence that, given the sensitivity of journal rankings to particular indicators, the selection of a single impact metric for assessing research and for hiring/promotion and tenure decisions is risky and too simplistic. This research shows that biases may be larger when assessment involves researchers from related disciplines – like Business and Economics – but with different research foundations and traditions.
Practical implications – Consistent with the literature, given the sensitivity of journal rankings to particular indicators, the selection of a single impact metric for assessing research, assigning research funds and hiring/promotion and tenure decisions is risky and simplistic. However, this research shows that risks and biases may be larger when assessment involves researchers from related disciplines – like Business and Economics – but with different research foundations and trajectories. The use of multiple criteria is advised for such purposes.
Originality/value – This is an applied work using real data from WoS that addresses a practical case of comparing the use of different journal IFs to rank related disciplines like Business and Economics, with important implications for faculty tenure and promotion committees and for research funds granting institutions and decision-makers.


Introduction
There is a continuous and increasing interest in how to assess research at academic institutions (Adler and Harzing, 2009). University and school administrators need to manage their resources to increase research output and school reputation, raise rankings, achieve or keep international accreditations and maintain or increase external funding (Peters et al., 2018). Research assessment is then linked to relevant strategic goals of these institutions. At the same time, research assessment plays an important role at the micro or individual faculty level. Research assessment practices may be linked to research promotion policies, economic incentives, academic careers and school- and university-level promotions. Good assessment practices may improve individual and institutional research output, due to the direct and indirect effects of assessment methods on individual performance. Moreover, shrinking budgets and increased societal pressures regarding the sustainability of universities in terms of fulfilling the needs of multiple stakeholders (Jack, 2021) suggest that sound research assessment practices may be even more important if universities and schools want to fulfil their strategic goals and remain sustainable over time.
Universities, schools and national agencies establish assessment procedures to evaluate existing/previous research and assign research funds and benefits (e.g. course reductions, travel funds, etc.), honours and awards, academic promotion and direct economic incentives. Different assessment methods have been used, including journal lists (institutional or external lists like ABDC in Australia, ABS in the UK or the Financial Times), individual citation patterns, peer-reviewed assessments and collegiate review committees. Strategic control and assessment systems are crucial for guiding an institution's behaviour and performance (Kaplan and Norton, 1996).
With the increasing bibliographic information on journals and citations (e.g. Salcedo, 2021a), and the rising burden/complexity of faculty and school assessment tasks, quality peer evaluation has been somewhat replaced by the use of journal impact metrics (Garfield, 1972, 2006; Adler and Harzing, 2009; Rizkallah and Sin, 2010; Haustein and Lariviere, 2015; Brown and Gutman, 2019). Two factors are probably driving this trend: the availability of these metrics and their status as objective measures. These effects might be even more relevant for institutions where management needs to use discretion and judgement rather than just financial measures to assess performance or for institutions that have a less "formal" or well-understood strategy (Gibbons and Kaplan, 2015). In those cases, Gibbons and Kaplan (2015) argue that formal measures included in assessment systems may give "clarity to the strategy" (p. 449) and to the school and faculty actions. The design of a school's research assessment system is then a key element for facilitating the implementation of a higher education institution's strategy. This choice of research impact indicators will affect both individual and institutional research behaviour (Fischer et al., 2019; Jack, 2021).
Research assessment systems that are based on single impact indicators may be risky for institutions because they may channel faculty and school efforts towards indicators that are consistent with particular disciplines, stakeholders or goals that do not consider the entire spectrum of outcomes that are expected for a sustainable Business School or university. These results may be very complex when university or business school revenues are contingent upon serving those other needs (Peters et al., 2018; Morales and Calderón, 1999). We argue that these challenges are even higher when schools and institutions embrace different disciplines that are included in the same assessment process.
Despite the problems derived from overestimating the value of these impact metrics, institutions continue using them, with potentially complex implications for the assessment process itself and for achieving the schools' strategic goals and sustainability (see, for example, Jack, 2021, for the challenges of using too narrow metrics in business school rankings).
Only a few authors have addressed this issue empirically, warning about the problems of using single journal-level indicators for assessing research contribution. Mingers and Yang (2017), in a recent study of business and management journals, provide evidence that in the business disciplines multiple impact indicators should be used in order to overcome the biases that particular indicators may entail when ranking journals and using those rankings for assessing business research. We aim to provide further empirical evidence regarding the risks of using single indicators in assessing research outputs, especially when assessing journals or researchers from different disciplines.
This paper explores the effect of using particular single impact metrics when assessing research contributions in related disciplines, in this particular case Business and Economics. Even though both disciplines are regularly taught in Business Schools and programs, their relationship is not as strong as one might think. Azar (2009), for example, reports that only 6.9% of citations in business journal articles come from economics, with a declining trend over time. For Business, other disciplines like psychology, sociology, decision sciences and communications have a strong influence. Since specific research impact indicators have different objectives and assumptions and are sensitive to specific citation patterns (the raw input for those indicators), the use of particular impact indicators may significantly affect the relative assessment of scientific work when different scientific disciplines are evaluated together.
In this paper, first, we briefly cover the literature of research and journal assessment and impact metrics and its connection with university rankings and strategic performance and sustainability. Then we define the main Web of Science (WoS)-based impact metrics and analyse these metrics for Business and Economics journals. We analyse the effects of using single impact indicators: standard impact factor (IF) measures and the new eigenfactor and article influence scores, for ranking Business and Economics journals and assessing the work of Business School scholars. As in previous research, we compute the correlations of these different indicators finding generally consistent results with existing literature. We then generate relative rankings for all journals in the Business and Economics WoS categories, using these different indicators. Significant changes in rankings are identified depending on the type of measure used (e.g. standard WoS impact factors vs eigenfactor scores or AIS scores). By calculating the implicit academic value of different disciplines using the AIS journal scores, we provide further insights regarding the reasons for these different results, providing additional support for the need to use multiple families of indicators when attempting to design a sound and fair research and promotion assessment system that helps institutions to achieve their strategic goals. Implications for theory and practice of research assessment, future research avenues and conclusions are provided in the last section of the paper.

Impact research assessment in higher education and business schools
The evaluation of the research output is very important in academic life since it drives hiring, funding and tenure and promotion decisions. The implications are very relevant for individual researchers since their academic careers and economic incentives may be driven by these decisions. In the following sections, we will examine relevant literature addressing research assessment systems and metrics.

Research assessment systems and indicators
As stated earlier, research assessment is a relevant but very complex process that affects the behaviour of individual faculty and the whole institution. For this reason, the academic tradition established peer review committees of senior faculty members as a reasonable way to deal with this strategic process. These committees normally review individual manuscripts and outputs for quality, relevance and overall value. As a way to provide a more standard rule to compare different research productions, some schools developed internal lists of desired journals, ranking them in terms of subjective quality. Other schools also used journal quality lists developed by external parties and associations (e.g. ABDC list in Australia, ABS in the UK, University of Texas-Dallas list in the USA, Capes/Qualis in Brazil; see, for example, Harzing.com). Additionally, research publications may be evaluated through quantitative indicators like the direct citations to the paper or through some sort of impact metric of the journal (based on the total citations to the journal; Garfield, 1972, 2006; Franceshet, 2010). The availability of large bibliometric databases (WoS, Scopus or even Google Scholar) has made citation-based metrics easier to find and use and a more common assessment approach (Haustein and Lariviere, 2015; Harzing, 2019). Journals and editors engage in reputation building through the expansion of indexing and by becoming better known and more cited by relevant research communities (see, for example, Salcedo, 2021b).
Despite some concerns regarding the validity of impact metrics (see, for example, Carey, 2016; Paulus et al., 2018), the burden of assessing research output for a growing faculty body has made the use of journal impact metrics to assess individual faculty research outputs a common practice in many institutions. Here we present the main impact metrics used in academia separated into two groups: the standard or more traditional IF scores and the newer eigenfactor-related scores.
Standard/traditional impact factor scores
Total cites (TotCite). The total number of citations in a year received by a journal for its articles published in the previous two years.
The journal impact factor (IF). It represents the total citations obtained in a year by articles published in the previous two years divided by the total number of articles published (by the journal) in the previous two years. Self-citations (citations to journals from articles published in the same journal) are included in the count and computations.
The 5-year impact factor (5YIF). The five-year impact factor is similar to the regular IF, but it considers the articles published in a journal in the previous 5 years. Then the five-year impact factor is defined as the total citations obtained in a year by articles published in the previous five years divided by the total number of articles published by the journal in the previous five years.
The impact factor without self-cites (IFwoSC). It is the same as the journal IF, but the self-citations are excluded from the numerator. It represents the total citations (without self-cites) obtained in a year by articles published in the previous two years divided by the total number of articles published (by the journal) in the previous two years. Self-citations (citations to journals from articles published in the same journal) are not included in the count and computations.
Immediacy index (IMMI). This index can also be defined as a zero-year IF and is computed as "the total citations to papers published in a journal in the same year divided by the total articles published by the journal in that year" (Chang et al., 2016).
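The ratio-based definitions above can be illustrated with a minimal Python sketch. All counts are hypothetical and the names (`window_total`, `per_article`) are ours, introduced only for illustration; they are not part of any WoS tool.

```python
# Hypothetical counts for one journal in a given census year. Each dictionary is keyed by
# how many years before the census year the cited articles were published (0 = same year).
citations = {0: 30, 1: 180, 2: 220, 3: 150, 4: 140, 5: 110}
self_citations = {1: 25, 2: 15}
articles = {0: 100, 1: 90, 2: 110, 3: 100, 4: 100, 5: 100}


def window_total(counts, first, last):
    """Sum the counts for publication ages first..last (inclusive)."""
    return sum(counts.get(age, 0) for age in range(first, last + 1))


def per_article(total_citations, total_articles):
    """Citations per published article; undefined when nothing was published."""
    if total_articles <= 0:
        raise ValueError("no articles published in the window")
    return total_citations / total_articles


# Two-year IF: citations to articles from the previous two years per article published.
impact_factor = per_article(window_total(citations, 1, 2), window_total(articles, 1, 2))
# Five-year IF: same ratio over the previous five years.
five_year_if = per_article(window_total(citations, 1, 5), window_total(articles, 1, 5))
# IF without self-cites: self-citations are removed from the numerator only.
if_without_self_cites = per_article(
    window_total(citations, 1, 2) - window_total(self_citations, 1, 2),
    window_total(articles, 1, 2),
)
# Immediacy index: a "zero-year" IF using same-year citations and articles.
immediacy_index = per_article(window_total(citations, 0, 0), window_total(articles, 0, 0))
```

With these made-up counts the four indicators are 2.0, 1.6, 1.8 and 0.3, respectively, which shows how the same journal can look quite different depending on the window and on the treatment of self-citations.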

Eigenfactor-related metrics
The creators of the eigenfactor metrics indicate that they derived an algorithm based on the idea of Google's PageRank for sorting and ranking web pages, i.e. based on the networks that visited particular websites. Instead of the connections or visits used for ranking webpages, they use the citations in the WoS database that a particular journal receives to compute the eigenfactor through this iterative algorithm. Bergstrom (2007) argues that a "single citation from a high-quality journal may be more valuable than multiple citations from peripheral publications". The importance of a single citation can be computed by the "influence of the citing journal divided by the total number of citations appearing in that journal." By this procedure, they argue they "aim to identify the most influential journals, where a journal is considered influential if it is cited by other influential journals". However, they recognize that the eigenfactor aggregates the individual influence of all papers appearing in a particular journal, and for this reason, it will be higher for larger journals. Larger journals will generate more visits, more citations and, therefore, larger eigenfactor scores. The authors suggest that this procedure corrects for differences in citation patterns and propensities across disciplines, but it also hinders more peripheral or newer disciplines and journals. Therefore, its computation is not neutral to the newness or centrality of disciplines, particularly when the number of citations and academic reputation are built through time, and when these variables may also affect centrality in the whole scientific field.
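The idea that a journal is influential if it is cited by other influential journals can be sketched with a simplified iterative computation. The following Python code is only an illustration under strong assumptions: the citation matrix is hypothetical, the function name `influence_scores` is ours, and the sketch omits further steps of the published eigenfactor algorithm (described at www.eigenfactor.org), so it should not be read as the official method.

```python
def influence_scores(citation_matrix, iterations=200):
    """Simplified PageRank-style influence scores.

    citation_matrix[i][j] is the number of citations from journal i to journal j.
    Self-citations are dropped and each journal's outgoing citations are normalised,
    so that a citation is worth the citing journal's influence divided by the number
    of citations it gives.
    """
    n = len(citation_matrix)
    weights = []
    for i, row in enumerate(citation_matrix):
        outgoing = [0 if j == i else count for j, count in enumerate(row)]
        total = sum(outgoing)
        if total == 0:
            raise ValueError("each journal must cite at least one other journal")
        weights.append([count / total for count in outgoing])

    scores = [1.0 / n] * n  # start from equal influence
    for _ in range(iterations):
        # A journal's new score is the influence flowing in from the journals citing it.
        scores = [sum(scores[i] * weights[i][j] for i in range(n)) for j in range(n)]

    total_score = sum(scores)
    return [100 * s / total_score for s in scores]  # scaled to sum to 100


# Hypothetical citation counts among three journals A, B and C (rows cite columns).
citation_matrix = [
    [5, 30, 10],
    [20, 0, 20],
    [2, 8, 1],
]
scores = influence_scores(citation_matrix)
```

In this toy network journal B obtains the highest score, because it receives most of the outgoing citations of both A and C, even though the diagonal (self-citation) entries are ignored.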
Eigenfactor (EIG). For the reasons indicated above, the eigenfactor score is calculated annually by a PageRank-type algorithm, based on five-year citation data, and is published at eigenfactor.org. It is defined as "the journal's total importance to the scientific community" (for a more detailed description of the method, see www.eigenfactor.org; Eigenfactor, 2009). An important element of the eigenfactor computation is that it excludes journal self-citations and that citations are normalized by the total number of outgoing citations of each journal. Eigenfactor scores are scaled so that the scores of all journals included in the Journal Citation Reports (JCR, 2017; Journal Impact Factors, 2018) of the WoS add up to 100. Then if a journal has an eigenfactor score of 0.085 (the average journal eigenfactor), it means that this journal has 0.085% of the total influence of all indexed publications. The eigenfactor score can also be labelled a journal's influence score (Chang et al., 2016).
Normalized eigenfactor score (NEig). It is a rescaled eigenfactor score so that the average journal scores 1 (instead of 0.085) and can be computed as the Eig*N/100, where N is the number of journals included in the JCR. Therefore, correlations between the eigenfactor and its normalized version are 1.0, and rankings of journals using both impact metrics generate the same results.
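A short Python sketch of this rescaling, using hypothetical eigenfactor scores for a database of only four journals (the function name `normalized_eigenfactor` is ours), shows why the normalized score averages 1 and leaves the ranking unchanged:

```python
def normalized_eigenfactor(eigenfactor, n_journals):
    """Rescale an eigenfactor score so that the average journal scores 1 (Eig * N / 100)."""
    return eigenfactor * n_journals / 100


# Hypothetical eigenfactor scores for a tiny database; by construction they sum to 100.
eigenfactors = {"A": 40.0, "B": 35.0, "C": 15.0, "D": 10.0}
normalized = {
    journal: normalized_eigenfactor(score, len(eigenfactors))
    for journal, score in eigenfactors.items()
}

mean_normalized = sum(normalized.values()) / len(normalized)
ranking_eig = sorted(eigenfactors, key=eigenfactors.get, reverse=True)
ranking_neig = sorted(normalized, key=normalized.get, reverse=True)
```

Because the transformation multiplies every score by the same positive constant, `ranking_eig` and `ranking_neig` are identical, which is consistent with the correlation of 1.0 noted above.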
Article influence score (AIS). The article influence score is calculated by dividing the journal eigenfactor score by the fraction of the number of articles in the journal to the total articles published in the 5-year window (0.01 × eigenfactor score / (5-year journal article count / 5-year all journals article count)). The AIS is then scaled to a mean of 1.0, meaning that the influence of the average article published in the WoS database (Sciences and Social Sciences) in a particular year is 1.0. Then, Bergstrom suggests that a journal with an AIS of 17 means that the average article appearing in that journal has 17 times the influence of the average article in all sciences. There are two important clarifications about this number. Firstly, the fact that AIS values are scaled to 1.0 does not mean that the average AIS for a journal in the database is 1.0. The average AIS for journals in all sciences is 0.84. Secondly, social sciences and sciences have different average AIS (sciences larger than social sciences). Despite the intended objective to control for differences across disciplines, several studies have indicated that this is not achieved, since the metric favours more traditional and central basic sciences and disciplines compared to newer and more applied social disciplines (Waltman and Van Eck, 2010; Dorta-González and Dorta-González, 2013; Walters, 2014; Merigo et al., 2016).
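The AIS formula in parentheses above can be written as a small Python function. The function name and the article counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def article_influence_score(eigenfactor, journal_articles_5y, all_articles_5y):
    """AIS = 0.01 * eigenfactor / (journal's share of all articles in the 5-year window)."""
    if journal_articles_5y <= 0 or all_articles_5y <= 0:
        raise ValueError("article counts must be positive")
    article_share = journal_articles_5y / all_articles_5y
    return 0.01 * eigenfactor / article_share


# Hypothetical database with 4,000,000 articles in the five-year window.
ALL_ARTICLES = 4_000_000

# A journal holding 0.01% of the influence (eigenfactor 0.01) and 0.01% of the articles
# has exactly average per-article influence.
average_journal = article_influence_score(0.01, 400, ALL_ARTICLES)

# The same number of articles with twice the eigenfactor doubles the per-article influence.
strong_journal = article_influence_score(0.02, 400, ALL_ARTICLES)
```

Here `average_journal` evaluates to approximately 1.0 and `strong_journal` to approximately 2.0. The factor 0.01 simply converts the eigenfactor from a percentage into a fraction before comparing it with the journal's share of articles.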
Other impact metrics: Journal lists, Scopus-based, Google Scholar and web-based measures
In addition to WoS-based impact metrics, there are other sources of journal impact and quality measures. Scopus and Google Scholar are the two most relevant ones (apart from WoS) and have the advantage over WoS of including a broader array of journals in most disciplines (37,000+ in Scopus and 11,500+ in WoS). For example, in a recent review of Business journals performed by the authors, Scopus includes 1,742 journals and WoS only 448. Scopus and other institutions publish impact metrics based on the citations and records included in Scopus, which are easily available on the net. They publish the CiteScore, SNIP and SJR, the latter indicators being attempts to normalize and measure "prestige" or influence of journals within the Scopus database (González-Pereira et al., 2010). Google Scholar, on the other hand, uses the information available on the Internet, thus providing an even wider set of titles and citations. Google Scholar publishes the h5-index, which is the h-index for a journal, calculated based on the articles published in the last five years (the h-index is the largest number h such that h papers in the journal have at least h citations each, see Harzing, 2019). The availability of web-based information on research manuscripts has generated the use of altmetrics, which are evaluation metrics that do not use citations and that focus on attention and visibility by measuring views, hits, downloads or other indicators of reader engagement with the research piece (Weller, 2015). Newer developments, using text mining and data science techniques, have focused on examining the relevance of academic research. For example, Jedidi et al. (2021) have recently published the R2M (relevance to marketing) index, by contrasting top concepts appearing in practitioner marketing journals with the ones published in academic journals.
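As an illustration of the h-index mentioned above, a minimal Python sketch follows. The citation counts are hypothetical; applying the function to the articles a journal published in the last five years would give an h5-type figure.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position  # the top `position` papers all have at least `position` citations
        else:
            break
    return h


# Hypothetical citation counts for the articles a journal published in the last five years.
journal_citations = [25, 8, 5, 3, 3, 1, 0]
journal_h = h_index(journal_citations)
```

For these counts `journal_h` is 3: three papers have at least three citations each, but there are not four papers with at least four citations. Note that one very highly cited paper (25 citations here) adds no more to the index than a paper cited three times.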
Finally, an alternative approach to impact metrics is the development of journal lists that consider several indicators but are normally curated by a group of peer scholars (see the different lists available in Harzing.com). These journal lists are published by universities and academic institutions and provide a more holistic perspective on the impact and relevance of journals. One of the most comprehensive lists is the one published by the Australian Business Deans Council, the ABDC list, which includes over 2,700 journals in Business or Management. However, most of these lists have a language bias, underrepresenting journals published in Spanish and Portuguese and other languages.
The comparison with those different research assessment metrics is beyond the scope of this paper.

Issues and challenges in using research impact metrics
The use of research assessment metrics to measure the impact and to rank journals is quite controversial. This controversy is now expanding into the defenders of particular impact metrics. For example, Carey (2016) mentions eight major criticisms regarding the computation of the traditional IF including (1) citation mingling; (2) self-citations; (3) restricted evaluation period (in the case of the impact factor just 2 years); (4) subject dependency; (5) publication emplacement dependency; (6) indiscriminate parity among authors; (7) disproportionate significance of highly cited articles; and (8) different citation patterns by discipline. As stated by Carey, editorial teams can game the system by including highly citable items or by encouraging citation stacking. Some authors suggest that impact metrics may be considered good measures of the visibility of publications instead of their quality (Gorraiz et al., 2017).
The creators of the eigenfactor metrics suggest that these metrics do provide a fix to some of these problems, such as self-citations and different citation patterns across disciplines, and should be preferred to assess the real influence of journals. Opposing this view, some authors argue that since IFs (2-year, 5-year and 2-year without self-cites) highly correlate with AIS and total cites highly correlate with eigenfactors, parsimony and simplicity advise the use of existing simpler metrics (Davis, 2008; Arendt, 2010; Elkins et al., 2010; Salvador-Oliván and Agustín-Lacruz, 2015).
Other authors have taken a more neutral and pragmatic approach. They do not argue against the general high correlations between IF and eigenfactor metrics, but they suggest that these correlations are not perfect and that assessment may benefit from using the different specific information provided by these different measures (Chang et al., 2011, 2016; Kianifar et al., 2014). Additionally, the high correlations may also suggest that disciplines are different in terms of their citing patterns and traditions. Therefore, they indicate that using just the IF (or eigenfactor metrics) will be risky and advise the combined use of research assessment metrics. They provide some examples for the neurology paediatric and economics journals, using harmonic means of rankings based on these different metrics to provide a unified ranking.
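A unified ranking of this kind can be sketched in Python as follows. The journals, scores and function names are hypothetical, ties are not handled, and the exact procedure used in the cited studies may differ; the sketch only shows the general technique of taking the harmonic mean of a journal's ranks across several metrics.

```python
from statistics import harmonic_mean


def ranks(scores):
    """Map each journal to its rank on one metric (1 = highest score; ties not handled)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {journal: position for position, journal in enumerate(ordered, start=1)}


def unified_ranking(metrics):
    """Order journals by the harmonic mean of their ranks across all metrics."""
    per_metric_ranks = [ranks(scores) for scores in metrics.values()]
    journals = list(per_metric_ranks[0])
    mean_rank = {
        journal: harmonic_mean([metric_ranks[journal] for metric_ranks in per_metric_ranks])
        for journal in journals
    }
    return sorted(journals, key=mean_rank.get), mean_rank


# Hypothetical scores for three journals on three different impact metrics.
metrics = {
    "IF": {"A": 4.0, "B": 3.0, "C": 2.0},
    "EIG": {"A": 0.02, "B": 0.05, "C": 0.01},
    "AIS": {"A": 1.2, "B": 0.9, "C": 2.0},
}
ordering, mean_rank = unified_ranking(metrics)
```

In this example each journal is ranked first on exactly one metric, so any single-metric ranking would crown a different journal; the harmonic mean of ranks (1.5 for A, about 1.64 for B and 1.8 for C) gives the unified order A, B, C.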
Several issues have been raised regarding the inconvenience of using citation-based indicators for assessing research (Paulus et al., 2018). At the more general level, two major critiques are presented. First, direct citations are a proxy, not a perfect measure, of the quality of a paper. Papers with errors and controversial papers may be very highly cited, but those citations cannot be considered a signal of quality. Excellent or very relevant papers may be published in less known or newer journals (particularly if new subdisciplines or themes are rising) or in working papers or document series not considered by the established databases, getting very few citations due to the outlet in which they were published. Other criticisms focus on the validity of citations and the way they might be manipulated (Carey, 2016).
Secondly, aggregate impact measures such as the IF of a journal are also a distant proxy of the quality of a particular paper published in that journal. Journal quality does not equal paper quality. As Brito and Rodríguez-Navarro (2019) show, the difficulties of assessing and discriminating paper quality based on journal impact are even higher if the differences in those impact factors are lower, penalizing new research or research in fields that are less cited. An interesting point is made by Paulus et al. (2018), who suggest that the use of single IF metrics may in fact imply that peers or assessors have weak arguments to justify the quality of a research piece or that they are uncertain of its particular value. Consistently, in a specific application to the business field, Mingers and Yang (2017) offer a similar but extended perspective favouring the use of multiple indicators. They rank business and management journals based on research assessment metrics computed with WoS, Scopus and Google Scholar information, deriving a synthetic rank from the total sum of the different ranks for each journal. Based on their results, they suggest the Google Scholar h-index and Scopus-based SNIP index should be preferred for assessing business journals.
Research assessment, rankings and business school strategies
Research assessment systems are relevant at the individual researcher level but are also crucial for Schools attempting to fulfil their established missions and serve stakeholders in a highly competitive and globalized world. Research assessment systems are relevant for explaining both individual and school/university behaviour and performance and, therefore, for strategy implementation.
Business school education in particular has experienced important transformations in the past 50 years (Peters et al., 2018). Starting as a more practice-oriented discipline, the business discipline has transformed itself moving towards a more scientific and theoretically strong field of study, borrowing from the traditions of other related fields such as sociology, psychology, decision sciences and economics. The theoretical advancement in particular business disciplines like management, finance and marketing, the strengthening of business doctoral education, global competition and international accreditation agencies and rankings have played an important role in this development process.
Today, business schools face two main evaluation systems: accreditation and rankings (Pitt-Watson and Quigley, 2019). Despite the advances in business schools and business school education, there is a wider debate about business school curricula and research outputs that are consistent with societal needs of a more sustainable and inclusive 21st-century economy (Pitt-Watson and Quigley, 2019). This debate has generated important changes in accreditation standards of the major agencies (AACSB, AMBA and Equis) to include and value societal impact beyond academia (AACSB, 2020). Business school rankings are also embracing these challenges and institutions like the Financial Times are adapting their methodologies to include the broader impact of business schools (Jack, 2021). These changes are recent and may not be completely understood in the inner discussions of academic, research assessment and promotion committees within business schools, which represent the academic stakeholders. For example, although several research outputs or intellectual contributions can be identified (AACSB, 2012), academicians tend to focus on articles published in peer-reviewed journals. Most of these evaluations are based on the quality of research publications (i.e. journal impact metrics) despite the multi-dimensional nature of business school missions. Business school and higher education administrators then face an important challenge: how to integrate these external changes and expectations for Business Schools into the formulation and implementation of their business models. The promotion and assessment of the school's adequate research mix or portfolio are part of these key challenges. Earlier on, Ghoshal (2005) and other business scholars warned about the distancing between business schools and scholars and business practice, and suggested that the excess of bad or not fully tested theories was destroying management practice.
More recently, the Responsible Research in Business and Management network (see RRBM position paper 2020) also posits that business schools and scholars should "transform their research toward responsible science, producing useful and credible knowledge that addresses relevant problems for business and society".
In order to fulfil their institutional mission, university administrators have different levers for implementing a defined Business School strategy. They can assign resources and define systems and processes that may help generate the desired behaviours and outcomes. The research assessment system within a business school is one of these key levers since it has strong effects on directing faculty resources, behaviours and energies, which may be reproduced in the future (see for example Riazanova and McNamara, 2015). Figure 1 exhibits a guiding framework based on the reviewed literature that describes the role of research assessment systems for the implementation of a business school strategy and its sustainability. Figure 1 presents a process model with three phases: strategy formulation, implementation and outputs/feedback from stakeholders. Schools develop their strategies to provide research and teaching (and some other) outputs to fulfil their mission and serve societal needs. Business schools implement their strategies through securing and deploying resources and the functioning of designed systems and processes. In this framework, we focus mainly on the research value chain, which will generate effects on research and overall school outputs. As suggested by O'Brien et al. (2010), business schools' research production may generate economic value for students and constituencies, measured by salary differences in the USA. Based on this belief, schools have developed systems for stimulating research outputs through strong faculty recruitment and selection processes, research assessment systems and faculty promotion procedures aimed towards publishing in the best journal outlets. All these systems combined with the faculty body deployment and the resources allocated to research (funding, incentives and support) will interact and produce individual faculty outputs, which in turn will generate the school's research and teaching aggregate outputs. 
These outputs may be mediated by the effects of intrinsic and extrinsic motivations (see Gibbons and Kaplan, 2015, for the effect of formal measures on individual behaviour and organizational culture, and Fischer et al., 2019, for the effects of intrinsic and extrinsic motivation on creativity and innovation). Other individual differences, like particular starting conditions such as the original doctoral school research emphasis or research collaboration opportunities and strategies, may also help to explain particular individual research outputs (Li et al., 2019; Riazanova and McNamara, 2015). These individual and synergistic (or not) organizational behaviours will produce the school's overall real research and teaching outputs. These outputs will be contrasted with planned and expected outputs by stakeholders, strengthening or diminishing the school's sustainability.
As suggested by Peters et al. (2018), in addition to research, business schools have strong value chains dedicated to the delivery of different business education programs and are at the core of the new emerging business models in today's competitive world. Business schools are increasingly depending on the revenues generated by these programs. Stakeholders like the students, employers, academia, the government and accreditation and ranking agencies will assess the business school performance and provide feedback in terms of opinions; recommendations; money; purchasing of services, etc.; promoting or hindering the school sustainability (see e.g. AACSB, 2020; RRBM, 2020; Jack, 2021). School rankings and accreditation agencies representing and anticipating such stakeholder opinions and assessments will generate the needed feedback to institutions to modify strategies, resources and systems.
According to strategic theory, the specificities of a school research assessment system should be consistent with external standards and expectations (society, accreditation agencies and ranking makers) and with the school strategy to compete and become sustainable in today's competitive environment. In particular, our framework in Figure 1 suggests, consistent with the strategic fit and control literature (Gibbons and Kaplan, 2015; Kaplan and Norton, 1996) and the dynamic capabilities and micro-foundations approach to strategy (Teece, 2007, 2017), that the alignment (misalignment) of systems/processes with the school strategy can be considered a key driver of strategic success (failure).
Based on these trends, one may argue that research assessment may also require adaptations that respond to these changing external assessment criteria and that are broader in nature. AACSB, the global accreditation agency, for example, has included within its new standards the need to report on the impact of scholarship, considering the quality of intellectual contributions and the ability to contribute to a wide variety of external stakeholders through a mix of basic, applied and/or pedagogical research. Very narrow research assessment systems based on specific and single metrics may generate large risks for Business Schools in the pursuit of their strategic goals and sustainability. Adding to this growing literature, in this paper, we examine the effect of particular impact measures on the ranking of journals from two related but different disciplines, Business and Economics. We want to study the effects of using single indicators on individual and school-level research assessment, focusing on the strategic and managerial implications of such systems design. Based on the research assessment and the university strategic management literature, we hypothesize that designs combining multiple indicators will be much more appropriate to assess individual research outputs, particularly when faculty members participate in different disciplines with different research traditions and communities.
The use of single impact metrics may produce strong misalignments between a school research output and the expected impact and the sustainability potential of the school.

Data
We collected all the data from the Clarivate Web of Science database, particularly from the Journal Citation Reports Social Science and Science collections (2017 Journal Impact Factors, 2018). As indicated earlier, other databases like SCOPUS may contain a broader and more diverse collection of journals in the social sciences and business and economics. However, the most used impact indicators are the impact factor indicators computed using the WoS database. Also, the eigenfactor scores and AIS are computed using this database. The data gathered contained journal and publisher information, total cites, citable items, journal impact indicators, WoS categories and other relevant information. We concentrated on three impact factor indicators, the general IF, the 5YIF and the IFwoSC, and two eigenfactor impact scores, the eigenfactor (Eig) and the AIS. Additionally, we also considered the total cites indicator and the immediacy factor (IMMI). We obtained all these impact indicators for all the journals included in the above databases.
WoS category assignment and management
Journals in the WoS database are organized under categories. A major category is the one that distinguishes the Sciences (SCIE) from the Social Sciences (SSCI). Within these general categories, journals are included (they can request it) in particular categories defined by WoS. However, many journals are included in more than one category, which makes the definition of the main category of a journal a relevant issue, particularly if you want to have single journal records. While some authors have suggested more complex and combined methods to assign journals to a single WoS category (see for example Dorta-González and Dorta-González, 2013), we decided to use a simpler procedure.
Journals indexed in one category were kept in the registered WoS class. For journals that appear in 2, 3 or 4 WoS categories, we used the following procedure to assign the journal to a particular one. We considered the main business/economics category and the relative percentile of the journal IF in the different categories as the main criterion for classification. We can explain it through a hypothetical example. Journal X is classified under three WoS categories: management, psychology and economics. Considering the journal IF (the default information presented by JCR), Journal X is ranked 140 out of 200 in category M = management (percentile 70%), 65 out of 100 in category P = psychology (percentile 65%) and 250 out of 300 in Economics (percentile 83%). Then, given our specific focus on business and economics journals, our procedure favours the assignment of journals to those business and economics categories. Therefore, we assigned the journal to the top percentile category within the Business and Economics categories. In our example, we assigned journal X to the category M: management, even though in psychology the journal had a better percentile or rank (65%).
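The percentile rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set of Business and Economics categories, the `CategoryRank` type and the `assign_category` helper are hypothetical names introduced here, and the data are the rank positions of the hypothetical Journal X. The manual expert-judgment step described in the next paragraph is not modelled.

```python
from dataclasses import dataclass

# Assumption: the WoS categories treated as Business and Economics.
BUSINESS_ECON = {"Business", "Management", "Business-Finance", "Economics"}


@dataclass(frozen=True)
class CategoryRank:
    category: str
    rank: int        # position of the journal within the category (1 = best)
    n_journals: int  # number of journals in the category

    @property
    def rank_share(self) -> float:
        """Rank divided by category size; a lower value is a better position."""
        return self.rank / self.n_journals


def assign_category(ranks):
    """Pick the best-placed category, preferring Business/Economics ones."""
    preferred = [r for r in ranks if r.category in BUSINESS_ECON]
    pool = preferred or list(ranks)  # fall back to all categories if none match
    return min(pool, key=lambda r: r.rank_share).category


# Hypothetical Journal X from the text.
journal_x = [
    CategoryRank("Management", 140, 200),
    CategoryRank("Psychology", 65, 100),
    CategoryRank("Economics", 250, 300),
]
print(assign_category(journal_x))  # Management
```

Psychology offers the best position overall, but it is excluded because the rule only compares the Business and Economics categories whenever at least one of them is available.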

We performed this journal allocation process manually, going over all journals case by case. In a few cases, we observed obvious misclassifications when using this rule. For example, some journals might be assigned to a secondary category just because of minor differences in the percentile rank (85% vs 86%). For those cases, we added an expert judgment rule to the best percentile rank rule. The expert judgment rule involved examining the journal title and considering: (a) the inclusion of the name of the WoS categories in the title (management, business, finance and economics) and (b) the order in which they appear in the title, together with general subject area classifications, particularly for management (e.g. strategy, general management, OB and entrepreneurship were considered under management, and marketing, logistics and multidisciplinary journals were included in the business category).
We used this combined procedure for two purposes: (1) to preserve all 669 selected journals within the Business and Economics fields, since the use of a simple top percentile rule would leave some of the journals in other SSCI or SCIE categories, and (2) to have stronger validity for our results within the field of business and economics. We believe that since authors can send their papers to any journal, business schools tend to favour those journals that fall under the Business and Economics WoS categories. Therefore, assigning journals to the top Business or Economics category appears to be the better solution to this multiple category issue.

General results
Correlations between journal impact indicators
Consistent with previous studies in different disciplines (see, for example, Salvador-Oliván and Agustín-Lacruz, 2015), we found significant relationships between many IF indicators. In Table 1, we include the correlation matrix; most correlations are large, positive and significant. Particularly strong correlations were found between the eigenfactor score and total cites (0.93) and between the AIS and the 5YIF. The correlation of the AIS with the impact factor was 0.83 and with the IF without self-cites was 0.89. These results are consistent with previous studies, as presented in Table 2. Also, we computed the correlations for different specific disciplines (within the sciences and social sciences), finding consistent results across all of them. We also report the correlation coefficients for the business and economics WoS categories.
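As an illustration of how a correlation matrix of this kind might be computed, the sketch below uses the Pearson coefficient. The choice of Pearson is an assumption, since the correlation coefficient and software actually used are not stated here, and the scores are made-up values, not data from Table 1. The helper names `pearson` and `correlation_matrix` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict with the pairwise correlations of all columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up scores for five hypothetical journals (illustration only).
scores = {
    "IF": [1.2, 2.5, 0.8, 4.1, 3.0],
    "5YIF": [1.5, 3.1, 1.0, 5.2, 3.4],
    "AIS": [0.6, 1.4, 0.3, 2.9, 1.2],
}
matrix = correlation_matrix(scores)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Because citation indicators are typically skewed, a rank-based coefficient or a logarithmic transformation could be substituted without changing the structure of the sketch.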
Those relationships can also be graphically visualized (Figure 2). Eigenfactor scores are highly correlated with total cites, and AIS scores show a stronger linear relationship with the 5YIF. Of course, since most correlations were below 1.0, some variance was not captured by the other indicators, but they are good predictors of eigenfactor and AIS scores.
Are different citation patterns an issue?
Many researchers have indicated that self-citations (in this case, citations to a given journal coming from articles in the same journal) may differ across disciplines and have stated the need to control for them. In fact, WoS publishes the IF without self-cites as a way to have a cleaner IF.
We computed two self-citation effect variables: the absolute increase in IF due to self-cites (DiffIF = IF - IFwoSC) and the percentage or relative increase in IF due to self-cites (PercDiff = DiffIF/IFwoSC). We provide the graphs in Figure 3, showing no strong relationships between the AIS and eigenfactor indices and absolute self-citation effects. They also show a weak negative relationship in the case of the percentage or relative effect of self-citation patterns, thus providing some evidence that the AIS and eigenfactor may reduce these effects. As stated by Arendt (2010), the persistent correlations between AIS and IFs (particularly the 5YIF) provide a stronger argument that the differences in citation patterns across fields (if they exist) are not removed by using the AIS. Two explanations provided by Arendt (2010) might account for this phenomenon. The first one is that structural differences between scientific fields do exist. Some fields cite more, and some fields are more citable and influential or "prestigious" than others. The second explanation is linked to the position of the particular field in the citation network. Fields with a larger number of journals and with already prestigious journals (higher IFs or AIS) will be favoured by citations and will be more influential.
Dorta-González and Dorta-González (2013) cover some of these issues, suggesting four potential sources of citation variance in addition to the number of references per article in the field: different dissemination channels (e.g. books and proceedings vs journal articles, and the relative coverage of different disciplines in WoS), different field growth (reduction) rates, the ratio of total citations in the discipline within the target window and different ratios of cited to citing (or citation exchange between fields).
Similarly, Waltman and Van Eck (2010) suggest that no impact measure (not even the eigenfactor) can manage two main opposite properties: the insensitivity to field differences and the insensitivity to insignificant journals. The eigenfactor and AIS cannot deal with both situations simultaneously, and their capability to deal with one more than the other will depend on the parameter alpha (0-1) used for the algorithm estimation. In fact, several other researchers have offered their own metrics for trying to account for these field differences, like the Audience factor (Zitt and Small, 2008) and the Source Normalized Impact or the SCImago Journal Rank, which is basically a PageRank-inspired indicator, similar to the eigenfactor, but computed using the SCOPUS database (González-Pereira et al., 2010). No single impact metric can capture the complexity of quality assessment and control at the same time for all intervening variables.
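The two self-citation effect variables defined in this subsection, DiffIF and PercDiff, are simple enough to express directly. The following is a minimal sketch; the function name `self_citation_effects` and the example values are hypothetical, and the handling of a zero IFwoSC is an assumption made here for illustration.

```python
def self_citation_effects(impact_factor, if_without_self_cites):
    """Return (DiffIF, PercDiff) for one journal.

    DiffIF   = IF - IFwoSC
    PercDiff = DiffIF / IFwoSC
    PercDiff is returned as None when IFwoSC is zero (an assumption of this
    sketch, to avoid dividing by zero).
    """
    diff_if = impact_factor - if_without_self_cites
    if if_without_self_cites == 0:
        return diff_if, None
    return diff_if, diff_if / if_without_self_cites


# Hypothetical journal with IF = 2.5 and IF without self-cites = 2.0.
diff_if, perc_diff = self_citation_effects(2.5, 2.0)
print(diff_if, perc_diff)  # 0.5 0.25
```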
Does the use of impact factor vs eigenfactor metrics affect the assessment of business and economics research?
As stated earlier, we wanted to examine the effects of using particular impact metrics when assessing research outputs from Business and Economics. Therefore, we computed the average scores for all six previously mentioned journal assessment metrics, plus the total cites, also published by WoS, using the Business and Economics journals included in Web of Science. As shown in Table 3, mean scores for the IF, 5YIF, IFwoSC and total cites are higher for All Business journals compared to Economics journals. Eigenfactor metrics (eigenfactor and article influence scores), on the other hand, are larger for Economics journals. What is the explanation for these results? Are there specific assumptions that may affect the computation of IF metrics and eigenfactor metrics that may induce these inconsistent results?

Note(s): Dispersion graphs showing lower correlations between logarithms of AIS scores and cites; Eigenvalues and AIS scores; AIS scores and 5-year impact factor scores; and eigenfactor scores and impact factor scores without self-citations
Source(s): Own elaboration
An answer can be found in the computation logic of the eigenfactor scores. According to Bergstrom (2007), the total influence of a discipline in a year is the sum of the eigenfactors of all journals in that discipline, and the total production of science (defined as the WoS citable pieces) is defined to be 100. The eigenfactor is a measure of the influence of a particular journal on the sciences. Since there are approximately 11,681 journals in the database, the average estimated contribution of a journal (assuming all journals contribute the same) will be 100/11,681 or 0.0085 (which can also be read as a percentage of influence). Table 3 reports the average eigenfactors for all Business (0.0033) and Economics journals (0.0046) and for the individual Business categories.
Using this information, we can also calculate the relative influence of each WoS category on Science in general (that is, on the science included in all 11,681 journals). Multiplying the average eigenfactor score by the number of journal titles provides the following WoS category influence scores: Business (0.25 = 0.0024 × 99), Management (0.54), Business-Finance (0.36), All Business, the sum of Business, Management and Business-Finance (1.145), and Economics (1.47). If we divide these numbers by the total citations obtained by journals in those disciplines, we can get the value of a citation in a business journal (0.00000076) versus the value of a citation in an Economics journal (0.00000192) or in an average scientific journal (0.00000155).
We can estimate the relative value of a citation in different disciplines by dividing these citation values by the value of a citation in an average journal. Relative values of citations are Business (0.486), Management (0.554), Business-Finance (1.048) and Economics (1.223). According to these computations based on the Eigenfactor metrics algorithm, a business citation is worth half an average cite in all scientific journals, while a citation in an Economics journal is worth 1.2 times an average cite. It is important to notice that these values would change if a different database were considered (e.g. SCOPUS), since they are sensitive to the disciplinary coverage of each database. These different relative values of citations (an economics journal citation counting 2.5 times a business journal citation) are relevant to explain the positive differences in eigenfactor and AIS scores for economics journals when compared to business journals.
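The arithmetic behind these figures can be retraced with a short sketch. It only uses numbers quoted in the text (reading the relative citation values as 0.486 for Business and 1.223 for Economics); the variable and function names are hypothetical. Because the published averages are rounded, recomputed values may differ slightly from the reported ones.

```python
TOTAL_INFLUENCE = 100.0  # total eigenfactor of all journals, by definition
N_JOURNALS = 11_681      # approximate number of journals in the database

# Average contribution of a journal if all journals contributed the same.
average_eigenfactor = TOTAL_INFLUENCE / N_JOURNALS


def category_influence(mean_eigenfactor, n_journals):
    """Influence of a WoS category: mean eigenfactor times number of journals."""
    return mean_eigenfactor * n_journals


# Business category: rounded mean eigenfactor 0.0024 over 99 journals.
# The product differs slightly from the reported 0.25 because 0.0024 is rounded.
business_influence = category_influence(0.0024, 99)

# Ratio of the reported relative citation values, Economics over Business.
econ_vs_business = 1.223 / 0.486

print(round(average_eigenfactor, 4))  # 0.0086
print(round(business_influence, 4))   # 0.2376
print(round(econ_vs_business, 1))     # 2.5
```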
Using impact factor vs eigenfactor AIS for assessing specific business and economics journals
After providing a general overview of the impact of using standard IF or eigenfactor metrics to assess research from different disciplines, we wanted to examine its effects at the specific journal level. Using the different impact and eigenfactor metrics, we ranked all 669 Business and Economics journals. Tables 4 and 5 show the top 50 journals under each of these rankings. Major differences can be observed between these rankings in terms of the representation of different disciplines.
In Table 6, we present a summary of the presence of journals from each WoS category in the top 50 rankings when a particular impact or influence metric is used. As can be seen, the effect is quite dramatic: the number of Economics journals rises from 11 or 12 (about 23%) when the IF or the 5YIF is used to 32 or 29 (about 61%) when the eigenfactor scores or the AIS are used to prepare the ranking.
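A summary of this kind amounts to ranking the journals by one metric, keeping the top N and counting the categories represented. The sketch below illustrates the idea with six made-up journals and N = 3; the journal records, their scores and the helper `top_n_by_category` are hypothetical and are not taken from Tables 4 to 6.

```python
from collections import Counter


def top_n_by_category(journals, metric, n):
    """Count WoS categories among the n journals with the highest metric."""
    ranked = sorted(journals, key=lambda j: j[metric], reverse=True)
    return Counter(j["category"] for j in ranked[:n])


# Made-up journals (illustration only, not real JCR data).
journals = [
    {"name": "A", "category": "Business", "IF": 6.0, "AIS": 1.0},
    {"name": "B", "category": "Management", "IF": 5.0, "AIS": 1.5},
    {"name": "C", "category": "Economics", "IF": 4.0, "AIS": 4.0},
    {"name": "D", "category": "Economics", "IF": 3.0, "AIS": 3.5},
    {"name": "E", "category": "Business-Finance", "IF": 3.5, "AIS": 3.0},
    {"name": "F", "category": "Management", "IF": 2.0, "AIS": 0.8},
]

print(top_n_by_category(journals, "IF", 3))   # one journal each: Business, Management, Economics
print(top_n_by_category(journals, "AIS", 3))  # Economics: 2, Business-Finance: 1
```

Even in this toy example, changing the metric changes which disciplines dominate the top of the list, which is the pattern summarized above.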
The stronger presence of economics journals in the top 50 list when you use the eigenfactor metrics may be derived from the higher value that the extracting algorithm procedures assign to economics journals (1.4 vs 1.1 for business), which can be associated with the characteristics of each disciplinary network (size, centrality and density) within the whole network of Science. Rosvall and Bergstrom (2011) present a hierarchical map of science consistent with this construction, where Economics is a more central, larger and denser network within the Social Sciences field and is closer to the gateways (e.g. Statistics) to the other major scientific fields: Physical Sciences, Ecology and Earth Sciences, and Life Sciences (see Figure 3 for a graphical representation of these arguments).
Table 4. Top 50 journals in Business and Economics according to different journal metrics, rank based on impact factor metrics
Using impact factor vs eigenfactor AIS for assessing journals in the social sciences
To look for more generalizable findings beyond Business and Economics, a similar analysis was performed for all Social Sciences. As in the previous analysis, the presence of different disciplines in the top 100 journals (in this case representing the top 3% of the social science journals) changes depending on the type of metric used. These results are consistent with the average scores and ranks by discipline. We ranked all scientific disciplines according to the different impact metrics, and we established the IF rank as the benchmark. Then we computed the differences in ranks when using a particular metric compared to the benchmark IF-based rank. In Table 8, we include the 20 disciplines that increase their rankings the most if the AIS is used (last column in the table). Mathematics (#1) rises 172 places, Statistics and Probability (#2) rises 170 places, Applied Mathematics (#3) 129 places and Economics (#4) 127 places; Business-Finance is #8. Most of these 20 fields with the largest ranking increases are older fields and very quantitative in nature (therefore more central to the Total Science spectrum and, according to Rosvall and Bergstrom, 2011, more influential).
We also include the 10 disciplines that face the largest reductions. Smaller fields appear as the ones most affected by the AIS ranking. Also, Business remains basically in the same position (its rank is only two places higher), while Management experiences a slight increase of 18 places. These results produce, however, important changes in the relative positions and distances among Business and Economics disciplines, with Economics in the 25th place of all disciplines, Business-Finance in the 35th place, Management in the 40th place and Business in the 74th place (the opposite will happen if you use the impact factor metrics). The prevalence of quantitative methods and mathematical modelling, cross-citation between fields and the relative (im)balance in citing vs cited patterns, centrality in the scientific database chosen and the underrepresentation of disciplines in the database (e.g. WoS vs Scopus), the number of journals and the tradition/history of the field might be potential reasons for these differences that need further study. In the case of business disciplines, Business-Finance, with a heavier empirical and mathematical approach (similar to Economics in the use of Econometrics), is the field with a large ranking increase.
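The rank comparison described above (rank every discipline under each metric, take the IF-based rank as the benchmark, then compute the difference) can be sketched as follows. The discipline scores are made-up, the helper names `rank_positions` and `rank_gains` are hypothetical, and ties are not handled in this simplified version.

```python
def rank_positions(scores):
    """Map each discipline to its rank (1 = highest score); ties are ignored."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_gains(benchmark_scores, alternative_scores):
    """Places gained (positive) or lost (negative) versus the benchmark rank."""
    benchmark = rank_positions(benchmark_scores)
    alternative = rank_positions(alternative_scores)
    return {name: benchmark[name] - alternative[name] for name in benchmark}


# Made-up mean scores per discipline (illustration only).
mean_if = {"Business": 3.0, "Management": 2.8, "Economics": 1.9, "Mathematics": 1.0}
mean_ais = {"Business": 0.8, "Management": 0.9, "Economics": 1.1, "Mathematics": 1.2}

print(rank_gains(mean_if, mean_ais))
# {'Business': -3, 'Management': -1, 'Economics': 1, 'Mathematics': 3}
```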
The category Management includes the most traditional and original subdisciplines (organization theory and behaviour, general management and management science) (Table 9).
Table 6. Journals by discipline in the top 50 Business and Economics rank, given a particular single metric is used to rank the journals

Theoretical implications
Our results confirm previous studies that show high correlations between different impact factor metrics, supporting the idea that they are measuring a similar underlying construct. Most of the important research quality variance is already captured in the regular impact metrics (total cites and the 5-year JIF), which are highly correlated with the eigenfactor score and the AIS, respectively. Thus, the arguments of Chang et al. (2011, 2013, 2016) and other scholars in favour of simpler and more transparent impact metrics gain additional support. However, our results confirm that in certain cases eigenfactor metrics, in particular the AIS, can capture some information that is not captured by standard IF metrics. They seem to provide some control for self-citations and place a major weight on the centrality (or influence) of particular journals and disciplines in the overall scientific network. Older, more traditional, more interconnected and denser disciplines will be favoured when using the eigenfactor, and particularly the AIS. However, the main assumption that the value of a discipline is represented by the size and centrality of its network needs to be further discussed and justified. Under these assumptions, the Social Sciences represent less than 10% of the value provided by all sciences (see, for example, Rosvall and Bergstrom (2011), who estimated 4%).
Then, when assessing the research outputs of researchers, social science scholars will be undervalued compared to science researchers because their contributions are part of a smaller and less influential subnetwork according to the way the AIS is computed. The same effect occurs when you compare Business and Economics disciplines. When using the AIS, Business journals represent 1.14% of the total value in Science and Economics journals represent 1.47% of the value of all sciences. Then, when considering both types of researchers together and using just AIS scores to give recognition, awards and promotions or to assess individual research output, business research will be considered on average as having a lower impact. Therefore, a stronger theoretical discussion is needed regarding the underlying assumptions of specific metrics on the relative value or influence of sciences, social sciences and their particular disciplines.
At the individual level, we suggest that these results may also be connected to the incentives literature on intrinsic and extrinsic motivations, fairness and internal and external equity and the overall design of incentive systems in research organizations (see Welpe et al., 2015, for a compendium on this issue; Rizkallah and Sin, 2010; Fischer et al., 2019).
Our research also has some implications for the business school management literature. Based on the capability-based and strategic fit approaches in strategic theory (Teece, 2007; Kaplan and Norton, 1996), we argue that school outcomes will be better and more sustainable the larger the fit of systems and resource allocations with the original strategy and mission. Research assessment systems are at the core of these processes since they may enhance faculty selection (due to candidates' self-selection or cultural self-replicating actions) and affect faculty promotion and tenure decisions. Research assessment systems may also generate tensions with faculty deployment systems designed to fulfil the faculty needs of both research and teaching value chains.
We argue based on the above theories that these effects may be more negative for Business Schools and universities, the larger the misalignment between the needed research outcomes and the type of research favoured by particular metrics. Consistently, these initial results provide support to the literature, suggesting that journal impact metrics cannot be used as complete substitutes for qualitative assessment of individual research contributions. This is in line with new developments on the research incentives literature examining broader research outcomes given stakeholders' expectations, like research translation, dissemination and utilization (Jessani et al., 2020). Additionally, since assessment systems are developed by those being assessed (professors) affecting their promotions, benefits and internal power, there are obvious self-regulation risks and issues (Gomes and Frade, 2019) that need to be accounted for in the design process. Further research needs to examine these particular relationships in more detail.

Implications for research assessment systems and strategic management of business schools and universities
The previous discussion has placed special attention on the issues of research assessment to build more valid and fair systems that generate the conditions and incentives to improve the research outputs at the individual, school or institutional level. A related but different issue has to do with the linkage between research assessment systems and strategy formulation and implementation at higher education institutions.
Some scholars will still argue that economic growth has a stronger intrinsic value than fashion management, but if you work for a State School of Fashion and you are training fashion marketers in a province of Colombia or Guangzhou, maybe this "less central research" will be the basis for better professional training, and the main driver of economic growth in those regions and will be very valuable for serving your school mission. However, the use of eigenfactor and AIS will value the economic growth article as more "influential" than the fashion management piece, just because economic journals are cited by more influential journals and are overrepresented in the WoS database (compared to fashion or management journals). This element should be considered when designing research promotion policies to provide better guidance to faculty and to have a consistent strategy and use of resources within Schools and Universities.
Then, as a general implication, our research confirms the notion that the design of research assessment systems needs to consider both qualitative and quantitative indicators and should be administered by senior scholars with a sound knowledge of the disciplines, the school's mission and the expectations of a wide variety of stakeholders (not just Academia).
Additionally, since business schools (and other schools within universities) are becoming increasingly multidisciplinary, the inclusion of multiple impact metrics is advised. If a single metric were used for assessing research from different disciplines, e.g. Psychology, Economics, Sociology and Business, the fight over which single metric to use would be filled with conflicts of interest and would not necessarily follow the aim of promoting research. Besides, the use of single metrics will also provide a fruitful scenario for the appearance of winner-take-all markets, particularly if some fields or subfields have an initial advantage in terms of research traditions, number of existing journals, use of mathematical/quantitative methods and modelling vs cases and qualitative research, or previous publications by the particular subfield.
Similar implications can be drawn for the design of university-wide and national research assessment systems, which should take into consideration a wide variety of fields and disciplines, from the Sciences, Social Sciences and Humanities. Despite the intent to control for disciplinary differences in citations, peer review, authorship and publication patterns, it would be difficult to justify that the disciplines in the Natural Sciences have nine times more value (influence) than the Social Sciences. University-wide research assessment and tenure and promotion systems need to have higher legitimacy within all disciplines, and the use of multiple metrics may be relevant for reducing the undesired effects and biases produced by the embedded computation logic of eigenfactors and AIS calculation, against Social Sciences, and more peripheral/newer/practical disciplines.
The above tables indicate that the relative ranks of both disciplines and journals may be very sensitive to the type of impact measure used, and undesired winner-take-all markets instead of competitive markets may be fostered. Also, since dramatic changes are present in the journal-level ranks, special care is needed when schools and research bodies use journal-level ranks to assess article quality and research productivity (Mingers and Yang, 2017).
Our results suggest that the use of multiple impact metrics may provide a better solution and a broader perspective on journals and research assessment. No particular metric fulfils all desirable criteria, and some impact metrics, like the Eigenfactor and AIS, include the implicit assumption that certain fields are more influential than others given the existing network size and cross-citation patterns, which goes against their original objective and may reduce their acceptance. Additionally, for business schools with balanced teaching-research missions, like most business schools in emerging nations, it would be less advisable to use impact metrics (e.g. eigenfactor and AIS scores) that consider business disciplines as less valuable and more peripheral than other disciplines that support the Business profession (e.g. political science or statistics). It will be difficult to justify business schools' research assessment systems rewarding research in these disciplines more than research on core business subjects.

Implications for individual research assessment
Finally, research committees at the national, university or school levels, when considering individual research records, should use multiple indicators and have discipline-based benchmarks. Even bibliometric studies in particular fields or regional areas will provide a better understanding of the contributions of a researcher (school or country) to a particular field (see for example Cancino et al., 2018; Olavarrieta and Villena, 2014). Like traditional IFs, AIS and eigenfactor scores do vary considerably between disciplines. To assess individual research performance, you need to combine information from different indicators, and you need to go into the particularities of each case. Recently, Nature (2017 Journal Impact Factors, 2018), one of the top Science journals, decided to diversify the presentation of its impact and performance indicators. It decided to do so since it recognizes the differences in citation patterns across disciplines, and that IFs sometimes overrate journals with a few very highly cited papers and undervalue research with few citations, particularly in fields with lower citation propensities. Although the AIS may reduce the effects of self-citation patterns, most differences across fields cannot be understood as based mainly on this particular dimension.
Our results indicate that network-related indicators such as the eigenfactor and the AIS are particularly conditioned by the characteristics or structure of the network, making comparisons based on the AIS particularly complex if you have researchers publishing in different sub-networks or disciplines. Internal and external equity issues should be considered to stimulate extrinsic motivation and avoid the negative effects of unfairness perceptions. Let us consider two cases: (a) individuals with a few non-cited papers published in good, influential, central journals and (b) researchers with several highly cited papers published in journals of more peripheral disciplines. Would it be fair to rule out the second, and not even let them compete for research funds, awards or promotions, in favour of the first just based on the comparison of journal-level impact factors? Would this promote relevant and good research in your institutions and countries? Would this promote a good resource allocation process? Research profiles, individual article-level information and disciplinary peer judgment cannot be replaced by algorithms based on single metrics (Mingers and Yang, 2017; Adams et al., 2019). Multiple indicators are advised, and given the limited relative coverage of business journals in WoS (Harzing, 2020), the use of additional indicators based on Scopus or Google Scholar will also add relevant information.

Future research agenda
In this study, we present descriptive evidence regarding the general effects on assessing and ranking journals and disciplines when using standard impact metrics compared to eigenfactor-derived metrics. We developed plausible explanations for these effects based on the design and computation definitions provided by their developers (Bergstrom, 2007; Bergstrom et al., 2008; Rosvall and Bergstrom, 2011). Future research may further investigate the importance of different factors for the scores and rankings of journals based on these scores. Some of the key variables that may need to be further researched are the age of the field and journal; journal and disciplinary network size and density; centrality and closeness; and cross-citation patterns of the journal/discipline with other disciplines outside the social sciences. Other important variables to be explored are the newness and practice orientation of a particular field. Again, the use of eigenfactor indicators may generate lower impact levels for journals and disciplines that are rising and that are fostering innovation. If this were the case, the relative validity of standard IFs versus eigenfactor metrics may rise, and the case for peer-based multi-dimensional assessment systems will be stronger. Future studies may extend the study of impact metrics on journal and discipline rankings beyond the WoS database to the Scopus database. It is possible that, since Scopus has a wider representation of business journals and territories, smaller changes may be found when using the Scopus database.