Systematic Evaluation Model for Developing Sustainable World-Class Universities: An East Asian Perspective

Owing to the imbalance between Asian and Western countries in higher education development and the pressure of global competition, universities in several East Asian countries have striven to become world-class universities (WCUs) by actively assessing themselves using various global ranking systems and subsequently investing in key performance indicators. Numerous scholars have suggested that for these East Asian catch-up universities (EACUs), independently improving the elements related to high-weight indicators could produce short-term increases in ranking performance; however, this approach is not conducive to sustainable development. In addition, little is currently understood regarding sustainable development strategies for developing EACUs into WCUs. This study proposes a systematic evaluation model for self-assessment and the creation of strategies to transform EACUs into sustainable WCUs. The fuzzy Delphi method was used to determine criteria for a new evaluation framework, and the decision-making trial and evaluation laboratory method was employed to construct the influential relationships among the criteria. Two cases were then selected to demonstrate the superiority of the model for creating sustainable development strategies for EACUs. This study provides a systematic perspective and a useful tool for decision-makers at EACUs to achieve sustainable development goals.


Introduction
Higher education (HE) systems worldwide have undergone dramatic structural changes since the late twentieth century. The development of HE has profound and lasting effects on the political and economic development of a country. The traditional mode of higher education institutions (HEIs), which emphasizes basic research, clearly distinguishes hierarchies, and promotes autonomy among scientists in various disciplines [1], has substantially changed into a model comprising government-industry-university networks that emphasize that knowledge production is socially distributed, application-oriented, transdisciplinary, and subject to multiple accountabilities [2][3][4]. Additionally, the globalization of communication and the emergence of the Internet have promoted fierce competition and strategic cooperation among HEIs [5,6]. In this context, certain governments and HEIs are striving to improve their global competitiveness and develop into "world-class universities" (WCUs). In addition, research on the global rankings and reputation of HEIs has received increasing attention [7][8][9][10][11][12]. Therefore, the concept of WCUs has become a popular and crucial topic of discussion.
Although ranking systems exert certain positive effects in terms of self-assessment among HEIs, completely relying on a ranking system or high-weight criteria for self-improvement under performance-driven accountability is unsuitable for Asian catch-up universities (ACUs). However, research has not produced sufficient knowledge to formulate a sustainable development strategy for ACUs. Specifically, ACUs are universities that emphasize their ranking results and are extremely eager to become WCUs, consider top-tier Western research universities as benchmarks, can obtain substantial strategic investments to develop into WCUs, have a foundation for development, and have already been ranked in certain reputable ranking systems. ACUs are mainly located in highly developed regions of East Asia (Southeast Asia and Northeast Asia), including mainland China, Hong Kong, Taiwan, Macao, Singapore, Malaysia, South Korea, Japan, and other Northeast Asian countries/regions [2,20]. Therefore, this paper focuses on research universities in East Asia that fit into the aforementioned catch-up model, namely, East Asian catch-up universities (EACUs).
This paper proposes a model for EACUs to formulate development strategies from a systematically sustainable perspective. The proposed model is a tool for the senior decision-makers of EACUs to examine their development status and rationally formulate sustainable development strategies for continual self-improvement in pursuit of becoming a sustainable WCU. On the basis of sustainable development, East Asian universities can formulate their own policies according to local conditions rather than copy the policies of European and American universities, thereby improving the overall level of HE in Asia and balancing the development of global educational resources. The proposed model not only supports the sustainable development of HE at a theoretical level, but at a practical level, also contributes to the balanced development of global education institutions.

Methods
This section provides an overview of the theoretical background to the implemented approach. As displayed in Figure 1, the proposed model includes evaluation criteria established through inductive analysis and application of the fuzzy Delphi method (FDM; Steps 1 and 2) as well as the influential relationships among the criteria, which were constructed through the decision-making trial and evaluation laboratory (DEMATEL) technique (Step 3). Two cases were selected as examples to demonstrate the model's superiority in terms of its ability to systematically identify improvement priorities (Steps 4 and 5).

Establishing Evaluation Criteria
First, a comprehensive review of the literature on famous ranking systems worldwide and related research was conducted to identify criteria for evaluating university performance.
Step 1: NVivo 11 was used to generate a predetermined list of criteria according to the consensus of the analysts.
Second, the FDM was applied to exclude criteria that were inapplicable to East Asia in terms of the global frameworks used by prestigious ranking systems. The Delphi method was first proposed by Dalkey and Helmer in the early 1960s [36] and was used to establish a set of evaluation factors affecting decision-making according to consensus among anonymous experts [37,38]. The FDM, which combines the Delphi method and fuzzy set theory and is used to address the vagueness and uncertainty of judgments, was proposed by Ishikawa et al. [39]. The FDM has been widely applied to construct key performance appraisal indicators in domain applications such as the service industry [40] and sustainable ecotourism [41]. The FDM has been iteratively developed with the discovery of problems encountered during application such as "anonymity", "iteration", "controlled feedback", and "statistical group response" [37]. The FDM used in the current study integrates expert opinions by using the "double triangular fuzzy number" [42] and was employed with a "gray zone verification method" to determine whether expert cognition demonstrates a consistent convergence effect (Steps 2.1-2.3). The advantages of this approach are as follows: (1) fewer surveys are required; (2) less time is required as expert surveys are conducted separately; (3) experts' views are more effectively incorporated according to their professional perspectives; and (4) consensus among experts is more effectively evaluated through the gray zone verification method. The concrete steps in this approach are described subsequently.
Step 2.1: The "most pessimistic cognitive value" and the "most optimistic cognitive value" provided by all experts for each factor i are statistically analyzed, and extreme values lying outside two standard deviations are eliminated. Next, the minimum value $C^i_L$, geometric mean value $C^i_M$, and maximum value $C^i_U$ of the remaining "most pessimistic cognitive values", as well as the minimum value $O^i_L$, geometric mean value $O^i_M$, and maximum value $O^i_U$ of the remaining "most optimistic cognitive values", are calculated.
Step 2.2: On the basis of the calculation results in Step 2.1, the triangular fuzzy number of the "most pessimistic cognition", $C^i = (C^i_L, C^i_M, C^i_U)$, and the triangular fuzzy number of the "most optimistic cognition", $O^i = (O^i_L, O^i_M, O^i_U)$, are calculated for each factor i (Figure 2).

Step 2.3: Whether the experts' opinions exhibit a consistent convergence effect can be determined as follows:
(1) If no overlap exists between the two triangular fuzzy numbers, that is, $C^i_U \le O^i_L$, then the opinion intervals of the experts share a consensus section and the opinions tend to fall within this section; therefore, the "consensus value" $G^i$ of factor i can be calculated using Equation (1).
(2) If an overlap between the two triangular fuzzy numbers is observed, that is, $C^i_U > O^i_L$, and the gray zone $Z^i = C^i_U - O^i_L$ of the fuzzy relationship is smaller than the range $M^i = O^i_M - C^i_M$ between the "geometric mean of the optimistic cognition" and the "geometric mean of the pessimistic cognition" for factor i, then although no consensus section exists for the experts' opinion intervals, the two experts who provided extreme opinions (the most pessimistic expert of the optimistic cognition and the most optimistic expert of the pessimistic cognition) did not differ considerably from the other experts in terms of their opinions. Therefore, the "consensus value" $G^i$ of factor i is set equal to the fuzzy set obtained by the intersection (min) operation on the two triangular fuzzy numbers, and the quantized score is the point of this fuzzy set with the maximum membership value.
(3) If an overlap between the two triangular fuzzy numbers is observed, that is, $C^i_U > O^i_L$, and the gray zone $Z^i = C^i_U - O^i_L$ is larger than the range $M^i = O^i_M - C^i_M$ between the "geometric mean of the optimistic cognition" and the "geometric mean of the pessimistic cognition", then no consensus section exists for the experts' opinion intervals, and the two experts who provided extreme opinions (the most pessimistic expert of the optimistic cognition and the most optimistic expert of the pessimistic cognition) differed considerably from the other experts, resulting in divergent opinions. Therefore, a new round of questionnaires must be administered, and Steps 2.1-2.3 must be repeated until all evaluation items have reached convergence and the corresponding "consensus value" is obtained.
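Steps 2.1-2.3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: Equation (1) is not reproduced in this text, so case (1) uses the midpoint of the two geometric means, a form commonly used with the double triangular fuzzy number approach, and case (2) uses the crossing point of the descending edge of the pessimistic fuzzy number and the ascending edge of the optimistic one, which is where the min-intersection attains its maximum membership. The function names and the sample scores are hypothetical.

```python
from statistics import geometric_mean, mean, stdev


def drop_outliers(values, k=2.0):
    """Step 2.1: discard scores farther than k sample standard deviations from the mean."""
    if len(values) < 3:
        return list(values)
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]


def triangular(values):
    """Return (minimum, geometric mean, maximum) of the retained scores.

    Scores must be strictly positive because a geometric mean is used.
    """
    kept = drop_outliers(values)
    return min(kept), geometric_mean(kept), max(kept)


def consensus_value(pessimistic, optimistic):
    """Steps 2.2-2.3: return the consensus value G, or None when opinions diverge."""
    c_l, c_m, c_u = triangular(pessimistic)
    o_l, o_m, o_u = triangular(optimistic)
    if c_u <= o_l:
        # Case (1): no overlap. ASSUMPTION: midpoint of the two geometric means.
        return (c_m + o_m) / 2
    gray_zone = c_u - o_l   # Z in the text
    mean_range = o_m - c_m  # M in the text
    if gray_zone < mean_range:
        # Case (2): crossing point of the two edges (ASSUMED closed form).
        return (c_u * o_m - o_l * c_m) / ((c_u - c_m) + (o_m - o_l))
    # Case (3), and the untreated boundary Z == M: no convergence, survey again.
    return None


if __name__ == "__main__":
    # Hypothetical expert scores for one criterion on a 0-10 scale.
    g = consensus_value([2, 4, 6], [5, 8, 10])
    print("consensus value:", g)
```

A criterion would then be retained when its consensus value reaches the chosen threshold; the result `None` signals that another questionnaire round is needed.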

Constructing the Influential Relationships among the Criteria
In Step 3, the DEMATEL method was adopted to construct causal relationships among the various criteria. This method was proposed by the Science and Human Affairs Program of the Battelle Memorial Institute of Geneva and was used for solving intertwined problem groups [43]. An outcome of the DEMATEL method, influential network relationship maps (INRMs) [43,44] offer visual representations for decision-makers to organize their actions according to which criteria are prioritized in real-world situations [45,46]. The DEMATEL method responds to the requirement to identify priorities from a systematic perspective rather than by "treating symptoms but not the disease" [47,48]. The DEMATEL method is widely implemented in the creation of sustainable development strategies with causal influence to manage real-world decision-making problems associated with concerns in tourist attraction development [43,49], creative community development [48] and design scheme improvement [50]. The concrete steps are as follows.
Step 3.1: Establish a direct influence relation matrix E. By employing H expert questionnaires on a scale of 0 (absolutely no influence) to 4 (highest influence), data for each influential relationship between any two criteria can be obtained. The pairwise comparison method is used to evaluate the degree of influence. The direct influence relation matrix E, an n × n non-negative matrix, is presented in Equation (4); each of the H experts provides one such matrix.
Step 3.2: Constitute the average direct influence matrix A. The average scores of the H direct influence relation matrices are calculated using Equation (5).
Step 3.3: Determine the level of experts' consensus. The consensus can be calculated using Equation (6). The recommended threshold for the average gap ratio is 5%. If an unstable system (a value larger than 5%) is obtained, then the operations in Step 3.1 should be reimplemented to ensure the correctness of the collected data and the adequacy of the quantity of experts.
Step 3.4: Formulate the normalized average direct influence relation matrix D. The matrix D is obtained through the normalization of matrix A with Equations (7) and (8).
Step 3.5: Formulate the total influence relation matrix T. An n × n matrix T is calculated with Equations (9) and (10), where I is the n × n identity matrix.
Step 3.6: Generate the INRM. Each row sum and column sum in matrix T can be acquired with Equations (11) and (12). The values $o - r$ and $o + r$ serve as the horizontal and vertical axes of the INRM, respectively.
In Step 4, two examples of EACUs creating sustainable improvement strategies are provided to verify the practicability of the proposed evaluation model (Figure 1). The DEMATEL-based analytic network process (DANP) method proposed by Tzeng [44] was used to calculate the influential weight of each criterion. The concrete steps are described as follows.
Step 4.1: Calculate the unweighted supermatrix $W^{\alpha}$. Normalize the total influence relation matrix $T_C$ by dimensions, as presented in Equation (13), where $T_C^{\alpha}$ is the total influence relation matrix of the criteria normalized by dimension. According to pairwise comparisons of the criteria and the basic concept of the ANP, the unweighted supermatrix $W^{\alpha}$ can be calculated through the transposition of the normalized influence relation matrix $T_C^{\alpha}$ by dimension, that is, $W^{\alpha} = (T_C^{\alpha})'$, as presented in Equation (14).
Step 4.2: Calculate the weighted supermatrix. The total influence relation matrix $T_D$ is divided by $d_i = \sum_{j=1}^{m} t_{ij}$, $i = 1, 2, \ldots, m$, and then the normalized total influence relation matrix of dimensions $T_D^{\alpha}$ can be obtained, as presented in Equation (15). From the matrix $T_D^{\alpha}$ and the unweighted supermatrix $W^{\alpha}$, the weighted supermatrix $W$ can be obtained with Equation (16), where $t_{ij}^{\alpha D}$ is a scalar and $\sum_{j=1}^{m} m_j = n$.
Step 4.3: Limit the weighted supermatrix by raising it to the zth power until the supermatrix becomes a stable supermatrix, that is, $\lim_{z \to \infty} W^{z}$, where z represents any number of powers. The DANP method is used to obtain the global priority vectors, that is, the global weights $w^{g}$, also known as influential weights (IWs). The local weight of a dimension, $w^{l}_{D}$, can be obtained from the summation of the IWs of all the criteria in that dimension. The local weight of a criterion, $w^{l}_{c}$, can then be obtained through the division of the global weight of the criterion by the local weight of its own dimension.
In Step 5, the modified VIseKriterijumska Optimizacija I Kompromisno Resenje (m-VIKOR) method is applied to obtain the gap value between the current performance level and the aspiration level of each criterion [43]. The total performance gap of each case is calculated according to the weighted gap value of each criterion.
The concrete steps are presented as follows:
Step 5.1: Define the aspiration level and the worst value. The aspiration level is $f^{aspired} = (f_1^{aspired}, \ldots, f_n^{aspired})$, and the worst value is $f^{worst} = (f_1^{worst}, \ldots, f_n^{worst})$. In the current study, performance scores were expressed in natural language on a scale from 0 to 10 in the semantic questionnaire. Accordingly, $f_j^{aspired} = 10$ is defined as the aspiration level and $f_j^{worst} = 0$ is defined as the worst value.
Step 5.2: Obtain the mean group utility $s_k$ for the gap and then create priority strategies. Here, $s_k$ is the normalized ratio (%) of the distance to the aspiration level, and $w_j$ denotes the IWs of the criteria generated through the DANP technique.
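Steps 4.1-4.3 above can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the authors' code: Equations (13)-(16) are not reproduced in this text, so the block normalization and weighting follow the form in which DANP is commonly described (each row of $T_C$ is normalized within each dimension block, scaled by the row-normalized $T_D$, and transposed). The matrices `T_C` and `T_D`, the partition `DIMS`, and all function names are hypothetical, and every block of `T_C` is assumed to have a nonzero row sum.

```python
def matmul(x, y):
    """Plain-Python matrix product for small square matrices."""
    return [[sum(xi[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for xi in x]


def weighted_supermatrix(t_c, t_d, dims):
    """Steps 4.1-4.2: build a column-stochastic weighted supermatrix W.

    dims lists the criterion indices that belong to each dimension.
    """
    n = len(t_c)
    dim_of = {c: d for d, members in enumerate(dims) for c in members}
    # Step 4.2: divide each row of T_D by its row sum d_i.
    t_d_norm = [[x / sum(row) for x in row] for row in t_d]
    w = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for j, members in enumerate(dims):
            block_sum = sum(t_c[p][q] for q in members)
            for q in members:
                # Step 4.1: normalize within the block, transpose (store at [q][p]),
                # then weight by the normalized dimension-level influence.
                w[q][p] = t_d_norm[dim_of[p]][j] * t_c[p][q] / block_sum
    return w


def limit_supermatrix(w, tol=1e-12, max_iter=10_000):
    """Step 4.3: multiply by W repeatedly until successive powers stop changing."""
    current = w
    for _ in range(max_iter):
        nxt = matmul(current, w)
        change = max(abs(a - b) for ra, rb in zip(nxt, current) for a, b in zip(ra, rb))
        if change < tol:
            return nxt
        current = nxt
    raise ValueError("supermatrix did not stabilize")


def influential_weights(w):
    """Global weights (IWs): any column of the limiting supermatrix."""
    return [row[0] for row in limit_supermatrix(w)]


def local_weights(iw, dims):
    """Dimension weights and criterion weights local to their own dimension."""
    dim_w = [sum(iw[c] for c in members) for members in dims]
    crit_w = {c: iw[c] / dim_w[d] for d, members in enumerate(dims) for c in members}
    return dim_w, crit_w


# Hypothetical inputs: three criteria grouped into two dimensions.
T_C = [[0.2, 0.4, 0.3], [0.5, 0.1, 0.2], [0.3, 0.3, 0.1]]
T_D = [[0.30, 0.25], [0.30, 0.10]]
DIMS = [[0, 1], [2]]

if __name__ == "__main__":
    iw = influential_weights(weighted_supermatrix(T_C, T_D, DIMS))
    print("influential weights:", iw)
    print("local weights:", local_weights(iw, DIMS))
```

Because every column of the weighted supermatrix sums to 1, the limiting columns coincide when the matrix is strictly positive, which is why a single column can be read off as the IW vector in this sketch.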

Criteria Evaluation System
The predetermined list of evaluation criteria used in Step 1 is based on several widely used ranking systems. Evaluation frameworks and performance measurements for WCUs can be extremely diverse. Problems involving WCUs do not fall under a single category that defines a certain position; thus, addressing such problems requires a holistic and inclusive approach covering various quality dimensions. The current study therefore employed a general framework that can evaluate the qualities of a university and can be embedded in a framework suitable for evaluating EACUs. In Step 1, a comprehensive literature review and an inductive analysis of widely used ranking systems and research papers were conducted to determine the final criteria for the proposed model. The ranking systems were selected with reference to Vernon, Balas, and Momani [29], whose selection is consistent with the scope of our research. Of the 13 ranking systems, two were excluded because they had not been recently updated, and the remaining 11 (released 2018-2020) were selected for the final framework (Table 1). Ranking systems were included if they met the following criteria: (1) the ranking system included more than 100 doctorate-granting universities; (2) the rankings were current and continually published; (3) the rankings included international universities; (4) the ranking system published its ranking methodology in English; (5) the ranking system published reputation surveys and research outcome indicators; and (6) the ranking system evaluated between 500 and 5000 institutions. For inductive analysis, NVivo 11 was used to classify the contents of the ranking systems, and certain indicators (i.e., E16 and E17) repeatedly emphasized by scholars in the relevant literature were added. The predetermined list contains a total of five dimensions and 17 criteria (Table 1).
To obtain a list of evaluation criteria for EACUs, the FDM was used to determine the final criteria from the initial list according to consensus among the experts, who were all from EACUs (Step 2).
The pretest questionnaires were collected from January to September 2020. Of the 40 questionnaires distributed, 34 were recovered, yielding a recovery rate of 85%. In total, 20 experts were academics in education management, all of whom had focused on Asian HE for more than 5 years. A total of 14 experts held senior management positions at HEIs, including president, dean, and other middle management positions such as administration in a college department of instruction. Of the 34 experts, 6 were from Macau, 1 was from Hong Kong, 18 were from mainland China, 3 were from Taiwan, 2 were from Singapore, 1 was from Japan, 1 was from South Korea, and 2 were from Malaysia. The FDM questionnaire consisted of two parts: one assessing the importance of the evaluation criteria for EACUs and the other assessing the criteria descriptions. Four criteria (i.e., E14-17) on the initial list with scores (Table 1) lower than the threshold value (i.e., 6) were deleted. Finally, 13 criteria were included. Table 2 presents descriptions of these criteria.

Dimensions / Criteria / Descriptions

Research performance (D1)
  C11 Research output
    1. Number of SCI, SSCI, and A&HCI papers
    2. Number of research papers appearing in highly influential journals
    3. Own journals (number of journals published by the institution)
    4. Monograph publications and formal publications of academic conference proceedings

Teaching and learning (D2)
  C23 Provision of facilities
    2. Quality of information technology services
    3. Size and overall setting
  C24 Quality of courses
    2. Period of study
    3. Research orientation of teaching methods
    4. Innovative methods of teaching and assessment
    Note: All of these criteria should be adapted to student development goals
  C25 International diversity of teaching
    1. International academic staff ratio
    2. Percentage of international (degree and exchange) students
    3. Percentage of foreign language programs
    4. International orientation of programs (joint/dual degree programs) that provide opportunities to study abroad

Knowledge transfer (D3)
  C31 Institutional income
    1. Income from the private sector
    2. Income from continual professional development (private sources)
    3. External research income (e.g., research grants from national and international funding agencies, research councils, research foundations, charities, and other nonprofit organizations)
  C32 Regional engagement
    1. Graduates' (bachelor's and master's degrees) employment and student internships in the region
    2. Strategic research partnerships in the region, including regional publications with industrial partners and the proportion of external research revenue from regional sources (i.e., industry, private organizations, and charities)
  C33 Innovative knowledge
    1. Number of patent applications (simple families or copatents in the industry)
    2. Scientific publication output from an institution cited in patents
    3. Number of awards for inventions
    4. Public knowledge shared

Influential Relationships among the Criteria
The DEMATEL method was adopted to identify causal relationships among the various criteria (Step 3). The questionnaires were collected from October to November 2020. A total of 16 questionnaires were distributed, and all 16 were recovered, yielding a recovery rate of 100%. The 16 experts were senior decision-makers with relevant management experience, including 3 presidents, 1 dean from the development management committee, 4 deans from different departments, and 8 professors who had rich knowledge of HE development. In addition, the confidence level of the expert consensus was 98.52% (i.e., the average gap ratio was 1.48%).
Accordingly, the INRM in Figure 3 reveals the influential relationships among the criteria and dimensions. As indicated, at the dimension level, teaching and learning (D2) directly affects research performance (D1) and knowledge transfer (D3), whereas research performance (D1) directly affects knowledge transfer (D3). At the criterion level, in D1, excellence with leadership (C13) directly affects research output (C11), citation impact (C12), international collaboration in research (C15), the scientific talent pool (C14), and other criteria. In D2, the provision of facilities (C23) directly affects the quality of academic staff (C21), learning experience (C22), the quality of courses (C24), the international diversity of teaching (C25), and other attributes. In D3, institutional income (C31) influences regional engagement (C32), innovative knowledge (C33), and other attributes.
The causal relationships identified among the criteria in this study from an East Asian perspective are consistent with the results of relevant empirical studies. For example, regarding the influential relationships between the provision of facilities (C23) and the quality of academic staff (C21), in a report on California State University [53] titled "Faculty Compensation and the Crisis in Recruiting and Retaining Faculty of High Quality", the survey results indicated that the quality of the built environment is a highly crucial consideration when faculty members consider accepting or rejecting job offers. This finding suggests that the provision of facilities has a direct bearing on employee quality [53]. In another example, international collaboration in research (C15) affects research output (C11). A 2019 article by Christopher D. Hammond titled "Dynamics of higher education research collaboration and regional integration in Northeast Asia: a study of the A3 Foresight Program" reported that the amount of research on transregional and cross-border cooperation in East Asia has been increasing rapidly. This indicates that international collaboration in research (C15) promotes the knowledge innovation, patent development, and publication volume of regional cooperative countries. The study also mentioned that international collaboration in research (C15), such as the A3 cooperation plan established by China, Japan, and South Korea, affects citation impact (C12). Thus far, A3 projects have led to the publication of numerous works, including internationally coauthored papers. Another common activity cited was mobility and exchange of researchers, including many postdoc, graduate, and undergraduate students, between the three countries [2].
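The INRM coordinates discussed in this section derive from Steps 3.2-3.6. The following Python sketch illustrates that pipeline under stated assumptions, since Equations (5) and (7)-(12) are not reproduced in this text: the average matrix is the element-wise mean of the expert matrices, normalization divides by the larger of the maximum row sum and maximum column sum, and the total influence matrix is accumulated as the series D + D^2 + ..., which equals D(I - D)^(-1) when the series converges. Treating o as the row sum and r as the column sum of T is also an assumption, and the two-criterion example data are hypothetical.

```python
def average_matrix(expert_matrices):
    """Step 3.2: element-wise mean of the H expert matrices."""
    h = len(expert_matrices)
    n = len(expert_matrices[0])
    return [[sum(e[i][j] for e in expert_matrices) / h for j in range(n)]
            for i in range(n)]


def normalize(a):
    """Step 3.4 (ASSUMED form): divide by the largest row sum or column sum."""
    s = max(max(sum(row) for row in a), max(sum(col) for col in zip(*a)))
    return [[x / s for x in row] for row in a]


def matmul(x, y):
    """Plain-Python matrix product for small square matrices."""
    return [[sum(xi[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for xi in x]


def total_influence(d, tol=1e-12, max_terms=10_000):
    """Step 3.5: accumulate D + D^2 + D^3 + ... until the added term is negligible."""
    total = [row[:] for row in d]
    term = d
    for _ in range(max_terms):
        term = matmul(term, d)
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, term)]
        if max(abs(p) for row in term for p in row) < tol:
            return total
    raise ValueError("series did not converge")


def inrm_coordinates(t):
    """Step 3.6: return (o - r, o + r) per criterion.

    ASSUMPTION: o is the row sum and r is the column sum of T.
    """
    row_sums = [sum(row) for row in t]
    col_sums = [sum(col) for col in zip(*t)]
    return [(o - r, o + r) for o, r in zip(row_sums, col_sums)]


if __name__ == "__main__":
    # Hypothetical 0-4 influence scores from two experts for two criteria.
    experts = [[[0, 2], [1, 0]], [[0, 2], [1, 0]]]
    t = total_influence(normalize(average_matrix(experts)))
    print(inrm_coordinates(t))
```

In this reading, a criterion with a positive first coordinate gives more influence than it receives, which is the sense in which the text describes some criteria as directly affecting others.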

Case Selection and Data Collection
The University of Macau (UM) and Shenzhen University (SZU) were selected to demonstrate the superiority of the model for creating sustainable development strategies for EACUs. Both universities are relatively new among EACUs with a strong desire to develop into WCUs, and they have recently demonstrated significant growth in both international rankings and resources obtained from central and local governments. The total performance scores of the UM and SZU were similar to those of the EACUs examined in the current study. The senior managers of these universities actively cooperated with the research team and provided information on potential strategies, which was conducive to the validation of our model. The main goal of these two EACUs is to rationally allocate obtained resources, gradually improve in stages, and finally achieve sustainable development. Therefore, in terms of development path, vision, resources, and demands, the UM and SZU are typical EACUs that aspire to increase international competitiveness and regional service responsibility and are suitable examples for demonstrating the superiority of our model.
A total of 16 experts from the two universities were invited to take part in the performance self-assessment (Step 5). This group of experts comprised senior leaders at their respective universities, including previous and current presidents as well as deans of the global affairs management office. The group also included experts from the UM and SZU who primarily research HE development. As presented in Table 3, the performance value of each criterion was calculated as the average of the 16 experts' performance scores, and the m-VIKOR method was applied to calculate the gap values from these performance scores. The total gap value of each case was calculated as the weighted sum of the gap values of the criteria, with the weights obtained through the DANP method from the influential weights (IWs) of the criteria (Step 4, Figure 1).
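The gap calculation described above can be illustrated with a minimal sketch. It assumes a 0 to 10 self-assessment scale with an aspiration level of 10 and a worst level of 0; this assumption is not stated in this section, although it is consistent with the reported totals (7.241 corresponds to 0.276 and 7.157 to 0.284). The function names and the equal weights used in the usage lines are hypothetical and are not the DANP weights of the study.

```python
# Minimal sketch of a gap calculation in the spirit of m-VIKOR.
# ASSUMPTION: scores lie on a 0-10 scale, aspiration level 10, worst level 0.
ASPIRED = 10.0
WORST = 0.0


def gap(score, aspired=ASPIRED, worst=WORST):
    """Normalized distance between a performance score and the aspiration level."""
    return (aspired - score) / (aspired - worst)


def total_gap(scores, weights):
    """Weighted sum of per-criterion gaps; the weights are expected to sum to 1."""
    return sum(weights[c] * gap(scores[c]) for c in scores)


# Performance scores reported for the UM on two criteria.
um_scores = {"C32": 5.875, "C33": 6.375}
# HYPOTHETICAL equal weights, used only to make the example runnable.
demo_weights = {"C32": 0.5, "C33": 0.5}

um_gaps = {c: gap(s) for c, s in um_scores.items()}
um_demo_total = total_gap(um_scores, demo_weights)
```

Because the gap is a linear function of the score, a total gap computed with weights that sum to 1 equals the gap of the weighted total performance score, which is why the two totals reported for each case move together.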

Results and Discussion
The results revealed several similarities between the UM and SZU. The total performance/gap values of the UM and SZU were 7.241/0.276 and 7.157/0.284, respectively (Table 3). Their total performance self-assessment scores were extremely similar. The senior decision experts at the two universities all agreed that the current performance of their respective universities was favorable, but that a gap still existed between the current performance level and their goal. Although the overall self-assessment scores were extremely similar for the two cases, differences in performance on certain criteria still existed.

If improvement strategies are formulated through the traditional approach of "treat symptoms but not the disease", then on the basis of the performance scores, C 32 (5.875) and C 33 (6.375) should be given the highest priority in terms of improvement strategies for the UM (the red dots in Figure 4). The strategies would be (UM1) actively seeking support through regional research funding, including coestablishment funding from Guangdong Province and the Macao Science and Technology Development Fund; (UM2) establishing internship and employment partnerships with relevant enterprises in Guangdong, Hong Kong, and Macao to actively promote internships for and employment of graduates in the region; (UM3) promoting welfare incentives for or accountability assessments of staff and encouraging students to participate in competitions for patents or invention awards; and (UM4) recruiting competent staff capable of developing patents.
Similarly, if the traditional approach is adopted, then C 33 (5.625) and C 22 (5.750) should be given the highest priority in terms of improvement strategies for SZU. For C 33, the aforementioned UM3 and UM4 strategies could be prioritized. For C 22, the strategies would be (SZU1) setting strict, necessary targets for secondary colleges in terms of graduation rate and number of graduates in normative time; (SZU2) attracting business executives from the world's top companies to become alumni; (SZU3) establishing internship and employment partnerships with relevant enterprises to provide students with job opportunities; and (SZU4) increasing the scale of enrollment.
However, in terms of systematic improvement through application of the proposed model (the green dots in Figure 4), the influential relationships among the criteria should be considered during the creation of systematic improvement strategies. Although the UM and SZU both underperformed on C 33 , their improvement strategies should not be the same.
This evaluation is based on the assumption that the gap threshold should be set at 0. (0.325), and C 15 (0.325) for SZU. Under a "piecemeal" approach, the improvement criteria for the UM would be prioritized in the order of C 32 > C 33 > C 22 > C 24. However, as indicated by the INRM, D 2 influences D 3 through D 1; therefore, C 22 and C 24 should be given higher priority than C 32 and C 33. In D 2, C 22 is affected by C 24. Learning experience (C 22) tends to result from student training; therefore, universities should provide students with both practical and theoretical learning opportunities so that they can experience various working environments. In theoretical learning, quality of courses (C 24) is critical and is related to factors such as curriculum planning, teaching form, and teaching quality. Therefore, in light of the student development goals of the UM, the quality of instructional programs and innovative forms of teaching should be improved first (UM_1).
The other three criteria in D 2 (C 23, C 21, and C 25) directly affect C 24 and C 22. Therefore, these three criteria must be considered together in efforts to improve C 24 and C 22. In terms of C 23, the Hengqin Campus at the UM was recently built. Governments invested substantial funds in new teaching and living facilities; thus, C 23 exhibited a relatively favorable performance. In addition, facilities such as libraries, laboratories, and seminar rooms should also be constructed and maintained (UM_1a). Regarding C 21 and C 25, performance in C 21 is attributable to the recently implemented policy of introducing overseas talent, which also affects performance in C 25. Efforts to improve performance on these two criteria should be maintained to increase their influence on C 24 and C 22. In developing course reform strategies, the human resources of academic staff and the teaching resources of international partners can be integrated (UM_1b).
In terms of practical opportunities for C 22 improvement, internship and employment partnerships must be established with relevant enterprises in Guangdong, Hong Kong, and Macao (UM_2). These improvements would affect performance in D 3 and facilitate knowledge transfer. Improvement in C 33 can be achieved through the combination of practical and theoretical innovations as well as cooperation with regional public and private enterprises. Instead of implementing strategies for making quick improvements in C 33, such as recruiting new employees who can quickly obtain patents and who are accountable for patent applications or other knowledge innovation outcomes, the UM should give high priority to sustainable strategies such as course planning, academic staff training, and cooperation with relevant regional institutions. In addition, an effective combination of these strategies would facilitate improvements in intellectual autonomy and sustainable innovation.
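The difference between the piecemeal order and the systematic order for the UM can be sketched in a few lines of code. The function below is a simplified illustration rather than the DEMATEL or INRM procedure itself: it takes the piecemeal list as given in the text, ranks dimensions by the influence chain D 2, D 1, D 3, and places a criterion after any listed criterion in the same dimension that influences it. The function name, the assumption that the first digit of a criterion label identifies its dimension, and the single influence edge (C 24 affects C 22) are choices made for this example.

```python
def systematic_order(piecemeal, dimension_rank, influences):
    """Reorder criteria so that influencing dimensions and criteria come first.

    piecemeal      -- criteria ordered by gap alone, e.g. ["C32", "C33", ...]
    dimension_rank -- lower rank means the dimension is more influential
    influences     -- (source, target) pairs between criteria
    """
    def dimension_of(criterion):
        # ASSUMPTION: "C24" belongs to dimension "D2" (first digit of the label).
        return "D" + criterion[1]

    def upstream_count(criterion):
        # Number of listed criteria in the same dimension that influence this one.
        return sum(
            1
            for source, target in influences
            if target == criterion
            and source in piecemeal
            and dimension_of(source) == dimension_of(criterion)
        )

    return sorted(
        piecemeal,
        key=lambda c: (
            dimension_rank[dimension_of(c)],  # influencing dimensions first
            upstream_count(c),                # then influencing criteria
            piecemeal.index(c),               # ties keep the gap-based order
        ),
    )


um_piecemeal = ["C32", "C33", "C22", "C24"]     # order stated for the UM
influence_rank = {"D2": 0, "D1": 1, "D3": 2}    # D2 influences D3 through D1
um_edges = [("C24", "C22")]                     # C22 is affected by C24

um_systematic = systematic_order(um_piecemeal, influence_rank, um_edges)
```

Under these assumptions the result places C 24 and C 22 ahead of C 32 and C 33, which matches the priority described above. Counting upstream criteria is a simple heuristic that is sufficient for this small case; a full influence network would call for a proper topological ordering.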
Regarding SZU, effective strategies can also be created through application of the proposed model in the context of systematic improvement. The lowest performance was observed for C 33 (gap value of 0.438), and poor performances were also noted for C 22, C 23, C 24, and C 25 in D 2 as well as C 15 in D 1. The order of influence (from highest to lowest) among these three dimensions is D 2, D 1, and D 3 (Figure 3). Relevant strategies for systematic improvement can be created according to the following priorities:
(SZU_1): SZU should first emphasize policies related to D 2. The provision of facilities (C 23) can influence other criteria. Attention should be directed toward reviewing support for C 25, C 24, and C 23, including research laboratories and teaching facilities.
(SZU_2): In terms of international teaching (C 25) and international research (C 15), international faculty are considered a related factor influencing the internationalization of teaching and research performance. SZU has recently persisted in recruiting international talent. SZU can keep investing in C 13 and C 14 to improve performance in C 15, for example by attracting scientific talent, expanding the talent pool, and training outstanding talent with leadership abilities. For C 25, attention should therefore be directed toward the teaching and library facilities in C 23 and toward maintaining performance in C 21.
(SZU_3): Regarding C 24 and C 22 , existing faculty can be effectively managed to enhance the quality of courses, thereby increasing students' competitiveness after graduation. These students will receive more favorable evaluations from future employers, thereby improving the reputation of SZU and attracting more international students. Improvements in these key criteria in D 2 and D 1 can positively influence C 33 in D 3 , and the resources allocated to improving C 33 can be reduced.
As indicated by the comparison between the two universities, the UM and SZU had similarly poor performances for C 33, but under systematic improvement the priority order of the criteria and the relationships among the combined strategies differed between the two cases. A systematic approach to self-development is more suitable for sustaining development than simply investing in criteria for which the university performs poorly in international rankings. Improvement strategies incorporating cause-effect relationships can assist decision-makers in setting policies from a systematic perspective instead of simply pursuing a piecemeal approach.
In general, EACUs are accustomed to employing methods for achieving immediate results in the pursuit of becoming a WCU in terms of ranking. These piecemeal methods include encouraging academics to publish papers in international journals, attracting foreign professors and returnees from overseas [15], and increasing the proportion of international students [54]. Universities should not "treat symptoms but not the disease" when formulating strategies [55]: although such strategies may lead to short-term performance improvements, they may also produce negative externalities. For example, to improve performance in internationalization indicators, Shandong University introduced a plan to attract international students, which included raising the allowance and lowering the assessment standard for international admission. One strategy, the "buddy program," which paired international students with local students of the opposite sex, caused a widespread uproar online. The incident had an extremely negative impact on the reputation of the university [56]. In another example, several universities in China partially adopted accountability guidelines from tenure track policies common in North America without supporting resource allocation strategies for young teachers. As a result, tenure track candidates spent most of their work time building their publication portfolios instead of developing their teaching activities and serving the community, which are core functions of university professors [57].
If the UM and SZU adjust their strategies as suggested herein, they should consider their status holistically and not simply focus on high-weight indicators when making decisions. In the future, EACUs such as the UM and SZU can apply our model to sustainably develop into WCUs. This paper details the construction and application of our model, which concerns the improvement of evaluation for EACUs that endeavor to sustainably develop into WCUs. As demonstrated by the examples of the UM and SZU, the proposed model can be used to systematically formulate policies, adjust the priority order of existing policies, and create complementary policies with limited resources. In contrast to the traditional strategy of "treat symptoms but not the disease", the proposed model emphasizes phased improvement based on the characteristics of various HEIs to achieve the goal of sustainable development.

Conclusions
EACUs eager for success can easily succumb to the fruitless pursuit of short-term improvements in performance. Therefore, the current study proposed a systematic evaluation model for self-assessment and the formulation of strategies to develop EACUs into sustainable WCUs. As indicated by the results, the proposed model differs from traditional strategies, which pursue a "stopgap piecemeal" approach, by offering systematic development strategies. The proposed model emphasizes the rational allocation of resources with consideration of the influential relationships among the relevant criteria. Therefore, by conceptualizing development expectations for EACUs, the current study improved the theoretical knowledge related to HEI development in a sustainable context. This research provides a systematic perspective and a useful tool for senior decision-makers at EACUs to achieve sustainable development goals.
The findings of the pilot studies must be considered within the context of their limitations. Employing a larger sample could produce results that more effectively represent other EACUs. Likewise, the limited number of studies on this subject makes it challenging to compare the results obtained from the pilot studies with prior work. Nevertheless, the proposed model was largely effective in comparing and analyzing the two cases. Future studies should emphasize the implementation of more powerful, randomized controlled designs with larger sample populations.
External circumstances are constantly changing; in particular, international students have been affected by the COVID-19 pandemic and the rise of populist movements advocating isolationism and deglobalization in the West. Unique situations may affect indicator composition in relevant university ranking systems. In the future, our research may incorporate additional perspectives to elucidate more information on EACUs. Moreover, further research can extend the scope of application of the developed model from decision-making at the university level to that at regional and national government levels. A scientific, global, and sustainable approach to future research may contribute to mutual understanding and regional cooperation, which is a valuable endeavor.