Evaluation of lecturers' performance using a novel hierarchical multi-criteria model based on an interval complex neutrosophic set

© 2020 by the authors; licensee Growing Science, Canada


Introduction
In recent years, university lecturers have become key factors in the educational goals of national development strategies (King, 2014; Wiliam et al., 2017). Therefore, performance assessments of their teaching competency are used as an effective tool for supporting personnel decision-making through rewards, punishments, employment and dismissal. The assessments can also serve as the main criteria for verifying qualification certificates in academic institutions and universities (Wu et al., 2012; Jiayi & Ling, 2012; Maltarich et al., 2017; Zhou et al., 2018). A university should not only function as a training institution but also as a scientific research center that encourages lecturers to carry out scientific research activities (Fauth et al., 2014; Cuevas et al., 2018). Moreover, this approach could enhance the creation of an equal environment that improves cooperative strategies (Cegarra et al., 2016; Wu et al., 2018), the learning spirit (Cegarra et al., 2017) and the autonomy of each student (Parrish, 2016; Darling-Hammond, 2017; Fischer et al., 2018). An effective system of assessing lecturers' performance can directly help to estimate educational achievements from many perspectives, such as improving meaningful and sustainable learning (Almeida, 2017) and finding and fostering young talents (Bohlmann & Weinstein, 2013); it can also indirectly affect the wealth of each country (Lazarides et al., 2018) and has become a preferred policy at the global and local levels (Steinberg & Garrett, 2016; Tuytens & Devos, 2017). Lecturer assessment has been regarded as a complex issue with several complicated factors, such as personal interests and the development strategy of the education system (Schön, 2017). One of the most difficult issues at any university is achieving a fair and accurate assessment of its lecturers' activities from which to delegate their respective tasks and positions.
The absence of an appropriate set of standards and tools may lead to inaccuracy and subjectivity in assessing the competence of each lecturer. There is a need to constantly evaluate lecturers through principals/managers (OECD, 2009; Marzano & Toth, 2013), self-report (Singh & Jha, 2014), students (Kilic, 2010; Nilson, 2016; Lans et al., 2018) and peer review from colleagues (Alias et al., 2015). Needless to say, most lecturers expect to receive good and fair reports, regardless of reality (Liu & Zhao, 2013; Nahid et al., 2016). Indeed, a multi-dimensional assessment could augment lecturers' knowledge background, expand their teaching repertoires and develop them professionally (Malakolunthu & Vasudevan, 2012; Skedsmo & Huber, 2018). It has been argued that a multi-objective formal process can improve lecturers' ability to make professional decisions and judgments (Bambaeeroo & Shokrpour, 2017). Furthermore, since each locality and context is unique, lecturer evaluation should consider different local characteristics and various methodologies and data resources (Sonnert et al., 2018). The criteria for this assessment include standards related to research capacity, teaching capacity and service activities, making it a multi-standard decision-making process (Wu et al., 2012).
Currently, multi-criteria decision-making (MCDM) is used to navigate real-world problems and the uncertainty of human thinking at large (Li et al., 2015; Yang & Pang, 2018). The Analytical Hierarchy Process (AHP) and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) are the most popular MCDM models for identifying and selecting the best solutions from heterogeneous data (Torkabadi et al., 2018). The AHP model can be used to analyze complex problems by separating them into branching structure systems to calculate the weight of each criterion (Saaty, 2008). Although this model has some weaknesses regarding the number of criteria for quantitative analysis, it does not require clear information (Ishizaka & Labib, 2009; Karthikeyan et al., 2016). In contrast, TOPSIS allows determining a ranking through the use of many criteria (Hwang & Yoon, 1981; Chi & Liu, 2013; Wang & Chan, 2013). The basic principle of the TOPSIS technique is that the most preferred alternative should simultaneously have the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution; it also reflects the rationale of human choice (Baykasoğlu et al., 2013). This method requires using the input data to find the weight of each criterion. Thus, integrating these popular MCDM techniques could effectively improve quantitative assessments for determining the performance and relative importance of lecturers. Smarandache (1998) proposed the neutrosophic set, which is independently characterized by a truth-membership degree (T), an indeterminacy-membership degree (I) and a falsity-membership degree (F), all of which lie within the real standard or nonstandard unit interval (Chi & Liu, 2013). If a range is restrained within this interval, the neutrosophic set can be easily applied to problems in education (Akram et al., 2018). In this aspect, Wang et al.
(2010) introduced the concept of the single-valued neutrosophic set as a sub-class of the neutrosophic set. Wang also proposed the interval-valued neutrosophic set as a sub-class of neutrosophic sets with interval values of truth-membership, indeterminacy-membership and falsity-membership. This set has been applied in different fields, such as the decision-making sciences, social sciences and humanities, to solve problems involving imprecise, indeterminate and inconsistent information (Zhang et al., 2014). Later, Ye (2014) introduced another concept, the interval neutrosophic linguistic set, which involves new aggregation operators for interval neutrosophic linguistic information. In the same effort, Said et al. (2015) proposed another decision-making method that extends the TOPSIS method to deal with uncertain linguistic information in interval neutrosophic sets. However, to the best of our knowledge, there has been no research on integrating hierarchical TOPSIS with interval complex neutrosophic sets, especially for lecturer evaluation. Therefore, the combination of two useful MCDM techniques such as AHP and TOPSIS with the interval-valued complex set in a neutrosophic environment can reduce the shortcomings of the traditional approach to lecturer evaluation (Biggs & Collis, 2014; Gormally et al., 2014).
In this paper, lecturer evaluation is the particular case study for the MCDM models. However, the complexity and uncertainty of this approach mean that it is necessary to integrate the hierarchical neutrosophic TOPSIS and the interval-valued complex set. Thus, this study presents the results of weighting performance evaluation criteria to rank five different lecturers of the University of Economics and Business, Vietnam National University, Hanoi. The rest of the study is organized as follows: Section 2 reviews the principal characteristics of lecturer evaluation. Section 3 presents the methodology of using the hierarchical neutrosophic TOPSIS to rank alternatives. An illustrative application is then presented in Section 4 to describe how the model works. Finally, conclusions and discussion are given in Section 5.

Lecturer evaluation methods
In recent decades, lecturer evaluation has received much attention from researchers seeking to enhance professional teaching (King, 2014). According to Colby et al. (2002), lecturer evaluation concerns competency, professionalism, advancement and student achievement. Buttram and Wilson (1987) suggested that the best evaluation identifies the effective approaches used in teaching and knowledge at the university level; doing this can improve the quality of students in the future. In another study (Davey, 1991), a lecturer was evaluated based on the dimensions of effective job performance, comprehensive exercises and the use of multiple objects to eliminate bias. This process required frequent assessments and appropriate development strategies from the relevant institution. It has been argued that assessment is primarily an organizational problem, not a technical problem (Schön, 2017). However, ineffective efforts are typically diagnosed in terms of a useless assessment instrument, prompting the search for better instruments (Lans et al., 2018). Evaluation experiences have long been considered influential in organizational behavior as sources of support for feedback, need satisfaction, feelings of competence and psychological success. Moreover, a lecturer evaluation system should include different components: vocational morality, the attendance rate at school meetings and events, teaching and research ability and student performance (based on student tests and report scores) (Chi & Liu, 2013; Reddy et al., 2018).
Lecturer evaluation requires the establishment of reference standards and evaluation criteria (OECD, 2009). Traditionally, this approach depended on classroom observations conducted by managers of a university (Danielson, 2000). This approach provided powerful tools for human resources, but the effects of such a system are mixed (Zerem, 2017). Indeed, manager-based evaluation has many disadvantages regarding transparency and promoting the image of the university. Furthermore, the traditional approach used the test scores of students to determine lecturer performance (Tondeur et al., 2017). It was based on an image of the lecturer and beliefs about teaching that are inconsistent (Zare et al., 2016). Consequently, it had negative impacts on professional development and failed to improve the quality of teaching (Chappuis et al., 2016). Lecturers tended to try to impress managers and compete with their peers at all costs (Liu & Teddlie, 2007). Current curriculum reforms focus on the participation of managers, peers, students and the lecturers themselves through self-evaluation (Ovando, 2001; Muijs & Reynolds, 2017). Even when given feedback from such an evaluation system, lecturers might not be inclined to reflect on their practices (Kurtz et al., 2017). In the best practices, the evaluation of teaching should provide an opportunity for dialogue between lecturers and evaluators based on a shared understanding of good teaching (Nilson, 2016).

Main criteria used in the lecturer evaluation framework
In this study, a multi-criteria evaluation process was introduced to assess the efficiency and capacity of lecturers at the university level. Based on previous studies, the criteria were divided into four main groups (self-, manager-, peer- and student-based evaluation) and 13 sub-criteria (Wu et al., 2012), as shown in Fig. 1. These four aspects can be used to clearly evaluate and improve lecturer performance (Odden, 2014).

Scientific publication (C11)
Scientific publications are an effective criterion for academic and methodical evaluation and for recruitment. Published articles address complex problems (Zare et al., 2016). Also, having journal articles published is important in the academic community in developing countries (Jaramillo et al., 2017). Writing an article for publication is difficult, so publications can be used to determine and classify a lecturer's academic ability (Wu et al., 2012). Thus, in this study, this criterion was assessed as the ratio between the number of articles over two per year and the total in a year. Another important aspect is the duration of research: a good lecturer usually requires less time to publish an article (Zerem, 2017).
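As an illustration, the ratio above can be computed as follows. This is a sketch under one plausible reading of the criterion (articles beyond a two-per-year baseline relative to the yearly total); the function name `publication_ratio` is hypothetical, not taken from the paper.

```python
def publication_ratio(articles_published, baseline=2):
    """Hypothetical reading of criterion C11: the share of a lecturer's
    yearly articles that exceed a two-per-year baseline.

    Returns 0.0 when output does not exceed the baseline or is zero."""
    excess = max(articles_published - baseline, 0)
    return excess / articles_published if articles_published else 0.0
```

For example, five articles in a year would give a ratio of (5 − 2) / 5 = 0.6 under this interpretation.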

Supervising postgraduate students (C12)
Students expect their supervisors to have sufficient professional ability and knowledge to provide advice on research (Wu et al., 2012). Thus, lecturers must be up-to-date and pursue research activities in numerous aspects and multiple fields (Sharp et al., 2017). They must supervise many students and act as learning mentors (Wisker, 2012). In addition, publications made during the supervision of trainees have certain scientific value. When a lecturer's trainees require less time to publish, it reflects well on the lecturer.

The journal peer-review process (C13)
Researchers usually seek various opportunities to indicate their knowledge and skills. One of the key steps in the climb to academic success is becoming a peer reviewer (Wolf, 2016). Content and methodology experts review papers and create recommendations to increase the value of the publications for a specific journal (Thomas et al., 2014). They supply feedback for articles and research, suggest improvements and make recommendations to the editors about whether to accept, reject or request changes to articles (Iantorno et al., 2016). Thus, to become a journal peer reviewer, researchers must spend a great deal of time accumulating professional experience (Wu et al., 2012). As one of the aspects of evaluation, this study used the length of time before becoming a journal reviewer.

Lecturing activities (C21)
This criterion involves preparation time and statutory teaching time. It is the number of hours spent teaching a group or class of students according to the formal policy in a country. At universities, lecturing time is counted by the number of lessons. The duration of each lesson is regulated at 45 or 50 minutes, and this was used to determine the time spent teaching. The ratio between the number of lessons and subjects per year and the total in a year was used to evaluate standard lecturing time. Additionally, the number of scientific publications can be compared with lecturing time (Zare et al., 2016).

Language of instruction (C22)
A search of the relevant literature revealed a lack of research on pre-service English lecturer teaching programs. There is a concern for the standards of teaching and learning in a non-native language (Wu et al., 2012). Consequently, lecturers are compelled to constantly adapt their lectures, which affects the standards and amount of content taught during a semester. Thus, limitations in lecturers' linguistic competencies have negative effects on program quality (Bradford, 2015).

Lecturing attitude and spirit (C23)

Lecturing attitude and spirit are the sum of several behaviors. However, a gradual decline in attitude over a lecturer's career may flatten variations between these behaviors (Frunză, 2014). Lecturing attitude is a common concern in psychology and pedagogy research (Zyad, 2016). For example, some lecturers come to class late, which has negative impacts on student acceptance.

Evaluation and scoring system (C24)
The procedures used for training lecturers to score students' work dependably are consistent across colleges (Wu et al., 2012). Student work samples represent completely different levels of performance on rubrics. To score fairly, lecturers should give examples and clarify the distinctions between score levels; rating a pre-selected example can be used to evaluate their scoring (Tondeur et al., 2017).

Cooperation in research projects (C31)
Research projects require cooperation between researchers and a good working environment (Hein et al., 2015). Management is accountable for the conduct of editors, for safeguarding research records and for ensuring the reliability of published research. It is important for researchers to communicate and collaborate effectively on cases related to research integrity (Wager & Kleinert, 2012). Cooperation involves not only reducing the time spent on projects but also advancing and exchanging knowledge. For this criterion, this study used the number of co-worker cooperation projects over two projects per year and the duration of these projects.

Teamwork in scientific and teaching activities (C32)
The advantages of cooperation and teamwork for researchers include assistance with testing and measuring, access to vast amounts of knowledge and assistance in developing new initiatives (Johnson et al., 2012). Furthermore, different researchers contribute different types and amounts of resources, which increases the number of publications of all involved in the cooperation (Wardil & Hauert, 2015). Research teamwork refers to a broad variety of activities, from simple opinion exchanges to side-by-side work in the laboratory. Thus, it is important to evaluate lecturers' cooperation and teamwork.

Participation in school meetings and events (C33)
Lecturers build good relationships with their co-workers and students when they take part in school meetings and events (Wang & Hsieh 2017). It is widely acknowledged that school meetings and events are important for guaranteeing cooperation, ensuring that lecturers are professionally ready for work and identifying basic problems related to their work (Frunză, 2014;Zyad, 2016). Lecturers can be evaluated at a high level for behavior that demonstrates professional responsibility (Frunză, 2014). Thus, the ratio between the number of attended school meetings and events and total compulsory school meetings and events was used to evaluate lecturers.

The content of the lessons (C41)
Regarding teaching and learning, students especially evaluate lecturers based on the quality of the teaching content (Nilson, 2016). Alongside these two players (lecturer and students), this approach evaluates school factors that are expected to influence teaching and learning (Shingphachanh, 2018). Moreover, lecturers should offer real-world examples to create interest for students (Brookfield, 2017). This research used the number of students who understood lessons and the theoretical learning duration needed to finish the subject to a satisfactory level for this criterion.

Lecturer-student relationships (C42)

Lecturer-student relationships are associated with both attrition and the general mental and physical health of lecturers (Kupers et al., 2015). These relationships are usually characterized by respect, warmth and trust, as well as low levels of social conflict. Likewise, lecturers have more experience, education and skills than their students, and thus they have a unique set of responsibilities to students (Aldrup et al., 2018). They are expected and trained to act in the best interests of their students. Therefore, they should be motivated to act appropriately and responsibly toward students.

The irrelevance of the subjects (C43)
This criterion involves the essential and traditional method whereby practitioners request information on whether their teaching has an impact (Nilson, 2016). Subjects should be periodically assessed and reviewed. The issues include content and objectives, teaching plans, assessment procedures, the behaviors of students in the class and the experience of the lecturers (Brookfield, 2017). This includes the expectations for students' educational outcomes in a subject, as well as the appropriateness of the objectives and content in achieving these outcomes. Thus, any irrelevance in teaching can have negative impacts on the behaviors and outcomes of students and trainees.

The list of criteria used in evaluating lecturer performance could be exhaustive. However, they can be summarized in Fig. 1, which shows that lecturer evaluation depends on four main groups for assessment. Each of these groups consists of three sub-criteria, except for the second group of manager-based evaluation, which includes four sub-criteria. This study provides an integrated approach to find the best alternative: it presents a hierarchical structure and the most appropriate approach to evaluate lecturers. Table 1 also summarizes and explains the selected criteria based on the literature review. In particular, each criterion was identified to have three corresponding aspects of the complex neutrosophic set (truth, indeterminacy and falsity) together with its real and imaginary parts. Two features of each criterion describe the amplitude and phase terms in this set, which are represented by intervals. These are the background for determining the input values based on the available data in the educational system. Consequently, experts can adjust the levels of these parameters for a given year based on three patterns, as shown in Table 1. For each criterion in Table 1, (+) or (−) marks a benefit or cost criterion, and the three listed levels correspond to the truth, indeterminacy and falsity degrees:

1.2. Supervising postgraduate students (C12) (+) (Wu et al., 2012; Wisker, 2012; Sharp et al., 2017)
- Real part: the number of trainees who were guided (h12) / total trainees per year (t12); the ratio is between the number of trainees over 05 trainees per year and the total in a year. Levels: h12 completed / t12; h12 uncompleted or processing / t12; h12 could not complete or rejected / t12.
- Imaginary part: the number of standard graduation reports (h) / total graduation reports (t); the ratio is between the number of standard graduation reports over 05 reports per year and the total in a year. Levels: h published as articles / t; h publishable later as articles / t; h not published as articles / t.

1.3. The journal peer-review process (C13) (+) (Wu et al., 2012; Zare et al., 2016)
- Real part: the number of journal publications reviewed (h13) / total journal publications suggested per year (t13); the ratio is between the number of journal publications over 02 publications per year and the total in a year. Levels: h13 completed / t13; h13 submitting or processing / t13; h13 not completed or rejected / t13.
- Imaginary part: the duration to become a journal reviewer (months) (h) / total duration spent in the scientific publication process (t).

2.1. Lecturing activities (C21): the ratio is between the number of subjects which lecturers were assigned and the total subjects which students registered in a year.

2.2. Language of instruction (C22) (+) (Wu et al., 2012; Bradford, 2015)
- Real part: the number of courses taught in English (h22) / total courses (t22); the ratio is between the number of courses taught in English and the total courses in a year. Levels: h22 with over 50 people / t22; h22 with 30-50 people / t22; h22 with under 30 people / t22.
- Imaginary part: the number of students who did not understand lessons (h) / total students (t); the ratio is between the number of students who did not understand lessons and the total students in a year.

2.3. Lecturing attitude and spirit (C23)
- Real part: the ratio is between the number of lessons to which lecturers came late and the total in a year. Levels: h23 accounts for under 30% of t23; h23 accounts for 30-50% of t23; h23 accounts for over 50% of t23.
- Imaginary part: the duration by which lecturers came late to class (h) / total duration of lessons (t); the ratio is between this duration and the total duration in a year. Levels: h accounts for under 20% of t; h accounts for 20-50% of t; h accounts for over 50% of t.

2.4. Score evaluation process for students (C24) (+) (Bradford, 2015; Tondeur et al., 2017)
- Real part: the number of exams which lecturers organized (h24) / total standard exams (t24); the ratio is between the number of exams which lecturers organized and the total in a year. Levels: h24 accounts for over 80% of t24; h24 accounts for 60-80% of t24; h24 accounts for under 60% of t24.
- Imaginary part: the average duration which lecturers paid to each exam (h); the difference between this average duration and the preceding exam time. Levels: h under 01 month; h from 01-02 months; h over 02 months.

3. Peer-evaluation (C3)

3.1. Cooperation in research projects (C31) (+) (Wager & Kleinert, 2012; Wu et al., 2012; Hein et al., 2015)
- Real part: the number of co-worker cooperation projects (h31) / total projects (t31); the ratio is between the number of co-worker cooperation projects over 02 projects and the total in a year. Levels: h31 at ministerial level / t31; h31 at school level / t31; h31 not belonging to the above two categories / t31.
- Imaginary part: the average duration to carry out each project (months) (h) / 12 months; the ratio is between the average project duration and the total in a year. Levels: h over 12 months / 12 months; h from 05-07 months / 12 months; h under 05 months / 12 months.

3.2. Teamwork in scientific and lecturing activities (C32) (+) (Johnson et al., 2012; Wardil & Hauert, 2015)
- Real part: the number of initiatives to improve lecturing effectiveness (h32) / total initiatives (t32); the ratio is between the number of initiatives to improve lecturing effectiveness and the total initiatives in a year. Levels: h32 at ministerial level / t32; h32 at school level / t32; h32 not belonging to the above two categories / t32.
- Imaginary part: the duration to complete initiatives (months) (h) / 12 months; the ratio is between the initiatives' duration and the total in a year. Levels: h under 05 months / 12 months; h from 05-07 months / 12 months; h over 12 months / 12 months.

3.3. Participation in school meetings and events (C33) (+) (Frunză, 2014; Zyad, 2016)
- Real part: the number of school meetings and events attended (h33) / total compulsory school meetings and events (t33); the ratio is between the number of school meetings and events attended and the total compulsory school meetings and events in a year. Levels: h33 accounts for over 80% of t33; h33 accounts for 60-80% of t33; h33 accounts for under 60% of t33.
- Imaginary part: the number of school meetings and events attended on time (h) / total compulsory school meetings and events (t); the ratio is between the number attended on time and the total compulsory school meetings and events in a year. Levels: h accounts for over 80% of t; h accounts for 60-80% of t; h accounts for under 60% of t.

- Imaginary part: the number of lessons in which students asked questions (h) / total lessons (t); the ratio is between the number of lessons in which students asked questions and the total lessons in a year. Levels: h with under 05 questions from students / t; h with 05-10 questions from students / t; h with over 10 questions from students / t.

4.3. The irrelevance of subjects (C43) (−) (Nilson, 2016; Brookfield, 2017; Shingphachanh, 2018)
- Real part: the number of lessons with subject matter irrelevant to practice (h43) / total lessons (t43); the ratio is between the number of such lessons and the total in a year. Levels: h43 accounts for under 30% of t43; h43 accounts for 30-60% of t43; h43 accounts for over 60% of t43.
- Imaginary part: the number of students who complained about the irrelevance of lessons to reality (h) / total students (t); the ratio is between the number of such students and the total in a year. Levels: h accounts for under 30% of t; h accounts for 30-60% of t; h accounts for over 60% of t.

Interval Complex Neutrosophic Set
The neutrosophic set, proposed by Smarandache (1998), is a generalization of the classic set, the fuzzy set (Zadeh, 1965), the interval-valued fuzzy set (Turksen, 1986) and the intuitionistic fuzzy set (Atanassov, 1986). Many real-life problems involve not only truth and falsehood but also indeterminacy between several suitable opinions (Ali et al., 2018). To deal with this situation, the concept of interval neutrosophic sets (INSs) can be used to make the membership values intervals rather than real numbers; this represents an extension of the standard interval [0, 1] used for interval fuzzy sets. Furthermore, the Hamming and Euclidean distances between INSs can be defined, and similarity measures are based on these distances (Ye, 2014). Moreover, based on complex numbers, Ali and Smarandache introduced the complex neutrosophic set to handle the amplitude and phase terms of the set's members (Ali & Smarandache, 2017). In real problems, it is difficult to find a crisp neutrosophic membership degree with unclear information, so Ali proposed the interval complex neutrosophic set (ICNS) (Ali et al., 2018). In this set, the interval terms of the ICNS can handle uncertain membership values. This section provides some basic definitions, starting from the neutrosophic set proposed by Smarandache (1998).
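To make these notions concrete, the following sketch represents an interval neutrosophic number and the normalized Hamming distance mentioned above. The class and function names are illustrative, and the complex (phase) part is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class IntervalNeutrosophic:
    """One interval neutrosophic number: interval degrees of truth (T),
    indeterminacy (I) and falsity (F), each a (low, high) pair in [0, 1].
    (The complex variant attaches a phase term to each part; omitted here.)"""
    t: tuple
    i: tuple
    f: tuple


def hamming(a, b):
    """Normalized Hamming distance between two interval neutrosophic
    numbers: the mean absolute difference over the six interval bounds."""
    pairs = zip((*a.t, *a.i, *a.f), (*b.t, *b.i, *b.f))
    return sum(abs(x - y) for x, y in pairs) / 6
```

Two identical numbers have distance 0, and the distance grows with the gap between corresponding interval bounds.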

The definition, operation rules and distance of ICNS
The interval neutrosophic linguistic set, developed based on the theory of the INS, allows solving complex problems in quantitative assessments, as shown in the following (Ye, 2014).

Definition 1. Neutrosophic set (Smarandache, 1998): Let $X$ be a universe of discourse, with a generic element in $X$ denoted by $x$. A neutrosophic set (NS) $A$ in $X$ is
$$A = \{\langle x, T_A(x), I_A(x), F_A(x)\rangle : x \in X\},$$
where $T_A(x), I_A(x), F_A(x) \in\, ]0^-, 1^+[$ define the degrees of the truth-membership function, the indeterminacy-membership function and the falsity-membership function, respectively. There is no restriction on the sum of $T_A(x)$, $I_A(x)$ and $F_A(x)$, so $0^- \le \sup T_A(x) + \sup I_A(x) + \sup F_A(x) \le 3^+$.

Interval neutrosophic set (Wang et al., 2010): Let $X$ be a universe of discourse, with a generic element in $X$ denoted by $x$. An interval neutrosophic set $A$ in $X$ is
$$A = \{\langle x, [T_A^L(x), T_A^U(x)], [I_A^L(x), I_A^U(x)], [F_A^L(x), F_A^U(x)]\rangle : x \in X\},$$
where the intervals $[T_A^L(x), T_A^U(x)]$, $[I_A^L(x), I_A^U(x)]$ and $[F_A^L(x), F_A^U(x)] \subseteq [0, 1]$ define the degrees of the truth-membership, indeterminacy-membership and falsity-membership functions, respectively, so $0 \le \sup T_A^U(x) + \sup I_A^U(x) + \sup F_A^U(x) \le 3$.

Definition 4. Interval-valued complex neutrosophic set (Ali et al., 2018): An interval-valued complex fuzzy set $\bar{S}$ is defined over a universe of discourse $X$ by a membership function
$$\bar{\mu}_S(x) = \bar{r}_S(x)\, e^{j\omega_S(x)},$$
where $\bar{r}_S(x)$ takes values in the collection of interval fuzzy sets in $[0, 1]$ and $\omega_S(x) \in R$, the set of real numbers. Here $\bar{r}_S(x)$ is the interval-valued (amplitude) membership function, $\omega_S(x)$ is the phase term, and $j = \sqrt{-1}$.

Definition 5. Union of interval complex neutrosophic sets (ICNSs)

Two complex fuzzy sets $A$ and $B$ were defined by Ramot et al. through the membership functions $\mu_A(x) = r_A(x)e^{j\omega_A(x)}$ and $\mu_B(x) = r_B(x)e^{j\omega_B(x)}$. For two ICNSs $A$ and $B$ in $X$, the amplitude terms of the union are defined as
$$T_{A \cup B}(x) = \big[T_A^L(x) \vee T_B^L(x),\; T_A^U(x) \vee T_B^U(x)\big],$$
$$I_{A \cup B}(x) = \big[I_A^L(x) \wedge I_B^L(x),\; I_A^U(x) \wedge I_B^U(x)\big],$$
$$F_{A \cup B}(x) = \big[F_A^L(x) \wedge F_B^L(x),\; F_A^U(x) \wedge F_B^U(x)\big],$$
where $\vee$ and $\wedge$ denote the max and min operators, respectively. The phase terms of the union can be calculated with operators such as the max, min or sum of the phases.

Definition 6. Intersection of ICNSs

The intersection of two ICNSs is obtained by interchanging the operators:
$$T_{A \cap B}(x) = \big[T_A^L(x) \wedge T_B^L(x),\; T_A^U(x) \wedge T_B^U(x)\big],$$
with $I_{A \cap B}(x)$ and $F_{A \cap B}(x)$ defined analogously using $\vee$; again, $\vee$ and $\wedge$ denote the max and min operators, and the phase terms are calculated in the same manner.
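A minimal sketch of the amplitude-term union and intersection follows, with each number stored as a ((T_L, T_U), (I_L, I_U), (F_L, F_U)) triple of intervals. The phase-term rules are omitted and the function names are illustrative.

```python
def icns_union(a, b):
    # Amplitude terms of the union: max on truth, min on indeterminacy
    # and falsity, applied bound-by-bound to each interval.
    t = (max(a[0][0], b[0][0]), max(a[0][1], b[0][1]))
    i = (min(a[1][0], b[1][0]), min(a[1][1], b[1][1]))
    f = (min(a[2][0], b[2][0]), min(a[2][1], b[2][1]))
    return (t, i, f)


def icns_intersection(a, b):
    # The intersection interchanges the operators: min on truth,
    # max on indeterminacy and falsity.
    t = (min(a[0][0], b[0][0]), min(a[0][1], b[0][1]))
    i = (max(a[1][0], b[1][0]), max(a[1][1], b[1][1]))
    f = (max(a[2][0], b[2][0]), max(a[2][1], b[2][1]))
    return (t, i, f)
```

The union therefore widens the evidence for truth while narrowing indeterminacy and falsity, and the intersection does the opposite.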

The operational rules of the interval-valued complex neutrosophic set

Definition 7. The operational rules of interval complex neutrosophic sets

Let $A$ and $B$ be two interval complex neutrosophic sets over $X$ defined by the amplitude terms $[T^L, T^U]$, $[I^L, I^U]$ and $[F^L, F^U]$. The main operations on the amplitude terms are:

(i) The addition of $A$ and $B$ is defined as
$$T_{A \oplus B} = \big[T_A^L + T_B^L - T_A^L T_B^L,\; T_A^U + T_B^U - T_A^U T_B^U\big], \quad I_{A \oplus B} = \big[I_A^L I_B^L,\; I_A^U I_B^U\big], \quad F_{A \oplus B} = \big[F_A^L F_B^L,\; F_A^U F_B^U\big].$$

(ii) The product of $A$ and $B$ is defined as
$$T_{A \otimes B} = \big[T_A^L T_B^L,\; T_A^U T_B^U\big], \quad I_{A \otimes B} = \big[I_A^L + I_B^L - I_A^L I_B^L,\; I_A^U + I_B^U - I_A^U I_B^U\big], \quad F_{A \otimes B} = \big[F_A^L + F_B^L - F_A^L F_B^L,\; F_A^U + F_B^U - F_A^U F_B^U\big].$$

(iii) The scalar multiplication of $A$ by $\lambda > 0$ is defined as
$$T_{\lambda A} = \big[1 - (1 - T_A^L)^{\lambda},\; 1 - (1 - T_A^U)^{\lambda}\big], \quad I_{\lambda A} = \big[(I_A^L)^{\lambda},\; (I_A^U)^{\lambda}\big], \quad F_{\lambda A} = \big[(F_A^L)^{\lambda},\; (F_A^U)^{\lambda}\big].$$

The bounded difference and the difference of $A$ and $B$ are defined analogously on the corresponding interval bounds.
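The amplitude-term rules can be sketched as follows, using the same triple-of-intervals representation as before. These are the standard interval neutrosophic operations; the phase terms are again omitted, and the function names are illustrative.

```python
def ins_add(a, b):
    # A (+) B: probabilistic sum on truth, product on indeterminacy/falsity.
    t = tuple(x + y - x * y for x, y in zip(a[0], b[0]))
    i = tuple(x * y for x, y in zip(a[1], b[1]))
    f = tuple(x * y for x, y in zip(a[2], b[2]))
    return (t, i, f)


def ins_mul(a, b):
    # A (x) B: product on truth, probabilistic sum on indeterminacy/falsity.
    t = tuple(x * y for x, y in zip(a[0], b[0]))
    i = tuple(x + y - x * y for x, y in zip(a[1], b[1]))
    f = tuple(x + y - x * y for x, y in zip(a[2], b[2]))
    return (t, i, f)


def ins_scale(lam, a):
    # lam * A: 1 - (1 - T)^lam on truth, I^lam and F^lam on the other degrees.
    t = tuple(1 - (1 - x) ** lam for x in a[0])
    i = tuple(x ** lam for x in a[1])
    f = tuple(x ** lam for x in a[2])
    return (t, i, f)
```

All three operations keep every bound inside [0, 1], so the results remain valid interval neutrosophic numbers.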

Linguistic Variables
A linguistic variable is a variable whose values are words or sentences in a natural or artificial language. For instance, some matters are characterized by linguistic terms, such as unsatisfied, fair and satisfied. Each linguistic variable can be assigned one or more linguistic values, which are in turn connected to a numeric value through the mechanism of membership functions. The linguistic terms shown in the following were used to quantify each attribute to normalize the decision-making matrix. Then, the linguistic variables of each criterion were weighted for importance. Accordingly, this study classified the data resources into five scales to express un-quantified matters, as shown in Table 2.
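For illustration, a five-scale mapping from linguistic terms to interval neutrosophic (T, I, F) triples might look like the following. The numeric intervals here are placeholders chosen for the sketch, not the actual values of Table 2.

```python
# Illustrative only: these interval values are placeholders, not the
# actual Table 2 scale from the paper.
LINGUISTIC_SCALE = {
    "very unsatisfied": ((0.1, 0.2), (0.6, 0.7), (0.7, 0.8)),
    "unsatisfied":      ((0.2, 0.35), (0.5, 0.6), (0.55, 0.7)),
    "fair":             ((0.4, 0.55), (0.35, 0.45), (0.4, 0.5)),
    "satisfied":        ((0.6, 0.75), (0.2, 0.3), (0.2, 0.35)),
    "very satisfied":   ((0.8, 0.95), (0.05, 0.15), (0.05, 0.2)),
}


def to_neutrosophic(term):
    """Map a linguistic rating to its interval neutrosophic (T, I, F) triple."""
    return LINGUISTIC_SCALE[term.lower()]
```

Once each qualitative rating is mapped this way, the decision matrix can be populated with interval neutrosophic numbers and normalized as described above.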

Brief description of the hierarchical MCDM model
The TOPSIS method does not consider a hierarchical structure of main criteria and sub-criteria. Thus, the relationship between the overall weight and the global weight lacks information about (i) the comparative analysis of different criteria and (ii) the separation of qualitative and quantitative variables (Wang & Chan, 2013). Nevertheless, the TOPSIS technique is more compatible with decision-making problems than other techniques (Bottani & Rizzi, 2006). In particular, the accuracy of this framework depends on three issues: organizing the model's structure; choosing and selecting the criteria set; and weighting each criterion for calculation in the TOPSIS technique. This paper proposes a hierarchical MCDM approach as the framework for evaluating lecturer capacity. First, the AHP is used to determine the weights of criteria through pair-wise comparisons. Then, the TOPSIS method is used to acquire the comparative ratings of alternatives for lecturer performance. This evaluation of lecturer performance will lead to better results for two reasons: (i) the AHP technique can describe the correlations between criteria in the model; and (ii) the TOPSIS technique converts multiple choices into a single choice.
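The crisp TOPSIS step described above can be sketched as follows. This is the standard vector-normalized formulation of the closeness coefficient, not the interval neutrosophic extension developed in this paper; the function name `topsis` is illustrative.

```python
import numpy as np


def topsis(matrix, weights, benefit):
    """Rank alternatives with classic TOPSIS.

    matrix: alternatives x criteria scores; weights sum to one;
    benefit[j] is True for a benefit criterion, False for a cost criterion."""
    M = np.asarray(matrix, dtype=float)
    # Vector-normalize each column, then apply the criterion weights.
    V = M / np.linalg.norm(M, axis=0) * np.asarray(weights)
    # The positive ideal takes the best value per column, the negative the worst.
    benefit = np.asarray(benefit)
    pos = np.where(benefit, V.max(axis=0), V.min(axis=0))
    neg = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - pos, axis=1)
    d_neg = np.linalg.norm(V - neg, axis=1)
    # Closeness coefficient: higher means closer to the positive ideal.
    return d_neg / (d_pos + d_neg)
```

An alternative that dominates on every benefit criterion receives a closeness of 1, and one dominated on every criterion receives 0.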
Many researchers have highlighted the drawback of TOPSIS regarding the weight allocated to each response (Bottani & Rizzi, 2006; Wang & Chan, 2013). These weights must sum to one and may vary from person to person. Therefore, in this study the weights were determined using the AHP method, which is well suited to solving complex decision problems (Saaty, 1980). With the AHP technique, any complex problem can be decomposed into several sub-problems arranged in hierarchical levels, where each level represents a set of criteria or attributes relative to each sub-problem. These criteria and sub-criteria can be either quantitative or qualitative in nature, and are assessed through pair-wise comparisons made by professionals or experts in the corresponding areas. Implementing this lecturer evaluation could help determine the benefits and costs of educational activities. This study proposes a hierarchical TOPSIS approach in the interval-valued complex neutrosophic environment, extended by applying interval linguistic variables and complex numbers.

Analysis of criteria weights with the AHP technique
In this methodology, the problem is structured in a hierarchy of levels consisting of the main goal, criteria, sub-criteria, and alternatives. When assessing the improvement areas for implementing lecturer evaluation, it is essential to know the relative importance of the criteria and sub-criteria; in other words, assessors must determine the weights of the main criteria. The AHP is a capable technique for comparing the short- and long-term impacts of the gauge year. Additionally, the subjective assessments were converted to numerical values and processed to rank each alternative on the scale. The AHP method solves the problem in four steps (Saaty, 1980):
Step 1: Define the problem, determine the goal of the analysis and build the hierarchical structure model to evaluate the quality of the lecturer. First, the patterns need to be defined, along with the criteria, sub-criteria, and alternatives. Second, all information is placed in the hierarchical structure of the AHP technique, which illustrates the range of the problem from general to more detailed (Fabjanowicz et al., 2018). Note that the quality of performance affects the correctness of the results, especially the consistency between the pair-wise comparisons of elements.
Step 2: Construct the pair-wise comparison matrices. This study involved collecting data from decision-makers and consultations with experts to compare the alternatives. The relative importance of different elements was determined using the standard scoring values given in Table 3. The pair-wise comparison matrices for all factors were composed using expert opinions.

Table 3
Standard values of the relative importance of factors (Saaty, 1980)

Intensity of importance   Definition                                      Explanation
1                         Equal importance                                Two activities contribute equally to the objective.
3                         Moderate importance                             Experience and judgment slightly favor one activity over another.
5                         Essential or strong importance                  Experience and judgment strongly favor one activity over another.
7                         Very strong importance                          An activity is favored very strongly over another; its dominance is demonstrated.
9                         Extreme (demonstrated) importance               The evidence favoring one activity over another is of the highest possible order of affirmation.
2, 4, 6, 8                Intermediate values between adjacent judgments  When compromise is needed.

This scale aims to determine how many times one element is more important or dominant than another element, with respect to the criterion or property against which they are compared, in non-trivial comparisons (Saaty, 2008).
Step 3: Check the Consistency Ratio (CR). If matrix A is a consistent matrix, then its maximum eigenvalue should equal its order n. In practice, however, a pair-wise comparison matrix rarely achieves complete consistency. The CR indicates how consistent the decision-makers were when making pair-wise comparisons, and executing this step in the algorithm confirms that each matrix is within a permissible CR. Note that the diagonal elements of the matrix equal 1, as shown in Table 4. If CR ≤ 0.1, the evaluation within the matrix is acceptable. In contrast, if CR is greater than 0.1, the judgments are untrustworthy because they are too close to randomness, and the assessment is worthless or must be repeated. The random index (RI), i.e., the average consistency of random matrices, is shown in Table 5.
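Steps 2-3 can be sketched numerically: derive priority weights from the principal eigenvector of a pair-wise comparison matrix and check CR = CI / RI against the 0.1 threshold. The 4x4 matrix below is an assumed example, not the paper's data; the RI values are Saaty's published random indices.

```python
import numpy as np

# Saaty's random index (RI) by matrix order n.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41}

def ahp_weights_and_cr(A):
    """Return (priority weights, consistency ratio) for a pair-wise matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)            # principal eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                           # normalized priority weights
    ci = (eigvals[k].real - n) / (n - 1)   # consistency index (lambda_max - n)/(n - 1)
    return w, ci / RI[n]                   # consistency ratio CR = CI / RI

# Assumed example matrix (reciprocal, nearly consistent).
A = [[1,   1/3, 1/2, 1/4],
     [3,   1,   2,   1/2],
     [2,   1/2, 1,   1/3],
     [4,   2,   3,   1  ]]
w, cr = ahp_weights_and_cr(A)
print(np.round(w, 3), round(cr, 3))   # CR <= 0.1 means the judgments are acceptable
```

The eigenvector method here is one common realization of the weight derivation; the paper's Step 4 mentions the Lambda-max method, which is based on the same principal eigenvalue.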

Hierarchical TOPSIS model based on ICNS
The classical TOPSIS approach identifies the best alternative based on Euclidean distance: the chosen solution should have the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution. The advantages of TOPSIS are its logicality, rationality and computational simplicity. The classical TOPSIS method and its extensions have demonstrated their capabilities and potential for dealing with MCDM problems in various fields; the extension used here integrates the interval complex neutrosophic set with TOPSIS. The method can be presented using the following steps:
Step 1: Construct the decision-making matrix. The process starts with the construction of a decision matrix D based on a given set of criteria and sub-criteria. The data of decision matrix D come from different sources.
Step 2: Normalize the decision matrix (Chi & Liu, 2013). In general, the criteria can be classified into two categories: benefit and cost. For a benefit criterion a higher value is better, while for a cost criterion the opposite holds. Therefore, it is necessary to normalize the matrix in order to transform it into a dimensionless matrix that allows comparison across the various criteria. In this research, the normalized decision matrix is denoted by R, with entries

$$ r_{ij} = \begin{cases} \dfrac{x_{ij}}{\max_i x_{ij}} & \text{if criterion } j \text{ is benefit type} \\[2mm] \dfrac{\min_i x_{ij}}{x_{ij}} & \text{if criterion } j \text{ is cost type} \end{cases} $$

where $\max_i x_{ij}$ and $\min_i x_{ij}$ respectively denote the largest and smallest values of each sub-criterion.
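The benefit/cost normalization of Step 2 can be sketched for a crisp matrix as follows. The entries and the criterion types are assumed for illustration; in the paper's model the entries are interval complex neutrosophic values rather than crisp numbers.

```python
import numpy as np

def normalize(X, is_benefit):
    """Normalize a crisp decision matrix column by column.

    Benefit criteria: r_ij = x_ij / max_i(x_ij)   (larger is better)
    Cost criteria:    r_ij = min_i(x_ij) / x_ij   (smaller is better)
    """
    X = np.asarray(X, dtype=float)
    R = np.empty_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        R[:, j] = col / col.max() if is_benefit[j] else col.min() / col
    return R

# Assumed example: 3 alternatives, one benefit and one cost criterion.
X = [[7, 3],
     [5, 6],
     [9, 2]]
R = normalize(X, is_benefit=[True, False])
print(np.round(R, 3))
```

After normalization every column lies in (0, 1] and "larger is better" holds uniformly, which is what lets the subsequent ideal-solution steps treat all criteria the same way.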

Step 3: Determine the positive ideal solution (PIS) and negative ideal solution (NIS)
The study involved selecting the virtual positive ideal solution and negative ideal solution by selecting the best values for each attribute from all alternatives.
Step 4: Calculate the separation measures from the positive ideal solution (PIS) and negative ideal solution (NIS).
The distance between the alternative $A_i$ and the positive ideal solution / negative ideal solution is measured by the Euclidean distance, yielding the separation measures $d_i^+$ and $d_i^-$.
Step 5: Select the best alternative.
Rule 1: The smaller $d_i^+$ is, the nearer $A_i$ is to the best solution $A^*$, i.e., the better $A_i$ is.
Rule 2: The larger $d_i^-$ is, the farther $A_i$ is from the worst solution $A^-$, i.e., the better $A_i$ is.
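Steps 3-5 can be sketched for a crisp weighted normalized matrix: take the column-wise maxima and minima as PIS and NIS, compute the Euclidean separations $d_i^+$ and $d_i^-$, and rank by the relative closeness $d_i^- / (d_i^+ + d_i^-)$. The matrix V below is an assumed example.

```python
import numpy as np

def topsis_rank(V):
    """Return the relative closeness of each alternative (larger is better)."""
    V = np.asarray(V, dtype=float)
    pis = V.max(axis=0)                       # positive ideal solution A*
    nis = V.min(axis=0)                       # negative ideal solution A-
    d_pos = np.linalg.norm(V - pis, axis=1)   # d_i+: separation from A*
    d_neg = np.linalg.norm(V - nis, axis=1)   # d_i-: separation from A-
    return d_neg / (d_pos + d_neg)            # relative closeness coefficient

# Assumed weighted normalized matrix: 3 alternatives, 3 criteria.
V = [[0.30, 0.20, 0.10],
     [0.25, 0.35, 0.15],
     [0.10, 0.30, 0.25]]
cc = topsis_rank(V)
print(np.argsort(-cc) + 1)   # alternatives ranked best-first
```

Taking the column maxima/minima as PIS/NIS is valid because normalization has already converted cost criteria into benefit form.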
Let $X_a$ and $X_p$ denote the amplitude and phase terms of an interval complex number $X$. The comparison between two interval complex numbers A and B is defined in terms of these amplitude and phase terms.

Steps in the novel hierarchical TOPSIS model
From the proposed algorithm of the AHP technique and the integration of the interval complex neutrosophic set and TOPSIS, it was possible to compute the basic steps of the novel hierarchical TOPSIS models as follows:

Step 1: Determine the criteria, sub-criteria and alternatives/objects of the decision-making problem and establish a hierarchical structure.
Step 2: Obtain the degree of importance and the performance of the alternatives/objects for all criteria using expert opinions.
Step 3: After collecting information from the experts, construct the pair-wise comparison matrix based on the standard values of the relative importance of factors.
Step 4: Use the Lambda-max method to calculate the weight of each criterion given by the experts. The consistency index of a comparison matrix must be below 0.1; if it exceeds this value, the decision-maker re-checks the pair-wise comparison matrix from Step 3.
Step 5: Apply the AHP method to integrate all expert opinions and obtain a weight for each aggregated criterion.
Step 6: Construct the decision matrix from the input data collected in Step 1, based on the interval-valued linguistic variables and the complex neutrosophic set.
Step 7: Establish a normalized performance matrix in which the two types of criteria (benefit and cost criteria) are converted.
Step 8: Calculate the weighted normalized matrix.
Step 9: Determine the maximum and minimum of the differences and calculate the positive and negative ideal solutions (PIS and NIS).
Step 10: Use the separation measures from each point to the PIS and NIS and compare them to select the best alternative.
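The steps above can be sketched end-to-end with crisp placeholder data: AHP weights from a pair-wise comparison matrix, then a weighted TOPSIS ranking. All matrices and criterion types below are assumptions for illustration; the paper's model additionally uses interval complex neutrosophic values and a two-level criteria/sub-criteria hierarchy.

```python
import numpy as np

def ahp_weights(A):
    """Steps 3-5: priority weights from the principal eigenvector of A."""
    A = np.asarray(A, dtype=float)
    vals, vecs = np.linalg.eig(A)
    w = np.abs(vecs[:, np.argmax(vals.real)].real)
    return w / w.sum()

def topsis(X, w, is_benefit):
    """Steps 6-10: normalize, weight, and score by relative closeness."""
    X = np.asarray(X, dtype=float)
    # Benefit columns: x / max; cost columns: min / x (per-column selection).
    R = np.where(is_benefit, X / X.max(axis=0), X.min(axis=0) / X)
    V = R * w                                        # weighted normalized matrix
    d_pos = np.linalg.norm(V - V.max(axis=0), axis=1)  # separation from PIS
    d_neg = np.linalg.norm(V - V.min(axis=0), axis=1)  # separation from NIS
    return d_neg / (d_pos + d_neg)

# Assumed, perfectly consistent 3x3 pair-wise comparison matrix.
A = [[1,   2,   4],
     [1/2, 1,   2],
     [1/4, 1/2, 1]]
w = ahp_weights(A)

# Assumed performance data: rows = lecturers, columns = criteria.
X = [[8, 6, 3],
     [6, 9, 2],
     [7, 5, 4]]
cc = topsis(X, w, is_benefit=np.array([True, True, False]))
print(np.argsort(-cc) + 1)   # ranking, best lecturer first
```

With a full hierarchy, the sub-criteria weights would first be multiplied by their parent criterion's weight to obtain global weights before entering `topsis`.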

Application of the proposed hierarchical multi-criteria model for lecturer evaluation
This section presents a numerical example to demonstrate the proposed method. A university needs to choose the best lecturer from five possible options, denoted as A1, A2, A3, A4 and A5. There are 13 sub-criteria in four groups in the database; these represent the average values of each surveyed group. The four criteria are: C1, Self-evaluation; C2, Manager-based evaluation; C3, Peer-evaluation; C4, Student-based evaluation. Sub-criteria C23 and C43 are cost criteria and the others are benefit criteria. In order to select the best lecturer, the hierarchical TOPSIS in the interval-valued complex neutrosophic set is applied as follows.
Steps 1-5: Calculate the weights of the criteria and sub-criteria. Tables 6 and 7 show the weights of the criteria and sub-criteria based on the AHP methodology, using formulas (8)-(10).

Table 6
The values of pair-wise comparison matrices and weights of criteria

Table 7
The values of pair-wise comparison matrices and weights of sub-criteria

Step 6: Construct the decision matrix from the collected input data, based on the interval-valued linguistics and complex neutrosophic set, as seen in Table 2. The hierarchical linguistic variables for the importance weight of each criterion and the ratings of the alternatives under all criteria given by the decision-makers are presented in Table 8 and Table 9.
Step 7: Establish a normalized performance matrix in which the two types of criteria (benefit and cost criteria) are converted using formula (11), as shown in Table 10.
Step 8: Calculate the weighted normalized matrix by multiplying the weight of each criterion with the corresponding values from the decision matrix of Step 7 (shown in Table 11).
Step 11: Rank the alternatives: A1 > A3 > A2 > A5 > A4.
A total of 13 evaluation dimensions, grouped under the four main criteria, exist in this study. Tables 6 and 7 illustrate that a consistency ratio of 0.10 or less is acceptable; if the consistency ratio is greater than 0.10, the weight assignments must be re-evaluated within the pair-wise comparison matrix. The manager-based evaluation (C2) has the highest weight (0.454), while self-evaluation (C1) has the lowest (0.084). The manager aspects (lecturer activities and lecturer style) and the content of the lessons are therefore the major factors in the results of the evaluation, with weights of 0.214, 0.129 and 0.193, respectively. Managers and students are the direct and objective assessors of the quality of teaching and the capacity of the lecturer; hence, these are the main subjects affected by the results of this assessment.
After identifying the relative weights of the evaluation criteria, five lecturers of the University of Economics and Business, Vietnam were ranked. We first summarized all the input data and then standardized the different units of each evaluation criterion based on linguistic variables. Next, we utilized the hierarchical TOPSIS method in the interval-valued complex neutrosophic environment, using the relative weights acquired from the AHP in the previous section to calculate the weighted values of each lecturer. The results derived from this approach show that A1's teaching activities are at the highest level, and this lecturer is recommended for capacity enhancement. This is because A1 focuses on investment in teaching and scientific activities (at FAS and MS, respectively), has good lecturer-student interactions and improves the quality of the subjects. The medium grades belong to A2, A3 and A5, who have low values of input data for most criteria. However, from the results in Table 10, we discovered that all five criteria of A4 achieve only the meets-standards (MS) grade, which indicates that this alternative does not focus on cooperation with co-workers and behavior. Thus, based on the results of the evaluation, this subject was rated with the worst grade. The proposed approach is therefore useful for lecturer evaluation and can improve effectiveness and sustained competitive advantage. The implementation of this assessment will not only improve the lecturers' performance but also enhance the University's brand image.

Comparing with other models
In this section, the proposed method is compared with other methods; Table 14 lists the results of the comparison. Both the proposed method and the classic TOPSIS method can solve problems in uncertain environments. However, the TOPSIS and AHP techniques have some disadvantages in terms of calculation methods and results, and the interval neutrosophic TOPSIS extension does not consider the capacity of each lecturer in a specific time period. The comparisons between the previous methods and the proposed hierarchical TOPSIS method are summarized in Table 14. For the AHP method (Saaty, 1980), the aggregate values are 5.164, 7.060, 7.460, 6.040 and 6.906 for A1-A5, giving the ranking A3 > A2 > A5 > A4 > A1; the TOPSIS method (Hwang & Yoon, 1981) produces the same ranking. Table 14 thus compares the proposed method with three different methods in terms of the aggregate value and the ranking of the alternatives. The TOPSIS and AHP methods give the same results in classifying the abilities of the lecturers: A3 is the best lecturer (7.460 in the AHP results), while the worst lecturer is A1 (5.164 in the AHP results). The results of the interval neutrosophic TOPSIS differ from those of these two techniques: A2 is evaluated as the best alternative, followed by A3 and A5, and the lowest-rated lecturer is A4 instead of A1. In contrast, A1 has the highest rating under the integrated hierarchical TOPSIS with the interval-valued complex neutrosophic set; similar to the interval neutrosophic TOPSIS, A4 is ranked in the final class. When different methods were used, the ranking of the alternatives changed. This demonstrates that the number of criteria and the sample size play important roles in decision-making problems. Thus, depending on the complexity of the issue, an assessor may require customizable selections and a suitable method.

Limitations of this model
The current study has several limitations. First, lecturer evaluation is a difficult issue in employee recruitment in the educational system (Malen et al., 2015). Any assessment must be effective because it includes the possibility of job loss (Kunter & Baumert, 2006; Odden, 2014). For some lecturers, a low rating may motivate them to enhance their professional level; others may instead try to build good relationships with management, fellow workers and students. Hence, determining the weaknesses of lecturers and universities is key to supporting improvement, but it also brings negative impacts such as competition between colleagues (Fischer et al., 2018). Additionally, this approach should be tested for a long time before being applied (Derrington & Campbell, 2015). Secondly, although the TOPSIS method in the interval-valued complex neutrosophic environment is effective, it requires complex calculations to ensure exact results (Chi & Liu, 2013). Moreover, the method depends on the professional knowledge of experts to identify the importance of each criterion (Malik et al., 2016). Third, limitations in the number of samples may undermine the robustness of the findings, such as low reliability of the cognitive measure (n = 5). It would be beneficial for further research to use larger samples from different educational levels. Moreover, the study findings rely on four objective reports to create the set of criteria. Further research could evaluate other methods for selecting the data, such as interviews and survey reports; the survey method could be used to obtain the information and data required by an investigator (Wu et al., 2012). Finally, this study is limited by the fact that it only reflects some subjects' responses, which may or may not share the perspectives of the evaluated lecturers. Still, school leaders are objective observers of lecturer-level effects, perhaps more so than the lecturers themselves (Taut et al., 2011).
For this reason, we recommend further work on the coverage and depth of the problem.

Conclusions
The implementation of lecturer evaluation can generate competition between lecturers in a university. It also requires a strategic approach to fairness and transparency. Because of its complexity, assessment has become a hot topic in education and management systems. This study has presented a comprehensive assessment based on a hierarchical structure for the criteria set. The hierarchical TOPSIS approach was developed with the interval-valued complex neutrosophic set to create an assessment for determining lecturer capability. The proposed hierarchical approach was further compared with related methods to demonstrate its advantages and applicability. The results show that the proposed approach is efficient and can be used to solve other decision-making problems. However, certain limitations of the study remain, and future work is proposed to make lecturer evaluation more accurate, which will support dynamic decision-making procedures in real-world education contexts.