Automatically evolving preference-based dispatching rules for multi-objective job shop scheduling

Dispatching rules represent a simple heuristic for finding good solutions for job shop scheduling problems. Due to their fast applicability and easy handling, they are often used in manufacturing companies to create appropriate production schedules. It has been shown that dispatching rules that are specifically designed for the requirements of a particular environment improve the performance of schedules. Hyper-heuristics based on genetic programming can be used for the automated generation of such dispatching rules. Evolutionary algorithms search the space of dispatching rule components for the most effective priority function to optimize the performance of the resulting schedule. Various studies have highlighted the advantages in the single-objective case, which made it possible to derive a large number of new dispatching rules that outperformed previous benchmark rules. Because it is usually necessary to consider more than one objective simultaneously to ensure effective creation of schedules, the need for a multi-objective optimization method arises. In this paper, we propose an interactive multi-objective optimization method, namely the reference point method, implemented in a hyper-heuristic genetic programming framework. A decision support system has also been developed and implemented in a web-based application to facilitate interaction with the user. Incorporating preferences into the solution process aims to efficiently evolve a dispatching rule that meets the expectations of a decision-maker. A fictitious experiment was carried out in a benchmark job shop environment. The results show that the final solution selected by the decision-maker can produce schedules achieving a desired compromise between the makespan, total tardiness, and total waiting time.
Testing the evolved dispatching rule on an independent set of instances and comparing its performance with other benchmark dispatching rules revealed that the proposed method successfully finds dispatching rules that meet the decision-maker’s expectations and are capable of reproducing similar compromise schedules for unseen problems in the same environment.


Introduction
Efficient allocation of production resources is one of the most important tasks in the operational management of a manufacturing company. A well-performing production schedule, as an important lever for achieving crucial corporate goals, is therefore indispensable in order to remain competitive in a dynamic and global market environment. In particular, to meet the lean requirements of modern production, optimization of the production processes is essential, which is why research in this area has received increasing attention over the past decades. As one of the most widely studied production scheduling problems, job shop scheduling problems are well-known NP-hard combinatorial problems (Garey et al., 1976). Exact methods are therefore efficient only for a small number of small-size instances. To solve more complex shop configurations of larger size, heuristics have proven to be suitable approaches for finding "good" solutions in reasonable CPU time.
One frequently used class of scheduling heuristics is the so-called dispatching rules (Blackstone et al., 1982; Panwalkar & Iskander, 1977). Due to their ease of implementation and low computational cost, dispatching rules represent a popular technique applied by industry operators to generate acceptable schedules in real time. Whenever a machine becomes idle, dispatching rules are used to assign a priority index to all jobs waiting in the queue in order to select the one to be processed next. However, one of the major drawbacks of such techniques is the strong dependence of the performance of dispatching rules on the state of the system. Because of the wide variety of requirements a manufacturing system can embody, as well as the various objectives that can be striven for, no single dispatching rule is superior to all others (Rajendran & Holthaus, 1999). Instead, dispatching rules are problem-specific priority rules that need to be specially designed in order to perform well in a particular environment. The development of such rules can be time-consuming, leading to an expensive trial-and-error process (Geiger et al., 2006). To overcome this issue, hyper-heuristics (HH) have been proposed to automatically search for a good combination of existing dispatching rules or to develop an entirely new rule. Instead of searching directly in the search space of solutions, HH typically use evolutionary-based algorithms to operate in a search space of heuristics or heuristic components (Burke et al., 2003). The approach presented in this paper belongs to the latter category, where an evolutionary algorithm (EA), namely genetic programming (GP), a specialized variant of the genetic algorithm (GA), is proposed to automatically evolve dispatching rules.
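To make the dispatching mechanism concrete, the following minimal Python sketch shows how a priority index selects the next job from a queue; the `Job` attributes and the SPT (shortest processing time) priority function are illustrative assumptions, not part of any cited benchmark:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    processing_time: float  # processing time of the job's next operation
    due_date: float

def spt_priority(job: Job) -> float:
    """Shortest processing time (SPT): shorter operations get higher priority."""
    return -job.processing_time

def dispatch(queue: list[Job], priority=spt_priority) -> Job:
    """Whenever a machine becomes idle, select the waiting job with the highest priority."""
    return max(queue, key=priority)

queue = [Job("A", 5.0, 20.0), Job("B", 2.0, 15.0), Job("C", 8.0, 10.0)]
print(dispatch(queue).name)  # SPT selects job "B"
```

A GP-based hyper-heuristic replaces the hand-written `spt_priority` body with an automatically evolved priority function, while the dispatching loop itself stays unchanged.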
Optimization of production schedules can be performed to achieve a variety of objectives. Typical examples and frequently investigated performance measures include the makespan, tardiness, or number of tardy jobs (Graham et al., 1979). While many approaches for achieving a single objective were developed in the early stages of HH research [e.g., Dorndorf and Pesch (1995), Hart and Ross (1998), Dimopoulos and Zalzala (2001)], multi-objective optimization (MOO) problems are increasingly coming to the fore [e.g., Vázquez-Rodríguez and Petrovic (2010), Yu et al. (2020)]. Since production schedules have an impact on a variety of objectives, and manufacturing companies usually do not focus solely on achieving a single goal, the simultaneous consideration of multiple objectives becomes increasingly important for application in real industrial settings. One of the main challenges in MOO is to find compromise solutions that best meet the preferences of the decision-maker (DM).
Interactive approaches, as a class of methods for dealing with MOO models, are iterative algorithms in which the DM can specify their preferences progressively in each step of the interactive procedure. A major benefit of interactive procedures is that the DM neither needs to determine their preferences in advance (a priori) nor choose an appropriate solution out of a sometimes overwhelming set of options (a posteriori). Instead, the DM can learn more about the problem, such as the trade-offs between conflicting objectives and the possible limitations, during the solution process.
To the best of the authors' knowledge, there is no existing approach that interactively incorporates the DM's preferences in the automatic evolution procedure of a dispatching rule that considers multiple objectives simultaneously. With the interactive multi-objective (MO-)GP-HH procedure proposed in this paper, we aim to fill precisely that research gap. In particular, we focus on the implementation of the reference point method making use of achievement scalarizing functions (ASF) in a GP-HH framework. To verify the developed procedure, we test its functionality on a benchmark job shop scheduling environment with the aim of minimizing the makespan, the total tardiness, and the total waiting time.
The remainder of this paper is organized as follows. In Sect. 2, we provide a comprehensive review of the use of HH to solve production scheduling problems. In Sect. 3, we introduce the basic concepts and methods of both MOO and HH for automatically evolving dispatching rules. In Sect. 4, we describe the job shop scheduling problem along with its notation and performance measures. In Sect. 5, we present the proposed interactive MO-GP-HH framework. In Sect. 6, we provide the design of the experiments to validate the developed approach, and in Sect. 7, we illustrate and discuss the results. In Sect. 8, we provide some conclusions and possible avenues for future research.

Literature review
The first attempts to optimize production schedules by HH can be traced back to the early 1960s. A study by Fisher and Thompson (1963) examined the hypothesis that the combination of simple dispatching rules leads to better results than the individual application of these rules. The authors showed that a random combination is superior to the individual rules, and that it is possible to make use of probabilistic learning mechanisms. The topic was not taken up again until the 1990s, when some promising studies on the use of HH to solve complex scheduling problems were published. Since then, the topic has become increasingly popular, with a growing number of publications. In this section, we present and discuss a selection of relevant pioneering works as well as recent publications on HH for solving production scheduling problems. A classification of HH that is widely used in the literature divides the methods according to the nature of the search space into heuristic selection methods and heuristic generation methods (Burke et al., 2019). The structure of our literature review follows the same taxonomy.

Heuristic selection methodologies
Heuristic selection methods search for a good solution by selecting existing heuristics. Instead of searching in the solution space, they operate on a set of low-level simple heuristics. In the scope of production scheduling problems, the low-level heuristics are usually represented by dispatching rules. These rules are used to assign a priority index to each job waiting in the queue. Hence, whenever a machine becomes idle, the job with the highest priority is selected to be processed next on that particular machine. One well-known and frequently investigated group of dispatching rules are simple priority rules, such as the earliest due date (EDD). For a comprehensive review of dispatching rules, the reader is referred to the surveys by Panwalkar and Iskander (1977) and Blackstone et al. (1982). Dorndorf and Pesch (1995) proposed a GA to perform a high-level search for a sequence of dispatching rules to minimize the makespan for job shop scheduling problems. For the job sequencing decision, they consider a set of 12 simple priority rules, including the shortest processing time. Experimental results showed that the developed heuristic performs better than two previously published heuristics.
Vázquez-Rodríguez and Petrovic (2010) proposed a heuristic selection HH method to solve MO job shop problems. They aimed to overcome the sensitivity of their previously developed dispatching rule-based GA to the number of operations that each dispatching rule handled. The authors proposed using the well-known EA NSGA-II (Non-dominated Sorting Genetic Algorithm II) as the high-level algorithm. The developed approach simultaneously searches for the best sequence of dispatching rules and for the number of operations to be handled by each dispatching rule. The authors tested their approach on different MO job shop problems, with a resulting improvement in the hyper-volume on all instances relative to their previous HH as well as to other conventional GAs. Hart and Ross (1998) introduced an HH to solve dynamic job shop scheduling problems. A GA was proposed to search for optimal solutions by iteratively selecting a simple heuristic out of a predefined set of 12 suitable priority rules at each step of schedule generation. In contrast to previously published HHs, the developed approach is novel in that the genome encodes not only the choice of heuristic, which is used to select the next job out of the set of available jobs, but also the choice of algorithm to be used to generate the set of jobs. An experiment was carried out for different benchmark problems as well as for a variety of well-known objectives, such as minimization of the weighted flow time. The results showed that their algorithm outperformed other state-of-the-art approaches in most of the test instances. Moreover, the authors found that decisions in the early stages of the schedule generation process have a higher impact on the schedule's performance than those in later stages. Norenkov and Goodman (1998) developed an HH (or heuristic combination method (HCM), as they called it) to solve multi-stage flow-shop scheduling problems.
They introduced an implicit representation of schedules on the chromosome, where each stage of the genes does not represent the number of a job or machine but instead the number of a heuristic to be used to select a job at that stage of generating a schedule. The hybrid algorithm they proposed is a combination of a GA with an EA. They tested different settings of their algorithm on a well-known benchmark scheduling problem. The results showed that their algorithm is superior to the individual use of simple scheduling heuristics. Vazquez et al. (2007) discussed the principles of the search mechanism of HHs. They distinguished between the associated and the original search space. The associated search space is defined by the combined set of dispatching rules, while the original space contains the set of feasible schedules. The authors showed that the size of the associated space depends on the number of decisions and the number of dispatching rules. Moreover, they introduced the notion of a decision block, defined as a set of decisions sharing the same low-level heuristic, which can thus be treated as a single decision. The size of the associated space can be reduced by generating decision blocks. An experiment showed that the quality of the result for a given number of solutions used as the stopping criterion was sensitive to the size of the decision blocks. Therefore, the design of the associated space was proven to have a relevant impact on the performance of an HH. Rodríguez et al. (2007) developed a meta-HH method to solve flexible flow-shop problems. To schedule the jobs, for the first stage they applied a meta-heuristic, and for the subsequent stages they selected from a set of simple scheduling heuristics by using a GA. They compared its performance with the individual simple heuristics for different test instances and regarding different objectives.
The results showed that the developed approach performed better than the simple heuristics for most of the tested objectives, and especially for all objectives together. Furthermore, the authors discussed the interaction of heuristics to examine whether some of the individually poorly performing simple heuristics could be discarded from the set. A simple example proved that a combination of poorly performing heuristics could lead to better performance than the combination of well-performing heuristics. Ochoa et al. (2009) carried out a landscape analysis to study the properties of heuristic search spaces. Moreover, they formalized a notion of HH landscapes. The landscape analysis was conducted for different types of flexible flow-shop problems solved by a dispatching-rule-based HH. The main results revealed that the studied landscapes contain a large valley structure, which means they have a globally convex structure. Furthermore, the landscapes showed a large number of local optima and the existence of plateaus, where many local optima are located at the same level in the search space. In conclusion, the study found the HH search spaces to be "easy" to search. Yu et al. (2020) examined bi-objective flexible flow-shop scheduling problems with the aim of minimizing the total tardiness and the total setup time. Using an EA at the high level, the authors proposed two different HH approaches. First, they solved the problem a priori by transforming it into a single-objective optimization (SOO) model using the DM's preferences for a scalarization of the objective function. In their second approach, they solved the problem a posteriori, meaning that they used the well-known NSGA-II algorithm to find a set of non-dominated solutions from which the DM can choose an appropriate solution after running the program. A numerical experiment was carried out, revealing that both methods have some drawbacks.
The challenge in the a priori method is the correct determination of user preferences, which are usually difficult to elicit from the DM and generally imprecise due to a lack of knowledge about possible trade-offs. To overcome this issue, the a posteriori approach can be used to present a set of different non-dominated solutions to the DM, who can then gain a deeper understanding of the solution space and choose the solution that best fits their preferences. Nevertheless, the authors identified that the Pareto front observed by the NSGA-II may be worse in certain specific regions compared with some of the single decoding algorithms. The reason for this lies in the compromise between the two parameters "width" and "depth" that needs to be chosen to achieve an efficient allocation of the computing resources. Lin et al. (2017) proposed an HH to solve distributed assembly permutation flow-shop scheduling problems. At the high level, the authors employed a backtracking search algorithm to operate on a set of 10 simple low-level heuristics in order to minimize the makespan. They also applied an efficient encoding and decoding procedure to generate feasible schedules. The effectiveness and efficiency of the developed algorithm were then evaluated on small and large benchmark test instances. The results revealed the superiority of their algorithm over other state-of-the-art methods, where for many of the investigated instances a new best solution could be found. Song and Lin (2021) investigated the distributed assembly permutation flow-shop scheduling problem with sequence-dependent setup times. At the high level of the introduced HH algorithm, they proposed a GP algorithm to operate on a set of 10 predesigned low-level heuristics. The developed algorithm iteratively evolves new heuristic sequences, which are then applied to the search space, in order to find successively better solutions.
Moreover, they embedded simulated annealing into the low-level heuristics to enhance the local search performance. Experimental results on a total of 540 benchmark test instances demonstrated the effectiveness of the proposed algorithm, where for 333 cases a new best solution could be obtained.
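The a priori scalarization used in the first approach of Yu et al. (2020) amounts to collapsing the objectives into a single weighted sum. The sketch below, with made-up weights and objective values, illustrates the idea:

```python
def weighted_sum(objectives, weights):
    """Collapse several (minimized) objective values into one scalar fitness."""
    assert len(objectives) == len(weights)
    return sum(w * f for w, f in zip(weights, objectives))

# Hypothetical schedule evaluation: total tardiness = 120, total setup time = 45,
# with equal DM weights on both objectives.
fitness = weighted_sum([120.0, 45.0], [0.5, 0.5])
print(fitness)  # 82.5
```

The drawback the authors point out is visible here: the resulting fitness depends entirely on weights that the DM must fix before any trade-off information is available.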

Heuristic generation methodologies
Heuristic generation methods are used to search for a new heuristic that is generated from the components of existing heuristics. Compared with heuristic selection methodologies, where the search space contains a set of simple dispatching rules, the search space of heuristic generation approaches includes a set of terminals and functions. Terminals of a production scheduling problem typically represent important parameters, such as the due date and processing time. Frequently used functions include the four basic arithmetic operators: addition, subtraction, multiplication, and division.
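A generated rule is typically represented as an expression tree over such terminals and functions. The following sketch evaluates one such tree; the rule `2 * PT + (DD - RT)` and the terminal names are invented for illustration, not taken from the literature:

```python
import operator

# Function set (inner nodes); division is "protected" so a tree never divides by zero.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else 1.0,
}

def evaluate(tree, terminals):
    """Recursively evaluate a GP priority tree against a job's terminal values."""
    if isinstance(tree, str):           # terminal: a job attribute such as PT or DD
        return terminals[tree]
    if isinstance(tree, (int, float)):  # numeric constant
        return tree
    op, left, right = tree              # function node: (operator, subtree, subtree)
    return FUNCTIONS[op](evaluate(left, terminals), evaluate(right, terminals))

# Hypothetical evolved rule: 2 * PT + (DD - RT)
rule = ("+", ("*", 2, "PT"), ("-", "DD", "RT"))
print(evaluate(rule, {"PT": 3.0, "DD": 10.0, "RT": 4.0}))  # 12.0
```

GP's crossover and mutation operators act directly on such nested expressions, swapping and rewriting subtrees to evolve new priority functions.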
An early heuristic generation approach to solving a single-machine shop was introduced in Dimopoulos and Zalzala (2001). The aim was to evolve a dispatching rule that challenged human-made dispatching rules. The authors proposed to employ GP for the evolution of a new dispatching rule that would be responsible for the sequencing of jobs. The developed method is applicable to any other scheduling problem, assuming that the training phase of the rule is carried out correctly. Moreover, a combination of the GP with local search algorithms was presented to increase the performance of the algorithm. Nine different dispatching rules were evolved from nine different training sets. The evolved heuristics were then compared with other previously developed algorithms, with the result that in most cases, the developed HH was able to find a dispatching rule that outperformed the former. Geiger et al. (2006) presented a system to automatically learn effective dispatching rules for a given environment. A combination of an evolutionary learning mechanism with a simulation model of the industrial facility of interest was proposed to evolve and evaluate the rules in an iterative manner. An experiment was carried out wherein the training set size was identified as a critical parameter. Different training set sizes were tested, and a large number of instances was revealed to be worse in terms of the generality of the identified rule. It was observed that 10 instances in the training set and the test set led to robust dispatching rules. The scalability to other shop configurations was also demonstrated, and an example was illustrated.
Abednego and Hendratmo (2011) proposed a GP-HH to evolve a scheduling priority rule in order to solve dynamic production scheduling problems for a single-machine shop configuration. The aim was to minimize the mean tardiness and the mean flow time. The authors proposed a weighted-sum approach to solving a bi-objective optimization problem, where the weights are assumed to be equal for both objectives. Applying the developed approach to a real-world problem from the metal industry revealed the efficiency of their method. Results showed that their approach outperformed six human-made heuristics under comparison for each of the five investigated machines. Jakobović et al. (2007) developed an HH that evolves scheduling heuristics for a parallel machine environment. The authors proposed the use of GP to generate a suitable priority rule. A priority rule based on 100 test cases was evolved and then tested and compared with other existing and well-known scheduling heuristics. In their experiment, different cases regarding performance criteria and static or dynamic job arrivals were tested. The results showed that the authors' evolved scheduling heuristic outperformed the traditional approaches in most cases. Masood et al. (2017) investigated MO job shop scheduling problems. They proposed the use of GP and the NSGA-III algorithm to evolve efficient dispatching rules. The authors also found that the uniform distribution of the reference points, as is usually applied by the NSGA-III, is not perfectly tuned to the discrete and nonuniform Pareto front in job shop scheduling problems. Therefore, the authors introduced a reference point adaption mechanism based on particle swarm optimization to reduce the useless reference points and improve the performance.
Experimental results showed that the developed adaption of the reference points led to a significant reduction in useless reference points, as well as improved performance in terms of hyper-volume and inverted generational distance of MO job shop scheduling problems. Ho and Tay (2005) developed a GP to generate effective dispatching rules with the aim of minimizing the total tardiness of flexible job shop problems. By combining terminals, such as the release date of a job, with functions, such as addition, a GP tree can be constructed. To generate good dispatching rules for a particular problem, training first needs to be carried out; because the rules are trained for the problem at hand, the approach applies to a large variety of scheduling problems. An experiment was carried out where five generated dispatching rules were compared with selected human-made dispatching rules. Tay and Ho (2008) extended their previous work to multiple objectives. The MO problem was transformed into a single-objective problem by using the weighted-sum approach, without taking the DM's preferences into account. An experiment was presented in which dispatching rules were generated and compared with other human-made rules. The performance was first evaluated for each objective individually, and then the overall performance with respect to the weighted sum was shown. Moreover, a sensitivity analysis was carried out. One of the main conclusions identified by the authors was the need for incorporation of preference information in the MO approach in order to efficiently find a "good" compromise solution. Pickardt et al. (2013) proposed a two-stage HH to solve a complex dynamic job shop scheduling problem. The aim was to generate a set of work-center-specific dispatching rules in order to minimize the mean weighted tardiness. In the first stage, a GP algorithm was used to evolve a problem-specific composite rule from basic job attributes. In the second stage, the evolved rules were assigned to the work centers by applying an EA.
The developed two-stage approach was then tested and compared with the two developed HHs individually and some benchmark rules. The test instance was a real-world problem from the semiconductor industry. Results showed that the two-stage approach outperforms the GP and EA individually as well as any of the benchmark rules under investigation. Moreover, the authors carried out a sensitivity analysis demonstrating the robustness of the resulting rule sets against different operational variables such as the arrival rate of jobs. Park et al. (2018) carried out a systematic analysis of four popular ensemble combination schemes for GP-based HH approaches to solve dynamic job shop scheduling problems. Recent research has shown that ensemble GP-HH algorithms are more robust than traditional approaches for evolving single dispatching rules. Moreover, the authors established new measures to evaluate the behaviors of evolved ensembles. Based on these measures, an experiment was carried out to test four different combination schemes in order to minimize the weighted tardiness. For that purpose, the authors used an existing discrete-event simulation model proposed by Hunt et al. (2014). The main results of their study revealed that the linear combination scheme generally performed better than the other approaches under investigation and that the weighted schemes led to worse results than the unweighted ones. Another study proposed four MO HHs based on GP to solve dynamic flexible job shop scheduling problems to minimize the mean weighted tardiness, maximum tardiness, and mean flow time. The developed method simultaneously evolves job sequencing and machine assignment decision rules. For every generation of the population, a discrete-event simulation is used for evaluation with respect to the aforementioned performance measures.
Based on that fitness evaluation, the NSGA-II or Strength Pareto Evolutionary Algorithm 2 (SPEA2) is then applied, and a new population is consequently built. Experimental results confirmed the efficiency, where the developed approaches outperformed 320 human-made scheduling techniques in most of the test scenarios. Moreover, the authors demonstrated that their method finds the Pareto front more comprehensively than the compared techniques.
As an extension to the previous work, Zhou et al. (2020) investigated different coevolutionary techniques, such as the cooperative coevolution GP, and introduced a surrogate simulation model to improve the fitness evaluation process. The main focus of this paper was to examine the impact of simulation configuration on the performance of the proposed HH algorithms. As one of the main experimental results, the authors showed that the simulation parameters of the discrete-event simulation model, such as the warm-up time, have a significant impact on the test performance in the evolutionary generation of scheduling policies. Furthermore, the implementation of a surrogate evaluation model led to a relevant improvement of the performance of evolved scheduling policies as the convergence speed of the algorithm was accelerated. Finally, the effectiveness and robustness of the proposed method were proven by carrying out a comparison experiment and sensitivity analysis.

Summary
In order to provide a clear classification of the reviewed literature, a summary is presented in Table 1. The discussed approaches are categorized according to their nature of search space and their number of objectives. The respective objectives examined by the authors are also marked. A more comprehensive overview of heuristic selection and generation methods is given in Burke et al. (2013), Drake et al. (2020), and Branke et al. (2016). In addition to scheduling problems, these authors include reviews in other domains and areas of application.

Literature gap
Despite the existence of numerous publications on HH algorithms for solving different shop configurations of production scheduling regarding a single objective, there are very few approaches considering the optimization of multiple objectives simultaneously. MOO problems usually have no unique optimal solution; instead, they contain a set of mathematically incomparable solutions, so-called Pareto optimal solutions. To find the optimal solution among them, DM preference information is required. In the MOO literature, there exist different ways of approaching this challenge. Some methods ask the DM to articulate the preferences prior to the actual optimization task. Other methods aim to determine the best possible representation of the Pareto front to enable the DM to choose the most appropriate solution from a set of non-dominated solutions after running the optimization task. Furthermore, there is another category of so-called interactive procedures, which involve an iterative exchange of information with the DM to find a solution that best corresponds to their preferences. To the best of the authors' knowledge, there is currently no method that interactively guides the DM through the decision-making process by iteratively taking into account their preferences in order to evolve appropriate dispatching rules to solve MO job shop scheduling problems. Some of the reviewed works applied MO non-dominated sorting GAs, such as the NSGA-II or NSGA-III, to find a good approximation of the Pareto front (Masood et al., 2017; Vázquez-Rodríguez & Petrovic, 2010; Yu et al., 2020). Another way to approach MO scheduling problems through the use of HH that was found in the literature is to transform the objective functions into a single-objective function by linear scalarization techniques, such as the weighted-sum approach (Yu et al., 2020; Tay & Ho, 2008; Abednego & Hendratmo, 2011).
A classification of the reviewed MO articles with respect to their type of solving method is summarized in Table 2. Nevertheless, none of the reviewed articles addressed the challenge of how to incorporate the DM's preferences into the solving process of MO production scheduling problems. To enhance the applicability of MO HH approaches for real-world cases, the development of an interactive MOO method thus represents a promising area of research to find the "best" compromise solution according to the DM's preferences. This paper contributes to closing this knowledge gap by developing an interactive MO-HH-GP framework based on the reference point method. Our decision support system (DSS) aims to guide the DM through the whole decision-making process, where the interactive aspect of our procedure enables the efficient exploration of the objective space and the identification of a variety of non-dominated solutions.

Theoretical background
In this section, we will first introduce the general mathematical formulation and fundamental concepts of MOO. In addition, a brief overview of methods for solving MOO is presented, and a particular method, namely the ASF approach, is described in more detail. The second part of this section is devoted to the basic concepts and design of HH approaches to solve production scheduling problems, with a special focus on evolutionary-based algorithms and the automatic evolution of dispatching rules by GP.

Multi-objective optimization
MOO models represent an extension of SOO models, where instead of one objective function, a set of q ≥ 2 objective functions f_1, …, f_q is considered. Each objective function can individually embody either a maximization or minimization problem. Without loss of generality, we assume each objective to be simultaneously minimized in the remainder of this paper. Supposing we consider a problem containing v decision variables, the v-dimensional decision vector x = (x_1, …, x_u, …, x_v) represents a feasible solution in the decision space R^v; the set of feasible solutions is denoted by X. For each decision vector x, there exists a representation f(x) = (f_1(x), …, f_q(x)) in the feasible region F of the objective function space R^q. The mathematical formulation of a MOO model where all objective functions are to be minimized is presented in (1):

min f(x) = (f_1(x), …, f_q(x))
s.t. x ∈ X    (1)

Dominance and efficiency
Instead of a single optimal solution as is the case for optimization problems with a single-objective function, MOO models usually contain a set of mathematically equally good solutions. Before we can derive the definition of an efficient solution, it is necessary to understand the concept of dominance.
Definition 1 (Dominance) An outcome vector in the objective space, f, is said to dominate another outcome vector, f′, if and only if f_k ≤ f′_k for all k = 1, ..., q and f_k < f′_k for at least one k.
A non-dominated outcome vector f̂ ∈ F represents a feasible outcome vector if and only if there exists no other outcome vector f ∈ F such that f dominates f̂. Otherwise, f̂ is a dominated outcome vector. The set of all non-dominated outcome vectors is denoted by N(F). The ideal point f* and the nadir point f^nad define the lower and upper bound of the set of non-dominated solutions in the feasible region of the objective space. Their components can be calculated by minimizing or maximizing each objective function individually, respectively. Finally, the definition of efficiency can be derived as follows:

Definition 2 (Efficiency) A feasible solution, x* ∈ X, is called efficient if there exists no other feasible solution, x ∈ X, such that f(x) dominates f(x*). The set of all efficient solutions is denoted by E(X).
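The dominance relation of Definition 1 can be sketched in a few lines of Python; this is an illustrative helper (the function name is ours), not part of the proposed framework:

```python
def dominates(f, g):
    """Return True if outcome vector f dominates g (all objectives minimized):
    f is no worse in every component and strictly better in at least one."""
    return (all(fk <= gk for fk, gk in zip(f, g))
            and any(fk < gk for fk, gk in zip(f, g)))

# e.g., (3, 5) dominates (4, 5), but no vector dominates itself
```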

Solving multi-objective optimization models
Because optimization problems with multiple objectives contain a set of efficient solutions, a single solution that represents optimality in a mathematical sense is usually not given. Instead, each solution of the set of efficient solutions embodies a viable compromise between multiple conflicting objectives. Which of these solutions corresponds to the most adequate choice depends on the preferences of those who have to make the decision. Therefore, the DM must somehow be incorporated into the solution process in order to find the compromise solution that best meets their needs. Problems of this kind are also known as multiple-criteria decision-making (MCDM) problems. Methods for solving MOO problems are classified as non-interactive or interactive according to the role of the DM in the solution process (Miettinen, 2008). Whereas in non-interactive methods the DM's preferences are articulated before (a priori) or after (a posteriori) solving the model, in interactive methods the DM is directly involved in the solution process. The continuous exchange of information enables the DM to specify their preferences in each iteration and thus to focus the search on promising regions of the Pareto front. A frequently applied method for interactive procedures is the so-called reference point method. Making use of an ASF, this method guides the DM through the exploration of the objective space. A variety of efficient solutions can be discovered interactively and iteratively, with the aim of finding a compromise solution that satisfies the DM's expectations. Since this method is implemented in our proposed interactive MO-GP-HH framework, its key element, the ASF, is introduced in the following subsection.

Achievement scalarizing functions
One possible and frequently applied scalarizing technique for transforming multiple objective functions into a single-objective function is the ASF approach. First introduced by Wierzbicki (1980), ASF appears to be particularly suitable for implementation in the reference point method. A reference point can be represented by either the ideal outcome vector or any other point in the feasible or infeasible region of the objective space. In interactive procedures, a reference point is usually selected by the DM and represents their aspiration level. The desired objective function values are denoted by f^0_k for all objective functions k = 1, ..., q. The resulting objective vector, f^0 = (f^0_1, ..., f^0_k, ..., f^0_q), is called a reference point. ASF can then be used to compute the non-dominated solution that minimizes the distance to the reference point. A widely used metric for measuring the distance is the so-called Chebyshev (L_∞) norm. Using the L_∞ norm, the distance between an outcome vector f and a reference point f^0 is represented by the maximum difference component over all objectives k = 1, ..., q:

L_∞(f, f^0) = max_{k=1,...,q} |f_k − f^0_k|. (2)

Moreover, different scale factors can be introduced in order to assign different "weights" to the multiple objectives k = 1, ..., q. For the weighted L^λ_∞ metric, the distance is defined as

L^λ_∞(f, f^0) = max_{k=1,...,q} λ_k |f_k − f^0_k|, (3)

where λ_k ≥ 0 represents the "weighting" for objective k = 1, ..., q. These values can be either a preferential parameter or a purely normalized coefficient. The resulting vector λ = (λ_1, ..., λ_k, ..., λ_q) is called the scaling coefficient or weighting vector. For the interactive MO-GP-HH procedure proposed in this paper, the following ASF represents the starting point:

σ(f, f^0, λ, ρ) = max_{k=1,...,q} λ_k (f_k − f^0_k) + ρ Σ_{k=1}^{q} (f_k − f^0_k), (4)

where the so-called augmentation coefficient ρ is an arbitrarily small positive scalar that is introduced to avoid weakly efficient solutions.
By minimizing the ASF (4), we can obtain a non-dominated solution with the shortest distance to the reference point f^0 with respect to the weighted Chebyshev metric L^λ_∞:

min_{x ∈ X} σ(f(x), f^0, λ, ρ). (5)

The following two properties can be proved for a given reference point f^0 (Steuer, 1986):

1. If x* = arg min_{x ∈ X} σ(f(x), f^0, λ, ρ), then x* is an efficient solution.
2. If x * is an efficient solution, then there exists a function σ ( f , f 0 , λ, ρ) such that x * is a (global) optimal solution of the achievement problem given in Model (5).
Different non-dominated solutions can be obtained by carrying out a parameterization of either the reference point or the weighting factor. Because of this characteristic, the reference point method often uses ASF to solve MOO problems interactively.
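The augmented ASF described above can be sketched directly in Python; the function and parameter names below are our own illustrative choices:

```python
def asf(f, f_ref, lam, rho=1e-6):
    """Augmented achievement scalarizing function: weighted Chebyshev
    distance of outcome vector f to the reference point f_ref, plus a
    small augmentation term (rho) to avoid weakly efficient solutions."""
    diffs = [l * (fk - rk) for fk, rk, l in zip(f, f_ref, lam)]
    return max(diffs) + rho * sum(fk - rk for fk, rk in zip(f, f_ref))

# e.g., distance of the outcome (10, 20) to the reference point (8, 18)
value = asf((10, 20), (8, 18), (1.0, 1.0))
```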

HH in production scheduling
HHs typically make use of EAs as the high-level algorithm to carry out stochastic and iterative search procedures. Instead of searching directly in the search space of solutions, as is the case with most metaheuristics, they operate in a search space of heuristics or heuristic components (Burke et al., 2003; Ross, 2005). In this paper, we will focus on the latter. The general procedure of conventional EAs is inspired by natural phenomena and starts with a randomly or heuristically generated set of candidate solutions (individuals) that represents the initial population. During each iteration (generation), the individuals are decoded in order to evaluate their performance (fitness). According to the Darwinian evolutionary theory of survival of the fittest, solutions are probabilistically selected for future generations based on their performance. Subsequently, new solutions (children) are generated by recombining (crossover) two of the selected solutions (parents) and/or adjusting an individual's components (mutation). As in nature, where individuals increasingly adapt to their environment over generations, better solutions can be evolved with EAs step by step over a variety of iterations. For an overview of different evolutionary-based algorithms and a detailed introduction, the interested reader is referred to Bäck (1996) and Eiben and Smith (2015). The evolutionary process is visualized as a simplified cycle diagram in Fig. 1. For the implementation of the HH in this article, an extension of an evolutionary-based GA, the so-called GP, is used to automatically evolve dispatching rules that best meet the DM's preferences. Therefore, the basic concepts of GP are introduced in the following.
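The evolutionary cycle described above can be sketched as a generic minimization loop; the toy problem, the operator implementations, and all parameter values below are our own illustrative choices, not those of the proposed framework:

```python
import random

def evolve(fitness, init, crossover, mutate, pop_size=20, generations=50, seed=0):
    """Generic evolutionary cycle: initialize, then repeatedly evaluate,
    select (binary tournament), recombine, and occasionally mutate."""
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        def select():  # size-2 tournament on (minimized) fitness
            a, b = rng.sample(pop, 2)
            return a if fitness(a) <= fitness(b) else b
        children = []
        while len(children) < pop_size:
            child = crossover(select(), select(), rng)
            if rng.random() < 0.1:  # mutation adds diversity
                child = mutate(child, rng)
            children.append(child)
        pop = children
    return min(pop, key=fitness)

# Toy usage: minimize the number of ones in a 10-bit string.
def one_point(p1, p2, rng):
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:]

best = evolve(
    fitness=sum,
    init=lambda rng: [rng.randint(0, 1) for _ in range(10)],
    crossover=one_point,
    mutate=lambda c, rng: [b ^ 1 if rng.random() < 0.1 else b for b in c],
)
```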

GP-based evolution of dispatching rules
GP is a special type of GA and belongs to the family of evolutionary computation methods. The term "genetic programming," which was established by Koza (1992), arose from the combination of the concepts of evolutionary computation and automatic programming. In contrast to conventional EAs, where the length of candidate solutions is usually fixed, GP allows individuals to vary in size and logical structure. Therefore, GP is particularly suitable for automatically evolving computer programs or mathematical formulas of variable length, such as dispatching rules. In GP, an individual is composed of function and terminal primitives that are encoded in the form of a rooted tree. The selection of the set of primitives is generally problem-specific and needs to be defined a priori; since the size of the set is crucial for the performance of the algorithm, the selection should be made carefully and with regard to the requirements of the problem at hand. In production scheduling problems, typical terminals are operational parameters that might have an impact on the performance of the schedule, such as the processing time (PT), due date (DD), and remaining processing time (RPT) of a job. The set of functions applied to production scheduling problems usually consists of simple rational, arithmetic, and conditional operators or functions, such as multiplication (MUL), subtraction (SUB), and division (DIV). For a deeper introduction to GP, the interested reader is referred to the book series by Koza (Koza, 1992, 1994; Koza, Andre, Keane, & Bennett, 1999; Koza et al., 2005) for the concepts of GP in general, and to Burke et al. (2009) for the implementation of GP in an HH framework.

Encoding of individuals
Computer programs generated by GP are usually encoded as a tree structure and can be illustrated in the form of a parse tree, where the nodes represent a combination of functions (internal nodes of the tree) and terminals (leaf nodes of the tree). For production scheduling problems, each individual embodies a dispatching rule that is used to calculate a priority index for jobs waiting in the queue whenever a machine becomes idle. In Fig. 2, an example of such a dispatching rule is presented as a logical expression, in its prefix notation, and in the form of a parse tree.

Fig. 2 Encoding and decoding of an individual

The evolutionary process within GP operates on the encoded tree form of individuals. To evaluate the fitness of each individual, the representation of its chromosomes needs to be decoded to its mathematical expression. Priority indices can then be calculated at each decision point, and a schedule can be created step by step. Finally, performance measures such as the makespan can be derived and used for the probabilistic selection of the fittest individuals for future generations. Because of its dynamic nature, discrete-event simulation is a suitable and widely used method for carrying out the evaluation of dispatching rules in the manufacturing environment (Pickardt et al., 2013).
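As an illustration of decoding, a dispatching rule can be represented as a prefix-notation tree and evaluated recursively against a job's attributes; the tuple encoding and attribute names below are our own simplification of the parse-tree representation:

```python
# A rule encoded as a prefix tuple: internal nodes are (function, args...),
# leaves are terminal names such as 'PT', 'DD', 'RPT'.
def evaluate(node, attrs):
    """Decode a parse tree recursively: internal nodes apply functions,
    leaves are looked up in the job's attribute dictionary."""
    if isinstance(node, tuple):
        op, *args = node
        vals = [evaluate(a, attrs) for a in args]
        if op == 'ADD': return vals[0] + vals[1]
        if op == 'SUB': return vals[0] - vals[1]
        if op == 'MUL': return vals[0] * vals[1]
        if op == 'DIV': return vals[0] / vals[1] if vals[1] != 0 else 1.0
        raise ValueError(f"unknown function {op}")
    return attrs[node]  # terminal

rule = ('MUL', 'PT', ('ADD', 'DD', 'RPT'))
priority = evaluate(rule, {'PT': 5, 'DD': 40, 'RPT': 12})  # 5 * (40 + 12) = 260
```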

Reproduction operators
Similar to conventional EAs, GP employs two types of operations, namely crossover and mutation, to reproduce individuals at the end of each generation of the evolutionary process. The crossover operation starts by probabilistically selecting two individuals (parents) from the current population according to their fitness. Then, by randomly selecting a crossover point in each parent's tree, the subtrees for the crossover operation are defined. These subtrees are then exchanged between the two parents at the defined crossover points, resulting in two new individual trees (children). For the mutation operation, first, an individual from the current population is probabilistically selected based on its fitness. Subsequently, a cutoff point is randomly chosen, and the corresponding subtree is replaced by a new subtree, which results from a random combination of terminals and functions. Note that both operations produce syntactically valid offspring only with respect to the user-defined size constraints and grammar of the program. Usually, the crossover operation is considered to be the most significant element of GP and is therefore assigned a probability of 90% in most implementations found in the reviewed literature (see, e.g., Abednego & Hendratmo, 2018; Park et al., 2019; Zeiträg, Figueira, Horta, & Neves, 2011; Y. Zhou & Yang, 2020; Y. Zhou et al., 2022; Y. Zhou, Yang, & Zheng, 2019). Moreover, the mutation operation is frequently treated as a secondary operation. However, it can be stated that mutation adds diversity to the evolutionary process and thus improves the exploration of the search space and prevents convergence to local optima. An illustration of the two described reproduction operations is shown in Fig. 3.
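A minimal sketch of subtree crossover on list-encoded trees follows; the encoding (lists like `['ADD', left, right]` with string terminals) is our own simplification, and a production implementation would additionally enforce the size constraints mentioned above:

```python
import random

def subtrees(tree, path=()):
    """Enumerate (path, subtree) pairs of a list-encoded tree; index 0 of
    a list node holds the function name, so only indices >= 1 are children."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    node = tree[:]  # shallow copy along the path only
    node[path[0]] = replace(node[path[0]], path[1:], new)
    return node

def crossover(p1, p2, rng):
    """Graft a randomly chosen subtree of p2 onto a random point of p1
    (only one of the two children is produced here, for brevity)."""
    path1, _ = rng.choice(list(subtrees(p1)))
    _, sub2 = rng.choice(list(subtrees(p2)))
    return replace(p1, path1, sub2)
```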

Problem formulation
The job shop scheduling problem (see, e.g., Pinedo, 2012) contains a set J = {J_1, ..., J_i, ..., J_n} of independent jobs. Each job J_i, for all i = 1, ..., n, consists of several sequential operations, O_i1, ..., O_ij, ..., O_im, that need to be processed in a predefined sequence. Given a set U = {M_1, ..., M_g, ..., M_ℓ} of ℓ machines, each operation O_ij for all i = 1, ..., n and j = 1, ..., m is assigned to a particular machine and has a processing time denoted as p_ij. Each job has a release date r_i and a due date d_i for all i = 1, ..., n. Furthermore, a feasible solution to the problem at hand must comply with the following assumptions:

- Preemption of operations is not allowed. (Once an operation has been started, it cannot be interrupted by another operation until its processing has been completely finished.)
- Each machine can only execute one operation at a time.
- All machines are continuously available and ready for operation whenever they become idle.
- The due date of each job is binding and predefined.
- The order of each job's operations is predefined and invariant.

Model notation
The indices, parameters, and variables for the mathematical formulation of the MO job shop scheduling problem are denoted as follows:

Indices:
i: Index of jobs, i = 1, ..., n
j: Index of operations, j = 1, ..., m
g: Index of machines, g = 1, ..., ℓ

Performance measures
For the problem at hand, a selection of the most common objectives in job shop scheduling problems represents the performance measures that are used to evaluate the fitness in the evolutionary generation of preference-based dispatching rules. The definition, an economical interpretation, and the mathematical formulation of the selected performance measures are provided in the following. Please note that the selection of the performance measures has no impact on the developed interactive MO-GP-HH framework since it is designed to work in a generic manner. However, a selection has been made to illustrate the functionality in the remaining sections of this article.
1. Makespan. The first and most common objective in job shop scheduling, as defined by Expression (6), is to minimize the makespan, that is, the completion time of the last scheduled job:

C_max = max_{i=1,...,n} C_i, (6)

where C_i denotes the completion time of job J_i. This metric can be used to measure the utilization of machines and, from an economical point of view, represents a meaningful indicator for efficient production processes.
2. Total tardiness. The second objective is to minimize the total tardiness:

T = Σ_{i=1}^{n} max(C_i − d_i, 0). (7)

This performance measure can be used as a "good" metric to represent the service quality provided to the customers. In a global environment of increasing competition, customer satisfaction is becoming more and more crucial to ensuring future business success. Therefore, meeting delivery dates is of significant importance.
3. Total waiting time. This performance measure indicates how long a job remains on the shop floor from the beginning of the processing of its first operation until the job's completion time. This objective is related to the work-in-process inventory and is associated with its costs. According to modern lean manufacturing philosophy, keeping the level of inventory in process as low as possible is highly desirable.
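Assuming a completed schedule, the three performance measures can be computed as follows; the data layout and the reading of total waiting time as time on the shop floor (start of the first operation until completion) minus total processing time are our own illustrative assumptions:

```python
def performance(completion, due, start_first, proc_total):
    """Makespan, total tardiness, and total waiting time for a set of jobs.
    All arguments are dicts keyed by job id: completion time, due date,
    start time of the first operation, and total processing time."""
    makespan = max(completion.values())
    tardiness = sum(max(0, completion[i] - due[i]) for i in completion)
    # waiting = time on the shop floor minus time actually being processed
    waiting = sum((completion[i] - start_first[i]) - proc_total[i]
                  for i in completion)
    return makespan, tardiness, waiting
```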

Framework
The proposed interactive MO-GP-HH framework consists of three interconnected modules: the interactive MO-GP-HH procedure, the GP-HH-based dispatching rule generator, and the simulation-based fitness evaluation. The course of the proposed method as well as the aforementioned modules and how they are connected can be seen in Fig. 4. In the following subsections, each module will be described in more detail.

Interactive MO-GP-HH procedure
Inspired by Zeiträg (2021) and based on the main steps for a general interactive DSS for MOO defined by Miettinen et al. (2008), the proposed interactive MO-GP-HH procedure contains three main phases: the initialization, exploration, and decision phase. The procedure will be briefly described in the following; for more details, interested readers are referred to Zeiträg (2021). In the first step of the procedure, a sample of reference points is generated to identify a set of non-dominated solutions that provides the DM with a "good" overview of the non-dominated region and possible trade-offs between the competing objectives. For that purpose, several reference points can be determined over a uniform distribution on a hyperplane between the approximated ideal and nadir points (see, e.g., Figueira, Liefooghe, Talbi, & Wierzbicki, 2010). In the next step, a dispatching rule is evolved that minimizes the distance of its performance measures to the corresponding reference point's values by running the GP-HH-based dispatching rule generator (see Sect. 5.2). Operating on a processor with multiple cores, the GP-HH-based dispatching rule generator can be run in parallel for multiple reference points (Figueira et al., 2010). In the final step of the initialization phase, the obtained solutions are presented to the DM, who can then decide whether they want to explore new solutions or whether one of the identified solutions already satisfies their aspirations regarding all performance measures simultaneously. If further exploration is desired, the interactive procedure continues with the exploration phase.
In the first step of the exploration phase, the DM has the ability to select from the following two exploration options:

1. Add an aspiration level to avoid undesirable regions of the objective space.
2. Intensification for exploring promising areas.
For both of the provided options, the DM first needs to choose a solution out of the set of generated non-dominated solutions from the previous phase. The DM can then decide to either add an aspiration level for a particular performance measure or explore the surroundings of the investigated solution. Assuming a solution performs well in all dimensions except one, the DM can try to "tune" the unsatisfying performance measure by adding an aspiration level and running the GP-HH-based dispatching rule generator again (see Sect. 5.2). Moreover, the DM can add an acceptance level if they notice that one of the performance measures of the selected solution exceeded a certain level of expectation and could therefore be debased to a defined limit. In a mathematical sense, the aspiration level represents an upper bound and the acceptance level a lower bound for minimization problems. The second exploration option provides the DM with the opportunity to intensify the investigation of promising areas. This choice is particularly suitable for solutions whose performance measures correspond more or less to the DM's preferences in all dimensions. Under a predefined variation of the scaling coefficient vector of the ASF, new solutions placed in the surroundings of the promising solution can be found by running the GP-HH-based dispatching rule generator (see Sect. 5.2). For both exploration options, the newly evolved dispatching rule(s) and the corresponding performance measures can subsequently be evaluated by the DM, and a solution can either be kept for further investigation or be discarded if its performance is not acceptable. The exploration process can be repeated until no further interaction is desired by the DM. The iterative exploration procedure enables the DM to identify solutions that could have remained hidden with classical a priori or a posteriori methods. Furthermore, the interaction with the DM leads to a deeper understanding of the trade-offs between objectives.
This learning effect facilitates the identification and articulation of the DM's preferences throughout the procedure. In the final phase of the interactive MO-GP-HH procedure, a decision must be made as to which of the obtained solutions (if any) best meets the DM's preferences. Therefore, a pairwise comparison of all solutions is carried out, where in each iteration one of the compared solutions should be eliminated. The pairwise comparison is repeated until a single solution is left. The final solution is supposed to represent a "good" compromise between all the assessed competing objectives. If the DM is satisfied with the final solution, the interactive procedure is finished. Otherwise, the procedure can be restarted, and different solutions might be obtained by varying the parameters.

GP-HH-based dispatching rule generator
The actual optimization task takes place in the GP-HH-based dispatching rule generator. In a stochastic search procedure, dispatching rules are evolved by GP such that the distance of their resulting performance measures to the given reference point, as measured by the ASF, is minimized.

Simulation-based fitness evaluation
For evaluating the fitness of the individual dispatching rules generated in the evolutionary process, we propose the use of discrete-event simulation. Because of the inadequacy of analytical techniques such as queuing theory (see, e.g., Blackstone et al., 1982), particularly for more complex shop configurations or under consideration of dynamic events, simulation has become the predominant method for evaluating scheduling problems and is thus the most common method in the literature.
In the proposed discrete-event simulation, the job shop problem at hand is first loaded with all the necessary information, such as the processing time of each operation or the due date of each job. For more detail on how the training and test instances are randomly generated, see Sect. 6.1. After collecting the necessary information, the simulation run can then start with the dispatching of operations. The priority index for each of the waiting operations is calculated by using the dispatching rule from the examined individual in its decoded form. As soon as a machine becomes idle, the operation with the lowest priority index is selected from the waiting queue. The respective machine is then blocked for other activities for the period of the processing time of the scheduled operation. When the processing time expires, the next operation of the respective job is released. Moreover, the machine returns to the idle state and is available for the next operation in the queue with the next-lower priority index. This procedure is repeated until all operations have been scheduled and all jobs have been completed. The performance measures defined in Sect. 4.2 are calculated and saved for the current simulation run. The simulation is repeated until the maximum number of defined simulation runs is reached. To calculate the fitness of the tested dispatching rule, the mean values of the performance measures of all simulation runs are finally determined and returned to the GP-HH-based dispatching rule generator. This module can then calculate the final fitness by inserting the performance measures into the ASF. In this context, the fitness value represents the distance to the observed reference point with respect to the weighted Chebyshev metric.
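The dispatching loop described above can be sketched as a simple nondelay schedule builder for a single simulation run; the data layout and function names are our own, and, as in the text, the operation with the lowest priority index is dispatched first:

```python
def simulate(jobs, rule):
    """Build a nondelay schedule for a static job shop.
    jobs[i] is the ordered list of operations (machine, ptime) of job i;
    rule(job_ops, op_index, ptime) returns a priority index (lower = first).
    Returns per-job completion times and first-operation start times."""
    n = len(jobs)
    machine_free = {}            # machine id -> time it becomes idle
    job_ready = [0.0] * n        # release dates are zero (static shop)
    next_op = [0] * n
    start_first = [None] * n
    completion = [0.0] * n
    remaining = sum(len(ops) for ops in jobs)
    while remaining:
        # earliest time any unscheduled operation could start
        cand = []
        for i in range(n):
            if next_op[i] < len(jobs[i]):
                g, p = jobs[i][next_op[i]]
                t = max(job_ready[i], machine_free.get(g, 0.0))
                cand.append((t, g, i, p))
        t_min = min(t for t, g, i, p in cand)
        g_sel = min(g for t, g, i, p in cand if t == t_min)  # tie-break
        queue = [(i, p) for t, g, i, p in cand if g == g_sel and t == t_min]
        # dispatch by priority index from the decoded rule
        i_sel, p_sel = min(queue, key=lambda jp: rule(jobs[jp[0]], next_op[jp[0]], jp[1]))
        if start_first[i_sel] is None:
            start_first[i_sel] = t_min
        finish = t_min + p_sel
        machine_free[g_sel] = finish   # machine blocked while processing
        job_ready[i_sel] = finish      # next operation of the job released
        completion[i_sel] = finish
        next_op[i_sel] += 1
        remaining -= 1
    return completion, start_first

# usage: SPT rule on a tiny 2-job, 2-machine instance
jobs = [[(0, 3), (1, 2)], [(0, 1), (1, 4)]]
completion, start_first = simulate(jobs, lambda ops, j, p: p)
```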

Experimental design
To demonstrate the functionality of the proposed interactive MO-GP-HH framework, we developed an appropriate experiment. In the following subsections, the generation of the set of test instances, the terminal and function set for composing a preference-based dispatching rule, and the fitness function and the GP parameters are presented. Since the data used for the experiment are taken from benchmark job shop problems, we simulate the interaction with a fictional DM.

Test instance
In the job shop literature, there are many different sets of benchmark problems available that are designed to measure the performance of an optimization method (Taillard, 1993; Demirkol et al., 1998). Typically, the application of dispatching rules is known to be more effective and therefore more prominent for job shop scheduling problems in a dynamic environment. However, because of their lower complexity, it can make sense to first introduce newly developed algorithms on static job shop scheduling problems in order to analyze and validate their functionality (Masood et al., 2016). To evolve a dispatching rule that meets the DM's preferences, we propose using a set of 10 randomly generated training instances for a medium-sized job shop environment that contains 15 machines and 20 jobs. Each job consists of 15 operations (one operation on each machine) which need to be processed for a given duration (processing time) and in a predefined order. The attributes of the jobs and operations for each instance are generated according to the procedure proposed by Taillard (1993). Therefore, each operation is randomly assigned to one of the machines, without allowing a machine to be revisited. The processing time for each operation follows a discrete uniform distribution between 1 and 99.
Since the original dataset developed by Taillard (1993) was designed to measure the makespan, the test instances do not include any due dates. To consider due date-based performance measures such as the total tardiness, we attach a due date to each job of the test instance using the due date assignment rule of Baker (1984):

d_i = r_i + h · Σ_{j=1}^{m} p_ij,

where parameter h indicates the due date's tightness. A value of h = 1.3 is a typical practice in research (see, e.g., Kreipl, 2000; Masood et al., 2009; H. Zhou, Cheung, & Leung, 2017) and is therefore also applied in this study. Moreover, the release date r_i of all jobs i = 1, ..., n is assumed to be zero in Taillard's instances, which means all jobs are ready to be processed from the beginning. The aim of the proposed interactive MO-GP-HH procedure is to generate a dispatching rule that is able not only to reach a certain compromise between objectives on a particular set of training instances, but also to reproduce a similar compromise for problems that are likely to occur in future applications of the examined job shop environment. Therefore, another independent set of 10 randomly generated instances is used to test the evolved dispatching rule and compare its performance to other well-known dispatching rules, such as the earliest due date first (EDD) or the shortest processing time (SPT).
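The instance generation described above might be sketched as follows; the function name and data layout are our own, with r_i = 0 as in Taillard's instances:

```python
import random

def make_instance(n_jobs=20, n_machines=15, h=1.3, seed=0):
    """Taillard-style static instance: each job visits every machine
    exactly once in random order; processing times ~ U{1, 99}; due dates
    set with Baker's rule d_i = r_i + h * (total processing time), r_i = 0."""
    rng = random.Random(seed)
    jobs, due = [], []
    for _ in range(n_jobs):
        machines = rng.sample(range(n_machines), n_machines)  # no revisits
        ops = [(g, rng.randint(1, 99)) for g in machines]
        jobs.append(ops)
        due.append(h * sum(p for _, p in ops))
    return jobs, due
```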
A summary of all parameters used to generate the training and test instances in the proposed MO-GP-HH framework is presented in Table 3.

Terminal set
The performance of a schedule might be influenced by a large number of operational parameters. Since the search space grows with an increasing number of terminals, their selection is crucial to ensure the efficiency of the proposed interactive MO-GP-HH framework. Moreover, the set of terminals should be kept sufficiently small in order to generate short and meaningful dispatching rules. Therefore, the most prominent and promising parameters for evolving effective dispatching rules from the GP-HH literature (see, e.g., Branke et al., 2016) are chosen for the following experiment. The abbreviation and a brief description of each terminal are shown in Table 4. In addition, the terminals are categorized into job parameters, attributes that concern the work center, and global attributes.

Function set
As is common practice in the application of GP-HH methods, the function set of our experiment consists of the four basic arithmetic operators: addition, subtraction, multiplication, and division. Please note that the division function works as a protected division, which means that if the denominator is equal to zero, the value 1 will be returned. Otherwise, the result of the division of the two input values represents the output of the function. Furthermore, other mathematical operators such as "MIN" and "MAX" and logical operators such as "if-then-else (IFTE)", which have proven to be viable functions when dealing with more complex performance measures such as tardiness, are also included (Nguyen et al., 2017). Arity is set to 2 for all functions except for the "IFTE" function, a ternary operator that requires three arguments, where the first argument represents the condition: if it is greater than or equal to zero, the function returns the second argument, and the third argument otherwise. An overview of the set of functions is given in Table 5.
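The protected division and the "IFTE" operator can be implemented directly as described (function names are ours):

```python
def div(a, b):
    """Protected division: returns 1 when the denominator is zero,
    so evolved rules never raise a division-by-zero error."""
    return a / b if b != 0 else 1.0

def ifte(cond, then_val, else_val):
    """Ternary operator: second argument if cond >= 0, third otherwise."""
    return then_val if cond >= 0 else else_val
```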

Fitness function
Typical fitness functions in GP-HH scheduling methods include the performance measure to be optimized (for SOO, see, e.g., Dimopoulos and Zalzala (2001)), a scalarizing function of multiple performance measures (for a priori MOO, see, e.g., Tay and Ho (2008)), or multi-objective performance measures such as the hyper-volume ratio (for a posteriori MOO, see, e.g., Zhou and Yang (2019)). Since the proposed interactive MO-GP-HH framework is based on the concepts of the reference point method, the ASF (4) is used to calculate the fitness of a schedule. Therefore, the fitness of a schedule represents the maximum weighted deviation of the performance measures from a selected reference point. In general, the lower a schedule's value for the ASF, the better its fitness.

Parameter setting
The focus of our study is on validating the functionality of our developed interactive MO-GP-HH procedure to generate an effective dispatching rule that best meets the DM's preferences rather than tuning its performance. Therefore, we propose selecting the most commonly used settings from the GP-HH scheduling literature (e.g., Zhou and Yang (2019); Pickardt et al. (2013); Dimopoulos and Zalzala (2001)).
However, experiments could be carried out to tune the efficiency and effectiveness by optimizing the set of terminals and functions as well as the GP parameters. The GP parameters used in our experiment are shown in Table 6.

Experimental results
The proposed interactive MO-GP-HH framework was implemented in Python, making use of the EA library "DEAP" (Fortin et al., 2012) to design the GP algorithm for the GP-HH-based dispatching rule generator and the web app library "Streamlit" (Teixeira et al., 2018) to create a suitable interface for information exchange with the DM throughout the interactive GP-HH procedure. The experiment was carried out on a standard laptop with an AMD Ryzen 5 3500U 2.10 GHz CPU and 8 GB RAM. The remainder of this section follows the three main steps of the MO-GP-HH procedure (see Fig. 4) to evolve a preference-based dispatching rule on the training set of problems introduced in the previous section. The final dispatching rule selected by the DM is then tested on an independent set of problems in order to investigate whether the selected rule is able to reproduce a similar compromise between a schedule's performance measures on unseen instances. Finally, the results are discussed and compared with other benchmark dispatching rules from the literature.

Initialization phase
In the first step of the initialization phase, an approximation of the ideal and nadir points needs to be calculated. For that purpose, we use the GP-HH-based dispatching rule generator. Instead of evaluating the fitness based on the ASF, we propose directly evaluating a schedule's fitness by each of the performance measures individually. The program automatically starts six individual GP-HH runs to search, for each performance measure, for one dispatching rule that minimizes and another that maximizes the corresponding performance measure. The same GP parameters (see Sect. 6) are used for each run. The results aim to express a good approximation of the ideal and nadir points. Please note that these points do not represent an ideal or nadir solution to the actual scheduling problem, but rather an approximation of the best- and worst-performing dispatching rules evolved by the GP for a certain performance measure. Since the developed MO-GP-HH framework operates exclusively in the space of a dispatching rule's components, it is obvious that the lower and upper bounds are portrayed in the same space. The solutions of the approximated ideal and nadir points can be seen in Fig. 5.
To generate an initial set of solutions that represent a wide range of the Pareto front, a uniformly distributed set of reference points is placed on a hyperplane between the approximated ideal and nadir points. This phase aims to provide an informative overview of different alternatives to familiarize the DM with possible trade-offs and guide the DM's search toward promising regions. Moreover, a selection of settings can already focus the search on a preferred area in this step. On the one hand, the DM can determine the position of the hyperplane between the ideal and the nadir point, with a value of 0.5 suggested by default. On the other hand, based on the ranges of the performance measures presented to the DM, undesirable areas can be excluded when searching for possible solutions. In our experiment, the DM decided to place the hyperplane slightly closer to the ideal point by choosing 0.4. Moreover, the DM preferred not to limit the search space in the first step and therefore to search the whole objective space for potential solutions. Figure 5 shows the results from the first step of the initialization phase as well as the selection of the fictional DM for the generation of the first sample of potential solutions.
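The placement of the reference points can be sketched as follows. This is a minimal Python sketch; the simplex-lattice construction, the `divisions` parameter, and the exact mapping of the weights onto the hyperplane at relative position `t` are our assumptions, as the paper does not spell out these details:

```python
from itertools import product

def simplex_lattice(n_obj, divisions):
    """Uniformly distributed weight vectors on the unit simplex (Das-Dennis style)."""
    return [[c / divisions for c in combo]
            for combo in product(range(divisions + 1), repeat=n_obj)
            if sum(combo) == divisions]

def reference_points(ideal, nadir, t=0.4, divisions=4):
    """Map each simplex weight vector into the objective space so that the
    normalized coordinates of a point average to t (t=0: ideal, t=1: nadir),
    i.e. the points lie on a hyperplane at relative position t."""
    q = len(ideal)
    return [[zi + q * t * wk * (zn - zi)
             for wk, zi, zn in zip(w, ideal, nadir)]
            for w in simplex_lattice(q, divisions)]
```

With three objectives and four divisions this yields 15 uniformly spread reference points; choosing `t=0.4` shifts the whole hyperplane slightly toward the ideal point, as the fictional DM did.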
After determining the settings, the program automatically evolves a dispatching rule for each of the reference points by minimizing the distance with respect to the ASF. The weighting vectors are assumed not to favor any of the performance measures. Therefore, a normalization based on the approximated ideal and nadir points is proposed: w_k = 1 / (f_k^nad − f_k^ideal), where f_k^nad represents the approximated nadir and f_k^ideal the approximated ideal value for objective k = 1, …, q. The obtained solutions are then checked for their efficiency. Since we are using an EA to evolve a good solution for the problem at hand, we cannot guarantee that this solution represents the exact optimum. Some of the obtained solutions might be dominated by others. Therefore, a simple comparison is carried out to identify the dominated solutions and discard them from the set of potential solutions. Please note that choosing a larger number of generations or individuals in the population could reduce the likelihood of dominated solutions appearing. Finally, the relative performance of each of the remaining solutions with respect to the approximated ideal and nadir points is presented to the DM in the form of a bar chart (see Fig. 6). The DM can gain first impressions of possible trade-offs and select promising solutions for further exploration.
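The scalarization and the subsequent dominance check can be illustrated with the following sketch, assuming all performance measures are minimized; note that augmented ASF variants with an additional sum term also exist, and which variant the framework uses is not stated here:

```python
def asf(objs, ref, ideal, nadir):
    """Achievement scalarizing function with the proposed normalization
    w_k = 1 / (f_k^nad - f_k^ideal), for minimized objectives."""
    return max((f - z) / (zn - zi)
               for f, z, zi, zn in zip(objs, ref, ideal, nadir))

def dominates(a, b):
    """True if a Pareto-dominates b (minimization): no worse everywhere,
    strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions):
    """Discard every solution dominated by another member of the set,
    mirroring the simple pairwise comparison described above."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]
```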

Exploration phase
In this phase of the interactive procedure, the DM can select one of the previously presented solutions and can either limit the solution space by adding constraints in the form of aspiration levels or search the surroundings of the selected solution in order to find more suitable dispatching rules. Several iterations were carried out by the fictional DM, and a selection of two of them, one for each option, will be illustrated in the following.

Adding aspiration level to avoid undesirable regions of the objective space
The fictional DM decided to add an aspiration level to solution #4, since it performs very well for the total tardiness (6528.33 or 91.78%) and total waiting time (25338.82 or 91.81%) but does not fully satisfy the DM's expectations for the makespan (1714.21 or 72.73%). Therefore, a maximum allowed value of 1680 or 78.12% is defined as the aspiration level for the makespan. Executing the program starts the developed GP-HH-based dispatching rule generator. The parameter settings are adopted unchanged from the previous step, and the same reference point is used as the one that determined the solution selected by the DM. Furthermore, the feasible objective space is limited to candidate solutions that achieve a makespan of at most 1680, as defined by the DM. Candidates that exceed the maximum allowed value for the makespan are therefore penalized during the GP run: the fitness of these solutions is worsened by a relatively high value, so that the probability of selecting them for subsequent generations is significantly reduced. This automatically steers the search toward solutions whose performance measures lie within the ranges desired by the DM. After the GP-HH-based dispatching rule generator finishes the pre-defined 200 generations, the final solution is added to the set of potential solutions, and the performance of all solutions is again compared in order to identify and discard dominated solutions. In this example, the newly identified dispatching rule produces schedules on the set of training instances that are slightly better than those of solution #5 in all dimensions. Therefore, solution #5 is removed from the set of potential solutions and the bar chart overview is updated accordingly. The final dispatching rule found by the GP-HH-based dispatching rule generator is then presented to the DM.
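The penalization of aspiration-level violations can be sketched as follows; the penalty magnitude and the dictionary-based encoding of the aspiration levels are our assumptions for illustration:

```python
def asf(objs, ref, ideal, nadir):
    """ASF with nadir-ideal normalization (minimization)."""
    return max((f - z) / (zn - zi)
               for f, z, zi, zn in zip(objs, ref, ideal, nadir))

def penalized_fitness(objs, ref, ideal, nadir, aspirations, penalty=1e6):
    """ASF fitness worsened by a large constant for every violated aspiration
    level, so that violating candidates are rarely selected for subsequent
    generations. `aspirations` maps an objective index to its maximum allowed
    value, e.g. {0: 1680} for the makespan constraint in the example."""
    violations = sum(1 for k, limit in aspirations.items() if objs[k] > limit)
    return asf(objs, ref, ideal, nadir) + penalty * violations
```

Because the GP selection operators compare fitness values, any penalty that dwarfs the unpenalized ASF range has the intended effect of steering the search away from infeasible candidates.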
Moreover, to compare its performance with that of the originally selected dispatching rule, a visualization in the form of a radar chart is provided. The DM can then decide whether they want to discard one of the two compared solutions or keep both of them for possible further explorations. In our example, a new solution was found that respects the DM's aspiration considering the makespan. However, the other two performance measures, the total tardiness and the total waiting time, have worsened slightly. The DM decided to keep both solutions for further exploration. The results are presented in Fig. 7.

Intensification for exploring promising areas
For this option, the DM selected solution #15 from the remaining set of potential solutions. As the performance of this solution shows good results in all dimensions, the surrounding area reveals itself to be a promising region worth exploring further. While the reference point, which also represented the basis for determining the selected solution, remains the same, neighboring solutions are searched for by varying the weighting vector of the ASF. The DM can select how extensively the exploration should be carried out by choosing one of the following options:
1. Fast search: A total of three runs are executed; in each run, one of the three performance measures is favored by increasing its weighting by 1/3 of its original normalized value and by slightly decreasing the weightings of the two remaining performance measures by 1/6 of their original normalized values.
2. Extensive search: A total of six runs are executed; in addition to the three runs of the fast search, another three runs are carried out, in each of which two of the performance measures are simultaneously favored by increasing their weightings by 1/6 of their original normalized values and decreasing the weighting of the remaining performance measure by 1/3 of its original normalized value.
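The construction of the perturbed weighting vectors for the fast and extensive search can be sketched as follows (a Python sketch of the 1/3 and 1/6 adjustments described above; the function name is ours):

```python
def weight_variants(base, extensive=False):
    """Perturbed ASF weighting vectors for the intensification step.
    Fast search: each of the three runs favors one measure (+1/3 of its
    normalized value) and slightly reduces the other two (-1/6 each).
    The extensive search adds three runs that favor a pair of measures
    (+1/6 each) while reducing the remaining one (-1/3)."""
    q = len(base)
    variants = [[w + w / 3 if j == k else w - w / 6
                 for j, w in enumerate(base)] for k in range(q)]
    if extensive:
        variants += [[w - w / 3 if j == k else w + w / 6
                      for j, w in enumerate(base)] for k in range(q)]
    return variants
```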
Selecting the second option, the program executes six runs with six different weighting vectors. The results are shown in Fig. 8. In total, six new solutions (solutions #26 to #31) could be identified in the neighborhood of solution #15. One can clearly see that in each run, the relative performance of the solution obtained by the GP-HH-based dispatching rule generator improved slightly for the performance measure(s) favored by the changed weightings in the ASF. Moreover, trade-offs between the objectives can be identified. For instance, to achieve a better makespan in the first run (solution #26), the total tardiness deteriorates due to its competitive behavior with the makespan. Furthermore, the newly found solutions are compared with each other in order to identify dominated solutions among them. In our example, solution #29 is dominated by solution #27 and is therefore automatically removed from the set of potential solutions for the next iteration of the exploration phase.

Decision phase
The aim of the last phase of the interactive MO-GP-HH procedure is to identify the solution that best meets the DM's preferences. For this purpose, we propose a pairwise comparison of the remaining potential solutions. This procedure starts by randomly choosing two solutions and asking the DM to select the preferred one. To facilitate the decision, the comparison is visualized in the form of a radar chart (see Fig. 9). If the DM can clearly state which solution they prefer, the "losing" solution is removed from the set of potential solutions. Otherwise, the DM can decide to skip the decision and continue with the next iteration, where again two solutions are randomly selected from the set of remaining solutions. The procedure is repeated until a single solution is left. If the DM is satisfied with the performance of the final solution, the interactive MO-GP-HH procedure terminates. Otherwise, the DM can return to the exploration phase and search for new solutions. The final solution of our experiment is shown as a parse tree together with its prefix notation in the Appendix (see Fig. 12).
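The elimination loop of the decision phase can be sketched as follows; since the real procedure is interactive, the `choose` callback stands in for the DM's judgment and is purely illustrative:

```python
import random

def pairwise_decision(solutions, choose, seed=0):
    """Pairwise-comparison decision phase: repeatedly draw two solutions at
    random and ask `choose(a, b)` (a stand-in for the DM) for the preferred
    one; returning None skips the pair, i.e. both solutions are kept for a
    later redraw. Stops when a single solution remains."""
    rng = random.Random(seed)
    pool = list(solutions)
    while len(pool) > 1:
        a, b = rng.sample(pool, 2)
        winner = choose(a, b)
        if winner is a:
            pool.remove(b)
        elif winner is b:
            pool.remove(a)
        # winner is None: the DM skipped this pair; redraw next iteration
    return pool[0]
```

Note that if the DM skipped every pair the loop would not terminate; in the actual DSS the DM eventually decides or returns to the exploration phase.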

Test performance and comparison
In this subsection, the performance of the final dispatching rule evolved by the developed interactive MO-GP-HH framework is tested and compared with other well-known dispatching rules on an independent set of instances. This evaluation aims to confirm that the evolved dispatching rule is able to achieve a similar compromise between performance measures for an unseen set of job shop instances. Moreover, a comparison with three other dispatching rules from the literature is carried out to illustrate the differences in trade-offs among performance measures. An overview of the dispatching rules for the test evaluation is provided in Table 7. Since these dispatching rules are usually designed to optimize a single objective or a fixed combination of objectives, they obviously cannot meet the fictional DM's preferences in our example. Therefore, the schedules generated by these rules cannot compete directly with respect to the individual performance measures. However, the comparison can be used to illustrate that the evolved dispatching rule produces a compromise solution lying between the performance measures of the established dispatching rules with respect to the fictional DM's preferences. To guarantee a fair comparison of the dispatching rules, we propose using their relative performance on the training set and test set. Therefore, the ideal and nadir points of the test set are approximated by using the GP-HH-based dispatching rule generator. The mean of the absolute values of the three performance measures over the 10 test instances is computed by running the simulation-based fitness evaluation module. Normalizing these values results in the relative performance that forms the basis for our comparison. The final results are presented in Fig. 10, where the relative performance of each of the tested dispatching rules is shown for the training set on the left-hand side and for the test set on the right-hand side of the figure.
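The normalization used for the comparison can be sketched as follows, assuming all three performance measures are minimized (the helper names are ours):

```python
def relative_performance(value, ideal, nadir):
    """Relative performance of a minimized measure: 1.0 at the approximated
    ideal point, 0.0 at the approximated nadir point."""
    return (nadir - value) / (nadir - ideal)

def mean_relative_performance(values, ideal, nadir):
    """Average the absolute values over the test instances first, then
    normalize, as proposed for the comparison above."""
    mean = sum(values) / len(values)
    return relative_performance(mean, ideal, nadir)
```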
As expected, EDD performs well for the total tardiness and total waiting time on both sets of instances, while its relative performance for the makespan is the worst in our comparison. Moreover, MDD shows a slight improvement in all performance measures compared with EDD, although the relative performance of the makespan remains somewhat low. The best result for the makespan was achieved by MTWR. However, since it does not consider the jobs' due dates, this rule performs poorly for the total tardiness. Finally, solution #37, which resulted from our previous experiment and is supposed to represent the DM's preferences, produces schedules with the aspired compromise between the makespan, the total tardiness, and the total waiting time. The performance on the set of test instances is very similar to that on the training set. Small differences can be explained by the relatively small set of training instances used for evolving the preference-based dispatching rule. Another factor that might cause deviations between the two sets of instances is the approximation of the upper and lower bounds of each performance measure used as the basis for calculating the relative performance. Finally, it can be stated that the proposed interactive MO-GP-HH framework successfully guided the fictional DM to a dispatching rule that enables the production of schedules meeting their expectations. Furthermore, the results of the tests and the comparison underline its usefulness and relevance, especially for real industrial applications, in which a compromise between competing goals is often desired.

Evaluation of the terminal set
As stated in Sect. 6.2, the selection of appropriate terminals is a crucial step in the design of the MO-GP-HH framework. While too small a number of terminals might decrease the quality of the evolved rules, a large terminal set usually has a negative impact on the framework's efficiency. In order to assess the adequacy of the terminal set selected for the experiment conducted, we propose counting the occurrence of each terminal in the final individual of each iteration of the experiment. Figure 11 shows the total occurrence of each terminal in the 37 evolved rules in descending order. PT can be identified as the most significant terminal, as it occurs a total of 211 times, almost twice as often as the second most frequently occurring candidate. Another interesting finding is that SPT is of lesser importance, as it appears only 46 times, almost half as often as the second least frequently occurring terminal and less than a quarter as often as the most commonly employed one. The total number of occurrences of the remaining terminals lies between 79 and 112.
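Counting terminal occurrences across the prefix notations of the evolved rules can be sketched as follows; the two example rules and their terminal names are hypothetical stand-ins for the actual terminal set of Sect. 6:

```python
import re
from collections import Counter

def count_terminals(rules, terminals):
    """Count how often each terminal occurs across the prefix notations of
    the evolved rules. Matching whole identifier tokens keeps names such as
    PT and SPT apart, which a plain substring search would conflate."""
    counts = Counter()
    for rule in rules:
        for token in re.findall(r"[A-Za-z_]\w*", rule):
            if token in terminals:
                counts[token] += 1
    return counts

# hypothetical prefix notations for illustration only
rules = ["add(PT, mul(DD, sub(RT, PT)))", "max(PT, add(SPT, DD))"]
```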
In order to determine whether a reduction of the present set would enhance the performance of the proposed MO-GP-HH framework, we exclude the least frequently applied terminal, SPT, from the set of terminals. Subsequently, the entire experiment from the previous section is repeated. The 37 solutions obtained with the reduced set of terminals (reduced set) are then compared with the 37 solutions identified by the framework using the original set of terminals (full set). To compare the quality of the two sets, we propose the application of the hyper-volume (see, e.g., Zitzler, Thiele, Laumanns, Fonseca, & Da Fonseca, 2003), as it represents one of the most frequently used performance measures in MOO (see, e.g., Riquelme, Von Lücken, & Baran, 2015; Zitzler et al., 2003). To calculate the hyper-volume, we used the approximated nadir point obtained in the first step of the proposed MO-GP-HH framework. The results of the evaluation are shown in Table 8, indicating that the reduction of the terminal set leads to a slight improvement of the solution set with respect to the hyper-volume.
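To make the quality measure concrete, a simple Monte Carlo estimate of the hyper-volume can be sketched as follows; this is an illustrative approximation only, and exact algorithms (e.g. Zitzler et al., 2003) should be used for the reported results:

```python
import random

def hypervolume_mc(front, nadir, n_samples=200_000, seed=42):
    """Monte Carlo estimate of the hypervolume dominated by `front`
    (minimization) relative to the approximated nadir reference point:
    sample the bounding box and count points weakly dominated by some
    member of the front."""
    rng = random.Random(seed)
    q = len(nadir)
    ideal = [min(p[k] for p in front) for k in range(q)]
    box = 1.0
    for zi, zn in zip(ideal, nadir):
        box *= zn - zi
    hits = 0
    for _ in range(n_samples):
        x = [rng.uniform(zi, zn) for zi, zn in zip(ideal, nadir)]
        if any(all(p[k] <= x[k] for k in range(q)) for p in front):
            hits += 1
    return box * hits / n_samples
```

A larger hyper-volume indicates a solution set that covers more of the objective space up to the nadir point, which is how the full and reduced terminal sets are ranked in Table 8.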

Conclusions
In this paper, we introduce an interactive MO-GP-HH framework to solve MO job shop scheduling problems. We propose the application of the reference point method to iteratively search for dispatching rules that best meet a DM's preferences. In interaction with the DM, information about the optimization problem, its limitations, and potential trade-offs between competing objectives, as well as the DM's preference information, is exchanged. To facilitate this interaction, the developed DSS was implemented in a web-based application, where it was tested on classical job shop scheduling problems based on Taillard's benchmark instances. In an illustrative example, a fictional DM was guided through the entire procedure. After presenting an initial set of well-distributed non-dominated solutions, the DM had the opportunity to explore new solutions on their own. A subsequent pairwise comparison of the performance of all discovered dispatching rules led to a final compromise solution. The evolved rule was tested on an independent set of instances, where a similar compromise solution was achieved for unseen problems in the same environment. Moreover, a comparison with other benchmark dispatching rules from the literature underlined the usefulness and relevance of our approach, especially for applications in real industrial settings, where a compromise between different objectives is often desired. The component that consumes by far the largest share of the total computational time is the simulation-based fitness evaluation. The proposed interactive procedure requires a large number of simulations to find a representative set of potential solutions. In our example, a total of 370,000 individual simulation runs were carried out (37 solutions were identified; for each of them, a population of 200 individuals was evolved over 50 generations).
A promising way to reduce the number of required simulation runs and therefore significantly reduce the total computational time is by incorporating the interaction with the DM directly in a multi-objective evolutionary algorithm such as the NSGA-II. Successively updating the preference information after a certain number of iterations helps to focus the search on the relevant part of the Pareto front. Such methods have already demonstrated a significant reduction in the number of generations required to achieve an acceptable solution. In particular with regard to dynamic job shop scheduling problems, where typically a large number of jobs need to be simulated to robustly evaluate the performance of a dispatching rule, reducing the number of simulation runs is of utmost importance. The implementation of an interactive multi-objective evolutionary algorithm in the proposed framework and the resulting extended application to more realistic dynamic environments will therefore be further investigated by the authors.