Enhancing Constraint Programming via Supervised Learning for Job Shop Scheduling

Constraint programming (CP) is a powerful technique for solving constraint satisfaction and optimization problems. In CP solvers, the variable ordering strategy used to select which variable to explore first in the solving process has a significant impact on solver effectiveness. To address this issue, we propose a novel variable ordering strategy based on supervised learning, which we evaluate in the context of job shop scheduling problems. Our learning-based methods predict the optimal solution of a problem instance and use the predicted solution to order variables for CP solvers. Unlike traditional variable ordering methods, our methods can learn from the characteristics of each problem instance and customize the variable ordering strategy accordingly, leading to improved solver performance. Our experiments demonstrate that training machine learning models is highly efficient and can achieve high accuracy. Furthermore, our learned variable ordering methods perform competitively when compared to four existing methods. Finally, we demonstrate that hybridising the machine learning-based variable ordering methods with traditional domain-based methods is beneficial.


I. INTRODUCTION
Constraint programming (CP) is a powerful technique for solving constraint satisfaction and optimization problems [1]. In a CP solver, variable ordering determines the order in which variables are branched on during the search process and has a substantial impact on the efficiency of problem solving [2]. A good variable ordering strategy should help a CP solver generate a relatively small search tree, hence solving problems more efficiently. However, there is a lack of theoretical understanding of what makes a good variable ordering, and empirically finding a good variable ordering is hard [3].
Existing variable ordering heuristics are mainly designed based on the intuition of human experts [4]-[9]. Recently, a deep reinforcement learning approach based on graph neural networks was introduced to automatically learn variable ordering heuristics for solving constraint satisfaction problems [10]. In [11], a multi-armed bandit approach was proposed to select a good variable ordering heuristic from a set of existing ones for solving a given instance. These methods were mainly designed for constraint satisfaction problems, which focus on finding a feasible solution or proving that none exists. In a more recent study [12], a genetic programming approach was developed to evolve variable ordering heuristics for solving a constraint optimization problem. This evolutionary learning approach aimed to automatically learn a variable ordering for CP that minimizes the number of branches required to optimally solve a set of training problem instances.
In this study, we propose a supervised learning approach to automatically learn a variable ordering heuristic for solving job shop scheduling (JSS) problems [13]. JSS is a class of NP-hard combinatorial optimization problems that often arise in manufacturing [13]. In a JSS problem, there are a number of jobs, each consisting of some operations to be processed on a number of machines. The aim of JSS is to schedule operations on machines such that constraints are satisfied and an objective such as makespan or total weighted tardiness is optimised (see Section II for a formal definition). JSS has attracted much attention in the research community [12], [14]-[20].
We propose that ordering variables based on the optimal solution of JSS is a reasonable approach for a CP solver, as it potentially leads to a high-quality solution in an early stage of the search and thus helps to prune the search space more aggressively. This motivates us to develop a supervised learning model to predict the optimal solution for JSS and use the predicted solution to order variables, in order to solve the problem more efficiently. We show that optimal solution prediction for JSS can be modeled as either a regression or classification task. The regression task aims to predict the optimal start times of operations based on the features extracted from JSS, while the classification task predicts, for each pair of operations in a JSS instance, which operation is scheduled earlier in the optimal solution.
We train the supervised learning models on solved JSS instances with known optimal solutions using several machine learning (ML) algorithms, including support vector machines, multi-layer perceptrons, and genetic programming. We show that our supervised learning models can achieve a high accuracy in predicting optimal solutions for JSS, and they are very efficient to train. When embedded in a CP solver to solve JSS instances, our learned variable ordering methods compare favourably to three traditional methods and one state-of-the-art learning-based approach. Finally, we show that hybridising our ML-based variable ordering methods with a traditional domain-based method leads to overall better performance. The contributions of this paper are summarised as follows:
1) developing two novel supervised learning models to predict optimal solutions for JSS with good accuracy;
2) proposing a novel automatic variable ordering method based on optimal solution prediction to enhance CP;
3) evaluating the efficacy of the proposed variable ordering methods in an extensive experimental study and showing that our methods are clearly competitive with the state of the art;
4) demonstrating the benefit of hybridising ML-based and domain-based variable ordering methods.

II. BACKGROUND AND RELATED WORK
A. Job shop scheduling

The JSS problem requires a number of jobs to be processed on several machines such that an objective function is optimised. In this context, a job consists of a number of operations, and each operation must be executed on a specific machine. In JSS, the sequences of machines on which jobs are executed (their routes) are fixed and can differ between jobs [13]. In this study, the focus is on the static JSS problem, where a schedule for the shop (the working/manufacturing environment) must be determined, with the shop consisting of a set of M machines and N jobs. A job j has a fixed route through the machines, and its processing times differ at each machine it visits. In the notation of [21], this scheduling problem is J|r_j|Obj, where Obj is one of C_max, T_max or Σ_j w_j T_j. The following notation is used to define the JSS. A set of jobs J = {1, ..., j, ..., N} is given, and job j has associated with it a release time r_j, a due date d_j, a weight w_j and operations {o_{j1}, ..., o_{j,n_j}}, where n_j is the number of operations of job j. The processing times of the operations of job j are time_j = (p_{j1}, ..., p_{j,n_j}), where p_{ji} is the processing time of job j's i-th operation. Moreover, the route for job j is route_j = (m_{j1}, ..., m_{j,n_j}), which specifies the sequence of machines that job j will visit, where m_{ji} is the machine that will process job j's i-th operation.
The following four sets of variables are also defined:
• s_{ji}, the start time of job j's i-th operation;
• e_{ji}, the end time of job j's i-th operation;
• C_j, the completion time of job j; and
• T_j, the tardiness of job j, i.e., T_j = max(C_j − d_j, 0).
Three different objectives are considered for the JSS problem in this study. These objectives are typically seen in industrial problems and have been among the most common performance measures in previous studies [22].

B. Constraint programming for JSS
In a CP solver, a problem instance is typically represented by a constraint network NW(X, D, C), where X is the set of decision variables, D is the set of domains (i.e., the possible values each variable can take) of the variables, and C is the set of constraints. In the context of JSS, the variables are the start and end times of operations and the completion time and tardiness of jobs defined before. The set of constraints is

∀j ∈ J : s_{j1} ≥ r_j   (1)
∀j ∈ J, i ∈ {1, ..., n_j} : e_{ji} = s_{ji} + p_{ji}   (2)
∀j ∈ J : C_j = e_{j,n_j}   (3)
∀j ∈ J : T_j = max(C_j − d_j, 0)   (4)

Constraint (1) ensures the release time of each job is satisfied. Constraint (2) links the start and end time variables. Constraint (3) sets C_j to be the completion time of the last operation of job j, and Constraint (4) computes the tardiness T_j of job j. Moreover, there must be no overlap between the operations on the same machine, which is implemented using the following disjunctive constraints:

∀j, k ∈ J (j ≠ k), ∀u, v with m_{ju} = m_{kv} : (s_{ju} ≥ e_{kv}) ∨ (s_{kv} ≥ e_{ju})

In other words, if two operations u and v of two different jobs need to execute on the same machine (i.e., m_{ju} = m_{kv}), one operation must not start until the other has ended, or vice versa. There also exist precedences between the operations of a job:

∀j ∈ J, i ∈ {1, ..., n_j − 1} : s_{j,i+1} ≥ e_{ji}

Finally, each of the three objective functions is implemented in the model as follows: 1) Makespan: let C_max be the largest completion time across all jobs. In addition to the above model, the following objective and constraint are included:

minimise C_max, subject to ∀j ∈ J : C_max ≥ C_j.

Note that due dates are ignored for the makespan objective. 2) Maximum tardiness: let T_max represent the maximum tardiness across all jobs. The following objective and constraint are added to the above model:

minimise T_max, subject to ∀j ∈ J : T_max ≥ T_j.

3) Total weighted tardiness (TWT): in addition to the above model, the following objective (minimise the cumulative weighted tardiness across all jobs) is included:

minimise Σ_{j ∈ J} w_j T_j.

In solving the model, new constraints are typically identified and incorporated into the system.
The solver uses these new constraints to automatically update the variable domains and potentially infer further constraints. Specifically, when the solver imposes a constraint, it checks the domains of all variables for consistency: if the domains are consistent, propagation succeeds; otherwise, the solver is in a state of failure. If imposing a constraint or a set of constraints succeeds, the domains of all variables are updated and any inferred constraints are imposed.
In CP terminology, the assignment of a variable to a value is called a labelling step. If such an assignment is made, it is tested for consistency with the variable's current domain. If the assignment is consistent, it is accepted, which leads to propagation of new constraints. Alternatively, the assignment may lead to an inconsistency, in which case a failure is triggered and the labelling step is rejected [23]. A key aspect of the search component in CP is the variable ordering strategy that guides the selection of variables for labelling. Devising an effective variable ordering strategy can lead to substantial gains, identifying high-quality regions of the search space and better-quality solutions more quickly. The focus of our study is to achieve an improved variable ordering strategy in CP via supervised learning, and we demonstrate that this can be effectively achieved using JSS as a case study.

C. Methods for tackling job shop scheduling
Since the JSS problem is NP-hard, finding optimal or high-quality solutions with conventional optimization algorithms remains a challenge [13]. Hence, a number of optimization approaches have been proposed to tackle JSS. Commercial optimization solvers, such as CPLEX, have made significant advances in recent years, though solving large JSS instances is not straightforward and scalability remains a challenge. Ku and Beck [14] investigated the performance of four different mixed integer programming models for the JSS, and found that good results for moderate-sized problems can be achieved with a disjunctive model. Additionally, meta-heuristics have been an active area of research in JSS, and these methods are often able to find near-optimal solutions [15]-[17]. However, they suffer from a "combinatorial explosion": for large instances, the solution space grows exponentially, making it computationally challenging to navigate the search space (locally or globally) efficiently to find high-quality solutions. Furthermore, due to the inherent nature of meta-heuristics, problem-domain knowledge (e.g. solution neighbourhood structure, schedule construction heuristics, etc.) is necessary to design an efficient application-specific algorithm. This is a difficult process, often requiring a significant amount of trial and error.
In recent years, CP has proved an effective method for solving satisfaction and optimization problems [23]. There are a number of problem domains where CP is proven to be effective, and scheduling is one of these [20], [24]-[26]. Moreover, CP has also been shown to be effective on JSS. The studies by Beck et al. [18] and Watson and Beck [19] investigated a CP and tabu search hybrid. Their results showed that this hybrid can generate solutions with low variability and is competitive with local search approaches. Grimes and Hebrard [20] investigated JSS with setup times, and showed that enhancing the search component of CP can be very effective for this problem.

D. ML for combinatorial optimization
Leveraging ML for combinatorial optimization (MLCO) has attracted significant attention in recent years [27]-[30]. In their survey, Bengio et al. [28] classified MLCO studies into three categories depending on the role of ML: (1) end-to-end learning, in which ML is used to directly produce solutions for CO problem instances; (2) learning to configure algorithms, in which ML is used to efficiently parameterise optimization algorithms (in a broad sense); and (3) ML alongside optimization algorithms, in which ML is used to assist low-level decisions within optimization algorithms. In the operations research and computational intelligence communities, MLCO has been investigated extensively under the name of hyper-heuristics [31]-[35]. Different from traditional search algorithms that attempt to find good solutions in the solution search space, hyper-heuristics emphasise exploring the heuristic (or algorithm) search space to find a good way to solve problem instances. Hyper-heuristics can be used for heuristic generation or heuristic selection. In heuristic generation, hyper-heuristics use a search algorithm (e.g. GP) to generate/discover effective heuristics for a problem subset. Generated heuristics can be used to construct solutions or to refine/improve a solution (similar to categories 1 and 2 of Bengio et al. [28]). When used for heuristic selection, the goal of hyper-heuristics is to adaptively select low-level heuristics in optimization algorithms to solve problem instances (similar to category 3 in [28]). To avoid confusion, we will use Bengio et al. [28]'s categorisation to discuss related work.
Many earlier studies in the MLCO literature examine methods for end-to-end learning. For example, popular ML algorithms such as decision trees [36], logistic regression [37], artificial neural networks [38], and deep reinforcement learning [39] have been applied to learn constructive heuristics for production scheduling problems, and the learned heuristics were shown to outperform existing constructive heuristics in the literature. GP is also a popular learning algorithm in this category, and many program representations and advanced search techniques have been proposed to improve the quality of the learned heuristics [40,41]. There are several reasons for the popularity of end-to-end learning. The main reason is that the computational effort required for training is low, and the learned models/heuristics can generate good solutions very efficiently. The disadvantage of end-to-end learning is that there is no guarantee that the solutions produced by these methods are optimal or near-optimal. Empirical results [36,41] show that there is a gap in solution quality between learned constructive heuristics and optimization algorithms, which is less attractive for problems in which optimal solutions are desirable. Moreover, a meta-algorithm or heuristic template is usually required to construct solutions, which restricts the applicability of learned heuristics. Apart from production scheduling, end-to-end learning has also been applied to other CO problems such as NASA's resource-constrained scheduling problem [42], bin packing [43], graph-based problems [44], and routing problems [45]-[48].
In the last decade, there has been growing interest in learning to configure algorithms. For example, Khalil et al. learned to select branching variables within MILP solvers, and Shen et al. [53] developed an ML-based primal heuristic for MILP solvers. Nguyen et al. [12] investigated the use of GP to learn variable selectors in CP, and showed that this method is effective for JSS under three different objectives. Our proposed methods belong to this category because the learned variable selectors configure the search of CP based on characteristics of problem instances.
With the recent boost in computing power, many innovative ML techniques have been developed to enhance optimization algorithms for solving combinatorial optimization problems. Kruber et al. [54] used ML to predict whether decomposition is needed to reduce the time to solve MILP instances, using features including instance characteristics and decomposition statistics. Kruber et al. [55] used ML to predict whether linearisation will be more efficient in solving quadratic programming (QP) problems. Basso et al. [56] used random sampling and ML to learn what constitutes a good decomposition. Morabit et al. [57] developed an ML model to select promising columns for column generation. Shen et al. [58] developed an ML-based pricing heuristic to enhance column generation. Sun et al. [59], Lauri and Dutta [60], and Sun et al. [61] used ML to prune the search space of combinatorial optimization problems. Sun et al. [62] developed ML models to enhance the sampling of a meta-heuristic. While the research in this paper fits into this wider effort to use ML to improve combinatorial optimisation, to the best of our knowledge it is the first time ML has been used to improve the labelling strategy in constraint programming for solving constraint optimization problems.

III. METHODS
We develop two supervised learning models to predict an optimal solution for the JSS problem, which is then used to enhance variable ordering for CP to solve the problem.

A. Feature extraction
A key task in building ML models is feature extraction, which often has a significant impact on model accuracy. To predict the optimal values of the decision variables (i.e., start times of operations) for JSS, we extract a set of features (f) to characterise an operation, listed in Table I. These features are basic attributes that can be extracted easily from the JSS problem. NPO, PT, PTB, PTA and EST are operation-wise features that characterise an operation: its processing time, its downstream and upstream operations, etc. In contrast, TPT, DD, W and RT are job-related features, and WL is a machine-related feature. Although deep learning models such as graph neural networks [63,64] have the potential to automatically extract features, training deep learning models typically requires a large amount of data and computational resources. Hence, we only use manually designed features to build a simple ML model. However, we will show in the experiments that our simple model, trained with little effort, can already bring significant benefits to CP.
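Table I is not reproduced here, but the text suggests plausible definitions for the features. The sketch below is a hypothetical reconstruction for illustration only: the interpretations of PTB, PTA and EST (total processing time before/after the operation in its job, and earliest possible start ignoring machine contention), as well as the function name and argument layout, are our assumptions, not the paper's exact definitions.

```python
def operation_features(job, i, release, due, weight, machine_load):
    """Hypothetical reconstruction of the Table I features for one operation.
    job: list of (machine, processing_time) pairs; i: operation index."""
    times = [p for _, p in job]
    npo = i                        # NPO: number of precedent operations
    pt = times[i]                  # PT: processing time of the operation
    ptb = sum(times[:i])           # PTB: processing time before (assumed)
    pta = sum(times[i + 1:])       # PTA: processing time after (assumed)
    est = release + ptb            # EST: earliest start time (assumed)
    tpt = sum(times)               # TPT: total processing time of the job
    wl = machine_load[job[i][0]]   # WL: workload of the operation's machine
    # Job-related features: DD (due date), W (weight), RT (release time).
    return [npo, pt, ptb, pta, est, tpt, due, weight, release, wl]
```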

B. A regression model
A direct approach to predict optimal solution values of decision variables (i.e., optimal start times of operations) for the JSS problem is to train a regression model based on solved problem instances with known optimal solutions. The regression model takes the features of an operation as inputs, and outputs a real value as the predicted start time for the operation. The training of the regression model is to minimize the error between the predicted and the true optimal start times over a set of training instances.
To construct a training set, we use the CP-SAT solver in OR-Tools to solve a set of small JSS problem instances to optimality and obtain the optimal solutions. As the problem instances are small, they can be solved very efficiently. An operation in a training problem instance is then represented by a set of features (Table I) and labelled with its start time in the optimal solution. We can then construct a training set in which each training example corresponds to an operation in a problem instance. In order to build a more robust regression model, we normalise the training data as follows: the number of precedent operations (NPO) is normalised by the total number of operations in the job; the weight (W) of a job is normalised by the maximum weight over all jobs; and the other seven features and the label are normalised by the maximum TPT (total processing time of the operations of a job) in the problem instance.
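The normalisation scheme described above can be sketched as follows; the dictionary keys and function signature are illustrative choices, not the paper's implementation.

```python
def normalise_example(features, n_ops_in_job, w_max, tpt_max, label=None):
    """Normalise one training example. `features` is a dict keyed by the
    Table I acronyms (key names assumed for illustration); `label` is the
    optimal start time, if available."""
    out = {}
    for name, value in features.items():
        if name == 'NPO':
            out[name] = value / n_ops_in_job  # by operations in the job
        elif name == 'W':
            out[name] = value / w_max         # by maximum job weight
        else:
            out[name] = value / tpt_max       # by maximum TPT in the instance
    norm_label = None if label is None else label / tpt_max
    return out, norm_label
```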
After the training set is obtained, we can apply an existing ML algorithm to the regression task. We will evaluate three different ML algorithms: linear support vector machine (SVM), multi-layer perceptron (MLP) and genetic programming (GP). The linear SVM is very fast to train and, being a linear model, highly interpretable. MLP is a widely-used neural network that is able to learn complex non-linear relations between input and output data. GP is used here as an evolutionary symbolic regressor. Although it is much slower to train than the other algorithms, GP can not only learn complex relations in data but may also offer better interpretability. We use the implementations of SVM and MLP in the Python scikit-learn library [65] and the GP implementation in the gplearn library.
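A minimal training sketch with the scikit-learn estimators named above might look as follows, using synthetic stand-in data rather than real JSS features. `LinearSVR` and `MLPRegressor` are one plausible choice of concrete classes for the linear SVM and MLP regressors (gplearn's `SymbolicRegressor` would slot in similarly); the data and dimensions are ours.

```python
from sklearn.svm import LinearSVR
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data: in the paper, X would hold the normalised
# Table I features of each operation and y its normalised optimal start time.
X = [[i / 10.0, (i % 3) / 3.0] for i in range(12)]
y = [2.0 * a + b for a, b in X]

svr = LinearSVR(max_iter=10000).fit(X, y)  # fast, interpretable linear model
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                   random_state=0).fit(X, y)  # non-linear alternative
pred = svr.predict([[0.5, 0.5]])[0]  # predicted (normalised) start time
```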
Given an unseen JSS instance, we can first compute features for each operation, and then use the trained regression model to predict the optimal start time of each operation.

C. A classification model
An alternative approach to solution prediction is to train a classification model to predict the optimal sequence of operations for a JSS problem instance. The aim of the classification model is to predict, for each pair of operations in a JSS instance, which operation should be scheduled earlier. It is worth noting that this classification model is effectively equivalent to the regression model, in the sense that the two would produce the same variable ordering if both were 100% accurate.
The time complexity of the classification model is quadratic in the number of operations in a problem instance. To reduce the computational time, we introduce a variant of the classification model that only predicts, for each pair of operations on the same machine, which operation should be scheduled earlier. We observed in our preliminary experiments that classifying operations within a machine is easier than classifying operations across different machines. Note that ordering all of the operations within each machine is sufficient to specify a leaf of the search tree. Hence, we will only describe this variant in detail. The training set for the classification model is slightly different from that for the regression model. Here, a training example corresponds to a pair of operations to be processed on the same machine. Consider a pair of operations, o_i and o_j, and their feature vectors, f_i and f_j, normalised as before. The features of the corresponding training example are the difference between f_i and f_j: f_i − f_j. The class label of the training example is −1 if operation o_i is scheduled before o_j in the optimal solution, and 1 otherwise. This then becomes a standard binary classification task, and we will test three different ML algorithms for it: SVM, MLP and GP.
Given an unseen JSS problem instance, we can apply the trained classification model to predict a sequence of operations. Specifically, we perform pairwise comparison between operations to be processed on a machine, and use the trained classification model to classify which operation is expected to be scheduled earlier on the machine. An operation accumulates a score of 1 if it is predicted to be scheduled later than the other operation under comparison. After the pairwise comparison, each operation has an accumulated score, indicating the predicted order of operations to be processed on a machine. The operations are then sorted in ascending order based on their accumulated scores.
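The pairwise scoring procedure above can be sketched as follows; `classify` stands in for the trained model's decision function, and the convention that −1 means the first operation precedes the second follows the labelling described earlier.

```python
from itertools import combinations

def order_operations(ops, features, classify):
    """Order the operations on one machine by pairwise comparison.
    classify(fi - fj) == -1 means operation i is predicted to precede
    operation j; an operation's score counts the comparisons in which
    it is predicted to be the later one."""
    score = {o: 0 for o in ops}
    for a, b in combinations(ops, 2):
        diff = [x - y for x, y in zip(features[a], features[b])]
        if classify(diff) == -1:
            score[b] += 1  # b predicted later than a
        else:
            score[a] += 1  # a predicted later than b
    # Ascending score = predicted earlier position on the machine.
    return sorted(ops, key=lambda o: score[o])
```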

D. Enhancing variable ordering for CP
We explore two different ways of enhancing variable ordering for CP via the supervised learning models: (1) a pure ML-based approach, and (2) a hybrid approach. The first approach directly orders variables (operations) based on the predicted solution (i.e., the predicted start times if using the regression model or the predicted operation sequence if using the classification model). On the other hand, the hybrid approach combines the merits of both ML-based with domain-based variable ordering methods. Specifically, we first apply a domain-based variable ordering method, that orders variables based on their minimum domain value. If there are multiple variables with the same minimum domain value, we further use the ML-based variable ordering as a second criterion to break ties. The motivation behind the hybridisation approach is that the domain-based variable ordering method determines predominantly the efficiency of CP for finding feasible solutions [12], whilst our ML-based variable ordering focuses more on finding high-quality solutions.
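The hybrid strategy amounts to sorting by a two-part key, with the domain-based criterion first and the ML-based rank as tie-breaker. This sketch abstracts the solver's internals behind two caller-supplied functions; the function names are ours.

```python
def hybrid_order(variables, min_domain_value, ml_rank):
    """Hybrid variable ordering sketch: the primary key is the domain-based
    criterion (minimum value in the variable's current domain); ties are
    broken by the ML-predicted rank."""
    return sorted(variables, key=lambda v: (min_domain_value(v), ml_rank(v)))
```

For example, with minimum domain values {a: 2, b: 1, c: 1} and ML ranks {a: 0, b: 2, c: 1}, the tie between b and c is broken in favour of c.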

IV. EXPERIMENTS

A. Experimental setup
Problem instances. We use both randomly generated instances and benchmark instances to evaluate our methods. The random instances are generated using the generator described in [12]. For a problem size N × M , we generate 100 problem instances using different random seeds, with the due date allowance factor h set to 1.3. In addition, we use five popular benchmark datasets in OR-library [66] for Cmax. These additional datasets have no release times.
Implementation and platform. We use the state-of-the-art CP-SAT solver of Google OR-Tools [67] as our CP solver. The domain reduction strategy for CP-SAT is set to 'selecting-min-value' based on preliminary experiments. The implementations of the ML algorithms are from the scikit-learn [65] and gplearn libraries. The parameters of the ML algorithms are kept at their default settings, except that the number of generations for GP is set to 2000. Our source code is written in Python and will be made publicly available if the paper is accepted. Our experiments are conducted on a high-performance computing server with multiple CPUs @2.70GHz, each with 4GB of memory.
Baselines. We compare our ML-based variable ordering methods with four baselines: (1) Default, the default variable ordering method of OR-Tools, that orders variables based on a definition of impact (reduction of search space) [6], (2) MinDom, that orders variables based on the current domain size [4], (3) LowMin, that orders variables based on the current minimum domain value, and (4) BranGP, that uses GP to learn a variable ordering to minimize the number of branches required to solve a set of training instances [12].

B. Results of training
Training set. We use 100 randomly generated JSS problem instances of size 9 × 9 to construct our training sets. For each objective, Cmax, Tmax and TWT, we solve the training problem instances to optimality using OR-Tools. As shown in Table II (the 'Data Collection' column), the training problem instances can be solved efficiently, especially for Cmax and Tmax. We construct a separate training set to train the regression and classification models for each of the objectives.
Training time. The time taken to train each ML model is shown in Table II. The linear model, SVM, can be trained in a few seconds, and MLP in about one minute. Training our GP model is slower, but still takes less than one hour. Overall, training our supervised learning models is much more efficient than training the evolutionary learning approach, BranGP.
Training accuracy. The 5-fold cross-validation accuracy of the classification models and the coefficient of determination (R^2) of the regression models are shown in Table III. All the ML algorithms achieve a reasonable accuracy and R^2 score on the classification and regression tasks. Overall, MLP performs the best, and GP is slightly worse than the other two. BranGP is not a supervised learning model, so no classification accuracy or R^2 score can be reported for it.
Model interpretability of SVM. The coefficients of the features in the decision functions learned by SVM-C (for classification) and SVM-R (for regression) are shown in Table IV. Here, the magnitude of a coefficient indicates the importance of the corresponding feature, and the sign of a coefficient represents the correlation between the feature and the label. For example, in the decision function learned by SVM-C for Tmax, the features f2 (PT) and f9 (EST) are the most important, as their coefficients have the largest magnitudes. In addition, these two coefficients are positive, meaning that an operation with a smaller processing time (PT) and a smaller earliest start time (EST) should be scheduled earlier. Overall, f2 (PT) and f9 (EST) are the two most important features for all three objectives.
Model interpretability of GP. The decision functions learned by GP-C (for classification) are:
• Cmax: f1;
• Tmax: 0.572 f2 + f9;
• TWT: 0.065 f2 − f7 + 0.090 f9 − 0.004.
The decision functions learned by GP-R (for regression) are also mostly linear. The acronyms of the features can be found in Table IV. It is surprising that most of the decision functions learned by GP are linear. For example, the one learned by GP-C for Cmax uses only one feature, f1 (number of precedent operations), to order variables. The earliest start time (f9) is the most important feature, used by five of the six decision functions.
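Because the GP-C functions are linear, they can be applied directly as pairwise comparators. In this sketch, the sign convention (a negative decision value on f_i − f_j meaning operation i is predicted earlier) is inferred from the −1/+1 labelling of the classification task and should be treated as an assumption.

```python
# The linear GP-C decision functions reported above, applied to a padded
# feature-difference vector d, where d[k] is the difference in feature f_k
# (f1 = NPO, f2 = PT, f7 = DD, f9 = EST).
GP_C = {
    'Cmax': lambda d: d[1],                                       # f1
    'Tmax': lambda d: 0.572 * d[2] + d[9],                        # 0.572 f2 + f9
    'TWT':  lambda d: 0.065 * d[2] - d[7] + 0.090 * d[9] - 0.004,
}

def earlier(obj, f_i, f_j):
    """True if operation i is predicted to precede operation j under the
    GP-C decision function for objective `obj` (sign convention assumed).
    f_i, f_j are feature vectors ordered [f1, ..., f10]."""
    d = [0.0] + [a - b for a, b in zip(f_i, f_j)]  # pad so d[k] matches f_k
    return GP_C[obj](d) < 0
```

For Tmax, an operation with a smaller PT and EST than its rival is predicted earlier, matching the SVM interpretation discussed above.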
Model interpretability of MLP. MLP is a 'black-box' neural network model. However, we can visualize the decision boundary learned by MLP-C empirically. To do so, we use principal component analysis (PCA) to project the 10-D feature vectors onto a 2-D space spanned by the first two principal components. MLP is then trained in the 2-D space, and the learned decision boundaries are shown in Fig. 1. These decision boundaries are also close to linear.

C. Efficacy of ML-based variable ordering methods
We evaluate the efficacy of our ML-based variable ordering methods on randomly generated JSS problem instances. For Cmax and Tmax, we generate 100 instances for each problem size in {6 × 6, 8 × 8, 10 × 10, 12 × 12, 14 × 14}. For TWT, the test problem sizes are {6 × 6, 7 × 7, 8 × 8, 9 × 9, 10 × 10}, because the TWT objective is much harder to solve than the other two. We use the CP-SAT solver in OR-Tools with ten different variable ordering methods to solve the test problem instances. As the test instances can be optimally solved, we also investigate a variable ordering strategy, called OptSol, that orders variables based on the optimal solution values.
The mean and standard deviation of the number of branches used by each method to optimally solve the problem instances are presented in Table V. Here, we also present the p-values of paired t-tests comparing each method against the one with the best mean value. Overall, our methods (GP-C, SVM-C, MLP-C, GP-R, SVM-R and MLP-R) generate the best results compared to the four baselines (Default, MinDom, LowMin and BranGP). Specifically, all six of our classification and regression models significantly outperform the Default and MinDom methods for all three objectives. The LowMin method performs very well for TWT, but it is significantly outperformed by our methods for the other two objectives. For some of the instances, including those TWT instances where LowMin performs well, the OptSol approach does not perform very well. This demonstrates a limitation of ordering variables by an optimal solution: while doing so guarantees that the optimal solution is found at the start of the search, and hence that the best possible upper bound is available, it does not guarantee that the size of the search tree is minimised. Similarly, the BranGP method is very competitive for Cmax, but it is consistently outperformed by our methods for Tmax and TWT.

In Fig. 2, we select three of our methods (GP-C, SVM-C, MLP-R) and compare them to the baselines using box plots. Here, for each variable ordering method, we compute an improvement over the Default method as (n_d − n)/max{n_d, n}, where n_d and n are the numbers of branches used by the Default method and the method under comparison, respectively. An improvement value greater than zero indicates that the method performs better than the Default. We can observe that our ML-based methods generally produce the largest improvements compared to the baselines, with only a few exceptions.
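The improvement measure is straightforward to compute; a small helper makes its normalisation explicit:

```python
def improvement(n_default, n_method):
    """Normalised improvement of a method over Default:
    (n_d - n) / max(n_d, n). Positive means the method used fewer
    branches than Default; the value always lies in (-1, 1]."""
    return (n_default - n_method) / max(n_default, n_method)
```

Dividing by the larger of the two counts keeps improvements and degradations on a symmetric, bounded scale, so box plots across instances of very different sizes remain comparable.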
In Fig. 3, we compare our six classification and regression models using box plots. In general, GP-C and MLP-R perform slightly better than the others for Cmax, while all methods perform similarly for Tmax and TWT. This is also supported by the statistical test results presented in Table V. Hence, our ML-based variable ordering methods can significantly benefit the CP solver, regardless of which learning algorithm is used.
The OptSol method, which orders variables based on the optimal solution, performs very well on small problem instances, but its performance deteriorates as the problem size increases. This may be because the OptSol method (and hence all our ML-based methods) focuses on finding high-quality solutions without taking into account how to efficiently prove optimality. The hybrid variable ordering methods should be able to alleviate this issue, because they combine the merits of both ML-based and domain-based variable ordering methods.

D. Efficacy of hybrid variable ordering methods
We hybridise our ML-based variable ordering methods with the domain-based LowMin method, because LowMin outperforms the MinDom and Default methods according to Table V. Further, we will only consider the classification models (i.e., GP-C, SVM-C and MLP-C) from now on, since the regression models are not significantly different from the classification models. For a fair comparison, we also hybridise the baseline, BranGP, with LowMin. Fig. 4 compares the pure ML-based methods (GP-C, SVM-C and MLP-C) against the hybrid variable ordering methods (GP-H, SVM-H and MLP-H). We can observe that the hybrid methods generally perform better than the corresponding ML-based methods, and the difference becomes more pronounced as the problem size increases. This result is consistent with our intuition that the hybrid methods should be able to prove optimality faster, especially on larger problem instances. Table VI presents the results of all hybrid methods and the baselines. Here, the hybrid methods consistently perform better than Default, MinDom and LowMin across different problem sizes and objectives, in terms of the average number of branches used to solve the problem instances. The hybrid method using BranGP (BranGP-H) is competitive with our hybrid methods (GP-H, SVM-H and MLP-H).
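One plausible reading of such a hybridisation, sketched below purely for illustration (the paper's exact combination rule may differ): use the LowMin criterion, which picks the unassigned variable with the smallest domain minimum, as the primary criterion, and break ties with the ML-predicted solution values.

```python
def hybrid_next_variable(domains, predicted):
    """Pick the next variable to branch on. `domains` maps a variable
    name to its sorted list of remaining values; `predicted` maps a
    variable name to its ML-predicted solution value. LowMin-style
    primary criterion (smallest domain minimum), prediction as a
    tie-breaker -- an assumed combination, not the paper's exact rule."""
    return min(domains, key=lambda v: (domains[v][0], predicted[v]))

# Toy example: x and y tie on domain minimum 3; y has the smaller
# predicted value, so it is branched on first.
doms = {'x': [3, 5], 'y': [3, 9], 'z': [4, 6]}
pred = {'x': 5, 'y': 3, 'z': 4}
chosen = hybrid_next_variable(doms, pred)
```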

E. Generalisation to larger instances
We test the generalisation of our methods on larger problem instances. As the hybrid methods perform better than the pure ML-based methods, we will only test the hybrid methods in the rest of the paper. For each problem size in {20 × 20, 40 × 40, 60 × 60, 80 × 80, 100 × 100}, we randomly generate 100 instances with different random seeds, as before. These instances are significantly larger, and they generally cannot be solved to optimality within a reasonable amount of time. Hence, we give the CP-SAT solver 60 seconds and record the best objective values found within the cutoff time. Fig. 5 shows the comparison between our hybrid methods (GP-H, MLP-H, SVM-H) and the baselines. Here, the improvement is computed over the Default method in terms of the best objective value found. We can see that all methods except MinDom perform better than the Default. The LowMin method performs well for TWT, but is significantly outperformed by our hybrid methods. The BranGP-H method performs similarly to GP-H and better than MLP-H and SVM-H for Cmax. However, for the other two objectives, Tmax and TWT, the BranGP-H method is generally outperformed by our hybrid methods, and the performance gap widens as the problem size increases.

F. Generalisation to benchmark instances
We further test the generalisation of our hybrid variable ordering methods on benchmark instances that are very different from those used in training. The cutoff time for the CP-SAT solver to solve each problem instance is set to ten minutes. The results for the Cmax benchmarks are presented in Table VII. Overall, our hybrid methods (GP-H, SVM-H and MLP-H) perform better than the Default, MinDom and LowMin methods. Our GP-H method performs the best in terms of the number of instances solved, while our MLP-H method generally performs the best in terms of the number of branches required for solved instances. All hybrid methods (including BranGP-H) perform similarly in terms of the best objective value found for unsolved instances.

CONCLUSION
Constraint programming (CP) is a powerful technique for solving constraint satisfaction and optimization problems. In this paper, we proposed a supervised learning approach to automatically learn effective variable ordering strategies for CP, using the job shop scheduling problem as a case study. The idea was to predict the optimal solution of a problem instance and order variables according to the predicted solution. We showed that solution prediction for the job shop scheduling problem can be modeled as either a classification or a regression task. Our supervised learning models were much faster to train than an evolutionary learning approach, and they achieved reasonable accuracy in predicting optimal solutions with three different algorithms. We evaluated our ML-based variable ordering methods in comprehensive experiments and showed that, overall, our methods performed better than four baselines. Finally, we showed the benefit of hybridising our ML-based variable ordering methods with a traditional domain-based method, especially on large problem instances.