A choice function hyper-heuristic framework for the allocation of maintenance tasks in Danish railways

A new signalling system in Denmark aims at ensuring fast and reliable train operations, but imposes very strict time limits on recovery plans in the event of failure. As a result, it is necessary to develop a new approach to the entire maintenance scheduling process. In the largest region of Denmark, the Jutland peninsula, there is a decentralised structure for maintenance planning, whereby the crew start their duties from their home locations rather than from a single depot. In this paper, we allocate a set of maintenance tasks in Jutland to a set of maintenance crew members, defining the sub-region that each crew member is responsible for. Two key considerations must be made when allocating tasks to crew members. Firstly, a fair balance of workload must exist between crew members; secondly, the distance between two tasks in the same sub-region must be minimised, in order to facilitate a quick response in the case of unexpected failure. We propose a perturbative selection hyper-heuristic framework to improve initial solutions by reassigning outliers, i.e. tasks that are far from the rest of their sub-region, to another crew member at each iteration, using one of five low-level heuristics. Results of two hyper-heuristics, using a number of different initial solution construction methods, are presented over a set of 12 benchmark problem instances.


Introduction
The European Railway Traffic Management System (ERTMS) ( Barger et al., 2009 ) is the newest signalling standard to systematise train control and communication systems within railway networks. The motivation behind ERTMS has been to enhance signalling communication amongst various train systems, to improve connectivity and allow for faster travel between European countries. Although ERTMS was initially introduced by the European Union for European countries, it rapidly became a worldwide signalling standard. As ERTMS is still in the early stages of operation, there is very limited research pertinent to its maintenance processes and other aspects ( Barger et al., 2009; El Amraoui and Mesghouni, 2014; Patra et al., 2010; Redekker, 2008; Tapsall, 2003 ).
Denmark will be the first country in Europe to upgrade its entire signalling system to ERTMS. Railway track and signalling systems are complex and highly interdependent, and the failure of a single component can disrupt operations across the network. Once tasks have been allocated, a route for each crew member can be determined and the overall driving distance cost calculated for the entire maintenance plan. We must emphasise here that this routing phase is considered as a separate optimisation problem and will not be studied in this paper.
The focus of this paper is the allocation of maintenance tasks to crew members for the Jutland peninsula, the largest region in Denmark. The current maintenance planning system in the country is decentralised, with crew members starting their duties from different locations rather than from a single depot. This structure requires an effective assignment of tasks to avoid high total driving distance costs or, in some cases, to ensure a feasible plan is made. Based on the allocations found, each crew member is responsible for undertaking tasks within their own sub-region.
Considering the characteristics of the maintenance planning problem introduced above, the problem can be seen as the Multi-Depot Vehicle Routing Problem (MDVRP) ( Lenstra and Kan, 1981 ), where each vehicle operates on its own routes, starting and finishing at a specific depot. According to Banedanmark, each crew member is equipped with a technical vehicle and all the necessary equipment to undertake any task. Each crew member in our problem can be seen as a vehicle within the MDVRP, with their home location corresponding to a depot. Starting and ending their route at the depot location, each crew member must complete all of the tasks that they have been assigned. As the MDVRP is an NP-hard problem, heuristic methods have been used widely within the literature. Among the existing heuristic approaches, Tabu Search ( Cordeau et al., 2001 ) and adaptive large neighbourhood search ( Pisinger and Ropke, 2007 ) have been shown to be particularly successful. Montoya-Torres et al. (2015) provide a comprehensive survey on approaches to solving the MDVRP.
Due to the structure of the MDVRP, the process of determining which customers are served by which depots has been fundamental to many proposed solution approaches. Such approaches fall under the research spectrum of cluster-first, route-second approaches ( Fisher and Jaikumar, 1981; Peng, 2011 ), in which the clustering phase is usually solved by an assignment algorithm ( Tansini et al., 2001 ). Giosa et al. (2002) proposed a number of assignment algorithms for the MDVRP, three of which, namely Parallel Assignment, Simplified Assignment and Sweep Assignment ( Ryan et al., 1993 ), were referred to as methods which perform assignment through urgencies. These methods define a precedence relationship between customers, to determine the order in which they are serviced by the depot, with high-priority or "urgent" customers served first.
Hyper-heuristics represent a class of high-level search techniques employed for addressing combinatorial optimisation problems. Unlike traditional search methods, which operate on a space of solutions, hyper-heuristics operate on a search space of low-level heuristics or heuristic components. A recent definition of hyper-heuristics is given by Burke et al. (2010): 'A hyper-heuristic is a search method or learning mechanism for selecting or generating heuristics to solve computational search problems'.
This definition covers the two main categories of hyper-heuristics: selection hyper-heuristics, which choose a heuristic to apply at each step of a search, and generation hyper-heuristics, which generate new heuristics from existing sets of low-level heuristics or components. A traditional selection hyper-heuristic iteratively selects and applies low-level heuristics to a single solution, using a move acceptance criterion to decide whether to keep the new solution at each step. While there has been sustained research interest in hyper-heuristics in the last decade or so, methods exhibiting hyper-heuristic behaviour can be traced back to as early as 1961 ( Fisher and Thompson, 1961 ). Selection hyper-heuristics have previously been applied successfully to a wide array of problem domains, including bin packing ( Lopez-Camacho et al., 2011 ), dynamic environments ( Kiraz et al., 2013 ), examination timetabling ( Ozcan et al., 2010 ), the multidimensional knapsack problem ( Drake et al., 2014 ), nurse rostering ( Burke et al., 2003 ), sports scheduling ( Gibbs et al., 2011 ) and the vehicle routing problem ( Garrido and Castro, 2009 ). Here we will use a selection hyper-heuristic to define working sub-regions for maintenance crew members across the Danish rail network.

This paper is organised into five sections. In Section 2, we present the problem definition, including a mathematical model of the railway maintenance crew scheduling problem and a description of the instances used. Section 3 describes the proposed framework used to solve the problem, and Section 4 presents experimental results and a discussion on the proposed framework. Finally, this paper closes with a conclusion in Section 5.

Mathematical model
The mathematical model of the problem that we deal with in this paper can be described as follows. Given a set of crew members C and a set of maintenance tasks M, with crew indices k, v ∈ C and maintenance task indices l, h ∈ M, decision variable x_{k,l} is set to 1 if task l is assigned to crew member k; otherwise, it is 0. Q_{k,l} denotes the distance between crew member k and task l, while S_{l,h} is the distance between task l and task h, and d_l is the duration of task l. The objective function (1) is multi-criteria: the first term minimises the total travel time from each crew member's location to their assigned tasks. The second term, ψ, together with constraint (2), aims at minimising the maximum distance among task pairs within each sub-region. This reflects the definition of the diameter of a sub-region as the maximum distance between any two tasks assigned to a maintenance crew member.
In addition, fair distribution of the tasks among the crew is considered as a third criterion ( w ). Workload distribution is modelled according to the balancing constraints defined by Bredstrom and Ronnqvist (2008) . Using this formulation, constraint (3) balances mismatches across different sub-regions, where w represents the biggest difference in the total duration of assigned tasks between any two sub-regions. Constraint (4) ensures that each task is assigned only to one crew member.
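Assembling the notation above, the model can be sketched as follows. The linearisation of the diameter constraint (2) and the exact weighting of the three objective terms are our reconstruction from the prose, not taken verbatim from the paper:

```latex
\begin{align}
\min \quad & \sum_{k \in C} \sum_{l \in M} Q_{k,l}\, x_{k,l} \;+\; \psi \;+\; w \tag{1}\\
\text{s.t.} \quad & S_{l,h}\,(x_{k,l} + x_{k,h} - 1) \le \psi & \forall k \in C,\ l, h \in M \tag{2}\\
& \sum_{l \in M} d_l\, x_{k,l} - \sum_{l \in M} d_l\, x_{v,l} \le w & \forall k, v \in C \tag{3}\\
& \sum_{k \in C} x_{k,l} = 1 & \forall l \in M \tag{4}\\
& x_{k,l} \in \{0, 1\} & \forall k \in C,\ l \in M \notag
\end{align}
```

Here constraint (2) only binds ψ when both tasks l and h are assigned to the same crew member k, so ψ captures the largest sub-region diameter, and constraint (3) bounds the pairwise workload mismatch w between any two crew members.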

Dataset
As ERTMS has not yet been implemented, this is exploratory work commissioned by Banedanmark, the state-owned Danish company in charge of maintenance and traffic control of most of the Danish railway network. As such, no solution is currently implemented in practice. This work has been done before the implementation of ERTMS, to give some indication of the problem that Banedanmark are likely to face, and to ensure that they are prepared when it comes to solving the problem in the future. In this section we define the instances used for experimentation. The geographical points are all located in the Danish peninsula of Jutland, and the tasks are to be assigned to a number of crew members. Coordinates representing the geographical location of the tasks were generated using the Google Maps API, based on three different task location generation strategies:
1. Exact (E). Tasks are all located on the rail tracks of the Jutland region.
2. Mixed (M). Tasks are located at a mix of on- or off-track positions within the Jutland region.
3. Random (R). Tasks are scattered randomly across the Jutland region.
For each of these three cases, four instances were generated with a different total number of tasks: 100, 500, 1000 and 5000, resulting in 12 problem instances overall. These are to be serviced by a team of eight crew members. The instance sizes were chosen according to the numbers of maintenance tasks which need to be done on a daily, weekly, monthly and annual basis, respectively. To standardise our test cases, we follow the file format of the classical benchmark test sets for the Vehicle Routing Problem with Time Windows (VRPTW), introduced by Solomon (http://w.cba.neu.edu/~msolomon/problems.htm). The dataset and documentation about how the instances were created are accessible at http://github.com/ShahrzadMP/Dataset . Each instance is referred to by its locationType-taskTotal pair herein, e.g. E100, R5000 etc. Fig. 1 presents a geographical visualisation of the on-track, on- and off-track and random instances with 500 tasks.

Proposed framework
Given an existing solution generated by an initial constructive phase, we use a selection hyper-heuristic to improve the assignment of maintenance tasks to crew members. As with many existing selection hyper-heuristics, the search is performed on a single candidate solution, in an attempt to improve a given solution at each iteration, using two phases: heuristic selection and move acceptance ( Ozcan et al., 2010 ). By applying a selected heuristic at each iteration, a candidate solution (Sol_t) at a given time (t) is modified into a new solution. A move acceptance criterion then decides whether to accept or reject the new solution.
In the proposed framework, task assignments are modified by reassigning tasks that are far away from a maintenance crew member's starting position to another maintenance crew member's sub-region. Such tasks are representative of the concept of outliers, explained in more detail in Section 3.2 . The algorithm starts with a constructive phase to generate an initial feasible solution. Next, at each iteration, the algorithm tries to detect an outlier in a particular sub-region. If no outlier is found for any of the sub-regions of the current solution, the algorithm terminates and the best solution is returned as the final solution. If an outlier is detected, the hyper-heuristic selects and applies a low-level heuristic to reassign the outlying task, before the move acceptance criterion decides whether to accept this new allocation. This process continues until either no outliers remain or one of the given termination criteria is met. The overall framework is illustrated in Fig. 2 .
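The improvement loop above can be sketched as follows. The function names (`hyper_heuristic`, `detect_outlier` and the `cost`/`select` callables) are illustrative placeholders rather than the paper's implementation, and the move acceptance shown simply accepts non-worsening moves:

```python
def hyper_heuristic(solution, cost, detect_outlier, heuristics, select,
                    max_iters=100):
    """Skeleton of the perturbative selection hyper-heuristic loop:
    detect an outlier, apply a selected low-level heuristic to reassign
    it, then accept or reject the move (here: accept non-worsening moves)."""
    best = solution
    for _ in range(max_iters):
        outlier = detect_outlier(solution)
        if outlier is None:                    # no outlier in any sub-region
            break
        heuristic = select(heuristics)         # heuristic selection phase
        candidate = heuristic(solution, outlier)
        if cost(candidate) <= cost(solution):  # move acceptance phase
            solution = candidate
            if cost(solution) < cost(best):
                best = solution
    return best
```

In the full framework, `select` would be the choice function (or a uniform random choice for SRHH) and the loop would additionally track the non-improving iteration counters used as termination criteria.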

Initial solutions
To generate initial solutions, we present a constructive deterministic heuristic based on two different ordering strategies, in order to assign tasks to maintenance crew members. The set of tasks allocated to each crew member represents the sub-region in which the crew member operates. The constructive heuristic starts with a list of maintenance tasks, sorted according to the distance of each task from the crew member's starting location, and in each step a task is allocated to a crew member, depending on the ordering strategy being used. We define two strategies to decide the order in which tasks are allocated: Furthest Task First (FTF) and Closest Task First (CTF). In FTF, tasks are ordered in descending order of distance from the closest crew member, with the task furthest from its closest crew member allocated first. This strategy intends to allocate "difficult to assign" tasks which are a long distance from any crew member early on in the construction process. Conversely, CTF allocates tasks in a greedy manner, assigning them in ascending order of distance away from the closest crew member.
In order to ensure that tasks are distributed fairly among all crew members, a Tabu list is used to manage those who are able to be allocated a task at a given point. Once a task is allocated to a crew member, the heuristic is prohibited from allocating this person another task until the Tabu list becomes empty. In this way, the number of tasks assigned to each crew member is balanced while constructing the solution. Algorithm 1 presents the pseudocode for the constructive heuristic. For comparison, we have also implemented the Simplified Assignment (SA) algorithm ( Giosa et al., 2002 ) from the literature, which orders tasks by the difference in distance from a task to the closest and second-closest crew member.
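The ordering heuristic can be sketched as follows; the Euclidean distance function, the coordinate-tuple representation of tasks and crew locations, and the function name `construct` are our assumptions for illustration:

```python
import math

def construct(tasks, crews, strategy="FTF"):
    """Allocate tasks to crew members using FTF or CTF ordering.
    A tabu list of size |crews|-1 keeps the task counts balanced: once
    a crew member receives a task, they cannot receive another until
    the list fills up and is emptied."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def to_closest(t):                       # distance to the closest crew member
        return min(dist(t, c) for c in crews)

    # FTF: furthest-from-any-crew first; CTF: closest first (greedy)
    ordered = sorted(tasks, key=to_closest, reverse=(strategy == "FTF"))
    allocation = {i: [] for i in range(len(crews))}
    tabu = []
    for t in ordered:
        if len(tabu) == len(crews) - 1:      # tabu list full: empty it
            tabu = []
        # allocate to the closest non-tabu crew member
        c = min((i for i in range(len(crews)) if i not in tabu),
                key=lambda i: dist(t, crews[i]))
        allocation[c].append(t)
        tabu.append(c)
    return allocation
```

With two crew members the tabu list has size one, so allocations simply alternate between them; with the eight crew members of the Banedanmark instances, every crew member receives one task before any receives a second.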

Identifying outliers
In the task allocation problem described above, in order to ensure a quick response across the network in the event of failure, the maximum distance between the tasks should be minimised within each sub-region (cluster). This reflects the definition of the diameter of a cluster, that is, the maximum distance between any two points of the sub-region ( Rajaraman et al., 2012 ). Explicitly calculating the diameter of a sub-region can be costly and requires checking all pairs of tasks within that sub-region. In terms of time complexity this is O(n^2), where n is the number of tasks within the sub-region. To reduce the time complexity of our approach and allow for better scalability, we use the radius of the sub-region instead of the diameter. The radius of a sub-region is defined as the maximum distance between all the points and the sub-region centre and can be calculated in O(n) time. Whilst the radius and diameter of a cluster are not associated directly, they do have a propensity for being proportional ( Rajaraman et al., 2012 ). Fig. 3 shows the outlier detection module in the proposed framework. A sub-region is selected randomly from the current solution at hand. In order to detect an outlier, the module finds the task furthest away from the sub-region centre, defined as the starting location of a crew member. If the radius is greater than half of the maximum allowed distance during failures, it is recognised as an outlier. In the Banedanmark problem, the maximum allowed distance is 100 km, which corresponds to roughly an hour and a half of travel time. For example, if the furthest task away from the sub-region centre (the radius) is 80 km, the task will be detected as an outlier, as the radius is greater than half of the maximum allowed distance, which is 50 km in this example.

Algorithm 1: Ordering heuristic, employed to generate initial solutions.
1: Order task list M according to ordering strategy (FTF or CTF)
2: Initialise tabuList as empty
3: Set tabuList size to number of crew members - 1
4: for each task l in M do
5:   if size of tabuList equals maximum size of tabuList then
6:     empty the tabuList
7:   Allocate l to closest non-Tabu crew member c
8:   Add c to tabuList
9: end for
If an outlier is detected within the current sub-region, the algorithm will enter the improvement phase, carried out by the selection hyper-heuristic. If not, the algorithm will add the selected sub-region to a Tabu list, to avoid re-selecting sub-regions that do not contain any outliers. After a sub-region is added to the Tabu list, the algorithm continues selecting a non-Tabu sub-region until it either finds a sub-region with an outlier, or there are no more non-Tabu sub-regions from which to choose. Each time an outlier is detected successfully, the Tabu list is emptied. Outlier detection is possible until the radius (the distance to the furthest task from the centre of the sub-region) of every sub-region is no greater than half of the maximum distance a crew member is allowed to travel in the case of a breakdown. In the worst case, the distance from a crew member's current location to the location of a failure within the sub-region is then at most twice the radius of the sub-region, and therefore within the maximum distance allowed.
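The radius-based check can be sketched as follows; the coordinate representation and the function name `find_outlier` are illustrative assumptions, with the 100 km limit taken from the Banedanmark problem:

```python
import math

MAX_ALLOWED_KM = 100.0   # Banedanmark: roughly 1.5 hours of travel

def find_outlier(centre, tasks, max_allowed=MAX_ALLOWED_KM):
    """Return the furthest task from the sub-region centre (the crew
    member's starting location) if the radius exceeds half the maximum
    allowed distance; otherwise return None (no outlier)."""
    if not tasks:
        return None

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    furthest = max(tasks, key=lambda t: dist(centre, t))
    radius = dist(centre, furthest)          # O(n), vs O(n^2) for the diameter
    return furthest if radius > max_allowed / 2 else None
```

For example, a task 80 km from the centre is flagged (80 > 50), whereas a sub-region whose furthest task is 30 km away contains no outlier.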

Choice function heuristic selection
Once an outlying task has been identified, a low-level heuristic is applied to reassign the task to another sub-region. The impact of different low-level heuristics on a certain solution is dependent on two factors: the nature of the low-level heuristic and the point in the search at which it is applied. Hence, if the state of the search can be acknowledged through some mechanism, a hyper-heuristic can apply an appropriate heuristic at each step, in order to guide the solution towards better areas of the solution space. The choice function is an intelligent heuristic selection strategy, introduced by Cowling et al. (2001a) to evaluate and rank the performance of multiple low-level heuristics. Choice-function-based hyper-heuristics and variants have since been used to solve a variety of different problems ( Drake et al., 2014; Guizzo et al., 2015; Maashi et al., 2015 ).
The choice function comprises three terms and utilises information about the impact of each low-level heuristic individually (f1), the combined impact of applying two heuristics successively (f2) and the amount of time elapsed since the heuristic was last called (f3) ( Cowling et al., 2001a ). At each decision point, the low-level heuristic with the highest score, calculated using the choice function, is selected and applied to the current solution. Exploitation of the search space is taken into account by gathering performance information on the heuristics through f1 and f2. Exploration of other parts of the search space is achieved by selecting low-level heuristics that have not been applied recently (f3). The parameters α, β and γ are used to weight each of the three components (f1, f2 and f3), giving greater weight to recent performance. The complete formulation of these components is as follows:

f1(h_j) = Σ_n α^(n-1) I_n(h_j) / T_n(h_j)
f2(h_k, h_j) = Σ_n β^(n-1) I_n(h_k, h_j) / T_n(h_k, h_j)
f3(h_j) = τ(h_j)

where I_n(h_j) and T_n(h_j) are the change in the objective function and the CPU time taken the nth last time the heuristic h_j was called, and I_n(h_k, h_j) and T_n(h_k, h_j) indicate the change in the evaluation function and the amount of CPU time taken the nth last time the heuristic h_j was called directly after heuristic h_k. Finally, τ(h_j) is the time elapsed since the heuristic h_j was last called. The choice function, F, for a given heuristic is calculated as:

F(h_j) = α f1(h_j) + β f2(h_k, h_j) + γ f3(h_j)

To enhance the generality and robustness of our hyper-heuristic, a self-adaptive version is preferable. Accordingly, we use the parameter-free choice function introduced by Cowling et al. (2001b), which tunes the parameters of the choice function at each decision point based on the state of the search, rather than using constant values for α, β and γ during the search. The parameters α, β and γ are rewarded or punished depending on whether the resulting solution following the application of a low-level heuristic is better or worse than the previous solution.
This adaptivity allows for regular interplay between the parameters of the choice function, modifying the weighting assigned to each parameter according to the performance of each low-level heuristic application. Various approaches can be implemented as a reward/punishment strategy to control α, β and γ. Examples include a linear scheme (e.g. α = α(1 + δ)) or a non-linear scheme (e.g. α = α^(1+δ)), where δ can be either a negative or positive constant, or a function of the relative improvement obtained from the change in the evaluation function after employment of the last selected heuristic ( Soubeiga, 2003 ). Here we employ the adaptive choice function hyper-heuristic taken from the schematic view given by Soubeiga (2003), using a linear scheme with a constant value of 0.1, with a positive or negative sign for the reward and punishment scheme, respectively. Initially, α, β and γ are set to 1.
This adaptive variant of the choice function will be referred to as CFHH in the remaining sections of the paper. In addition, our experiments will also use a simple random hyper-heuristic (SRHH) for comparison, which makes a uniform random selection of the low-level heuristic to apply at each step.
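A minimal sketch of the adaptive choice function is given below. The data structures and class name are illustrative, and we assume (following Soubeiga's scheme) that an improving move rewards the exploitation weights α and β while punishing the exploration weight γ, and vice versa:

```python
class ChoiceFunction:
    """Scores each low-level heuristic as F = a*f1 + b*f2 + c*f3, where
    f1/f2 accumulate recent individual/pairwise performance and f3 is
    the time since the heuristic was last called.  The weights start at
    1 and are adjusted linearly (+/- delta) after improving/worsening
    moves; the 0.01 floor is our safeguard against negative weights."""

    def __init__(self, n_heuristics, delta=0.1):
        self.a = self.b = self.c = 1.0
        self.delta = delta
        self.f1 = [0.0] * n_heuristics
        self.f2 = [[0.0] * n_heuristics for _ in range(n_heuristics)]
        self.elapsed = [0.0] * n_heuristics   # f3: time since last call
        self.last = None                      # previously applied heuristic

    def score(self, j):
        pair = self.f2[self.last][j] if self.last is not None else 0.0
        return self.a * self.f1[j] + self.b * pair + self.c * self.elapsed[j]

    def select(self):
        return max(range(len(self.f1)), key=self.score)

    def update(self, j, improvement, cpu_time):
        rate = improvement / max(cpu_time, 1e-9)
        self.f1[j] = rate + self.a * self.f1[j]       # discounted history
        if self.last is not None:
            self.f2[self.last][j] = rate + self.b * self.f2[self.last][j]
        for k in range(len(self.elapsed)):            # advance f3 clocks
            self.elapsed[k] += cpu_time
        self.elapsed[j] = 0.0
        # linear reward/punishment of the weights
        sign = 1 if improvement > 0 else -1
        self.a = max(self.a + sign * self.delta, 0.01)
        self.b = max(self.b + sign * self.delta, 0.01)
        self.c = max(self.c - sign * self.delta, 0.01)
        self.last = j
```

After each application of a low-level heuristic, `update` is called with the observed change in objective value and CPU time; `select` then returns the heuristic with the highest F score for the next step.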

Low-level heuristics
We introduce five low-level heuristics for the hyper-heuristics to select from. A low-level heuristic defines a strategy to reallocate a task identified as an outlier in one sub-region to another maintenance crew member. The five low-level heuristics are illustrated in Fig. 4 , in which a circle represents a single maintenance crew member's sub-region, with each point denoting a task allocated within that particular sub-region. Red points are tasks identified as outliers, while black points can be either outlying or non-outlying tasks. All of the proposed low-level heuristics, except for Balancing, have been defined as hill-climbing methods: when they are applied to a solution, if the solution is not improved, the new solution is discarded and the original solution retained. The Balancing low-level heuristic does not consider the change in objective function value, and only attempts to balance the number of tasks allocated to each crew member in the current solution.
Domino : the Domino heuristic first moves the identified outlying task to the sub-region of the closest other maintenance crew member. Subsequently, the sub-region which has received the outlier does the same and reassigns its furthest task to the sub-region of the closest crew member's starting location, thereby having a "domino effect" on the overall solution.
Pair : this heuristic removes two outliers sequentially from the selected sub-region and assigns them to the best possible sub-region in terms of the distance of the outlier to the other sub-regions' centres. The destination sub-region for the two outliers could be the same or different. This heuristic changes the balance of the sub-regions.
Interchange : this heuristic tries to allocate an outlying task to the closest other crew member in exchange for another task, which is closer to the first crew member than the original outlier. The task received from the second crew member could either be an outlier or another task which is closer to the first crew member's starting position.
Balancing : in order to try to balance the number of tasks between crew members, the Balancing heuristic moves an outlying task to another crew member, who is currently allocated fewer tasks in total.

Join : this low-level heuristic looks for two tasks which are close to each other in terms of distance, but belong to different sub-regions. It then tries to place the two tasks in the same sub-region. Of the two possible moves, the assignment which yields the lowest average distance of the two tasks away from the centre of the sub-region is kept.
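As an illustration, the Domino heuristic can be sketched as follows; the dict-of-lists solution representation and the function names are our assumptions, and the hill-climbing wrapper (discarding non-improving results) is omitted for brevity:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def domino(solution, crews, source, outlier):
    """Domino: move the outlier to the sub-region of the closest other
    crew member; the receiving sub-region then passes its own furthest
    task on to its closest other crew member in turn."""
    sol = {k: list(v) for k, v in solution.items()}   # work on a copy
    task, giver = outlier, source
    for _ in range(2):                                # outlier move + one pass-on
        sol[giver].remove(task)
        # the closest other crew member's sub-region receives the task
        receiver = min((k for k in sol if k != giver),
                       key=lambda k: dist(task, crews[k]))
        sol[receiver].append(task)
        # the receiver's furthest task becomes the next candidate to move
        giver = receiver
        task = max(sol[giver], key=lambda t: dist(crews[giver], t))
    return sol
```

Note that the set of tasks is preserved; only their assignment to sub-regions changes, which is why the hill-climbing wrapper can compare the old and new solutions directly.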

Pseudocode for the proposed framework
The framework that we present in this paper is composed of three phases: generating an initial solution, detecting the outlier and improving the solution using a selection hyper-heuristic. In each run of the algorithm, one initial solution is generated and then the solution is improved through collaboration between the outlier detection and improvement hyper-heuristic phases.
Algorithm 2 presents the pseudocode for the proposed choice function hyper-heuristic approach to the problem (CFHH). The search space of the high-level heuristic consists of all possible permutations of the low-level heuristics defined in Section 3.4 . The algorithm starts by generating an initial solution using one of the constructive heuristics introduced in Section 3.1 . Once a solution is constructed, the algorithm enters the main loop to find an outlier of one of the sub-regions and improve the solution iteratively, until the stopping condition is met. Outlier detection (line 5) has been explained in detail in Section 3.2 . If an outlier is found, the algorithm will attempt to improve the solution using the choice function hyper-heuristic introduced in Section 3.3 operating over the low-level heuristics described in Section 3.4 .
As discussed earlier, in order to enhance the robustness of the framework presented in this paper, we employ the adaptive choice function ( Soubeiga, 2003 ), which automatically changes its parameters according to the search space in which it is operating. The rest of the algorithm from line 7 follows the schematic flow chart of the adaptive choice function introduced by Soubeiga (2003). At the beginning of the search, the variable nonImprovement is declared, to keep track of the number of consecutive iterations in which no change to the objective function is made. The choice function value is then computed for each heuristic, and the heuristic h_j with the highest F value is selected (lines 7 and 8). H2 is another heuristic, with the highest value for f3, used to provide an appropriate level of exploration of the heuristic search space (line 9). In order to determine whether the hyper-heuristic needs to exploit or explore the solution space at each iteration, G, the biggest contributor to the F value of the selected heuristic, is identified. This prescribes the way in which the chosen heuristic is applied (line 13). In the case of N consecutive non-improving iterations, H2 is applied to the solution (line 12). The relevant excerpt of Algorithm 2 reads:

7: Compute choice function F for each heuristic
8: Select heuristic h_j for which F is max
9: Select heuristic H2 where f3 is max, and H2 ≠ h_j
10: if nonImprovement <= N then
11:   if nonImprovement = N then
12:     Apply heuristic H2 to Solution
13:   G = biggest contributor to F, either f1, f2 or f3
14:   if G = f1 or f2 then
15:     Apply h_j in steepest descent

In general, when the algorithm is in an exploitation phase (G = f1 or G = f2), the chosen heuristic is applied in steepest descent fashion (line 15). If the solution requires exploration (G = f3), the heuristic with the smallest f3 value is applied in steepest descent fashion (line 18).
If this yields an improvement, γ is penalised (line 20); otherwise, h_j is applied using steepest descent (line 22). If this still does not lead to an improvement, the solution is reverted to the previous solution and h_j is applied once (line 24). If no component of the choice function dominates the others in terms of contribution to F, h_j is applied in steepest descent fashion (line 26). Following the application of a low-level heuristic to the solution, nonImprovement is incremented if no improvement has been found, and set to 0 in the case of improvement (line 27). After more than N consecutive non-improving iterations, the algorithm rewards γ and H2 is applied to the solution (lines 29-33).
The algorithm terminates under three different criteria. The first occurs when no outlier is found in any of the sub-regions within the solution; in this case, the low-level heuristics have no task to reassign to another sub-region. The second criterion is met when an outlier is detected, but the hyper-heuristic cannot improve the solution after a certain number of iterations. This threshold is set to 0.1 * the number of tasks in the problem instance. Finally, if the algorithm has not stopped under the previous criteria, the framework will stop after a set number of iterations (2 * the number of tasks in the instance).

Results and discussion
This section presents a number of experiments to analyse various aspects of the proposed framework. Firstly, the results of the initial solutions obtained using the CTF, FTF and SA assignment algorithms introduced in Section 3.1 are compared. Following this, the results of the proposed choice function selection hyper-heuristic (CFHH) applied to the three different initial solutions generated for each instance are presented. Next, we compare CFHH to a baseline simple random hyper-heuristic (SRHH) using the solutions generated by FTF. Detailed analysis of the performance of the low-level heuristics is then performed, using the three largest instances. Finally, the detailed performance of the choice function hyper-heuristic (CFHH) during a single run is presented, using one of the largest instances as an example. All experiments were run using an Intel Core (TM) i7-4600U CPU 2.10 GHz processor, with 8.00 GB RAM. Table 1 summarises the results of using three different constructive heuristics to generate solutions for the 12 instances introduced in Section 2.2 . This table shows five different measurements related to each solution. Total_D is the total distance cost, calculated as the sum of the distances between each task and the crew member to which it is assigned. MDD gives the maximum distance between two tasks allocated to a single crew member within the whole solution. This gives an indication of the worst-case scenario in terms of travel time in the case of unexpected failures or breakdowns. Similarly, AVG_MDD calculates the average maximum distance travelled by each crew member, to give an "average worst case" across the entire solution. w is the imbalance in workload distribution across different sub-regions on the railway network. The CPU time taken to generate the solution in seconds is also given (CPU_T). The best value for each metric between the three constructive heuristics is highlighted in bold.

Quality of the initial solutions generated using different constructive heuristics
From Table 1 , we can see that SA generates many of the best results in terms of Total_D and MDD. For the other measurements, FTF generates marginally better results in the majority of cases for AVG_MDD and CPU_T(s), and CTF generates slightly better results in terms of Total_D for the 'R' instances. The only exceptional cases are as follows: FTF generates results much more quickly (256.48, 201.76, 360.94) for large instances compared to SA (575.89, 416.75, 412.36) on E5000, M5000 and R5000, respectively. CTF also generates significantly better results in terms of Total_D (7283.62) for instance R100 compared to SA (7413.68). Regarding workload imbalance (w), SA results in a better distribution of tasks overall; however, the difference compared to CTF and FTF is not large. It is evident that the results achieved by FTF are close to the results of SA, while CTF generates the worst results. Using FTF ordering, tasks are assigned to the crew members starting with the most difficult tasks through to the easiest. Using FTF, the algorithm penalises the solution in the early steps of solution construction; however, this protects the solution from receiving high penalties for assigning the remaining faraway tasks to the crew in the final steps of solution construction. Distant tasks which are difficult to place are assigned to the best available choice in the early stages of constructing a solution, unlike CTF, which effectively assigns tasks in a greedy manner. Similarly, the difference measure used by SA prevents bigger penalties later on in the construction of a solution by assigning tasks which are close to a single crew member early on. In the remaining sections of the paper, we will use the solutions obtained by the CTF, FTF and SA construction heuristics as input for hyper-heuristics attempting to improve the initial task allocations.

Results of CFHH using different initial solutions
Here we analyse the impact of initial solutions of different qualities on the performance of CFHH. For this purpose, we performed 10 CFHH runs starting from each of the initial solutions generated by CTF, FTF and SA for each instance. Table 2 shows the average performance obtained by CFHH using the different initial solutions, based on the five measurements introduced in Section 4.1. Each measurement is followed by a column indicating its relative ranking compared to the other two methods for generating initial solutions. At a glance, the results indicate that CFHH using solutions constructed on an FTF basis performs better in the majority of measurements for all instances, ranked mainly first and second, with SA also performing well. This is despite the fact that the initial solutions generated by FTF were often of poorer quality than those generated by SA in the previous subsection, especially in terms of Total_D. Notably, CTF generates the worst results in all instances under the Mixed (M) and Random (R) categories in terms of Total_D, MDD and AVG_MDD. This demonstrates that starting with a solution which makes decisions on a greedy basis makes any improvement to the solution more difficult when applying CFHH; in other words, a good balance between the greediness of the initial solution and the adaptiveness of the hyper-heuristic is not found. It is notable that the results obtained using these distance-based measurements seem to be correlated, with the best solutions in terms of Total_D often also performing best in MDD and AVG_MDD.

Comparison between CFHH and simple random hyper-heuristic (SRHH)
Here we make a direct comparison between a simple random hyper-heuristic (SRHH), which makes a uniform random choice of low-level heuristic at each step, and the adaptive choice-function-based hyper-heuristic (CFHH). Both SRHH and CFHH start from the solution produced by FTF, following the results presented in the previous subsection. The results (best and average over 10 runs) are given in Table 3 for all 12 instances. This table shows the three distance-based measures as before (Total_D, MDD and AVG_MDD). Each measurement is followed by a column showing the percentage improvement in that measurement compared to the initial solution constructed by FTF, shown earlier in Table 1; where this percentage is negative, the solution quality by that metric is worse than the initial solution. The last row of each set of results gives the average percentage improvement achieved by SRHH and CFHH for each measurement over all instances. From Table 3 we can see that both SRHH and CFHH improved the initial starting solution in terms of Total_D for all instances, and that CFHH improves all three measures on average over the 12 instances. This is likely to be due to the rationale behind the proposed low-level heuristics Domino, Pair and Join, which minimise the maximum distance between two tasks in a sub-region, subsequently minimising the overall distance of a solution by reassigning outlying tasks to a better sub-region. These heuristics intensify the search by focusing only on minimising total distance, in order to provide a better solution. The Interchange heuristic attempts to intensify the search in the same way as the previous three, minimising the total distance while leaving the balance of the task allocation unchanged. The Balancing heuristic takes only the balancing of sub-regions into account.
The effect of this heuristic is to diversify the search, in order to avoid becoming trapped in a local optimum, although it can also exploit the search space if it leads to a solution with a lower total cost than the previous one. The obtained results indicate that, although the effects of these methods depend strongly on when and for how long they are applied within the framework, they are able to explore different areas of the search space effectively.
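The core move shared by the distance-minimising heuristics, i.e. reassigning an outlying task to a better sub-region, can be sketched as below. This is an illustrative single move in the spirit of the Domino heuristic, not the paper's exact implementation; as in the conclusions, an outlier is taken to be a task far from its assigned crew member's home location, and Euclidean distance is assumed.

```python
from math import hypot

def reassign_worst_outlier(tasks, crew, assign):
    """One Domino-style move (sketch): find the task furthest from its
    assigned crew member's home location and hand it to the crew
    member whose home location is closest to it.

    Returns a new assignment; the input assignment is left unchanged.
    """
    dist = lambda a, b: hypot(a[0] - b[0], a[1] - b[1])

    # The outlier: the task furthest from its own crew member's home.
    outlier = max(assign, key=lambda t: dist(tasks[t], crew[assign[t]]))

    # Move it into the sub-region of the nearest *other* crew member.
    new_crew = min((c for c in crew if c != assign[outlier]),
                   key=lambda c: dist(tasks[outlier], crew[c]))
    new_assign = dict(assign)
    new_assign[outlier] = new_crew
    return new_assign
```

A move like this monotonically targets distance only; it is the Balancing heuristic's job, as described above, to counteract the workload skew such moves can introduce.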
The only exception is that SRHH could not improve the MDD measurement on average across all instances (−1.74% and −3.64% for the best and average results, respectively). This is likely to be due to the lack of a learning mechanism to guide this hyper-heuristic, leading to an imbalance between intensification and diversification when traversing the search space. Despite this, the improvement yielded on all instances in Total_D and AVG_MDD still indicates an overall improvement over the quality of the initial solutions.
Comparing the best values obtained over all 12 instances, CFHH yielded improvements of approximately 12.75%, 14.35% and 2.04% in Total_D, MDD and AVG_MDD, respectively. SRHH improved only Total_D and AVG_MDD, by 5.32% and 7.53% respectively; in terms of MDD, a deterioration in quality is observed on average. For the average values obtained, CFHH achieved roughly 10.50% in both Total_D and AVG_MDD and 2% in MDD, while SRHH improved the initial solutions by approximately 3.1% in Total_D and AVG_MDD, the only two of the three measurements it improved.
Since we use the same low-level heuristics in both frameworks, the difference in performance of CFHH compared to SRHH is likely to be due to the self-adaptive nature of the hyper-heuristic, appropriately controlling the amount of exploitation/exploration by adjusting the parameters α, β and γ in every iteration. In SRHH, meanwhile, choosing the low-level heuristic randomly may lead the solution into an area of the search space from which it is difficult to move quickly to another area. For instance, applying the low-level heuristics which only pay attention to minimising distance and not workload balancing, such as Domino, Pair or even Join, might lead the search to an area of very high quality in terms of overall total distance and maximum distance but very low quality in relation to balancing. In this situation, moving the search back towards a balanced solution might incur a penalty in terms of the objective function value.

Fig. 5. Compactness of solutions generated by FTF, and following improvement by CFHH and SRHH.

Compactness validation
As mentioned earlier, the framework presented in this paper is used to partition the maintenance tasks within the Danish railway system, allocating a set of maintenance tasks to a set of maintenance crew members. This phase takes place before maintenance planning in the ERTMS signalling system. In this way, the system attempts to ensure that no distant tasks are assigned to any crew member in the scheduling phase. In any scheduling problem, the main objective is to minimise total cost (i.e. a weighted function of the number of routes and their length) and to ensure that all tasks are completed. Therefore, the density of the tasks in each sub-region can affect the length of routes and subsequently the total cost in the scheduling phase.
To calculate the cohesion of the sub-regions, in addition to the problem-specific measurements reported above, we calculate the compactness validity factor, a well-known measurement in the literature (Tan et al., 2013). Compactness measures the cohesion of the objects in a cluster by mean normalised variance, indicating how well data points are clustered in terms of object homogeneity; in other words, this index decides whether or not a given subset is internally dense. Essentially, the higher this value, the lower the average cohesion of the cluster. In the compactness formula, C is the compactness value for the clusters, which is to be minimised, K is the number of clusters, N is the number of tasks, P is the partition matrix, where P_i,k specifies whether task X_i is in cluster k, and μ_k is the centre of cluster k.

Fig. 5 presents the comparative results of the compactness measurement for the initial solution obtained using FTF, and after applying CFHH and SRHH as above. The compactness of the solutions obtained by SRHH and CFHH is shown as a ratio of their compactness measurement to that of the initial clustering result (FTF). As a lower compactness measurement indicates denser clusters, it is evident that CFHH generates sub-regions that are much more compact than those of SRHH and the initial solution generated using FTF. On average across all instances, CFHH improves the compactness of the initial solution by approximately 31%, while SRHH improves it by 9.30%. One anomaly is the performance of SRHH on the R100 instance, where it cannot improve the compactness of the initial solution, obtaining a compactness factor that is approximately 2% worse. This outcome is not unanticipated, however, as SRHH also generated the worst result for R100 in terms of average maximum distance (−8.24%) in Table 3, as highlighted earlier.
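The compactness formula itself does not survive in the text. A standard form of this cluster-cohesion measure, consistent with the symbols defined above and offered as a plausible reconstruction rather than the authors' exact definition, is:

```latex
C \;=\; \frac{1}{K} \sum_{k=1}^{K}
        \frac{\sum_{i=1}^{N} P_{i,k}\,\lVert X_i - \mu_k \rVert}
             {\sum_{i=1}^{N} P_{i,k}}
```

i.e., the mean, over the K clusters, of the average distance from each cluster's tasks to its centre μ_k; lower values indicate denser, more cohesive sub-regions.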

Detailed low-level heuristic performance
To assess the impact of the different low-level heuristics during a run, Table 4 gives the number of calls to each low-level heuristic made by CFHH during the first 100 and last 100 iterations, for the run in which the best solution was found for each of the largest instances (E5000, M5000 and R5000).
From the number of calls during the first 100 iterations, it is clear that in the early stages of the search, the low-level heuristics are selected with much more uneven frequency than in the last 100 iterations. It is interesting that during the first 100 iterations, Domino is selected most often (83, 60 and 50 times) and Balancing least often (2, 3 and 1 times) across the three instances. This indicates that the hyper-heuristic recognises the low-level heuristics which intensify and diversify in terms of minimising distance, even in the early stages of the search. Applying the Domino heuristic, which only improves total distance, is still an indicator of greedy behaviour in the framework at this point in time. Interestingly, the Pair heuristic is selected far more often for the Random instance than for the Exact instance, indicating that different low-level heuristics are more or less effective depending on the type of instance being solved. This provides some justification for using a hyper-heuristic approach, mixing multiple low-level heuristics as appropriate during a particular search.
From the last 100 calls, it is noticeable that the calls are spread more evenly over the low-level heuristics as the search progresses. This suggests that less improvement is being made towards the end of the search: if no improvement is found for a large number of iterations, then the only component that contributes to the choice function score is f3 (the time since a heuristic was last called). As such, the choice function behaves more like a simple random hyper-heuristic when fewer improvements are made.
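This behaviour can be sketched with a minimal choice function in the style of classic choice function hyper-heuristics: score(h) = α·f1(h) + β·f2(h) + γ·f3(h), where f1 is a heuristic's recent individual performance, f2 its performance following the previously applied heuristic, and f3 the time since it was last called. The decaying feedback rule below is illustrative; the paper's actual adaptive scheme for α, β and γ is not reproduced here.

```python
class ChoiceFunction:
    """Minimal choice-function selector (sketch).

    When recent improvements exist, f1/f2 dominate and the improving
    heuristic keeps being selected; when improvements dry up, f1/f2
    decay and the f3 (time-since-last-call) term dominates, spreading
    calls across heuristics much like a random selector would.
    """
    def __init__(self, heuristics, alpha=0.5, beta=0.5, gamma=0.5):
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.f1 = {h: 0.0 for h in heuristics}        # individual performance
        self.f2 = {h: 0.0 for h in heuristics}        # pairwise performance
        self.last_call = {h: 0 for h in heuristics}   # iteration of last call
        self.step = 0

    def select(self):
        """Pick the heuristic with the highest choice-function score."""
        self.step += 1
        score = lambda h: (self.alpha * self.f1[h]
                           + self.beta * self.f2[h]
                           + self.gamma * (self.step - self.last_call[h]))
        return max(self.f1, key=score)

    def feedback(self, h, improvement):
        """Record the improvement achieved by heuristic h (decay + reward)."""
        self.last_call[h] = self.step
        self.f1[h] = 0.5 * self.f1[h] + improvement
        self.f2[h] = 0.5 * self.f2[h] + improvement
```

With zero improvements fed back, `select` degenerates into a least-recently-called rotation, which is consistent with the even spread of calls observed in the last 100 iterations.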
In Table 5, we show the proportion of calls to each heuristic over the full run of the same examples as above, with the relative rank of each low-level heuristic given in brackets. Note that these percentages have been rounded to 1 decimal place and, as a result, may not sum to exactly 100%.
From the overall ratio of calls we see that, in general, across the three instances, the Join and Interchange heuristics appear among the top two heuristics, whereas the Balancing heuristic is always selected the least often. Join and Interchange explore the solution space in slightly different ways compared to the other low-level heuristics. Join is the only low-level heuristic that tries to minimise total distance not by dealing with outliers but by joining close tasks from different sub-regions. There may be many close tasks belonging to different sub-regions which can be joined into the same sub-region to improve the total distance in different ways. This is particularly important when the hyper-heuristic cannot improve the solution by dealing with outliers alone, whether because the best assignment for an outlier is its current sub-region or because the search is stuck in a local optimum. Interchange is designed in a way that not only improves the solution without being limited to dealing with outliers, but also takes care of the balance between sub-regions. The rank of the Balancing heuristic is perhaps not a surprise, as it does not attempt to minimise the total distance directly. However, the number of calls to this heuristic shows that the parameter γ has been appropriately controlled to explore the search space by calling the Balancing heuristic during the search, despite its potential poor performance in objective function terms.

Table 5. Percentage of calls (rounded to 1 d.p.) and relative rank of low-level heuristics selected by CFHH on the large instances E5000, M5000 and R5000.

Fig. 6. Trend of improvement of Total_D over a sample run of CFHH on instance E5000.

Trend of solution improvement during a run using CFHH
Figs. 6 and 7 show the trend of improvement for three different measures, using the run in which the best solution for instance E5000 was found by CFHH. The y-axis in Fig. 6 is the total driving distance cost (Total_D); in Fig. 7, it is the maximum distance between two tasks allocated to a single crew member (MDD, red plot) and the average of the maximum distances obtained by all of the crew (AVG_MDD, green plot), over the iterations. Only the trend of one instance is investigated, because the heuristics selected by CFHH showed almost the same trend for all large instances in the previous subsection.
It is evident that CFHH shows an overall trend of improvement in terms of minimising total distance throughout the run. In the early iterations, CFHH improves the initial solution quickly; however, the best solution fluctuates between iterations 1000 and 4000. One possible explanation is the punishment of the Balancing heuristic after each call, since whenever it is applied it incurs a large penalty in terms of total distance. This could be mitigated by considering the balance of the solution as an objective in its own right, instead of calculating only the penalty of an increase in total distance. In this way, Balancing could be called more often, leading to less fluctuation in solution quality than in the current trend. It is notable that the performance stabilises after approximately half of the iterations have passed. Similarly, the average maximum distance (AVG_MDD) in Fig. 7 (green plot) shows the same trend, with a significant drop in the early iterations, followed by a fluctuation, finally remaining steady with only marginal changes in the latter stages.

Fig. 7. Trend of improvement of MDD (red) and AVG_MDD (green) over a sample run of CFHH on instance E5000.
In contrast to Total_D and AVG_MDD, the maximum distance (MDD) plot (red plot in Fig. 7) fluctuates more in the second half of the search than in the early stages. This indicates that the low-level heuristics can be combined to improve all of the embedded factors (minimising total distance, minimising maximum distance and balancing the sub-regions) over time, with the hyper-heuristic adapting appropriately through the parameters α, β and γ.

Conclusions
In this study, we have proposed a perturbative hyper-heuristic framework using choice function heuristic selection, which improves the allocation of maintenance tasks to a set of crew members in the Danish railway system. Our framework generates a set of sub-regions of maintenance tasks, with each sub-region representing the working area of a single crew member. It is desirable to minimise the distance between any two tasks in each sub-region, in order to ensure a fast response when recovering from failures. Using the concept of outliers (i.e. those tasks which are a long distance from the starting location of each crew member), tasks are reassigned to different sub-regions using one of five low-level heuristics, with the intention of reducing the maximum distance between two tasks within the same sub-region.
An adaptive choice function hyper-heuristic has been developed to search the space of low-level heuristics. Once an appropriate allocation of maintenance tasks has been decided, the sub-regions can be passed on to a routing algorithm to decide the individual routes each crew member should take. Our results show that higher quality initial solutions do not always lead to higher quality solutions following improvement by the hyper-heuristic; using initial solutions of slightly lower quality does not restrict the search to particular regions of the search space, allowing hyper-heuristics to traverse the search space with more flexibility. The adaptive choice function hyper-heuristic (CFHH) was shown to learn which heuristics to apply at a given stage of the search, balancing intensification and diversification, and outperforming the simple random hyper-heuristic (SRHH). The solutions obtained using CFHH were demonstrated to have a high degree of cohesion in terms of the compactness ratio, a desirable property in preparation for the subsequent routing phase. Future work will seek to link the clustering phase addressed in this paper to the scheduling phase, where the sub-regions defined here will be used to schedule and route individual crew members.