Cockpit crew pairing Pareto optimisation in a budget airline

Crew pairing is the primary cost checkpoint in airline crew scheduling. Because crew costs are second only to fuel costs, effective crew pairing can yield substantial savings. In this paper, the cockpit crew pairing problem (CCPP) of a budget airline was studied. Unlike the conventional CCPP, which focuses solely on the cost component, several further objectives deemed no less important than cost minimisation were also taken into consideration. The adaptive non-dominated sorting differential evolution III (ANSDE III) algorithm was proposed to optimise the CCPP against many objectives simultaneously. The performance of ANSDE III was compared against the NSGA III, MOEA/D, and MODE algorithms under several Pareto optimal measurements, where ANSDE III outperformed the others in every metric.


Introduction
Airline scheduling is one of the most challenging planning processes encountered by the airline industry. Due to its complexity, four main aspects of airline scheduling are often sequentially optimised: the flight schedule, fleet assignment, aircraft routing, and crew scheduling problems (Gopalakrishnan & Johnson, 2005). The result of each problem becomes the input for the next. To enhance their competitive advantage, airlines must optimise crew scheduling, since it is under their control and total crew costs (e.g. benefits, salary, and top-up payments) are the second-biggest operating expenditure for airlines after fuel costs. Even a marginal percentage saving from well-considered scheduling management can yield substantial annual returns (Anbil et al., 1991). Airline crew scheduling consists of two sub-problems, crew pairing and crew rostering, which are normally solved successively. The former determines a minimum-cost set of flight sequences covering a predefined set of flight legs, as prescribed in the airline's timetable for a given period, while abiding by the diverse and complex legal restrictions enacted by federal safety regulations and company agreements. In this context, a pairing is a crew itinerary, or tour of duty, to be followed by a crew member. The latter assigns named crew members to the optimised crew pairings obtained from the previous step; the assignment priority often depends on seniority and crew preferences. This paper focuses on the former problem. Crew members in an airline are classified as cockpit and cabin crews (Graves et al., 1993). Cockpit crews (e.g. captain and first officer) are licensed to fly only a specific aircraft type, depending on their current licenses and certifications. Moreover, different categories of cockpit crews may be governed by dissimilar permissible restrictions.
In contrast, cabin crews (e.g. stewards and stewardesses) may provide services on more than one type of aircraft since their licenses are less restrictive. In this paper, we confined our research scope to the cockpit crew, hereafter called the cockpit crew pairing problem (CCPP), since airlines pay substantially higher wages and fringe benefits to the cockpit crew than to the cabin crew. Due to the restrictions in pilots' licenses, the CCPP is optimised only for an individual fleet type. Besides the economic considerations, an optimised CCPP would increase cockpit crew utilisation and reduce time wasted for reasons other than flying (Zeren & Özkol, 2016).
Although the crew pairing problem (CPP) is not new, the following reasons keep it a continuing challenge for academia and practitioners. A tremendous number of pairing combinations are possible for most airline companies. Multifaceted and nonlinear functions have to be used to formulate crew costs with a minimum guaranteed pay. Numerous regulations and practical restrictions (e.g. international safety regulations, work rules, labour laws, and airline-specific rules) must be obeyed for feasible pairings. Moreover, as more and more people travel by plane, air-travel demand (and hence problem size) grows rapidly alongside expansions of flight routes and frequencies. Airlines acknowledge that these factors make manual generation of crew pairings ineffective. Operations research techniques have been used to solve the CPP since the 1950s (Barnhart & Talluri, 1997). Set covering and set partitioning methods have often been exercised in the study of the CPP. Since the CPP falls into the NP-hard class of combinatorial optimisation problems, these techniques are unable to effectively optimise real-life instances in reasonable computation time (Deveci & Demirel, 2018). To help find good solutions, many metaheuristics have been developed recently. Most research has focused only on the single objective of minimising the crew pairing cost (CPC). However, other objectives that are no less important than the cost-related issue, such as workload balancing, should also be taken into account when forming crew pairings. Hence, the objective of this paper was to propose a new approach to solve the multi-objective (MO) CCPP.
The remainder of this paper is organised as follows. A survey of the CPP literature is presented in Section 2. Section 3 explains the background of the CPP in a budget airline case study in Thailand, and the methodology to optimise the MOCCPP is proposed in Section 4. The experimental cases and benchmarked algorithms are described in Section 5, while the experimental results and analysis are discussed in Section 6. Finally, Section 7 summarises the conclusions of this study.

Literature survey
A crew pairing consists of a set of flight legs within the same fleet, operated by the same crew and organised in sequential order, with the same crew base as the starting and terminating point. During the last three decades, the CPP has continued to be challenging due to the size and complexity of the problem. A survey of the CPP discussing models and solution approaches has been presented (Gopalakrishnan & Johnson, 2005). Lavoie et al. (1988) developed a method to solve the CPP in two steps. First, a set covering problem represented by a 0-1 integer linear program was used to formulate the CPP with the aim of minimising the total CPC. Then, the continuous relaxation of the problem was solved by the column generation (CG) method. This approach was applied to real long- and medium-haul networks of Air France, with 4 to 5% average cost savings being found over the best manual computations exercised by experts. Barnhart et al. (1995) proposed a deadhead selection heuristic to decrease the extended rest periods of crews, leading to an overall cost reduction in the long-haul international CPP. The profiles of the arrival and departure times at each station, which were used in successively pricing out potential deadhead flights, were the dual solutions obtained from solving the linear programming relaxation. A substantial reduction in the CPC (around $5 million yearly) resulted from applying the proposed mechanism. Desaulniers et al. (1997) used an integer, nonlinear multi-commodity network flow model with supplementary resource variables to formulate the CPP to minimise the CPCs. Integer solutions of the problem were derived from the branch-and-bound (B&B) algorithm, in which the concept of the Dantzig-Wolfe decomposition technique was applied. The proposed method was tested on the medium-haul networks of Air France and was found to be much more effective than the expert system. Anbil et al.
(1998) applied the sprint method to solve linear programming relaxations and a branch-on-follow-on tactic in the B&B approach to significantly shorten the solution time in reaching approximate solutions of the CPP. Data from Southwest Airlines and US Airways were then used to test the proposed approach, where the CPU times were reduced dramatically compared with the dual simplex algorithm. Klabjan et al. (2001) proposed an algorithm for the CPP that was composed of two phases. The linear programming relaxation was solved to obtain quasi-optimality in the first phase; then an integer solution with the greatest improved cost was heuristically attained in the second phase. Overall, the algorithm was claimed to perform better than the current practice. AhmadBeygi et al. (2009) applied a network flow concept in formulating a mixed-integer programming model for the CPP. Marker and connection variables were employed to model the non-linear cost function and the complicated restrictions on crew-to-flight assignments. Under this approach, the 143- and 329-flight networks operated by a major US carrier could be solved by commercial solvers. Liu et al. (2009) created an opportunity to explore more optimal solutions of the CPP by applying the permutation-based mode in the formulation stage. The problem was then solved by an inequalities-based MO genetic algorithm (GA). Several objectives were formalised through mathematical equations. The effectiveness of the proposed method was tested with real data provided by a Taiwanese airline. Souai and Teghem (2009) proposed an integrated model to solve both the airline crew pairing and rostering sub-problems simultaneously. A hybrid GA was applied to optimise the bi-objective problem hierarchically, minimising both the total cost and the deviation between the real and average flying times of any crew member. Three real airline cases were used to test the performance of the proposed algorithm.
Zeren and Özkol (2012) applied a GA with a perturbation genetic operator (GO) to solve the CPP with multiple crew bases and deadhead flights. The test data were obtained from Turkish Airlines. The proposed algorithm was compared with four other approaches and was found to be effective, with a fast convergence time, and could reduce deadhead flights. Reisi and Moslehi (2013) addressed the daily CCPP, in which deadheading was not permitted. Based on the shortest path with resource constraints, two approaches were developed to solve the CG sub-problem, namely the SPRCD and SPRCF. Data from two major Iranian airlines were employed to test the performance of the proposed approaches against an existing method (MKA). The solution times of both the SPRCD and SPRCF approaches were substantially lower than those of MKA. Aydemir-Karadag et al. (2013) developed two algorithms for solving the daily CCPP, i.e. a knowledge-based random algorithm (KBRA) and a hybrid algorithm (HA). These two algorithms were compared with the CG technique using hypothetical problems with two home bases. The average crew schedule costs produced by the HA were the same as or close to those from the CG approach, and better than those of the KBRA. However, the CPU times of the HA were substantially shorter than those of the CG. Zeren and Özkol (2016) proposed a new strategy to solve the CCPP that consisted of new CG, pairing enumeration, and pricing network methods. The proposed strategy was evaluated with data from Turkish Airlines. Besides obtaining results comparable with the current practice, it successfully reduced the number of deadheads and international overnights (Zeren & Özkol, 2016). Demirel and Deveci (2017) developed a dynamic-based GA for the medium-scale CPP. The deadhead-minimising search and partial-solution search approaches were also proposed to effectively improve the quality of pairings. The proposed approach found competitive results compared with the current method.
Deveci and Demirel (2018) developed a memetic algorithm (MA) with hill-climbing and local optimisation mechanisms, along with two variations of the GA, to solve the monthly CPP of a Turkish domestic airline. The proposed MA obtained lower CPCs compared with the two GAs. Aggarwal et al. (2019) presented a customised GA in which the initialisation mechanism and GOs were improved to increase the exploration of the search space. The proposed approach targeted optimising the CPC and reducing deadheads and overnight rests in medium-scale US airline networks. The performance gap between the proposed algorithm and the CG optimiser was observed to be around 11%.
From the literature survey, the CPP was shown to be a complex problem due to the non-linear CPC function. Most papers have attempted to optimise only a single objective function. Since the CPP is NP-hard, conventional analytical approaches have been unable to optimise practically sized problems. As a result, various heuristics based on mathematical relaxations and evolutionary algorithms (EAs), such as the GA, have been proposed to find good and acceptable solutions. However, apart from the CPC, which is the most addressed objective, other objectives also play a significant role in the airline industry, making MO optimisation of the CPP a more sensible model. Moreover, many novel MOEAs have emerged recently, enabling the Pareto optimal solutions of the MOCCPP to be found effectively, which has greater practical value.

Problem definition
The case-study airline is a budget carrier in Thailand that was registered on the Securities Exchange of Thailand nearly a decade ago. Although the airline is a public company limited with many public and private joint-venture investors, its major shareholder is the biggest national airline enterprise. The mission of the airline is to be the first choice among budget national airlines by striving to understand passengers' demands and needs as well as delivering services beyond their expectations. The airline offers nearly 30 flight routes, most of which are operated domestically. Nearly 700 round-trip flights are scheduled weekly using Boeing 737-800, ATR72-500, and Bombardier Q400 aircraft. The main operations base is located at Don Mueang international airport. For a better understanding of the airline context, some terminologies often used in addressing the CCPP need to be articulated before proceeding with the remaining discussion (see Fig. 1 for pictorial expressions of these terminologies).
Flight leg (segment): A one-way nonstop flight of the aircraft, consisting of take-off at the departure airport, flying to the destination, and landing at the destination airport.
Block hour: The interval from the instance that the doors of the aircraft close at the departure airport until the instance that the doors of the aircraft open at the arrival gate of the destination airport after its landing.
Duty period: A sequence of flight legs with short rest periods (sit times) interposed between them. In addition, brief (preparation) and debrief (end) periods are placed before the first flight leg and after the last flight leg of the duty period, respectively.
Sit time: The interval between two successive connected flight legs in a duty period.
Brief: The interval before commencing the first flight leg of the duty to prepare crews for the duty.
Debrief: The interval after completing the last flight leg of a duty period to prepare the next flight crews.
Crew base: The airport where crews are stationed and also the location where the duty period begins and ends.
Rest period: The interval between two successive duty periods (sometimes called overnight or layover rest, or overnight or layover connection). Within the rest period, crews are free from all sorts of duties.
Pairing: A sequence of one or more duty periods to be flown by a crew member, starting and ending at the same crew base, often with one or more rest periods interposed between them.
Flying time: The total hours of actual flying time.

Time away from base (TAFB): The elapsed time of a crew from departure until return to the same crew base, which includes the overnight rests between duty periods in a pairing, i.e. the total duration of a pairing.

Fig. 1. A crew pairing example
A legal crew pairing must not violate local and international safety regulations and must comply with the company's agreed operational rules. The rules and regulations of the case-study budget airline, stemming from the Civil Aviation Authority of Thailand (CAAT), the International Civil Aviation Organisation (ICAO), and the airline's labour unions, are listed as follows.
• The CCPP in this study is prepared weekly, since the flight schedule of the case-study airline recurs every week.
• Only one crew base, located at Don Mueang international airport, is in operation.
• The policy of the case-study airline permits minimal deadheading to save crew costs.
• In a duty period, any two successive flight legs must be organised such that the destination of the previous flight is the same as the departure of its direct consecutive flight.
• The brief and debrief times take 60 and 30 minutes, respectively.
• At most six flight legs can be assigned in a crew pairing.
• At least 30 minutes of sit time must be available between two consecutive flights.
• The maximum duration of a duty period each day depends on the duty start time and the number of flights in the duty period, as shown in Table 1.
• The accumulated duration of the duty periods for which each cockpit crew is responsible must not exceed 34 h within every 7 consecutive days, 110 h within every 28 consecutive days, and 1000 h within every 365 consecutive days.
• The minimum rest period given to a cockpit crew after completing each duty depends on the duty flight time, as shown in Table 2.
Due to the cost-saving strategy of the budget airline, aircrew scheduling is conducted manually by an experienced planner using Excel. The objective of the CCPP is to determine a set of crew pairings with minimum costs that encompasses all flight legs laid down in the airline's timetable while satisfying all of the airline's restrictions. The input data of the CCPP comprise the set of flight legs defined in the airline's timetable, fleet routing, costs of pairings, and legitimacy constraints. However, the identities, ranks, availability, and combinations of different crew members are not considered in the CCPP. The following is the manual planning procedure of the case-study airline to derive appropriate cockpit crew pairings (CCPs).
• Data preparation: The flight legs, flight numbers, departure and destination airports, regions of the country to fly, flying distances (in nautical miles), departure and arrival times, block hours, and daily flight patterns in each week are updated and prepared by the planner.
• Determination of the CCP: The steps to manually plan CCPs with Excel are as follows.
a) Sort all flights in ascending order of their departure times and initialise the pairing number. The first pairing number is O1, followed by O2, O3, etc.
b) The first flight in any pairing must depart from Don Mueang international airport (the crew base).
c) Check whether the flight list is empty. If it is, the CCPP process ends; otherwise, go to d).
d) Select the flight (strictly considering the regulations of the CAAT and ICAO) with the earliest departure time from the list of flights whose departure airport matches the arrival airport of the previous flight. If feasible, add it to the current pairing and update the list (by deleting the selected flight from the list); otherwise, go to step f).
e) Check whether the number of flight legs in the current pairing exceeds six. If it does, stop adding flights to the current pairing and go to step f); otherwise, go to step c).
f) Increase the pairing number by 1 and go to step b).
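The manual procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: flights are `(flight_no, dep, arr, dep_time, arr_time)` tuples with times in minutes, a single base, and the six-leg cap plus a 30-minute sit time standing in for the full CAAT/ICAO legality checks; all names are ours.

```python
# Greedy sketch of the planner's steps a)-f). Assumptions: flights are
# (flight_no, dep_airport, arr_airport, dep_min, arr_min) tuples; the six-leg
# cap and minimum sit time stand in for the full set of legality checks.
BASE = "DMK"       # Don Mueang crew base
MAX_LEGS = 6       # at most six flight legs per pairing
MIN_SIT = 30       # minimum sit time between consecutive flights (minutes)

def build_pairings(flights):
    # a) sort all flights in ascending order of departure time
    remaining = sorted(flights, key=lambda f: f[3])
    pairings = []
    while remaining:                        # c) stop when the flight list is empty
        pairing, last = [], None
        for f in list(remaining):
            dep, arr, dep_t, arr_t = f[1], f[2], f[3], f[4]
            if last is None:
                if dep != BASE:             # b) first flight must leave the crew base
                    continue
            # d) departure airport must match previous arrival, with enough sit time
            elif dep != last[2] or dep_t - last[4] < MIN_SIT:
                continue
            pairing.append(f)
            remaining.remove(f)
            last = f
            if len(pairing) >= MAX_LEGS:    # e) cap of six legs per pairing
                break
        if not pairing:                     # no feasible flight left: avoid looping forever
            break
        pairings.append(pairing)            # f) open the next pairing number
    return pairings
```

For example, three chained legs DMK→CNX→DMK→HKT with legal sit times collapse into a single pairing.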
In this paper, besides the CPC, which is broadly used in the literature, other objectives deemed by the airline executives to be equally important were also considered and optimised simultaneously. Let p be the pairing number, with p ∈ {1, 2, …, NP}, where NP is the total number of pairings. The cost structure of pairing p, denoted as CPC_p and expressed in units of time, is computed as shown in Eqs. (1) and (2).
where d is the duty period index in pairing p; ND_p is the number of duty periods in pairing p; mg_pair is the minimum number of hours of guaranteed pay for each pairing; c_{p,d} is the number of hours in duty period d; TAFB_p is the time away from the crew base of pairing p; mg_duty is the minimum number of hours of guaranteed pay for a duty period; fly_{p,d} is the flying hours of duty period d; el_{p,d} is the elapsed time of duty period d; and 0 ≤ f_p, f_d ≤ 1 are the fractions of the elapsed times of the pairing and duty periods, respectively. The values of f_p and f_d can be varied according to the airline's agreements. Note that the CPC (in hours) can be converted to any currency simply by multiplying by a given currency factor. The objective to minimise the CPC is formulated as shown in Eq. (3).
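A plausible reconstruction of the max-type cost structure of Eqs. (1) and (2), following the standard non-linear guaranteed-pay form used in the CPP literature (the symbol names are ours and the exact contractual terms of the case-study airline may differ), is:

```latex
% Eq. (2): cost (in hours) of duty period d in pairing p
c_{p,d} = \max\bigl\{\, mg_{\mathrm{duty}},\; f_{d}\, el_{p,d},\; fly_{p,d} \,\bigr\}
% Eq. (1): cost (in hours) of pairing p
CPC_{p} = \max\Bigl\{\, mg_{\mathrm{pair}},\; f_{p}\, TAFB_{p},\; \sum_{d=1}^{ND_{p}} c_{p,d} \Bigr\}
```

Each duty is paid the greatest of its guaranteed minimum, a fraction of its elapsed time, and its flying time; the pairing cost repeats the same guarantee logic at the pairing level.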
The second objective concerns balancing the workload (i.e. TAFB) among different pairings. It is desirable that the TAFB (TAFB_p) is uniformly distributed across pairings. Let TAFB_avg be the average TAFB over all CCPs. Minimising the mean absolute deviation of the TAFBs among different pairings is then formulated as shown in Eq. (4).
The third objective attempts to avoid adding repeated flight legs with the same start and end destinations in each crew pairing. Apart from reducing the boredom of flying repeatedly on the same route, the cockpit crew can gain more experience and become familiar with diverse flight routes through this objective. Let rep_p be the number of repeated flight legs with the same start and end destinations in crew pairing p. Minimising the repeated flight legs is then given by Eq. (5).
The fourth objective attempts to balance the workload among different crew pairings, measured in the nautical miles accumulated over the set of flights in each pairing. Let nm_p and nm_avg be the total nautical miles of pairing p and the average total nautical miles over all pairings, respectively. Minimising the mean absolute deviation of the nautical miles among different pairings is then given by Eq. (6).
The last objective attempts to minimise the number of crew pairings. The total number of CCPs (NP) can be used to measure the minimum number of cockpit crews the airline needs to prepare (i.e. crew member capacity). A lower value of NP means higher utilisation of cockpit crews, making the wages and benefits paid by the airline more worthwhile. This is formulated as shown in Eq. (7).
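Consistent with the verbal definitions above, the five objectives can be written out as follows (a reconstruction of Eqs. (3)-(7); symbol names such as rep_p, nm_p, and the averages are ours):

```latex
\min f_{1} = \sum_{p=1}^{NP} CPC_{p}                                              \tag{3}
\min f_{2} = \frac{1}{NP}\sum_{p=1}^{NP}\bigl|\,TAFB_{p} - TAFB_{\mathrm{avg}}\bigr|  \tag{4}
\min f_{3} = \sum_{p=1}^{NP} rep_{p}                                              \tag{5}
\min f_{4} = \frac{1}{NP}\sum_{p=1}^{NP}\bigl|\,nm_{p} - nm_{\mathrm{avg}}\bigr|      \tag{6}
\min f_{5} = NP                                                                   \tag{7}
```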

ANSDE III
The use of EAs has become a de facto approach for successfully dealing with complex real-world applications with single and multiple objectives. However, when the number of objectives is higher than three, traditional EAs are an ineffective means of finding a Pareto front (PF) in which diverse non-dominated solutions (NDSs) are uniformly distributed and well-converged in high-dimensional space (2007). The reason is that the recombination and selection mechanisms of traditional EAs, which guide the population towards the true PF, substantially lose their effectiveness since most of the offspring solutions are NDSs. To alleviate these issues, EAs developed for optimising more than three objectives simultaneously, which emphasise improving the diversity preservation mechanism rather than the Pareto domination-based principle in searching for NDSs, seem to be a more suitable tool. In this paper, two such EAs, NSGA III (Deb & Jain, 2014) and MOEA/D (Zhang & Li, 2007), and an EA with an effective offspring generation mechanism, the differential evolution (DE) algorithm, were integrated to combine the strength of each algorithm into a single hybrid, namely the adaptive non-dominated sorting differential evolution III (ANSDE III) algorithm. The remarkable feature of NSGA III is its ability to search for NDSs that are evenly distributed, while MOEA/D and DE are capable of guiding the algorithm to converge towards good NDSs. The structure of ANSDE III is as follows.
NSGA III, a reference-point-based extension of NSGA II and the latest version of the non-dominated sorting GA (NSGA) family developed by Deb and Jain (2014), adopts the diversity preservation concept in its evolutionary process. Multiple prespecified reference points scattered over the whole search space are used in NSGA III to preserve diversity and direct multiple searches toward the goal positions along each direction. Due to its effectiveness, NSGA III has been widely accepted for optimising real-world MO optimisation problems (MOOPs).
The general framework of NSGA III is explained as follows (Ibrahim et al., 2016). At generation t, let P_t be the parent population of size N, Q_t be the offspring population of size N generated from P_t, S_t be the selected population, and E be the set of elite members.
1. Reference point construction: The reference points in NSGA III are used to warrant diversity in the offspring population. These points must be determined before commencing the evolutionary process of NSGA III. The simplex-lattice design, one form of mixture experimental design, is often adopted to systematically place uniformly distributed normalised weight vectors on the hyperplane. If the problem has M objectives and s equally spaced divisions along each objective, with one intercept on each axis, the total number of reference points is H = C(M + s − 1, s).
2. Initialisation: The initial population P_1 of size N is randomly created from the uniform distribution. Set S_t = ∅, E = ∅, and t = 1.
3. Offspring generation: Apply the selected GOs to manipulate the crossover and mutation processes to create the offspring population Q_t from P_t ∪ E.
4. Population combination: Form the combined population R_t of size 2N as R_t = P_t ∪ Q_t.
5. Non-dominated sorting: Remove all repeated solutions in R_t. Apply a non-dominated sorting technique to rank R_t into different levels of non-dominated fronts, i.e. F_1, F_2, …, F_k.
6. Population selection: Add the non-dominated fronts of R_t to the new population S_t, front by front, starting from F_1, then F_2, F_3, …, until ‖S_t‖ ≥ N, where ‖·‖ denotes the number of members in a set.
   a. If ‖S_t‖ = N, set P_{t+1} = S_t and go to Step 7.
   b. If ‖S_t‖ > N, do the following.
      i. Find the first front F_l that causes ‖F_1 ∪ F_2 ∪ … ∪ F_l‖ > N. All population members from F_1 to F_{l−1} become members of P_{t+1}; the remaining K = N − ‖P_{t+1}‖ members are selected from F_l.
      ii. Compute the normalised objective f̃_i(x) of each member x ∈ S_t for each objective i (i = 1, 2, …, M) using f̃_i(x) = (f_i(x) − z_i^min) / (z_i^nad − z_i^min), where z_i^min is the minimum value of objective i in S_t and z_i^nad is the nadir point.
      iii. Associate each population member x ∈ S_t with a reference point by selecting the reference point whose reference line is nearest to x.
      iv. Find the niche count ρ_j for each reference point j by counting the number of its associated population members already in P_{t+1}, excluding the population members in F_l.
      v. Identify the reference point ĵ that has the lowest value of ρ_j. If more than one reference point is eligible, choose any one of them at random.
      vi. If ρ_ĵ = 0 and:
         1. none of the members in F_l is associated with reference point ĵ, then no longer consider this reference point in this generation;
         2. only one member in F_l is associated with reference point ĵ, then add that member to P_{t+1} and increase ρ_ĵ by one;
         3. more than one member in F_l is eligible, then select the one with the nearest perpendicular distance from the reference line, add that member to P_{t+1}, and increase ρ_ĵ by one.
      vii. If ρ_ĵ > 0 and:
         1. only one member in F_l is associated with reference point ĵ, then add that member to P_{t+1} and increase ρ_ĵ by one;
         2. more than one selectable member is available, then apply random selection, add the chosen member to P_{t+1}, and increase ρ_ĵ by one.
      viii. Repeat steps v-vii K times (i.e. until the remaining members of P_{t+1} are filled up).
7. Update the elite members in E, set S_t = ∅, and set t = t + 1.
8. Repeat Steps 3 to 7 until the termination condition is reached.
The aforementioned steps of NSGA III are adopted as the main structure of ANSDE III. The offspring generation process of NSGA III uses GOs in the crossover and mutation executions to create high-quality and diverse solutions. However, Chandrasekar and Ramana (2012) reported that the offspring generation mechanism of the DE algorithm was more effective than the GA in searching for a global optimum. Moreover, Li and Zhang (2009) showed that in MOOPs, replacing the offspring generation process of the GA with DE in the basic structure of the MOEA could enhance its performance substantially. Consequently, this concept is implemented in ANSDE III, and hence the offspring generation illustrated in Step 3 of NSGA III is replaced with the mechanism of DE. DE was pioneered by Storn and Price (1997) to optimise complex continuous optimisation problems and is an instance of EAs renowned for effectively solving various practical engineering problems while being simple to implement. Similar to the GA, DE uses crossover and mutation mechanisms to create offspring. However, these two operators are applied in the reverse order in DE, i.e. mutation first, followed by crossover. Moreover, a real number is used in DE to represent each gene in the solution's chromosome, instead of binary (0, 1) values or positive integers as in the GA. Although various mutation and crossover techniques have been proposed to create tentative offspring in DE (Price et al., 2006), the DE/rand/1/bin strategy is used in ANSDE III due to its effectiveness and simplicity. The working process of this strategy is explained as follows. First, three solutions in the current population are chosen at random as parents. One of them is randomly assigned as the target vector (x_t), while the remaining two (x_a and x_b) are used for calculating the differential discrepancy.
A mutant vector (v) is created in the mutation step by adjusting the target vector with the weighted differential discrepancy, as shown in Eq. (8).

v = x_t + F (x_a − x_b),   (8)

where F ∈ [0, 1] is the weighted factor. The crossover starts immediately after the mutation is completed. During the crossover, the mutant vector (v) and the target vector (x_t) are combined at randomly selected crossover positions of the solution to form the trial vector u. The trial vector at position j, namely u_j, is created as shown in Eq. (9).

u_j = v_j if rand_j ≤ CR; otherwise u_j = x_{t,j},   (9)
where CR is the crossover probability and rand_j is a uniform random number in [0, 1]. If the trial vector and the target vector are identical after the crossover process is completed, one position j in the trial vector is randomly selected and the value of u_j is replaced by v_j. In addition, x_t is replaced with u if the trial solution dominates the target solution. The mating restriction and neighbouring-solution updating mechanisms of the MOEA based on decomposition (MOEA/D) proposed by Zhang and Li (2007) are also utilised in ANSDE III to increase its convergence speed. For a reference point r, its associated neighbourhood (of size T) comprises the T reference points having the shortest Euclidean distances from r. As a result, this process is added into Step 1 of the original NSGA III, which is renamed "reference point and neighbourhood construction". In Step 3 of the algorithm, the crossover process is limited to neighbours only. After completing the first iteration of the algorithm, each reference point will have some population solutions associated with it. Once the first solution (target vector) is randomly chosen, the remaining solutions joining the crossover process are the solutions associated with the neighbouring reference points of the reference point associated with the first solution. Since the solutions involved in the crossover process come from the current best neighbouring solutions of the first selected solution, a good offspring can be expected. The following steps proceed after the offspring solution is created. If the target solution dominates the offspring solution, the offspring solution is discarded from further consideration. If the offspring dominates the target solution, the target solution is replaced with the offspring and a reference point for the offspring solution is found. If the offspring solution and the target solution are non-dominated with respect to each other, both are kept and a reference point is assigned to the offspring solution.
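A minimal sketch of the DE/rand/1/bin operator described above. The parameter values are illustrative, and the forced `j_rand` position is one common way to guarantee the trial differs from the target (the text instead repairs identical trials after crossover):

```python
import random

def de_rand_1_bin(population, F=0.5, CR=0.9):
    """One DE/rand/1/bin offspring: pick three distinct parents, build the
    mutant v = x_t + F * (x_a - x_b), then binomial crossover with rate CR.
    Illustrative sketch; F and CR defaults are not the paper's settings."""
    target, a, b = random.sample(population, 3)
    n = len(target)
    mutant = [target[j] + F * (a[j] - b[j]) for j in range(n)]       # Eq. (8)
    j_rand = random.randrange(n)   # ensures at least one mutant gene survives
    trial = [mutant[j] if (random.random() <= CR or j == j_rand) else target[j]
             for j in range(n)]                                      # Eq. (9)
    return target, trial
```

In ANSDE III the three parents would be drawn from the neighbourhood of the target's associated reference point rather than from the whole population.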
To further enhance the global search of the algorithm, an adaptive mechanism is also embedded in ANSDE III. The status of the elite members in the external memory, which evolve over time, is monitored to identify whether the parameters of the algorithm need to be adjusted. Every G iterations, the alteration of the elite members is probed; the value of G is set to 10 in this research. The three main parameters of the algorithm that could pull the current solutions away from local optima are the weighted factor (F), the crossover probability (CR), and the neighbourhood size (T). If the elite solutions have remained unchanged over the past G iterations, the values of F, CR, and T are changed; otherwise, these parameters are kept at their original values. The value of F affects the resemblance between the target solution and the mutant solution, in that a higher value of F increases their discrepancies. The value of CR controls the similarity between the trial and target solutions: if the value of CR is low, the trial and target solutions will not be very different. In addition, the value of T controls the solution members eligible to join the crossover process; a high value of T allows solution members that are quite distant from the target solution to participate in the crossover. As a result, the values of F, CR, and T should be set high when high diversity in the solutions needs to be created (i.e. the elite solutions have not changed during the last G generations). Otherwise, these parameters are kept low when changes are detected in the elite members. This process is added at the end of Step 7 (i.e. the parameter adaptation mechanism).
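The adaptation rule itself reduces to a two-level switch. A sketch, with assumed (illustrative) low and high parameter levels:

```python
def adapt_parameters(elite_changed, low, high):
    """Parameter adaptation sketched from the text: every G generations
    the elite archive is inspected; if it has NOT changed, F, CR and the
    neighbourhood size T are switched to their high levels to push the
    search away from the local optimum; otherwise the low levels are kept."""
    return dict(low) if elite_changed else dict(high)

# Illustrative (assumed) low/high levels for F, CR and T.
low = {"F": 0.8, "CR": 0.7, "T": 10}
high = {"F": 1.0, "CR": 0.9, "T": 20}
```

Because the switch is re-evaluated every G generations, the algorithm returns to the exploitative low levels as soon as the elite archive starts moving again.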

Solution encoding
The position-based priority scheme is applied to represent the solution string (chromosome) of the CCPP. The length of the string is equal to the number of flight legs in the problem. The positions from left to right in the string indicate the assignment priority of the flight legs (i.e. the highest priority is given to the flight leg that is assigned in the left-most position). For example, if eight flight legs are considered as shown in Table 3, a possible solution string can be A1, A2, A3, A4, A5, A6, A7, and A8.

Solution decoding
The output derived after decoding the solution string is the set of physical pairings. The first-fit rule was applied to construct the pairings. Since the position-based priority scheme was implemented in the encoding process, the solution string was swept from left to right and the first flight leg that did not violate the regulations discussed in Section 3 was selected. The selected flight leg was assigned to the pairing and scheduled in the flight timetable. The first-fit assignment process was then repeated until no flight legs in the solution string were left. For the solution string exemplified in the encoding process (i.e. A1, A2, A3, A4, A5, A6, A7, and A8), the resulting pairings and their corresponding timetables are shown in Fig. 2.
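The decoding loop can be sketched as follows under one plausible reading of the first-fit rule, with a hypothetical `toy_feasible` check standing in for the regulatory restrictions of Section 3:

```python
def decode_first_fit(priority_string, legs, feasible):
    """Decode a position-based priority string into pairings with the
    first-fit rule: the highest-priority unassigned leg opens a pairing,
    then the string is repeatedly swept left to right and the first leg
    that passes the feasibility checks is appended, until no remaining
    leg fits; a new pairing is then opened.

    priority_string: leg names ordered by assignment priority
    legs:            maps leg name -> (departure, arrival) times
    feasible:        stand-in for the legal checks of Section 3
    """
    remaining = list(priority_string)
    pairings = []
    while remaining:
        pairing = [remaining.pop(0)]      # highest-priority leg opens a pairing
        added = True
        while added:
            added = False
            for i, leg in enumerate(remaining):
                if feasible(legs, pairing, leg):
                    pairing.append(remaining.pop(i))
                    added = True
                    break                 # restart the sweep from the left
        pairings.append(pairing)
    return pairings

# Toy feasibility rule: a leg may join only if it departs no earlier than
# the pairing's last arrival (real rules are far richer, cf. Section 3).
def toy_feasible(legs, pairing, leg):
    return legs[leg][0] >= legs[pairing[-1]][1]

legs = {"A1": (8, 10), "A2": (9, 11), "A3": (10, 12), "A4": (12, 14)}
pairings = decode_first_fit(["A1", "A2", "A3", "A4"], legs, toy_feasible)
```

With these toy times, A1, A3 and A4 chain into one pairing while the overlapping A2 opens a second one.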

Experimental design
Three well-known algorithms developed for solving MOOPs were employed to compare their relative performances with ANSDE III: MOEA/D (Zhang and Li, 2007), NSGA III (Deb and Jain, 2014), and MODE (Xue et al., 2003). These algorithms were evaluated on the same testbeds through various instance problems; the details of the testbeds and the parameter settings for each algorithm are discussed as follows.

Instance problems
To conduct a rigorous performance test of ANSDE III, various problem instances were used as testbeds. The instances ranged from small (S) to medium (M) and large (L), based on the number of flight legs. Some data, particularly in the large instances, were obtained from the flight schedule of the case-study airline during the low and high travel seasons in Thailand. These data are based on the Boeing 737-800 fleet, which had the highest seating capacity and the largest number of aircraft. Although the small and medium instances are hypothetical, their data were generated at random based on truncated real data from the case-study airline. All instances used Don Mueang international airport as the crew base. The details of the testbeds are shown in Table 4.

Parameter setting
The algorithms that competed in finding non-dominated solutions (NDSs) for the MOCCPP comprised ANSDE III, NSGA III, MOEA/D, and MODE. To ensure a fair competition, the key parameters that drive each algorithm were identified. The initial values of these parameters were obtained from the literature. The design of experiments, based on the factorial design concept (Montgomery, 2017), was used to determine the best parameter settings for each algorithm. As mentioned by Jiang et al. (2014) and Lu et al. (2019), the generational distance (GD), inverted generational distance (IGD) and spacing (SP; sometimes called the spacing metric (SM) or symbolised as △) are the three major metrics for assessing the relative efficacy of different Pareto fronts (PFs) obtained by multi-objective evolutionary algorithms (NB: the descriptions of these and other metrics used in this paper are given in Section 5.3). We prioritised the GD first, then the IGD, and the SP last, since a better PF should lie close to, or even overlap with, the true Pareto front (TPF) and also be uniformly distributed. Each factor (parameter) was tested at two levels; as a result, a 2^k factorial design was applied to determine suitable values of each algorithm's parameters. A hierarchical Pareto performance metric satisfaction approach was taken in this matter. First, the results from the analysis of variance (ANOVA) on the GD were considered. If a significant difference was noticeable in the main effects or interactions (any p-value ≤ 0.05), the parameters were set to the treatment combination's values that minimise the GD. Otherwise (all p-values > 0.05), we moved from the GD to the IGD and applied the same approach to the IGD. The parameter settings would be based on the treatment combination that resulted in the lowest IGD if any statistically significant effect was revealed.
If no significant effect on the IGD could be observed from the ANOVA (all p-values > 0.05), the same approach was applied to the SP. Finally, if no significant effect was detected at all (all p-values > 0.05), the parameters were set to the levels that minimised the top-priority metric, the GD. For example, suppose the parameters (i.e. T and n_r) of MOEA/D for the small-sized problem (S2) needed to be established. Two levels of the factors T and n_r were experimented with, i.e. (10, 20) and (2, 5), respectively. First, the 2^2 factorial experiments were conducted to determine whether or not these settings in combination had any effect on the GD. The ANOVA revealed that the main effects and two-way interactions were not significant (all p-values > 0.05) at a 95% confidence level (α = 0.05). Figure 3(a) depicts the main effect and interaction plots for the GD. As a result, the next step was to conduct the experiments on the IGD. The ANOVA summarised that the main effect of T had a significant effect on the IGD (p-value ≤ 0.05). Since the lowest value of the IGD was desirable, the interaction plot (Figure 3(b)) was observed to determine the values of the T and n_r parameters, i.e. T = 10 and n_r = 5. Table 5 summarises the appropriate parameters for each algorithm, which were the results of applying the hierarchical Pareto performance metric satisfaction approach based on the 2^k factorial design analysis. The population size was 131, based on a simplex lattice design (Deb and Jain, 2014), and the number of generations to be evolved was 2,000 for all problems and algorithms. All algorithms were coded in MATLAB R2019a on a personal computer equipped with an Intel(R) Core™ i7-8550U CPU @ 1.8 GHz and 8.0 GB RAM, running the Microsoft Windows 10 Pro 64-bit operating system.
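The hierarchical metric-satisfaction procedure can be sketched as a short selection routine. The p-values and per-combination means below are illustrative stand-ins mimicking the MOEA/D (S2) example, not the paper's actual ANOVA output:

```python
ALPHA = 0.05

def choose_setting(anova, combos, priority=("GD", "IGD", "SP")):
    """Walk the metrics in priority order; at the first metric whose 2^k
    ANOVA shows any significant main effect or interaction (p <= 0.05),
    return the treatment combination minimising that metric.  If nothing
    is significant anywhere, fall back to minimising the top-priority
    metric.  `anova` maps metric -> p-values of effects/interactions;
    `combos` maps metric -> mean metric value per treatment combination."""
    for metric in priority:
        if any(p <= ALPHA for p in anova[metric]):
            return min(combos[metric], key=combos[metric].get)
    top = priority[0]
    return min(combos[top], key=combos[top].get)

# Illustrative data: GD shows no significant effect, IGD does, so the
# combination with the lowest mean IGD is chosen.
anova = {"GD": [0.30, 0.45, 0.61], "IGD": [0.02, 0.40, 0.55], "SP": [0.70]}
combos = {"GD":  {(10, 2): 1.2, (10, 5): 1.1, (20, 2): 1.3, (20, 5): 1.2},
          "IGD": {(10, 2): 0.9, (10, 5): 0.7, (20, 2): 1.0, (20, 5): 0.8},
          "SP":  {(10, 2): 0.5, (10, 5): 0.6, (20, 2): 0.4, (20, 5): 0.5}}
best = choose_setting(anova, combos)   # -> (10, 5), i.e. T = 10, n_r = 5
```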
Size     Instance   MOEA/D         NSGA III       MODE          ANSDE III                   Flight
                    T      n_r     p_c     p_m    F      CR     F        CR         T       legs
Small    S1         10     5       0.9     0.4    1      0.9    0.8, 1   0.7, 0.9   10      40
         S2         10     5       0.6     0.1    1      0.9    0.8, 1   0.7, 0.9   10      60
         S3         20     2       0.9     0.1    1      0.9    0.8, 1   0.7, 0.9   10      80
         S4         10     2       0.6     0.1    1      0.9    0.8, 1   0.7, 0.9   10      100
Medium   M1         10     5       0.6     0.1    1      0.9    0.8, 1   0.7, 0.9   10      120
         M2         10     2       0.6     0.1    1      0.9    0.8, 1   0.7, 0.9   10      140
         M3         10     2       0.6     0.4    1      0.9    0.8, 1   0.7, 0.9   10      152
         M4         10     2       0.9     0.1    1      0.9    0.8, 1   0.7, 0.9   10      200
Large    L1         10     2       0.9     0.1    1      0.9    0.8, 1   0.7, 0.9   10      282
         L2         10     2       0.6     0.4    1      0.9    0.8, 1   0.7, 0.9   10      298
         L3         10     2       0.6     0.4    1      0.9    0.8, 1   0.7, 0.9   10      320
         L4         10     2       0.6     0.4    1      0.9    0.8, 1   0.7, 0.9   10
(Note: T is the number of weight vectors in the neighbourhood, n_r is the maximum number of solutions replaced by each offspring, p_c is the probability of crossover, p_m is the probability of mutation, F is the scaling factor, and CR is the crossover rate.)

Performance measurements
Various Pareto-based metrics have been recommended in the literature to compare the relative performance of multi-objective optimisation algorithms in the Pareto sense. Some of them are identical, gauging the same things under different names; therefore, to demonstrate their linkages, particularly with our previous research (Chutima and Kirdphoksap, 2019), notes on synonyms are added where appropriate. In this paper, the metrics used to measure the performance of ANSDE III against NSGA III, MOEA/D, and MODE in optimising the MOCCPP include the generational distance (GD), convergence (C), inverted generational distance (IGD), ratio of NDSs located on the TPF based on its own PF (R_NDS), quality metric (QM; sometimes named the ratio of NDSs located on the TPF based on the TPF (R_TPF)), spacing (SP; sometimes symbolised △ or named the spacing metric (SM)), number of Pareto solutions (NPS; sometimes named the number of generated non-dominated solutions (NNS)), best frontier members (BFM), mean ideal distance (MID), rate of achievement to the objectives simultaneously (RAS), and diversification metric (DM). These metrics, especially the GD and IGD, are widely accepted as standard performance measurements of multi-objective EAs in optimising MO problems (Jiang et al., 2014 and Lu et al., 2019). The detailed formulations of these metrics are explained in (Chutima and Olarnviwatchai, 2018; Chutima and Kirdphoksap, 2019; Zade et al., 2014; Asefi et al., 2014). In this research, the TPF is approximated by merging all non-dominated solutions (NDSs) obtained from all algorithms, applying non-dominated sorting to all of them, and then selecting all NDSs on the first front as the TPF.
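The TPF approximation described above reduces to keeping the first non-dominated front of the merged solution sets. A minimal sketch (assuming all objectives are minimised):

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` under
    minimisation: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def approximate_tpf(fronts):
    """Approximate the true Pareto front as described in the text: merge
    the NDSs returned by all algorithms and keep only the points that no
    other merged point dominates (i.e. the first non-dominated front)."""
    merged = [p for front in fronts for p in front]
    return [p for p in merged
            if not any(dominates(q, p) for q in merged)]

# Example: two algorithms' fronts; (2, 3) is dominated by (2, 2) and drops out.
fronts = [[(1, 3), (2, 2)], [(3, 1), (2, 3)]]
tpf = approximate_tpf(fronts)
```

Note that `dominates(p, p)` is always False, so duplicate merged points are retained rather than eliminating each other.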

Results
To account for the stochastic nature of the contested algorithms, which rely on random number generators, all algorithms were run with three replicates. The best (minimum) values of the objectives obtained from the different algorithms were collected. Since some results are sensitive information of the case-study airline and are covered by the terms of a legal contractual agreement, the actual values cannot be disclosed to the public; however, their comparative values could be presented in proportion to each other for academic purposes. As a result, the data from ANSDE III were used as the base for the relative performance comparisons and were set equal to 1. The values of the objectives obtained from the other algorithms were derived by dividing the actual values of those algorithms by the actual values of ANSDE III. From Table 6, it is clear that ANSDE III always achieved the best values of the objectives (the lower the better). Meanwhile, MOEA/D and NSGA III could also reach the best value in several cases. However, the number of cases in which MODE gave the best value was very small compared with the other algorithms and was restricted to a few small-sized problems only. To evaluate the characteristics of the PFs obtained from the different algorithms in comparison with the TPF, several metrics were employed in this study, i.e. the GD, C, IGD, R_NDS, QM (or R_TPF), SP (or SM), NPS (or NNS), MID, RAS, and DM. Liu et al. (2020) proposed four key categories to measure the efficacy of metaheuristics in optimising discrete multi-objective problems in their recent state-of-the-art review, i.e. (1) quantitative, (2) convergence-related, (3) distribution-related and (4) comprehensive-related metrics. Consequently, the metrics used in this research were chosen so that each category had at least one member. The selected metrics were among those frequently used in the literature and fell into Liu et al. (2020)'s categories as follows: the NPS was the representative of the quantitative metrics; the GD, C, IGD, R_NDS, QM, and MID were the representatives of the convergence-related metrics; the SP and DM were the representatives of the distribution-related metrics; and the RAS was the representative of the comprehensive-related metrics. A short description of each metric is as follows. As a member of the quantitative measurements, the NPS counts the number of NDSs located in the first front (i.e. the Pareto front, PF) of the given algorithm. Regarding the convergence-related measurements, the GD and C are similar in denoting the closeness between the NDSs in the PF of the given algorithm and those in the TPF. However, these two metrics differ in that the GD uses the shortest distance between each NDS obtained from the given algorithm and the NDSs in the TPF in its computation, whereas the C uses the average distance between each NDS obtained from the given algorithm and those NDSs in the TPF. The R_NDS and QM demonstrate the ratio between the number of NDSs of the given algorithm that overlap with the NDSs of the TPF and, respectively, the number of its own NDSs and the number of NDSs of the TPF. The MID and RAS reveal the nearness of the obtained NDSs to the ideal points (i.e. the best value of each objective); while the MID measures the root mean of the squared differences, the RAS reports the mean absolute difference. Under the distribution measurements, the SP indicates the dispersal of the NDSs on the PF of the given algorithm as a mean absolute difference. Meanwhile, the DM demonstrates the boundary between the maximum and minimum values (range) of the NDSs in each objective obtained by the algorithm. Finally, the RAS, as a member of the comprehensive measurements, is capable of demonstrating both the diversity and convergence characteristics of the NDSs in the PF of the given algorithm. Note that lower values of the GD, C, IGD, SP, MID, and RAS are superior, whereas higher values of the R_NDS, QM, NPS, and DM are desirable.
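Exact formulations of these distance-based metrics vary slightly between papers. As an illustration only, a commonly used averaged-distance form of the GD and IGD (a sketch, not necessarily the precise formulation used here) is:

```python
import math

def _dist(a, b):
    """Euclidean distance between two objective vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gd(front, tpf):
    """Generational distance: average, over the algorithm's NDSs, of
    each point's shortest Euclidean distance to the TPF (lower is better)."""
    return sum(min(_dist(p, t) for t in tpf) for p in front) / len(front)

def igd(front, tpf):
    """Inverted generational distance: the same average taken from the
    TPF side, so it also penalises poor coverage of the TPF."""
    return sum(min(_dist(t, p) for p in front) for t in tpf) / len(tpf)
```

For example, a single obtained point sitting exactly on the TPF gives GD = 0, yet its IGD stays positive whenever other TPF points remain uncovered.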
Table 7 shows the relative performances of the different algorithms on the small-, medium- and large-sized problems (i.e. S, M and L) under the four key measurement categories. For the quantitative category, represented by the NPS, the values obtained from ANSDE III were always better than the others, indicating that a greater number of NDSs was created by ANSDE III in its PF. However, this metric could be deceptive, since it would be meaningless if most of the NDSs were not located close to the TPF. Hence, this metric needs to be construed in conjunction with other, more important measurements to describe the results rationally. Six metrics were used as representatives of the convergence-related category, i.e. the GD, C, IGD, R_NDS, QM, and MID. It was observed that ANSDE III achieved the lowest values of the GD, C, IGD, and MID, while having the highest values of the R_NDS and QM. Taking the results from the metrics in this category together with the previous category (NPS), it is obvious that ANSDE III not only reached a higher number of NDSs but also positioned its NDSs closer to the TPF than the others did. Based on the SP and DM (distribution category), which indicate how uniformly the NDSs obtained from a given algorithm are distributed, ANSDE III attained lower values of the SP and higher values of the DM than the others. This means the NDSs of ANSDE III were more uniformly distributed, with a greater range between the maximum and minimum values in each objective, than the others. Moreover, under the comprehensive category, represented by the RAS, which reflects the effectiveness in terms of the convergence and spread of the PF simultaneously, the results showed that ANSDE III outperformed the others, since its RAS was lower. To demonstrate the overall performance of the different algorithms based on the different problem sizes (i.e. S, M and L), the resultant metrics of the different algorithms were averaged and ranked accordingly.
For the different problem instances, the values of the GD, C, IGD, SP, MID, and RAS of ANSDE III were always the lowest, and its R_NDS, QM, NPS, and DM values were always the highest compared with the other three algorithms. This pattern of results also appeared in the average values obtained from ANSDE III regardless of the problem size (S, M, or L). In terms of comparative performance, the rankings from best to worst were as follows: ANSDE III > NSGA III > MOEA/D > MODE (note: the symbol ">" denotes "better than"). Since the number of outputs obtained from the compared algorithms was not high enough to validly assume that their distributions were normal, the non-parametric Kruskal-Wallis test of the equality of medians for the four algorithms (i.e. MOEA/D, NSGA III, MODE, and ANSDE III) under the different problem sizes and performance measures was performed. The hypotheses were stated as follows: (H0) the population medians are all equal, against (H1) not all medians are equal. The results confirmed that there were differences between the population medians (p-value ≤ 0.05), and ANSDE III always ranked first among the matched algorithms. Moreover, the results from the two-sample Wilcoxon rank-sum test of the equality of two population medians signified that ANSDE III commonly outperformed the other algorithms.
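The Kruskal-Wallis test pools all observations, ranks them, and rejects the equal-medians hypothesis when the between-group rank dispersion is large. A minimal H-statistic computation (ties are not handled; the four metric samples below are made-up for illustration):

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (assumes no tied values): pool and rank
    all observations, then H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1),
    where R_i is the rank sum and n_i the size of group i.  H above the
    chi-square critical value (7.815 for 4 groups at alpha = 0.05)
    rejects H0 (equal medians)."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    n = len(pooled)
    return (12.0 / (n * (n + 1))
            * sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
            - 3 * (n + 1))

# Illustrative (made-up) metric samples for four competing algorithms.
h = kruskal_h([1, 2, 3, 4, 5], [10, 11, 12, 13, 14],
              [20, 21, 22, 23, 24], [30, 31, 32, 33, 34])
```

Clearly separated samples such as these yield a large H, well above the critical value, mirroring the significant differences reported above.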

Conclusion
The MOCCPP was addressed in this paper. Apart from the cost-related objective, four additional objectives were optimised simultaneously, i.e. balancing the workload in terms of time away from base (TAFB) among different pairings, minimising repeated flight patterns, balancing the workload (nautical miles) among different pairings, and minimising the number of pairings. Since the problem belongs to the NP-hard class of combinatorial optimisation problems, ANSDE III, an EA that hybridises the NSGA III, MOEA/D, and DE algorithms with a simple adaptive mechanism, was proposed. The performance of ANSDE III was compared against its constituent algorithms (NSGA III, MOEA/D, and MODE) under convergence-related, spread-related, and mixed metrics. In all cases, ANSDE III performed better than the other algorithms, in terms of both the best values of the different objectives and the obtained PFs.