A Survey of Solution Methodologies for Exam Timetabling Problems

Exam timetabling is a prominent topic in academic administration management, as it ensures the effective utilization of resources and satisfies the requirements and preferences of stakeholders, which leads to a productive academic environment and contributes to the institution’s overall success. Given the myriad of solution methodologies explored across diverse exam timetabling problems and constraints, both in benchmark datasets and real-life cases over the last decade, a comprehensive survey is warranted. This survey paper aims to comprehensively describe the exam timetabling problem (ETP), including its variants, constraints, and benchmark datasets. We review methods for solving ETPs published from 2012 to 2023. These methods include mathematical optimization, heuristics, metaheuristics, hyper-heuristics, hybrid approaches, and matheuristics. Finally, we discuss the review findings and potential research directions. By doing so, we hope to facilitate a deeper understanding of ETP and offer valuable insights for future research.


I. INTRODUCTION
Educational timetabling represents a significant instance among challenging combinatorial optimization problems [89]. This intricate problem is conventionally classified into three principal types: school timetabling, course timetabling, and examination timetabling [1]. Each involves assigning events (e.g., meetings, exams, lectures, tutorials, classes) to limited resources (e.g., timeslots and rooms) while adhering to predefined constraints. Despite these similarities, a method that solves one problem type is not guaranteed to succeed on another. Consequently, each problem type has been addressed independently, and each instance may differ significantly in constraints and dimensions.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad J. Abdel-Rahman.
The exam timetabling problem (ETP) was defined by Qu et al. [2] as "assigning a set of exams into a limited number of timeslots (periods) and rooms (of certain capacity) subject to a set of constraints." Constraints can vary in definition or weighting based on institutional requirements. The scientific community has been studying the ETP since the 1960s, with an early survey conducted by [3] focusing on practical applications of graph coloring heuristics from 1964 to 1984. Carter and Laporte [4] extended the survey by categorizing algorithms into sequential, generalized search, cluster, and constraint-based methods, while Schaerf [1] reviewed the early approaches for resolving the ETP. Most algorithms primarily addressed the basic timetabling problem with minimal constraints during that period.
The area of timetabling problems, including exam timetabling, has a thriving research community, with notable biannual conference series such as Practice and Theory of Automated Timetabling (PATAT) dedicated to timetabling practices and their applications. Exam timetabling constantly evolves, leading to rapid developments in theory and practice. With more standardized benchmark datasets, researchers are exploring ways to place the timetabling problem in a real-world context. Some studies focus on the optimality gap when solving benchmark instances, while others prioritize the time to find a feasible solution.
Many studies have investigated educational timetabling; however, the breadth of the field restricts each to addressing specific problem subsets. In 2008, Lewis [70] focused his study on metaheuristic techniques, while Pillay [72] focused on hyper-heuristics within educational timetabling in 2014. In 2020, Bashab et al. [73] specifically concentrated on university timetabling using metaheuristic techniques. In 2022, Ceschia et al. [74] focused on benchmarks and state-of-the-art results in educational timetabling. Numerous survey papers in educational timetabling have delved into specific subdomains, such as course timetabling [87] and school timetabling [88], each with its associated methods [83], [84], [85], [86], respectively.
For dedicated surveys focusing on exam timetabling, Qu et al. [2] conducted a comprehensive study from 1995 to 2008, discussing key research achievements and trends in exam timetabling, encompassing algorithmic strategies, benchmark datasets, and emerging challenges. The survey by Bashar et al. in 2019 [5] exclusively concentrates on one formulation of ETP, namely the Uncapacitated ETP. Notably, the need for dedicated surveys addressing the exam timetabling problem is even more pronounced when compared to other educational timetabling subdomains. This scarcity encompasses the analysis of benchmark datasets or specific methods and the investigation of real-world problem constraints and requirements. Examining existing literature is crucial to discern gaps in exam timetabling research, considering substantial advancements in problem understanding and solution methodologies over the past decade.
This survey is essential for bridging the gap between benchmark datasets introduced over a decade ago and recent real-world cases, providing valuable insights to researchers, educators, and policymakers to navigate the evolving landscape of exam timetabling challenges. The contributions of this survey paper are:
• We present an overview of the terminologies, problem descriptions, variants, and constraints related to the ETP. The review outlines the commonly used ETP benchmark datasets, detailing their respective characteristics and state-of-the-art methods.
• We provide an up-to-date overview of recent solution methodologies employed in ETPs, encompassing both benchmark and real-world ETPs from 2012 to 2023. Furthermore, we have structured the article based on the methods and techniques in chronological order, aiding readers in comprehending the methodology timeline in the field. The tables and figures in this article categorize these methods, facilitating researchers in selecting categories of interest for further study.
• We discuss the classification of the recent methodologies used in the ETP field and propose future research directions.
The remainder of the paper is structured as follows: Section II outlines the scope of the reviewed papers. Section III defines the problem variants and constraints. Section IV classifies and offers an overview of solution methodologies for ETPs. The categorization of these solution methodologies is discussed in Section V. Section VI highlights trends in benchmark ETP. Section VII discusses potential future directions. The paper concludes with Section VIII.

II. SURVEY SCOPE
We opted to initiate the search period in 2012 since it marked the commencement of widespread publications on exam timetabling problems employing various solution methodologies. Bibliographic information was obtained from the Science Citation Index Expanded (SCIE) and Social Science Citation Index (SSCI) within the Web of Science Core Collection by Clarivate Analytics. The search involved looking for phrases related to exam timetabling in the topic field from 2012 to 2023 in the SSCI and SCIE databases. Publications were further refined by selecting articles that prominently featured the search keywords in their front page elements, including the article title, abstract, author keywords, and year published. The search used the following criteria:
Article title: "exam timetabling" or "examination timetabling" or "university timetabling" or "exam timetabling problem" or "exam scheduling" or "examination scheduling"; or
Abstract: "exam timetabling" or "examination timetabling"; or
Keywords: "exam timetabling" or "examination timetabling"; and
Year: "2012-2023"
The search (last accessed: January 9, 2024) returned 1,955 publications meeting these criteria, which were subsequently retrieved for further analysis. In addition, publications from the journal OPSEARCH were included. After excluding specific publication types such as dissertations, conference proceedings, surveys, reviews, book chapters, partial exam-related studies, and those with fewer than five pages, we obtained 59 journal articles, as listed in Table 1.

III. EXAMINATION TIMETABLING

A. PROBLEM DEFINITION
An ETP involves allocating a predefined set of exams, each with a known number of students, into specific timeslots within a designated exam session while adhering to a defined set of feasibility constraints (hard constraints) and optimizing quality based on specified criteria (soft constraints). Soft constraints can be breached, at the cost of a penalty in the objective function.

B. TERMINOLOGIES
We define several key terms from the exam timetabling domain, which are applied consistently throughout this survey.
• Exam: To schedule exams, a specific period and room must be assigned for each one.
• Exam session: An exam session consists of the entire period during which exams occur or several periods over a defined time frame.
• Timeslot/Period: The exam session spans a certain number of days, each of which is further segmented into distinct timeslots. Each combination of day and timeslot designates a specific period within the exam session.
• Room: Each room is assigned a designated capacity, indicating the maximum number of available student seats. Certain exams may necessitate one or multiple rooms, while others can be conducted using an online platform and do not require physical spaces.
• Invigilator: Also known as a proctor or supervisor, the person who oversees students during exams.

C. ETP VARIANTS AND DATASETS
In the existing literature, two main variants of ETP are recognized: Uncapacitated and Capacitated.
• Uncapacitated ETP imposes no room capacity restrictions during scheduling and assumes there will consistently be a room with ample space for all participating students.
• Capacitated ETP entails considering limited room/slot capacity per period.
- Room-Capacitated ETP: a capacity constraint is imposed on available rooms, specifying the maximum seating capacity each room can accommodate during a given period.
- Slot-Capacitated ETP: a capacity constraint is imposed on each period.
For each variant, we present an overview of the problem, constraints, and objectives, along with the main characteristics of the respective benchmarks.

1) UNCAPACITATED ETP VARIANT
The Uncapacitated variant represents the traditional version of ETP introduced by Carter et al. [59]. It features a dataset comprising 13 real-world cases collected at universities, primarily in Canada, known as the Toronto dataset. Furthermore, the dataset was expanded by Bilgin et al. [69] to establish a Slot-Capacitated variant. This modified dataset enforces a maximum seating capacity for each timeslot, featuring three daily exam slots.
a: CONSTRAINTS
The problem has one hard constraint (clash-free exams) and one soft constraint (spreading conflicting exams as far apart as possible).

b: OBJECTIVE FUNCTIONS
Minimize the sum of proximity costs per student, which reflects how close together conflicting exams are scheduled. Each pair of exams taken by the same student incurs a proximity cost that depends on the gap between them: 16 for no gap, 8 for one interval, 4 for two intervals, 2 for three intervals, and 1 for four intervals.
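The proximity cost described above can be sketched in a few lines. This is a minimal illustration, not the penalty program distributed with the dataset: the names `proximity_cost`, `period_of`, and `enrolments` are hypothetical, periods are assumed to be consecutive integers, and the total is assumed to be divided by the number of students.

```python
from itertools import combinations

# Weights from the text, indexed by the number of free periods between
# two exams of the same student (0 = back-to-back, 4 = four in between).
PROXIMITY_WEIGHTS = [16, 8, 4, 2, 1]


def proximity_cost(period_of, enrolments):
    """Average proximity penalty per student (assumed normalisation).

    period_of:  dict mapping exam id -> period index (0, 1, 2, ...)
    enrolments: list with one list of exam ids per student
    """
    total = 0
    for exams in enrolments:
        for a, b in combinations(exams, 2):
            distance = abs(period_of[a] - period_of[b])
            if distance == 0:
                # Hard constraint: a student cannot sit two exams at once.
                raise ValueError(f"clash between exams {a} and {b}")
            gap = distance - 1
            if gap < len(PROXIMITY_WEIGHTS):
                total += PROXIMITY_WEIGHTS[gap]
    return total / len(enrolments)
```

For example, with exams A, B, and C in periods 0, 1, and 5 and three students enrolled in (A, B), (A, C), and (B, C), the pairs contribute 16, 1, and 2, giving an average of 19/3.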

c: DATASETS
Accessible through http://www.cs.nott.ac.uk/~pszrq/data.htm, the dataset includes 13 instances, each with two plain text files: exams and student enrollments. Two versions exist for five of the instances, varying in student count, enrollments, exams, and conflict density. A program with its source code for penalty computation can be found on the website. Table 2 illustrates the characteristics of the Toronto version 1 dataset, including the modified Toronto data in the last two columns.

2) ROOM-CAPACITATED ETP VARIANT
The 2nd International Timetabling Competition (ITC 2007) organizers provided 12 instances from British universities for track 1 exam timetabling [61] that were used to evaluate the results, per the competition regulations. This problem formulation effectively bridges the gap between research efforts and institutional requirements by introducing many practical aspects of real-world cases.

a: CONSTRAINTS
Consider five hard constraints: exam conflicts, room capacity, period length, precedence, and room exclusivity, as well as seven soft constraints: two exams in a row, mixed duration, undesired period, spread, frontload, undesired room, and two exams in a day.

b: OBJECTIVE FUNCTIONS
Minimize the penalty that measures the violation of the seven terms associated with the soft constraints. The weightings vary between instances and are specified in the input file.

c: DATASETS
Available for download at https://www.unitime.org/ITC2007/, the solver and its corresponding source code are also accessible. The dataset is also accessible at https://opthub.uniud.it/problem/timetabling/edutt/ett/itc-2007-ett, where solutions can be validated. Each instance is provided as a text file specifying exam details, hard constraints, and institutional weightings. Table 3 illustrates the characteristics of the ITC 2007 datasets.

3) SLOT-CAPACITATED ETP VARIANT
The dataset from Yeditepe University (Faculty of Engineering) in Turkey includes real-world problems spanning eight semesters across three consecutive years. Initially presented by Özcan and Ersoy [75], it underwent subsequent modifications by Bilgin et al. [69].

a: CONSTRAINTS
The problem includes two hard constraints (exam conflict and slot capacity) and one soft constraint (exam spread).

b: OBJECTIVE FUNCTIONS
Minimize the instances of students undergoing two consecutive exams on a single day.
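As a rough illustration of this objective, the sketch below counts, for each student, pairs of exams that fall in adjacent periods of the same day. The function and parameter names are hypothetical, and `slots_per_day=3` is an assumption about how periods map to days, not a property taken from the dataset files.

```python
def consecutive_same_day(period_of, enrolments, slots_per_day=3):
    """Count student-level occurrences of two consecutive exams in one day.

    period_of:     dict mapping exam id -> period index (0, 1, 2, ...)
    enrolments:    list with one list of exam ids per student
    slots_per_day: assumed number of periods per day
    """
    count = 0
    for exams in enrolments:
        periods = sorted(period_of[e] for e in exams)
        for earlier, later in zip(periods, periods[1:]):
            same_day = earlier // slots_per_day == later // slots_per_day
            if same_day and later - earlier == 1:
                count += 1
    return count
```

Adjacent periods that straddle two days (for instance the last slot of one day and the first slot of the next) are not counted under this assumption.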

c: DATASETS
The datasets, available at http://www.cs.nott.ac.uk/~pszajp/timetabling/exam/, consist of eight real-life instances that have been converted into the ITC 2007 format by Parkes and Özcan [60]. To align with the ITC 2007 formulation, a considerably large room with a high cost, though having no impact on solution cost [32], was introduced for all Yeditepe instances except one. Table 4 illustrates the characteristics of the Yeditepe datasets.

4) OTHER DATASETS
Other publicly accessible ETP datasets are listed in Table 5.

D. CONSTRAINTS
Table 6 lists primary constraints categorized into exam-related, period-related, and room-related constraints. It provides their appearances across three benchmark instances and other real-world cases surveyed in the literature. These constraints may be classified as either hard or soft, contingent on the requirements or preferences of the institutions.
Invigilator-related constraints have received relatively less attention in the reviewed papers. These constraints have yet to be incorporated into any benchmark dataset, attributed to the diverse and significantly varying nature of constraints across different universities and the preferences that can represent each university's unique challenges.

IV. SOLUTION METHODOLOGIES
This section delves into the solution methodologies employed in the context of the ETP in this survey paper, categorizing these methods into one or more of the six categories:
• Mathematical Optimization (MO): also known as mathematical programming, involves optimizing a mathematical model. It is categorized into Linear Programming (LP), which deals with linear functions and integer variables, and Nonlinear Programming (NLP), which involves general functions with discrete and continuous variables. This category encompasses integer linear programming, goal programming, mixed integer programming, and constraint programming.
• Heuristics (HE): find high-quality solutions within an acceptable computational cost but do not guarantee optimality. This category comprises constructive heuristics, graph coloring, and decomposition.
Woumans et al. [36] developed two column generation (CG) models for scheduling multiple exam versions while ensuring even distribution. These models were implemented using IBM ILOG CPLEX and tested with real-world data at the KU Leuven campus and the Toronto benchmark. Model 1 used columns to represent exam schedules for unique student groups, while Model 2 represented mask schedules for each group. Post-processing employed heuristic and binary IP approaches to assign exam versions to student groups. Model 1 focused on minimizing spread costs, even if it led to higher total costs, and performed well on both Toronto datasets. Model 2 aimed for the lowest total cost but could only solve one dataset within the time limit.
Cataldo et al. [43] tackled a curriculum-based ETP for Universidad Diego Portales in Chile using three mathematical programming models. Unlike other ETPs, the exam schedule conflict numbers, in this case, are uncertain and depend on curriculum configurations. The first model allocated timeslots and room patterns to group courses, whereas the second model allocated timeslots and room patterns to each course. In the final model, specific rooms were allocated to each exam based on the outcomes of the second model. These models were implemented using GAMS, with CPLEX serving as the solver. Compared to the university's manual process, the schedules demonstrated improved efficiency, with fewer conflicts in most situations and no requirement for rescheduling.
Keskin et al. [46] proposed a MIP model with a two-stage heuristic approach to address a real-world ETP at Pamukkale University's Faculty of Engineering. The scheduling process involves two stages: determining the exam time in the first stage and assigning classrooms and students to those exams in the second stage. The proposed two-stage method yielded better feasible solutions and faster computation times than commercial software (i.e., CPLEX and Gurobi). Additionally, it shortened the projected final week by one day.
Güler and Gecici [51] developed a decision support system (DSS) using a spreadsheet and MIP model to generate balanced schedules for Yıldız Technical University's Industrial Engineering Department. To create the DSS, the MIP model was integrated into Microsoft Excel with the SolverStudio add-in and Gurobi. In 2021, [56] presented two MIP models with a decomposition approach for solving exam and supervisor assignment problems at a vocational school that provides associate-level degrees. The MIP models were integrated into a web-based DSS, enabling a typical end-user to generate a timetable in less than two minutes using the CBC open-source solver.
Al-Hawari et al. [52] introduced a three-phase Integer Linear Programming (ILP) approach for addressing the ETP at the German Jordanian University (GJU) and the Toronto benchmark. The approach decomposed the problem into three phases: first, allocating exams to timeslots; second, assigning timeslots to days; and third, assigning exams to rooms. ILP formulations were used to obtain solutions for each phase, with a CPLEX solver being utilized. Modularization simplified the formulation, resulting in easily generated, feasible timetables stored in multiple standard formats significantly faster than manual methods.
Godwin [78] utilized linear 0-1 integer mathematical programs, formulated using AMPL and solved with CPLEX, to generate a quality exam timetable considering faculty, room, and student-oriented aspects. Two surrogacy-based approaches were introduced: one assessed surrogacy levels among different objective functions, identifying potential surrogates for multi-objective problems, and the other created multiple solutions for quality and computational time comparisons. Tested on a business school ETP, the optimization models (faculty-oriented and room-oriented) exhibited quicker optimization times and served as effective surrogates for each other. However, they proved less suitable as surrogates for student-oriented objective functions.
Bazari et al. [77] implemented an ILP model using IBM ILOG CPLEX and its optimizer. CPLEX's innovative search tree traversal facilitates more efficient discovery of feasible solutions than relying solely on the branching and cutting method. Tested at Ferdowsi University Management Department, the model outperformed the manually prepared schedule, showcasing its superiority in optimal class utilization, avoiding exam conflicts, and achieving average exam spreads.

2) GOAL PROGRAMMING (GP)
Cavdur and Kose [37] designed a binary GP-based approach incorporating fuzzy logic to tackle the ETP tailored explicitly for the Industrial Engineering Department at Uludağ University. Initially, fuzzy criticality levels for exams were determined by assigning them based on three factors: class credits, success ratios, and class types. Next, the criticality levels and pre-processed data were inputted into a GP model to develop an optimal exam schedule. A simple algorithm was then used to assign rooms and proctors. In all evaluated objectives, the model's generated schedule performs better than the human expert's produced schedule.

B. HEURISTICS
Burke et al. [14] evaluated a linear combination of primitive heuristics on an enhanced weighted graph that utilized continuously updated information. A new vertex-selection heuristic was developed based on saturation degree and largest weighted degree. The improved model enabled the combination of basic heuristics to create compound selectors and introduced "badness" thresholds for color assessment. Tested on Toronto datasets, it achieved optimal results in five cases and outcomes comparable to the best-known outcomes in seven others. Performance gains were observed when the linear combination strategy was combined with backtracking.
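For readers unfamiliar with saturation degree ordering, a minimal constructive sketch follows. It is a plain saturation-degree heuristic with largest degree as tie-breaker, not the compound selector or weighted graph model of [14]; the function name and the conflict-graph representation are assumptions made for illustration.

```python
def saturation_degree_schedule(conflicts, num_periods):
    """Assign exams to periods, most constrained exam first.

    conflicts:   dict mapping exam id -> set of conflicting exam ids
                 (assumed symmetric)
    num_periods: number of available periods
    Returns a dict exam -> period, or None if some exam has no free period.
    """
    period_of = {}
    unscheduled = set(conflicts)

    def blocked_periods(exam):
        # Periods already taken by scheduled exams that conflict with `exam`.
        return {period_of[n] for n in conflicts[exam] if n in period_of}

    while unscheduled:
        # Saturation degree = number of blocked periods; ties are broken by
        # the number of conflicts, then by exam id for determinism.
        exam = max(sorted(unscheduled),
                   key=lambda e: (len(blocked_periods(e)), len(conflicts[e])))
        blocked = blocked_periods(exam)
        free = [p for p in range(num_periods) if p not in blocked]
        if not free:
            return None  # a real solver would backtrack or repair here
        period_of[exam] = free[0]
        unscheduled.remove(exam)
    return period_of
```

Returning `None` on a dead end keeps the sketch short; the surveyed approaches typically add backtracking or repair moves at that point.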
Abdul-Rahman et al. [21] proposed an adaptive decomposition and heuristic ordering strategy and experimented on the Toronto dataset. Exams were categorized into two subsets, difficult and easy, and ordered using various graph coloring heuristics to prioritize the assignment of exams to timeslots. To enhance the solution quality, the boundary set, positioned as a subset between the easy and difficult sets, underwent merging and swapping with the difficult set. If no improvement occurred, the roulette wheel selection-based shuffle strategy was applied. While the experimental results did not surpass six constructive approaches in their entirety, they remained competitive with other improvement strategies.
Kahar and Kendall [22] designed a two-phase constructive algorithm for addressing the ETP at Universiti Malaysia Pahang (UMP). In the first phase, exams were concurrently scheduled in timeslots and rooms, and the second phase involved invigilator scheduling. They enhanced the invigilation scheduling model with new constraints, including a minimum 2-day gap between duties, simultaneous oversight of exam papers by lecturers in the same location, and penalties for overseeing their papers. The algorithm met all constraints that the UMP proprietary system could not.
Kasm et al. [68] introduced a constructive heuristic, incorporating a graph coloring algorithm, to address a capacitated ETP at Masdar Institute and the Toronto benchmark. They considered a new constraint related to the limit on the number of exams assigned to a single student within a predetermined period. Utilizing an adjustable parameter known as the quantity of high-degree nodes, the heuristic impacted scheduling efficiency by giving preference to larger courses. It prioritized identifying the color group that harbors the most non-conflicting vertices rather than minimizing the number of such groups. Experimental results showed its efficiency compared to the IP formulation, yielding optimal or near-optimal outcomes in computational time and scheduling duration for both datasets.

C. METAHEURISTICS

1) SINGLE-SOLUTION BASED METAHEURISTICS

a: TABU SEARCH (TS)
Pais and Amaral [13] proposed a TS method utilizing a fuzzy inference rule-based system (FIRBS) for automated tabu tenure management. Two features, inactivity and frequency, were employed to register elements on the tabu list. The definitions of these features varied depending on the neighborhood, taking into account both simple and Kempe chain moves. The study experimented with Toronto datasets in two phases. Initially, it compared FIRBS to TS with fixed tenure values, showing a significant improvement. In the second phase, FIRBS was compared with four other established methods using the same objective function, demonstrating its simplicity and suitability as an alternative to manually tuning tabu tenure. In 2016, [34] presented a multi-objective approach that utilized TS for multi-criteria ETP, assessed on the same dataset. Two features were introduced to enhance automation: FIRBS for tabu list's element tenure selection and MCR for flexible preference modeling. FIRBS achieved the best value for 3 of 4 objectives, and MCR's weighting functions provided a smoother balance between objective function values.

b: VARIABLE NEIGHBORHOOD SEARCH (VNS)
Elloumi et al. [25] developed an adapted VNS to minimize total capacity usage for an ETP at the Faculty of Economics and Management Sciences of Sfax. The first stage involved scheduling exams into timeslots, and the second stage assigned exams from timeslots to available classrooms. Two reduction procedures were employed: one involved sorting exam sizes and classroom capacities in a non-decreasing order, and the second used a dominance criterion. The VNS algorithm efficiently allocated classrooms using insertion and swap moves as neighborhood structures. Evaluating the VNS algorithm against a lower bound validated its ability to produce favorable outcomes.

c: SIMULATED ANNEALING (SA)
Battistutta et al. [39] introduced a single-stage SA method that utilized a feature-based parameter tuning technique designed for the ITC 2007 benchmark. The technique entailed the inclusion of non-feasible solutions within the search space and managing them through suitable penalty mechanisms. A two-step parameter tuning process identified crucial parameters and correlated parameters' values to instance features through a linear function using a regression model. The method outperformed the five ITC 2007 finalists overall and achieved the best outcomes in two instances. The study highlighted that a properly tuned search method could effectively enhance generalization on new instances.
Leite et al. [48] designed an accelerated variant of SA, named FastSA, which automatically computes the cooling rate. The approach comprised two phases: construction, using a saturated degree graph coloring variant to form the initial solution, and optimization, employing the SA approach. Testing on ITC 2007 datasets showed that FastSA outperformed standard SA with fewer evaluations and less computation time. FastSA ranked third among five algorithms when pitted against state-of-the-art methods, exhibiting improvement in one out of twelve instances.
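The SA variants above share a common acceptance rule, which the following generic sketch illustrates. It is a textbook SA loop with geometric cooling, not FastSA or the tuned method of [39]; the function signature, the default parameters, and the fixed seed are assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(initial, cost, neighbour, t0=10.0, alpha=0.95,
                        steps=2000, seed=0):
    """Minimise `cost` starting from `initial`.

    neighbour(solution, rng) must return a new candidate solution.
    t0 is the start temperature; alpha is the geometric cooling factor.
    """
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worsening moves with
        # probability exp(-delta / T), which shrinks as T cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= alpha
    return best, best_cost
```

In a timetabling setting, `cost` would be a penalty function such as the benchmark objectives described earlier, and `neighbour` would apply a move such as reassigning one exam to another period.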
Bellio et al. [55] proposed a two-stage SA-based approach for an ETP based on the formulation of uncapacitated Toronto benchmarks. In Stage 1, they focused on conflict resolution to establish a feasible solution, and Stage 2 enhanced the solution by optimizing exam distances without introducing new conflicts and utilizing specific neighborhood relations. The results obtained matched or surpassed the best outcomes in 10 out of 13 instances compared to a compilation of the most renowned results in the literature. They also tried to improve the SA-derived solutions by "warm-starting" CPLEX with them on a mathematical model, but this ran into difficulties, showing how hard it is to use exact algorithms for larger instances.
Van Bulck et al. [67] presented an SA algorithm with a multi-neighborhood approach to address the ITC 2007 benchmarks. The SA framework incorporated six neighborhoods: Move, Kempe, Kick, Shake, and two noteworthy additions: a beam search operator for exam-to-room assignments and an approach leveraging multiple disconnected components in the underlying conflict graph. Its parameters underwent fine-tuning using the irace package through iterated racing procedures. The ablation analysis demonstrated a preference for including all neighborhoods, mainly Move and Swap, while the infrequent selection of Kempe was attributed to computational costs. The algorithm excelled in 11 out of 12 instances compared to ITC 2007 competition finalists and achieved the best solutions in 5 out of 8 Yeditepe instances.
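Kempe chain moves recur throughout the surveyed methods, so a small sketch may help. The version below handles periods only (no rooms or capacities) and is not the implementation of [67]; the function name and data layout are hypothetical.

```python
def kempe_chain_move(period_of, conflicts, exam, target_period):
    """Move `exam` to `target_period`, swapping a whole Kempe chain.

    period_of: dict exam id -> period index (assumed clash-free)
    conflicts: dict exam id -> set of conflicting exam ids (symmetric)
    Returns a new assignment; the input dict is left unchanged.
    """
    source_period = period_of[exam]
    if source_period == target_period:
        return dict(period_of)

    # Grow the chain: the connected component containing `exam` in the
    # conflict graph restricted to the two periods involved.
    chain, frontier = {exam}, [exam]
    while frontier:
        current = frontier.pop()
        if period_of[current] == source_period:
            other = target_period
        else:
            other = source_period
        for neighbour in conflicts[current]:
            if neighbour not in chain and period_of[neighbour] == other:
                chain.add(neighbour)
                frontier.append(neighbour)

    # Swap the two periods for every exam in the chain.
    moved = dict(period_of)
    for member in chain:
        if period_of[member] == source_period:
            moved[member] = target_period
        else:
            moved[member] = source_period
    return moved
```

Because the entire component is swapped, a clash-free timetable stays clash-free after the move, which is why the operator is popular for searches restricted to feasible solutions.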
Carlsson et al. [76] investigated three distinct methods for addressing a real-world ETP proposed in [79]: (i) extended SA with a new neighborhood relation, SwapEvents; (ii) constraint programming (CP) solved using Gecode; and (iii) two MIP models solved with Gurobi (2-stage method, MIP2S, and encoded in MiniZinc, MIPMZ). The experimental results demonstrated that extended SA outperformed the original SA in all instances except for two and surpassed MIPMZ and MIP2S in all instances except for two. CP exhibited somewhat inferior performance, and while MIPMZ succeeded in some instances, it struggled or timed out in others.

d: HILL CLIMBING (HC)
Bykov and Petrovic [33] implemented a step-counting mechanism named the Step Counting Hill Climbing algorithm (SCHC). SCHC generated an initial feasible solution using the Saturated Degree heuristic for exam scheduling and then employed a heuristic search, utilizing various moves and the Kempe chain procedure iteratively to produce a candidate solution. Three variants of SCHC were developed: SCHC-all (counting all moves), SCHC-acp (counting only accepted moves), and SCHC-imp (counting only improving moves). SCHC-acp outperformed SA, Late Acceptance Hill Climbing, Great Deluge, and other SCHC variants in longer runs when evaluated on ITC 2007 problems. Additionally, SCHC-acp outperformed previously published results on eight benchmark instances, including one pre-competition and two post-competition outcomes.
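The three counting variants can be made concrete with a generic sketch. This follows the commonly described SCHC scheme (a cost bound that is refreshed every fixed number of counted steps) rather than the exact implementation of [33]; the parameter names and defaults are assumptions.

```python
import random


def step_counting_hill_climbing(initial, cost, neighbour, counter_limit=5,
                                variant="all", steps=5000, seed=0):
    """Minimise `cost`; `variant` selects which steps are counted.

    "all": every iteration, "acp": accepted moves only,
    "imp": improving moves only.
    """
    if variant not in ("all", "acp", "imp"):
        raise ValueError("variant must be 'all', 'acp' or 'imp'")
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    bound, counter = current_cost, 0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        improving = candidate_cost < current_cost
        # Accept anything that improves or stays within the cost bound.
        accepted = improving or candidate_cost <= bound
        if accepted:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        if (variant == "all"
                or (variant == "acp" and accepted)
                or (variant == "imp" and improving)):
            counter += 1
        # After counter_limit counted steps, tighten the bound.
        if counter >= counter_limit:
            bound, counter = current_cost, 0
    return best, best_cost
```

The single parameter `counter_limit` plays the role that the cooling schedule plays in SA: larger values keep the bound fixed for longer and therefore allow more exploration.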
Burke and Bykov [40] presented a Late Acceptance Hill Climbing (LAHC) approach with an added greedy rule and enhanced history recording and evaluated it using benchmarks for the Travelling Salesman and ITC 2007 ETP. The algorithm initiated its process by generating an initial feasible solution at random, employing the saturation degree heuristic for exam room allocation. It then iteratively accepted or rejected candidates while ensuring feasibility by applying a range of moves. Throughout the entire search process, only feasible solutions were considered. The proposed approach demonstrated superior performance compared to basic LAHC, Simulated Annealing, threshold acceptance, and Great Deluge on most of the benchmark problems, particularly those with larger sizes.
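The late acceptance idea is to compare a candidate with the cost the search had a fixed number of iterations ago. The sketch below shows the basic rule only; it does not reproduce the greedy rule or the enhanced history recording of [40], and the names and defaults are illustrative assumptions.

```python
import random


def late_acceptance_hill_climbing(initial, cost, neighbour,
                                  history_length=10, steps=5000, seed=0):
    """Minimise `cost` with a basic late acceptance rule."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    # The history starts filled with the initial cost.
    history = [current_cost] * history_length
    for iteration in range(steps):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        slot = iteration % history_length
        # Accept if no worse than the cost recorded history_length
        # iterations ago, or no worse than the current cost.
        if candidate_cost <= history[slot] or candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        history[slot] = current_cost
    return best, best_cost
```

A longer history delays convergence and usually improves final quality at the price of run time, so `history_length` is the only parameter that needs tuning in this basic form.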

e: GREAT DELUGE (GD)
Kahar and Kendall [31] implemented a modified GD algorithm (modified-GDA) for a capacitated ETP at UMP. The algorithm included a simple parameter that enabled dynamic adjustments to the boundary, decay rate, and desired value as the search process unfolded. The modified-GDA approach surpassed UMP's proprietary software, Dueck-GDA, and constructive heuristics by generating high-quality solutions that satisfied all constraints, which the proprietary software could not achieve.
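For reference, the boundary, decay rate, and desired value mentioned above appear in a basic Great Deluge loop as follows. This sketch uses a fixed linear decay and is not the modified-GDA of [31], which adjusts these quantities during the search; all names and defaults are assumptions.

```python
import random


def great_deluge(initial, cost, neighbour, desired_cost=0.0,
                 steps=5000, seed=0):
    """Minimise `cost` with a linearly falling acceptance boundary."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    boundary = current_cost
    # Lower the boundary evenly so it reaches desired_cost at the end.
    decay = (boundary - desired_cost) / steps
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        # Accept if not worse than the current solution or if still
        # under the boundary (the "water level" for minimisation).
        if candidate_cost <= current_cost or candidate_cost <= boundary:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        boundary -= decay
    return best, best_cost
```

Choosing `desired_cost` too optimistically makes the boundary fall faster than the search can follow, which is one motivation for the adaptive variants discussed in this subsection.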
Burke and Bykov [35] introduced an extension of GD called Flex-Deluge, featuring a flexible acceptance condition. This enhancement underwent testing on both the Toronto and ITC 2007 benchmarks. They introduced two concepts to improve search technique performance: slowing down uphill moves and adapting the algorithm to specific problem properties. Experimental results demonstrated that the adaptive Flex-Deluge algorithms (AFDA), specifically AFDA/LD and AFDA/LD+LNS, which use the largest degree (LD) for replacement moves and the largest number of students (LNS) for room moves, outperformed other algorithms, such as GD, greedy HC, and fixed Flex-Deluge. AFDA variants achieved superior results compared to previous publications, obtaining nine top results for the ITC 2007 benchmark and nine out of 16 top results for the Toronto benchmark.
and mutation on randomly selected individuals. Two local search operators, period-supplement and bi-directed, were employed to enhance solution distribution and optimize two objectives simultaneously. The experimental results showed that double hybrid initializations worked better than random and hyper-heuristic ones, demonstrating the effectiveness and robustness of the designed operators.
Leite et al. [45] designed a cellular memetic algorithm (cMA) that hybridized a cellular evolutionary algorithm and a threshold acceptance (TA) metaheuristic to address ETP posed by Toronto and ITC 2007 benchmarks. This two-phase cMA first generated feasible solutions using saturation degree and then optimized them in the second phase using TA, Kempe chain neighborhoods, and a combination of mutation and crossover operators. The cMA method outperformed previously published solutions. It had the smallest average relative deviation compared to the best solutions on four out of thirteen cases in the Toronto dataset and three out of twelve cases in the ITC 2007 dataset.

b: HONEY-BEE MATING OPTIMIZATION (HBMO)
Sabar et al. [10] proposed a modified HBMO algorithm and evaluated its performance using the Toronto and Socha datasets for uncapacitated ETP and course scheduling problems.Their approach started by creating populations of feasible solutions using the largest enrolment, largest degree, and least saturation degree.In the crossover process, two timeslots were randomly chosen from the queen and drone, and events were then swapped between these selected timeslots.To prevent the drone's sperm from being reused, an event-swapping mutation operator was employed between timeslots.Mated drones were eliminated to sustain population diversity, and newly produced broods were introduced into the population for the subsequent mating cycle.This approach outperformed the original HBMO on Toronto datasets and achieved results comparable to or better than five population-based and eight non-population-based methodologies.

c: PREY-PREDATOR ALGORITHM (PPA)
Tilahun [49] introduced an extended PPA and tested it on six instances of uncapacitated Toronto datasets.Three classifications were assigned to the initial solutions: the top-performing solution was denoted as the best prey, the least effective solution was marked as the predator, and the rest were considered ordinary prey.The best prey exploited its neighborhood, whereas the predator explored the search space to locate a promising area.Both the exploration and exploitation search behaviors influenced the ordinary prey.Statistical comparisons were conducted between the results and seven selected methods from existing literature, demonstrating performance on par with the best.

d: FIREFLY ALGORITHM (FA)
Nand et al. [58] designed a discrete firefly algorithm (dFA) by introducing preference parameters and a stepping-ahead mechanism to tackle the uncapacitated Toronto ETP. The algorithm consisted of three phases: Phase 1 used dFA to create an initial solution; Phase 2 improved it with simple moves; and Phase 3 employed sequential neighborhood operators for further refinements. The novel stepping-ahead mechanism utilized preference parameters to enhance optimization and thoroughly explored the solution space, identifying superior solutions. Compared to 10 other algorithms using identical stages, dFA-Step showed similar performance in 5 out of 12 datasets.

D. HYPER-HEURISTICS (HH)
1) SELECTION HYPER-HEURISTICS
Burke et al. [9] evaluated various Monte Carlo-based HHs on the capacitated Toronto dataset.They employed basic heuristic selection methods such as greedy, simple random, choice functions, and a learning scheme.Additionally, they applied three move acceptance techniques based on Monte Carlo methods: standard SA, SA with reheating, and exponential Monte Carlo.An experiment tested combinations of heuristic selection and move acceptance techniques in selection HH.The results indicated that the most effective performance was achieved when combining the SA approach with the reheating move acceptance method and utilizing the choice function technique.
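The interplay between heuristic selection and move acceptance can be made concrete with a small sketch. It pairs simple random selection with simulated annealing acceptance and a naive reheating rule. It is not the configuration studied in [9]: the choice function and learning scheme are omitted, and the parameter values, function names, and toy low-level heuristics are all assumptions for illustration.

```python
import math
import random


def selection_hh(initial, cost, low_level_heuristics, max_iters=5000,
                 start_temp=10.0, cooling=0.99, reheat_after=200, seed=0):
    """Selection hyper-heuristic: random heuristic choice + SA acceptance."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temp, stale = start_temp, 0
    for _ in range(max_iters):
        # Heuristic selection: pick one low-level heuristic uniformly at random.
        heuristic = rng.choice(low_level_heuristics)
        candidate = heuristic(current, rng)
        delta = cost(candidate) - current_cost
        # Move acceptance: always take improving or equal moves, and take
        # worsening moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
        if current_cost < best_cost:
            best, best_cost = current, current_cost
            stale = 0
        else:
            stale += 1
        temp = max(temp * cooling, 1e-6)  # geometric cooling with a floor
        if stale >= reheat_after:
            # Reheating: reset the temperature after a long stagnation.
            temp, stale = start_temp, 0
    return best, best_cost


# Hypothetical toy problem and low-level heuristics on the integers.
def toy_cost(x):
    return (x - 7) ** 2


def step_up(x, rng):
    return x + 1


def step_down(x, rng):
    return x - 1


def jump(x, rng):
    return x + rng.choice((-3, 3))


best_x, best_val = selection_hh(0, toy_cost, [step_up, step_down, jump])
```

Swapping `rng.choice` for a score-based rule, or the acceptance test for another criterion, yields the other selection and acceptance combinations within the same loop.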
Demeester et al. [8] proposed a tournament-based HH approach to a curriculum-based ETP at KAHO and two post-enrollment ETP benchmarks-Toronto and ITC 2007.The study examined various factors, including the random heuristic selection, tournament factor, and four move acceptance criteria: ''improving or equal'', SA, GD, and a modified late acceptance (LA).Results showed that combining LA with the steepest descent was the most effective move acceptance criterion, on par with SA.It outperformed the manual solution for KAHO and improved 7 of 13 instances in Toronto.While not surpassing the best-known approaches in ITC 2007, it remained competitive.
Pillay [6] used an evolutionary algorithm (EA) based HH with three heuristic representations for the Toronto ETP: fixed length (FHC), N-times (NHC), and variable length (VHC).These representations, using unique combinations of low-level heuristics, were assessed for their impact on EA performance.FHC performed less effectively than VHC and NHC.Combining all three representations yielded superior results compared to using FHC, VHC, or NHC individually.
Soghier and Qu [16] suggested an iterative HH method that used a low-level graph coloring heuristic to choose which exams to schedule and adaptively hybridized bin-packing heuristics to assign exam timeslots and rooms automatically. This approach dynamically selects and adjusts the level of hybridization for each heuristic to generate a spectrum of heuristic sequences with varying quality. Combining top-performing heuristics resulted in the best solutions across the ITC 2007 datasets, achieving feasibility in all eight instances and ranking 5th among the nine best approaches in the literature.
Sabar et al. [26] created a framework for gene expression programming-based HH and utilized it to tackle the ITC 2007 ETP and a dynamic vehicle routing problem. This framework represented an improvement-oriented approach, featuring a set of perturbative low-level heuristics and high-level techniques with two elements: selection of heuristics through a dynamic multi-armed bandit approach with extreme value-driven rewards and an automatically generated criterion for accepting moves using gene expression programming. This approach outperformed the ITC 2007 competition winner and post-ITC 2007 methods in 4 of 8 instances.
Qu et al. [27] developed an HH framework using an Estimation Distribution Algorithm (EDA) to tackle the uncapacitated Toronto ETP.The high-level search process relied on a basic EDA, guiding heuristic selection for various problem-solving scenarios.The performance of low-level heuristics was assessed through their probability distribution during solution construction, enhancing HH search.Despite lacking backtracking or local improvement techniques, this constructive approach showed promising results across all instances, matching the top performance among the ten others discussed in the literature.
Muklason et al. [41] designed a three-stage multi-objective selection HH approach to optimize standard objectives and fairness within student cohorts.In the initial stage, feasible solutions were created using the squeaky wheel technique.In the subsequent two stages, a selection HH was used, integrating the GD algorithm and reinforcement learning to control the move acceptance and 14 low-level heuristics, respectively.Stage 2 aimed to optimize the standard objective, while Stage 3 sought fairer solutions within a reasonable Pareto front range.Experiments on three benchmarks (Toronto, ITC 2007, and Yeditepe) showed that the approach works for multi-objective ETP, with performance similar to methods that have been reported before.
Hao et al. [53] introduced a unified evolutionary multitasking graph-based HH that employed evolutionary multitasking for high-level search and graph heuristics for low-level heuristics.This unified framework exhibited greater effectiveness and versatility than the single-task HH method based on experimental results on both Toronto ETP and graph coloring problems.
2) GENERATION HYPER-HEURISTICS
Sabar et al. [7] proposed a novel graph coloring constructive HH approach that utilized hierarchical fusion of four low-level heuristics: largest colored degree, saturation degree, largest enrollment, and largest degree. Combining these heuristics generated four ordered lists to calculate the scheduling difficulty index for the initial exams. Exams were chosen according to their difficulty index and assigned to timeslots using a roulette wheel selection. Despite its single-pass stochastic nature, this approach exhibited competitive performance when evaluated against other HH methods using single sequential graph coloring on the Toronto dataset and constructive methods on the ITC 2007 dataset.
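A single-pass stochastic construction of this kind can be sketched as follows. The sketch is a simplification and not the method of [7]: it orders exams by largest degree only, rather than by a difficulty index fused from four heuristics, and the roulette wheel weights (which favor lightly loaded timeslots) are an assumption introduced for illustration, as are all names.

```python
import random


def construct_timetable(conflicts, num_slots, seed=0):
    """Assign exams to timeslots in one pass, hardest (largest degree) first.

    conflicts maps each exam to the set of exams sharing a student with it.
    """
    rng = random.Random(seed)
    # Largest degree ordering: exams with the most conflicts come first.
    order = sorted(conflicts, key=lambda e: len(conflicts[e]), reverse=True)
    slot_of = {}
    load = [0] * num_slots  # number of exams already placed in each slot
    for exam in order:
        used = {slot_of[n] for n in conflicts[exam] if n in slot_of}
        feasible = [s for s in range(num_slots) if s not in used]
        if not feasible:
            raise ValueError(f"no conflict-free timeslot for exam {exam!r}")
        # Roulette wheel: selection probability proportional to the weight.
        # Assumed weighting: emptier slots are more likely to be chosen.
        weights = [1.0 / (1 + load[s]) for s in feasible]
        slot = rng.choices(feasible, weights=weights, k=1)[0]
        slot_of[exam] = slot
        load[slot] += 1
    return slot_of


# Hypothetical instance: a, b, c clash pairwise; d clashes only with a.
conflicts = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
timetable = construct_timetable(conflicts, num_slots=3)
```

Because no backtracking is performed, the construction can fail on tight instances. The example above always succeeds because three timeslots suffice for any choice made along the way.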
Sabar et al. [17] designed a grammatical evolution HH framework and tested its performance on the ITC 2007 and Toronto benchmarks.The framework utilized grammatical evolution to build an online solver, utilizing a grammar definition to define heuristic components (neighborhood structures, acceptance criteria, and combinations).It automatically evolves multiple templates of perturbation heuristics to solve a problem instance.Experimental findings indicated that integrating adaptive memory into grammatical evolution-HH improved performance compared to the version without it.The approach achieved competitive results when compared to state-of-the-art HHs and bespoke methods for Toronto datasets, surpassing both the winner of the ITC 2007 competition and post-ITC 2007 methods.
Burke et al. [20] designed an adaptive HH approach that hybridized low-level heuristic moves to enhance timetables.The Toronto and ITC 2007 datasets revealed the optimal combination of swap timetables (ST) and Kempe chains (KCM) through offline learning.Ordering exams violating soft constraints via saturation degree yielded superior outcomes.They subsequently crafted an adaptive ST-KCM hybridization in two stages: first, they generated and analyzed random heuristic sequences to pinpoint the best ones.Second, they randomly assigned heuristics to empty positions to determine the optimal sequence.This approach showcased robust generalization across two benchmark datasets, achieving performance comparable to the latest cutting-edge techniques.
Abdul-Rahman et al. [19] introduced an adaptive linear approach based on squeaky wheel optimization.Each exam received a difficulty score during the assignment process by calculating a modified weighted sum of low-level heuristics.Exams were ordered based on their difficulty score, with higher scores getting assigned timeslots and rooms before lower scores.This cycle continued until a feasible solution improved as resources were allocated to minimize penalties.When several heuristics were used with a modifier, the results were similar to other constructive methods for the Toronto datasets.Feasible results were obtained for the ITC 2007 datasets.
Pillay and Özcan [47] presented a generation construction HH that automates generating low-level construction heuristics.This study investigated two kinds of heuristics: arithmetic (AHH), evolved through genetic programming, and hierarchical (HHH), explored through three distinct approaches: HHH-GP (genetic programming), HHH-GA (genetic algorithms), and HHH-RG (random generation), each with its unique representation.Experimental results on the Toronto and ITC 2007 exam timetabling benchmark sets revealed AHH's superiority over HHH and traditional graph coloring heuristics.On the other hand, HHH-GA showed promise for the ITC 2007 course timetabling benchmark.Hence, the authors concluded that heuristic effectiveness varies based on specific educational timetabling problems.
Mweshi and Pillay [54] enhanced the framework of HH methods to create perturbative heuristics with a broader design.Initially, these heuristics came from basic actions and solution components using grammatical evolution.This extension added conditional constructs, incorporated data from the solution space, and broadened the syntax of fundamental actions to include a broader range of heuristics.They tested this approach on benchmark sets in three domains: boolean satisfiability problems, ETP, and vehicle routing problems.The effectiveness of these newly generated heuristics highlighted the approach's versatility and demonstrated its competitiveness with existing perturbative HH on the ITC 2007 exam benchmark.

E. HYBRID APPROACH
Abdullah and Turabieh [11] presented a tabu-based MA by integrating a genetic algorithm and a tabu search algorithm and evaluated its performance on the ITC 2007 exam and course problems.The algorithm employed various neighborhood structures for local search and integrated a tabu list to manage their selection.Experiments were conducted on ITC 2007 datasets using three different selections of neighborhood structures: ''random,'' ''best,'' and ''general''.The ''best'' sequence demonstrated superior performance in course timetabling, while the ''general'' sequence excelled in examination timetabling.The results surpassed the ITC 2007 exam track winners in 7 out of 8 datasets.
Ahandani et al. [12] proposed a structure integrating discrete particle swarm optimization (DPSO) and a two-stage HC local search for the Toronto ETP using three hybridization strategies. It involved both generation HH elements (associated with DPSO and heuristic updates) and selection HH components (concerning low-level heuristic management and the use of two-stage HC). The DPSO algorithm used a mixed-population heuristic initialization method to create an initial population and iteratively improve it through DPSO and local search. The study's findings strongly supported the effectiveness of these DPSO algorithms with diverse structures and strategies as HH systems. They performed as well as or outperformed other HH methods and were on par with evolutionary algorithms in the literature.
Abdullah and Alzaqebah [18] created a hybrid self-adaptive bees algorithm (BA) by making three essential modifications: comparing selection strategies (tournament, disruption, rank), comparing local search algorithms (SA and LAHC) through parameter tuning, and looking into how a self-adjusting mechanism affects the effectiveness of neighborhood exploration. Experiments on the Toronto and ITC 2007 datasets identified the most effective BA algorithm enhancements, with the self-adaptive disruptive selection strategy combined with LAHC local search showing superior performance over other strategies (basic BA, rank selection, tournament selection, and SA local search). In 2015, [28] hybridized bee colony optimization with LAHC, disruptive selection, and a self-adaptive mechanism, yielding results that were on par with the best-performing methods on Toronto datasets and mirrored the success of the top five winners in the ITC 2007 datasets.
Al-Betar et al. [24] suggested using the harmony search algorithm to create an optimization framework that combines different parts of MA and tested how well it worked on the uncapacitated Toronto datasets.The three components incorporated were the recombination operator for global improvement (GIM), the randomness operator for random improvement (RIM), and the neighborhood operator for local improvement (LIM).The combination of GIM, RIM, and LIM components worked well for complicated timetabling problems and did better than 22 other comparison methods, such as local search-based MH, MH-P, H, and HH.This method yielded the best overall outcomes in two instances out of twelve.
Fong et al. [23] presented a novel approach that hybridized the artificial bee colony (ABC) and GD algorithms, involving two phases: initialization and enhancement.The enhancement stage was divided into three sub-stages-employed bee, onlooker bee, and scout bee phases.The assimilation policy controlled the search in the employed bee phase, while the Nelder-Mead simplex search approach refined the solutions in the onlooker bee phase.In addressing the Toronto ETP, this technique exceeded the basic ABC algorithm and attained three novel best results while retaining its competitiveness against state-of-the-art methods.In 2015, [29] integrated a modified neighborhood search function in the employed bee phase of a variant ABC algorithm, drawing inspiration from the ''global best'' model found in particle swarm optimization.Evaluation of the same datasets revealed that the results of this variant were almost identical to those of the previous version.
Li et al. [30] presented an Evolutionary Ruin and Stochastic Rebuild approach, validated using uncapacitated Toronto datasets.The method involved four phases: (i) solution decomposition for solution breakdown, (ii) evolutionary ruin for destructive modification, (iii) stochastic recreate for restoration, and (iv) solution acceptance for probabilistic acceptance determination.It outperformed graph-based HH algorithms but was inferior to local search algorithms based on experimental results.
Müller [38] designed a multi-phase search algorithm for a real-life ETP at Purdue University and introduced nine new benchmark datasets.The algorithm began with a construction stage employing an iterative forward search algorithm and incorporating conflict-based statistics.The following phase used an HC algorithm to arrive at a local optimum, then a GD technique to account for oscillations in the overall solution value bound.Experimental results affirmed the two-hour time limit's reliability for creating Purdue's timetables.UniTime enabled interactive manual adjustments with solver recommendations to maintain quality during modifications.
Lei et al. [42] developed an adaptive coevolutionary MA for addressing the Toronto benchmark. This algorithm conducted an evolutionary search in two domains: heuristic space for global search and solution space for local search. The heuristic space began with an HH approach to create an initial population and used the basic crossover operator and mutation operator to identify possible heuristic lists. Tailored evolutionary operators were utilized in the solution space to expand the exploration scope and optimize the solution. An adaptive coevolutionary strategy was used to select the search space adaptively. Compared to 11 popular methodologies, this algorithm achieved top-five results across 12 instances.
Aldeeb et al. [57] introduced a hybrid intelligent water drops (IWD) algorithm for addressing the Toronto ETP.The algorithm involved six phases for hybrid IWD: static parameter initialization, dynamic parameter initialization, assigning a random timeslot to an individual exam for each IWD, constructing IWD solutions, enhancing solutions, and termination.In the final stage, an enhancement loop integrated a local search optimizer (LSO), employing Move, Swap, and Kempe Chain neighborhood structures.Based on the experimental results, the algorithm outperformed nine swarm-based approaches in three datasets and achieved the best results compared to 10 metaheuristic approaches in one dataset.

F. MATHEURISTIC
Gogos et al. [15] presented a multi-staged algorithmic approach for solving the ITC 2007 ETP, primarily managed by the Greedy Randomized Adaptive Search Procedure (GRASP).This approach comprised three phases: preprocessing to uncover hidden dependencies, construction for timetable creation, and an improvement stage for solution optimization.The improvement phase incorporated HC local search, SA local search, and an IP sub-problem solved using the GLPK solver.In cases where the current solution could not be enhanced, a shaking stage rescheduled suboptimal exams.Experimental results demonstrated that this approach ranked second among the ITC 2007 competition winners.
Çimen et al. [66] introduced a Mixed-Integer Linear Programming (MILP) model and a heuristic algorithm to address the invigilator scheduling problem at Hacettepe University and Gazi University.When tested on eight real-life cases, both demonstrated effectiveness by outperforming schedules created by the faculty.The experimental results demonstrate that the MILP-generated schedules showcase an enhanced distribution of invigilation hours, avoidance of successive assignments, and prioritization of department-based duties.Simultaneously, the heuristic algorithm excels at creating schedules with a fair distribution of invigilation hours and avoids successive assignments, proving its applicability to larger-sized examinations.

V. CATEGORIZATION OF ETP SOLUTION METHODOLOGIES
In this study, we reviewed solution methodologies for ETP presented in 59 journal articles spanning from 2012 to 2023, as illustrated in Table 8. The distribution of the chosen papers based on publication year and categories is depicted in Figures 1 and 2, providing percentage and numerical perspectives. The publication trends over the years are illustrated in Figure 1, revealing fluctuating patterns. The peak publication rate, with ten publications representing 17% of the total, occurred in 2012, while the lowest percentage, with two publications reflecting 3% participation, was observed in 2013, 2018, 2020, and 2021. The categorization of these methodologies is depicted in Figure 3. As indicated in Table 8, there are fourteen hyper-heuristics (HH), twelve mathematical optimizations (MO), twelve metaheuristics based on single solutions (MH-S), five approaches employing population-based metaheuristics (MH-P), eleven hybrid approaches (HA), four heuristics (HE), and two matheuristics (MAH). The highest figure is observed in the HH within these categorizations, while the least falls under the MAH. HH strives to elevate the level of generality in the operation of optimization systems. They operate on a search space defined by lower-level heuristics or even metaheuristics for the addressed problem. The primary focus of HH is the thoughtful selection or generation of the appropriate (meta-)heuristic for any given situation. Various selection HHs have been explored in ETP, such as Monte Carlo search [9], multiple move acceptance criteria [8], swarm-based computational intelligence integration [12], evolutionary algorithms combined with low-level heuristics [6], and estimation distribution algorithm [27]. Additionally, multi-objective selection HH with GD and reinforcement learning [41] and multitask selective HH [53] have been investigated. Adaptive approaches were implemented in HH for selecting heuristics, ordering heuristics, or hybridizing heuristics based on a learning mechanism [16], [19], [20], [21]. On the other hand, generation HH employed in ETP encompasses generation construction heuristics [47] and perturbative heuristics [54]. Genetic programming, which includes gene expression programming [26] and grammatical evolution [17], [54], was employed to evolve acceptance criteria and generate selection heuristics.
However, as indicated in Table 8, there has been a discernible shift in research focus from a predominant emphasis on HH to an increasing inclination towards metaheuristics and MO over the timeline from 2012 to 2023. Figure 2 reveals a heightened interest in metaheuristic categories, constituting 28% of the selected studies: 20% in MH-S and 8% in MH-P.It is evident from Figure 3 that many of the selected metaheuristic algorithms encompass widely recognized and frequently employed ones, such as the MA, LAHC, TS, and GD.These algorithms' performance and experimental outcomes have demonstrated their efficiency and effectiveness.Notably, SA has received extensive coverage in publications, surpassing the previously mentioned methods.The hybrid approach has demonstrated exceptional performance by combining metaheuristics with (meta-)heuristics or incorporating local search methods into population-based methods.This approach significantly benefits by enhancing diversity within a population, thereby boosting the search capabilities of the resultant hybrid algorithm [24], [80].Notably, it has gained substantial traction, as evidenced by its 18% publication record.
MO emerges as another highly utilized approach, with a substantial participation rate of 56%, as depicted in Figure 4, which illustrates the distribution of categories for ETP real-world problems. MO approaches are extensively employed in addressing real-world datasets, primarily owing to their effectiveness in handling small to medium-sized exam timetabling scenarios at the faculty or departmental level. This effectiveness is mainly attributed to their assurance of optimal solutions and their straightforward implementation through software integration with efficient heuristic algorithms. However, their efficiency diminishes when dealing with large-scale problems [55]. Because these problems are NP-hard, the computational effort of MO approaches can escalate exponentially with problem size, making MO approaches suitable for the ETP mainly when time and space complexity are not critical considerations.

VI. CURRENT TRENDS IN BENCHMARK ETP
Between 2012 and 2023, benchmark problems, particularly the Toronto benchmark (44%) and the ITC 2007 exam benchmark (29%), have been extensively studied in the exam timetabling domain, as illustrated in Figure 5(a). In contrast, the Yeditepe benchmark has received less attention, with only 5% of the studies. Meanwhile, real-world datasets have also gained attention, constituting 22% of the studies. Additionally, there has been a noticeable increase in exploring other real-world problems, as evidenced in Figure 5(b). Table 9 presents a compilation of benchmark datasets with their respective categories and references.
A. TORONTO DATASETS
Table 9 highlights that hyper-heuristics (HH) constitute 32%, and hybrid approaches (HA) account for 23%, emerging as the most frequently employed methods for addressing large-sized Toronto datasets. Meanwhile, mathematical optimization (MO) approaches have a less significant presence, comprising only 6%. Despite their utilization in the Toronto benchmark problem, MO approaches exhibit limited effectiveness, with one solving only two small-size instances and another primarily focusing on method validation without demonstrating associated costs.
For this dataset, a total of 34 unique methodologies have been proposed. Among these, the state-of-the-art results are predominantly attributed to the multi-neighborhood SA approach by Bellio et al. [55] in most instances. Furthermore, Leite et al.'s cellular memetic algorithm [45] and Burke and Bykov's Flex-Deluge [35] each exhibit the best results in specific instances. Specifically, in [35], the average run time was 5.1 hours at a CPU speed of 3.2 GHz, whereas in [45], the run time was 31.4 hours at a CPU speed of 2.0 GHz. Additionally, [55] obtained their best results under running time and CPU speed conditions equal to those reported in [45]. Before 2012, the hybrid variable neighborhood approach by Burke et al. [90] in 2010 exhibited state-of-the-art results in a specific instance.
The outcomes for this problem are highly sensitive to running time, where longer durations yield better results.Due to the heterogeneous nature of running times and processor speeds in the literature, achieving fair comparisons is challenging.Consequently, depending solely on the total iteration count as the sole termination criterion for the competition proves inadequate.Instead, attention must be given to the relative computational costs, which vary based on the specific instance, adding complexity to the comparison process.

B. ITC 2007 (TRACK 1) DATASETS
According to Table 9, HH dominates with 44%, followed by single-solution metaheuristics (MH-S) at 26%, as the most commonly adopted methods for the ITC 2007 exam benchmark problem. For comparability, the running time aligns closely with the ITC 2007 benchmarking tool's specified time limit. For this dataset, a total of 23 unique methodologies have been proposed. The state-of-the-art results reported within the competition time limit are predominantly associated with a step-counting hill climbing method introduced by Bykov and Petrovic [33] and a Flex-Deluge approach by Burke and Bykov [35], prevailing in the majority of instances. Additionally, a distributed scatter search method by Gogos et al. [81] excels in two instances. In contrast, an HH by Sabar et al. [26] achieved the best result in a single instance, similar to [81]. Notably, the parallel distributed scatter search metaheuristic, introduced by Gogos et al. [82] in 2010, demonstrated state-of-the-art results across several instances. A detailed discussion on state-of-the-art results for the Toronto benchmark and the ITC 2007 exam benchmark can be found in the survey paper [74].

C. YEDITEPE DATASETS
The Yeditepe benchmark was addressed in only four studies, some of which applied MO approaches. The limited attention to this dataset is possibly owing to its simpler formulation and the relatively small scale of its instances compared to other benchmarks. The state-of-the-art method for this dataset is multi-neighborhood SA, introduced by Bulck et al. [67]. Both the HH by Muklason et al. [41] and the MIP approach by Arbaoui et al. [50] achieve the best results in the same two instances, similar to Bulck et al. [67]. The features, source code, instances, and new best solutions for the Toronto benchmark [55], ITC 2007 benchmark [39], [67], and Yeditepe benchmark [67] are publicly accessible on the Exam Timetabling (ETT) section of OPTHUB (https://opthub.uniud.it) (last accessed: January 20, 2024). This platform allows researchers to verify and upload new instances, along with their solutions and corresponding scores, thereby fostering comparison, collaboration, and reproducibility.

VII. FUTURE DIRECTIONS
Despite being more realistic than the uncapacitated variant and featuring a more complex formulation than the Yeditepe benchmark, ITC 2007 (Track 1) still reveals a gap between benchmark standards and the practical implementation of ETP in terms of requirements. This gap, highlighted in Table 6, is marked by differences between benchmark and real-world constraints, particularly period-related and room-related constraints, indicating that practical features need more consideration. Addressing this discrepancy could be facilitated by introducing additional benchmark datasets that closely emulate real-world exam timetabling scenarios. These datasets should capture practical constraints, especially period-related and room-related constraints, which are crucial for effective resource management in institutions. Furthermore, ongoing investigations are impeded by the absence of crucial metadata in existing benchmarks, including details like the student's year and course, exam school, and faculty, which are essential for defining cohorts. Integrating this metadata into public datasets would significantly enhance the refinement of formulations and algorithms, aligning them more closely with student preferences [41].
Fairness holds paramount importance and is typically assessed through the spread of exam periods among all students.[36] introduced a formulation that approaches this issue from a student-centric perspective, considering scheduling specific examinations multiple times to improve fairness among diverse student groups.In a complementary fashion, [41] integrated fairness as an objective function within a cohort of students in a multi-objective optimization strategy designed to enhance overall fairness.Future research endeavors could delve into the students' demonstrated preference for timetables that are inherently fair, potentially leading to the development of examination schedules that align with these preferences and contribute to an elevated level of student satisfaction [41].
The primary limitation of metaheuristics lies in the need to configure many parameters, and the effectiveness of a metaheuristic hinges on parameter tuning. This process is time-consuming, as it involves multiple runs to assess performance across instances with varied parameter settings, but a tuning that carries over across instances is advantageous. Competitive results were attained in [55], notably through SA, by tuning on artificial instances while avoiding excessive fine-tuning on benchmarks. Future research can explore the feasibility of feature-based tuning, in which parameters are adjusted to each instance's distinctive features. It is preferable to enable algorithms to adapt their behavior autonomously or to use automatic parameter tuning approaches such as REVAC, SMAC, ParamILS, and F-Race [39], [55], [67].
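The idea of tuning on artificial instances rather than on the benchmarks themselves can be sketched as follows. This is a deliberately simple illustration under stated assumptions: it uses plain random search over two SA parameters, not REVAC, SMAC, ParamILS, or F-Race, and it is not the procedure of [55]. The instance generator, the clash-only objective, and all function names are hypothetical.

```python
import math
import random


def random_instance(n_exams, n_periods, density, rng):
    """Artificial instance: a random conflict graph over exams."""
    conflicts = [(a, b) for a in range(n_exams)
                 for b in range(a + 1, n_exams) if rng.random() < density]
    return n_exams, n_periods, conflicts


def clashes(assign, conflicts):
    """Number of conflicting exam pairs placed in the same period."""
    return sum(1 for a, b in conflicts if assign[a] == assign[b])


def simulated_annealing(instance, t0, alpha, iters, rng):
    """Plain SA with geometric cooling; returns the best clash count found."""
    n_exams, n_periods, conflicts = instance
    assign = [rng.randrange(n_periods) for _ in range(n_exams)]
    cost = best = clashes(assign, conflicts)
    temp = t0
    for _ in range(iters):
        exam = rng.randrange(n_exams)
        old = assign[exam]
        assign[exam] = rng.randrange(n_periods)
        new_cost = clashes(assign, conflicts)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            best = min(best, cost)
        else:
            assign[exam] = old  # reject the move
        temp = max(temp * alpha, 1e-6)
    return best


def tune(train, n_configs, iters, seed=0):
    """Random search over (t0, alpha), scored by mean cost on `train`."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_configs):
        cfg = (rng.uniform(0.1, 5.0), rng.uniform(0.90, 0.999))
        # Every configuration sees the same random stream for a fair comparison.
        score = sum(
            simulated_annealing(inst, cfg[0], cfg[1], iters, random.Random(seed))
            for inst in train
        ) / len(train)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    gen = random.Random(1)
    training_set = [random_instance(12, 4, 0.3, gen) for _ in range(3)]
    print(tune(training_set, n_configs=5, iters=300))
```

A feature-based variant would replace the single returned configuration with a mapping from instance features, such as conflict density, to parameter values; that mapping is not shown here.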
The hybridization of metaheuristics with machine learning methods, such as deep learning or reinforcement learning, indicates a growing opportunity for cross-fertilization. Another potential direction involves the parallel implementation of a metaheuristic algorithm [45]. Burke et al. [9] showcased the effectiveness of integrating reinforcement learning with SA featuring reheating within a hyper-heuristic. Meanwhile, Muklason et al. [41] incorporated reinforcement learning and the great deluge algorithm into their hyper-heuristic.
The investigation of diverse hybrid techniques, designed to leverage the distinct advantages of various search methods and metaheuristics, constitutes a notable and growing trend in ETP. Such hybridization may also involve employing metaheuristics to initialize MIP or CP models. Alternatively, researchers could explore more intricate interaction mechanisms, such as the matheuristic paradigm, to further enhance the effectiveness of these methods [76].
The efficiency and promptness of a local search method depend on various factors, with the selection of both neighborhood structure and search space standing out as the most crucial determinants. Numerous implementations face challenges when their neighborhood structures prove to be overly simplistic. A notable trend observed in several state-of-the-art techniques [26], [33], [35], [45], [55], [67] applied to benchmark datasets is the integration of sophisticated neighborhood structures, particularly during the improvement phase. For instance, [55] achieved optimality by employing a combination of neighborhoods based on various atomic moves and suggested exploring dynamic neighborhood probabilities through an online learning mechanism in future research. To boost local intensification, an adaptation mechanism can be employed to choose an appropriate neighborhood structure and oversee the search process [18], [28]. Future research could focus on developing neighborhood operators in conjunction with Kempe-chain operators [48], [58]. Introducing diverse neighborhood variations, such as shuffling periods instead of relocating exams, may prove valuable. Additionally, investigating the adaptation of these neighborhoods to handle weakly connected subgraphs could be an intriguing avenue for exploration [67].
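For readers unfamiliar with Kempe-chain operators, the following minimal Python sketch shows the basic move in the uncapacitated setting: an exam is moved to a target period, and every exam reachable from it through conflicts within the two periods involved swaps period as well, so a clash-free timetable stays clash-free. The data layout and function names are assumptions made for illustration and are not taken from [48] or [58]; room capacities, which a capacitated variant would also have to check, are ignored.

```python
from collections import deque


def kempe_chain_move(period_of, conflicts, exam, target):
    """Move `exam` to period `target`, swapping periods along its Kempe chain.

    period_of: dict exam -> period (assumed clash-free on entry)
    conflicts: dict exam -> set of exams sharing at least one student
    Returns a new assignment; the input is left untouched.
    """
    source = period_of[exam]
    if source == target:
        return dict(period_of)
    # Breadth-first search over the subgraph induced by the two periods.
    chain, queue = {exam}, deque([exam])
    while queue:
        current = queue.popleft()
        for other in conflicts.get(current, ()):
            if other not in chain and period_of[other] in (source, target):
                chain.add(other)
                queue.append(other)
    # Swap the two periods for every exam in the chain.
    new_assign = dict(period_of)
    for e in chain:
        new_assign[e] = target if period_of[e] == source else source
    return new_assign


def count_clashes(period_of, conflicts):
    """Count conflicting pairs that share a period (each pair counted once)."""
    return sum(1 for a in conflicts for b in conflicts[a]
               if a < b and period_of[a] == period_of[b])


if __name__ == "__main__":
    conflicts = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
    timetable = {"A": 0, "B": 1, "C": 0, "D": 2}
    moved = kempe_chain_move(timetable, conflicts, "A", 1)
    print(moved, count_clashes(moved, conflicts))
```

An adaptive scheme of the kind mentioned above could treat this move as one of several neighborhoods and adjust the probability of selecting each according to its recent success.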
The prevailing trend in timetabling research shows a noticeable inclination towards metaheuristic methods, their hybridizations, and MO approaches. Specifically, carefully tuned metaheuristic approaches have shown superior efficacy across most instances. However, MO approaches, proven more efficacious in a few instances [77], continue to be extensively utilized in real-world timetabling scenarios, primarily because small to medium-sized faculty-level exam timetabling instances are prevalent. At this scale, such methods generate high-quality solutions efficiently within manageable timeframes. Despite the assurance of optimal solutions offered by MO approaches, their wider adoption is hindered by limited scalability on large-scale problems, although a phased approach in MO can be employed for problem decomposition [52]. Nevertheless, recent strides in computational power and solver efficacy are progressively expanding the potential and practicality of MO in addressing larger real-world timetabling challenges [78].
Most real-world timetabling in the reviewed studies is conducted on a faculty or departmental basis. Solving the problem across the various departments of an institution through an integrated approach constitutes a potential research area that aims to foster efficient and cohesive management. Local search algorithms are notable for their remarkable efficacy in addressing large-scale problems with only moderate CPU time requirements [40]. The hybridization of vital elements from local search-based methods with population-based methods [24] and hyper-heuristics has proven particularly effective in handling extensive datasets, contributing to optimizing solutions for complex, real-world scenarios.
Study periods need not be uniformly distributed: students devote more time to preparing for demanding examinations with substantial content. Researchers should therefore construct a framework that allows a longer study period before difficult exams and a shorter preparatory period before more straightforward exams [77].
Another avenue for advancing the exam timetabling field involves conducting technically enriched surveys that implement state-of-the-art algorithms and rigorously test their performance on benchmarks.

VIII. CONCLUSION
The survey primarily concentrates on high-quality journal articles published between 2012 and 2023. We identify emerging solution methodologies and problem variants within the ETP domain, providing potential avenues for further development to interested readers. The methodologies are categorized based on temporal intervals, the analytical techniques employed, and the datasets utilized. Each problem formulation described in this paper is accompanied by a dedicated benchmark dataset tailored to the specific constraints of the considered ETP variant. More than a decade after their publication, benchmark instances for the specified exam variations remain challenging. Hence, there is a recognized need for new benchmarks that consider resource-related constraints to improve resource management, optimize utilization, and narrow the gap between applicable strategies and theoretical achievements. We hope that this survey will assist researchers in identifying unexplored and challenging problem variants or emerging, innovative solution methods within the expansive and rapidly evolving domain of the ETP.

FIGURE 1. Distribution of the ETP studies by publication year.

FIGURE 2. Distribution of the ETP studies by category.

FIGURE 3. The categorization of solution methodologies for ETP.

FIGURE 4. Distribution of categories for real-world dataset.

VOLUME 12, 2024 41493
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

FIGURE 5. (a) Distribution of studied datasets (b) Distribution of studied datasets over time.

TABLE 1. List of ETP journals studied.

TABLE 5. Datasets for ETP with web links.

Approaches: a high level of integration, either by incorporating a metaheuristic algorithm into a search tool or integrating various metaheuristic algorithms into a unified framework.

TABLE 8. Solution methodologies for the ETP and studied datasets.

TABLE 9. Solution methodologies utilized for benchmark datasets.