Fitness landscape analysis of the simple assembly line balancing problem type 1

As the simple assembly line balancing problem type 1 (SALBP1) has been proven to be NP-hard, heuristic and metaheuristic approaches are widely used for solving medium to large instances. Nevertheless, the characteristics (fitness landscape) of the problem's search space have not been studied so far, and no rigorous justification for implementing various metaheuristic methods has been presented. Aiming to fill this gap in the literature, this study presents the first comprehensive and in-depth Fitness Landscape Analysis (FLA) study for SALBP1. The FLA was performed by generating a population of 1000 random solutions, improving each of them to a locally optimal solution, and then measuring various statistical indices such as average distance, gap, entropy, amplitude, length of the walk, autocorrelation, and fitness-distance correlation among all solutions, to understand the complexity, structure, and topology of the solution space. We solved 83 benchmark problems with various cycle times taken from Scholl's dataset, which required 83,000 local searches from initial to optimal solutions. The analysis showed that locally optimal assembly line balances in SALBP1 are distributed nearly uniformly in the landscape of the problem, and the small average difference between the amplitudes of the initial and optimal solutions implies that the landscape is almost plain. In addition, the large average gap between local and global solutions showed that globally optimal solutions of SALBP1 are difficult to find, but that the problem can be effectively solved to near-optimality using a single-solution-based metaheuristic. In addition to the FLA, a new mathematical formulation for the entropy (diversity) of solutions in the search space of SALBP1 is also presented in this paper.


Introduction
The broader assembly planning (AP) problem has three main subproblems: assembly sequence planning (ASP), assembly path planning (APP), and assembly line balancing (ALB) (Somaye Ghandi & Ellips Masehian, 2015). This study deals with a special type of ALB with wide applications in the manufacturing industry. Suppose that the assembly operations (such as welding, riveting, and screwing) in a facility are partitioned into n elementary tasks, each task j taking time tj to complete. The challenge in ALB is to assign these tasks to m assembly workstations such that all workstations have (nearly) equal assembly times and the precedence relations between the tasks are satisfied. The ALB problem is further classified into two NP-hard subproblems: Simple Assembly Line Balancing (SALB) and Generalized Assembly Line Balancing (GALB) (Scholl & Becker, 2006), as described below.

SALB: This category is suitable for modeling (single-sided) assembly lines that produce a unique model of a single product with deterministically known input parameters (Capacho Betancourt, 2007). A single-sided assembly line consists of a sequence of m workstations, usually connected by a conveyor belt, through which the product units flow. Each workstation performs a subset of the n operations necessary for manufacturing the products. Each product unit remains at each station for a fixed time called the 'cycle time', c. In these assembly lines, workstations are consecutively arranged in a straight line.
Each product unit proceeds along this line and visits each workstation once (Gonçalves & De Almeida, 2002). SALB problems have been categorized into the following types:

i. Type 1 (SALBP1) minimizes the number of workstations (m) for a given fixed cycle time (c).
ii. Type 2 (SALBP2) minimizes the cycle time for a given number of workstations.
iii. Type E (Efficiency) (SALBP-E) maximizes the line efficiency (E) by minimizing c and m simultaneously and considering their interrelationships. Most research on SALBP focuses on SALBP1 and SALBP2, and few studies deal with the optimization of assembly line balancing efficiency, or SALBP-E (Wei & Chao, 2011).
iv. Type F (Feasibility) (SALBP-F) determines whether a feasible line balance exists for a given combination of m and c.

GALB: This category of ALB problems is appropriate for balancing more complex assembly lines and has the following classifications (Capacho Betancourt, 2007):

i. General two-sided assembly line balancing (2S-ALB) deals with lines with pairs of operating workstations positioned opposite to each other, each workstation performing a different task (Yadav et al., 2019).
ii. Mixed-model assembly line balancing (MALB) involves lines that assemble several models of a basic product in an intermixed sequence. Considering that tasks have different times for different models, the problem is to optimize the balance (load) of the stations, or any cost-oriented objective function, by determining the best cycle time and assignment of tasks to the workstations.
iii. U-line assembly line balancing (UALBP) deals with single-product assembly lines in which workstations are arranged on both sides of a U-shaped path. Thus, operators can work on either side of the path while simultaneously performing both early and late tasks of the assembly.
iv. The robotic assembly line balancing problem (RALBP) involves flexible assembly lines comprising human workers, robots, and equipment (Chutima, 2022; Li et al., 2018).
On the other hand, solution approaches to SALBP1 fall into two classes:

(1) Exact solution approaches, which find the optimal solution accurately but are not efficient enough for NP-hard optimization problems, as their execution time increases exponentially with the problem size. In fact, the first methods developed to solve SALBP1 belonged to this class. Exact approaches include lower bounds, dominance rules, reduction rules, dynamic programming procedures, branch-and-bound procedures (Dolgui & Gafarov, 2019; Vilà & Pereira, 2013), and mathematical programming (Pastor & Ferrer, 2009; Scholl & Becker, 2006). In branch-and-bound procedures, bounding strategies and preprocessing rules are usually applied to increase efficiency (Vilà & Pereira, 2013), and intelligent techniques are employed to avoid complete enumeration (Dolgui & Gafarov, 2019). In existing mathematical programs, an initial preprocessing step is usually carried out to calculate the range of workstations to which a task i may be assigned, aiming to reduce the number of task-workstation assignment variables (Pastor & Ferrer, 2009).

(2) Approximate solution approaches, which are able to find good (near-optimal) solutions to NP-hard problems within short runtimes. Approximate approaches are divided into three categories: problem-specific heuristics, metaheuristics, and hyperheuristics.
Although problem-specific heuristic algorithms are developed based on the features and properties of the problem at hand, they have the shortcoming of getting stuck in local optima or converging prematurely to such points. The following works have utilized heuristic methods to solve SALBP1: (Fathi et al., 2018; Hackman et al., 1989; Helgeson & Birnie, 1961; Kilincci, 2011; Pape, 2015; Scholl & Voß, 1997; Talbot, 1985). The success of heuristic algorithms strongly depends on the problem characteristics (Pape, 2015). The effectiveness and efficiency of heuristic algorithms also depend, among other factors, on the fitness function used. The fitness function plays a major role since, in addition to identifying the best solution found, it guides the search algorithm toward areas of the feasible region with promising, high-quality solutions. Therefore, the fitness function must be closely correlated with the objective of the original problem, in order to avoid losing the best-found solution due to miscalculation of solution quality, and to avoid late convergence and consequently higher computational times. In addition, the fitness function must be easy and fast to compute, due to the iterative nature of heuristic methods (Fathi et al., 2018).
Metaheuristic algorithms are general search methods applicable to a wide range of problems, capable of finding locally optimal solutions through exploration and exploitation of the search space. These algorithms can generate several solutions due to the incorporation of randomness in the procedure. In each iteration of such an algorithm applied to SALBP1, a task is chosen from a subset of candidates using a probabilistic rule, which may take into account a priority rule and information obtained in previous iterations (Bautista & Pereira, 2002). In the last two decades, numerous metaheuristic methods have been developed for solving SALBP1, including constructive procedures (Ponnambalam et al., 2000), genetic algorithms (GA) (Álvarez-Miranda et al., 2021), tabu search (TS) (Abdeljaouad & Klement, 2021; Pape, 2015), simulated annealing (SA) (Nagy et al., 2020), ant colony optimization (ACO) (Bautista & Pereira, 2002; Bautista & Pereira, 2007; Zhong & Ai, 2017), artificial immune systems (AIS) (Zhang, 2018), discrete particle swarm optimization (PSO) (Dou et al., 2017), heuristics based on slope indices (Baskar & Xavior, 2020), the firing sequence backward algorithm (Kilincci, 2011), and variable-depth local search (Álvarez-Miranda et al., 2022).
Hyperheuristic algorithms are high-level, automatic search methods that manage a set of low-level heuristic algorithms to solve complex computational problems. In most cases, often in combination with machine learning techniques, they select, combine, generate, or adapt several simple heuristic algorithms to efficiently solve computational search problems. Unlike a metaheuristic tuned to a single problem, a hyperheuristic creates a system for solving entire classes of different problems. The following works have utilized hyperheuristic methods to solve SALBP1: (Gonçalves & De Almeida, 2002; Hu et al., 2023; Meng et al., 2021; Özbakır & Seçme, 2022; Seçme & Özbakır, 2019).
While reviewing the literature on SALBP1 methods, we noticed that although most approximate solution approaches implement either single-solution-based (S-) metaheuristics (such as SA) or population-based (P-) metaheuristics (such as ACO), they rarely justify the choice of the selected algorithm. However, the effectiveness and efficiency of a metaheuristic algorithm for an optimization problem strongly depend on the landscape of the problem; depending on the shape of the landscape, specific types of search methods with certain intensification and diversification capabilities will be more effective. Therefore, before selecting and implementing a metaheuristic algorithm for a particular problem, it is important (and sometimes necessary) to analyze the search space of the problem and identify the distribution and magnitude of its peaks and valleys. This analysis, known as Fitness Landscape Analysis (FLA), examines the shape of the problem's search space using the distribution of local optima and their relationships and distances to each other, and provides a good understanding of the structure of the solution space of the problem at hand.
To the best of our knowledge, only two studies have so far performed FLA in the field of assembly planning: (Somayé Ghandi & Ellips Masehian, 2015) and (Nourmohammadi et al., 2019). Neither study, however, addressed SALBP1 properly or in sufficient breadth and depth; thus, it was necessary to fill this gap and present a comprehensive FLA for SALBP1, as we do in this paper, to provide a reference for researchers in the field.
Below, we highlight the differences between the present work and each of these works.
(1) The FLA in (Somayé Ghandi & Ellips Masehian, 2015) was performed for the ASP problem (sequencing the assembly of parts in a product) and not for the ALB problem.
(2) Although an FLA was done for SALBP1 in (Nourmohammadi et al., 2019), it had serious drawbacks and inaccuracies, listed as follows: (i) the FLA was done only for one small problem instance with 7 tasks, which is neither a benchmark problem nor a practical case; (ii) only one operator type (i.e., swapping) was implemented for generating neighboring solutions throughout the local search; (iii) no solution representation was presented, and the way the local search was applied to a given solution is unclear; (iv) the size of the studied population of solutions (only 500) was not sufficient to draw inclusive conclusions; (v) some important FLA measures such as Gap, Length of walks, and Autocorrelation were not computed or analyzed; (vi) the function used for calculating the entropy was specific to the quadratic assignment problem (QAP) and not to SALBP1, where, in addition to the sequence of tasks, their assignment to workstations is also critical; and, most importantly, (vii) their final conclusion about the shape of the landscape, drawn from the FLA results, was incorrect (we discuss this later in Sec. 5). In this paper, we avoided the abovementioned drawbacks in our FLA by conducting it on numerous benchmark problems (83 in total) with large solution populations of size 1000, and by implementing four different neighborhood operators. In addition, we propose a novel entropy function designed exclusively for SALBP1 and reach a comprehensive conclusion regarding the structure of the landscape of SALBP1 based on a larger number of statistical indices.
The results of the present study can be used to determine the class of heuristic optimization algorithms that will be more effective in searching the solution space and finding near-optimal (if not optimal) solutions.

Problem definition and notations
To manufacture a product on an assembly line, the total amount of work must be partitioned into a set of n elementary operations called tasks, which should be completed on a number of loaded stations (m) given a fixed cycle time (c). Performing task j requires certain equipment, machines, and/or worker skills, and takes time tj. If the set of tasks Tk is allocated to workstation k, then the assembly time of workstation k is ttk = Σj∈Tk tj. The goal of solving SALBP1 is to determine the minimum number of stations, under the following assumptions:

− One homogeneous product is mass-produced.
− The production process is predetermined.
− The line is paced with a fixed cycle time c.
− Tasks are performed without preemption, and their operation times tj are deterministic.
− The only assignment restrictions are the precedence constraints, which are either already known or are the output of an ASP subproblem solved prior to the ALB problem.
− The product is processed sequentially and consecutively from station 1 to station m.
− Assembly machines and workers in all stations are identical.
Tasks and their relations can be visualized using a precedence graph (as depicted in Fig. 1(a)) containing a node for each task, the time of each task, and arcs designating precedence constraints. The assembly precedence matrix (APM) is an n×n matrix in which each entry is denoted by a binary variable apmij that takes the value 1 if task i is a predecessor of task j, and 0 otherwise. For the sample solution presented in Fig. 1(b), the number of workstations is m = 4, the cycle time is c = 10 minutes, the total assembly time is ttotal = t1 + t2 + … + t7 = 1 + 5 + 4 + 3 + 5 + 6 + 5 = 29 minutes, and the theoretical lower bound for the number of stations is m* = ⌈ttotal /c⌉ = ⌈29/10⌉ = ⌈2.9⌉ = 3. The precedence constraint for Task 2 indicates that its processing requires all its predecessor tasks {1} to be completed, and Task 2 must be completed before any of its successors {3, 5, 6} can start. Task 1 is the direct predecessor of Task 2, and Tasks 3 and 5 are its direct successors.
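For concreteness, the APM and the theoretical lower bound m* of the Fig. 1 example can be sketched in a few lines of Python. The task times are those given above; the arcs 1→2, 2→3, and 2→5 are stated in the text, while the arc 5→6 is an assumption used here only for illustration, since the full Fig. 1(a) graph is not reproduced in this text.

```python
from math import ceil

# Task times (minutes) from the Fig. 1 example
t = {1: 1, 2: 5, 3: 4, 4: 3, 5: 5, 6: 6, 7: 5}
# Direct precedence arcs: (i, j) means task i directly precedes task j
# (1->2, 2->3, 2->5 are stated in the text; 5->6 is an assumption)
arcs = [(1, 2), (2, 3), (2, 5), (5, 6)]

n = len(t)
apm = [[0] * n for _ in range(n)]  # apm[i-1][j-1] = 1 iff i precedes j
for i, j in arcs:
    apm[i - 1][j - 1] = 1

c = 10                      # cycle time
t_total = sum(t.values())   # total assembly time
m_star = ceil(t_total / c)  # theoretical lower bound on the station count

print(t_total, m_star)  # -> 29 3
```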

How large is the search space of the SALBP1 problem?
The total number of possible permutations of n tasks is n!, and the minimum number of possible workstations for assembling a particular task is m*. Therefore, the total number of feasible and infeasible solutions (i.e., the size of the entire search space) equals n!·(m*)^n, which illustrates the combinatorial explosion of the solution space that makes SALBP1 so hard in practice. For example, for the small precedence graph shown in Fig. 1 with n = 7 tasks and m* = 3 (the theoretical lower bound for the number of stations), the size of the entire search space (including feasible and infeasible solutions) equals 7!×3^7 = 11,022,480.
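The search-space size derived above can be verified directly:

```python
from math import factorial

n, m_star = 7, 3  # the Fig. 1 instance
search_space = factorial(n) * m_star ** n  # n! * (m*)^n
print(search_space)  # -> 11022480
```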
In addition, because a solution encoding (such as the one shown in Fig. 1(c)) has 2n cells, the diameter of the search space (i.e., the maximum distance between any two solutions) equals 2n, as all cells of two solutions may differ. In finding a feasible solution to SALBP1, the precedence relations between the tasks must be observed to accommodate technological and organizational conditions, which adds further complexity to finding an optimal solution that is also feasible. For the precedence graph shown in Fig. 1 with n = 7 tasks, m* = 3 stations, and cycle time c = 10 minutes, the total number of feasible solutions that respect the precedence relations is only 364 (out of more than 11 million!). This clearly shows how hard it is to find the optimal feasible solution in the vast search space of SALBP1. For further clarification, all the feasible solutions to the sequences [1, 2, 3, 4, 5, 7, 6] (13 in total) and [1, 2, 4, 5, 7, 6, 3] (16 in total) are shown in Fig. 2. For the solutions in Fig. 2(a), the minimum number of workstations is m = 4, and for the solutions in Fig. 2(b), it is m = 3, which equals the theoretical optimal number of stations m* = 3. In general, the maximum number of workstations equals the number of tasks (e.g., n = 7), which corresponds to each task being processed in a separate workstation. Note that the remaining 364 − (13 + 16) = 335 feasible solutions are based on other feasible sequences (i.e., those that comply with the precedence relations) of the tasks.

Steps of Fitness Landscape Analysis
The landscape of a problem is completely defined by properties such as the objective function, the neighborhood, and the type of solution representation. A graph G = (V, E) can be used to define the search space of a problem, where V is the set of vertices and E is the set of edges. Each vertex corresponds to a solution of the problem represented by some encoding, and each edge corresponds to a move operator used to generate new solutions.
Two solutions s1 and s2 are neighbors if s1 (resp. s2) can be reached from s2 (resp. s1) by applying a move operator; neighboring solutions are connected via an edge in the graph. Computing and analyzing statistical measures, such as the distribution and correlation of local optima in the landscape of the problem, enables us to infer the ruggedness and shape of the fitness landscape, and thus to propose a suitable metaheuristic in that context (Talbi, 2009).
In this section, we present the steps of our fitness landscape analysis for SALBP1, briefly listed as follows:

1. Selection of Test Problems − Our FLA is performed on 83 test problem instances based on 11 sample assembly line balancing problems with various cycle times.

2. Generation of an Initial Population − For each test problem, we first generate an initial uniform random population U of 1000 random starting assembly line configurations (solutions), each encoded using the method shown in Fig. 1(c).

3. Performing Local Search − A simple hill-climbing local search algorithm is applied to each solution until either a local optimum is reached or a preset number of iterations (e.g., 20) is completed. We then define the optimal population O as the set of all locally optimal solutions obtained from the local search.

4. Calculation of FLA Measures − Given the populations U and O, the FLA is performed for the studied test problems by calculating statistical measures of the correlations and distribution of the local optima in the search space of the SALBP1 problem.

Step 1: Selection of Test Problems
Because the structure of a problem instance affects its fitness landscape, we investigated 83 SALBP1 instances derived from 11 different SALBP1 benchmark problems with various cycle times, taken from the well-known Scholl database, as shown in Table 1, in order to reach a reliable and context-free conclusion regarding the fitness landscape shape of the general SALBP1 problem. Table 1 also presents the size of the solution space of each benchmark instance, calculated based on the discussion in Sec. 2.1.

Step 2: Generation of an Initial Population
To generate the set U, an initial uniform random population of solutions of size N (= 1000) with sufficient diversity, we first created N random permutations of the n tasks (called priority lists) and then built solution representations based on these lists. To do so, we employed two types of task assignment procedures, shallow and deep, each producing half of the population, as described below.

Shallow task assignment: In this procedure, a random priority list (permutation) of tasks (named L) is generated, and the first station is created with a total time tt1 = 0 and idle time I1 = c. Next, through an iterative process starting from the left, the first unassigned task in L (say, task j) whose task time is less than or equal to the idle time of the current station is assigned to that station, and the station's total and idle times are updated as tt1 = tt1 + tj and I1 = I1 − tj. After each assignment, list L is scanned again to find other tasks that can be assigned to the current station. If no such task can be found, the number of stations is incremented by one (i.e., a new station is created with total and idle times equal to zero and c, respectively), and the list is re-scanned from the left to assign tasks to the newly created station. This procedure is repeated until all tasks are assigned to some station.
We name this method 'shallow' because the resulting solutions may be infeasible, as the precedence relations of the tasks are not considered. However, the generation of such solutions contributes to the diversity of the initial pool, which is essential for the success of metaheuristics. For example, applying this procedure to the random priority list L = [3, 1, 4, 5, 7, 6, 2] for the tasks presented in Fig. 1(a) generates the infeasible solution (3, 1), (1, 1), (4, 1), (5, 2), (7, 2), (6, 3), (2, 4) after sorting by station number. This procedure is repeated for N/2 random priority lists to create half of the initial population U.
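A minimal Python sketch of the shallow task assignment, using the Fig. 1(a) task times (it assumes every task time fits in the cycle time, tj ≤ c); it reproduces the worked example for L = [3, 1, 4, 5, 7, 6, 2]:

```python
def shallow_assignment(L, t, c):
    """Greedy first-fit assignment that ignores precedence relations
    (the result may therefore be infeasible). Assumes tj <= c for all j."""
    unassigned = list(L)
    stations = []                 # stations[k] holds the tasks of station k+1
    while unassigned:
        idle, load = c, []
        placed = True
        while placed:             # re-scan L from the left after each placement
            placed = False
            for j in unassigned:
                if t[j] <= idle:  # first unassigned task that fits
                    load.append(j)
                    idle -= t[j]
                    unassigned.remove(j)
                    placed = True
                    break
        stations.append(load)
    # encode as (task, station) pairs, already sorted by station number
    return [(j, k + 1) for k, load in enumerate(stations) for j in load]

t = {1: 1, 2: 5, 3: 4, 4: 3, 5: 5, 6: 6, 7: 5}
sol = shallow_assignment([3, 1, 4, 5, 7, 6, 2], t, 10)
print(sol)  # -> [(3, 1), (1, 1), (4, 1), (5, 2), (7, 2), (6, 3), (2, 4)]
```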

Deep task assignment:
In this procedure, a random priority list (permutation) of tasks (L) is generated. Then, in each iteration i of the procedure, the first unassigned task in L (say, task πi = j) is selected and assigned to the current station k only if Ik ≥ tj and all direct predecessors of task j (DPj) have already been assigned. For example, applying this procedure to the random priority list L = [3, 1, 4, 5, 7, 6, 2] for the tasks presented in Fig. 1(a) generates the solution (1, 1), (4, 1), (7, 1), (2, 2), (3, 2), (5, 3), (6, 4) after sorting by station number. In essence, deep task assignment follows the shallow task assignment but additionally enforces the precedence relations.
We name this method 'deep' because the resulting solutions are always feasible, as the precedence relations of the tasks are respected. This procedure is repeated for N/2 random priority lists to create the other half of the initial population U. The generation of random priority lists, followed by the above two procedures, constitutes the algorithm used to create a uniform random population U in the landscape analysis of SALBP1.
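A corresponding sketch of the deep task assignment. The direct-predecessor sets below are assumed from the arcs mentioned in the text (1→2, 2→3, 2→5) plus an assumed arc 5→6; under this assumption the sketch reproduces the worked example:

```python
def deep_assignment(L, t, c, preds):
    """First-fit assignment that also enforces precedence: a task is placed
    only if it fits the current station's idle time and all of its direct
    predecessors are already assigned (always yields a feasible solution)."""
    result = []                  # (task, station) pairs in assignment order
    assigned = set()
    unassigned = list(L)
    k, idle = 1, c
    while unassigned:
        placed = False
        for j in unassigned:     # scan the priority list from the left
            if t[j] <= idle and all(p in assigned for p in preds.get(j, [])):
                result.append((j, k))
                assigned.add(j)
                unassigned.remove(j)
                idle -= t[j]
                placed = True
                break
        if not placed:           # nothing fits: open a new empty station
            k, idle = k + 1, c
    return result

t = {1: 1, 2: 5, 3: 4, 4: 3, 5: 5, 6: 6, 7: 5}
preds = {2: [1], 3: [2], 5: [2], 6: [5]}   # assumed Fig. 1(a) arcs
sol = deep_assignment([3, 1, 4, 5, 7, 6, 2], t, 10, preds)
print(sol)  # -> [(1, 1), (4, 1), (7, 1), (2, 2), (3, 2), (5, 3), (6, 4)]
```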

Step 3: Performing Local Search
After generating the initial population U of 1000 solutions using the shallow and deep task assignment methods, a steepest-descent (hill-climbing) local search is performed on each solution to reach a solution with a locally minimal fitness value. In each iteration of the local search, a neighboring feasible solution to the current solution is generated using a randomly selected neighborhood-generation operator (introduced in this subsection) and is kept only if it improves (decreases) the fitness function value. The iterations repeat until either a local optimum is reached or a certain number of non-improving iterations have been performed.
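The local-search loop can be sketched generically as follows. The fitness and neighbor functions here are toy stand-ins (sorting a permutation by random swaps), not the SALBP1 operators, and the non-improving budget is illustrative:

```python
import random

def hill_climb(s0, fitness, random_neighbor, max_non_improving=20):
    """First-improvement descent: accept a random neighbor only if it strictly
    decreases the fitness; stop after max_non_improving consecutive failures."""
    s, fs = s0, fitness(s0)
    fails = 0
    while fails < max_non_improving:
        cand = random_neighbor(s)
        fc = fitness(cand)
        if fc < fs:
            s, fs, fails = cand, fc, 0
        else:
            fails += 1
    return s, fs

# Toy demonstration: minimize a permutation's total displacement from sorted.
random.seed(0)
def displacement(L):
    return sum(abs(task - pos - 1) for pos, task in enumerate(L))
def swap_neighbor(L):
    i, j = random.sample(range(len(L)), 2)
    M = L[:]
    M[i], M[j] = M[j], M[i]
    return M

s, f = hill_climb([3, 1, 4, 5, 7, 6, 2], displacement, swap_neighbor, 200)
print(f <= displacement([3, 1, 4, 5, 7, 6, 2]))  # True: never worse than start
```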

Neighborhood-Generation Operators
We use four different neighborhood-generation operators (exchange, insertion, inversion, and float shift) to generate neighboring solutions, as explained below. The fourth operator, float shift, is our own design.
Exchange operator: This operator randomly selects two tasks from a task priority list and exchanges (swaps) their positions.
As an example, in Fig. 4, consider the priority list L1 and its corresponding solution s1 obtained by the deep task assignment. Two tasks, 3 and 2, are randomly selected and swapped to yield a new list L2. Then, the new neighboring solution s2 is generated by applying the deep task assignment to L2 and sorting by workstation. With this operator, the total neighborhood size of any solution equals the number of ways two tasks can be selected from a list of size n, i.e., C(n, 2) = n(n − 1)/2, where n is the total number of tasks and C( ) is the combination function.

Insertion operator: This operator randomly selects a task in the priority list and relocates it to another position in the list. For example, in Fig. 5, assume that for the initial solution s1 and its corresponding task list L1, task 2 is randomly selected and moved to the new random position 2, resulting in task list L2 and neighboring solution s2 (after the deep assignment of stations). With this operator, the neighborhood size of any solution is n(n − 1), because n tasks can be selected for relocation, and n − 1 different positions (all but the current position of the task) can be selected as the destination.

Inversion operator: This operator randomly selects two positions in the list and reverses the sequence of all tasks between them. For example, in Fig. 6, for the initial solution s1 and its corresponding task list L1, two positions (1 and 7) are randomly selected, and the new list after applying the inversion operator is L2, which yields the neighboring solution s2 after deep task assignment. By enumerating all position pairs whose reversal does not reduce to a simple exchange of two tasks (i.e., pairs more than two positions apart), it can be shown that for a list of n tasks the total neighborhood size of any solution is n(n − 1)/2 − (n − 1) − (n − 2) = (n − 2)(n − 3)/2. (Fig. 6: The inversion neighborhood-generation operator.)
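The exchange, insertion, and inversion operators act on the priority list alone, so they can be sketched independently of the assignment step (positions are 0-based here); the neighborhood-size formulas above are checked for n = 7:

```python
from math import comb

def exchange(L, i, j):
    """Swap the tasks at positions i and j."""
    M = L[:]
    M[i], M[j] = M[j], M[i]
    return M

def insertion(L, i, j):
    """Remove the task at position i and reinsert it at position j."""
    M = L[:]
    M.insert(j, M.pop(i))
    return M

def inversion(L, i, j):
    """Reverse the subsequence between positions i and j (inclusive)."""
    M = L[:]
    M[i:j + 1] = reversed(M[i:j + 1])
    return M

L = [1, 2, 3, 4, 5, 7, 6]
n = len(L)
print(exchange(L, 1, 2))       # -> [1, 3, 2, 4, 5, 7, 6]
print(comb(n, 2))              # exchange neighborhood size:  21
print(n * (n - 1))             # insertion neighborhood size: 42
print((n - 2) * (n - 3) // 2)  # inversion neighborhood size: 10
```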
Float shift operator: This operator randomly selects a task in a feasible priority list and relocates it to another position within its float set, thereby keeping the task between all its predecessors and successors. Fig. 7 shows the result of applying this operator to a given feasible priority list for Task 3, which moves from its current 4th position to the 7th position, a position within the float of Task 3, F3 = {π3, π4, π5, π6, π7}. The neighborhood size of this operator for a list of length n has an upper bound of n(n − 2), because n tasks can be selected and relocated to at most n − 2 positions (reserving the first and last positions of the list for predecessor and successor tasks). More precisely, the neighborhood size equals Σi=1..n (| Fi | − 1), where | Fi | is the cardinality (size) of the float set Fi. An advantage of the float shift operator is that it maintains the feasibility of solutions, whereas there is no guarantee that solutions obtained from the exchange, insertion, and inversion operators remain feasible. An important property of any neighborhood-generation operator is connectivity, which guarantees that any solution in the search space can be reached from any starting solution through successive applications of that operator alone. All the abovementioned neighborhood-generation operators possess this connectivity property. Additionally, it is worth noting that in the examples of Figs. 4–7, applying the exchange, inversion, or float shift operators changed the number of stations from four to three, thereby reducing the number of required workstations.
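A sketch of the float set and the float shift, under two simplifying assumptions: the float of task i spans the positions strictly after its last-listed direct predecessor and strictly before its first-listed direct successor, and the target position is interpreted on the list after the task is removed. The predecessor/successor sets are the ones assumed earlier for Fig. 1(a):

```python
def float_positions(L, i, preds, succs):
    """0-based positions where task i may sit in priority list L without
    violating precedence: after its last-listed direct predecessor and
    before its first-listed direct successor."""
    pos = {task: p for p, task in enumerate(L)}
    lo = max((pos[p] for p in preds.get(i, [])), default=-1) + 1
    hi = min((pos[s] for s in succs.get(i, [])), default=len(L))
    return list(range(lo, hi))

def float_shift(L, i, new_pos):
    """Relocate task i to new_pos (assumed to lie in its float set)."""
    M = [x for x in L if x != i]
    M.insert(new_pos, i)
    return M

preds = {2: [1], 3: [2], 5: [2], 6: [5]}   # assumed Fig. 1(a) arcs
succs = {1: [2], 2: [3, 5], 5: [6]}
L = [1, 2, 3, 4, 5, 7, 6]
print(float_positions(L, 3, preds, succs))  # -> [2, 3, 4, 5, 6]
print(float_shift(L, 3, 6))                 # -> [1, 2, 4, 5, 7, 6, 3]
```

Note that the shifted list [1, 2, 4, 5, 7, 6, 3] is again a feasible sequence, illustrating why this operator preserves feasibility.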

Calculating the Fitness of a Solution
Two important properties of any solution to SALBP1 are its feasibility and quality, which are used to guide the search toward the final solution by selecting the best neighbor of a solution in each iteration. A solution s = (π1, w1), (π2, w2), …, (πn, wn) is feasible if no task πi (i = 2, …, n) is a predecessor of any earlier-assembled task πj (j < i). More formally, solution sl is feasible if the number of its unsatisfied precedence constraints, UPC(sl), equals zero. UPC(sl) is the number of times a predecessor of a task is assembled after that task in solution sl, calculated in Eq. (1), where the variable apm is defined as in Sec. 2:

UPC(sl) = Σi=1..n−1 Σj=i+1..n apm(πj, πi).   (1)

In addition, to measure the degree of infeasibility of any solution sl generated and evaluated during the search, we define the Percentage of Unsatisfied Precedence Constraints as

PUPC(sl) = UPC(sl) / (n(n − 1)/2) × 100.   (2)

As part of the fitness of solution sl, we are also interested in minimizing m, the number of workstations in the assembly line, as well as the Smoothness Index SI(sl) calculated in Eq. (3), as presented by (Baykasoglu, 2006; Hong & Cho, 1999), in which ttk is the load of workstation k (defined earlier) and ttmax = maxk ttk:

SI(sl) = √( Σk=1..m (ttmax − ttk)² ).   (3)

A small SI(sl) for solution sl implies that the processing times in all workstations are almost equal, which contributes to a smooth and level distribution of the workload and enhances worker and equipment utilization.
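The feasibility and smoothness measures can be sketched as follows, again with the Fig. 1(a) task times and the assumed arcs (1→2, 2→3, 2→5, plus an assumed 5→6); the infeasible and feasible solutions are the shallow- and deep-assignment worked examples above:

```python
from math import sqrt

n = 7
apm = [[0] * n for _ in range(n)]
for i, j in [(1, 2), (2, 3), (2, 5), (5, 6)]:  # assumed Fig. 1(a) arcs
    apm[i - 1][j - 1] = 1

def upc(solution):
    """Count pairs where a predecessor is assembled after its successor task."""
    order = [task for task, _ in solution]
    return sum(apm[order[j] - 1][order[i] - 1]
               for i in range(n) for j in range(i + 1, n))

def pupc(solution):
    """UPC as a percentage of all n(n-1)/2 checked precedence constraints."""
    return upc(solution) / (n * (n - 1) / 2) * 100

def smoothness_index(solution, t):
    """sqrt of the sum of squared deviations of station loads from the max load."""
    loads = {}
    for task, station in solution:
        loads[station] = loads.get(station, 0) + t[task]
    tt_max = max(loads.values())
    return sqrt(sum((tt_max - tt) ** 2 for tt in loads.values()))

t = {1: 1, 2: 5, 3: 4, 4: 3, 5: 5, 6: 6, 7: 5}
infeasible = [(3, 1), (1, 1), (4, 1), (5, 2), (7, 2), (6, 3), (2, 4)]
feasible = [(1, 1), (4, 1), (7, 1), (2, 2), (3, 2), (5, 3), (6, 4)]
print(upc(infeasible), upc(feasible))  # -> 2 0 (under the assumed arcs)
print(smoothness_index(feasible, t))   # -> 5.0 (station loads 9, 9, 5, 6)
```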

The total fitness function of solution sl, expressed as f(sl), can now be defined as a weighted aggregation of the above criteria:

f(sl) = α·PUPC(sl) + β·SI(sl) + γ·m(sl).   (4)

The first term in Eq. (4) accounts for the share of unsatisfied precedence constraints among all checked precedence constraints for solution sl, and the remaining terms form the weighted sum of the smoothness index and the number of stations in the assembly line. In SALBP1 we try to minimize f(s) so that feasible and smooth solutions with the fewest workstations are favored; therefore, to bias the search toward feasible solutions (as feasibility is the most important feature of a solution), we set the weighting factors in Eq. (4) to α = 3 (the coefficient of PUPC), β = 1 (the coefficient of SI), and γ = 2 (the coefficient of m), thus prioritizing feasibility over smoothness, and smoothness over the number of workstations.
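With these weights, the aggregation is a one-liner; the PUPC, SI, and m values below are those computed for the feasible deep-assignment worked example (PUPC = 0, SI = 5.0, m = 4):

```python
# Weights from the text: feasibility first, then smoothness, then stations.
alpha, beta, gamma = 3, 1, 2

def fitness(pupc_val, si_val, m_val):
    """Weighted aggregation f(s) = alpha*PUPC(s) + beta*SI(s) + gamma*m(s)."""
    return alpha * pupc_val + beta * si_val + gamma * m_val

print(fitness(0.0, 5.0, 4))  # -> 13.0
```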

Step 4: Calculation of FLA Measures
Two groups of statistical measures are calculated to describe the properties of the fitness landscape of the SALBP1 problem: distribution measures and correlation measures, defined as follows. (1) Distribution measures study the topology of locally optimal solutions in the objective and search spaces. For the landscape analysis, several measures are calculated for both the U and O populations, together with their relative variations (∆). Our investigated distribution measures are Dmm(P) (the average distance between each pair of solutions) and its variation ∆Dmm, ent(P) (entropy in the search space) and its variation ∆ent, Amp(P) (amplitude of solutions) and its variation ∆Amp, and Gap(P). Note that we designed a new mathematical formula for calculating the entropy in the search space of SALBP1, as presented in Sec. 3.4.1. Also, whenever required, we use the Hamming distance dH(s1, s2) to measure the distance between two solutions, defined as the number of cell-wise differences in their encodings; for example, dH(s1, s2) = 7 for solutions s1 and s2 in Fig. 4. (2) Correlation measures are used to analyze the correlation between the relative distance of solutions and their quality, as well as the correlation between the distance of solutions to the best-known solution and their quality gap. These measures include Lmm (average length of the walks), ρ(dH) (the autocorrelation function), and FDC (fitness-distance correlation). The FDC analysis can be visualized using the FDC scatterplot, presented in Sec. 6.
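The Hamming distance between two 2n-cell encodings is a direct cell count; the two solutions below are the deep- and shallow-assignment worked examples from earlier, which happen to differ in exactly 7 cells:

```python
def hamming(s1, s2):
    """Number of cell-wise differences between two 2n-cell encodings."""
    flat = lambda s: [x for pair in s for x in pair]
    return sum(a != b for a, b in zip(flat(s1), flat(s2)))

s1 = [(1, 1), (4, 1), (7, 1), (2, 2), (3, 2), (5, 3), (6, 4)]
s2 = [(3, 1), (1, 1), (4, 1), (5, 2), (7, 2), (6, 3), (2, 4)]
print(hamming(s1, s2))  # -> 7
```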
Table 2 lists and defines the major distribution and correlation measures typically used for FLA, expressed for a general population P of size |P|. When needed, P is replaced with populations U or O. More details about these measures can be found in (Talbi, 2009).

A New Entropy Measure for SALBP1
Entropy, or lack of predictability, is a concept first used in the 19th century for describing the degree of disorder or randomness in a thermodynamic system, and was statistically formulated by Claude Shannon in 1948 as a measure of random loss of information in telecommunication signals. The concept later evolved into a measure of irregularity or diversity in a population in Information Theory. Entropy has found wide applications in many scientific fields such as thermodynamics, physics, information sciences, biology, cosmology, and economics. In the context of optimization problems, the concentration or dispersion of solutions can be assessed through the concept of entropy. When the entropy is weak or close to zero, it indicates a concentration of solutions, whereas a high entropy implies a significant dispersion of solutions within the search space. Shannon proposed the general formulation for the entropy of a population X in terms of a discrete set of probabilities p_i as H(X) = −Σ_{i=1..n} p_i log p_i.

 
− Dmm(P): the normalized average Hamming distance between any two solutions of the population P. It is an indicator of the concentration of a population P of solutions in the whole search space S.
− ent(P): the diversity of solutions in the search space, where qijk is the number of all solutions in P that have task i in the j-th position of the assembly sequence assembled in workstation k (mathematically, πj = i and wj = k). It is an indicator of the diversity of solutions in a population.
− Amp(P): the amplitude, i.e., the relative difference between the objective values of the worst (maximum, s_w) and the best (minimum, s_b) solutions in the population P.
− ∆Amp: the relative difference between the amplitudes of the initial (U) and locally optimal (O) solutions. It indicates the distribution of optimum solutions in the search space and reveals whether the search space is rugged or not.


− ρ(dH): the autocorrelation function, which computes the correlation of all solution pairs having a Hamming distance of dH in the search space, ρ(dH) = cov(f(s), f(s′)) / σf², where f̄ is the average fitness value of the whole population and σf² is its variance.

There is no universally applicable formulation for entropy, as it varies depending on the specific optimization problem being addressed. To the best of our knowledge, there has been no entropy formulation customized for Assembly Line Balancing problems in general, and the SALBP1 in particular. To fill this gap, in this paper we introduce an original formulation for entropy:

ent(P) = −(1 / (n·log|P|)) Σ_{i=1..n} Σ_{j=1..n} Σ_{k=1..m} (qijk / |P|) log(qijk / |P|),

in which qijk, named the locality variable, counts the number of all solutions in population P that have task i in the j-th position of the assembly sequence assembled in workstation k; mathematically expressed, πj = i and wj = k.
In a population P of solutions for SALBP1, the maximum entropy ent(P) = 1 indicates the highest possible diversity of solutions, which occurs in the following cases: (1) all solutions have distinct task sequences; or (2) if two or more solutions have completely or partially equal task sequences, the same tasks with equal positions among those solutions are allocated to workstations in a different way in each solution. In either case, for any task i, each solution yields a distinct combination of j and k (among all n×m possible combinations), so that qijk ≤ 1 for every combination. Conversely, the minimum entropy ent(P) = 0 indicates the lowest possible diversity of solutions, which occurs when all solutions in the population are completely identical; then, for each task i, only one combination of j and k has qijk = |P|, and the remaining n²m − n combinations of i, j, and k have qijk = 0.
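The proposed entropy can be sketched as follows; since the printed formula is garbled in extraction, the normalization below (by n·log|P|, a form chosen to reproduce the stated limit cases ent = 1 and ent = 0) is an assumption, not necessarily the authors' exact expression:

```python
# Sketch of ent(P) built from the locality variable q[(i, j, k)]: the number
# of solutions with task i at sequence position j in workstation k.
from math import log
from collections import Counter

def entropy(P):
    """P is a list of solutions, each a pair (pi, w) of a task-sequence
    vector and a workstation vector."""
    n = len(P[0][0])
    q = Counter()
    for pi, w in P:
        for j, (task, station) in enumerate(zip(pi, w)):
            q[(task, j, station)] += 1
    size = len(P)
    if size < 2:
        return 0.0
    s = sum((c / size) * log(c / size) for c in q.values() if c > 0)
    return -s / (n * log(size))           # assumed normalization to [0, 1]

# All-identical population -> minimum diversity (ent = 0):
same = entropy([([1, 2, 3], [1, 1, 2])] * 5)
# Every solution places every task differently -> maximum diversity (ent = 1):
distinct = entropy([([1, 2, 3], [1, 1, 2]),
                    ([2, 3, 1], [1, 2, 2]),
                    ([3, 1, 2], [1, 2, 2])])
```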

Fitness Landscape Analysis Computational Results
For conducting the fitness landscape analysis of the SALBP1, the hill-climbing local search algorithm was applied to a population U of 1000 solutions for each of the 83 instances introduced in Sec. 3.1, to obtain 83,000 local optimal solutions, and then the statistical measures introduced in Sec. 3.4 were calculated for each instance. In addition, the average values of all measures for each test problem set, as well as the total averages of all measures over all the problem sets, were calculated. The computational results are reported in Table 3. Here we expand on some noticeable facts:

1. While most ∆Dmm values are positive, there are a few negative values, mostly in the Tonge dataset, which occur when the average distance between all local optima pairs is greater than the average distance between all pairs of initial random solutions. This implies that in those instances, local optimal solutions cover a larger part of the search space than the initial random solutions.

2. While most ∆ent values are positive, there are a few negative values, mostly in the Sawyer, Scholl-297, and Tonge datasets, which imply that the average diversity of locally optimal solutions in the search space was greater than that of the initial random solutions.

3. While most FDC values are positive, there are a few negative values, mostly in the Heskiaoff dataset, which show that the landscape is 'misleading', meaning that solutions closer to the global optimum are not necessarily high-quality or feasible, leading the search away from the optimum. This fact will be further discussed in Sec. 6.
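The hill-climbing improvement applied to each random solution can be sketched generically as below; the paper's actual move operator is not reproduced here, so a placeholder `neighbor()` callback (an assumption) stands in for it:

```python
# Minimal hill-climbing sketch. The walk length returned is the number of
# accepted improving steps, the quantity whose population average is Lmm.
import random

def hill_climb(solution, fitness, neighbor, max_steps=1000, seed=0):
    """Greedy descent: accept only improving neighbors."""
    rng = random.Random(seed)
    current, f_cur, steps = solution, fitness(solution), 0
    for _ in range(max_steps):
        cand = neighbor(current, rng)
        f_cand = fitness(cand)
        if f_cand < f_cur:                 # improving move -> accept
            current, f_cur = cand, f_cand
            steps += 1
    return current, steps

# Toy usage: minimize the sum of a vector by decrementing random entries.
f = lambda v: sum(v)
def nb(v, rng):
    i = rng.randrange(len(v))
    w = list(v)
    w[i] = max(0, w[i] - 1)
    return w

opt, walk = hill_climb([3, 2, 1], f, nb)
```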

Analysis of Distribution Measures
The entropy variation (∆ent) and the average normalized pairwise distance variation (∆Dmm) in the search space play key roles in evaluating the distribution of the obtained local optimal solutions in the landscape. There are three combinations of the values of these two measures (Talbi, 2009):

Case 1: A search space with high ∆ent and high ∆Dmm is called One-Massif or Massif Central, meaning that most local optimal solutions are concentrated in a dense and small region, as shown in Fig. 8(a), first row. Therefore, using a single-solution metaheuristic to search the space is more suitable than using a population-based metaheuristic, because the latter would waste time searching the relatively sparse areas of the space.

Case 2: A search space with high ∆ent and low ∆Dmm is called Multi-Massif, meaning that local optima are localized in several attraction areas, as shown in Fig. 8(a), second row. Therefore, population-based metaheuristics are better choices than single-solution-based metaheuristics, because the latter cannot sufficiently explore the various solution clusters scattered across the search space.

Case 3: A search space with low ∆ent and low ∆Dmm is called Uniform, meaning that local optima are scattered rather evenly in the search space, as shown in Fig. 8(a), third row. Therefore, starting from a random initial solution, a local search converges rapidly to a nearby local optimum.
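As a toy illustration of this three-way classification, the sketch below applies a fixed cutoff of 0.1; this cutoff is an assumption for demonstration only, as the paper judges "high" and "low" relative to the overall averages in Table 3 rather than against a fixed threshold:

```python
# Classify a landscape from its entropy and distance variations.

def landscape_case(delta_ent, delta_dmm, high=0.1):
    if delta_ent >= high and delta_dmm >= high:
        return "One-Massif"    # Case 1: optima concentrated in one dense region
    if delta_ent >= high:
        return "Multi-Massif"  # Case 2: optima in several attraction areas
    if delta_dmm < high:
        return "Uniform"       # Case 3: optima scattered evenly
    return "unclassified"      # large distance drop without an entropy drop

case = landscape_case(0.014, 0.068)      # the paper's total averages
jackson = landscape_case(0.1763, 0.2097)  # the Jackson problem's averages
```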
Based on the computational results in Table 3, we plotted the scatterplot of average ∆ent vs. average ∆Dmm for the 11 benchmark problem sets in Fig. 9, in which the total average, with ∆̄Dmm = 0.068 and ∆̄ent = 0.014, is shown as a larger red blob. Since both ∆̄Dmm and ∆̄ent are small, 'Case 3' above is established, suggesting that overall, the local optima in SALBP1 are scattered nearly uniformly over the search space. Further analyses specific to certain problem sets or instances are provided below:

i. The relatively high average value of ∆Dmm for the Mitchell test problem (0.3136), especially for c = 14 (instance no. 37 in Table 3, with ∆Dmm = 0.8890), indicates that its local optima are located far from each other in the search space. On the other hand, since the average value of ∆ent for this test problem is close to ∆̄ent = 0.014, it can be concluded that the local optimal solutions of this problem are uniformly distributed in its search space, but their pairwise distances are on average greater than those of the local optimal solutions of the other sample problems.
ii. The relatively high average values of ∆Dmm and ∆ent for the Jackson test problem (0.2097 and 0.1763, respectively) indicate that, unlike the other 10 test problems, 'Case 1' above holds for this problem, meaning that the distribution of its local optima over the search space is One-Massif.

iii. The ∆Amp measure is another important criterion describing the objective space. The total average value of ∆̄Amp = −0.144 for all problem sets implies that the landscape is almost plain or a rugged plain (as shown in Fig. 8(b)), and there is no clear difference between the qualities of the local optimal solutions in different areas of the search space.
iv. As shown in Fig. 10, the Arcus-83, Arcus-111, and Tonge problem sets have unusually large (in magnitude) ∆Amp values (−0.334, −0.421, and −0.252, respectively) compared to the other benchmark sets, suggesting that their landscapes are somewhat valley-like, and the qualities of the obtained local optimal solutions were clearly better than those of their starting random solutions.

v. While most ∆Amp values in Table 3 are negative, there are a few positive values in the Scholl-297 dataset, indicating that the solution quality gap, i.e., f(s_worst) − f(s_best), and the diversity of the initial population were larger than those of the optimal population. This means that the local search was able to effectively converge the starting solutions toward local optimal solutions of comparable quality.
vi. Across all benchmark problems, the relatively large total average gap between the global (or best-known) solutions and the obtained local optimal solutions (Gap(O) = 1.605, i.e., 160.5%), shown by the red blob in Fig. 11, indicates that global optimum solutions of the SALBP1 are hard to find, and basic local search methods are unlikely to discover even near-optimal solutions efficiently. In particular, the Arcus-111, Kilbridge & Wester, and Heskiaoff problem sets have unusually large gaps (281.8%, 395.7%, and 540.1%, respectively), and hence are more difficult to solve to optimality than the other benchmark sets.

vii.
The two datasets Mertens and Jaeschke, which have the smallest numbers of tasks (7 and 9, respectively), stand out among the problem sets due to their near-zero average ∆Dmm, ∆ent, and ∆Amp values. The average gaps (Gap(O)) of these two datasets are also the smallest among all problem sets. These measures are so low because their populations of initial and optimal solutions (U and O) are not substantially different, meaning that the initial solutions generated by the shallow and deep task assignment procedures were already close to local optimal solutions. This raises the question of how such high-quality initial solutions can be produced even before improving them through a local search. The reason is the small number of tasks in those datasets, which leads to much smaller solution spaces than the huge search spaces of the other datasets. In addition, the deep task assignment procedure guarantees the generation of feasible solutions with PUPC = 0 in the fitness function (4), which are naturally of good quality (i.e., have small fitness values). As a result, we do not recommend including these two datasets in future studies of SALBP1, due to their simplicity and lack of challenge.

Analysis of Correlation Measures
The average length of the walk (Lmm) provides an estimate of the landscape's ruggedness. A small value of Lmm implies that only a short walk (a small number of steps) is required to reach a nearby local optimum from a random solution; hence, the landscape is rugged. The value of Lmm depends on the test problem and its cycle time. For example, analyzing the computational results in Table 2 reveals that the landscape of the Mertens test problem (with Lmm = 0.01) is more rugged than those of the other problems, and hence its local search converges faster to a local optimum solution.
In addition, we calculated the autocorrelation measure for three Hamming distances dH ∈ {2, 4, 6}, which yielded almost equal average values of ρ(2), ρ(4), and ρ(6), as shown in the last row and the rightmost columns of Table 3. This implies that variations in the objective values of any pair of solutions are not sensitive to their distance from each other; hence, the landscape is rugged. It should be noted that the Mitchell test problem is an exception to this rule: due to the clear difference between its values of ρ(2), ρ(4), and ρ(6), the search space of this problem is flat.
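The autocorrelation measure can be sketched as below; how pairs at a given distance are sampled from the finite population is a simplifying assumption (here, all pairs are enumerated):

```python
# rho(dH): correlation of the fitness values of all solution pairs lying at
# Hamming distance dH, normalized by the population's fitness variance.
from itertools import combinations

def autocorrelation(fitnesses, distances, d_h):
    """fitnesses: list of f(s) values; distances[(a, b)]: Hamming distance
    between solutions with indices a < b; d_h: the probed distance."""
    n = len(fitnesses)
    f_bar = sum(fitnesses) / n
    var = sum((f - f_bar) ** 2 for f in fitnesses) / n
    pairs = [(a, b) for a, b in combinations(range(n), 2)
             if distances[(a, b)] == d_h]
    if not pairs or var == 0:
        return 0.0
    cov = sum((fitnesses[a] - f_bar) * (fitnesses[b] - f_bar)
              for a, b in pairs) / len(pairs)
    return cov / var

rho = autocorrelation([1.0, 2.0, 3.0, 4.0],
                      {(0, 1): 2, (0, 2): 4, (0, 3): 4,
                       (1, 2): 4, (1, 3): 4, (2, 3): 2}, 2)
```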
Finally, we calculated the total average FDC value over all benchmark problems (0.594). This value is not sufficiently large to conclude that the problem is 'straightforward', implying that the move operator will not reliably guide the search toward the global optimum. Among the studied problems, the Heskiaoff problem set had the smallest average FDC value (0.20), implying that finding its global optimal solution is more difficult than for the other problems. To analyze the Heskiaoff problem in more depth, we plotted the FDC diagrams of 1000 local optimal solutions for six different cycle times: c = 138, 205, 216, 256, 324, and 342. Fig. 12 presents the obtained FDC plots, in which the hollow (blue) circles indicate infeasible solutions and the solid (red) dots show feasible solutions. As stated earlier, the maximum Hamming distance between any two solution encodings of an n-task assembly line is 2n, so for the Heskiaoff test problem with 28 assembly tasks, the maximum value on the x-axis of the FDC plots is 56. For this test problem, the FDC value is very small (even negative for the scenarios with cycle times of 205, 256, and 342), which shows that there is no correlation between the distance of the solutions to the best-known solution and their fitness. Therefore, achieving a global optimal solution for this test problem is difficult and requires an effective search algorithm.
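The FDC here is the Pearson correlation between the solutions' fitness values F and their Hamming distances D to the best-known solution; a minimal sketch:

```python
# Fitness-distance correlation: cov(F, D) / (sigma_F * sigma_D).
from math import sqrt

def fdc(F, D):
    n = len(F)
    f_bar, d_bar = sum(F) / n, sum(D) / n
    cov = sum((f - f_bar) * (d - d_bar) for f, d in zip(F, D)) / n
    sf = sqrt(sum((f - f_bar) ** 2 for f in F) / n)
    sd = sqrt(sum((d - d_bar) ** 2 for d in D) / n)
    return cov / (sf * sd) if sf and sd else 0.0

# For a minimization problem, fitness growing linearly with distance to the
# optimum (a 'straightforward' landscape) gives FDC close to +1:
r = fdc([1.0, 2.0, 3.0, 4.0], [2, 4, 6, 8])
```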
Analyzing the FDC plots in Fig. 12 shows that:
− Almost all distances of the local optimum solutions to the best-known solution lie within the interval [18, 55], indicating that the FDC is weak; therefore, the problem is difficult to solve. Moreover, feasible solutions generally have shorter Hamming distances to the best-known solution than infeasible solutions do.
− Feasible (red) and infeasible (blue) solutions overlap at many fitness-distance points, indicating that discriminating infeasible solutions from feasible ones is difficult or impossible.

As a conclusion to the FLA, it can be stated that the fitness landscape of the SALBP1 is a rugged plain (as shown in Fig. 8(b)) with uniformly distributed local optima and no apparent correlation between the quality of solutions and their distance to the global optimum. Therefore, a single-solution-based metaheuristic algorithm is more suitable for solving this problem.

Conclusion
Despite its name, the Simple Assembly Line Balancing Problem Type 1 is not a simple problem. An optimal solution must satisfy the precedence relations among all tasks and assign the tasks to as few assembly workstations as possible, such that the total task time of each workstation does not exceed the given cycle time and the workloads of the stations are as even (balanced) as possible. The problem is proven to be NP-hard, and thus finding globally optimal solutions for large instances is practically prohibitive.
Despite the considerable amount of work on solving the SALBP1, there has been no methodical study of the structure and landscape of the problem; hence, choices of (meta)heuristic solution methods have mostly been arbitrary and lacked strong justification. In this paper, for the first time, we present a comprehensive and in-depth Fitness Landscape Analysis to understand the structure and topology of the solution space of the SALBP1. The analysis was carried out using seven distribution and three correlation measures on 83 instances taken from 11 benchmark problem sets known in the literature, and revealed that the problem's landscape resembles a rugged plain with local optima scattered uniformly all over the search space, suggesting that a single-solution-based metaheuristic method would be more successful in searching the space and finding an optimal or near-optimal solution than a population-based metaheuristic.
It is noted that since various benchmark problem sets with different numbers of tasks, precedence relations, and cycle times were used, the outcome of the presented FLA adequately covers the SALBP1 in its totality, and can therefore be safely used to describe the fitness landscape of the problem. The new formula developed for measuring the entropy of solutions in the search space of the SALBP1 is another novelty of the present paper.
FLA has seldom been performed for assembly line balancing problems; thus, as a future research direction, we propose conducting such analyses for other types and sizes of SALB problems, such as SALBP2, SALBP-E, and SALBP-F, as well as for GALB problems including 2S-ALB, MALB, and UALB.
(a) Precedence relations of 7 tasks (circles) with their times tj (in minutes) shown above them for the Mertens (1967) benchmark problem. (b) A feasible assembly line configuration with the tasks assigned to 4 workstations. (c) Encoding of the configuration shown in (b).

Fig. 3 presents the pseudocode of the local search used.

Fig. 3. Procedure of the hill-climbing local search performed on each initial solution of the population U.
− ∆Dmm: the relative difference between the normalized average Hamming distances of the initial uniform random solutions (Dmm(U)) and of the local optimum solutions (Dmm(O)).
− ∆ent: the relative difference between the entropy of the initial uniform random solutions (ent(U)) and the entropy of the local optimum solutions (ent(O)).
− Gap(O): the relative difference between the objective values of the local optimum solutions and the global optimum solution s*.
− Lmm: the average length of walk, i.e., the number of steps needed for finding a local optimum solution from a random solution si ∈ U.
− FDC: the fitness-distance correlation between the quality of local optimum solutions and their distance from the global optimum solution, where the set F = {f1, f2, …, fn} contains the fitness values of the solutions and the set D = {dH1, dH2, …, dHn} contains their Hamming distances to the best-known solution.
Fig. 8. (a) Distribution of local optima in the landscape as affected by ∆ent and ∆Dmm; (b) four types of search spaces.

Fig. 9. Average values of ∆ent and ∆Dmm for the studied 11 SALBP1 benchmark problem sets (blue dots), with their total average shown as a large red blob (shadowed).

Fig. 11. Average gaps (Gap(O)) between the obtained local optimal solutions and the global or best-known solutions of the 11 solved SALBP1 benchmark problem sets (blue asterisks), together with the total average gap over all the problems (the red asterisk).

Table 1
Specifications of the investigated benchmark test problems. NT = number of tasks (n); CT = cycle time (c); LB = lower bound on the number of stations (m*); TMIN = theoretical minimum size of the entire search space, n!·(m*)^n; TMAX = theoretical maximum size of the entire search space, n!·n^n.

Table 2
Distribution and Correlation measures and their average numeric values for FLA

Table 3
Results of computing statistical measures for 83 SALBP1 instances from 11 benchmark problem sets.
