Solving the buﬀer allocation problem using simulation-based optimisation

In production lines, buﬀers function as a means to decouple stations, which reduce the eﬀect that station failures and varying process times have on the complete line’s throughput. However, adding larger buﬀers can be costly, for example, in the automotive industry where it results in increased working capital. This manuscript addresses the buﬀer allocation problem (BAP), seeking the smallest total buﬀer size while meeting a prescribed throughput by employing a simulation-based optimisation approach. A Tabu Search algorithm searches the solution space for the optimal buﬀer conﬁguration while a discrete event simulation model evaluates each conﬁguration, accounting for the machine (un)reliability. Since the multiple simulations add a sizeable computational burden, our approach introduces a novel neighbourhood search mechanism, which borrows from the Theory of Constrains. Solving test sets available in the literature suggest that this approach is 18 times faster than prior Adaptive Tabu Search approaches for small problems, and more than ﬁve times faster for medium-sized problems.


Introduction
Factors such as uneven processing time and station failure create instability in the produc-tion line, which has a negative effect on throughput. These factors can cause other stations to become either blocked or starved. Buffers are locations in which semi-completed parts, also called work-in-progress (WIP), are stored within a production line to decouple segments of the line. If a station that has a buffer for its exit material fails, all downstream stations can remain operational while there are parts in their respective buffers. If the station cannot be repaired before the buffer empties, and start supplying downstream, the production of the downstream station will stop. If there is a buffer before the failed station, all upstream stations will be able to produce while there is space in the buffer. The bigger the buffer, the longer the production line can run independently.
By including buffers, the required capital to realise and operate the production line is increased. WIP is also increased, leading to higher running capital. Minimising capital investment, by reducing the number of buffers while providing sufficient buffer to reduce the effect of equipment instability on production availability, makes the optimal placement of buffers in the line a vital problem to solve. The problem of allocating buffers (location and size) optimally is known as the BAP. In this study the BAP is solved for a serial and non-serial heterogeneous process time, unreliable production line.
The BAP is frequently solved using a two-step approach [11]. The evaluation step calculates the throughput of a production line for a given buffer configuration. Various versions of evaluation have been employed in the literature. Analytical methods have been used for the BAP. Exact analytical methods are only applicable to small-sized problems. For larger systems, approximation methods can be used. Methods employing algebra, calculus or probability theory are used to approximate the throughput of the line. A method known as the decomposition method is widely used in BAP as an evaluative method. The literature found on the BAP using the decomposition method is limited to serial production lines. Alternatively, simulation is used to determine line throughput. Simulation is the imitation of a system being studied which is performed on computers by creating a model, or digital twin, of it. Various line topologies can be modelled, including tree structure lines. Each station can have different random variables describing the processing times. Failure of stations can also be included in the model. But the disadvantage of simulation is the time it takes to evaluate a scenario. This can be reduced by creating a discrete event simulation (DES) model specifically for the application that does not have the additional general tools available in commercial simulation software.
Simulation on its own, as well as analytical methods, are not optimisation techniques per se. Hence, the second generative step moves through the solution space and considers the various buffer configurations to be evaluated [11]. The generative step aims to find a (near) optimum buffer configuration in the shortest time possible. Methods employed include complete enumeration, traditional search, heuristic search, and metaheuristics. Complete enumeration evaluates all possible configurations and is only applicable to small problems. Traditional search methods and heuristics test only a subset of the buffer configurations. The disadvantage is that they get stuck at a local optimum. Metaheuristics add mechanisms to the way it moves through the solution space and have the main advantage that they can escape local optimum. The methods start with an initial buffer configuration B B B. With the given buffer configura-tion, the evaluative method can determine the throughput of the line, f (B B B). Because the initial configuration might not be the optimal solution, the generative method generates new buffer configurations for the evaluative method to test. This is repeated until a (near) optimal solution is found.
In this study, simulation-based optimisation (SBO) is used to solve the BAP. A DES model created in Java is used as evaluative method. An adaptive tabu search with theory of constraints neighbour generation is proposed to solve the BAP for both a serial and non-serial heterogeneous unreliable production line. The performance of the algorithm is investigated for small-size as well as medium and large-sized problems. Finally the scalability of the proposed solution is tested on a non-serial large sized problem from industry.
The rest of the article is organised as follows. The assumptions of the model and the BAP is described in the next section based on a comprehensive literature review on the BAP. The method of using an evaluative and generative element iteratively to solve the BAP is studied. In §3 the SBO approach is developed. A simulation program is created in a general programming language, Java. This program is a newly designed blueprint using the library of the stochastic simulation in Java (SSJ). The program simulates any size of a serial production line by just specifying the number of machines, random parameters, replication length and simulation length as well as the buffer vector. In §3.2 the generative method, two metaheuristics are compared for finding the optimal buffer configuration for maximising throughput. The first method is based on the works of Demir et al. [10]. The second method is a proposed improvement on the first to improve the evaluation time. In §4 this artefact is tested on serial production lines. First, the validity of the simulation model is checked. Then the two generative methods are compared across various production line lengths and random parameters. The best, that is the method that achieves a near-optimal solution within a reasonable amount of time, will be used with the simulation model to solve the larger and more realistic BAP. Lastly, the proposed methodology is applied to a body shop production line. The use of SBO in the BAP will help to increase our knowledge on its effectiveness with solving the BAP for complex lines such as the body shop.

Literature review
The BAP deals with finding the optimal buffer configuration to incorporate in the produc-tion line to achieve a specific objective [11]. Total buffer size N is allocated among the K − 1 intermediate buffer locations in the production line with K machines (Figure 2). Associated with each machine M k , k ∈ K K K are several parameters. The first, t k , denotes the processing time required at machine k. Secondly, f k denotes the time between failures of machine k while r k denotes the repair time of machine k. We denote the line's total buffer size as N = where B k denotes the number of buffer units at location k.
Due to the complexity of the BAP, numerous publications are available in the literature. For a comprehensive study of the available literature on the BAP see the works of Demir et al. [11]. The BAP can be expressed in three main forms depending on the objective function.
Objective 1 Maximise the throughput of the line for a given fixed number of buffer N [10,19,20].
The BAP formulation is then expressed as where N is a predefined value.
Objective 2 A prescribed throughput, f , must be achieved with the minimum total buffer size [12,15,21]. The formulation then changes to Objective 3 Minimise the average WIP inventory, Q(B B B), for the buffer configuration subject to the total buffer size and prescribed throughput. The problem is then formulated as where both N and f are predefined values.
Solving the BAP for objective 2 is the aim of this article. Production plant designers are faced with the problem of achieving a balance between throughput and WIP in buffers.
The BAP solution approach is also classified based on line parameters. A production line where the processing time of all stations are similar is called a homogeneous production line. Conversely, if the process times are different, it is known as a heterogeneous line [25]. In some cases, literature ignores the possibility that failures may occur, or that failures are negligible, in which case the line is assumed to be reliable. However, when failures are explicitly addressed in the model, the line is referred to as being unreliable [11]. BAP has been proven to be an NP-Hard combinatorial optimisation problem [11]. A complex system such as a production line cannot be expressed by a simple mathematical equation where the throughput of the line can be calculated for a given buffer configuration. Due to the lack of this algebraic relation and the problem being NP-Hard, a method with two elements are frequently used to solve the BAP. A generative and evaluative method is executed in an iterative manner. The generative method generates various permutation of buffer allocation from the solution space for the evaluative method to evaluate and determine the throughput of the line.

Generative method
The simplest way to find the best buffer configuration is to generate all the possible buffer permutations and test every single one of them. This is known as complete enumeration. Thus, complete enumeration is only possible for very small problems.
Because it is not possible to generate all the possible buffer permutations, a subset of them can be generated. The question is then how do we select this subset of permutations from the solution space that needs to be tested and how is the best then found. Various search methods have been used in literature. Traditional search methods have been applied to the BAP. Methods such as gradient search algorithm, knowledge-based methods as well as degrading ceiling local search heuristics have been used [14,15,20]. These methods start with an initial solution. They then generate all possible permutations of the initial solution that can be reached in one step. A step can be seen as increasing the buffer space limit with a specific value and decreasing another with the same value. All the permutations are tested, and the buffer configuration that results in the highest throughput is then considered as the best solution. This is repeated until no other improved buffer configuration can be found. Although these methods are more efficient than complete enumeration, they cannot escape local optima.
Unlike the previous search method, metaheuristics are allowed to select a permutation that results in a worse throughput than the current best. This allows the metaheuristics to escape a local optimum and move to other areas in the solution space in the hope to find a better local optimum or, ideally, the global optimum. Metaheuristics have been widely applied in the buffer allocation problem. Typical solution methods in this area are genetic algorithm, tabu search, simulated annealing and ant-colony.
Genetic algorithm is based on the theory of evolution that favour reproduction of individuals with specific traits. In Dolgui [13] two different genotype generation mechanisms are compared for the genetic algorithm. The first increases or decreases an individual buffer by one, the second approach continues increasing or decreasing the gene as long as an improvement in the goal is achieved. Cruz [9] employed a special version multi-objective genetic algorithm to solve the BAP for objective two.
Demir et al. [10] used an adaptive Tabu Search (ATS) to solve the BAP for an unreliable, unbalanced line. The objective is to maximise the throughput of the production line f (B B B), for a given buffer size constraint N . The algorithm distributes a fixed number of buffer between each buffer location B k . In their paper Demir et al. [10] proposed a new ATS and compared it to the simple Tabu Search (TS) referred to as standard Tabu Search (STS). Various changes are made to the STS which are discussed below.
Neighbour generation The moves through neighbouring solutions are depicted by the notation i, j, meaning that a given number of buffers is added to location i and the same number is subtracted from a location j. All the possible i, j scenarios are generated from the current solution. In the STS the size of buffer change at location i, j is set to 1. In the ATS the incremental size is subject to the problem size. It is set to 1 for small and medium-sized problems, i.e., 5 and 10-machine lines. For large problems involving 20-40 machine lines, the size is set to 1% of the total buffer size, rounded up.
Tabu list The full move tabu criterion was employed in both the STS and ATS. If the move i, j produces the better objective function from the various scenarios evaluated then move j, i becomes tabu for a certain amount of iterations until it is moved off the list. The tabu tenure, which is the length of the tabu list, is set to √ ns where ns = neighbour solution space size. The tabu tenure for the ATS is tuned adaptively. Initially, the tabu tenure is set to a predefined minimum value. It is then calculated for each move. If the objective function is improved, the tabu tenure is decreased by 1. If the solution is not improved, it is increased by 1, subject to an upper and lower limit for the tabu tenure.
Intensification The ATS also employs an intensification strategy. If a solution found to be the best does not change for a certain number of iterations, (0.25×N ), the increment (decrement) size is reduced to 1 for large-sized problems.
Restart diversification implements several random restarts during the optimisation. After 12.5 × N iterations, a new buffer configuration is generated randomly. This allows for unsearched areas to be considered.
Continuous diversification forms part of the regular search process. A counter is used to track the number of times a specific move is performed. When the counter reaches a predefined value (12.5 × N ), the move is penalised and made less attractive. The penalty is determined as 10 −4 , but can be adjusted.
Demir et al. [10] also showed that the quality of the initial scenario on which the algorithm starts could affect the quality of the final solution. Three different methods for determining the initial allocation of buffer were compared. Either using the ratio of failure to repair rate, the processing rate or using random initialisation. Experiments showed that using the ratio of failure to repair rate, where the machine with the higher ratio receives more buffer for its exit buffer, results in a good initial solution. The algorithm will continue until one of two stopping criteria is reached. Either after 50 × N iterations or after no improvement is made to the global best for 25 × N iterations.
Demir et al. [12] expanded on the work of Demir et al. [10] to solve the BAP for objective two. The inner loop includes an ATS to obtain a maximum throughput for a given buffer size N as discussed above. Binary search and TS was evaluated for the outer loop. The outer loop sequentially decreases the total buffer size N to find the desired throughput with a minimal buffer. First, the outer loop will use a specified N from the user. In most cases, the initial N is a big number. The inner loop will then determine the best throughput possible for N , by searching for the relevant buffer configuration. If the throughput is higher than the desired throughput, the outer loop can decrease the size of N . Again, the best throughput for the new N is determined. This is repeated until the size of N cannot be decreased as it will result in a lower throughput than required. The proposed method was tested across a wide range of possible line properties. Both binary search and TS for the outer loop were able to solve small to large scale problems. For large problems, the binary search was executed faster than TS, but the authors noted that using parallel implementation of the TS can result in faster execution time than the binary search.
The main advantage of metaheuristics are that they can escape local optimum. Their main disadvantage is that they are not problem specific and thus, they have to be adjusted to produce solutions for a specific problem.

Evaluative method
Every time the generative method generates a new buffer configuration B B B, that is new values for the buffer's maximum capacity, B k , the buffer configuration needs to be evaluated. Recall that a line's throughput is a function of its buffer vector, f (B B B). A large number of buffer permutations needs to be evaluated to get the best solution. It would have been ideal if a simple algebraic formula could be expressed for f (B B B), as an algebraic formula would be quick to solve. Unfortunately, machine failure and processing time are stochastic in nature. The algebraic relation between the buffer configuration and throughput of the line is very limited. The next best option is to approximate the line's throughput, using analytical methods. The line reliability and topology influences the approach we can use.
For very short, serial production lines exact results based on queuing models are possible [11]. For larger production systems, approximation methods are used. Two approximation methods studied in the BAP literature are the decomposition method [9,10,12,19] and aggregation method.
The general idea of the decomposition method it to break the line up into L smaller lines, while the aggregation method combines the stations. These methods are very similar in implementation. The decomposition method is time-efficient in the calculation of the line's throughput and is based on the theory of queueing networks. The decomposition method uses a set of equations that link the decomposed two-machine lines together. These non-linear equations are solved to determine the throughput of the line. Burman [5] improved the method to be able to model heterogeneous lines. The main advantage of the decomposition method is its computational efficiency. However, the disadvantage is that it can be applicable only under the assumptions that processing rates are either deterministic or exponentially distributed and failure and repair rates are either geometric or exponentially distributed random variables.
An alternative method used as an evaluative method for the BAP is a simulation model. The imitation created with the simulation is known as a model and is built with certain assumptions. With simulation, the real system is not analysed, but the model. It must then be assumed that if the model is an adequate imitation of the system, that the results obtained with the model might also be obtained in the real-life system if similar changes are made to it. Simulation allows any line topology to be simulated. Reliable and unreliable lines can be modelled, and each station can have unique parameters. Due to this flexibility, it has been used widely in complex systems such as production lines [17].
Simulation is applied in various research contributions [2,3,4,7,8,16,18,21,22,23]. There are two ways in which simulation is used as an evaluative method. The simulation itself can be used to determine the throughput for each buffer configuration. Alternatively, a meta-model can be made of the simulation, and this meta-model is then used to determine the throughput. A simulation model is a representation of a real-world system, whereas the term meta-model refers to a mathematical approximation of a simulation model. Meta-models are developed to obtain an understanding of the relationship between the input variables and the output variables of the system under investigation. The complexity of cellular manufacturing is also a motivation for the use of meta-models as the evaluative method when assigning buffers. Lee [18] developed an approach to find the buffer configuration that leads to the lowest cost in terms of investment and running cost. Amiri & Mahtashami [2] proposed a multiobjective formulation to solve the BAP for an unreliable line. A detailed DES is used to build a meta-model, which is then used for estimating the throughput. The objective is to maximise the throughput of the line and to minimise intermediate buffer storage. Numerous methods have been used to develop meta-models. The one used by Amiri & Mahtashami [2] was a polynomial regression model. Can & Heavey [6] investigated the robustness and accuracy of genetic programming to create meta-models. In a subsequent paper, Can & Heavey [7] created a meta-model of a simulation of the line studied and used genetic programming and artificial neural networks to solve the meta-model. Chan & Ng [8] made a simulation on Siemens IV to find an algebraic relation between line throughput and buffer sizes. The use of meta-models allows for more valid representations of complex systems being modelled than the decomposition method as well as faster execution than pure simulation. The disadvantage of the meta-model is that if the algebraic relation between line throughput and buffer size is not valid for all scenarios, an invalid representation is being solved. Most papers using analytical methods use deterministic or exponential times for the process, failure and repair times. The methods are limited to simpler line topologies.
Simulation is utilised in the literature to relax these restrictions. The simulation model allows general function distributions to be used (e.g., normal, gamma, Weibull and uniform) for all parameters of the production line. Simulation is employed for a more realistic representation of the dynamic behaviour of a system. DES is used for its effective way of estimating almost any system performance, given that the input data is accurate. Research papers considering real-life systems usually employ simulation as the evaluative method. DES is a model that executes by moving through the simulation by changing the state of variables at separate ("countable") points of time called events [17]. These events are scheduled in a chronological event list, and the simulation is executed by moving from one event to the next. Upon reaching an event, specific actions need to be taken by the simulation program which, in turn, can result in more events being scheduled.
The various components of a DES is presented by Law [17]. An event can be anything from a part leaving a machine, machine failure or workers becoming available. The time at which these events occur is usually stochastic. When a simulation model is created the structure of the line is first defined in the simulation program. For each stochastic machine process, failure and repair times are also defined in the simulation program. At this point, it is also necessary to define the simulation length that is required. This represents the system in real life. The simulation length can range from seconds to years. The advantage of using simulation on computers is that the computational time of the simulation is a lot quicker than the needed simulation length. A simulation representing months worth of production can be computed in minutes.
Once the model has been defined in the simulation program, it can initiate the simulation. Upon initialisation, the simulation program schedules the events. The time of each event occurrence, t k , is randomly generated from the distributions. This is added to the current time to determine the time of occurrence. The simulation also schedules the simulation end event, which is the specified simulation length. A simulation clock is used to keep track of where the simulation is in the event list. At initialisation, the simulation clock is at 0 s. The simulation clock needs to move to the first event in the list. The simulation clock can be advanced by two approaches: Fixed-increment, the simulation has a predefined time increment size. Assume the incremental size is 0.01 s. The simulation clock will advance with 0.01 s throughout the entire simulation. At each point, the simulation identifies if any event has occurred. This time incremental size should be small enough so that with each increment, an event cannot be overlooked. The disadvantage of this approach is that the simulation program requires time evaluating "inactive" time where events do not occur.
Next-event time overcomes this disadvantage by skipping inactive time. This approach will move from the time of one event to the time of the next event. The reason for this is that no changes to the system states occur during this inactive time. This reduces the computational time of the simulation program.
Assembly line simulator (ALS) was developed in Java, due to the common adoption of the Java programming language. Java programs are easily distributable and work on multiple platforms. The ALS was implemented using the SSJ library. SSJ is an organised set of code, offering general-purpose libraries for DES in Java. It contains the classes and objects required for the various components of DES. Kose et al. [16] solved the BAP for a real-world facility, namely a thermo technology company in Turkey. Due to the complexity of the system, a simulation was done using Arena. Spieckermann et al. [21] conducted a comprehensive case study at BMW AG. A simulation optimisation approach was used to solve the BAP. An existing simulation model made using SIMPLE++ was combined with commercially available optimisation packages, the WitnessOptimizer and SIMPLE/GA. Simulation is a common tool within the automotive industry. It is very convenient to use when determining a line's throughput, but very expensive as each evaluation of the objective function requires at least one simulation run. The use of commercial optimisation tools had to be considered as black boxes, with only a small degree of configuration. Execution times can be up to several days and evaluated solutions per optimisation run had to be restricted.

Material and methods
Recall that solving the BAP requires finding the best size and configuration of buffers in the production line. What is considered best will depend on the objective function that is specified. In this article, the best buffer configuration will be the one that achieves the desired throughput with the minimum total number of buffers, objective two. The lines under investigation are subject to machine failure and different processing times, and these are stochastic in nature. It cannot be assumed that the distributions for these process rates are exponentially distributed or that the failure and repair rates are either geometric or exponentially distributed. These assumptions are required for approximation methods such as the decomposition method. The throughput of these lines can be evaluated using simulation as the evaluative method, with any distributions. Evaluating all possible buffer configurations is computationally not practical. Consequently, we employ an optimisation procedure to navigate the solution space more efficiently and generate useful configurations. Demir et al. [12] solved the BAP for objective two using tabu search. The method was extensively tested on various line sizes and machine parameters, forming a solid base for generative method. The combined use of simulation and metaheuristics is known as SBO.

Discrete event simulation
DES require various components to function. The SSJ package, [24] is used to assist with the programming. SSJ contains libraries that can manage the event list and generate random variables.
The SSJ package provides the programming code for: initialise routine, event routine, timing routine, library routine and the report generator. For this article, a unique main program is created to combine each of these routines. Two specific event routines are created to meet the requirements of BAP.
The various components of DES and the interaction between them are shown in Figure 3. A model creation program is used to create the model of the line itself. The number of machines, total buffer limit and machine parameters are defined. A main program is required that controls all the other components. The main starts by invoking the initialisation routine. This routine sets the simulation clock to 0 time units. It initialises all system states and creates the event list with all the initial events. The simulation program then returns control to the main program, which then invokes the timing routine. This routine is responsible for managing the event list. It determines which event is next on the list as well as its time of occurrence. It then advances the clock to this event's time. The simulation program again returns control to the main program, which then invokes the event routine. There is a specific event routine for each type of event. This routine contains all the actions that need to be taken when the event is executed. The event routine updates system states, statistical counters and generates any future events. If the simulation end time is reached, the simulation stops and the report generator returns reports on statistical counters. If this time is not yet reached, the timing routine is again invoked to move to the next event. This action between the timing routine and the event routine is repeated until the simulation end time is reached. The library routine generates the random times required during the event scheduling.
Two main events are programmed into the event routine. The first event is activated when a part leaves a machine, called a departure event, and a second is activated when a machine fails failure event.
Departure event is triggered when a part is completed and leaves the station. Law [17] explains the structure for a general departure event of a simple simulation model. This structure is expanded to meet the needs of the BAP. The departure event is scheduled in the event list and represents a machine completing a part. When the event is performed by the simulation, the logic in flow chart Figure 4, is executed. The departure event first needs to move the completed part to the downstream station. If the downstream station is idle and starved (meaning the buffer was empty and no part was available to process) the completed part can be sent directly to the downstream station for processing. Transfer time is ignored in the simulation. It is assumed that transfer time is constant. Transport time can be included by increasing the processing time of the station with the additional transport time. Upon part arrival at the downstream station, the station state is set to not idle and not starved. The downstream station immediately starts service on the part. The new random process time is generated and added to the current simulation time. The completion time of the event is scheduled in the event list. If the part arrived at a station that is failed, repair time is also generated. The departure event is delayed by the repair time. A new failure time is generated and added to the event list. The failure status is changed to not failed. If the downstream station is not idle, it means that parts might be in the buffer before the station. It is necessary to check if the buffer has reached its capacity. If there is space left in the buffer, the part can exit the current station and join the queue while if it has reached its capacity, the part cannot exit. The current station's blocked state is set to blocked and the departure event ends.
After the part has left the current station, it is ready to receive parts, the station's state is set to idle. If there are no parts available in the upstream buffer, the station's state is set to starved.
The departure event ends.
If a part is available from the upstream buffer, it can be removed from the queue and enter the station, station' state is change to not idle. All parts in the queue are moved up one space. Upon arrival in the station the random processing time for the part is generated. This is based on the random distribution defined in the model creation step. The processing time of the part is added to the current simulation time to determine the time the part is planned to be completed. The next departure event is scheduled in the event list. The model checks if the part arrived at a station that failed while it was idle. If the machine's failed state is failed, repair time is generated randomly from the defined distribution. The departure event that was scheduled is now delayed by the repair time. If the machine's failed state is not failed, no adjustment to the scheduled departure event is needed.
It is possible that the upstream station was blocked by a full buffer before the departure event was executed. If a part was removed from the upstream buffer, it now has space available for the upstream station to place its blocked part. If the previous station was blocked, a departure event for it could be triggered. If it was not blocked no further action is needed, and the departure event is completed.
Failure event is triggered when a station fails. The flow chart for the failure event is shown in Figure 5. If the station is idle, it means no part is currently being processed by the machine. In practice, the failure will only be detected when the next part arrives at the station. Thus repair is not yet scheduled, but the station fail state is set to failed.
If the station is not idle the failure will stop the machine processing, and the failure is immediately detected. Repair time is generated randomly from a random distribution. The departure event of the current station is delayed by the repair time. A new failure event is scheduled by generating a random time between failure and adding it to the current simulation time. The time between failure is generated from a specified random distribution.
The DES is implemented in Java and used to evaluate the throughput of the production line scenario generated by the generative method. The simulation returns the throughput as well as statistics on the amount of time a station was either blocked or starved.

Tabu Search
Demir et al. [11] proposed two TS metaheuristics working in an iterative manor to solve the BAP for objective two. The inner loop using the ATS determines the maximum throughput a line can achieve for a given buffer size N . The ATS neighbour generation evaluates all the scenarios that can be reached by either increasing or decreasing each individual buffer size. Some of these neighbours actually decreases the buffer after stations that are highly blocked and increase for stations that are rarely blocked, generating bad neighbours. To overcome this an alteration to the work of Demir et al. [10] is proposed for neighbour generation, this method is called the Theory of Constraints Tabu Search (TOCT).
A smaller set of good neighbours are generated. This is achieved using the theory of constraints.
It states that the performance of a line is limited by a specific station, the slowest one, known as the bottleneck. If the overall performance of the line is to be improved, this single station needs to be improved first. It is crucial that this station is never blocked nor starved. This bottleneck can be identified with reports from the DES.
The DES provides statistics on how long each station was blocked during the simulation. It is thus possible to allocate weight to each buffer indicating how much time it blocked its upstream station. The longer the station is blocked, the higher the weight allocated to the buffer. The swap index is then randomly generated, where the buffer that has the highest block weight has a higher probability of being picked, and the buffer with the lowest block weight has the lowest probability of being picked. This station's buffer is increased. The buffer that will decrease in size is also chosen randomly, where the buffer that has the smallest block weight has the highest probability of being chosen and the buffer with the highest block weight the lowest probability. Only 0.5×(Number of machines K) neighbours are generated per iteration using the proposed method. This dramatically decreases the number of neighbours that need to be evaluated per iteration of the inner tabu. The factor 0.5 is quite an arbitrary choice in this paper but can be subjected to sensitivity analysis in future work.
Another alteration to the ATS heuristic is the inclusion of a list, storing the throughout for each evaluated buffer configuration. After generating new neighbours, the generated neighbours are compared to the items in this list to see if it has been evaluated previously. If it has been evaluated, the previously achieved throughput can be used, and the scenario does not have to be re-evaluated. The rest of the algorithm is the same as the ATS in the work of Demir et al. [10].
The outer loop is based on the work of Demir et al. [12]. It reduces the total size of N to obtain the smallest N possible while still meeting the required throughput. The algorithm starts with an initial buffer size of N 0 . Upon initialisation N should be a sufficiently large number. The size N is then given to the inner tabu algorithm. This determines the best buffer configuration and returns it as well as the total throughput.
The outer tabu loop is then used to generate a new total buffer size. Again the inner loop evaluates this and returns the best buffer configuration and throughput. This is repeated until either of two stopping criteria is reached: • no better solution has been found within a certain number of iterations, • the maximum number of iterations is performed.

Computational study
Experiments on small-sized, medium and large-sized problems with varying machine para-meters are done to evaluate the efficiency of the proposed solution method. At first serial production lines are considered.
The DES execution time is compared to commercial simulation software. Experiments are done to determine the simulation length and replication that achieves a balance between execution time and solution quality. The DES is then used to test the proposed TOCT against the ATS inner loop.
The BAP is solved for objective two by using the DES with the TOCT inner loop and outer loop from Demir et al. [12]. Finally the proposed solution approach's scalability is tested on a real world non-serial production line.

Data sets
Data sets used in Demir et al. [10] will also be used in this article. Four line sizes with the number of machines denoted by the set K = {5, 10, 20, 40} are evaluated. For each line, the total buffer size N is calculated by multiplying the number of machines by a factor of 5, 10, and 20. Thus 12 different serial line scenarios are used, representing lines ranging from small to large as shown in Table 1.  Lines with five machines are considered small, while those with 10 machines are considered medium in size. Lines with 20 and 40 machines are regarded as large lines.
Each scenario will be referred to by its number of machines (K) and total buffer size (N ). As per Demir et al. [10], the range of line scenarios allows the effectiveness of the proposed method to be tested across various line sizes. For each line scenario the various machine characteristics are tested. Again the work of Demir et al. [10] is referred to. The following section explain how the random process time, time between failure and repair time for each machine are defined. Table 2 shows the eight different line parameters.
Process time: the time it takes to complete a part. The processing time is randomly sampled from a uniform distribution, U (a, b). Each machine's process time will first be described with parameters, a = 5, b = 15. Then experiments will be done on the line with parameters, a = 5, b = 45. The distribution for each machine is the same.
Time between failure: each machine is subject to failure. The time between failure is randomly sampled from a geometric distribution G(p K ). Unlike the processing time, each machine in the line will have a unique distribution, some machines are more reliable than other. Each machine has a unique parameter p for the geometric distribution. This parameter is randomly sampled from a uniform distribution, p K = 1 U (a,b) ∀K to give each machine a unique distribution. Two different parameters are considered for the uniform distribution. Repair time: each machine has a repair time once it fails. The repair time is randomly sampled from a geometric distribution G(p K ). Similar to the failure time, each machine in the line will have a unique parameter for the geometric distribution. Some machines are easier to repair than others. This parameter is randomly sampled from a uniform distribution

DES experiment and results
The DES is tested in this section. The execution of the experiments is done on a computer having a 3.80Ghz Intel Core i5-7600K processor and 8 GB of RAM.
Before any model and its results can be used to solve the BAP, the simulation program needs to be tested to establish if the program is executing the simulation as expected. The DES is compared to a commercially available program to test similarity of results.
Only 10 of the 12 line scenarios in Table 1 are modelled in the commercial program due to licensing constraints. Scenarios 40.400, 40.800 are not considered. For each line scenario, three different values for each machine's failure and repair time are generated based on the parameters in Table 2.
Thus each of the 80 scenarios is tested three times for a total of 240 tests. For these tests, a random buffer configuration B B B is generated for each total buffer size N .
Simio TM is a widely used commercial simulation software. Both DES and Simio TM were run for a duration of 300 000 time units to ensure a steady state is reached. The simulation is replicated 1 000 times to get 1 000 different throughput results. At the end of the simulation run, the throughput is recorded, and the average results across the 1 000 replications used to compare the DES program's results to those of Simio TM . Secondly, the computational time of the programs model using DES is compared against that of Simio TM to see if a simulation in Java can outperform a commercial program.
The % throughput error between the Simio TM line and DES is compared using the calculation in (1) while the time factor is calculated using equation (2) Simio TM computational time DES computational time .
(2) Table 3 shows the results of the test. The first column shows the problem scenario, (K.N ).  Table 3: Simulation results from Java tool compared to Simio TM program.
The second column shows the average throughput difference between the DES tool and the Simio TM program calculated with equation (1). The third column shows the time difference between the two programs calculated with equation (2). The DES simulation program created in Java has on average a 0.84% difference in total throughput compared to Simio TM . This difference is small and is due to the stochastic nature of simulation.  DES had a significant speed advantage compared to Simio TM . On average, Simio TM takes 71 times longer than the proposed program based on DES. For the small-sized problems, Simio TM took 23 minutes to complete the simulation run and replications and 75 minutes for large-sized problems. In comparison, the DES only took 19 seconds for small problems and 1.5 minutes for large problems.
The experiment above verified that the simulation program using DES can obtain a throughput similar to that of commercial software. DES is also faster than commercial software. From this point, all simulations used in experiments are done using the Java simulation program created for this manuscript using DES. The simulation tool will be used by the inner tabu loop to determine the throughput of a line for a given buffer configuration. The inner loop will generate thousands of permutations that need to be tested. Thus it is crucial that the simulation achieves an accurate throughput in the shortest amount of time.
Two factors influence both the simulation's computational time and the quality of the solution it provides: the simulation length is the amount of time the simulation model uses to test the line, and the simulation replications, which is the number of times the simulation is repeated for each problem instance.
The simulation model in Java is quite fast. Still, if it takes 1.5 minutes for every evaluation during the generative method, the time required to solve the BAP can be excessively long. As the solution quality is time-dependent, a balance is needed between solution quality, simulation length and the number of replications.
Simulation steady state is a state at which the value of the throughput has acceptable small fluctuations over time. At the start of the simulation, throughput can be relatively fast because all the buffers are empty and all the machines are operational. As time progresses, buffers start filling up, and machines begin to fail. This can lead to a different throughput compared to the one achieved at the start of the simulation. Thus solution quality is dependent on the simulation length.
To find this point for the simulation model, all the possible scenarios in Table 2 are tested across all the line sizes in Table 1 With each replication the initial system state can vary due to the random parameters used. To obtain a stable throughput from the simulation, multiple replications of the same simulation, each with a unique random seed, are done, and the throughput of each simulation is recorded. An average throughput across all simulations is then used to reduce this variation. This can be done because each replication is independent. The number of replications performed is directly correlated to the accuracy of the average statistic. Again, a large number of replications comes at the expense of execution time.
To find a good number of replications, simulations of the problem scenarios were done, each replicated 1000 times. Initially, the average throughput varies significantly from 1 to 100 replications and then decreases to a small variation concerning the total throughput. Although more replications do result in less variation in the average throughput, at 200 replications, the variation is already minimal compared to the total throughput achieved by the line. Experiments showed the throughput at 200 replications is only 0.01% different from the throughput at a 1000 replications.
The experiments above showed that the accuracy of the line's throughput is dependent on both simulation length and simulation replications. The longer the simulation length and the higher the number of replications, the more accurate the throughput obtained is. This is crucial for the inner loop. When two buffer scenarios are compared, it should be able to accurately say that a change in the throughput was due to the change in the buffer configuration and not due to natural throughput variation within the simulation model.
Unfortunately, the increase in simulation length and number of replications come at the cost of computational time. To minimise the simulation computation time, for all experiments, a simulation duration of 10 000 and 200 replications will be used. This will result in a simulation steady-state error of below 5% and a small fluctuation in average throughput across the number of replications.

TOCT experiments and results
Once again the eight line scenarios in Table 1 and the 12 machine parameters in Table 2 are used. A total of 96 experimental combinations are generated. Because the inner tabu loop must be repeatable, each of the 96 experiments will be repeated ten times. This is done to determine whether the inner loop was able to achieve the same result for an experiment across all ten replications. The 12 problem sets are grouped into three classes: small (K = 5), medium (K = 10) and large (K = 20, 40). For the small-sized problem, the efficiency of the ATS and TOCT algorithm are compared against complete enumeration (CE). However, for medium and large scale problems CE is not computationally feasible and will not be tested using CE but will be compared against the ATS.
The experiments were done on the Lengau cluster of the Centre for High Performance Computing (CHPC), South Africa, parallelising the tasks over 261 threads. Table 4 shows the results from the experiments on small problems using ATS. First the ATS is compared with the CE.
The first column shows the problem set, shown as machine and buffer size, (K.N ). The second column shows the experimental scenario from Table 2. The third column shows the % error of the average optimal (maximum) throughput achieved, compared to the solution found using CE. To calculate the % error we use For each of the ten replications the absolute % error is calculated and averaged for each experiment scenario. Column four show the average computational time of the ATS algorithm in minutes. This is the time it took one iteration of the inner loop to complete. Column five show the average number of iterations it took the algorithm to find the best throughput it returned. All entries in the table are the averages across the ten replications of each scenario. Experimentation on small lines shows that the ATS method is capable of returning a throughput similar to the global optimum achieved using CE. The absolute error for ATS in comparison to complete enumeration was 0.30%. This small fluctuation can be attributed to the inherent stochasticity of the simulation, and in some instances, the method resulted in a better throughput than the CE.  The experiments on small-sized problems are repeated, using the new proposed TOCT method.
To calculate the % error we use Table 5 shows the results from the experiments on small problems using TOCT.
Similar to the ATS method, the TOCT is capable of achieving throughput similar to the global optimum achieved using CE. The absolute error for TOCT in comparison to CE was 0.88% versus the 0.30% for ATS. The significant difference between TOCT and ATS is computation time. The ATS method took, on average, 18 times longer to execute.
The third performance comparison is the number of iterations the program took to reach the solution it returned. For these small sized problems, K = 5, various total buffer sizes are tested, N = 25, N = 50, N = 100. The inner tabu is terminated if one of the two termination criteria are met. Either after N × 50 iterations, termination criteria 1, or after N × 25 iterations without an improvement on the optimal solution, termination criteria 2. Table 6 shows the stipulated number of iterations required to meet the termination criteria for the different scenarios. The actual number of iterations at which point the best throughput found does not improve for the ATS method is shown in column five in Table 4. Column five in Table 5 shows the number of iterations at which point the best throughput found does not change for the TOCT method.
Both the ATS and TOCT found, on average, the optimal solution in fewer iterations than the maximum allowed iterations. After that point, the algorithm kept running until the termination  criteria were met but could not find a better solution. The ATS needed fewer iterations compared to TOCT because it can find a better solution per iteration due to complete neighbourhood generation, but it takes much longer to do a single iteration. The termination criteria can thus be revised if the same results are found for medium and large-sized problems.
Complete enumeration is not possible for medium and large-sized problems. For the medium sized problems, the throughput achieved using ATS and TOCT is compared using equation (5), which shows the calculation used to determine the throughput deviation between the two methods.
% throughput deviation between ATS and TOCT = |f Table 7 shows experimental results for the medium sized problems.
The first column shows the problem set for the medium sized line, and the scenario in the second column. For each of the ten replications, the % throughput deviation is calculated using equation (5) For experiments on lines with 10 machines and total buffer size 200, only two of the eight scenarios are tested using the ATS method due to long execution times. The average error between the   ATS method and TOCT method is 1.68%. The ATS method takes 5.5 times longer to complete compared to TOCT. Similar to small-sized problems, the throughput achieved across the ten replications are tightly grouped. Once more the number of iterations the program took to reach the solution it returned is considered. The same termination criteria apply. Both ATS and TOCT found, on average, the optimal solution at fewer iterations. After that point, the algorithm kept running until the termination criteria is met but could not find a better solution.
Large sized problems can only be evaluated using the proposed TOCT algorithm. Due to the size of the problem and the increase in computational time required as the total buffer size and number of machines increase, ATS becomes computationally expensive to test comparatively. Note due to long computational time, experiments on lines with 40 machines and 400 N as well as 800 N could not be performed using TOCT.  For the large sized problems, more outliers are present compared to medium and small-sized problems. Similar to the previous experiments, maximum throughput is achieved within fewer iterations than required by the termination criteria.
The inner tabu loop experimentation showed that the proposed TOCT method could satisfy the BAP's first objective: finding the buffer configuration B B B that results in the maximum throughput.
For small problems, the TOCT method achieved similar results to CE and ATS within a much shorter time. The reduction in execution time makes it possible to solve medium sized problems using SBO. Large sized problems with large number of buffer 400N as well as 800N still require large computational time to solve even with the proposed method.
The experiments indicate that in the majority of cases, the maximum throughput was achieved with fewer iterations than currently set up in the termination criteria. For future experiments, the termination criteria for the inner tabu loop is changed to either 20 × N or 10 × N iterations without an improved solution. This can significantly reduce the execution time required to run the inner tabu algorithm and makes it more practical for the use in the outer tabu loop. Replicating each experiment 10 times indicated that the inner loop returned sufficiently similar incumbent (near-optimal) solutions. It is therefore argued that it is safe to run the inner loop once for each N provided by the outer loop as its results are similar to multiple replications.

Solving the BAP using SBO, experiments and results
The final piece of the SBO method is the outer loop. Recall that the main objective of the SBO is to find a buffer configuration, B B B, that results in the smallest total buffer size N while achieving a required throughput. The experiments thus require a target throughput. Similar to the experiments on the inner loop the line scenarios presented in Table 1 and Table 2 are used. To determine the target throughput, each experiment is done with half the initial N . For example, a line with K = 5 machines and N = 25 total buffer size, the inner tabu loop is executed using N 2 , 25 2 = 12. For a line with K = 5 machines and N = 12 total buffer size the inner loop achieved a maximum throughput of 918. This then becomes the target for the outer loop. It will start with the initial total buffer size N = 25 and search for the smallest N that results in a throughput of at least 918. It can be that the outer loop only return N = 12, or if possible, it could achieve a throughput of 918 with even a smaller total buffer size.
The execution of the experiments is done on a computer having 3.80Ghz Intel Core i5-7600K processor and 8 GB of RAM. For small-sized problems, the experiment was repeated five times. For medium and large-sized problems, the experiment was only replicated once. The CHPC is not used to run these experiments but a desktop. Do to computational resources, only limited number of replications are tested. All entries in the tables are the averages of the number of replications for each problem instance. The results for small-sized problems are shown in Table 9.
The first column is the problem set from Table 1. The second column is the problem scenario from Table 2. The throughput target the algorithm needs to achieve is shown in the third column. The actual throughput and the corresponding optimal buffer size to achieve this is shown in the fourth and five column respectively. Lastly, the time to solve one instance of the SBO in minutes is shown in the sixth column. For small sized problems the final throughput achieved by the model is very similar to the objective. In ten of the scenarios the model achieved the throughput at N 2 of the starting N . In 14 of the scenarios, the model achieved the required throughput with less than N 2 of the starting N . Experimentation showed that DES can effectively be used to solve BAPs. In conjunction with the proposed inner loop, using TOCT and the outer loop based on the works of [10] the BAP is solved for various line sizes and line parameters.

Solving BAP a case study
Body-in-White (BIW) is a production phase in which the automobile's metal body, called the body in white, is assembled using preformed pieces of metal. Such a line can consists of hundreds of welding robots, the production line this case study is based on has 292 robots. The lines are subject to failure and are partially unbalanced. Throughput of these manufacturing lines are generally very high and is affected by factors such as variation in processing times and reliability.
The topology is a tree structure, various machines have more than one station feeding parts into them. The three main sub-assemblies are produced in separate areas, the Front End, Centre Floor and Rear End. The Front End has seventeen machines. To conform with the BAP notation, each of these production cells will be simulated and referred to as a single machine K. Each of these machines are separated with a small buffer. The Centre Floor has nine machines while the Rear End has twelve. These stations are followed by the main line, which produces the underbody and sidef rames. Each machine has a unique processing time, which can be simulated with a uniform distribution. Each machine also has unique failure and repair times. Not only do the distributions of the various machines' failure and repair rate differ, but also its parameters. Some are modelled using Gamma, some using Beta distributions.
Together the total facility has 63 machines, 59 buffer locations and a total buffer size N of 318. The BIW problem is considered a large-sized problem. The SBO method developed in this manuscript can be used to solve the BAP for complex lines such as this production system. A simulation is created of the line, using its tree topology. Each machine has unique random distributions that represent the processing time, failure rate and repair time, respectively.
The simulation has a simulation length of three days (72 hours). The simulation starts with all buffers empty and a single part in every station. The throughput for the first two days are not recorded as it allows the simulation to reach a steady state. The throughput for the third day is then stored and used for analysis. The simulation is replicated 10 times and the average of the throughput across the 10 replications is returned as the performance result for the line.
The local automotive plant is measured on daily output performance. Stability is crucial and the line needs to achieve its targets daily. The current buffer allocation and line parameters results in an average daily throughput of 239.7 units per day.
Using the TOCT inner tabu the optimal buffer configuration resulted in a daily throughput of 282.7 units per day. Reconfiguring the allocation of buffer increased the performance of the line by 43 units from the initial state.

Front End
Centre  Table 11: Buffer allocation results production phase 1.
The objective function of the BAP for this article is achieving a specific throughput with the smallest total buffer size. The proposed SBO method is used on a production line of a local automotive manufacturer of luxury vehicles. A target throughput of 230 units per day needs to be achieved by the line. The SBO method obtained a buffer configuration that resulted in 242.9 units per day with a total buffer size of 126. That is a reduction of 192 units in the total buffer size. The algorithm required 3 hours and 55 minutes to finish. Table 11 and Table 12 show the buffer allocation for the initial case study versus the optimal buffer allocation. None of the buffers where reduced to 0 thus, the optimal buffer configuration is to have more small buffers among the line than fewer large buffers. Some buffer locations (B47, B50, B52, B56) increased in size. The case study shows that the proposed SBO method is able to solve the BAP for complex, realistic lines.

Conclusion
In this study, we propose a SBO approach to solve the BAP for unreliable, heterogeneous serial and non serial production lines. The objective is to achieve a desired throughput with the smallest buffer size. The proposed approach uses DES designed specifically for the BAP as evaluative method and integrated TOCT as inner loop and TS outer loop as generative method. The TOCT uses an alternative neighbourhood generation mechanism based on theory of constraints. The proposed methods elements are tested on serial lines and finally applied to a case study.
The results on the experiments on the DES model show it can achieve similar throughput to that of commercial software, with 0.84% difference in total throughput compared to Simio TM . DES has a significant time advantage compared to Simio TM , for small-sized problems, Simio TM took 23 minutes compared to 19 seconds for DES, and 75 minutes for large problems compared to 1,5 minutes for DES. Initial experiments on simulation length and replication show that a balance between solution quality and execution time for this problem can be achieved. Simulation length of 10 000 and 200 replications are used. It must be noted that these values can be very problem specific.
Experiments on the TOCT show that for small-sized problems the absolute error for TOCT in comparison to CE was 0.88% versus the 0.30% for ATS. The significant difference between TOCT and ATS is computation time. The ATS method took, on average, 18 times longer to execute. Experiments also show that the metaheuristics solution stops improving long before the termination criteria are met. For medium-sized problems the average error between the ATS method and TOCT method is 1.68%. The ATS method takes 5.5 times longer to complete compared to TOCT. Large sized problems the ATS method was computationally impossible to run, as well as experiments on line with 40 machines and 400, 800 N could not be done with TOCT.
The proposed SBO method is capable of solving the BAP for objective two. For small sized problems the final throughput achieved by the model is very similar to the objective. In ten of the scenarios the model achieved the throughput at N 2 of the starting N . In 14 of the scenarios, the model achieved the required throughput with less than N 2 of the starting N . With the large sized problems, 3 scenarios 20.100.8, 20.200.2 and 20.200.8 was not successfully solved, for medium-sized problems and remaining large-sized problems the model was able to achieve a required throughput close to the target throughput. 45 scenarios was less than or equal to N 2 , of the starting N . Only three scenarios 20.100. 8, 20.200.2 and 20.200.8 was not successfully solved. A higher throughput is achieved with a total buffer size required bigger than N 2 of the starting N .
The model can be adapted to non-serial lines. The automotive manufacturing plant is measured on daily output performance. Solving the case study for objective one show that the throughput of the line can be increased from 239.7 unites per day to 282.7 units with the same total buffer size. Solving it for objective two with a throughput target of 230 units per day, the throughout was achieved with 242.9 units per day with total buffer size of 126, a reduction of the total buffer size with 126. The algorithm required 3 hours and 55 minutes to finish.
The use of parallel implementation can have a significant improvement on the computational time of the model and is another direction for future research.