Dataset of metaheuristics for the flow shop scheduling problem with maintenance activities integrated

This data article presents a flow shop scheduling problem in which machines are not available during the whole planning horizon, and the periods of unavailability are due to random faults. The experimental dataset consists of two sets of problems of different sizes. In the larger set, about 2400 problems were analysed and compared using two widely used metaheuristics: Genetic Algorithm (GA) and Harmony Search (HS). In the smaller set, about 600 problems were analysed by comparing the solution obtained with an exhaustive algorithm with those obtained by GA and HS. This dataset is a test bed for further work, allowing solution quality and computation time to be compared across different optimization methods. The substantial computational effort spent to generate the dataset makes it a valuable resource for the scientific community.


Specifications
Subject: Industrial Engineering
Specific subject area: Scheduling; flow shop systems with maintenance activities
Type of data: Matlab workspaces

Value of the Data
• The main purpose of this data is to provide a test bed of Flow Shop Scheduling Problems (FSSP) integrated with preventive maintenance and stochastic breakdowns. In particular, problems of different size and computational complexity are proposed in order to compare heuristic algorithms on problems similar to those found in real industrial applications. The computational effort spent to solve these problems, finding the optimal sequence of jobs or a solution close to the optimum (depending on the size of the problem), represents additional value of the data.
• In this research, two minimization problems are solved separately: (i) makespan minimization; (ii) Earliness Tardiness Penalties (ETP) minimization. Medium and large problems were solved with two widely used metaheuristics, Genetic Algorithm (GA) and Harmony Search (HS); small problems were also solved with an exhaustive algorithm. This test bed and its solutions will benefit researchers working on scheduling in flow shop systems, who can compare these results with those of the solving algorithms they have developed. New heuristic algorithms can be compared both in terms of solution quality (i.e. a better value of one of the objective functions considered) and in terms of the time needed to compute the solution.

Data Description
To compare the performance of the Genetic Algorithm (GA) and Harmony Search (HS), two sets of scheduling problems are presented.
The first set deals with small problems: the two heuristics were compared with an exhaustive search method able to find the optimal solution for relatively small problems.
Table 1 Description of the files in the folder "Problems.zip".
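The exhaustive method used for the small problems can be pictured as plain enumeration of every candidate sequence, keeping the one with the best objective value. The sketch below assumes exactly that; `exhaustive_search` and the toy objective are hypothetical illustrations, not the authors' implementation. It also shows why the approach is limited to small instances: the number of sequences grows as N!.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple


def exhaustive_search(
    jobs: Sequence[int],
    objective: Callable[[Tuple[int, ...]], float],
) -> Tuple[Tuple[int, ...], float]:
    """Evaluate every permutation of `jobs` and return (best sequence, best value).

    Hypothetical sketch: the candidate count is len(jobs)!, so this is only
    practical for small problems.
    """
    best_seq: Tuple[int, ...] = tuple(jobs)
    best_val = objective(best_seq)
    for seq in permutations(jobs):
        val = objective(seq)
        if val < best_val:  # minimisation: keep strictly better sequences
            best_seq, best_val = seq, val
    return best_seq, best_val


if __name__ == "__main__":
    # Hypothetical toy objective: weighted sum of positions, minimised when
    # jobs with larger weights are scheduled first.
    weights = {0: 1.0, 1: 3.0, 2: 2.0}

    def toy(seq: Tuple[int, ...]) -> float:
        return sum(pos * weights[j] for pos, j in enumerate(seq))

    print(exhaustive_search([0, 1, 2], toy))  # prints ((1, 2, 0), 4.0)
```

Any objective that maps a sequence to a number (for instance a makespan or ETP evaluator) can be passed in place of the toy function.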
In the second set, which concerns larger problems, the two heuristics were compared over different scheduling scenarios created according to different optimization and problem generation criteria. In this case, the machines are less reliable (i.e., they are more likely to fail and require additional maintenance), which results in higher scheduling complexity.
For this set, the dataset contains six classes of problems for each objective function (makespan minimisation or ETP minimisation): three with low scheduling complexity and three with high scheduling complexity. Each class contains 100 problems, for a total of 600 × 2 × 2 = 2400 experiments (600 problems, two objective functions, two heuristics).
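To make the makespan objective concrete, the following is a minimal sketch for a permutation flow shop with sequence-dependent setup times. It assumes that all jobs are available at time 0, that a setup on machine k can start as soon as the previous job leaves that machine, and that the first job needs no setup; it deliberately ignores planned maintenance and random breakdowns, which the actual problems include. The function name `makespan` and the array `p` of processing times (shape N × M) are hypothetical.

```python
import numpy as np


def makespan(seq, p: np.ndarray, s: np.ndarray) -> float:
    """Completion time of the last job on the last machine.

    seq: job order (0-based job indices)
    p:   (N, M) processing times, p[j, k] = time of job j on machine k
    s:   (N, N, M) setup times, s[i, j, k] = setup on machine k from job i to j
    Hypothetical sketch without maintenance or breakdowns.
    """
    n_machines = p.shape[1]
    finish = np.zeros(n_machines)  # completion time of the latest job on each machine
    prev = None
    for j in seq:
        for k in range(n_machines):
            setup = 0.0 if prev is None else s[prev, j, k]
            machine_ready = finish[k] + setup                # machine k free and set up
            job_ready = finish[k - 1] if k > 0 else 0.0      # job j done on machine k-1
            finish[k] = max(machine_ready, job_ready) + p[j, k]
        prev = j
    return float(finish[-1])


if __name__ == "__main__":
    p = np.array([[3.0, 2.0], [1.0, 4.0]])
    s = np.zeros((2, 2, 2))
    print(makespan([0, 1], p, s), makespan([1, 0], p, s))  # prints 9.0 7.0
```

Iterating machines in increasing order lets `finish[k - 1]` already hold the current job's completion on the previous machine, so a single array is enough.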
The small problems were solved exhaustively on a Google Cloud virtual instance with 72 Intel Skylake vCPUs and 270 GB of memory. The two heuristic algorithms were run on a different virtual instance with 4 Intel Skylake vCPUs and 15 GB of memory.
The stopping condition of both heuristic methods was set to 15 minutes of stall time (i.e., if the solution does not improve for 15 consecutive minutes, the algorithm stops).
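The stall-time rule can be sketched as a loop that records when the best solution last improved and stops once that interval exceeds the limit. In this sketch `step` is a hypothetical stand-in for one GA generation or one HS improvisation, and `clock` is injectable so that the rule can be tested without waiting 15 real minutes; neither is taken from the authors' code.

```python
import time
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def run_until_stall(
    step: Callable[[S], Tuple[S, float]],
    initial: S,
    initial_value: float,
    stall_seconds: float = 15 * 60,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[S, float]:
    """Repeat `step` until the best value has not improved for `stall_seconds`.

    step(best) returns a candidate solution and its objective value
    (hypothetical interface; minimisation is assumed).
    """
    best, best_value = initial, initial_value
    last_improvement = clock()
    while clock() - last_improvement < stall_seconds:
        candidate, value = step(best)
        if value < best_value:
            best, best_value = candidate, value
            last_improvement = clock()  # reset the stall timer
    return best, best_value
```

Because the timer resets on every improvement, the total run time is not fixed: an algorithm that keeps improving runs longer than 15 minutes.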
The dataset can be downloaded from [1]. The file "Problems.zip" contains 800 files named Problem1.mat, Problem2.mat, ..., Problem800.mat (Table 1). Each file contains a MATLAB workspace with the following information:
• N is the number of jobs.
• M is the number of machines of the flow shop system.
• CL is the size of the problem (N + PM).
• s is the setup time matrix (size N × N × M): each cell (i,j,k) represents the setup time on machine k to switch from job i to job j. In the considered problems, the setup time to switch from job i to job j is assumed to equal the setup time to switch from job j to job i.
The results of the calculations are in the file "Solutions.zip". Each folder (Table 2) contains three files: "solutions comparison.xlsx", "GA solutions.xls" and "HS solutions.xls". The folders "problems (001-100)" and "problems(101-200)" also contain the file "EA solutions.xls" with the solutions of the exhaustive algorithm.
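Since MATLAB indexes arrays from 1 and NumPy from 0, the cell s(i,j,k) of a loaded workspace corresponds to `s[i-1, j-1, k-1]` in Python. The helpers below are a small sketch of that convention and of a check for the symmetry stated above; the function names are hypothetical.

```python
import numpy as np


def setup_time(s: np.ndarray, i: int, j: int, k: int) -> float:
    """Setup time on machine k to switch from job i to job j.

    i, j and k are 1-based, as in the MATLAB workspace; NumPy indices are
    0-based, hence the -1 offsets.
    """
    return float(s[i - 1, j - 1, k - 1])


def is_symmetric(s: np.ndarray) -> bool:
    """True if s(i, j, k) == s(j, i, k) for every pair of jobs and every machine."""
    # Swap the two job axes and keep the machine axis in place.
    return bool(np.allclose(s, s.transpose(1, 0, 2)))
```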
The content of a MATLAB workspace file (e.g. Problem1.mat) can be viewed and used with open-source software such as Python.
The following is an example script that uses the SciPy library to load the workspace of a .mat file into Python (release 3.…)
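A minimal loader in the same spirit, not necessarily identical to the authors' script, can rely on `scipy.io.loadmat`. It assumes the files are saved in a MATLAB format that `loadmat` supports (v7.3 files would instead require an HDF5 reader such as h5py), and the helper name `load_problem` is hypothetical.

```python
import sys

from scipy.io import loadmat


def load_problem(path: str) -> dict:
    """Return the variables of one ProblemX.mat workspace as a dict.

    squeeze_me=True removes MATLAB's singleton dimensions, so scalars such as
    N and M come back as 0-d arrays instead of 1x1 matrices.
    """
    workspace = loadmat(path, squeeze_me=True)
    # Drop the metadata entries added by loadmat (__header__, __version__, __globals__).
    return {k: v for k, v in workspace.items() if not k.startswith("__")}


if __name__ == "__main__":
    data = load_problem(sys.argv[1] if len(sys.argv) > 1 else "Problem1.mat")
    print("N =", int(data["N"]), "M =", int(data["M"]), "CL =", int(data["CL"]))
    print("setup time array shape:", data["s"].shape)  # expected N x N x M
```

Run as `python load_problem.py Problem1.mat`; the remaining workspace variables are available in the returned dict under their MATLAB names.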

Experimental Design, Materials and Methods
Problems are generated using the following rules, used by [2]:
• Job processing times are Gaussian random variables with mean 100 and standard deviation 25.
• Setup times are uniformly distributed between 0 and 19.
• The average time required to carry out corrective maintenance is uniformly distributed between 15 and 25.
• The average time required to carry out planned maintenance is uniformly distributed between 30 and 50.
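The generation rules above can be sketched with NumPy's random generator. Several details are assumptions not stated in this section: one processing time per (job, machine) pair, continuous rather than integer setup times, a zero diagonal for s, one mean maintenance duration per machine, and clipping of the (very rare) negative processing times. The function and the key names `p`, `cm_mean` and `pm_mean` are hypothetical; only N, M and s match workspace variables.

```python
import numpy as np


def generate_instance(n_jobs: int, n_machines: int, seed=None) -> dict:
    """Draw one random instance following the generation rules listed above.

    Hypothetical sketch: shapes, key names and edge-case handling are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Processing times ~ N(100, 25) per (job, machine), clipped at 0 (assumption).
    p = np.clip(rng.normal(100.0, 25.0, size=(n_jobs, n_machines)), 0.0, None)

    # Setup times ~ U(0, 19); the upper triangle is mirrored so that
    # s[i, j, k] == s[j, i, k], and the diagonal is set to 0 (assumption).
    raw = rng.uniform(0.0, 19.0, size=(n_jobs, n_jobs, n_machines))
    upper = np.arange(n_jobs)[:, None] < np.arange(n_jobs)[None, :]
    s = np.where(upper[:, :, None], raw, raw.transpose(1, 0, 2))
    idx = np.arange(n_jobs)
    s[idx, idx, :] = 0.0

    # Mean corrective (U(15, 25)) and planned (U(30, 50)) maintenance durations,
    # one value per machine (assumption).
    cm_mean = rng.uniform(15.0, 25.0, size=n_machines)
    pm_mean = rng.uniform(30.0, 50.0, size=n_machines)

    return {"N": n_jobs, "M": n_machines, "p": p, "s": s,
            "cm_mean": cm_mean, "pm_mean": pm_mean}
```

Mirroring one triangle, instead of averaging s with its transpose, keeps each setup time uniformly distributed.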
For the problems related to ETP minimisation, the following additional parameters are defined:
• Earliness penalty (a): equal to 1.
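Assuming the usual linear form, the ETP of a schedule is the sum over jobs of a·max(0, d_j − C_j) + b·max(0, C_j − d_j), where C_j is the completion time of job j (taken here on the last machine, an assumption) and d_j its due date. The sketch below keeps the earliness penalty a = 1 stated above; the tardiness penalty b is not given in this section, so the caller must supply it. The function name `etp` is hypothetical.

```python
from typing import Sequence


def etp(completion: Sequence[float], due: Sequence[float], b: float, a: float = 1.0) -> float:
    """Earliness-tardiness penalty of one schedule (assumed linear form).

    completion[j]: completion time of job j on the last machine (assumption)
    due[j]:        due date of job j
    a:             earliness penalty (1 in this dataset)
    b:             tardiness penalty, supplied by the caller
    """
    if len(completion) != len(due):
        raise ValueError("completion and due must have the same length")
    return sum(a * max(0.0, d - c) + b * max(0.0, c - d)
               for c, d in zip(completion, due))
```

For example, with completion times [10, 20, 30], due dates [15, 20, 25] and b = 2, the first job is 5 units early and the last is 5 units late, giving 5 + 2·5 = 15.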
The rule proposed by [3] for generating due dates in single-machine scheduling problems was modified to suit the multi-machine problem: due dates are uniformly distributed with mean μ_DD and range R_DD. Hence, once generated, the parameter values of each individual problem are deterministic. The random procedures described above were used to create different problems, each involving machines and jobs with different parameters. Therefore, this dataset was generated using variables and parameters that cannot be found elsewhere.
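A uniform distribution with mean μ_DD and range R_DD covers the interval [μ_DD − R_DD/2, μ_DD + R_DD/2]. The sketch below draws due dates this way. How μ_DD and R_DD are set for each problem is not reproduced in this section, so they are plain inputs here, and the function name is hypothetical.

```python
import numpy as np


def generate_due_dates(n_jobs: int, mu_dd: float, r_dd: float, seed=None) -> np.ndarray:
    """Draw n_jobs due dates uniformly with mean mu_dd and range r_dd.

    mu_dd and r_dd are caller-supplied; the rule used to derive them is not
    reproduced here.
    """
    rng = np.random.default_rng(seed)
    # U(mu - R/2, mu + R/2) has mean mu and width R.
    return rng.uniform(mu_dd - r_dd / 2.0, mu_dd + r_dd / 2.0, size=n_jobs)
```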
The data reported here were used in [4] to jointly schedule production and maintenance activities in a flow shop environment.

Ethics Statement
Not applicable.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.