Dataset for solving a hybrid flexibility strategy on personnel scheduling problem in the retail industry

This data article describes datasets from a home improvement retail store located in Santiago, Chile. The datasets have been developed to simultaneously solve a staffing and tour scheduling problem that incorporates flexible contracts and multiskilled staff. This Data in Brief article is related to the published article “Hybrid flexibility strategy on personnel scheduling: Retail case study” [1]. The datasets contain real, processed, and simulated data. Regarding the real and processed datasets, they are presented for three different store sizes (4, 5 or 6 departments). Real datasets include information about the employment-contract characteristics, cost parameters, and a forecast of the number of employees required in each department for each day of the week and each time period into which the operating day is divided. As regards the data processed for the case study, they include the set of skill sets considering that the employees can be trained in a maximum of two store departments. Regarding the simulated datasets, they include information about the random parameter of staff demand in each store department. The simulated data are presented in 90 text files classified by: (i) Store size (4, 5 or 6 departments). (ii) Coefficient of variation (10, 20, 30%). (iii) Instance identification number (10 instances per scenario that resulted from combining the store sizes and coefficients of variation). Researchers can use the datasets for benchmarking the performance of different approaches with the one presented by Porto et al. [1], and in consequence, they can find solutions to the same (or similar) type of personnel scheduling problem. The dataset includes an Excel workbook that can be used to randomly generate staff demand instances according to a chosen coefficient of variation.

© 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Specifications
Real and processed data from a retail store and simulated data generated using a Monte Carlo simulation in an Excel workbook
Data format: Mixed (raw and analyzed)
Parameters for data collection: Real and processed datasets are presented for three different store sizes. Real data include the employment-contract characteristics, cost parameters, and a forecast of the number of employees required in each department by weekday and time period of the day. Processed data include the set of skill sets. Simulated data contain randomly generated staff demand in each store department.
Description of data collection: Real and processed data were collected from a home improvement retail store. Simulated data were generated using the Excel formulas of inverse probability distribution and random values. The outputs are presented in 90 text files classified by: (i) Store size (4, 5 or 6 departments). (ii) Coefficient of variation (10, 20, 30%).

Value of the data
• The datasets contain real, processed, and simulated information that can be used to simultaneously solve a staffing and tour scheduling problem that incorporates flexible contracts and multiskilled staff, considering a retail store with uncertain demand.
• Researchers can use the datasets for benchmarking the performance of different approaches with the one presented by Porto et al. [1], and in consequence, they can find solutions to the same (or similar) type of personnel scheduling problem.
• The data from this research can be useful to determine the staffing levels by contract type, the cost-effective multiskilling levels, and the weekly shift programming that minimize the costs associated with training and over/understaffing in a retail store.
• Researchers can use these data to solve other types of personnel scheduling problems, such as a shift scheduling problem, a day-off scheduling problem, or a personnel assignment problem.
• The datasets include an Excel workbook that can be used to randomly generate staff demand instances according to a chosen coefficient of variation.

Data description
Human resources management in the retail industry faces predictable phenomena such as demand seasonality, as well as unpredictable phenomena such as demand uncertainty and unscheduled staff absenteeism [1]. Such phenomena produce periods of over- and understaffing that can increase labor costs and deteriorate customer service levels (CSL) [1]. To minimize this mismatch between employee supply and demand, companies have used different labor flexibility strategies to solve personnel scheduling problems [2]. Porto et al. [1] stated that four types of strategies are typically implemented: flexible contracts, multiskilled staff, collaborative teams, and temporary employees.
The database presented in this article was used by Porto et al. [1] to simultaneously solve a staffing and tour scheduling problem that combines the following two labor flexibility strategies: (i) flexible contracts, which relax the duration of the shifts and the number of weekly hours employees must work; and (ii) multiskilled staff, that is, employees trained to work in multiple departments, which allows store managers to transfer available multiskilled employees from overstaffed departments to understaffed departments.
Staffing is a type of personnel scheduling problem that determines how many employees are required in each type of contract, and how many of them will be multiskilled employees and in which task types (or store departments). In addition, tour scheduling is another type of personnel scheduling problem where days-off and shifts are scheduled simultaneously over a given planning horizon (typically one week). The solution of a staffing and tour scheduling problem must minimize labor costs while maintaining or improving the CSL.
The data presented in the next sections are derived from a home improvement retail store in Santiago, Chile. In this store, employees are assigned to different departments. Each department constitutes a store business unit, and the store's training and employee scheduling decisions are made at that level. This database contains real, processed, and simulated data, which are described below.

Real data
The real data include information about the sets and parameters that can be used to solve a hybrid flexibility strategy on personnel scheduling problem in the retail industry. Table 1 shows the notation, description, and values of these data.
Regarding the sets, Table 1 presents the days of the week on which the store operates ($D$), the time periods into which the operating day is divided ($P$), the contract types ($C$), shifts ($T$), workday types ($J$), store departments ($L$), workdays for each contract type ($J_c$), and the shifts for each contract type and workday type ($T_{cj}$). As for the parameters, it presents the costs of staff shortage ($U$) and staff surplus ($O$), and the base wage for each type of contract ($G_c$). Finally, it provides some additional parameters: the start and end times of shifts ($s_t$, $e_t$), the start and end times of periods ($u_p$, $b_p$), the length of each shift ($h_t$), and the number of weekly working hours according to the contract type ($e_c$).
In particular, the forecast of the number of employees required in each department for each day of the week and period ($r_{ldp}$) is presented in Table 2. This staff demand is expressed in terms of 28 time periods of 30 min each, between 8:00 and 22:00.
[Table 1: notation, description, and values of the sets and parameters of the real data, shown for a store size of six departments.]

To keep the optimization problem computationally tractable, Porto et al. [1] considered three store sizes (SS). Each SS represents the number of departments that the store has. The first SS has four departments, the second SS has five departments, and the third SS has six departments. Tables 1 and 2 show an SS of six departments; to represent the other SS, it is only necessary to adjust $L$ to the number of departments (i.e., $L = \{1, 2, 3, 4\}$ for the SS of four departments and $L = \{1, 2, 3, 4, 5\}$ for the SS of five departments). Also, for the parameter $r_{ldp}$, the demand for the corresponding SS can be selected in the order presented in Table 2.
The retailer data associated with the case study were provided by SHIFT SpA [3], a firm that optimizes the shift schedules of thousands of employees across Latin America. Two types of contracts were defined for the addressed personnel scheduling problem, based on established practices in the Chilean retail sector. The first contract type is FT45, for full-time employees working 45 weekly hours, while the second contract type is PT30, for part-time employees working 30 weekly hours. As mentioned before, the set $C$ represents the contract types, indexed by $c$, where $c = 1$ indicates an FT45 contract and $c = 2$ a PT30 contract.
As regards the workday types, there are three possible workdays in the PT30 contract (i.e., 5, 6 and 10 h) and there is one possible workday in the FT45 contract (i.e., 9 h). As noted above, the set J represents the workday types, indexed by j , and the values j = 1 indicate a workday of 5 h, j = 2 a workday of 6 h, j = 3 a workday of 9 h and j = 4 a workday of 10 h.
As a result, Table 3 shows the set of shifts per day ($T$, indexed by $t$) that was obtained for each combination of contract type and workday type (i.e., workday groups). Column 1 shows the workday groups, which are structures formed by the type of contract, the working days per week, and the working hours per day. For example, the workday group FT45 {5 × 9} represents an employee under a full-time contract who works 45 h per week, spread over 5 days per week and 9 h per day. Columns 2 and 3 in Table 3 present detailed information for each shift belonging to a workday group. Column 2 indicates the start ($s_t$) and end ($e_t$) time associated with each shift $t$. Note that shifts can only start every 30 min and are limited by the store operating hours (i.e., 8:00-22:00). Column 3 shows, for each shift in Column 2, its respective value of $t \in T_{cj}$. Note that each workday group is equivalent to one of the sets $T_{cj}$, which represent the shifts for employees with contract type $c$ and workday type $j$. The workday group FT45 {5 × 9} is equivalent to the set $T_{13}$, the workday group PT30 {6 × 5} is equivalent to the set $T_{21}$, the workday group PT30 {5 × 6} is equivalent to the set $T_{22}$, and the workday group PT30 {3 × 10} is equivalent to the set $T_{24}$. The last column presents the number of shifts that can be considered for each workday group; summed over all workday groups, these give the total number of shifts in $T$.
Finally, a detailed description of how we estimated the personnel demand and the staff costs is given in the experimental design, materials, and methods section.
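The shift construction described above (starts every 30 min, bounded by the 8:00-22:00 operating hours) can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the time an employee spends in the store is taken to equal the workday length in hours, with no additional break time. Table 3 is the authoritative source, and its shift counts may differ if breaks are handled differently. The names `enumerate_shifts`, `WORKDAY_HOURS`, and `fmt` are hypothetical.

```python
OPEN_MIN = 8 * 60    # store opens at 8:00 (minutes from midnight)
CLOSE_MIN = 22 * 60  # store closes at 22:00
STEP_MIN = 30        # shifts can only start every 30 min

# Workday groups named in the text and their hours per day.
WORKDAY_HOURS = {
    "FT45 {5 x 9}": 9,
    "PT30 {6 x 5}": 5,
    "PT30 {5 x 6}": 6,
    "PT30 {3 x 10}": 10,
}


def enumerate_shifts(hours_per_day):
    """Return (start, end) pairs, in minutes from midnight, for one workday length.

    Assumption: the shift length equals the workday length (no breaks added).
    """
    length = hours_per_day * 60
    return [
        (start, start + length)
        for start in range(OPEN_MIN, CLOSE_MIN - length + 1, STEP_MIN)
    ]


def fmt(minutes):
    """Format minutes from midnight as H:MM."""
    return f"{minutes // 60}:{minutes % 60:02d}"


if __name__ == "__main__":
    for group, hours in WORKDAY_HOURS.items():
        shifts = enumerate_shifts(hours)
        first_start, last_start = shifts[0][0], shifts[-1][0]
        print(group, len(shifts), "shifts, starts from",
              fmt(first_start), "to", fmt(last_start))
```

Concatenating the four lists in a fixed order would give one possible indexing of $t$, but the order used in Table 3 should be followed when working with the published data files.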
The second set is $L_w$, which represents the subset of store departments that are included in the skill set $w$, $\forall w \in W$, $L_w \subseteq L$. Depending on the value of $w$, $L_w$ can contain one department or at most two. If $L_w$ has a single department, it is associated with a single-skill subset. If $L_w$ has two departments, it is associated with a multi-skill subset. In this last case, a distinction is made between the primary department and the secondary one.
In Table 4, for example, $L_1 = \{1\}$ is associated with the single-skill subset $w = 1$, where employees are trained to work only in department 1. $L_2 = \{1, 2\}$ is associated with the multi-skill subset $w = 2$, where employees are trained to work in department 1 as primary and in department 2 as secondary. It should be noted that even though $L_2 = \{1, 2\}$ and $L_7 = \{2, 1\}$ are both associated with multi-skill subsets where employees are trained to work in departments 1 and 2, they differ because their primary departments are 1 and 2, respectively. This property also applies to the rest of the multi-skill subsets.
The third set is $W_l$, which represents the skill sets that allow an employee to work in department $l$, $\forall l \in L$, $W_l \subseteq W$. To differentiate the primary department from the secondary department, we defined $W_l^{out}$ and $W_l^{in}$. First, $W_l^{out}$ represents the multi-skill sets where department $l$ is considered the primary department of the employee, but he/she also has an additional skill to work in a secondary department, $\forall l \in L$, $W_l^{out} \subseteq W$. Second, $W_l^{in}$ represents the multi-skill sets where an employee has a primary department different from $l$, but he/she also has an additional skill to work in department $l$, $\forall l \in L$, $W_l^{in} \subseteq W$.

Table 4
Description of the data processed for the case study of a retail store with six departments.

Notation | Description | Value
Sets
$W$ | Set of skill sets, indexed by $w$. Includes subsets with only one skill (i.e., single-skill) and subsets with a maximum of two skills (i.e., multi-skill). | $\{1, 2, 3, 4, \ldots, 36\}$
Additional model sets derived from the sets defined above
$L_w$ | Subset of store departments that are included in the skill set $w$, $\forall w \in W$, $L_w \subseteq L$. Each subset has a maximum of two departments. |
$W_l$ | Skill sets that allow an employee to work in department $l$, $\forall l \in L$, $W_l \subseteq W$. Each set $W_l$ has $2|L| - 1$ skill sets. |
$W_l^{out}$ | Multi-skill sets where department $l$ is the primary department of the employee, who also has an additional skill to work in a secondary department, $\forall l \in L$. |
$W_l^{in}$ | Multi-skill sets where an employee has a primary department different from $l$, but he/she also has an additional skill to work in department $l$, $\forall l \in L$. Each set $W_l^{in}$ has $|L| - 1$ skill sets. |
In addition, we assume a minimal training cost for each hired multiskilled employee, such that $M_w = 1\ \text{US\$-week/multi-skilled employee}$. Note that the training cost is zero for the skill sets $w$ with only one skill (i.e., single-skill).
Finally, we provide three files with the sets and parameters of the real and processed data written in AMPL, one for each SS. The files named 'SS4-real-processed-data.dat', 'SS5-real-processed-data.dat' and 'SS6-real-processed-data.dat' contain the values that correspond to an SS of four, five, and six departments, respectively. All files can be downloaded from the Mendeley repository indicated in the Data accessibility section (see Specifications Table).

Simulated data
The forecast values of the parameter $r_{ldp}$, which represents the number of employees required in department $l$, on day $d$, in period $p$, $\forall l \in L$, $d \in D$, $p \in P$, were presented in Table 2. In addition, to evaluate the potential benefits of the joint use of flexible contracts and multiskilled employees under different changes in staffing demand, Porto et al. [1] created simulated demands considering three different variability levels in each department: CV = 10%, 20% and 30%. Here, CV is the coefficient of variation of demand with respect to the forecast values presented in Table 2.
Combining the SS (4, 5 and 6 departments) and the CV (10%, 20%, 30%) the resulting number of scenarios is 9. A Monte Carlo simulation was used to randomly generate instances for demand in each store department following a zero-truncated normal probability distribution (this prevents negative demand values). In our case, 10 demand instances were generated for each of the scenarios, resulting in 90 instances. These 90 simulated demands are provided in the text files listed in Table 5 and can be downloaded from the Mendeley repository that was provided in Data accessibility Section (see Specifications Table).
Each file contains the number of employees required in each store department, such that each row represents one of the 7 days of the week, and each column represents one of the 28 time periods into which the retail store's operating day is divided. The file names are identified by a three-part code i-j-k, where i = SS4, SS5, SS6 indicates the store size (4, 5 or 6 departments); j = CV10, CV20, CV30 indicates the coefficient of variation (CV = 10, 20 or 30%); and k = 01, 02, 03, 04, 05, 06, 07, 08, 09, 10 represents the instance identification number (10 instances per scenario).
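As a convenience for batch processing, the i-j-k naming scheme above can be enumerated programmatically. The sketch below builds the 90 expected file names; the '.dat' extension follows the example names quoted in this article (e.g., 'SS6-CV10-01.dat'), and the function name `simulated_file_names` is hypothetical.

```python
from itertools import product

STORE_SIZES = (4, 5, 6)    # i: number of departments
CVS = (10, 20, 30)         # j: coefficient of variation, in percent
INSTANCES = range(1, 11)   # k: instance identification number, 01..10


def simulated_file_names(extension=".dat"):
    """Return the expected names of the simulated-demand files, in i-j-k order."""
    return [
        f"SS{ss}-CV{cv}-{k:02d}{extension}"
        for ss, cv, k in product(STORE_SIZES, CVS, INSTANCES)
    ]


if __name__ == "__main__":
    names = simulated_file_names()
    print(len(names), names[0], names[-1])
```

The layout inside each file is not parsed here; readers should inspect a file from the repository before writing a loader.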
As an example, Figs. 1 and 2 were created to visualize the simulated demands. Fig. 1 shows the first instance of the simulated demand for the scenario with six store departments and a coefficient of variation equal to 10% (i.e., the data from the 'SS6-CV10-01.dat' file). Fig. 2 shows the first instance of the simulated demand for the scenario with six store departments and a coefficient of variation equal to 30% (i.e., the data from the 'SS6-CV30-01.dat' file). In both instances, departments 3 and 4 have the highest demands and department 1 has the lowest demand, just like the forecast values of $r_{ldp}$ presented in Table 2. For CV = 10%, the number of required employees ranges from 1 to 7, considering all departments, days, and time periods. For CV = 30%, the number of required employees ranges from 0 to 8, considering all departments, days, and time periods.
Finally, a detailed description of the tool we used to perform the Monte Carlo simulations is given in the experimental design, materials, and methods section.

Experimental design, materials, and methods
In this section, we present a complete description of the experimental design and methods used to acquire the data regarding the skill sets, the staff costs, and the forecast and simulated demand. First, we describe how the skill sets were designed. Second, we explain the reasoning behind the estimation of the base wage per type of contract and the costs of training, staff shortage and staff surplus. Third, we indicate the methods used to obtain the forecast staff demand in each store department. Finally, we provide a description of the program used for the Monte Carlo simulation, which generates random parameters of staff demand.

Skill sets
The sets $W$ were designed using square matrices such that the number of rows and columns is equal to the store size (SS). Fig. 3 shows the matrix notation, matrix representation, and linear vector for each set $W$ associated with each SS (4, 5 and 6 departments). In the matrix representation of the sets $W$, the main diagonal represents the single-skill subsets (indicated in bold), and the entries above and below the main diagonal represent the multi-skill subsets. Note that, in the model, the set $W$ was finally written as a linear vector; thus, the matrix representation is for illustration purposes only.
For example, when SS = 6, the matrix entry $a_{1,1}$ is associated with the single-skill subset $w = 1$, where employees are trained to work only in department 1. In addition, the matrix entry $a_{1,2}$ is associated with the multi-skill subset $w = 2$, where employees are trained to work in department 1 as primary and in department 2 as secondary. Likewise, the matrix entry $a_{2,1}$ is associated with the multi-skill subset $w = 7$, where employees are trained to work in department 2 as primary and in department 1 as secondary. Finally, as already explained in Section 1.2, the other skill sets (i.e., $L_w$, $W_l$, $W_l^{out}$, $W_l^{in}$) are derived from $W$.
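The examples above ($a_{1,1} \to w = 1$, $a_{1,2} \to w = 2$, $a_{2,1} \to w = 7$ for SS = 6) are consistent with reading the matrix row by row. The following sketch derives $L_w$, $W_l$, $W_l^{out}$ and $W_l^{in}$ under that assumption, i.e., $w = (i - 1)\,|L| + j$ for row $i$ (primary department) and column $j$ (secondary department). The row-major rule is inferred from the three examples rather than stated explicitly, and the function name `build_skill_sets` is hypothetical; the published '.dat' files remain the reference.

```python
def build_skill_sets(n_departments):
    """Derive L_w, W_l, W_out and W_in from a square skill matrix.

    Assumption: entry a[i, j] maps to w = (i - 1) * n_departments + j,
    with i the primary department and j the secondary department.
    """
    departments = range(1, n_departments + 1)

    # L_w: departments covered by skill set w, primary department first.
    L_w = {}
    for primary in departments:
        for secondary in departments:
            w = (primary - 1) * n_departments + secondary
            if primary == secondary:
                L_w[w] = (primary,)            # single-skill subset
            else:
                L_w[w] = (primary, secondary)  # multi-skill subset

    # W_l: every skill set that allows working in department l.
    W_l = {l: [w for w, deps in L_w.items() if l in deps] for l in departments}
    # W_out: multi-skill sets whose primary department is l.
    W_out = {l: [w for w, deps in L_w.items()
                 if len(deps) == 2 and deps[0] == l] for l in departments}
    # W_in: multi-skill sets whose secondary department is l.
    W_in = {l: [w for w, deps in L_w.items()
                if len(deps) == 2 and deps[1] == l] for l in departments}
    return L_w, W_l, W_out, W_in


if __name__ == "__main__":
    L_w, W_l, W_out, W_in = build_skill_sets(6)
    print(L_w[1], L_w[2], L_w[7])
    print(len(W_l[1]), len(W_out[1]), len(W_in[1]))
```

Storing the departments as an ordered tuple keeps the primary/secondary distinction that separates, for instance, $L_2$ from $L_7$.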

Staff costs
Regarding the cost structure, we set the base wage per type of contract according to the Chilean Labor Law effective in 2018. The weekly base wage for a full-time employee (i.e., FT45 contract) is $G_1 = 100\ \text{US\$/week}$, whereas for a part-time employee (i.e., PT30 contract) it is $G_2 = 90\ \text{US\$/week}$. Note that the weekly cost of a full-time employee is greater than the weekly cost of a part-time employee, since the former works 15 h more per week than the latter. However, one hour of work by a part-time employee is typically more costly than one hour of work by a full-time employee; with these wages, 90/30 = 3.00 US$/h versus 100/45 ≈ 2.22 US$/h.
With regard to the cost of training associated with multi-skilled employees, we assume a minimal training cost of $M_w = 1\ \text{US\$-week/multi-skilled employee}$. Henao et al. [4], Henao et al. [5], Henao et al. [6], and Porto et al. [1] explain that this assumption allows for interpreting the results as an upper bound on the potential contribution of multiskilling to store performance.
As for the shortage and surplus costs, we assumed them to be the same for all departments, time periods, and days. The shortage cost is equivalent to the cost of the expected lost sales and, according to historical data of the retail store, is 61 US$/period on average. The surplus cost represents an opportunity cost, incurred by paying for idle staff who could be assigned to other productive tasks in the store; the average cost was determined to be 15 US$/period using historical data.
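To illustrate how the shortage cost ($U$ = 61 US$/period) and the surplus cost ($O$ = 15 US$/period) can be applied, the sketch below prices the mismatch between required and assigned staff over a few periods. It assumes a linear penalty per employee and period, which is a simplification for illustration and not the full optimization model of Porto et al. [1]; the function name `mismatch_cost` and the sample numbers are hypothetical.

```python
SHORTAGE_COST = 61.0  # U: US$ per period (value reported in the text)
SURPLUS_COST = 15.0   # O: US$ per period (value reported in the text)


def mismatch_cost(required, assigned,
                  shortage_cost=SHORTAGE_COST, surplus_cost=SURPLUS_COST):
    """Return the total under/overstaffing cost across periods.

    Assumption: costs scale linearly with the number of missing or
    idle employees in each period.
    """
    total = 0.0
    for r, a in zip(required, assigned):
        shortage = max(r - a, 0)  # employees missing in this period
        surplus = max(a - r, 0)   # idle employees in this period
        total += shortage_cost * shortage + surplus_cost * surplus
    return total


if __name__ == "__main__":
    # Hypothetical three-period example: one employee short in the first
    # period, exact coverage in the second, two idle employees in the third.
    print(mismatch_cost(required=[3, 4, 2], assigned=[2, 4, 4]))
```

Because the shortage cost is roughly four times the surplus cost, a solution that minimizes this measure will tend to tolerate some idle time rather than leave demand uncovered.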

Demand forecast from a retail store
SHIFT SpA is a specialized firm that provides advisory services in workforce management for different companies in Latin America [3]. As mentioned before, SHIFT SpA provided the real data for our case study, which are derived from a home improvement retail store located in Santiago, Chile. In particular, the staff demand by department was obtained through specialized software that runs in two steps: (1) forecast the sales and transactions and (2) generate the staff requirements.
In the first step, the software uses a multiple linear regression to forecast the amount of sales and number of transactions of the store by department, day, and time period. This procedure is based on historical data. To improve the estimation of the regression, the store must have at least 2-6 years of data. In the second step, the software transforms the predicted sales and transactions into a staff demand expressed in man-hours, considering the typical customer service times. Then, given a pre-established level of service, the number of employees required in each store department per day and time period is determined.

Monte Carlo simulation
In the Mendeley repository indicated in the Data accessibility section (see Specifications Table), we supply an Excel workbook named 'Monte Carlo Simulator.xlsx', which contains the Monte Carlo simulation used to generate the instances for demand in each store department following a zero-truncated normal probability distribution (simulated data). This workbook contains one worksheet that can be used to generate demand instances one by one, for a given store department.
Since the staff demand follows a normal distribution, two parameters are required to generate the instances: (i) the forecast of the number of employees required in the department for each day of the week and each time period, which is given in Table 2; and (ii) the coefficient of variation, which can be selected by users to represent the uncertainty in the staff demand that best fits their case study. In the Excel worksheet, these inputs are placed in cells with a yellow fill; this color indicates that the values can be edited.
The rest of the input data are calculated using Excel formulas and are presented in gray text to indicate that these cells should not be edited. In the worksheet, the standard deviation is calculated as the product of the forecast demand and the CV. Then some statistics are calculated to truncate the outputs. Since the distribution is zero-truncated, the standard score (z) and its respective quantiles were calculated.
Finally, the demand instance is calculated using the Excel formulas of inverse probability distribution and random values. The random values range between the quantile (associated with the standard score of a demand equal to zero) and 1, with a step size of 0.0001. This ensures that the generated demand values are not negative, that is, that they are greater than or equal to zero. The generated staff demand outputs are organized in 7 rows representing the days of the week and 28 columns representing the time periods into which the retail store's operating day is divided. These values are presented in blue text to indicate that these cells are the outputs and should not be edited.
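For readers who prefer a scripted alternative to the workbook, the procedure described above (standard deviation = forecast × CV, standard score of zero demand, uniform random value between the corresponding quantile and 1, inverse normal distribution) can be sketched with the Python standard library. This is not a transcription of the Excel formulas: the sketch draws continuous uniform values instead of using the 0.0001 step, returns unrounded values, and treats a zero forecast as zero demand, all of which are assumptions. The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def simulate_demand(forecast, cv, rng):
    """Draw one demand value from a zero-truncated normal around `forecast`.

    `cv` is a fraction (0.10, 0.20 or 0.30). Assumption: a non-positive
    forecast or CV yields the forecast itself (floored at zero).
    """
    if forecast <= 0 or cv <= 0:
        return float(max(forecast, 0))
    sigma = forecast * cv               # standard deviation = forecast x CV
    z0 = (0.0 - forecast) / sigma       # standard score of zero demand
    q0 = STD_NORMAL.cdf(z0)             # quantile below which demand < 0
    u = q0 + (1.0 - q0) * rng.random()  # uniform value between q0 and 1
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its domain
    return max(0.0, forecast + sigma * STD_NORMAL.inv_cdf(u))


def simulate_instance(forecast_matrix, cv, seed=None):
    """Simulate one demand instance for a single department.

    `forecast_matrix` holds 7 rows (days) of 28 values (time periods).
    """
    rng = random.Random(seed)
    return [[simulate_demand(f, cv, rng) for f in day] for day in forecast_matrix]


if __name__ == "__main__":
    # Hypothetical flat forecast of 3 employees in every day and period.
    forecast = [[3] * 28 for _ in range(7)]
    instance = simulate_instance(forecast, cv=0.30, seed=1)
    print(len(instance), len(instance[0]), min(min(row) for row in instance))
```

Sampling the uniform value only above the quantile of zero demand is what keeps every generated value non-negative, mirroring the truncation logic of the worksheet.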

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.