Instance dataset for resource-constrained project scheduling with diverging material flows

This data article describes an instance dataset motivated by the problem of scheduling a project with diverging material flows. Such material flows are released during the execution of the project and are subject to limited processing and storage capacities. Typical examples are nuclear dismantling or other deconstruction/demolition projects, where large amounts of material must be classified, scanned for hazardousness, and processed accordingly. The problem setting is mathematically described as a resource-constrained project scheduling problem with cumulative resources (RCPSP/c). The RCPSP/c deals with finding a project schedule with minimal makespan that satisfies temporal, renewable resource, and cumulative resource constraints. In total, the dataset comprises 192 artificially generated instances that are suitable for testing models and solution methods. In addition, we provide our best found solution for each instance and different modeling variants (e.g., for two types of objective functions). These solutions were computed by heuristic solution methods. The dataset serves as a benchmark for researchers evaluating the performance of solution methods for the RCPSP/c or the more general problem class with resources that can be produced and consumed.

Subject: Operations Research
Specific subject area: Resource-constrained project scheduling with cumulative resources
Type of data: 1) Archive files (.zip) containing comma-separated values files (.csv) for instances and solutions; 2) Excel file (.xlsx) providing an overview of instances and solutions; 3) Archive file (.zip) containing Java source files (.java) and class files (.class) for a program to check the feasibility of solutions
How the data were acquired: Instances were artificially generated by a Java program. Solutions were computed by a Java program using different heuristic solution methods (cf. [1]). All programs were run on an AMD Ryzen 9 (4.0 gigahertz, 12 cores) with 128 gigabytes of RAM.
Data format: 1) Raw; 2) Analyzed; 3) Java program
Description of data collection: We defined nine parameters for describing the characteristics of an instance. For each parameter, we defined a set of possible levels. For each combination of the parameter levels, we artificially generated a pair of instances. Each pair consists of an instance with granular operations and an instance with aggregated operations. In total, we obtained 96 pairs (i.e., 192 instances).

Value of the Data
• The dataset (cf. [2]) contains problem instances of the resource-constrained project scheduling problem with cumulative resources (RCPSP/c) presented by [1]. These instances can be used to evaluate the performance of suitable solution methods.
• Since the RCPSP/c is a particular case of a scheduling problem with resources that can be produced and consumed, the provided instances additionally serve as test instances for this general problem class, which has been introduced and discussed in several publications, such as [3–9].
• The best found solutions using heuristic methods are also provided and can be considered as a benchmark.
• The dataset includes a program for checking the feasibility of solutions. It helps researchers verify the correct implementation of the problem formulation and the functionality of their solution methods.
• The dataset will benefit researchers involved in developing solution methods for the scheduling problem with resources that can be produced and consumed (also called 'cumulative resources', 'reservoirs', or 'storage resources').

Objective
The dataset has been generated to evaluate the performance of solution methods for the resource-constrained project scheduling problem with cumulative resources (RCPSP/c) presented by [1] . As a supplement to [1] , this data article provides details about (i) the naming of instance and solution files concerning different modeling variants, (ii) the format of the instance and solution files, (iii) the instance generation procedure using an illustrative example, and (iv) the problem formulation concerning different modeling variants. The objective of this data article is to support researchers in accessing and reusing the generated dataset. For example, following the description of how to retrieve the parameters of the RCPSP/c from the provided CSV files, researchers can implement their own parser for reading the instance data.

Data Description
The dataset consists of three archive files and one Excel file. The archive file 'Instances.zip' contains instances of the resource-constrained project scheduling problem with cumulative resources (RCPSP/c) presented in [1]. The archive file 'Solutions.zip' contains the best found solutions for the instances in 'Instances.zip'. The Excel file 'Overview.xlsx' provides an overview of instance characteristics and objective values of the best found solutions. The archive file 'SolutionCheck.zip' contains the source code of a Java program for checking the feasibility of solutions. The dataset can be downloaded from [2].
The RCPSP/c is an extension of the well-known resource-constrained project scheduling problem (RCPSP) (cf. [10]). It aims to include diverging material flows, typically occurring in large-scale dismantling projects (e.g., nuclear dismantling projects), into the RCPSP. These material flows can impose delays on the project schedule due to limited processing and storage capacities. More formally, the RCPSP/c simultaneously deals with (i) scheduling a project using a set of (project) activities and (ii) scheduling the processing of material flows using a set of operations.
The conceptual problem formulation is provided in Appendix A. The RCPSP/c is computationally challenging due to the cumulative resource type required for modeling the limited storage capacities. In contrast to the renewable resource type considered in the RCPSP, the availability of cumulative resources depends on all previous requirements. Due to its application-oriented formulation, the RCPSP/c can be considered a particular case of the general class of scheduling problems with resources that can be produced and consumed. Thus, the presented dataset also serves as a benchmark dataset for this general problem class, which has been introduced and discussed in several publications, such as [3–9]. However, until now, no consistent convention has been established for the verbal and formal formulation of such problems and, to the best of our knowledge, no general test instances are publicly accessible so far. For example, [3–5] use the term 'cumulative resource', [6] refers to 'reservoirs', [7,8] speak of 'consumption and production of resources', and [9] refers to 'storage resources'.
For modeling the processing of material flows, the term 'operation' has been introduced in [1] . However, there are different modeling variants depending on the type of operation (cf. [1] , Section 7): Each processing step of each material unit can be modeled as a single operation. This type is called granular operations (short: 'gra'). Alternatively, processing several material units can be modeled together as one operation, in which case we speak of aggregated operations (short: 'agg'). The type of operation has an impact on the problem formulation, as can be seen in Appendix A .
Each instance is recorded as a separate file within 'Instances.zip'. Instances differ depending on the type of operation. Thus, instance files are named with '[number]_{gra, agg}.csv', where [number] ranges from 1 to 96. In total, the 'Instances.zip' archive file contains 192 instance files. Since instances with aggregated operations are derived from instances with granular operations (cf. Section 3), each instance pair represents the same problem setting. For example, instances '1_gra.csv' and '1_agg.csv' represent a project with 30 activities, 350 released material units, the same processing and storage capacities, and so on. They only differ in the way operations were generated. On average, the number of operations in an instance with aggregated operations is 93% lower than in the corresponding instance with granular operations. The Excel file 'Overview.xlsx' provides all details about the instances. Each line in this file refers to one of the 96 instance pairs. Column B indicates the PSPLIB instance (cf. [11]) that has been used as the starting point for generating the instance pair in this dataset. Columns C to K indicate the levels of the parameters used for the instance generation (cf. [1], Section 8.1). Columns L to N provide additional characteristics of the generated instances.
The comma-separated value (CSV) format has been chosen for recording the instance files, with the semicolon serving as the delimiter. Table 1 describes the structure of each instance file. The first column in Table 1 indicates the number of consecutive lines to be traversed so that problem parameters listed in the second and third column can be retrieved. Each file also contains header lines, as indicated in Table 1 . These headers are for comprehension purposes only, i.e., they can be skipped by a parser.
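As a starting point for such a parser, the following minimal Python sketch reads semicolon-delimited text and consumes one block of consecutive lines after skipping a header line. The helper names `read_rows` and `take_block` and the sample content are purely illustrative assumptions; the actual sequence of blocks and their lengths must be taken from Table 1.

```python
import csv
import io


def read_rows(text):
    """Split semicolon-delimited text into rows of string fields, dropping empty lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=";") if row]


def take_block(rows, position, count):
    """Skip one header row at `position`, then return the next `count` rows
    together with the position of the row that follows the block."""
    first = position + 1
    return rows[first:first + count], first + count


# Hypothetical content for illustration only (not the layout of Table 1).
sample = "activity;duration\n0;0\n1;4\nnext header\n"
rows = read_rows(sample)
block, position = take_block(rows, 0, 2)
print(block)     # rows of the first block as lists of strings
print(position)  # index of the next header row
```

When reading from disk, the same reader can be applied to a file opened with `open(path, newline="")`, as recommended for Python's `csv` module.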
For positive cumulative resource requirements, an activity or operation 'replenishes' material units into a cumulative resource. For negative cumulative resource requirements, an operation 'depletes' material units from a cumulative resource. Activities may not deplete by definition since we deal with diverging material flows. We use parameter $f_{ik}$ here instead of $f_{iw}$ in [1], where $w$ is the index of a material flow path in the set of material flow paths $W$. The parameters $f_{ik}$ result from converting the data structure used in [1] by setting $f_{ik} := \sum_{w \in W \mid k_1(w) = k} f_{iw}$ for all activities $i = 0, \ldots, I+1$ and all cumulative resources $k \in R^\gamma$, where $k_1(w)$ denotes the first cumulative resource in material flow path $w \in W$. That is, we sum over all material units following a material flow path starting with $k$. This conversion helps us to simplify the problem formulation (cf. Appendix A) and the structure of the instances since we no longer have to deal with material flow paths.
There are different modeling variants for cumulative resources depending on the type of work progress involved in the problem setting (cf. [1], Section 7): Work can either progress in a stepwise (short: 'step') fashion, which results in cumulative resource requirements occurring at the start and end times of activities or operations. Such models were investigated by [3,5,7], for example. Alternatively, work can progress linearly (short: 'lin') with time, which results in a uniform distribution of the cumulative resource requirements over the execution time of activities or operations. This modeling variant has been introduced by [4]. The type of work progress has an impact on the problem formulation, as can be seen in Appendix A.
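The conversion from path-based to resource-based release quantities can be illustrated with a small Python sketch. The function name `aggregate_releases`, the dictionary-based data layout, and the example values are assumptions made for illustration; they are not taken from the dataset or from [1].

```python
from collections import defaultdict


def aggregate_releases(f_iw, first_resource):
    """Sum path-based releases f_iw over all paths w whose first cumulative
    resource k1(w) equals k, yielding resource-based releases f_ik.

    f_iw:           dict mapping (activity i, path w) -> released units
    first_resource: dict mapping path w -> first cumulative resource k1(w)
    """
    f_ik = defaultdict(int)
    for (i, w), units in f_iw.items():
        f_ik[(i, first_resource[w])] += units
    return dict(f_ik)


# Hypothetical example: activity 1 releases units along three paths,
# two of which start in the same cumulative resource "k1".
f_iw = {(1, "w1"): 3, (1, "w2"): 2, (1, "w3"): 4, (2, "w1"): 1}
first_resource = {"w1": "k1", "w2": "k1", "w3": "k2"}
f_ik = aggregate_releases(f_iw, first_resource)
print(f_ik)  # {(1, 'k1'): 5, (1, 'k2'): 4, (2, 'k1'): 1}
```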
In its original formulation in [1] , the objective of the RCPSP/c is to minimize the project makespan (short: 'project'). Since the project only consists of activities, the project makespan equals the latest end time of all activities. An alternative type of objective function is to minimize the total makespan (short: 'total'), which equals the latest end time of all activities and operations (cf. [1] , Section 8.4.4).
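The difference between the two objective functions can be made concrete with a short Python sketch. The function names, the dictionary-based schedule representation, and the example values are illustrative assumptions rather than part of the dataset.

```python
def project_makespan(activity_start, activity_duration):
    """Latest end time over all activities."""
    return max(activity_start[i] + activity_duration[i] for i in activity_start)


def total_makespan(activity_start, activity_duration,
                   operation_start, operation_duration):
    """Latest end time over all activities and operations."""
    operation_ends = [operation_start[j] + operation_duration[j]
                      for j in operation_start]
    return max([project_makespan(activity_start, activity_duration)]
               + operation_ends)


# Hypothetical schedule: the last activity ends at 7, but material
# processing continues until 11.
activity_start = {0: 0, 1: 0, 2: 4}
activity_duration = {0: 0, 1: 4, 2: 3}
operation_start = {"a": 5, "b": 8}
operation_duration = {"a": 2, "b": 3}

print(project_makespan(activity_start, activity_duration))           # 7
print(total_makespan(activity_start, activity_duration,
                     operation_start, operation_duration))           # 11
```

By construction, the total makespan is never smaller than the project makespan of the same schedule.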
Each best found solution is recorded as a separate file within 'Solutions.zip'. Solutions differ depending on the type of operation, the type of work progress, and the type of objective function. Thus, solution files are named with '[number]_{gra, agg}_{lin, step}_{project, total}_solution.csv', where [number] ranges from 1 to 96. In total, the 'Solutions.zip' archive file contains 768 solution files.
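The naming scheme can be enumerated programmatically, for example to iterate over all modeling variants in a benchmark script. The following Python sketch is one possible way to do this; the function names are illustrative assumptions.

```python
from itertools import product

OPERATION_TYPES = ("gra", "agg")
WORK_PROGRESS_TYPES = ("lin", "step")
OBJECTIVE_TYPES = ("project", "total")


def solution_file_names(pairs=96):
    """List all solution file names following the naming scheme."""
    return [
        f"{number}_{op}_{wp}_{obj}_solution.csv"
        for number in range(1, pairs + 1)
        for op, wp, obj in product(OPERATION_TYPES, WORK_PROGRESS_TYPES,
                                   OBJECTIVE_TYPES)
    ]


def instance_file_for(solution_name):
    """Derive the matching instance file name from a solution file name."""
    number, operation_type = solution_name.split("_")[:2]
    return f"{number}_{operation_type}.csv"


names = solution_file_names()
print(len(names))                                           # 768
print(names[0])                                             # 1_gra_lin_project_solution.csv
print(instance_file_for("12_agg_step_total_solution.csv"))  # 12_agg.csv
```

The count follows from 96 instance pairs times 2 types of operation, 2 types of work progress, and 2 types of objective function.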
The CSV format has been chosen for recording the solution files, with the semicolon serving as the delimiter. Analogous to Table 1 , Table 2 describes the structure of each solution file.
The Excel file 'Overview.xlsx' lists the objective values of all best found solutions in columns O to W. In addition, it provides two lower bounds, $LB^{PSPLIB}$ and $LB^{flow}$ (cf. [1], Section 8.3), in columns X to AB. These bounds differ depending on the type of objective function.
To check whether a solution is feasible for a specific instance, we provide a Java program named 'SolutionCheck' as a part of the dataset. It enables researchers to check the feasibility of their self-computed solutions. This helps them verify the correct implementation of the problem formulation and the functionality of their solution methods.
The Java program was written with the Eclipse IDE and the Eclipse project was exported as an archive file 'SolutionCheck.zip'. It can be reimported into Eclipse by right-clicking within the package explorer and selecting 'Existing projects into Workspace.' Other IDEs offer similar wizards for importing source code.
The program consists of four classes split into three packages. No graphical user interface is provided. All settings must be specified in the source code of the class 'MainClass.java' in package 'main'. Here, the pathnames of the instance and the solution to be checked must be entered. Furthermore, the type of operation ('gra', 'agg') and the type of work progress ('lin', 'step') must be chosen correctly. Inline comments explain all necessary adaptations in the source code. After running, the program prints either 'Solution is feasible' or information about the first violated constraint to the console. Please note that the instance and solution files must match each other. The solution file must be recorded according to the same structure as the files provided in 'Solutions.zip' (cf. Table 2). Otherwise, exceptions might occur.

Table 3 (fragment, parameter descriptions only): Number of released material units by non-fictitious activities; Portion of non-fictitious activities releasing material units; Renewable resource factor for processing steps; Duration vector for processing steps.

Experimental Design, Materials and Methods
The procedure for generating instances with granular operations is described in [1], Section 8.1. As explained there, we employ two parameter groups for characterizing the instances: project parameters and material flow parameters. Table 3 summarizes these parameters along with the levels we used for the instance generation. We get $2^5 \cdot 3 = 96$ combinations of all levels according to a full factorial experimental design. For each combination, we generated one instance with granular operations, which is why our instance dataset comprises 96 instances of this type (and, together with their aggregated counterparts, 96 instance pairs).
Since we use PSPLIB instances as the starting point for generating our instances, the possible levels of project parameter $I$ are restricted to the levels prescribed by the PSPLIB (i.e., $\{30, 60, 90, 120\}$; cf. [11]). Of these, we chose the smallest and the largest value. We fixed project parameters $NC$, $RS$, and $RF^A$ to one single level since they do not considerably impact the performance of solution methods (cf. [1], Appendix A2).
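The size of the full factorial design can be verified with a few lines of Python. The sketch below only uses the numbers of levels (five parameters with two levels and one parameter with three levels); the level indices are placeholders, since the concrete levels are listed in Table 3.

```python
from itertools import product

# Number of levels per varied parameter: five two-level parameters and
# one three-level parameter (placeholder indices, not the actual levels).
level_counts = (2, 2, 2, 2, 2, 3)

combinations = list(product(*(range(n) for n in level_counts)))
instance_pairs = len(combinations)
instances = instance_pairs * 2  # one granular and one aggregated instance per pair

print(instance_pairs, instances)  # 96 192
```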
For each material flow parameter, we limited ourselves to two or three levels. The reason for this is that with the given modeling variants (type of operation, type of work progress, and type of objective function) and with different solution methods, the number of required solving runs is a multiple of the number of instances. We chose 200 as the largest value for NREL, so that with $P^{REL} = 1$ and $I = 120$, a total of 24,000 material units are released. For comparison: for the dismantling of reactor 2 of the Philippsburg nuclear power plant in Germany, it is stated that 15,590 tons of radioactive residues are released [12]. If we assume that one ton is modeled as one material unit, our largest instance with 24,000 material units is comparable to a large nuclear dismantling project. Another value of 50 exists for NREL, which allows for generating smaller instances. We chose 200 as the smallest value for INV, which means that with NREL = 200, just the material released by one activity fits into a storage facility. With INV = 1000, more storage capacity is available. For $P^{REL}$, we selected the values 0.25 and 1, thus creating instances in which only some of the activities release materials, as well as instances in which all activities release materials. For $RF^P$, we chose the values 0 and 0.5. In the case of $RF^P = 0$, there do not […]

[…]ily link activities if there is a single source node (= fictitious start activity 0) and a single sink node (= fictitious end activity $I+1$). However, no cycles of precedence relations are allowed since this would cause infeasibility. Using the instance generation procedure defined in [1], flow-induced precedence relations $E^{flow}$ and release relations $E^{rel}$ form out-trees in the precedence network for each activity releasing material flows. In the case of granular operations, these out-trees only branch at their root, i.e., at the activity releasing the material flows. Fig. 1 (also shown in [1]) depicts an example of a precedence network of an instance with granular operations. Arc weights equal the minimum time lags or, in the case of precedence relations, the durations of the predecessors.
Instances with aggregated operations are derived from instances with granular operations: We replace several granular operations following the same upstream material flow path and modeling the same processing step by one aggregated operation. This aggregated operation spans the total duration and cumulative resource requirement of the replaced granular operations (cf. [1], Section 7). Renewable resource requirements remain unchanged. […] periods. Note that the resource requirements are not included in the figures. When replacing granular operations with an aggregated operation, we set the minimum time lag between the predecessor and the aggregated operation in such a way that if we split the aggregated operation back into granular operations, all of the original time lags would be satisfied. Continuing the example from above, Fig. 1 indicates that $d^{min}$ […]

Using the formulations provided in Appendix A, all other constraints are also always satisfied if, for a given solution, we split aggregated operations back into their underlying granular operations. Hence, the optimal objective value of the problem with granular operations constitutes a lower bound for the objective value of the respective problem with aggregated operations (if the type of work progress and the type of objective function remain unchanged). For example, the objective value of an optimal solution for instance '1_gra.csv' is a lower bound for the objective value of any solution for instance '1_agg.csv'.
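The aggregation of durations and cumulative resource requirements can be sketched in Python as follows. This is a simplified illustration that reads 'total duration' and 'total cumulative resource requirement' as sums over the replaced granular operations; the function name `aggregate`, the data layout, and the example values are assumptions, and the derivation of minimum time lags is not covered.

```python
def aggregate(granular_operations):
    """Combine granular operations of the same processing step into one
    aggregated operation by summing durations and cumulative requirements.

    Each operation is a dict with
      'duration':    int
      'requirement': dict mapping cumulative resource -> units (+/-)
    """
    duration = sum(op["duration"] for op in granular_operations)
    requirement = {}
    for op in granular_operations:
        for resource, units in op["requirement"].items():
            requirement[resource] = requirement.get(resource, 0) + units
    return {"duration": duration, "requirement": requirement}


# Hypothetical example: three material units pass the same processing step,
# each taking 2 periods and moving one unit from storage "k1" to storage "k2".
granular = [{"duration": 2, "requirement": {"k1": -1, "k2": 1}} for _ in range(3)]
aggregated = aggregate(granular)
print(aggregated)  # {'duration': 6, 'requirement': {'k1': -3, 'k2': 3}}
```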
For computing the best known solutions provided in 'Solutions.zip', we employed the following solution methods:
For instances with granular operations:
• The problem-specific schedule generation scheme (SGS) presented in [1], Section 6.
For instances with aggregated operations:
• An adaptation of the problem-specific SGS presented in [1], Section 6, to instances with aggregated operations.
• A variant of the problem-specific SGS presented in [1], Section 6, where activities and operations are scheduled in an integrated way. That is, the decomposition into the two procedures SuperSchedule and SubSchedule in [1] has been removed. Instead, activities and operations are scheduled equally in SuperSchedule. We only implemented this variant for the case of aggregated operations because the number of operations is significantly smaller here than in the case of granular operations.
• A generic SGS following the idea of [13] (cf. [1], Appendix A6).
We implemented all these solution methods in Java without interfacing with external libraries. For each instance and each type of operation, type of work progress, and type of objective function, we ran each suitable solution method using a multi-start metaheuristic as described in [1], Section 8.2, with a time limit of ten minutes. We ran all computations on an AMD Ryzen 9 (4.0 gigahertz, 12 cores) with 128 gigabytes of RAM.

Ethics Statements
The authors declare that their work complies with the ethical requirements for publication in Data in Brief. They confirm that their work does not involve human subjects, animal experiments, or any data collected from social media platforms.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Instance dataset for resource-constrained project scheduling with diverging material flows (Original data) (Mendeley Data).

Appendix A. Problem Formulation
In this appendix, the mathematical formulation of the underlying RCPSP/c is provided. Subsection A.1 deals with the notation and definitions that apply to all modeling variants. Subsection A.2 presents definitions that specifically depend on the chosen type of operation and type of work progress. Based on the two previous subsections, Subsection A.3 provides the mathematical formulation of the RCPSP/c.

A.1. Notation and general definitions
Table 4 presents the notation of problem parameters. Table 5 presents the notation of decision variables. Table 6 presents definitions of decision-dependent functions that apply to all modeling variants.

Table 4
Notation of problem parameters.

| Notation | Denotation |
|---|---|
| […] | […] |
| $E^{flow}$ | set of flow-induced precedence relations |
| $(j, j') \in E^{flow}$ | flow-induced precedence relation between operations $j$ (predecessor) and $j'$ (successor) |
| $d^{min}_{jj'} \in \mathbb{Z}_{\geq 0}$ | minimum time lag between the start times of operation $j$ and […] |
| […] | release relation between activity $i$ (predecessor) and operation $j$ (successor) |
| $d^{min}$ […] | minimum time lag between the start times of activity $i$ and operation […] |
| […] | number of units of renewable resource $k \in R^\alpha$ required by activity $i$ (= renewable resource requirement) |
| $r^\alpha_{jk} \in \mathbb{Z}_{\geq 0}$ | number of units of renewable resource $k \in R^\alpha$ required by operation $j$ […] |
| […] | number of units of cumulative resource $k \in R^\gamma$ required by activity $i$ (= cumulative resource requirement) |
| $r^\gamma_{jk} \in \mathbb{Z}$ | number of units of cumulative resource $k \in R^\gamma$ required by operation $j$ (= cumulative resource requirement) |

Table 5
Notation of decision variables.

Table 6
Definitions of decision-dependent functions.

| Function and definition | Denotation |
|---|---|
| […] | set of activities requiring renewable resources at time $t$, given schedule $S$ |
| […] | set of operations requiring renewable resources at time $t$, […] |
| […] | set of activities requiring cumulative resources at time $t$, given schedule $S$ |
| $x_i(S, t)$ := […] | portion of activity $i$ that has been completed at time $t$, given schedule $S$ |
| […] | portion of operation $j$ that has been completed at time $t$, given schedule $S^o$ |

A.2.1. Definitions in the case of granular operations and linear work progress
In addition to the general definitions in Table 6, Table 7 presents definitions of decision-dependent functions in the case of granular operations and linear work progress.

Table 7
Definitions of decision-dependent functions in the case of granular operations and linear work progress.

| Function and definition | Denotation |
|---|---|
| $f^\gamma_k(S, t)$ := […] | total requirement of cumulative resource $k \in R^\gamma$ by activities at time $t$, given schedule $S$ |
| […] | […] |
| $r^\gamma_k(S^o, t)$ := […] $r^\gamma_{jk}$ | total requirement of cumulative resource $k \in R^\gamma$ by operations at time $t$, given schedule $S^o$ |

In the case of granular operations, each operation models the application of one processing step to one material unit. Consequently, $r^\gamma_{jk} \in \{-1, 0, 1\}$ holds for all operations $j$ and all cumulative resources $k \in R^\gamma$. An operation depletes one material unit from a cumulative resource at its start, processes it, and replenishes this material unit into another cumulative resource at its end. This notion is reflected in the definitions of the functions $A$ […] in Table 7. It is also possible that an operation depletes from and replenishes into the same cumulative resource. For such an operation $j$, $r^\gamma_{jk} = 0$ holds for all cumulative resources $k \in R^\gamma$ (cf. paragraph about 'neutral operations' in [1], Section 5).
Activities release material units uniformly distributed over their execution time. This notion is reflected in the definition of $f^\gamma_k(S, t)$ in Table 7, where the cumulative resource requirement $f_{ik}$ is multiplied with $x_i(S, t)$ (= the portion of activity $i$ that has been completed at time $t$; see also [4]). Since we assume an indivisibility of the material units, the floor function $\lfloor \cdot \rfloor$ is included in this definition. It also simplifies the implementation because it prevents decimal numbers when dealing with the cumulative resource profile.
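The effect of the floor function under linear work progress can be illustrated with a minimal Python sketch. The definition of the completed portion used here (time elapsed since the start divided by the duration, clamped to the interval from 0 to 1) is an assumption made for illustration and should be checked against Table 6; the function names and example values are hypothetical as well.

```python
from fractions import Fraction
from math import floor


def completed_portion(start, duration, t):
    """Assumed completed portion of an activity at time t, clamped to [0, 1]."""
    if duration == 0:
        return Fraction(1) if t >= start else Fraction(0)
    return min(Fraction(1), max(Fraction(0), Fraction(t - start, duration)))


def released_units(f_ik, start, duration, t):
    """Whole material units released into a cumulative resource up to time t."""
    return floor(f_ik * completed_portion(start, duration, t))


# Hypothetical activity: starts at 0, lasts 4 periods, releases 10 units.
profile = [released_units(10, 0, 4, t) for t in range(0, 7)]
print(profile)  # [0, 2, 5, 7, 10, 10, 10]
```

Exact fractions are used instead of floating-point numbers so that the rounding is not affected by representation errors.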

A.2.2. Definitions in the case of granular operations and stepwise work progress
In addition to the general definitions in Table 6, Table 8 presents definitions of decision-dependent functions in the case of granular operations and stepwise work progress.

Table 8
Definitions of decision-dependent functions in the case of granular operations and stepwise work progress.

| Function and definition | Denotation |
|---|---|
| $f^\gamma_k(S, t)$ := […] $f_{ik}$ | total requirement of cumulative resource $k \in R^\gamma$ by activities at time $t$, given schedule $S$ |
| […] | […] |

[…] Since $f^\gamma_k(S, t)$ in Table 7 is always lower than or equal to $f^\gamma_k(S, t)$ in Table 8, the optimal objective value of the problem with linear work progress constitutes a lower bound for the objective value of the respective problem with stepwise work progress.

A.2.3. Definitions in the case of aggregated operations and linear work progress
In addition to the general definitions in Table 6, Table 9 presents definitions of decision-dependent functions in the case of aggregated operations and linear work progress.

Table 9
Definitions of decision-dependent functions in the case of aggregated operations and linear work progress.

| Function and definition | Denotation |
|---|---|
| $f^\gamma_k(S, t)$ := […] | total requirement of cumulative resource $k \in R^\gamma$ by activities at time $t$, given schedule $S$ |
| […] | […] |
| $A^{\gamma-}_k(S^o, t)$ := […] | set of depleting operations requiring cumulative resources at time $t$, given schedule $S^o$ |
| $r^\gamma_k(S^o, t)$ := […] | total requirement of cumulative resource $k \in R^\gamma$ by operations at time $t$, given schedule $S^o$ |

In the case of aggregated operations, each operation models the application of one processing step to an arbitrary number of material units. Consequently, $r^\gamma_{jk} \in \mathbb{Z}$ holds for all operations $j$ and all cumulative resources $k \in R^\gamma$. Similar to activities, these cumulative resource requirements are uniformly distributed over the execution times of operations. This is why the definition of $r^\gamma_k(S^o, t)$ […] Table 9. For depletions ($r^\gamma_{jk} < 0$), we use the ceiling function $\lceil \cdot \rceil$ instead of the floor function. The definitions of the sets $A^{\gamma+}_k(S^o, t)$ and $A^{\gamma-}_k(S^o, t)$ also differ from the definitions given before. Both sets now include all operations that have started before a time $t$.
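The different rounding for replenishing and depleting aggregated operations can be illustrated as follows. The sketch assumes that the completed portion of an operation is given as an exact fraction; the function name `operation_requirement` and the example values are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def operation_requirement(r_jk, portion):
    """Cumulative resource requirement of one aggregated operation that has
    been completed to the given portion, under linear work progress.

    Replenishments (r_jk >= 0) are rounded down, depletions (r_jk < 0) are
    rounded up, so both are rounded towards zero.
    """
    share = r_jk * portion
    return floor(share) if r_jk >= 0 else ceil(share)


half = Fraction(1, 2)
print(operation_requirement(5, half))          # 2
print(operation_requirement(-5, half))         # -2
print(operation_requirement(-5, Fraction(1)))  # -5
```

Under this reading, a material unit that is only partially processed still counts towards the inventory of the cumulative resource it is depleted from and is not yet counted in the resource it is replenished into.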

A.2.4. Definitions in the case of aggregated operations and stepwise work progress
In addition to the general definitions in Table 6, Table 10 presents definitions of decision-dependent functions in the case of aggregated operations and stepwise work progress.

Table 10
Definitions of decision-dependent functions in the case of aggregated operations and stepwise work progress.

[…] Table 10. Since $f^\gamma_k(S, t)$ and $r^\gamma_k(S^o, t)$ in Table 9 are always lower than or equal to $f^\gamma_k(S, t)$ and $r^\gamma_k(S^o, t)$ in Table 10, the optimal objective value of the problem with linear work progress constitutes a lower bound for the objective value of the respective problem with stepwise work progress.

A.3. Mathematical Formulation
Based on the notation and definitions above, we mathematically formulate the RCPSP/c as

[…] (1a)–(1i)

with $z$ depending on the type of objective function: $z = S_{I+1}$ in the case of project makespan minimization, and $z =$ […] in the case of total makespan minimization.
Constraints (1b) are the precedence constraints. Constraints (1c) are the flow-induced precedence constraints. Constraints (1d) are the release constraints. Note that for instances with granular operations, $d^{min}_{jj'} = d_j$ holds. This is why the formulation of (1c) in [1], Section 5, differs from the formulation here. Constraints (1e) are the renewable resource constraints. Constraints (1f) are the cumulative resource constraints. Constraints (1g), (1h), and (1i) define the domains of the decision variables.
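The temporal constraints (1b) to (1d) can be checked with a simple routine if all of them are represented as minimum time lags between start times, with the predecessor's duration serving as the lag for ordinary precedence relations. The following Python sketch illustrates this idea under that assumption. It is not the code of 'SolutionCheck', it does not cover the resource constraints (1e) and (1f), and all names and values are hypothetical.

```python
def first_violated_lag(start, lags):
    """Return the first relation (pred, succ, d_min) whose minimum time lag
    is violated by the given start times, or None if all lags are satisfied.

    start: dict mapping activity/operation -> start time
    lags:  iterable of (predecessor, successor, minimum time lag)
    """
    for pred, succ, d_min in lags:
        if start[succ] < start[pred] + d_min:
            return (pred, succ, d_min)
    return None


# Hypothetical example: activity "1" precedes activity "2" (lag = duration 4)
# and releases material processed by operation "j1" (minimum time lag 2).
lags = [("0", "1", 0), ("1", "2", 4), ("1", "j1", 2)]

feasible = {"0": 0, "1": 0, "2": 4, "j1": 2}
infeasible = {"0": 0, "1": 0, "2": 4, "j1": 1}

print(first_violated_lag(feasible, lags))    # None
print(first_violated_lag(infeasible, lags))  # ('1', 'j1', 2)
```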