Uncertainty quantification with risk measures in production planning

This paper presents a simulation study for a stochastic production network model in which the capacities of machines may change randomly. We introduce performance measures motivated by risk measures from finance, leading to a simulation-based optimization framework for production planning. The same measures are used to investigate the scenario where capacities correspond to workers who are randomly unavailable. This amounts to studying a workforce planning problem in an uncertain environment.


Introduction
Uncertainty quantification is currently an active research topic with a wide range of applications. In this work, we focus on the numerical evaluation of performance measures for a production network model whose dynamics is stochastic in the sense that machine failures or capacity drops can occur randomly. According to [21], this scenario fits into the context of parametric variability, which is one possible source of uncertainty. In order to measure the performance of a production with respect to different production plans, risk-adjusted performance measures can be used. They are formally given as a mapping from the expected outcome (e.g. profit) and its corresponding risk to some value, see [12,26]. Hence, measuring the performance of a stochastic production model is related to the quantification of risk or uncertainty, since the risk here is caused by equipment failure, which is one component of business risk.
Having tailored performance measures at hand, production planning considerations and questions of control can be studied. For example, production planning and control of continuous production models have been studied in [1,25], and production planning of discrete production models with randomness in [20]. Obviously, there is high interest in the definition of performance measures that support planning decisions and also allow for optimization purposes, see [23,24]. However, we remark that the model under consideration is only academic so far. If production data is available, continuous production

The deterministic model
We briefly recall the production network model from [6,13] and its stochastic extension to a load-dependent model from [16]. To focus on the main ideas, we first restrict to the case of a production network consisting of one queue-processor unit. This means we consider a processor with an unbounded queue in front of it, which represents the storage. We assume that the processor is defined on an interval (a, b) ⊂ R, i.e., with length L = b - a, and use ρ(x, t) as the density of goods at x ∈ (a, b) and time t ≥ 0. Here, x can be interpreted as the spatial position of the goods within the processor or as the so-called degree of completion. The dynamics of the density is given by the hyperbolic partial differential equation

∂_t ρ(x, t) + ∂_x min{μ, v ρ(x, t)} = 0,   (1)

where μ ≥ 0 is the maximal capacity and v > 0 the production velocity. If we prescribe a processing time T^prod, we have the relation v = L / T^prod. The queue is placed in front of the processor, and its length q is modeled by the ordinary differential equation

q'(t) = g^in(t) - g^out(t),

i.e., as the balance between inflow and outflow. Here, g^in(t) is the inflow into the queue, which can be an externally given inflow G^in(t) in case there is no predecessor. If the queue has predecessors, the inflow into the queue is the weighted sum of the outflows of the incoming processors. The outflow of the queue can be described as follows: if the queue is non-empty, the processor works at its maximal capacity μ, and if the queue is empty, the processor takes the inflow into the queue, bounded by the maximal capacity μ. Summarizing, this reads as

g^out(t) = μ if q(t) > 0, and g^out(t) = min{g^in(t), μ} if q(t) = 0.

The coupling of the processor and the corresponding queue is described by the boundary condition ρ(a, t) = g^out(t) / v, and initial conditions ρ(x, 0) = ρ_0(x) ∈ L^1((a, b)), q(0) = q_0 ∈ R_{≥0} are given.
Figure 1 summarizes the modeling equations graphically, where the gray boxes at a represent the queue load following the ODE and the black solid line the density on (a, b) with dynamics described by the hyperbolic PDE.
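To make the queue-processor dynamics concrete, the following sketch discretizes a single unit with a left-sided upwind scheme for the density and forward Euler for the queue length, in the spirit of the scheme used later for the sample paths. All parameter values (μ = 2, v = 1, constant inflow 1.5) are illustrative assumptions, not taken from the experiments of this paper.

```python
import numpy as np

def simulate_unit(mu=2.0, v=1.0, L=1.0, g_in=lambda t: 1.5,
                  T=10.0, dt=0.05, q0=0.0):
    """One queue-processor unit: forward Euler for the queue length q and a
    left-sided upwind scheme for the density with flux f(rho) = min(mu, v*rho).
    All parameter values are illustrative."""
    dx = v * dt                          # CFL number v*dt/dx = 1
    nx = max(int(round(L / dx)), 1)
    rho = np.zeros(nx)                   # empty processor initially
    q, t = q0, 0.0
    while t < T - 1e-12:
        gin = g_in(t)
        # queue outflow: full capacity if non-empty, else inflow capped by mu
        gout = mu if q > 0 else min(gin, mu)
        q = max(q + dt * (gin - gout), 0.0)
        flux = np.minimum(mu, v * rho)
        rho[1:] -= dt / dx * (flux[1:] - flux[:-1])
        rho[0] -= dt / dx * (flux[0] - gout)   # boundary: rho(a,t) = gout/v
        t += dt
    return q, rho

q, rho = simulate_unit()
# with inflow 1.5 below capacity mu = 2, the queue stays empty and the
# density settles at g_in / v = 1.5
```

With the inflow below the capacity, the queue never fills and the density relaxes to g^in / v, which matches the analytical steady state of the unit.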
We now describe the extension to the network case, see [6]. Suppose that equation (1) holds for a density ρ_e on an interval (a_e, b_e) with production velocity v_e > 0 and capacity μ_e ≥ 0 for every arc (or processor) e ∈ A in a directed network G = (V, A), where G describes the production network topology with nodes V and arcs A = {1, . . . , N}. We denote by δ^-_v and δ^+_v the sets of all ingoing and outgoing arcs for every vertex v ∈ V. At vertices without any ingoing arcs, the inflow is given externally, and for every v ∈ V with |δ^+_v| > 0 we assume distribution rates A_{v,e} ∈ [0, 1], e ∈ δ^+_v, prescribing how much of the flow is distributed to the subsequent processors. They satisfy ∑_{e∈δ^+_v} A_{v,e}(t) = 1. An example of a production network is shown in Fig. 2, where, for example, at node 2 the distribution rates are given as A_{2,2} = α_1 and A_{2,3} = 1 - α_1.
Letting s(e) denote the starting node of a given arc e, we adapt the flow into and out of the queue of processor e as follows: the inflow into queue e is

g^in_e(t) = A_{s(e),e}(t) · G^in(t) if δ^-_{s(e)} = ∅, and g^in_e(t) = A_{s(e),e}(t) · ∑_{ẽ ∈ δ^-_{s(e)}} min{μ_ẽ, v_ẽ ρ_ẽ(b_ẽ, t)} otherwise,

while the outflow g^out_e of the queue is defined as in the single-unit case. This deterministic model is well-defined if the initial densities and the inflow functions are of bounded total variation, see [6]. Moreover, we can define a solution if the inflows and initial densities are in L^1 by using an extended solution operator S^μ, i.e., S^μ_{s,t} u is the solution at time t ≥ s starting at time s with initial condition u = (q^1_0, . . . , q^N_0, ρ^1_0, . . . , ρ^N_0) and with capacities μ = (μ_1, . . . , μ_N), see [4,6]. This deterministic flow S^μ determines the deterministic evolution between the machine failures or capacity drops. As already mentioned in the introduction, due to the missing data, we focus on the mathematical description of the stochastic model in the following section.

Stochastic extension: load-dependent capacities
In the following, we introduce the stochastic production network model from [16]. We assume that the capacities μ_e can take finitely many non-negative values μ_e(i) for i ∈ {1, . . . , C_e}, C_e ∈ N, and we introduce the variable r(t) = (r_1(t), . . . , r_N(t)) ∈ {1, . . . , C_1} × · · · × {1, . . . , C_N} determining the capacities used in the production network at time t. Combining all ingredients leads to the deterministic dynamics

Φ_{s,t}(r, q, ρ) = (r, S^{μ(r)}_{s,t}(q, ρ)),

where μ(r) = (μ_1(r_1), . . . , μ_N(r_N)). To incorporate random capacity drops in the production network, we follow the theory of piecewise deterministic Markov processes, see e.g. [7,19]. Since we have the deterministic evolution between the jump times given by Φ, we only need to specify the intensity ψ at which jumps occur and the distribution η of the jumps. To do so, we use rate functions λ^e_{ij} describing the rate at which processor e changes its capacity from μ_e(i) to μ_e(j). That means, we assume for all y = (r, q, ρ) ∈ E and B ∈ σ(E)

ψ(t, y) = ∑_{e∈A} ∑_{l ≠ r_e} λ^e_{r_e l}(t, (q_e, ρ_e)),

η(t, y, B) = (1 / ψ(t, y)) ∑_{e∈A} ∑_{l ≠ r_e} λ^e_{r_e l}(t, (q_e, ρ_e)) ε_{(r_1, . . . , r_{e-1}, l, r_{e+1}, . . . , r_N, q, ρ)}(B),

where ε_x is the Dirac measure with unit mass in x and

σ(E) = P({1, . . . , C_1} × · · · × {1, . . . , C_N}) ⊗ B(R^N_{≥0}) ⊗ ⊗_{k∈A} σ(L^1((a_k, b_k)))

is the σ-algebra on E. Here, P denotes the power set, B(R^N_{≥0}) the Borel σ-algebra on R^N_{≥0} and σ(L^1((a_k, b_k))) the Borel σ-algebra generated by the open sets induced by the L^1-norm. If the rate functions are continuous with respect to (t, y) and uniformly bounded, then there exists a stochastic process Y = (Y(t), t ∈ [0, T]) ⊂ E on some probability space (Ω, F, P), which is piecewise deterministic between the jumps and follows the deterministic production network equations, see [16].
To construct sample paths of the stochastic process Y, we use a numerical scheme for the deterministic evolution Φ and a thinning algorithm for the jump times simultaneously. The approximation of the deterministic evolution consists of a left-sided upwind scheme for the densities ρ_e and the forward Euler method for the queue-length evolution q_e. Let T_n ≥ 0 be the time of the nth jump to the value Y_n ∈ E; a thinning algorithm then produces the next jump time T_{n+1} and the post-jump state Y_{n+1}, as shown schematically in Fig. 3.

Figure 3 Thinning algorithm
For the description of the procedure, we assume a uniform bound λ̄ on ψ. Starting from T_n with the value Y_n, we draw an exponentially distributed time ξ_1 with mean λ̄^{-1} and use the deterministic evolution to obtain the value Φ_{T_n, T_n+ξ_1} Y_n at time T_n + ξ_1. With an acceptance-rejection method, we decide whether a jump is accepted, namely with probability

ψ(T_n + ξ_1, Φ_{T_n, T_n+ξ_1} Y_n) / λ̄.

If a jump is accepted, the new state of the system Y_{n+1} is produced by the kernel η. This procedure is repeated until the final time horizon is reached.
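The procedure can be sketched as follows. The callables psi (intensity), phi (deterministic flow) and jump (the kernel η) are hypothetical placeholders for the corresponding model ingredients; the demo at the end uses a toy constant-intensity process.

```python
import random

def thinning(psi, phi, y0, t0, T, lam_bar, jump, rng=random.Random(1)):
    """Thinning sketch for the jump times of a piecewise deterministic
    process. psi(t, y): jump intensity, bounded by lam_bar; phi(s, t, y):
    deterministic flow from time s to time t; jump(t, y): draws the
    post-jump state (the kernel eta). All callables are hypothetical."""
    t, y, path = t0, y0, [(t0, y0)]
    while True:
        xi = rng.expovariate(lam_bar)        # proposal time, mean 1/lam_bar
        if t + xi > T:
            return path                      # final horizon reached
        y_prop = phi(t, t + xi, y)           # deterministic evolution
        t += xi
        # accept the proposed time as a true jump with prob. psi/lam_bar
        if rng.random() < psi(t, y_prop) / lam_bar:
            y = jump(t, y_prop)
            path.append((t, y))
        else:
            y = y_prop                       # rejected: continue evolving

# demo: constant intensity 0.5, trivial flow, jump flips a 0/1 state
path = thinning(lambda t, y: 0.5, lambda s, t, y: y,
                y0=0, t0=0.0, T=1000.0, lam_bar=2.0, jump=lambda t, y: 1 - y)
```

For the constant intensity 0.5 the accepted jumps form a Poisson process, so roughly 500 jumps are expected on [0, 1000].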

Stochastic extension: capacities as clusters
In the following, we study the scenario where the capacity (μ_e(t), t ≥ 0) of processor e consists of N_e ∈ N individual capacities. To be more precise, we assume that μ_e(t) can be written in the form

μ_e(t) = ∑_{i=1}^{N_e} X^e_i(t),   (3)

where (X^e_i(t), t ≥ 0) is the capacity of cluster part i. This is important if the production capacity depends on N_e workers who are randomly unavailable. We assume that every part of the cluster can be on or off, i.e. X^e_i ∈ {0, 1}, which leads to capacity drops of the capacity process μ_e. In the context of individuals, X^e_i(t) represents whether worker i is available for work or not. Figure 4 shows possible realized capacities in the case of the diamond network, cf. Fig. 2, at a time t_1 and a later time t_2. We note that the total capacities are not conserved, i.e. workers are either available or not.
The next question is how to incorporate these ideas into the setting of the model presented in Sect. 2.1. The mathematical idea is to interpret every (X^e_i(t), t ≥ 0) as a continuous time Markov chain (CTMC) on a common probability space (Ω, F, P) and to assume that they are independent of each other. The following Lemma 2.1 provides the main tool to numerically evaluate the workforce planning problem in Sect. 4.2. To simplify the notation, we omit the index e in the following.

Lemma 2.1 Let (X_i(t), t ≥ 0), i = 1, . . . , N, be independent CTMCs on the state space {0, 1}, each with transition rate λ_0 from state 0 to state 1 and rate λ_1 from state 1 to state 0, for λ_0, λ_1 > 0. The stochastic process defined by X(t) = ∑_{i=1}^N X_i(t) is a CTMC with Q-matrix satisfying

q_{j,j+1} = (N - j)λ_0, q_{j,j-1} = jλ_1, q_{jj} = -(N - j)λ_0 - jλ_1,

and q_{jk} = 0 otherwise.

The proof can be found in the Appendix. Note that the proof consists of basically two steps: first proving the Markov property and second using combinatorics to compute the generator Q of the process.
We also have the following remark.

Remark 2.2 Equation (3) shows that X(t) is binomially distributed, since the X_i(t) are independent with common success probability p(t) = P(X_1(t) = 1), i.e. P(X(t) = k) = (N choose k) p(t)^k (1 - p(t))^{N-k}. The steady-state distribution is consequently

π_k = (N choose k) (λ_0 / (λ_0 + λ_1))^k (λ_1 / (λ_0 + λ_1))^{N-k}, k = 0, . . . , N,   (4)

because of p(t) → λ_0 / (λ_0 + λ_1) as t → ∞. The latter allows for a simple availability analysis of the capacities because the parameters λ_0 and λ_1 are known from estimations. Equation (4) describes the long-term probabilities, which give an easy representation of the capacity one can expect in this production unit. The expected capacity is E[X(∞)] = N λ_0 / (λ_0 + λ_1), which is useful in practical applications to identify bottlenecks in the production.
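As a quick numerical check of Lemma 2.1 and Remark 2.2, one can assemble the Q-matrix and verify that the binomial distribution (4) is stationary, i.e. πQ = 0; the parameter values below are illustrative.

```python
import math
import numpy as np

def birth_death_Q(N, lam0, lam1):
    """Q-matrix of Lemma 2.1: q_{j,j+1} = (N-j)*lam0, q_{j,j-1} = j*lam1."""
    Q = np.zeros((N + 1, N + 1))
    for j in range(N + 1):
        if j < N:
            Q[j, j + 1] = (N - j) * lam0
        if j > 0:
            Q[j, j - 1] = j * lam1
        Q[j, j] = -Q[j].sum()          # row sums of a generator vanish
    return Q

N, lam0, lam1 = 10, 0.1, 0.0125        # illustrative rates
Q = birth_death_Q(N, lam0, lam1)
p = lam0 / (lam0 + lam1)
# claimed steady state, equation (4): Binomial(N, lam0/(lam0+lam1))
pi = np.array([math.comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range(N + 1)])
residual = np.max(np.abs(pi @ Q))      # vanishes if (4) is stationary
expected_capacity = pi @ np.arange(N + 1)  # equals N * lam0 / (lam0 + lam1)
```

The residual is zero up to floating-point error, and the mean of the stationary distribution reproduces E[X(∞)] = N λ_0 / (λ_0 + λ_1).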
Since the entries q_ij of the generator Q describe the transition rate from state i to state j for i ≠ j, we have the rate functions

λ^e_{j,j+1} = (N_e - j)λ^e_0, λ^e_{j,j-1} = jλ^e_1, and λ^e_{ij} = 0 otherwise.

This choice embeds the load-independent model into the load-dependent model by the special choice of rate functions λ^e_{ij}(t, (q_e, ρ_e)) independent of t, q_e and ρ_e.

Performance measures
As we have seen, the stochastic production network model introduced above allows for random capacities capturing load-dependent failure rates. Since the model is driven by a stochastic process, there is a need for tailored evaluation tools, so-called performance measures, once we are given a set of sample paths. This section is devoted to classical performance measures for production models and to performance measures based on risk measures from finance. Performance and in particular risk measures are statistical measures that can be used as predictors of investment risk and volatility in many applications. Even though such measures are mostly applied in finance, they can also be used to evaluate production systems.

Classical performance measures
Considering sample paths of the production network at every point in time is too detailed in most cases, and aggregated quantities are of high interest. According to [15,17], the aggregated outflow of the complete network until time t ≥ 0 can be computed as

Out(t) = ∫_0^t ∑_{v∈V_out} ∑_{e∈δ^-_v} min{μ_e(r_e(s)), v_e ρ_e(b_e, s)} ds,

where V_out = {v ∈ V : δ^+_v = ∅} are the nodes without a subsequent processor. We can also define

Q(t) = ∫_0^t ∑_{e∈A} q_e(s) ds

as the cumulative sum of all queue loads up to time t ≥ 0. Both quantities above are real-valued random variables. Classical performance measures are, for example, the expectation and variance of these quantities. They only measure the outflow or the network queue loads and do not include any information on the profit of the company.
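Given Monte Carlo sample paths, these classical measures reduce to sample means and variances of time-integrated quantities. The sketch below uses synthetic uniform and exponential data as a stand-in for simulator output; the arrays and parameter values are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, steps, M = 1.0, 100, 1000
# synthetic stand-ins for M sampled paths of the network outflow and of the
# summed queue loads (illustrative data, not real simulator output)
outflow = rng.uniform(0.5, 1.5, size=(M, steps))
queue_load = rng.exponential(2.0, size=(M, steps))

cum_out = outflow.sum(axis=1) * dt        # Out(T) for each sample path
cum_queue = queue_load.sum(axis=1) * dt   # integrated queue load per path

# classical performance measures: sample mean and variance
mean_out, var_out = cum_out.mean(), cum_out.var(ddof=1)
```

With per-step outflow uniform on [0.5, 1.5], the cumulative outflow over 100 steps concentrates around 100 with variance about 100/12.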

Risk measures in performance measures
If we consider the profit until time t as a random variable Π(t) (a functional of Y), where Y is the stochastic production network model, then we can include monetary aspects. We could also use non-risk-adjusted performance measures, e.g. the expectation of the random variable Π(t), to describe the performance of the production. However, it turns out that they are not the best choice, especially in the context of optimization, see Sect. 4. The reason is that a high expected profit can incorporate a high risk, which we intend to measure in an appropriate manner. One possibility is the quantification of the probability of bankruptcy, i.e. P(Π(t) < 0), which has a different unit (probability) than the profit (monetary unit). One has to keep in mind that this probability does not include any information about the surplus needed to absorb "bad" events. Here, the so-called Value at Risk and Average Value at Risk, see e.g. [2,10,27], which have been introduced in the context of finance and insurance, can help. They belong to the class of monetary risk measures, which we introduce in the following Definition 3.1 taken from [10].

Definition 3.1 Let H be a space of real-valued random variables. A mapping ϱ: H → R is called a monetary risk measure if
1. for every X, X̃ ∈ H with X ≤ X̃, we have ϱ(X) ≥ ϱ(X̃) (Monotonicity),
2. for every X ∈ H, m ∈ R, it holds that ϱ(X + m) = ϱ(X) - m (Cash invariance),
and it is called a coherent monetary risk measure if additionally it holds that
3. for every λ ≥ 0, we have ϱ(λX) = λϱ(X) (Positive homogeneity),
4. for every X, X̃ ∈ H, it follows that ϱ(X + X̃) ≤ ϱ(X) + ϱ(X̃) (Subadditivity).
Property 1 of Definition 3.1 states that, if X is interpreted as the profit, the risk of a less profitable company is higher. If we have a risk-free surplus of m, then the risk is reduced by m, see Property 2. Property 3 induces a normalization, i.e. ϱ(0) = 0, and ensures an easy scaling between different currencies. Property 4 describes how aggregation leads to a reduction of risk.
One very common risk measure is the Value at Risk V@R_λ(X). It is defined as

V@R_λ(X) = inf{m ∈ R : P(X + m < 0) ≤ λ}   (6)

for some level λ ∈ (0, 1) and a real-valued random variable X on some probability space (Ω, F, P); see [10]. The Value at Risk is simply a quantile and can be rewritten as

V@R_λ(X) = -q^+_X(λ)

with the upper quantile function

q^+_X(λ) = inf{x ∈ R : P(X ≤ x) > λ}.

One can show that V@R_λ is a monetary risk measure, which is positively homogeneous but not subadditive. Since the profit of a production network will contain sums of individual costs of machines, we cannot guarantee that the Value at Risk incorporates risk diversification effects. One can easily construct a coherent risk measure from the Value at Risk, namely the Average Value at Risk, defined by

AV@R_λ(X) = (1/λ) ∫_0^λ V@R_γ(X) dγ.

From the computational point of view, we can estimate the Value at Risk (6) from a sample of profit realizations by an estimation of the quantile function, e.g. with the function quantile in Matlab. This can again be used to compute the Average Value at Risk with, e.g., the Matlab function integral.
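The same quantile-based estimation can be sketched outside Matlab. The estimator below sorts the profit sample, reads off the empirical λ-quantile q, sets V@R_λ(X) = -q, and computes the AV@R from the average shortfall below q via the identity AV@R_λ(X) = (1/λ) E[(q - X)^+] - q; the sample 1, …, 100 at the end is a made-up illustration.

```python
import numpy as np

def var_avar(sample, lam=0.1):
    """Empirical Value at Risk and Average Value at Risk of a profit sample
    (sketch of the quantile-based estimation described above)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    k = int(np.ceil(lam * n - 1e-12))    # guard against float round-up
    q = x[max(k, 1) - 1]                 # empirical lambda-quantile of X
    # V@R_lam(X) = -q; AV@R_lam(X) = (1/lam) * E[(q - X)^+] - q
    avar = np.mean(np.maximum(q - x, 0.0)) / lam - q
    return -q, avar

var10, avar10 = var_avar(np.arange(1.0, 101.0), lam=0.1)
# for profits 1,...,100 and lam = 0.1 the 0.1-quantile is 10, so the V@R is
# -10 and the AV@R is the negated mean of the worst ten profits, -5.5
```

Negative values indicate that no surplus is required at the given level: the sample already absorbs the λ-tail of bad outcomes.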
Remark 3.2 Following [10], the AV@R can also be expressed as

AV@R_λ(X) = (1/λ) E[(q - X)^+] - q,

where (z)^+ = max{0, z} and q is a λ-quantile (e.g. q = q^+_X(λ)) of X. For continuously distributed X, this yields an alternative approach from a computational point of view, since solving

(1/λ) inf_{t∈R} (E[(t - X)^+] - λt)

directly leads to both V@R_λ(X), via the minimizer t (which is a λ-quantile), and AV@R_λ(X), via the minimal value.
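This minimization formulation can be checked numerically on a synthetic sample: the minimizer should approximate the λ-quantile (hence the V@R up to sign) and the minimal value the AV@R obtained from the tail average. The normal distribution and all parameters below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=50_000)   # synthetic profit sample
lam = 0.1

# objective of Remark 3.2: (1/lam) * (E[(t - X)^+] - lam * t)
ts = np.linspace(-2.0, 8.0, 1001)
vals = np.array([(np.mean(np.maximum(t - x, 0.0)) - lam * t) / lam
                 for t in ts])
t_star = ts[vals.argmin()]      # minimizer: approx. the lambda-quantile
avar_min = vals.min()           # minimal value: approx. AV@R_lam(X)

q = np.quantile(x, lam)         # direct quantile estimate
avar_tail = -x[x <= q].mean()   # negated average of the worst lam-tail
```

Both routes agree up to Monte Carlo and grid error, and the AV@R always dominates the V@R = -t*, reflecting that it is the more pessimistic measure.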

Computational results
In this section, we numerically investigate the performance measures, their similarities and differences. First, we investigate the optimal routing problem and comment on different combinations of distribution rates in a diamond network with respect to the performance measures. Second, we study the impact of cluster sizes, i.e. available workforce, on the performance measures. The simulation results should help decision makers to understand risks, uncertainties and complexities. Therefore, all results are described and interpreted in detail.

Distribution parameter planning in the load-dependent case
Let the topology of a production network be given as a diamond network, see Fig. 2, where α_1, α_2 ∈ [0, 1] are the two distribution parameters, i.e., a percentage α_1 is fed from processor one into queue two (A_{1,2}(t) = α_1), 1 - α_1 from processor one to queue three (A_{1,3} = 1 - α_1), and analogously for α_2 from processor two to queue five and 1 - α_2 to queue four.
As before in Sect. 3.2, we analyze the profit, but now for different distribution rates α_1 and α_2. To do so, we adapt the profit functional, which is now given as

Π(t) = ∫_0^t ( ∑_{v∈V_out} ∑_{e∈δ^-_v} min{v_e ρ_e(b_e, s), μ_e(r_e(s))} · p(s) - ∑_{e∈A} q_e(s) · C^e_q(s) ) ds.

The price of the product is p(s) ≥ 0, and C^e_q(s) ≥ 0 is the storage cost at time s for storage e ∈ A.
We assume p(s) = 1 and C^e_q(s) = 0.1 for every e = 1, . . . , 7 and s ∈ [0, T]. The queue-processor units all have a production velocity v_e = 1 and a length of one, and we start with an empty system at full capacity. The capacities are given by (ordered by states) μ_1 ∈ {0, 3}, μ_2, μ_3 ∈ {0, 1, 2}, μ_4 ∈ {0, 1}, μ_5 ∈ {0, 2}, μ_6 ∈ {0, 1, 3}, and μ_7 ∈ {0, 2, 3}. To include the load dependency, we follow [16] and define the utilization ratio UR of a processor, with repair rates interpolating between λ_rep,max,e = 10 and λ_rep,min,e = 4 depending on UR, and a breakdown rate λ_down,e = 2. This means we consider two different types of processors with two or three capacity states, identical repair rates but different breakdown rates. We cannot state an explicit stationary expected capacity as in Sect. 4.2, and simulations are needed to determine which combination (α_1, α_2) performs in an "optimal" way. In Fig. 6, we see a strong influence of the distribution parameters on the profit. The vertical black line marks the allocation of (α_1, α_2) that is the best choice for the corresponding evaluation of the profit. In the case of the sample mean, the choice (α_1, α_2) = (0.4, 1) leads to the highest value of 6.2087; see Table 1. In contrast, the sample standard deviation is smallest for the choice (α_1, α_2) = (0.9, 0), with a value of 1.9261. The lowest bankruptcy probability follows from the choice (0.5, 0.7), which is close to the best choices of the Value at Risk (0.6, 0.9) and the Average Value at Risk (0.5, 0.8) in Fig. 5. Regarding the different evaluations of the profit sample, the choice of the combination (α_1, α_2) strongly depends on the evaluation criterion, but there is a tendency to distribute equally at node two and to feed more into processor five than into processor four at node three. Obviously, the bankruptcy probability is a poor measure for optimizing (α_1, α_2) in this example, since the relevant region is very flat and disturbed by the errors from the Monte Carlo simulation.
In sharp contrast, both the Value at Risk and the Average Value at Risk form nicely shaped surfaces and, neglecting the Monte Carlo errors, they appear to be convex as well.

Study of cluster sizes and workforce planning
We analyze the stochastic production network model with two arcs, see Fig. 7, where capacities are given as in Sect. 2.3 and we define the profit Π(t) as the cumulative revenue reduced by storage and cluster-size costs up to time t ≥ 0.
In formulas, this reads as

Π(t) = ∫_0^t ( ∑_{v∈V_out} ∑_{e∈δ^-_v} min{v_e ρ_e(b_e, s), μ_e(r_e(s))} · p(s) - ∑_{e∈A} ( q_e(s) · C^e_q(s) + N_e · C^e_N(s) ) ) ds,

where we denote by p(s) ≥ 0 the price of the product, and C^e_q(s) ≥ 0 and C^e_N(s) ≥ 0 are the storage and cluster-size costs, respectively, at time s. Typical examples of cluster-size costs are maintenance costs or salaries. If ϱ is a monetary risk measure, we can interpret ϱ(Π(T)) as the risk of the company with respect to the aggregated losses and gains up to time T.
In the following, we study an example and consider a production network model in the form of a chain of two processors. The capacities of the processors satisfy μ_e(t) ∈ {0, . . . , N_e} for given cluster sizes N_1, N_2 ∈ N. We further assume that the times between failures and the repair times of each part of a cluster are exponentially distributed with means MTBF_1 = 80, MTBF_2 = 50 and MRT_1 = 10, MRT_2 = 20. This results in the rates λ^e_1 = 1/MTBF_e and λ^e_0 = 1/MRT_e, and means that the expected capacity of the second processor for N_1 = N_2 is lower than that of the first one. We assume a processing velocity v_e = 1 for both processors, a time step Δt = 1 and a time horizon of T = 365. The length of each processor is one, and to satisfy the Courant-Friedrichs-Lewy (CFL) stability condition, we choose a spatial step size of Δx = 1. Given the constant inflow G^1_in(t) = 10 of ten parts per unit time, we start with an empty production. The only missing parameters are the cluster sizes N_1 and N_2, and our goal is to analyze how the profit Π(T) changes for different combinations of cluster sizes. To evaluate the profit, we assume storage costs C^1_q = C^2_q = 0.01, cluster-size costs C^1_N = 4, C^2_N = 6 and a price p = 10.02. The following simulation results are based on M = 10^4 Monte Carlo samples. Figure 8 contains the sampled values of the expected profit (a), the standard deviation (b) and the bankruptcy probability (c).
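The stationary availabilities behind the comparison of the two processors follow directly from Remark 2.2 with repair rate λ_0 = 1/MRT_e and failure rate λ_1 = 1/MTBF_e; a short check with the MTBF/MRT values from the text:

```python
def stationary_availability(mtbf, mrt):
    """Availability of one cluster part in steady state: lam0/(lam0+lam1),
    cf. Remark 2.2, with repair rate lam0 = 1/MRT, failure rate lam1 = 1/MTBF."""
    lam0, lam1 = 1.0 / mrt, 1.0 / mtbf
    return lam0 / (lam0 + lam1)

a1 = stationary_availability(80.0, 10.0)   # processor 1: 8/9 ~ 0.889
a2 = stationary_availability(50.0, 20.0)   # processor 2: 5/7 ~ 0.714
# expected stationary capacities are a1 * N1 and a2 * N2
```

So for equal cluster sizes the second processor indeed offers the lower expected capacity, which makes it the natural bottleneck candidate.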
In this example, the expected profit strongly depends on the cluster sizes, i.e. there are few combinations that lead to a positive expected profit. The highest expected profit is achieved with the choice N 1 = 10 and N 2 = 12 with a profit of 2.3535 · 10 3 , as Table 2 states.
With respect to the standard deviation, the combination N_1 = N_2 = 15 has the lowest value of 0.6733 · 10^3; see Table 2. This is reasonable because in this case the production is less affected by capacity drops due to the large cluster sizes.
The bankruptcy probability in Fig. 8(c) shows a tight region in which the probability of going bankrupt is low. The combination N_1 = 10 and N_2 = 12 leads to a bankruptcy probability of 0.0804; this is the same combination that maximizes the expected profit.
For a level of λ = 0.1, Fig. 9 contains the Value at Risk and the Average Value at Risk. The cluster sizes N_1 = 10 and N_2 = 12 lead to a Value at Risk of -0.2283 · 10^3. This can be interpreted as follows: even with debts of 0.2283 · 10^3, the probability of facing bankruptcy at T = 365 is lower than 0.1. The more pessimistic risk measure, the Average Value at Risk, leads to the best choice N_1 = 8 and N_2 = 10 with an AV@R_0.1(Π(T)) of 0.6255 · 10^3, i.e., we should hold a surplus of 0.6255 · 10^3 for these cluster sizes.
All introduced performance measure evaluations in Table 2 are given for reasonable cluster-size choices. Obviously, the best choice with respect to the standard deviation performs badly in all other performance measures. The allocation N_1 = 10 and N_2 = 12 yields the best solution in terms of expectation, bankruptcy probability and Value at Risk at level 0.1. The Value at Risk and Average Value at Risk seem to work quite well here and give information about the surplus one has to hold to absorb bad events and avoid bankruptcy with high probability (given by 1 - λ). One has to be careful with the interpretation here: the profit is defined as the cumulative difference of earnings and costs and does not provide detailed information at every time between zero and T.

Conclusions
We have analyzed performance measures for a stochastic production network model. It turned out that different performance measures might lead to totally different results, so the choice of an appropriate measure is a crucial point in the development of an optimization framework. In our examples, the measures Value at Risk and Average Value at Risk always led to reasonable results and a suitable curvature over the set of feasible solutions. In particular, from a practical point of view, a major benefit of our approach is a comprehensive numerical analysis of a fully time-dependent stochastic production model that keeps track of random capacity reductions. The combined study with risk measures also allows for a performance evaluation in a simulation environment. So far, the model is limited to homogeneous products but could be extended to multicommodity flows in a straightforward way. Furthermore, the well-known buffer allocation problem [8,22,28] used for the design of manufacturing systems can be analyzed within this framework to find optimal buffer sizes. Therefore, future work will deal with the formulation of rigorous optimization problems for the stochastic production network model with respect to relevant performance measures. Mathematically, a focus will be the study of the quality of solutions and the speed of convergence to an optimum. We also hope to gain more insight into decision making for more realistic flow lines.

Appendix: Proof of Lemma 2.1
Proof The proof consists of two steps. First, we prove the Markov property of (X(t), t ≥ 0), and we compute the Q-matrix in the second part. Clearly, the stochastic process (X(t), t ≥ 0) takes only values in the finite, discrete space E_X := {0, . . . , N}, which together with the σ-algebra E_X = P(E_X) forms a measurable space. We prove the Markov property with Dynkin's criterion [9, Theorem 10.13] and follow the ideas of [3, Problem 4.14]. We are given N CTMCs with state space {0, 1}, and we construct the N-dimensional Markov process (Y(t), t ≥ 0) with Y(t) = (X_1(t), . . . , X_N(t)) on the product space (Ω^N, A^N, P^N). This process takes values in the measurable space (E_Y, E_Y) = ({0, 1}^N, P({0, 1}^N)), and we obtain the process (X(t), t ≥ 0) via the measurable surjective mapping

ψ: E_Y → E_X, ψ(x) = ∑_{i=1}^N x_i, i.e. X(t) = ψ(Y(t)).

Let (U_t, t ≥ 0) be the semigroup of Markovian kernels given by (Y(t), t ≥ 0), i.e.,

U_t(x, B) = P(Y(s + t) ∈ B | Y(s) = x)

for every x ∈ E_Y, s, t ≥ 0 and B ∈ E_Y. If we can show that for every x, x̃ ∈ E_Y with ψ(x) = ψ(x̃) the equality

U_t(x, ψ^{-1}(Γ)) = U_t(x̃, ψ^{-1}(Γ))

holds for every t ≥ 0 and Γ ∈ E_X (Dynkin's criterion), then we know that the stochastic process (X(t), t ≥ 0) is a Markov process. From x ∈ ψ^{-1}({m}), it follows that x_σ ∈ ψ^{-1}({m}) for every m ∈ E_X and every permutation σ because of the structure of ψ. Since the coordinate processes are independent and identically distributed, U_t(x, ψ^{-1}(Γ)) is invariant under permutations of the entries of x. Therefore, choosing a permutation σ such that x_σ = x̃ holds, we can compute

U_t(x, ψ^{-1}(Γ)) = U_t(x_σ, ψ^{-1}(Γ)) = U_t(x̃, ψ^{-1}(Γ)).

Hence, we have the Markov property of (X(t), t ≥ 0) by Dynkin's criterion, and it is a CTMC.

We now calculate the transition probabilities of the stochastic process (X(t), t ≥ 0) to obtain the Q-matrix. First, we know for every i = 1, . . . , N and for every l, m ∈ {0, 1} that the transition probability fulfills

P(X_i(t + Δt) = m | X_i(t) = l) = 1_l(m)(1 - Δt λ_l) + 1_{1-l}(m) Δt λ_l + o(Δt)   (8)

in the limit Δt → 0; see [18, pp. 28]. We choose k, j ∈ {0, . . . , N}, and we set C = {0, 1}^N in the following. Then, we can write

P(X(t + Δt) = k, X(t) = j)
= ∑_{α∈C, |α|=k} ∑_{β∈C, |β|=j} P(X_1(t + Δt) = α_1, . . . , X_N(t + Δt) = α_N, X_1(t) = β_1, . . . , X_N(t) = β_N)
= ∑_{α∈C, |α|=k} ∑_{β∈C, |β|=j} ∏_{i=1}^N P(X_i(t + Δt) = α_i | X_i(t) = β_i) P(X_i(t) = β_i),

using the independence of the stochastic processes and the multi-index notation |α| = ∑_{i=1}^N α_i. At this point, we can use the transition probability (8) and merge all terms of order o(Δt). This yields

P(X(t + Δt) = k, X(t) = j)
= ∑_{α∈C, |α|=k} ∑_{β∈C, |β|=j} [ ∏_{i=1}^N 1_{β_i}(α_i)(1 - Δt λ_{β_i}) ] P(X_1(t) = β_1, . . . , X_N(t) = β_N)   (9)
+ ∑_{α∈C, |α|=k} ∑_{β∈C, |β|=j} [ ∑_{m=1}^N Δt λ_{β_m} 1_{1-β_m}(α_m) ∏_{i≠m} 1_{β_i}(α_i) ] P(X_1(t) = β_1, . . . , X_N(t) = β_N)   (10)
+ o(Δt).

To simplify the computation, we analyze both summands in the last equation separately, and we start with the first one (9). The product of the indicator functions implies α = β, and we put higher orders of Δt into o(Δt) again. By doing this and observing that k has to equal j, we see that (9) equals

1_j(k) (1 - Δt (jλ_1 + (N - j)λ_0)) P(X(t) = j) + o(Δt).

The last equality follows from ∑_{l=1}^N λ_{β_l} = jλ_1 + (N - j)λ_0, since j entries of β are one and all the others are zero. Now, we analyze the second summand (10), which we simplify by merging terms into o(Δt). We distinguish two cases, i.e., the case k = j + 1 and the case k = j - 1, since all remaining cases are impossible. In detail, if, e.g., k = j + 2, then α has two entries more with value one, which implies that α and β differ in at least two entries; hence the product in the second summand is zero. For k = j + 1, the second summand reduces to Δt (N - j)λ_0 P(X(t) = j) + o(Δt), and for k = j - 1 to Δt jλ_1 P(X(t) = j) + o(Δt).

Summarizing all computations yields the transition probability

P(X(t + Δt) = k | X(t) = j) = 1_j(k)(1 - Δt (jλ_1 + (N - j)λ_0)) + 1_{j+1}(k) Δt (N - j)λ_0 + 1_{j-1}(k) Δt jλ_1 + o(Δt)

as Δt → 0, and we conclude that the generator of (X(t), t ≥ 0) is the matrix Q from the statement of this lemma.