Feedback optimal control of dynamic stochastic two-machine flowshop with a finite buffer

Article history: Received 1 May 2010; received in revised form 6 June 2010; accepted 7 June 2010; available online 11 June 2010.
This paper examines the optimization of production involving a tandem two-machine system producing a single part type, with each machine being subject to random breakdowns and repairs. An analytical model is formulated with a view to solving an optimal stochastic production problem for a system whose machines have non-exponential up-downtime distributions. The model is obtained by using a dynamic programming approach and a semi-Markov process. The control problem aims to find the production rates needed by the machines to meet the demand rate, through a minimization of the inventory/shortage cost. Using the Bellman principle, the optimality conditions obtained satisfy the Hamilton-Jacobi-Bellman equation, which depends on time and system states and ultimately leads to a feedback control. Consequently, the new model accommodates a coefficient of variation (CV up/down) of less than one, whereas it is equal to one in the Markov model. Heuristic methods are used to solve the problem, because the analytical model with several states is difficult to handle, and to show which control law should be used in each system state (i.e., Kanban, feedback, or CONWIP control). Numerical methods are used to solve the optimality conditions and to show how a machine should produce. © 2010 Growing Science Ltd. All rights reserved.


Introduction
In this paper, we examine a tandem two-machine system producing a single part with finite-size internal buffers. Each machine is subject to random breakdowns and repairs. The system can have four different states: both machines are down; both machines work simultaneously; the upstream machine fails while the downstream machine is working; and the downstream machine fails while the upstream machine is working. The goal of the study is to formulate a model that minimizes the expected discounted cost of inventory/shortage over a deterministic horizon in order to find the production rates of a stochastic system. This section presents a literature review, the motivation for using the semi-Markov process, and the contribution of this paper.

Literature review
At the decision-making level for the operation of manufacturing systems, one of the most studied configurations is the flow-shop system or transfer line (i.e., a specified number of machines in series). Naturally, each machine is subject to random breakdowns and repairs (making them failure-prone machines). Other states characterising a machine include setup time, changing demands, preventive maintenance, etc. Thus, the number of discrete states of a system grows as the number of machines increases. Indeed, consider a flow-shop system consisting of M machines in which each machine can be in two states (up and down); we therefore have 2^M distinct states, so it is difficult to determine the performance of the system when it is modeled as a discrete-space Markov process with a large state space. In practice, the optimal production planning of stochastic manufacturing lines (i.e., with failure-prone machines) constitutes an extremely difficult problem (see Fong and Zhou (2000)). No exact analytical model has been obtained for long lines. On the other hand, one of the characteristic features of the stochastic dynamics of a flow shop is that the inventory of semi-processed parts in buffers between any two machines, known as internal buffers, must be nonnegative (see Dallery et al. (1989), Sethi et al. (1994)).
Some papers, such as Kimemia and Gershwin (1983), Akella and Kumar (1986), Bielecki and Kumar (1988), Perkins and Srikant (1997), and Shu and Perkins (2001), have covered these features. The first version of the production planning problem, for a single machine producing a single part type with two states (up and down), was studied in Akella and Kumar (1986) and Bielecki and Kumar (1988). Both papers presented an exact solution giving the production rate and a hedging point. The hedging point is a buffer level at which each part type must be produced at a rate equal to its demand rate (u(t) = d(t)). This threshold-type policy can be considered a Just-In-Time (JIT) method for solving the stochastic problem by maintaining an inventory level equal to the hedging point. Perkins and Srikant (1997) and Shu and Perkins (2001) then considered a single-machine system producing multiple part types. They used the decomposition method, in which the multiple-part-type problem is decomposed into a two-part-type problem, as well as a graph technique for a linear switching curve problem.
In the case of a two-machine flowshop, some authors have conducted studies on both deterministic and stochastic problems. Since the optimal production planning of a stochastic manufacturing system is difficult, Sethi et al. (1997) studied a single-part-type system using a hierarchical approach: the idea is to average out the uncertainty in the machines' capacity and replace the more general stochastic problem with a limiting problem. In that paper, they show that feedback control performs better than Kanban control. In the literature, the hierarchical control approach was introduced in Gershwin (2002), Gershwin (1989), and Lehoczky et al. (1991), and is based on the frequency of occurrence of different types of events (also called time-scale control). On the other hand, a deterministic two-machine flowshop problem was studied by Fong and Zhou (2000). Although these authors gave an exact solution whose optimality conditions satisfy the Hamilton-Jacobi-Bellman equation, they did not address the real problem of manufacturing systems, which is stochastic rather than deterministic. Bai and Gershwin (1996) used a heuristic method to obtain suboptimal controls for N-machine, single-part-type systems with the objective of long-term average cost minimization. Presman et al. (2002) and Sethi et al. (2005) studied the N-machine flow shop with the objective of minimizing the average cost.
As stated above, there is so far no exact solution for failure-prone machine systems with long transfer lines. Simulation therefore represents a significant advantage for analysing the performance of the system, as can be seen in Lavoie et al. (2009) and Kenne and Gharbi (2001). Other papers focus on the performance parameters of transfer lines (i.e., production rate and average buffer levels); that is the case in Dallery et al. (1989), Ciprut et al. (2000), Kim and Gershwin (2008, 2005), and Tan and Gershwin (2009), where long lines are decomposed into two-machine lines (flowshop systems) in the case of identical machines. This technique is called the decomposition method; through it, the system becomes simpler and behaves like a buffer or work-in-process inventory between upstream and downstream machines.
In most of the works mentioned above, the machine states are characterized by Markov processes and the demand rate is constant. This framework was developed from the formalism of pioneers such as Rishel (1975) and Davis (1984). Both authors used the Markov chain to formulate a stochastic model in continuous time as a piecewise deterministic system (PDS). Moreover, the Markov framework, with machines having exponential distributions of uptime and downtime, has coefficients of variation (CV up and CV down) equal to 1 and constant breakdown and repair rates. According to the results in Li and Meerkov (2005) and Enginarlar et al. (2005), the average number of parts produced by the last machine (PR) depends mostly on the CV up/down: if the CV up/down decreases below 1, PR increases, and the sensitivity of PR assumes values within the 6% range. Indeed, the CV up/down is less than 1 if the breakdown and repair rates are functions of time, as indicated in Li and Meerkov (2005). That means the machine lifetime must follow a non-exponential distribution, as in Grabsky (2003).
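The statement that exponential up- and downtimes force a coefficient of variation of 1, while other families allow values below 1, can be checked with the standard closed-form expressions. The sketch below is only an illustration: the function names and the shape parameters used in the usage note are hypothetical and are not the parameters used in the paper.

```python
import math

def cv_exponential() -> float:
    """Exponential: standard deviation equals the mean, so CV = 1."""
    return 1.0

def cv_gamma(shape: float) -> float:
    """Gamma with shape k: CV = 1 / sqrt(k), independent of the scale."""
    return 1.0 / math.sqrt(shape)

def cv_weibull(shape: float) -> float:
    """Weibull with shape k: CV = sqrt(G(1 + 2/k) / G(1 + 1/k)^2 - 1)."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / (g1 * g1) - 1.0)

def cv_lognormal(sigma: float) -> float:
    """Log-normal with log-scale std sigma: CV = sqrt(exp(sigma^2) - 1)."""
    return math.sqrt(math.exp(sigma * sigma) - 1.0)
```

With a Weibull shape of 1 the distribution reduces to the exponential and `cv_weibull(1.0)` returns 1; any shape above 1 (for example 2) gives a CV below 1, which is the regime the semi-Markov model targets.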

Motivation for using the Semi-Markov process
The simultaneous use of the semi-Markov process and the two-machine flowshop system is motivated by the following three factors.
1. From a practical point of view, the lifetime of a machine is described by a more general random process, as stated in Grabsky (2003). That means machines often have up-down time distributions which could be non-exponential, characterized by a coefficient of variation (CV up/down) that is often less than 1 (see Li and Meerkov (2005), Enginarlar et al. (2005)). Thus, the machines can be modeled as aging over time, without the restriction imposed by the exponential distributions commonly used in the literature.
2. The study of transfer lines is based on a two-machine line (flow-shop system) because no exact analytical solution exists for longer lines, and brute-force numerical techniques are unsatisfactory given the sizes of the state spaces (see Gershwin (2002)). The performance parameters of a two-machine system, such as the production rate and the average buffer levels, depend on the work-in-process (WIP) inventory and throughput time (lead time or cycle time), and the performance is good if the WIP inventory and lead time are optimized (see Bai and Gershwin (1996)). That leads to an optimal production problem aimed at minimizing the total cost of inventory/backlog over a deterministic time horizon. Moreover, the performance of the system is influenced by the coefficients of variation, CV up/down (see Enginarlar et al. (2005)), and as a result, an optimal production control problem with semi-Markov jumps should be formulated.
3. The time dependence and the non-exponential distributions can be handled by extending the dynamic programming method using semi-Markov jumps. Hence, a unified model, including production, is developed in this paper, and the optimality conditions obtained are then solved to obtain the optimal control policy.
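To make the notion of an aging machine in the three points above concrete, the following sketch evaluates the Weibull hazard (failure) rate, which is constant only in the exponential special case. The function name and the numerical values are hypothetical illustrations, not parameters from this paper.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Hazard rate h(t) = (k / lam) * (t / lam)^(k - 1) of a Weibull(k, lam) lifetime.

    shape == 1 gives the constant rate 1 / lam of the exponential distribution;
    shape > 1 gives a rate that increases with the age t (an aging machine).
    """
    return (shape / scale) * (t / scale) ** (shape - 1.0)

# Exponential case: the rate does not depend on the machine's age.
constant_rate = [weibull_hazard(t, 1.0, 2.0) for t in (0.5, 1.0, 5.0)]

# Aging case: the rate grows with age, so transition rates depend on time.
aging_rate = [weibull_hazard(t, 2.0, 2.0) for t in (0.5, 1.0, 5.0)]
```

A time-dependent hazard of this kind is what separates the semi-Markov setting from the constant-rate Markov setting.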

Contribution of this paper
The purpose of this paper is to present a new model for the optimal stochastic control of a failure-prone two-machine system over a finite horizon, with semi-Markov jumps and a discount rate. This model is based on the dynamic programming approach and adopts the assumptions of Rishel (1975). However, unlike Rishel, who generated an optimal control with Markov jumps and constant transition rates, we use semi-Markov jumps, whose transition rates and probabilities are time-dependent. Using the Bellman principle, the optimality conditions satisfy the Hamilton-Jacobi-Bellman equation which appears in this paper. The new model and related optimality conditions are applied to a real-world manufacturing system involving log-normal, Weibull, and gamma distributions, which are in turn used to represent the machines' operating and down times with a CV up/down of less than one. This paper also proposes a solution for the new model with heuristic and numerical approaches.
The next sections are organized as follows: Section 2 presents the problem formulation, and Section 3 the optimality conditions. The dynamics of the system are given in Section 4. The hedging point policy is analyzed in Section 5. Section 6 presents the heuristic method for optimal feedback control, while Sections 7 and 8 present two practical case studies. Finally, Section 9 presents the conclusion.

Problem Formulation
We consider a dynamic stochastic flow shop consisting of a tandem two-machine system devoted to producing a single product, as shown in Fig. 1. The machines are subject to random breakdowns and repairs. Each machine has a finite number of states (modes), denoted as $i \in I = \{0, \dots, m\}$. Let $x_1(t)$ denote the number of parts in the buffer between the first and the second machines, called the work-in-process (WIP), and $x_2(t)$ the surplus level of the finished goods. The number of parts in WIP cannot be negative, and the buffers usually have limited storage capacities, so that $0 \le x_1(t) \le B$, where $B$ is the upper bound on the WIP. If the surplus level $x_2(t) > 0$, we have inventories; however, if $x_2(t) < 0$, then we have backlogs.
Let $u_1(t)$ and $u_2(t)$ be the production rates of the first and second machines, respectively. The maximum production capacities of these two machines are denoted as $r_1$ and $r_2$. We assume a varying demand $d(t)$, which is a random variable, as the input. Let $\chi(t)$ be the mode of a given machine at time $t$. It is described by a semi-Markov process with the state space $I = \{0, \dots, m\}$ and transition probabilities from state $i$ to state $j$ as follows (Becker et al. (2000)):
where $S_n$ is the time of the next transition and $S_{n-1}$ the time of the last transition (with $S_0 = 0$) with respect to $t$. The dynamics of the system contains two different parts: the first is the continuous part, and the second is the stochastic part. The dynamics of the continuous part of the process is described as follows:
Let $S = [0, B] \times \Re^1 \subset \Re^2$ be a state constraint domain. Then, let $x(t) = (x_1(t), x_2(t))' \in S$ and $0 \le u_j(t) \le r_j$, $j = 1, 2$. For simplicity, we write $x_j(t) = x_j$ for $j = 1, 2$, and $x(t) = x$ for $t \ge 0$. Let $\Upsilon(t, x)$ be the constraint domain of the control as follows:
Equation (2) can be written as follows:
The stochastic differential equation (4) is the hybrid system. The stochastic dynamics of the system is described as follows. If the system enters a state $k$, a number of independent random times $T_k$ are started, with distribution function $F_k(t)$ and probability density function $f_k(t)$, for $k = 0, 1, \dots, m$. The system goes to state $j$ if the realization of the time associated with $j$ is the smallest of all these variables, and the sojourn time in the current state is then this smallest realization. The derivative of the semi-Markov transition probability $p_{ij}(t)$ is given as:
Let $\Phi_k$ denote the $\sigma$-algebra generated by the random process and the number of independent random times $\tau_k$ as follows:
We now define the concept of admissible controls.
Υ is admissible with respect to the initial state vector. For more information and a discussion of this concept, the reader is referred to Sethi et al. (1997).
In the surplus cost, $c_1$ is the unit inventory cost of the internal buffer, $c_2^+$ the unit surplus cost of the finished product in the external buffer, and $c_2^-$ the unit backlog penalty of the finished product.
Our objective is to find an admissible control $u(t, x) \in A(t, i)$ that minimizes the following cost function:
where $\rho > 0$ is the discount rate, $E_u$ is the mathematical expectation taken with respect to the measure induced by the control law $u(t, x)$, and $T$ is the deterministic horizon (also called the deterministic time). The function (7) is called the surplus cost function.
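As a rough illustration of the continuous part of the model, the sketch below assumes the usual flow-balance form for a two-machine line, $dx_1/dt = u_1 - u_2$ and $dx_2/dt = u_2 - d$, and a piecewise-linear surplus cost $c_1 x_1 + c_2^+ x_2^+ + c_2^- x_2^-$. Both forms are assumptions made for this example, as are all function names; they are not taken verbatim from equations (2), (3), or (7).

```python
import math

def is_admissible(x1, u1, u2, r1, r2, B):
    """Assumed control constraints: capacity bounds plus buffer limits."""
    if not (0.0 <= u1 <= r1 and 0.0 <= u2 <= r2):
        return False
    if x1 <= 0.0 and u1 - u2 < 0.0:   # an empty buffer cannot be drained
        return False
    if x1 >= B and u1 - u2 > 0.0:     # a full buffer cannot be filled
        return False
    return True

def euler_step(x1, x2, u1, u2, d, dt):
    """One explicit Euler step of the assumed flow-balance dynamics."""
    return x1 + (u1 - u2) * dt, x2 + (u2 - d) * dt

def surplus_cost(x1, x2, c1, c2_plus, c2_minus):
    """Assumed instantaneous cost: WIP holding + surplus holding + backlog."""
    return c1 * x1 + c2_plus * max(x2, 0.0) + c2_minus * max(-x2, 0.0)

def discounted_cost(path, dt, rho, c1, c2_plus, c2_minus):
    """Left Riemann sum of exp(-rho * t) * cost along a sampled path."""
    return sum(
        math.exp(-rho * k * dt) * surplus_cost(x1, x2, c1, c2_plus, c2_minus) * dt
        for k, (x1, x2) in enumerate(path)
    )
```

The expectation over machine states is omitted here; the sketch only evaluates the cost along one given sample path.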
For the manufacturing system, the following assumptions are made in developing the control strategy:
(H.1) Assume $c_1 > 0$ and . This means that holding costs typically increase as the "value added" increases.
(H.2) The manufacturing lead times consist of processing time only; setup time, transfer time, and queue time are neglected. All operating machines start their operations at the same time.
(H.3) Both varying and constant demand rates are considered.
This optimization problem falls within the framework of the optimization of systems with semi-Markov jumps, called the stochastic optimal control problem, in which the machines' lifetimes follow non-exponential distributions. In the next section, we establish the optimality conditions described by the Hamilton-Jacobi-Bellman (HJB) equation as a candidate solution of the optimal control problem.
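The semi-Markov jump mechanism described in this section, in which several independent random times compete and the smallest realization decides both the sojourn time and the next state, can be sketched as follows. The function name and the distribution parameters in the usage example are hypothetical; only the competing-clock logic reflects the text.

```python
import random

def next_transition(samplers, rng):
    """Draw one competing time per candidate destination state.

    samplers maps a destination state j to a callable that takes the random
    generator and returns a positive random time. The destination with the
    smallest draw wins, and that draw is the sojourn time in the current state.
    """
    draws = {j: sample(rng) for j, sample in samplers.items()}
    destination = min(draws, key=draws.get)
    return destination, draws[destination]

# Hypothetical non-exponential clocks for the exits of one state.
rng = random.Random(0)
clocks = {
    0: lambda g: g.weibullvariate(100.0, 2.0),   # scale, shape
    1: lambda g: g.gammavariate(3.0, 40.0),      # shape, scale
    2: lambda g: g.lognormvariate(4.0, 0.5),     # mu, sigma of the log
}
state, sojourn = next_transition(clocks, rng)
```

Because the clocks are not exponential, the chance of each exit depends on the time already spent in the state, which is why the transition probabilities $p_{ij}(t)$ are functions of time.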

Optimal Feedback Control
In this section, our analysis covers the construction of an optimal feedback control structure that satisfies (2), (3), and (7), and determines the production rate $u(t, x)$ with the minimum cost function described in (7). Moreover, it is closely related to the idea of a feedback control in which the control variable $u(t, x)$ is chosen based not only on the time $t$ but also on the state $x(t)$. Let be the initial date. Let $v_i(t, x)$ denote the value function, i.e.:
Using dynamic programming, the value function in (8) is generalized to the following theorem:
Theorem 3.1 The stochastic control problem satisfies the system of partial differential equations: (9)
at time $t$, the initial and boundary conditions are satisfied:
In equation (9), the terms $(v_i)_t(t, x)$ and $(v_i)_x(t, x)$ denote the gradients of the value function with respect to time $t$ and the state variables $x$, respectively.
Proof. The proof of this theorem is presented in Appendix A.
Remark 3.1. (i) The system of partial differential equations (9) is the well-known HJB equation; (ii) It depends not only on the state variable of the system, but also on its time variation, because of the dynamics of semi-Markov decision processes such as $p_{ij}(t)$. Hence, in order to characterize the optimal control, we review the concepts, and the following results represent some properties of the value function $v_i(t, x)$ that are needed in order to address the main results on feedback control analysis.
, there exists a constant $C_1$ such that the value function satisfies:
, such that the value function satisfies the following condition:
Proof. The proof of this theorem is presented in Appendix B.
Having defined these measures, we now consider the stochastic optimal control of a two-machine flowshop in (8) with the initial condition $(x(t), \chi(t)) = (x, i)$. For this, we establish the following verification theorem and the requirements that meet the HJB equation (9). In addition, $v_i(t, x)$ is the unique solution, equal to the minimum expected total cost among appropriately defined classes of admissible control laws of the system.
(ii) If there exists an admissible system almost everywhere in $t$ with probability 1, then
Proof. The proof of this theorem is presented in Appendix C.
The optimality conditions established in (9) lead to a feedback control. In practice, feedback control is indispensable for handling the inaccuracies and uncertainties (including stochastic phenomena) that are present in the design process, and for making full use of the capacity of the equipment (see Engell (2007)).

Dynamic System
This section describes the dynamics of the manufacturing problem and explicitly relates the HJB equation to the control structure. Let us assume that each machine is subject to random failures and repairs and has two states $\chi_i(t)$, $i = 1, 2$ (i.e., an operational and an unavailable state), with probability distributions $F_i(t)$ and $G_i(t)$ applicable to each state, respectively. The computation of the value function $v_i(t, x)$ and the production rate $u_i(t, x)$ includes all states.
be defined by the following expression:
Thus, the transition graph of finite states is shown in Fig. 2.
To determine the probability and transition probability distributions of the two-machine system, we replace the two-state machines (state 0 and state 1) by a four-state system (state 0, state 1, state 2, and state 3). The conventional implementation of state machines is based on the selection of successor states and the execution of each related action. The system states are described as follows: State 0: the system is not operating (down); State 1: the first machine is operating and available to produce under the limited buffer level $0 \le x_1(t) \le B$; State 2: the second machine is operating and available to produce if the buffer level $x_1(t) > 0$; and State 3: the system is equivalent to the deterministic two-machine flow shop. In this case, we assume that none of the transition times is exponentially distributed.
Using the reliability theory in Ross (2003), the dynamics of the two-parallel-machines model is described as similar to that of one equivalent machine as follows: Let P 3 (t), P 2 (t), P 1 (t), and P 0 (t) be the probability that both machines are operational, only the second machine is operational, only the first machine is operational, and neither of them is operational, respectively.This results in twelve transition probabilities, as shown in Fig. 2. The derivative of the semi-Markov transition probability p ij (t) is defined by equation ( 5) where i, j = 0,1,2,3.The calculation of P i (t), and p ij (t) is presented in Appendix E.
In Fig. 2., at state 3: the future state of the system may be that both machines are "down" (3→0) or may be that either the first machine is "up" while the second is "down" (3→1) or the second is "up" while the first is "down" (3→2).At state 2: the future state of the system may be that both machines are "down" (2→0) or may be that either the first machine is "up" while the second is "down" (2→1) or both of them are "up" (2→3).At state 1: the future state of the system may be that both machines are "down" (1→0) or may be that either the first machine is "down" while the second is "up" (1→2) or both of them are "up" (1→3).At state 0: the future state of the system may be that both machines are "up" (0→3) or may be either that the first machine is "up" while the second is "down" (0→1) or the second is "up" while the first is "down" (0→2).
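The four system states and the twelve transitions enumerated above follow directly from combining two up/down machines. The sketch below reproduces the numbering used in the text (state 1: only the first machine up; state 2: only the second machine up); the function and variable names are hypothetical.

```python
from itertools import product

def state_index(m1_up: bool, m2_up: bool) -> int:
    """Map the two machine modes to the system state numbering of the text."""
    return int(m1_up) + 2 * int(m2_up)

# State number -> (machine 1 up?, machine 2 up?)
STATES = {state_index(a, b): (a, b) for a, b in product((False, True), repeat=2)}

# Every ordered pair of distinct states is a possible jump: 4 * 3 = 12.
TRANSITIONS = [(i, j) for i in sorted(STATES) for j in sorted(STATES) if i != j]
```

Listing all twelve pairs, rather than only single-machine events, matches the description above in which simultaneous changes such as 3→0 and 0→3 are allowed.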
The value function of two-machine flowshop is described by the HJB equation ( 9) with four states as follows:

Hedging Point Policy with Feedback Control
In this section, we describe the hedging point policy, whose solution leads to the deterministic problem and bang-bang control characteristics. It is based on the HJB equation (9), which is linear in the production rates and satisfies the Bellman principle of optimality. Solving the first-order partial differential equations in (9) is not simple. To interpret the value function $v_i(t, x)$, the concept of viscosity solution is often used. For more information and discussion of the concept of viscosity solution, the reader is referred to Sethi et al. (2005) and to Fleming and Soner (2006). However, in this paper, we use a heuristic method in order to overcome the difficulty of solving the multivariable problem in (9). The idea is to divide the multivariable problem in (9) into two different problems, each having a single control variable and corresponding to the optimal solution of Akella and Kumar (1986), i.e., the hedging point policy.
Further simplification of equation (9) is achieved by determining a control $u(t, x)$ through the following linear program:
subject to equations (2) and (3) above. The optimal feedback control (21) is designed to drive the system to the hedging point. If the system state is $\chi(t) = 0$, at which all machines are down, we must have $u(t, x) = 0$. Whenever the system state is $\chi(t) = i$, the linear program in (21) presents a real-time feedback controller, and the production rate is calculated at every time instant with $\chi(t) \ne 0$, according to either varying or constant demand. Since (21) is linear in $u(t, x)$, we obtain the following systems:
The point in x-space at which the gradient of $v_i(t, x)$ is equal to zero is called the hedging point $z^*(\cdot)$. The optimal control problem (23) was established by Kimemia and Gershwin (1983); Akella and Kumar (1986) then established the optimal production rate $u_2^*(t)$ as follows:
where the hedging point $z_2^*(t)$ is determined by minimizing the following objective function:
with respect to $x_2(t)$, $0 \le t < T$ (see Akella and Kumar (1986), Bielecki and Kumar (1988)).
Applying a similar analysis to the optimal control problem (22), the optimal production rate $u_1^*(t)$ may be given as follows:
where
Proof. The proof of this proposition is presented in Appendix D.
In equation (26), the variable $z_1^*(t)$ is the hedging point of the WIP on the first machine. From the maximum production capacities ($r_1 > r_2$), we assume that the hedging point of WIP $x > z$; here, there is no need to produce because the buffer level is high enough. However, when $z_1^* < x_1 \le B$ and $x_2 < z_2^*$, there is a need to produce on the second machine only. As a result, the control variable in this second zone is $(0, r_2)$. The control strategy at the hedging point . Finally, in zone four, the control is $(r_1, 0)$.
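The threshold structure of the hedging point policy can be sketched in a few lines. The single-machine rule below is the well-known form from Akella and Kumar (1986): produce at full capacity below the hedging point, at the demand rate on it, and not at all above it. Applying the same rule independently to both machines reproduces the controls $(0, r_2)$ and $(r_1, 0)$ quoted above for zones two and four; the controls returned in the other two zones, and the function names, are assumptions of this example.

```python
def hedging_rate(x, z, r, d, tol=1e-9):
    """Threshold rule: r below the hedging point z, d on it, 0 above it."""
    if x < z - tol:
        return r
    if x > z + tol:
        return 0.0
    return min(d, r)

def two_machine_hedging(x1, x2, z1, z2, r1, r2, d):
    """Apply the threshold rule to the WIP (machine 1) and surplus (machine 2).

    Assumption: the zones are delimited by z1 on the WIP axis and z2 on the
    surplus axis, and each machine reacts only to its own buffer.
    """
    return hedging_rate(x1, z1, r1, d), hedging_rate(x2, z2, r2, d)
```

With the values reported later in the numerical section ($r_2 = 0.225$, $d = 0.145$, $z_2^* = 0.95$), `hedging_rate(-1.0, 0.95, 0.225, 0.145)` returns 0.225 and `hedging_rate(0.95, 0.95, 0.225, 0.145)` returns 0.145.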

System Behaviour under the Optimal Policy
In practice, it is difficult to determine an optimal control with all four discrete states. While the Kanban control is only considered when the system is deterministic, the CONWIP control is applied to systems with constant buffer levels (see Bonvik (1996)). In the system presented in this paper, we have both stochastic dynamics and a finite buffer level, and we therefore intend to apply the heuristic control for each state as shown above.

Analysis of state 1 of the system
The control structure of the system is conditioned by machine 1. Only the first machine is operational while the second one is down. The behaviour of the surplus trajectory depends on $x_1(t)$ at time $t$. The first machine is blocked when $x_1(t) = B$. When $x_1(t) < B$, the first machine is ready to produce, and the characteristic of the time saved is as follows:
This characteristic time depends on the control policy $u_1$ and the current WIP $x_1(t)$, as in Fig. 4. It is the time during which the first machine can produce parts before the WIP reaches the upper bound $B$.
Using (28), when $x_1(t) = 0$, the minimum time saved is given by:
If we decide on the control at time $t$ ($t$ is the current time), the real time of the minimum time saved is determined by:
Example 6.1 Consider the system at time $t$; assume $B = 10$ parts, $r_1 = 0.2$ part/time unit, and $x_1(t) = 0$. Then $ST_{1B}^{\min} = 10/0.2 = 50$ time units and $ST_{1B}^{real} = t + 50$. This means that 50 time units after $t$, the first machine has produced 10 parts at maximal capacity $u_1 = r_1$, and it stops at $t + 50$ because the WIP is equal to the upper bound $B = 10$.
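Example 6.1 can be reproduced numerically under the assumption that the time saved in state 1 is the remaining buffer space divided by the production rate, i.e., $(B - x_1(t))/u_1$. This form is inferred from the example and is not copied from equation (28); the function names are hypothetical.

```python
import math

def time_saved_state1(x1, u1, B):
    """Assumed form: time until the WIP reaches B when only machine 1 runs."""
    if u1 <= 0.0:
        return math.inf          # no production, the buffer never fills
    return (B - x1) / u1

def real_time_saved_state1(t, x1, u1, B):
    """Clock time at which machine 1 becomes blocked, starting from time t."""
    return t + time_saved_state1(x1, u1, B)

# Data of Example 6.1: B = 10 parts, r1 = 0.2 part per time unit, empty buffer.
minimum_time = time_saved_state1(0.0, 0.2, 10.0)
```

Starting from a partly filled buffer shortens the time accordingly; for instance, with $x_1 = 4$ the same formula gives 30 time units.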
At the hedging point ( )

Analysis of state 2 of the system
The behaviour of this state is presented in Fig. 5. The control structure of the system is conditioned by machine 2. Only the second machine is operational, while the first is down. The behaviour of the surplus trajectory depends on $x_1(t)$ at time $t$. Because the first machine is down, the second machine is starved when $x_1(t) = 0$. When $x_1(t) > 0$, the second machine is available to produce and satisfy the demand $d(t)$ at time $t$. Its time saved characteristic can be written as follows:
The production with the hedging point policy in (18) may be adopted, but it must depend on its time saved characteristic. Under a similar condition as in state 1, when $x_1 = B$, the minimum time saved and its real time are given by:
The time saved at the hedging point is:
(ii) The terms $ST_i(\cdot)$, for $i = 1, 2$, refer to the time saved in which the machine can only produce in the interval $[0, ST_i(\cdot)]$ with any policy $u_i$; (iii) Meanwhile, the terms $ST_i^{\min}$, for $i = 1, 2$, refer to the minimum time saved in which the machine can produce a number of parts $r_i \cdot ST_i^{\min}$; (iv) The processing time of states 1 and 2 depends implicitly on the size of the buffer between the two machines ($B$) and the current WIP $x_1(t)$.

Analysis of state 3 of the system
At this state, both machines are up, and the optimal control problem becomes a deterministic problem. Consequently, the optimal feedback control is considered. We then have the following three cases:
a) If the buffer is full ($x_1 = B$), the choice of $u_1$ depends on the capacity of the second machine. Here, the way to go is to set $u_1 = 0$, because the second machine has an amount of time saved $ST_{2B}^{\min}$ in which to produce parts. We can approximate the optimal policy for the second machine by using the model as in Akella and Kumar (1986):
This policy corresponds to the Kanban control, where the first machine is instructed to stop production.
b) If the buffer is empty ($x_1 = 0$), the first machine must produce at $u_1(\cdot) = r_1$, because the second machine needs $1/r_1$ time units before it can continue to produce with any policy $u_2(\cdot)$; then $dx_1/dt = r_1$, the second machine is called a starved machine, and its production is only determined after $1/r_1$ time units.
c) When the buffer $x_1(t)$ is neither empty nor full, the feedback optimal control in equations (18) and (20) is applied. Here, the policy may correspond to the CONWIP control, and the proper production path consists of responding to actual demands, $u_1(t) = u_2(t) = d(t)$ (i.e., corresponding to the hedging point policy).
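The three cases a) to c) can be collected in one small decision function. This is a simplified sketch under stated assumptions: case a) uses a single-machine threshold rule with a hypothetical hedging level `z2` for the second machine, case b) keeps the second machine idle while it is starved, and case c) simply tracks the demand; the names and the tolerance are illustrative choices rather than the paper's formulas.

```python
def hedging_rate(x, z, r, d, tol=1e-9):
    """Threshold rule: r below the hedging point z, d on it, 0 above it."""
    if x < z - tol:
        return r
    if x > z + tol:
        return 0.0
    return min(d, r)

def state3_control(x1, x2, B, z2, r1, r2, d, tol=1e-9):
    """Heuristic (u1, u2) when both machines are up."""
    if x1 >= B - tol:
        # Case a: full buffer, machine 1 stops (Kanban-like behaviour).
        return 0.0, hedging_rate(x2, z2, r2, d, tol)
    if x1 <= tol:
        # Case b: empty buffer, machine 1 at full rate, machine 2 starved.
        return r1, 0.0
    # Case c: buffer neither empty nor full, both machines follow the demand.
    u = min(d, r1, r2)
    return u, u
```

For example, with $B = 20$, $r_1 = 0.25$, $r_2 = 0.225$, and $d = 0.145$, a full buffer with a backlog gives `(0.0, 0.225)` and an empty buffer gives `(0.25, 0.0)`.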

Analysis of state 0 of the system
This state corresponds to the case where both machines are down: $dx_1(t)/dt = 0$ and $dx_2(t)/dt = -d(t) < 0$, so the buffer level remains constant while the surplus level decreases at the demand rate.

Simulation Results with constant demand
This section aims to illustrate the validity of the results of the proposed model by using the numerical method with constant demand rate d(t).To that end, we consider a problem with a two-machine flowshop system producing a single-part-type (in Section 2).The dynamics of the system was described in Section 4, and it has four discrete states.
The up-downtimes are distributed according to one of the following three probability density functions, referred to as reliability models:
(a) Weibull, i.e.,
(b) Log-normal, i.e., This distribution is denoted as $L(\alpha, \sigma)$.
(c) Gamma, i.e., This distribution is denoted as $G(\alpha, \beta_o)$.
We ensure that as the failure rate increases, the functioning rate decreases, and the coefficient of variation, CV, is less than one. We then present the set of up-down times used in this example in Table 1, which also shows their coefficients of variation, CV (which take values less than one and are equal to $CV_W$, $CV_L$, and $CV_G$, with $CV_W = 0.93$, $CV_L = 0.95$, and $CV_G = 0.57$), the MTTF (Mean Time to Failure), and the MTTR (Mean Time to Repair). The parameters of the system are as follows:
- Maximal production rate $r_1 = 0.$
- Demand rate $d(t) = 0.145$.
We use the numerical method based on the Kushner and Dupuis (2001) approach because it is very difficult to solve the HJB equation analytically. Let $\Delta x_k > 0$ and $\Delta t > 0$ denote the lengths of the finite difference intervals of the variables $x$ and $t$, respectively. The first-order partial derivatives of the value function in equations (17)-(20) are replaced by the following expressions: (37)
For details of this method, the reader is referred to Kushner and Dupuis (2001). The results of this example are illustrated in Fig. 6 to Fig. 11 with given interval values of [-20, 20], and $t \in (0, 500)$.
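A common way to replace first-order derivatives of a value function on a grid, in the spirit of the Kushner and Dupuis approach, is the upwind difference: a forward difference when the drift of the state is nonnegative and a backward difference otherwise. The sketch below shows only that choice; it is an assumed generic form, not a transcription of expressions (37), and the function name is hypothetical.

```python
def upwind_derivative(v, k, drift, dx):
    """Upwind approximation of dv/dx at interior grid index k.

    v     : list of value-function samples on a uniform grid with step dx
    drift : sign of the state derivative, e.g. u1 - u2 for the WIP
            or u2 - d for the surplus
    """
    if drift >= 0.0:
        return (v[k + 1] - v[k]) / dx      # forward difference
    return (v[k] - v[k - 1]) / dx          # backward difference

# Check on v(x) = x^2 sampled at x = 0, 1, 2, 3, 4 (dx = 1); exact slope at x = 2 is 4.
grid_values = [float(x * x) for x in range(5)]
forward = upwind_derivative(grid_values, 2, +0.1, 1.0)
backward = upwind_derivative(grid_values, 2, -0.1, 1.0)
```

Choosing the difference direction from the sign of the drift keeps the coefficients of the discretized equation nonnegative, which is what allows them to be read as transition probabilities of an approximating chain.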

Interpretation of the results for case A in Table 1
This corresponds to case A in Table 1 above.Fig. 6 and Fig. 7 represent the production rates u 1 (t,x 2 ) and u 2 (t,x 2 ) versus surplus level x 2 and time t at x 1 = B = 20 parts.Fig. 8 and Fig. 9 represent the production rates u 1 (x 1 , x 2 ) and u 2 (x 1 , x 2 ) versus WIP x 1 and surplus level x 2 at t = 205 time units, this time t is chosen arbitrarily from within the time interval (0, 500).Simulation results correspond to the hedging point policy as in equations ( 24) and ( 26).In Fig. 6, the production rate of M 1 u 1 (t,x 2 ) is equal to zero, which corresponds to zones 1 and 2 in Fig. 3 (i.e., x 1 ≥ * 2 z ).In Fig. 7, the production rate of M 2 u 2 (.) is equal to maximum at (r 2 = 0.225) while the surplus level x 2 is less than zero (zones 2 and 3 in Fig. 3), and is equal to d = 0.145 when x 2 = * 2 z =0.95 parts.However, this rate is equal to zero when the surplus level is more than 0.95 parts over time (0, 500).In Fig. 8, the production rate u 1 (.) is equal to maximum (u 1 = r 1 = 0.25) when the WIP x 1 = 0, and is equal to zero if , which corresponds to zones 1 and 2 in Fig. 3. Fig. 9 presents the production rate of M 2 (u 2 (x 1 ,x 2 )) versus x 1 and x 2 .The optimal policy at the hedging point * ( ) We also obtain the hedging point on the first machine ( )  Simulation results show that the optimal control law is similar to the bang-bang control problem when the surplus level varies in time t < T < ∞.Note that the hedging point policy * 2 ( , *) u t d = z is valid over time.The value of the optimal production rate u 2 (.) is greater than zero when the system is in state 3 (i.e., both machines are up), and is equal to zero when the system in states 0 and 1 (i.e., machine M 2 is down).On other hand, the value of u 1 (.) is expressed analogically as u 2 (), but it is equal to zero when the system in states 0 and 2. 
When the system is in state 0, 1, or 2, the heuristic policy described in sub-sections 6.1 and 6.2 is used. As a result, the optimal policy agrees with the analytical model developed in Akella and Kumar (1986).
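The behaviour of $u_2$ reported for case A can be summarised as a three-branch rule. The Python sketch below is only an illustration under stated assumptions: the numerical values ($r_2 = 0.225$, $d = 0.145$, $z_2^* = 0.95$) are taken from the text, whereas the function name, the tolerance, and the choice to apply the maximum rate for every surplus level below the hedging point (the text reports it for $x_2 < 0$) are assumptions in the spirit of the classical hedging point policy.

```python
def u2_hedging(x2, m2_up=True, z2=0.95, r2=0.225, d=0.145, tol=1e-9):
    """Hedging point rule for the downstream machine M2 (illustrative sketch).

    x2     : surplus level of finished product
    m2_up  : True when M2 is operational (hypothetical flag)
    z2     : hedging point, r2 : maximal rate, d : demand rate (values from the text)
    """
    if not m2_up:
        return 0.0          # a failed machine cannot produce
    if x2 < z2 - tol:
        return r2           # below the hedging point: produce at the maximal rate
    if x2 > z2 + tol:
        return 0.0          # above the hedging point: stop producing
    return d                # at the hedging point: track the demand
```

A call such as `u2_hedging(-5.0)` returns the maximal rate, while `u2_hedging(0.95)` returns the demand rate. The WIP constraint on $x_1$ is deliberately left out to keep the sketch short.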

Interpretation of the results for case B in Table 1
The results of this case, shown in Fig. 10 and Fig. 11, are similar to those obtained when the up-downtimes obey log-normal and Weibull distributions, because we use the same MTTF and MTTR. Both cases A and B in Table 1 (i.e., where log-normal and gamma distributions are used for machine uptimes), shown in Fig. 8 and Fig. 10, indicate that the machine must produce at the maximum production rate $u_1 = r_1$ when the WIP level is equal to zero.

Simulation Results with varying demand
This example illustrates the optimal cost values for different demand scenarios with a varying demand rate $d(t)$, for which the data are given in Table 2. The parameters of the system are the same as in the constant demand rate example with case A in Table 1. The results are shown in Fig. 12 and Fig. 13 as the optimal production rates $u_1(\cdot)$ and $u_2(\cdot)$ for $M_1$ and $M_2$ versus time and surplus level at $x_1 = B = 20$ parts. At this WIP level, the production rate of the first machine, shown in Fig. 12, is equal to zero for every time $t$ and every $x_2$. In Fig. 13, the production rate of the second machine $u_2(\cdot)$ is equal to its maximum value when the surplus level $x_2$ is less than zero, and is equal to zero when the surplus level is greater than the hedging point value $z_2(t)$. This value $z_2(\cdot)$ is equal to 0.95 when $u_2(\cdot) = d(t)$ over time (0, 500). The production rate $u_2(\cdot)$ fluctuates in time because of the varying demand $d(t)$ under the bang-bang control, as in Fig. 13.
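To see why $u_2$ follows the demand once the hedging point is reached, the surplus can be simulated with a simple Euler scheme. The sketch below rests on several assumptions that are not from the paper: the surplus dynamics $dx_2/dt = u_2 - d(t)$, a machine $M_2$ that never fails, a hedging point fixed at 0.95, and a hypothetical piecewise-constant demand profile standing in for the data of Table 2.

```python
def demand(t):
    """Hypothetical piecewise-constant demand (Table 2 data are not reproduced here)."""
    return 0.145 if t < 250.0 else 0.10


def simulate_surplus(x2_start, z2=0.95, r2=0.225, horizon=500.0, dt=0.5):
    """Euler simulation of dx2/dt = u2 - d(t) with M2 always up (assumption)."""
    x2 = x2_start
    rates = []
    for n in range(int(horizon / dt)):
        d = demand(n * dt)
        if x2 < z2:
            u2 = r2                             # below the hedging point: maximal rate
            x2 = min(z2, x2 + (u2 - d) * dt)    # clip so the step does not overshoot z2
        elif x2 > z2:
            u2 = 0.0                            # above the hedging point: no production
            x2 = max(z2, x2 + (u2 - d) * dt)    # clip so the step does not undershoot z2
        else:
            u2 = d                              # at the hedging point: follow the demand
        rates.append(u2)
    return x2, rates


final_x2, rates = simulate_surplus(-5.0)
```

Starting from a backlog, the rate stays at $r_2$ until the surplus reaches the hedging point and then switches to the current demand, so any change in $d(t)$ appears directly in $u_2$.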

Summary and extensions
In this paper, we considered the optimal stochastic production problem for a two-machine flowshop, single-product manufacturing system with machines subject to random breakdowns and repairs. Using Markov properties, we formulated a new model in the form of a stochastic control problem by adopting Rishel's assumptions to model the discrete machine states, which are characterized by semi-Markov jumps, and by using a dynamic programming approach to make decisions at different stages over time. The objective was to find the production rates of the upstream and downstream machines while minimizing surplus costs by using a semi-Markov process. The optimality conditions were established using Pontryagin's principle, and led to the HJB equation. In effect, the production control is a feedback control, since the control variables (the two production rates) are linear over time.
The heuristic approach presented seeks to reduce the complexity of the HJB equation when the system has stochastic control variables, and it makes the problem deterministic. We also provided an analysis of the hedging point policy for the feedback controller. We applied our proposed model to a real-world manufacturing system with machines having Weibull, log-normal, and gamma distributions. In what follows, we discuss the main extensions provided by our model.
While the classical Markov model (see Rishel (1975), Davis (1984)) has been considered as a stochastic optimal control problem with homogeneous Markov jumps, the Boukas proposition (see Boukas (1988)) gave an analysis of stochastic problems with non-homogeneous Markov jumps. Both homogeneous and non-homogeneous Markov jumps have constant transitions, which leads to a model that is not time-dependent even though the control problem is considered in continuous time. We have extended the stochastic problem to the continuous-time optimal control problem with semi-Markov jumps, i.e., the transition rates between the machine states as well as the transition probabilities are time-dependent. Hence, the optimality conditions depend not only on the system states, but also on the time $t$. This first extension enables us to consider the failure and functioning rates as functions of time, and thereby to obtain a coefficient of variation CV for the up-downtime distribution that is less than one. This is very different from the Markov framework, and can lead to a high system performance (see Li and Meerkov (2005)).

A rich body of work on semi-Markov processes exists in the literature, going back fifty years. A detailed theoretical analysis of semi-Markov processes is given in Howard (1971); Glynn (1989) considered a generalized semi-Markov process (GSMP) of discrete events. Abbad (1991) presented the semi-Markov control problem (SMCP) using an infinite horizon approach with discounted rewards and showed that the SMCP and the Markov control problem (MCP) are of the same ergodic class. More recently, D'Amico et al. (2005) and Janssen and Manca (2007) presented applied semi-Markov processes that can be used for economic and financial issues. They showed that a semi-Markov process is a renewal process. In our model, we used the definitions of semi-Markov processes found in Becker et al. (2000) and the assumptions of Rishel (1975) to characterize the discrete events of the system, such as machine breakdowns and repairs.
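The contrast between CV equal to one in the Markov (exponential) case and CV less than one for the distributions used here can be checked with the standard closed-form expressions. The Python sketch below is only an illustration: the shape parameters are hypothetical and are not the values behind Table 1, and the function names are assumptions.

```python
import math


def cv_exponential():
    """CV of an exponential distribution (Markov case): always 1."""
    return 1.0


def cv_gamma(shape):
    """CV of a gamma distribution with the given shape parameter: 1 / sqrt(shape)."""
    return 1.0 / math.sqrt(shape)


def cv_weibull(shape):
    """CV of a Weibull distribution, from its first two moments (scale cancels out)."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / g1 ** 2 - 1.0)


def cv_lognormal(sigma):
    """CV of a log-normal distribution with log-scale standard deviation sigma."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)
```

For example, a gamma shape above one, a Weibull shape above one, or a small log-normal `sigma` all give a CV below one, whereas a Weibull shape of exactly one recovers the exponential case.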
The second extension considers the control problem over a deterministic horizon with a discount rate. This extension is similar neither to the Rishel formalism nor to the Boukas extension: Rishel considered the problem over a finite horizon without a discount rate, while Boukas considered it over an infinite horizon with a discount rate. Consequently, the proposed model addresses the control of the system in order to meet either constant or varying demand.

Future Research
This new model can be applied to a large-scale system (job shop) with machine maintenance and setup problems, using the Just in Time (JIT) concept.

A. Proof of Theorem 3.1
To prove this theorem, we use Rishel's assumptions (see Rishel (1975)). We consider a system with two different events, as follows:
(1) Given $\chi(t) = i$, there are no jumps of $\chi(t)$ at time $t$ in the interval $[0, T]$; the probability of this event is given by equation (A.1).
(2) Given $\chi(t) = i$, the first jump of $\chi(t)$ occurs at time $t$ in the interval $[0, T]$; the probability of this event is given by equation (A.2).

Next, we consider the process on the finite interval $[0, T]$ and the events in which $\chi(t)$ has exactly $m$ jumps, $m = 0, 1, \ldots$, for all $t \leq T$. Assume that $T$ is bounded with probability 1; thus the event that $\chi(t)$ has more than a finite number of jumps in $[0, T]$ has probability 0 (see Rishel (1975)). Using equation (A.1) for the probability of no jump from $i$ to $i$ at $s$, and equation (A.2) for the probability of the other jumps from $i$ to $j$, the terms in equation (A.3) can be written by induction.

Let $h(x, u, s)$ be a continuous function defined through equations (4), (5) and (6). We may split the integral at any small increment $\Delta t > 0$. Since $h(x, u, t)$ is a continuous function, a Taylor series expansion in its first variable shows that the integral in (A.9) is approximately $h(x, u, t)\Delta t$. The result (A.12) holds for the present value function for deterministic horizon optimal control problems defined by equation (8). (c) The HJB equation (A.12), derived from the Bellman principle of optimality, holds on time scales.

B. Proof of Theorem 3.2
This proof extends the results of Chapter III in Fleming and Soner (2006) to the optimal control problem with discounted cost. For part (i), the coefficient $C_1$ depends on the initial value $x(t)$, and the convexity property holds for every $(t, x)$. This proves (i). We now prove that the value function satisfies the Lipschitz condition for every $(t, x)$, considered as a function of the variable $x$. For any admissible control, the corresponding expressions can be rewritten so that the Lipschitz condition is satisfied for every $(t, x)$. This proves (ii).

C. Proof of Theorem 3.3
To prove this theorem, we recall Definition 2.1. Let $(\Omega, \Phi, P)$ be a probability space for $t \leq s \leq T$, and let $(t, x)$ be the initial data. (ii) In the proof of (i), equality now replaces the inequality in (C.4). This completes the proof of Theorem 3.3.

Fig. 2. Transition graph of two-machine system with four states

Fig. 3. Hedging point policy

... which results in minimum objective function (25) being used. In zone three, the control policy should be set to rapidly reduce the shortage while keeping a positive inventory in the buffer of the first machine. When B

Fig. 4. Characteristic of the time saved of $M_1$
Fig. 5. Characteristic of the time saved of $M_2$

... rate is equal to the demand rate. The time saved at the hedging point is also called the hedging time.

- Maximal production rate $r_2 = 0.225$
- Unit inventory cost of the internal buffer $c_1 = 0.5$
- Unit surplus cost of the finished product in the external buffer $c_2^+ = 1$
- Unit backlog penalty of the finished product $c_2^- = 2$
- Discount rate $\rho = 0.65$

Fig. 10. Optimal production rate $u_1(\cdot)$ on $M_1$ versus $x_1$ and $x_2$ at $t = 205$ time units

Fig. 12. Optimal production rate $u_1(\cdot)$ on $M_1$ versus $t$ and $x_2$ at $x_1 = B$

... are two ways to get results: by using the boundary condition

Let Α be any admissible feedback control with the initial vector $x(t) = x$.