Mix flexibility optimisation in hybrid make-to-stock / make-to-order environments in process industries

When there are sequence-dependent changeovers on a shared resource in hybrid make-to-stock/make-to-order (MTS/MTO) environments in process industries, coordination mechanisms for mix flexibility optimisation are critical for minimising changeover penalties of time and cost. Through a case study, a Markov decision process (MDP) is used to determine the optimal policy for scheduling production on a filling line. Simulations are performed to investigate the penalties incurred when sub-optimal policies are utilised to satisfy customers. The results show that while the optimal policy leads to the minimal penalty in terms of rewards, it does not allow for production of all MTS and MTO products to meet customer requirements due to internal and external uncertainty. Sub-optimal policies must also be employed. The results of the simulations point to scheduling actions that would lead to low, moderate and high penalties. The practical implications of this study indicate that practitioners can use the MDP to evaluate the impact of their scheduling decisions on performance metrics on a shared resource in hybrid MTS/MTO environments.

Subjects: Production Systems; Manufacturing & Processing; Operations Management

ABOUT THE AUTHOR
Dr. Shellyanne Wilson is a graduate of the University of the West Indies, where she read for a BSc in Chemistry and Management, and an MSc in Production Management. She completed her PhD at the Institute for Manufacturing (IfM), Cambridge University, in Manufacturing Strategy. Her research interests include operations strategy, competitiveness and value chain analysis. In the Department of Management Studies, she lectures Quantitative Methods, Production and Operations Management, and Operations Planning and Control. Prior to joining academia, Dr. Wilson worked in the manufacturing sector in Quality Management in a multinational consumer goods company and Manufacturing Management in a large food manufacturer in Trinidad and Tobago.
PUBLIC INTEREST STATEMENT
As consumers, we enjoy having the option to choose from among a wide range of colours, styles, models, and flavours. Manufacturers are therefore tasked with supplying us with this assortment, by organising their production resources to either produce products concurrently or sequentially. This research paper focuses on the latter, where sequence-dependent changeovers among products produced on the same resource are considered, with the aim of finding a production planning policy that optimises the use of the resource in terms of minimising the cost and downtime incurred due to changeovers. The results show that the implementation of the planning approach leads to production planning policies that range from optimal, where changeover time and cost are kept to a minimum, to suboptimal, where penalties with respect to time and cost can be classified as low, moderate or high. Practitioners can use this approach in environments where products are both standard-orders and custom-orders.

Wilson, Cogent Engineering (2018), 5: 1501866
https://doi.org/10.1080/23311916.2018.1501866
© 2018 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.
Received: 30 March 2018
Accepted: 14 July 2018
First Published: 23 July 2018
*Corresponding author: Shellyanne Wilson, Department of Management Studies, The University of the West Indies, St. Augustine, Trinidad and Tobago
E-mail: shellyanne.wilson@sta.uwi.edu
Reviewing editor: Wenjun Xu, Wuhan University of Technology, China
Additional information is available at the end of the article



Introduction
The decisions making up a company's manufacturing strategy determine how the company will achieve its manufacturing objectives of cost, quality, delivery and flexibility. In the area of flexibility, product variety is one industrial feature which prevails across the multiple industrial sectors that make up the manufacturing industry. Companies must meet the challenge of producing expanding product ranges in a cost-effective manner, while also meeting set quality standards and agreed-upon delivery dates.
Mix flexibility directly addresses this product variety challenge, as this manufacturing flexibility type relates to the ability of a manufacturing system to produce a variety of products within a specified time period, without the need to modify existing manufacturing facilities. When there is sequential production of a company's product mix (Dixon, 1992), as in the case of production on a shared resource (Wilson & Platts, 2009), the ease with which the manufacturing resource can change between various product types, also known as mix response flexibility, becomes a significant concern for practitioners as it impacts many performance metrics such as productivity, throughput, and machine utilization.
As per the familiar refrain "What gets measured, gets managed", the ability to determine the mix response flexibility requirements, by way of changeover times, could therefore contribute to effective production planning and sequencing on shared resources, as practitioners could determine mix response flexibility demands of planned production sequences, particularly when product changeovers are sequence-dependent.
The need for measuring manufacturing flexibility types, however, has been debated over a number of decades. For example, while Primrose and Verter (1996), in a study of Flexible Manufacturing Systems (FMS), concluded that it was not necessary to measure the flexibility of an FMS, Georgoulias, Papakostas, Mourtzis, and Chryssolouris (2009) proposed a toolbox with a number of flexibility evaluation measures that was labelled as a "significant asset" (p. 441) for production engineers and decision makers.
Specifically as it relates to the measurement of mix response flexibility, studies have primarily been theoretical quantitative approaches focused on assembly-type operations, with a product range of four products, where the changeovers were not sequence-dependent. However, in process industries, production sequences are an important consideration for production planning, where the optimisation of these sequences could result in higher profits (Mrad & Alfares, 2016). In hybrid make-to-stock/make-to-order (MTS/MTO) environments, characterised as having multiple items, with multiple setups using limited capacity and congestion effects (Rajagopalan, 2002), production systems are even more difficult to schedule and control, and so, production planning is even more complex (Beemsterboer, Land, & Teunter, 2016, 2017). This paper therefore intends to contribute to the mix response flexibility literature and hybrid MTS/MTO literature, by focusing on the optimisation of sequence-dependent changeovers in hybrid MTS/MTO process industry environments. Further, this paper goes beyond a theoretical treatise and instead takes a practitioner approach by studying mix response flexibility via the use of a Markov decision process (MDP) in a case study of a chemical manufacturer, with a product mix of 14 MTS and MTO products.
The paper proceeds as follows. Section 2 gives an overview of mix response flexibility, and focuses on mix flexibility in sequence-dependent changeovers and in process industries. Section 3 reviews production planning in hybrid MTS/MTO environments. Section 4 introduces the proposed approach of using MDP in mix flexibility optimisation. Section 5 details the case study and discusses the results. Lastly, Section 6 ends with conclusions.

Mix response flexibility
Flexibility provides a firm with the ability to proactively change to meet its business objectives, and to reactively adapt to internal and external uncertainty encountered. Mix flexibility is the manufacturing flexibility type that enables a company to produce its range of products. Therefore, on the one hand, mix flexibility is employed proactively by way of a firm's decision on the variety of products it offers to the marketplace; and on the other hand, mix flexibility is employed reactively in short term production planning and control decisions regarding the quantities and timing of the products manufactured, where production plans are monitored and adjusted to correct for unplanned events, which would also include systemic and operational disruptions (Finke, Singh, & Schonsleben, 2012).
Mix flexibility can be examined in terms of two dimensions: mix range and mix response. Mix range flexibility considers the breadth of products offered by a firm, and should include both the actual number of products, as well as the degree of heterogeneity among the products. Mix response flexibility considers the ease by which changes among the range of products could be effected (Koste & Malhotra, 1999).
Mix response flexibility's importance to a company depends on how mix flexibility is achieved. At the firm level, if there are multiple plants, the product range can be produced utilising the different plants, where a single plant could be responsible for a single product. In this case, mix response flexibility is of no importance since there is no need to change from one product to a second product. Similarly, at the plant level, if independent resources are utilised to produce a company's range of products, as in the case of concurrent production, mix response flexibility is again of no importance since the production resource is dedicated to a singular item. If, however, a shared resource is used to produce multiple products as in the case of sequential production, mix response flexibility becomes an important factor as it determines the time and cost for a changeover to be effected.
In manufacturing firms, there is widespread use of shared resources in the production of a company's product mix. On these shared resources, there are typically multiple setups, ranging from simple to complex changeovers. The complexity of the changeovers depends on a number of factors, inclusive of the degree of heterogeneity among the products, the nature of the type of change that must be undertaken on the shared resource, and the characteristics of the shared resource itself. These setups can represent significant downtime for the manufacturing system, and so, where possible, practitioners strive to find ways to reduce the need for changeovers, and strive to minimize the time taken for the changeovers.
As mix response flexibility concerns the ease by which a change is made, it is not surprising that the first measures used for its quantification were cost and time. The time measure, however, emerged as the more popular approach, with researchers such as Bateman, Stockton, and Lawrence (1999); Van Hop (2004) and Wahab (2005) providing insight into the area. Bateman et al. (1999) built on the prior work of Chryssolouris and Lee (1992), where the latter studied mix response flexibility by considering the sensitivity to change (STC), which examined the changeover time and the probability of a changeover occurring. Bateman et al.'s (1999) contribution was to consider all sequences of products by proposing a measure referred to as Mean Sensitivity to Change (MSTC). Van Hop's (2004) mix flexibility response measure attempted to assess both capacity, which considered how economical it was for a system to change from one state to another; and capability, which considered the number of states the system can perform. Wahab's (2005) approach considered the product mix flexibility response (PMFR) for both a single machine, and a system, made up of multiple machines. The PMFR measure is based on three factors: the difference between products regarding tooling requirements; the range of operations that can be performed by the machine; and the efficiency of the machine.
Each of the three models provides a static measure of mix response flexibility of a manufacturing machine or manufacturing system. But how would these models assess the varying mix response flexibility needs based on varying production planning needs? Further, while the three models appear highly suited to assembly-type operations, how would they perform in the process industry and in hybrid MTS/MTO environments? Lastly, all three models are illustrated using a product range of four products, where changeovers are not sequence-dependent. How applicable are these models for situations with larger product ranges, where changeovers are sequence-dependent?

Mix response flexibility and sequence-dependent scenarios
The issue of sequence dependence has not always been considered in mix flexibility literature. For example, in Bateman et al. (1999), the duration of setup i ($dur_i$) is constant for every setup of Product i, as shown in the validation of the model for four products produced on a single system: J, K, L and M, where the setup duration times were 7, 5, 10 and 6 min respectively. However, for sequence-dependent changeovers, setup times often vary based on the order of the products to be produced on the shared resource. At the very minimum, the setup duration for a product depends on the immediately preceding product, and hence, could vary from one setup of the product to its next setup. Figure 1 gives an illustration of a hypothetical scenario, where there are five products making up a company's product mix. The five products belong to two different product families. Within each product family, the sole difference among the products is the label; while the difference between the two product families is the product size. The setup duration to change among products within a product family is 20 min, and the setup duration to change among products in different product families is 40 min.
So, as shown in the production sequences in Figure 1, when Product C immediately follows a product from its product family, such as Product A, the setup duration for this label change is 20 min, and when Product C immediately follows a product from another product family, such as Product Z, the setup duration for this product size change is 40 min. For a production run where there are four setups, each being either a product size changeover (s) or a label changeover (l), there are 16 possible production sequences, as shown in Table 1, which gives an expected value of the total setup duration of 120 min.
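The expected value quoted for Table 1 can be checked with a short enumeration. This is only a sketch: it uses the 20 min and 40 min setup durations from the hypothetical scenario and assumes, for illustration, that all 16 sequences are equally likely. The names `SETUP_MIN`, `sequences` and `totals` are introduced here and are not from the paper.

```python
from itertools import product

# Setup durations in minutes, taken from the hypothetical scenario in the text.
# "l" = label changeover, "s" = product size changeover.
SETUP_MIN = {"l": 20, "s": 40}

# Every production run with four setups, each either a label or a size changeover.
sequences = list(product("ls", repeat=4))

# Total setup duration for each of the sequences.
totals = [sum(SETUP_MIN[c] for c in seq) for seq in sequences]

# Assumption: every sequence is equally likely.
expected = sum(totals) / len(totals)

print(len(sequences), min(totals), max(totals), expected)
```

Under the equal-likelihood assumption the total ranges from 80 min (four label changes) to 160 min (four size changes), with a mean of 120 min.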

Hence, because the setup duration for a product may vary based on the production sequence, consideration should be given to sequences in production schedules, as mix response flexibility, by way of changeover times, will impact throughput.
An additional consideration for sequence-dependent changeovers is the probability of occurrence. Because practitioners often employ routines where similar products are grouped together to minimise changeover times (Wilson & Ali, 2014), the probability of occurrence of a product in a sequence often depends on the immediately preceding product, and would vary based on the production sequence.

Mix response flexibility and process industries
According to APICS, process industries are defined as industries where value is added via mixing, separating, forming, and/or chemical reactions. Examples include food and beverage, tobacco, chemicals, paper and pharmaceuticals. Lager, Samuelsson, and Storm (2017) and Fransoo and Rutten (1994) provided a comprehensive treatment of the characteristics of process industries by way of describing the nature of their raw materials, the associated material quality variability and the resultant potential variations in bills of materials; product flow and yield variability; and process flow characteristics.
Production planning is seen as being particularly challenging in process industries. For example, Feng, Zhang, Wu, and Yu (2011) attributed this complexity of the process industry production planning to the high costs associated with raw materials, equipment maintenance and energy needs of process industries. Similarly, Van Dam, Gaalman, and Sierksma (1998) outlined a number of challenges with respect to the production planning constraints of process industries, which include the nature of the product, by way of shelf lives and the cleaning required to prevent contamination; and the nature of the process and packaging equipment, which, due to their high capital intensity, are required to maintain high utilisation rates.
Hence, with respect to mix response flexibility for shared resources in process industries, in addition to the technical machinery and equipment adjustments, changeovers also may involve cleaning or flushing-out operations. Sequencing of products, because of sequence-dependent changeovers, therefore plays a critical role in mix flexibility achievement in process industries.
There are a number of well-established priority rules that have been employed to sequence jobs in industry. Of the six priority rules listed, the LSU rule speaks directly to the relationship between the sequence of jobs and the goal of minimising setup or changeover time. Here, companies group similar products together in a sequence of jobs, where the differences between sequential products present low cost, time and effort for setups. So, for example, in a product mix with three dimensions of product variety (labels, package sizes and product types) produced on a shared resource, the LSU rule would attempt to group together all the products of the same product type and package size, but with different labels, as the label changeover would most likely represent the simplest changeover. The grouping of the products by package size or by product type depends on the nature of the equipment and the products in question. In some cases, the changeover routine to change from one package size to another may be very complex and more time-consuming than changing from one product type to the next. In such a case, under the LSU rule, the sequence routine would group products of the same package size together, even of different product types, prior to making a change to a second package size. On the other hand, if the changeover from one product type to a second product type requires a time-consuming changeover process, all products of the same type, even of different package sizes, are grouped together prior to making a product type change.
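The effect of grouping similar products can be illustrated with a small sketch. It is not the LSU rule as formally defined in the literature; it simply groups jobs by package size and compares total changeover time before and after grouping. The `(size, label)` encoding, the function names and the 20/40 min durations (borrowed from the hypothetical scenario of Figure 1) are assumptions made for illustration.

```python
from typing import List, Tuple

Product = Tuple[str, str]  # (package_size, label); hypothetical encoding


def changeover_minutes(prev: Product, nxt: Product,
                       label_min: int = 20, size_min: int = 40) -> int:
    """Setup time between two consecutive products (illustrative durations)."""
    if prev == nxt:
        return 0          # same product, no changeover
    if prev[0] != nxt[0]:
        return size_min   # size change (the label change is absorbed in it)
    return label_min      # same size, different label


def total_changeover(seq: List[Product]) -> int:
    """Sum of setup times over consecutive pairs in the sequence."""
    return sum(changeover_minutes(a, b) for a, b in zip(seq, seq[1:]))


def group_by_size(jobs: List[Product]) -> List[Product]:
    """Group jobs of the same package size; the stable sort keeps arrival order within a size."""
    return sorted(jobs, key=lambda p: p[0])


jobs = [("1L", "A"), ("2L", "B"), ("1L", "C"), ("2L", "D")]
print(total_changeover(jobs), total_changeover(group_by_size(jobs)))
```

In this example the ungrouped sequence needs three size changes (120 min), while the grouped sequence needs one size change and two label changes (80 min).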
Another sequencing approach, specifically for process industries, was proposed by King (2009), which he referred to as product wheels. In studying the applicability of lean manufacturing tools in process industry settings, King (2009) adapted the heijunka or production smoothing tool to develop product wheels, which he defined as: ". . .a visual metaphor for a structured, regularly repeating sequence of the production of all the materials to be made on a specific piece of equipment, within a reaction vessel, or within a process system." (King, 2009, p. 206). While the goal of this tool is to group similar products in a production schedule, and thereby improve the production planning process through the reduction of the number of time-intensive changeovers, Wilson and Ali (2014) found that the product wheel is not a mathematical optimisation technique, but, instead, it is a heuristic technique that requires judgement and experience to develop optimal sequences.
Even when sequencing routines such as the approaches outlined above are planned in theory to minimise changeover time, in reality, factors such as demand uncertainty and patterns, stock levels, and internal uncertainties such as machine breakdowns lead to planned sequences that do not allow for setup time minimisation.

Production planning in hybrid MTS/MTO production systems
In describing production planning and control activities of MTS and MTO products, Soman, van Donk, and Gaalman (2004) stated that MTS planning focuses on forecasting demand and meeting customer orders, while MTO planning focuses on order execution. Similarly, Beemsterboer, Land and Teunter (2016) argued that the objectives of MTS planning are to prevent stockouts and limit inventory holding costs, while the objective of MTO planning is to meet prespecified delivery due dates. The definition of the hybrid MTS/MTO production system is therefore viewed as an intermediate option, between the two extremes of MTS and MTO (Zhang, Kim, Springer, Cai, & Yu, 2013).
With MTS strategies suitable for high volume and low variety products, and MTO strategies suitable for low volume and high variety products, larger supply chain issues such as mass customisation, postponement and modularization have been studied in the hybrid context (Almehdawe & Jewkes, 2013). Production planning in these hybrid environments is challenging. For MTS products, the focus on forecast-based MTS planning can result in high inventory holding costs, while the focus on the flexible-order based MTO planning can result in production fluctuations, long lead-times and order rejections (Rafiei, Rabbani, & Hosseini, 2014).
Of particular relevance to this paper is the approach utilised by Beemsterboer, Land and Teunter (2016), where in a study of a two-product hybrid system to determine when to schedule the MTS item and the MTO item, MDP was utilised. While the MDP approach is detailed in Section 4, important elements of Beemsterboer, Land and Teunter's (2016) study include the action states of production of MTS item, production of MTO item and idle time; and system states as inventory levels, number of accepted MTO orders and number of outstanding late orders.

Proposed Markov decision process approach for mix response flexibility in sequence-dependent hybrid MTS/MTO process industries
In this paper, we propose the use of the MDP as an appropriate sequential decision model for optimising mix response flexibility in sequence-dependent hybrid MTS/MTO process industries, where the objective is to determine a policy that leads to performance optimisation. In the sequencing of production on a shared resource, all the required elements for an MDP model are present, as detailed in Puterman (2005):

• A set of decision epochs
The decision epochs refer to discrete points in time (t) in which the decisions are to be made. For the mix response flexibility scenario, t represents the point when a product changeover is effected, where t = 1, 2, 3, . . .

• A set of system states
The set of system states ($X_t$) describes all the conditions of the system at discrete points in time. For this mix flexibility scenario, the system states represent the products made on the shared resource. Depending on the size of the product mix and the degree of product heterogeneity, the states could be individual stock-keeping units (skus), product families, product family categories, or some other way of classifying products.

• A set of available actions
The set of available actions ($A_t$) refers to decisions that are to be taken at discrete points in time. In this scenario on a shared resource, the actions could entail changeover (CO) decisions. For a shared resource with four changeover types, CO1, CO2, CO3 and CO4, $A_t$ can therefore be represented as follows: $A_t = \{CO1, CO2, CO3, CO4\}$, t = 1, 2, . . .

• A set of state and action-dependent transition probabilities
The set of transition probabilities is the set of conditional probabilities of being in a future state given a current state. In this scenario, the transition probabilities represent the probabilities of occurrence of products that immediately follow on in the production sequence, and could be determined using historical production data, where $P_{ij}$ = conditional probability of being in state j in the future given the current state i; and the matrix of one-step transition probabilities can be represented as follows:

$$
P = \begin{bmatrix}
P_{\alpha\alpha} & P_{\alpha\beta} & P_{\alpha\gamma} & P_{\alpha\delta} \\
P_{\beta\alpha} & P_{\beta\beta} & P_{\beta\gamma} & P_{\beta\delta} \\
P_{\gamma\alpha} & P_{\gamma\beta} & P_{\gamma\gamma} & P_{\gamma\delta} \\
P_{\delta\alpha} & P_{\delta\beta} & P_{\delta\gamma} & P_{\delta\delta}
\end{bmatrix}
$$

• A set of state and action-dependent immediate rewards
The set of rewards is the utility measure of being in the various states, where positive values represent income and negative values represent costs. In this scenario, cost, in the form of penalties, could be used to evaluate the sequence of decisions taken, where R(α, a, β) is the immediate utility of moving from product α to product β, under action a. Further, for this mix response flexibility example, the penalties could include a Changeover Time Penalty ($P_T$), which deals with the downtime associated with the changeover duration; and a Production Run Penalty ($P_R$), which deals with either the penalty for lost sales due to under-production or the penalty for carrying excessive inventory due to over-production (see Figure 2).
Ultimately, the MDP, through an evaluation of all the rewards accrued through the actions taken, will identify the policy that produces the optimal reward.
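To make the five elements above concrete, the following is a minimal value iteration sketch for a discounted MDP. It is not the MLHMP software used in the case study, and every number in it (two states, two actions, transition probabilities, penalties of 10 and 60, discount factor 0.9) is hypothetical. Penalties enter as negative rewards, in keeping with the convention that costs are negative.

```python
def q_value(s, a, V, P, R, discount):
    """Expected reward of taking action a in state s, then continuing with values V."""
    return sum(p * (R[(s, a, s2)] + discount * V[s2])
               for s2, p in P[(s, a)].items())


def value_iteration(states, actions, P, R, discount=0.9, tol=1e-9, max_iter=10_000):
    """Repeat the Bellman update until the largest change falls below tol."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_V = {s: max(q_value(s, a, V, P, R, discount) for a in actions)
                 for s in states}
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: q_value(s, a, V, P, R, discount))
              for s in states}
    return V, policy


# Hypothetical example: two product families, two changeover actions.
states = ["alpha", "delta"]
actions = ["label", "size"]
P = {  # P[(state, action)] -> {next_state: probability}; illustrative values only
    ("alpha", "label"): {"alpha": 0.8, "delta": 0.2},
    ("alpha", "size"): {"alpha": 0.3, "delta": 0.7},
    ("delta", "label"): {"alpha": 0.4, "delta": 0.6},
    ("delta", "size"): {"alpha": 0.9, "delta": 0.1},
}
penalty = {"label": 10.0, "size": 60.0}  # assumed penalties per changeover type
R = {(s, a, s2): -penalty[a] for (s, a), row in P.items() for s2 in row}

V, policy = value_iteration(states, actions, P, R)
print(policy, V)
```

Because the assumed penalty here depends only on the action, the label changeover is preferred in both states and each state value converges towards -10 / (1 - 0.9) = -100. With state-dependent rewards, as in a real reward function, the policy can differ by state.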

Applying MDP to a case study
The following sections describe the application of the MDP to the scheduling of production in a chemical manufacturing firm, where an optimal scheduling policy is determined using the Multi-Level Hierarchic Markov Processes (MLHMP) software system, which is detailed in Kristensen (2005, 2010).

Case study background
ChemCo Limited is a chemical processing and packaging company, involved in the manufacture of bleach, caustic soda and chlorine gas. Bleach for domestic usage is its main product, and it is produced as 14 skus. Five of these skus are MTS, while the remaining nine skus are MTO. This bleach product mix comprises five bottle sizes, packaged under four different labels. The product mix, along with processing times per case and the notation used to represent each product in the case study, is given in Table 2.
All 14 products are produced on the same bottle filling line. Because ChemCo Limited produces one type of bleach, the only changeovers required on the bottle filling line involve label changes to be placed on the bottles to signify the specific sku; and bottle-size changes, where a different size of bottle is required for filling. Naturally, once there is a bottle-size change, there is also a label change to signify a new product. Since the label changeover during a bottle-size changeover is performed as an internal event, for the purposes of this paper, we will focus on the time for the bottle-size changeover as it is the longer of the two changes. The changeover time from one label to a second label typically amounts to 10 min, while the changeover time from one bottle-size to a second bottle-size amounts to 60 min.

The set of system states
To determine the product family classification for the system states, the demand volume and demand variability were determined. For demand volume, historical records of weekly demand were analysed. For demand variability, the coefficient of variation (CofV) is determined as follows:

$$
\text{CofV} = \frac{\sigma}{\mu}
$$

where σ is the standard deviation and μ is the mean of weekly demand. Then, as per D'Alessandro and Baveja (2000) and Wilson and Ali (2014), the 80/20 rule was applied, where products with CofV values in the upper 80% were identified as products with high demand variability, and products with CofV values in the lower 20% were identified as products with low demand variability. Figure 3 shows the resultant product family classification in a 2 × 2 matrix, which plots demand variability against demand volume. Products with low demand volume and low demand variability were classified as γ system state, and products with high demand volume and low demand variability were classified as α system state. Products with low demand volume and high demand variability were classified as δ system state, and products with high demand volume and high demand variability were classified as β system state.
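The classification into the four system states can be sketched as follows. This is an illustration only: the cut-off values are passed in as parameters because the paper's thresholds are not reproduced here, mean weekly demand is assumed as the volume measure, and the population standard deviation is assumed for the coefficient of variation. The function names are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(weekly_demand):
    """Standard deviation divided by mean (population form, an assumption)."""
    return pstdev(weekly_demand) / mean(weekly_demand)


def classify_product(weekly_demand, volume_cut, cofv_cut):
    """Map a product onto the 2 x 2 matrix of demand volume and demand variability."""
    high_volume = mean(weekly_demand) >= volume_cut
    high_variability = coefficient_of_variation(weekly_demand) >= cofv_cut
    if high_volume:
        return "beta" if high_variability else "alpha"
    return "delta" if high_variability else "gamma"


# Hypothetical weekly demand histories (cases per week).
print(classify_product([100, 100, 100, 100], volume_cut=50, cofv_cut=0.4))
print(classify_product([10, 30], volume_cut=50, cofv_cut=0.4))
```

The mapping follows the text: high volume with low variability is α, high volume with high variability is β, low volume with low variability is γ, and low volume with high variability is δ.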

The set of available actions
As the changeovers on ChemCo's bottle filling line are label changeovers and bottle-size changeovers, the following four actions capture all possible changeovers:
• Label change after completion of optimal order quantity ($l_o$)
• Label change after completion of sub-optimal order quantity ($l_s$)
• Size change after completion of optimal order quantity ($s_o$)
• Size change after completion of sub-optimal order quantity ($s_s$)

The set of transition probabilities
To determine the set of transition probabilities, which represents the probabilities of occurrence of products that immediately follow on in the production sequence, historical production data was analysed.
The state transition function for ChemCo is given in Figure 4.
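One common way to estimate such probabilities from historical production data is to count how often each state follows another and normalise each row. The sketch below assumes the history is available as an ordered list of system states and, for simplicity, ignores the action taken at each changeover; the sequence shown is invented and is not ChemCo data.

```python
from collections import Counter, defaultdict


def estimate_transitions(history):
    """Relative frequency of each next state, given the current state."""
    counts = defaultdict(Counter)
    for current, nxt in zip(history, history[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Hypothetical production history expressed as system states.
history = ["alpha", "alpha", "gamma", "alpha", "delta", "alpha"]
P_hat = estimate_transitions(history)
print(P_hat)
```

Each row of the estimate sums to one. A state that never appears as a predecessor in the history has no row, so a longer record is needed before every transition can be estimated.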

The set of rewards
The set of rewards used for this mix response flexibility scenario is based on the penalties described in Section 4. The penalty for a label changeover is deemed to be lower than the penalty incurred for a bottle-size changeover, as the setup duration is 10 min for the former and 60 min for the latter. Additionally, while there is no penalty for changeovers after an optimal production run, penalties are incurred when the production run is under or over the optimal production quantity. Figure 5 gives ChemCo's reward function.

Optimal policy
For this mix response flexibility scenario on ChemCo's bottle filling line, an optimal policy was found using the discounting criterion of optimality. The results, using both policy iteration and value iteration, are shown in Table 3, where the optimal policy for α is $l_o$, the optimal policy for γ is $l_o$ and the optimal policy for δ is $s_o$. Figure 6 provides a visual representation of the optimal policy for ChemCo.
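For reference, the discounting criterion is usually expressed through the Bellman optimality equation. The form below is the standard textbook statement rather than one taken from the case study; it reuses the transition probabilities $P_{ij}$ and rewards R(i, a, j) defined in Section 4, written here as action-dependent, and introduces a discount factor $\lambda$ as an assumed symbol:

$$
V^{*}(i) = \max_{a \in A_t} \sum_{j} P_{ij}(a)\,\bigl[ R(i, a, j) + \lambda V^{*}(j) \bigr], \qquad 0 \le \lambda < 1
$$

An optimal policy selects, in each state i, an action that attains the maximum. Policy iteration and value iteration are two ways of solving this equation, which is consistent with both methods returning the same policy in Table 3.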

Simulation of sub-optimal policies
Because of factors such as the composition of the product mix, demand patterns, and internal and external uncertainties, the exclusive use of the optimal policy derived via the MDP would not be feasible to meet all customer requirements for products produced on the ChemCo bottling line. As such, sub-optimal policies would inevitably be employed to meet customer requirements. Using the simulation feature on the MLHMP, various combinations of sub-optimal policies were investigated to determine the penalties that would be incurred by ChemCo.
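The idea of simulating a fixed policy to estimate its penalty can be sketched with a simple Monte Carlo routine. This does not reproduce the MLHMP simulation feature; the transition probabilities, the penalties of 10 and 70, the horizon and the discount factor are all hypothetical, and only two states and two actions are modelled.

```python
import random


def simulate_policy(policy, P, penalty, start, steps=20, discount=0.9, runs=100, seed=0):
    """Average discounted penalty of following a fixed policy from a start state."""
    rng = random.Random(seed)  # seeded for repeatable runs
    total = 0.0
    for _ in range(runs):
        state, weight, run_penalty = start, 1.0, 0.0
        for _ in range(steps):
            action = policy[state]
            run_penalty += weight * penalty[(state, action)]
            row = P[(state, action)]
            # Draw the next state according to the transition probabilities.
            state = rng.choices(list(row), weights=list(row.values()))[0]
            weight *= discount
        total += run_penalty
    return total / runs


# Hypothetical model: l_o = label change after optimal run, s_s = size change after sub-optimal run.
P = {
    ("alpha", "l_o"): {"alpha": 0.7, "gamma": 0.3},
    ("alpha", "s_s"): {"alpha": 0.2, "gamma": 0.8},
    ("gamma", "l_o"): {"alpha": 0.5, "gamma": 0.5},
    ("gamma", "s_s"): {"alpha": 0.6, "gamma": 0.4},
}
penalty = {  # assumed penalty values, not ChemCo's reward function
    ("alpha", "l_o"): 10.0, ("alpha", "s_s"): 70.0,
    ("gamma", "l_o"): 10.0, ("gamma", "s_s"): 70.0,
}

baseline = simulate_policy({"alpha": "l_o", "gamma": "l_o"}, P, penalty, start="alpha")
deviation = simulate_policy({"alpha": "s_s", "gamma": "l_o"}, P, penalty, start="alpha")
print(baseline, deviation)
```

Comparing the two averages gives the extra penalty attributable to the deviation in the α state, which is the kind of comparison the sub-optimal scenarios below rely on.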

Low penalty sub-optimal scenarios
We identified five scenarios where sub-optimal policies resulted in low penalties, where low penalties are classified as amounting to less than 30 for any of the combinations of states and actions (Table 4).

Moderate penalty sub-optimal scenarios
We identified 10 scenarios where sub-optimal policies resulted in moderate penalties, where moderate penalties are classified as being between 30 and 50 for any of the combinations of states and actions (Table 5).

High penalty sub-optimal scenarios
We identified 10 scenarios where sub-optimal policies resulted in high penalties, where high penalties are classified as being over 50 for any of the combinations of states and actions.
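The three penalty bands can be written as a small helper. The thresholds come from the text; treating exactly 30 and exactly 50 as moderate is an assumption, since the text does not state how boundary values are handled, and the function name is hypothetical.

```python
def penalty_band(penalty: float) -> str:
    """Classify a penalty as low (< 30), moderate (30 to 50) or high (> 50)."""
    if penalty < 30:
        return "low"
    if penalty <= 50:  # assumption: both boundary values count as moderate
        return "moderate"
    return "high"


print([penalty_band(p) for p in (12, 30, 45, 50, 75)])
```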

Discussion
The MDP optimal policy result is not surprising, as the rewards resulting from the actions for the three states ($l_o$ for the α state, $l_o$ for the γ state, and $s_o$ for the δ state) represent the lowest penalties that would be incurred. Intuitively, production planners would discern that these actions would involve the lowest changeover downtime. However, as noted above, the exclusive use of the optimal policy is not feasible if customer requirements are to be met. The simulation results for the various scenarios therefore proved useful in terms of understanding the impact of sub-optimal actions.
For the α state, the simulation results show that whenever the s s action is utilised in any combination with the γ state and δ state, the penalty incurred will be high; and so, for the α state, making a bottle-size change after a sub-optimal quantity should be avoided as far as possible. Similarly, the s o action carries heavy penalties, resulting in either moderate or high penalties. The l s and l o actions are more forgiving, resulting in either low or moderate penalties. Figure 7 summarises the results of the 25 simulated scenarios for the sub-optimal actions where the α state is involved. For the s s action, all of the seven scenarios with this action resulted in high penalties. For the s o action, three of the seven scenarios resulted in high penalties, while the remaining four scenarios resulted in moderate penalties. For the l s action, five of the seven scenarios resulted in moderate penalties, while the remaining two scenarios resulted in low penalties. For the l o action, one of the four scenarios resulted in moderate penalties, while the remaining three scenarios resulted in low penalties.
For the γ state, the simulation results also show that if the s s action is to be utilised, then, in order to minimise the penalties, the optimal policy for the α state and δ state should be utilised as far as possible. However, while the s s action results only in high or moderate penalties, all other actions can result in high, moderate or low penalties. The practitioner can therefore consider a range of actions when having to decide from among sub-optimal actions. Figure 8 summarises the results of the 25 simulated scenarios for sub-optimal actions where the γ state is involved. For the s s action, four of the seven scenarios with this action resulted in high penalties, while the remaining three resulted in moderate penalties. For the s o action, three of the seven scenarios resulted in high penalties, three scenarios resulted in moderate penalties, and the final scenario resulted in low penalties. For the l s action, two of the seven scenarios resulted in high penalties, three scenarios resulted in moderate penalties, while the remaining two scenarios resulted in low penalties. For the l o action, one of the four scenarios resulted in high penalties, one resulted in moderate penalties, while the remaining two scenarios resulted in low penalties.
For the δ state, the s s action, combined with the optimal policy of l o for both the α state and the γ state, will result in a low penalty. Further, for the δ state, when the s s action is combined with the s o and s s actions for both the α state and γ state, high penalties result. Therefore, as far as possible, if the s s action is to be utilised, then, in order to minimise the penalties, the optimal policy for the α state and γ state should be utilised. Figure 9 summarises the results of the 25 simulated scenarios for sub-optimal actions where the δ state is involved. For the s s action, five of the 10 scenarios with this action resulted in high penalties, four scenarios resulted in moderate penalties, while the remaining scenario resulted in low penalties. For the s o action, five of the 15 scenarios resulted in high penalties, six scenarios resulted in moderate penalties, and the remaining four scenarios resulted in low penalties.
One of the most useful outputs of the simulation, therefore, was the results obtained via the moderate sub-optimal scenarios. In this category of scenarios, all products in the company's product mix can be manufactured to meet customer requirements.
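The per-action counts reported above for the α, γ and δ states can be cross-checked against the overall totals of five low, 10 moderate and 10 high penalty scenarios. The short sketch below transcribes the counts from the text (with actions written as `s_s`, `s_o`, `l_s` and `l_o`) and sums them per state; the data structure and function name are illustrative choices only.

```python
from collections import Counter

# Scenario counts per state, action and penalty band, as reported in the text.
COUNTS = {
    "alpha": {
        "s_s": {"high": 7},
        "s_o": {"high": 3, "moderate": 4},
        "l_s": {"moderate": 5, "low": 2},
        "l_o": {"moderate": 1, "low": 3},
    },
    "gamma": {
        "s_s": {"high": 4, "moderate": 3},
        "s_o": {"high": 3, "moderate": 3, "low": 1},
        "l_s": {"high": 2, "moderate": 3, "low": 2},
        "l_o": {"high": 1, "moderate": 1, "low": 2},
    },
    "delta": {
        "s_s": {"high": 5, "moderate": 4, "low": 1},
        "s_o": {"high": 5, "moderate": 6, "low": 4},
    },
}


def band_totals(state):
    """Sum the scenario counts per penalty band across a state's actions."""
    totals = Counter()
    for bands in COUNTS[state].values():
        totals.update(bands)  # adds the counts band by band
    return dict(totals)


for state in COUNTS:
    totals = band_totals(state)
    print(state, totals, "total:", sum(totals.values()))
```

Viewed from each of the three states, the counts sum to the same 25 scenarios, split into five low, 10 moderate and 10 high penalty scenarios, which agrees with the scenario counts given in the simulation results.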

Conclusion and future work
The ability to measure mix response flexibility may be key to optimising its achievement. However, its measurement remains elusive. In particular, the three mix response flexibility measurement models, Mean Sensitivity to Change (MSTC), Total Expected Mix Response Flexibility (F) and PMFR, cannot adequately measure mix response flexibility on a shared resource with sequence-dependent changeovers in hybrid MTS/MTO environments.
To address this shortcoming, this paper explored the applicability of the MDP for sequence-dependent changeovers on a shared resource, where we were able to determine the optimal policy for scheduling actions for each product family. Because the exclusive use of the optimal policy would not allow for all customer requirements to be met, we also performed simulations, in which we examined the penalties that would be incurred when sub-optimal actions are required to meet customer requirements.
The practical implications of this work suggest that the MDP could be used to analyse mix response flexibility for sequence-dependent changeovers on shared resources, and provide practitioners with a guide regarding the decisions made in the scheduling of products on a shared resource. For instance, for this particular case study, we found that for products with high demand volume and low demand variability, planners should avoid making bottle-size changeovers after sub-optimal quantities.
The scope for future research on the use of the MDP for analysing mix response flexibility in hybrid MTS/MTO environments includes consideration of the states and the rewards. In this work, for the states, we used product families categorised based on demand volume and variability. Alternative means of defining states may be used. Likewise, in this work, for the rewards, we determined the penalty values informally. In future work, more quantitative means may be used to formally compute penalty values.

Funding
The author received no direct funding for this research.
Author details
Shellyanne Wilson 1
E-mail: shellyanne.wilson@sta.uwi.edu
1 Department of Management Studies, The University of the West Indies, St. Augustine, Trinidad and Tobago.

Citation information
Cite this article as: Mix flexibility optimisation in hybrid make-to-stock / make-to-order environments in process industries, Shellyanne Wilson, Cogent Engineering (2018), 5: 1501866.