Data pooling for multiple single-component systems under population heterogeneity



Introduction
The importance of effective maintenance activities is growing due to the development of complex industrial systems and the increasing reliance on these systems (de Jonge and Scarf, 2020). Advancements in data collection and analysis techniques due to Industry 4.0 provide more opportunities for data-driven maintenance policies (Tortorella et al., 2021). Even though data is collected and stored more easily than in the past, there are still challenges in integrating Industry 4.0 opportunities into maintenance. One specific challenge for data-driven maintenance policies is the scarcity of failure data (Louit et al., 2009). The amount of failure data is scarce especially at the beginning of the life cycle of a new system. In order to increase the number of data points, it is possible to collect data from similar systems, which is referred to as data pooling (Gupta and Kallus, 2022). We approach this problem from the perspective of a maintenance service provider of multiple systems at different locations. Examples of such systems include healthcare equipment located in different hospitals or milking robots located at different farms. These systems are largely identical. When new systems such as lithography systems are introduced, some of the components are used for the first time, and therefore no lifetime data is available at the beginning of the lifespan. Martinetti et al. (2017) describe a case where maintenance service responsibilities of a specific type of train are transferred from the component supplier to the Netherlands Railways. However, the data obtained from the component supplier is fragmented at the beginning of the lifespan. This forms another reason for uncertainty with respect to the lifetime distribution of the components.
When historical data is not available at the beginning of the lifespan for a certain component, the estimation of the lifetime distribution is mainly based on expert opinions and technical data obtained from suppliers (van Wingerden, 2019). However, obtaining detailed data from suppliers can be costly. Suppliers may only share partial information for strategic reasons (Martinetti et al., 2017). It is also not straightforward to obtain the exact lifetime distribution from technical specifications. These problems can lead to heterogeneous expert opinions on the true lifetime distribution. The difference in the characteristics of the components is referred to as population heterogeneity in the literature. In this paper, we assume that a component is always supplied from the same population (see de Jonge et al., 2015; Dursun et al., 2022). Please note that in the literature, population heterogeneity may also imply that there is an unknown mixture of populations for the components, where in each replacement, a component is supplied from any of these populations (see van Oosterom et al., 2017; Abdul-Malak et al., 2019). In practice, the characteristics of the components can be heterogeneous, reflecting varying quality from different suppliers, production at different manufacturing lines, or different causes of failure (Jiang and Jardine, 2007). For example, parts that are printed with two different printing options can have different reliability levels (Lolli et al., 2022).
In this paper, we assume that there are two different expert opinions regarding the lifetime distribution, where one of them corresponds to the true lifetime distribution. We represent this with two types of populations: a weak and a strong population. The assumption is that the components always come from the same population. However, the true population type is unknown. We resolve this uncertainty regarding the true population type by pooling the data collected from different systems.
The concept of population uncertainty represents one type of failure-model uncertainty. In the literature, another type of failure-model uncertainty is parameter uncertainty, where it is assumed that the parameters of a failure model are unknown (see Drent et al., 2020; van Staden et al., 2022; Deprez et al., 2022; Walter and Flapper, 2017; Fouladirad et al., 2018). There are two approaches to resolve failure-model uncertainty by collecting historical data. The frequentist approach only uses the historical data to estimate the unknown parameters without assuming any prior information on these parameters. On the other hand, the Bayesian approach allows incorporating a prior belief and can update this belief with the available data (Powell and Ryzhov, 2012).
In our paper, an initial belief on the true type of population is available. This belief is updated in each time period by using Bayes' rule with the data pooled from multiple systems. The Bayesian learning approach is suitable for our problem because we consider systems at the beginning of their lifespan and assume that historical data does not exist or is very scarce. Bayesian learning has been used for multiple maintenance problems; see van Oosterom et al. (2017), Droguett and Mosleh (2008), de Jonge et al. (2015), Dayanik and Gürler (2002), Elwany and Gebraeel (2008), and Walter and Flapper (2017). In this paper, we build a discrete-time partially observable Markov decision process (POMDP) model to find the optimal replacement policy for a single component that occurs in multiple identical systems, with the objective of minimizing the total cost during the whole lifespan. This is a sequential learning problem for which we apply a solution approach (POMDP) that optimally balances the trade-off between exploration and exploitation (see Dursun et al., 2022 for more details). Exploration means deliberately taking possibly costly actions with the aim of resolving uncertainty now, so that cost reductions can be obtained in the future. Exploitation aims to minimize the short-term cost without considering the effect of an action in the long term (see Powell and Ryzhov, 2012; Dezza et al., 2017).
We position our research in the area of optimal learning with (partially observable) Markov decision process models under failure-model uncertainty. To the best of our knowledge, the number of studies in this area is limited; see van Oosterom et al. (2017), Abdul-Malak et al. (2019), Drent et al. (2020), Dursun et al. (2022), van Staden et al. (2022) and Drent and van Houtum (2022). A classification of these studies is provided in Table 1. Here, we classify the papers first according to the source of failure-model uncertainty, which can be either parameter uncertainty or population heterogeneity. Then, we classify the papers according to the number of systems that they consider, which can be either a single system or multiple systems. Finally, we classify the papers based on the type of their failure-models. By a ''time-to-failure'' failure-model, we mean that there are only two degradation states (i.e., good-as-new and failed), where the time until failure can follow any probability distribution. On the other hand, a ''degradation'' failure-model refers to a situation with more than two degradation states (i.e., not just good-as-new and failed but also intermediate states such as defective) and a Markov chain modeling the transition from one degradation state to the next.
Our paper comes closest to the paper of Dursun et al. (2022), which also considers population heterogeneity under the assumption of a time-to-failure model. However, in this paper, we consider multiple systems with data pooling, while Dursun et al. (2022) only consider a single system. Among the works that consider parameter uncertainty, Drent et al. (2020) also focus on an age-based replacement policy for a single system and follow a Bayesian approach to learn an unknown parameter of a specific form of probability distribution for the lifetime of components. Similar to our work, Drent and van Houtum (2022) apply data pooling from multiple systems in a Bayesian way to resolve the failure-model uncertainty. However, the source of uncertainty in their failure-model is parameter uncertainty. By focusing on the case with parameter uncertainty and a time-to-failure model, van Staden et al. (2022) minimize the expected maintenance costs of a system over its finite contract period and prescribe scheduled preventive maintenance interventions based on a frequentist 'first predict, then optimize' approach.
Another related study is Deprez et al. (2022). This paper is not positioned in Table 1 because its analysis is not based on optimal policies. However, Deprez et al. (2022) is related to our work in its objective to build a myopic-type data-driven maintenance policy for multiple systems by using the historical data coming from different systems. Deprez et al. (2022) aim to optimize the number of preventive maintenance interventions for a machine during a finite time horizon. Different from our work, they assume system heterogeneity (i.e., data is pooled from machines operating in non-identical conditions) and adopt a frequentist estimation approach. In our paper, we resolve failure-model uncertainty with Bayesian updating and focus on the optimal policy by building a POMDP model.
In this research, we investigate the potential benefit of learning the true population type from multiple similar systems at the same time. We define three policies in order to quantify this benefit and generate further managerial insights. Specifically, Policy I is the optimal policy for multiple systems with data pooling. Policy II applies the policy that is known to be optimal in a single-system setting while pooling the data from multiple systems (i.e., updating the belief regarding the true population type by using all the data coming from multiple systems). Policy III also uses the optimal policy for a single-system setting but without data pooling (in this case, the optimal policy derived in Dursun et al. (2022) is used for each system).
We address the following research questions: (1) What does the structure of the optimal policy for a multi-system setting with data pooling (Policy I) look like, and how does it differ from the optimal policy for a single-system setting? (2) What is the cost benefit of pooling data from multiple systems? (3) What is the additional cost benefit of jointly optimizing the replacement decisions for multiple systems? (4) What is the effect of the number of systems, the length of the lifespan, the coefficient of variation of the time-to-failure distribution, the cost of maintenance activities, and the initial belief on the costs? To the best of our knowledge, our paper is the first to study how the true population type can be learned optimally in a setting with multiple systems and population heterogeneity. In our numerical experiments, we show that the cost reduction due to data pooling is up to 5.6% for two systems and up to 14.8% for 20 systems. The majority of the cost reduction is due to data pooling, and a relatively small part is due to optimizing the preventive replacement decisions for multiple systems jointly. As the number of systems that pool the data increases, the cost of maintenance converges to the cost of maintenance under perfect information about the true population type. Also, the reduction in the cost per system as a function of the number of systems is higher for low numbers of systems and becomes smaller as the number of systems increases.
The remainder of this paper is organized as follows. In Sections 2 and 3, the problem description and the mathematical formulation of the POMDP model are provided, respectively. Section 4 presents the benchmark policies that will be compared against the optimal policy of the POMDP model. Section 5 provides structural results for a special case and presents insights based on this special case. Section 6 presents the results and insights from our computational experiments. Finally, Section 7 concludes the paper.

Problem description
We consider n single-component systems. Let i ∈ {1, …, n} denote the index of a system. It is known that the systems are taken out of service at the same time, and we refer to the time until that moment as the lifespan of the systems. The time horizon of the problem is set equal to the lifespan of the systems, and we let this time horizon consist of discrete time periods of equal length. Without loss of generality, we scale time such that the length of each time period is one time unit. The length of the time horizon is expressed in the number of time periods and is equal to T ∈ N, where N is the set of positive integers. Each system has a critical component which fails randomly and independently of the components in the other systems. If a failure occurs during the k-th time period after the installation of a new component, then the component is replaced at the end of that time period and we say the lifetime of the component is k. We let X denote the corresponding discrete random variable for the lifetime of the component. At the beginning of each time period, an action is taken for each system, which is either to replace the component preventively at cost c_p or to do nothing at no cost. If a component fails before reaching the next time period, then it is correctively replaced at cost c_c. If a component is replaced preventively at the beginning of a period, the system starts that period with a new component. It holds that c_p < c_c because the cost of corrective replacement includes the costs associated with a breakdown in addition to the costs related to a replacement. When a system reaches the end of the lifespan, the maintenance activities are terminated and no cost occurs at this moment or later because all systems go out of service at that moment.
We assume that there are two populations for the components: a weak and a strong population. The components always come from the same population. However, the true population type is unknown. We let b denote the belief that the components belong to the weak population (i.e., the probability that the components always come from the weak population), and p denotes the initial belief at the beginning of the lifespan.
Let X^(j) denote the lifetime random variable for component type j, where j = 1 refers to the weak type and j = 2 refers to the strong type. Therefore, under the belief variable b, the lifetime random variable X satisfies P(X > t) = b P(X^(1) > t) + (1 − b) P(X^(2) > t).
For each population, the time-to-failure distribution is assumed to be known.The objective of the decision maker is to determine the optimal replacement policy that minimizes the expected total cost over the time horizon.
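As an illustration, the mixture relation above can be sketched as follows. The two lifetime pmfs below are hypothetical placeholders for a weak and a strong population, not the distributions used later in the paper.

```python
# Sketch: the survival function of the lifetime X under a belief b that the
# components are weak, i.e. P(X > t) = b*P(X1 > t) + (1-b)*P(X2 > t).
def survival(pmf, t):
    """P(X > t) for a discrete lifetime pmf indexed from age 1."""
    return sum(p for age, p in pmf.items() if age > t)

pmf_weak = {1: 0.2, 2: 0.5, 3: 0.3}     # hypothetical weak population
pmf_strong = {2: 0.1, 4: 0.4, 6: 0.5}   # hypothetical strong population

def mixture_survival(b, t):
    """Survival probability of the mixture under belief b in the weak type."""
    return b * survival(pmf_weak, t) + (1 - b) * survival(pmf_strong, t)
```

With b = 1 the mixture reduces to the weak population, and with b = 0 to the strong one; intermediate beliefs interpolate linearly between the two survival curves.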

Mathematical formulation
In this section, we provide a mathematical formulation of the problem described in Section 2.
Decision epochs: A decision is made at the beginning of each time period. We let t ∈ {0, 1, 2, …, T} denote the number of time periods remaining in the time horizon. Note that t = 0 corresponds to the end of the time horizon (i.e., the moment the final time period is completed).
States: At each decision epoch, the state is described as follows: b ∈ [0, 1] is the current belief that the component population is of the weak type, and y_i ∈ {0, 1, …} denotes the age of the component in system i. Let y represent the age vector for the components of all the systems: y = (y_1, …, y_n). The state of the model is represented by (b, y).
Actions: At each decision epoch and in any state, there are two possible actions for each system: 'do a preventive replacement' and 'do nothing'. These actions are denoted by a_i = 1 and a_i = 0, respectively, for system i. Let a denote the action vector (a_1, …, a_n), representing the actions for all the systems. Let A denote the set of all possible action vectors. (Please note that Dursun et al. (2022) formulate the replacement decisions as a scheduling problem; i.e., the actions describe when to perform the preventive replacement of a component at the moment that a new component starts its operation in the system. That way of describing the actions would lead to a much larger action space than the alternative used in this paper.) State Transitions & Rewards: Suppose that the do-nothing action is taken for system i (i.e., a_i = 0). The system starts the time period with the existing component, and there are two possibilities: either the component fails (denoted by o_i = 1) or it stays in working condition (denoted by o_i = 0). The components in all systems are subject to the same time-to-failure distribution, but failures occur independently. The time-to-failure of system i is represented by the random variable X_i^(j). In case of a failure, which occurs with probability P(X_i^(j) ≤ y_i + 1 | X_i^(j) > y_i) for population type j, the component installed in system i is replaced correctively at cost c_c and the next period starts with a new component at age 0. If there is no failure, which occurs with probability P(X_i^(j) > y_i + 1 | X_i^(j) > y_i) = P(X_i^(j) > y_i + 1) / P(X_i^(j) > y_i) for population type j, the age of the component in system i increases by one.
Next, suppose that the replace action is taken for system i (i.e., a_i = 1). The replacement is immediate at cost c_p, and the system starts the current time period with a new component at age zero. Similar to the case with the do-nothing action, the component either fails (denoted by o_i = 1) or it stays in working condition (denoted by o_i = 0) in that period. If the component in system i fails, which occurs with probability P(X_i^(j) ≤ 1) for population type j, the component is replaced correctively at cost c_c and the next period starts with a new component at age 0. On the other hand, if the component does not fail, which occurs with probability P(X_i^(j) > 1) for population type j, the age of the component in system i increases by one.
The updated age vector ŷ = (ŷ_1, …, ŷ_n) under action a will be ŷ = y • (1 − a), with '•' denoting the element-wise multiplication (Hadamard product) of two vectors and 1 denoting a vector with n elements that are all equal to 1. We let o = (o_1, …, o_n) denote the vector of observations for all the systems. Note that there are 2^n possible distinct realizations of o. Let O = {o^1, …, o^(2^n)} denote the set of all these possible realizations. The probability that observation o occurs when the state is (b, ŷ) is denoted by P(b, ŷ, o) = b L_1(ŷ, o) + (1 − b) L_2(ŷ, o), where the expression L_j(ŷ, o), which represents the likelihood of observing o at the age vector ŷ for population type j, can be calculated as L_j(ŷ, o) = ∏_{i=1}^{n} P(X_i^(j) ≤ ŷ_i + 1 | X_i^(j) > ŷ_i)^{o_i} P(X_i^(j) > ŷ_i + 1 | X_i^(j) > ŷ_i)^{1−o_i} for j ∈ {1, 2}. At the end of each period (i.e., after the realization of o), Bayes' rule can be used to update the belief variable b. Specifically, the updated belief variable, which we denote as the function g(b, ŷ, o), is given by g(b, ŷ, o) = b L_1(ŷ, o) / (b L_1(ŷ, o) + (1 − b) L_2(ŷ, o)). Finally, the age vector ŷ is updated to (ŷ + 1) • (1 − o).
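The pooled likelihood and the Bayesian belief update can be sketched as follows. The two lifetime pmfs are hypothetical placeholders, and ages beyond the support of a pmf are treated as failing with certainty.

```python
# Hypothetical two-point populations used only for illustration.
pmf_weak = {1: 0.2, 2: 0.5, 3: 0.3}
pmf_strong = {2: 0.1, 4: 0.4, 6: 0.5}

def hazard(pmf, y):
    """P(fail in next period | alive at age y); ages past the support fail surely."""
    s = lambda t: sum(p for a, p in pmf.items() if a > t)
    return 1.0 if s(y) == 0 else 1.0 - s(y + 1) / s(y)

def likelihood(pmf, ages, obs):
    """L_j(y_hat, o): product over systems of failure/survival probabilities."""
    L = 1.0
    for y, o in zip(ages, obs):
        h = hazard(pmf, y)
        L *= h if o == 1 else 1.0 - h
    return L

def bayes_update(b, ages, obs):
    """g(b, y_hat, o) = b L_1 / (b L_1 + (1 - b) L_2)."""
    l1 = likelihood(pmf_weak, ages, obs)
    l2 = likelihood(pmf_strong, ages, obs)
    return b * l1 / (b * l1 + (1 - b) * l2)
```

In this toy instance only a weak component can fail at age 0, so observing a failure of a new component pushes the belief in the weak type to one, while a period without failures moves it downward.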
Bellman Optimality Equations: Let V_t(b, y) denote the optimal cost until the end of the time horizon with t time periods remaining in the time horizon given that the current state is (b, y). It holds that V_0(b, y) = 0 for all b ∈ [0, 1] and y_i ∈ {0, 1, …} with i ∈ {1, …, n}. The Bellman optimality equations are given by

V_t(b, y) = min_{a ∈ A} { c_p ∑_{i=1}^{n} a_i + ∑_{o ∈ O} P(b, ŷ, o) [ c_c ∑_{i=1}^{n} o_i + V_{t−1}( g(b, ŷ, o), (ŷ + 1) • (1 − o) ) ] },  (2)

with ŷ = y • (1 − a), for all t ∈ {1, …, T}, b ∈ [0, 1] and y_i ∈ {0, 1, …} with i ∈ {1, …, n}. In the rest of the paper, the function V_t(b, y) in Eq. (2) is also referred to as the value function. We refer to the resulting optimal policy for this multi-system setting with data pooling (i.e., observations collected from all systems are used to update the belief state) as Policy I. We denote the actions under this policy by a*_I(b, y, t) for a given (b, y, t). The algorithm to solve the Bellman equations is provided in Appendix B.
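A minimal sketch of this value recursion is given below, using brute-force enumeration over the action and observation vectors. The costs and pmfs form an illustrative toy instance, not the paper's base instance, and a practical implementation would discretize the belief space as described in Appendix B rather than recurse on exact beliefs.

```python
from itertools import product

# Illustrative two-population instance (not the paper's base instance).
CP, CC = 0.1, 1.0
PMFS = {1: {1: 0.5, 2: 0.5},   # hypothetical weak lifetime pmf
        2: {2: 0.5, 3: 0.5}}   # hypothetical strong lifetime pmf

def haz(j, y):
    """Failure probability in the next period for type j at age y."""
    s = lambda t: sum(p for a, p in PMFS[j].items() if a > t)
    return 1.0 if s(y) == 0 else 1.0 - s(y + 1) / s(y)

def lik(j, ages, obs):
    """Likelihood L_j of observation vector obs at age vector ages."""
    L = 1.0
    for y, o in zip(ages, obs):
        h = haz(j, y)
        L *= h if o else 1.0 - h
    return L

def V(t, b, ages):
    """Optimal expected cost with t periods left in belief-age state (b, ages)."""
    if t == 0:
        return 0.0
    n, best = len(ages), float("inf")
    for act in product((0, 1), repeat=n):           # action vector a in A
        yhat = tuple(y * (1 - a) for y, a in zip(ages, act))
        cost = CP * sum(act)
        for obs in product((0, 1), repeat=n):       # observation vector o in O
            l1, l2 = lik(1, yhat, obs), lik(2, yhat, obs)
            prob = b * l1 + (1 - b) * l2            # P(b, yhat, o)
            if prob == 0.0:
                continue
            bnew = b * l1 / prob                    # Bayes update g
            ynew = tuple((y + 1) * (1 - o) for y, o in zip(yhat, obs))
            cost += prob * (CC * sum(obs) + V(t - 1, bnew, ynew))
        best = min(best, cost)
    return best
```

The enumeration over 2^n actions and 2^n observations per period mirrors Eq. (2) directly; it is only feasible for very small n and t.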
Remark 1. Please note that we assumed two populations in this section and Section 2. It would also be possible to assume m distinct populations that our components might be coming from. Then the state description becomes (b, y), where b = (b_1, …, b_m) and ∑_{j=1}^{m} b_j = 1. That is, the belief state becomes a vector instead of a single parameter (i.e., each entry of the vector corresponds to the belief of having a certain population type). The Bayesian update function then also returns a vector, with j-th entry g_j(b, ŷ, o) = b_j L_j(ŷ, o) / ∑_{l=1}^{m} b_l L_l(ŷ, o).

Benchmark policies
In this section, we provide the benchmark policies to compare against Policy I and each other in order to address the research questions introduced in Section 1. We start in Section 4.1 with the optimal policy when the true population type is certainly known. Sections 4.2 and 4.3 describe the benchmark policies obtained by using the optimal policy of a single system without and with data pooling, respectively.

Perfect information setting
We consider a setting where the true population type is assumed to be known, referred to as the perfect information setting. The difference between the cost obtained under the perfect information setting and the optimal cost under Policy I constitutes a baseline for the maximum amount the decision maker is willing to pay to resolve the population heterogeneity. We formulate an MDP model to determine the optimal policy in this setting. Note that the belief state is no longer needed since the true population type is known, and only the ages of the components in the systems are considered as the state variables. Let V_t^(j)(y) denote the optimal cost until the end of the time horizon with t time periods remaining in the time horizon given that the current state is y for population type j. We present the Bellman optimality equations for a single system because an optimal policy for each system can be calculated independently; therefore, it holds that V_t^(j)(y) = ∑_{i=1}^{n} V_t^(j)(y_i). The Bellman optimality equations for a single system are given by

V_t^(j)(y) = min{ P(X^(j) ≤ y + 1 | X^(j) > y)(c_c + V_{t−1}^(j)(0)) + P(X^(j) > y + 1 | X^(j) > y) V_{t−1}^(j)(y + 1),  c_p + P(X^(j) ≤ 1)(c_c + V_{t−1}^(j)(0)) + P(X^(j) > 1) V_{t−1}^(j)(1) },

where V_0^(j)(y) = 0 for j ∈ {1, 2} and y ∈ {0, 1, …}. We introduce a function Π_t(p, y) defined as

Π_t(p, y) = p ∑_{i=1}^{n} V_t^(1)(y_i) + (1 − p) ∑_{i=1}^{n} V_t^(2)(y_i)

for p ∈ [0, 1] and y_i ∈ {0, 1, …} with i ∈ {1, …, n}. The function Π_t(p, y) can be interpreted as follows. Suppose that the true component type is unknown and an initial belief p is available at the beginning of the lifespan, as in the original problem. However, different from the original problem, suppose that the true component type is immediately revealed to the decision maker just before the lifespan starts, and the systems are operated over their entire lifespan by following the optimal
replacement policy corresponding to that true population type. The function Π_t(p, y) represents the expected cost under this scenario just before the true population type is revealed to the decision maker. Consequently, the function V_t(p, y) − Π_t(p, y) can be interpreted as the expected benefit of resolving the uncertainty in the true population type (or the cost of not knowing the true population type) at the beginning of the lifespan under the initial belief p and age vector y.
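The perfect-information recursion and the function Π can be sketched for a single system as follows. The lifetime pmfs and costs are illustrative placeholders, not the paper's distributions.

```python
# Sketch of the perfect-information MDP for one system, plus the function
# Pi(p, t, y) = p*V(1, t, y) + (1 - p)*V(2, t, y).
PMFS = {1: {1: 0.5, 2: 0.5}, 2: {2: 0.5, 3: 0.5}}  # hypothetical pmfs
CP, CC = 0.1, 1.0

def surv(j, t):
    """P(X^(j) > t)."""
    return sum(p for a, p in PMFS[j].items() if a > t)

def V(j, t, y):
    """Optimal cost for known population type j, t periods left, age y."""
    if t == 0:
        return 0.0
    h = 1.0 - surv(j, y + 1) / surv(j, y)          # one-period failure prob.
    tail = V(j, t - 1, y + 1) if h < 1.0 else 0.0  # guard ages past the support
    do_nothing = h * (CC + V(j, t - 1, 0)) + (1 - h) * tail
    h0 = 1.0 - surv(j, 1) / surv(j, 0)             # hazard of a new component
    rep_tail = V(j, t - 1, 1) if h0 < 1.0 else 0.0
    replace = CP + h0 * (CC + V(j, t - 1, 0)) + (1 - h0) * rep_tail
    return min(do_nothing, replace)

def Pi(p, t, y):
    """Expected cost if the true type is revealed just before operation."""
    return p * V(1, t, y) + (1 - p) * V(2, t, y)
```

Because the type is known, no belief state appears in V, and the multi-system cost is simply the sum of the single-system values over the age vector.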

Single-system optimal policy without data pooling (Policy III)
For convenience of presentation, we introduce the single-system optimal policy without data pooling (Policy III) before introducing the single-system optimal policy with data pooling (Policy II), as the optimal actions of Policy III will be used by Policy II.
For Policy III, we assume that the true population type is learned without data pooling. This means that each system is considered in isolation from the others, so that the replacement decision for the component in a system is independent of the other systems and the belief on the true population type is updated by only using the data collected from that particular system. Thus, each system can be analyzed separately, and we formulate the value function for a single system. The age of the component in the system is y, the action space is {0, 1}, and the set of observations reduces to {0, 1}. We define the Bayesian update function for this policy as follows:

g(b, y, o) = b ℓ_1(y, o) / (b ℓ_1(y, o) + (1 − b) ℓ_2(y, o)),

where ℓ_j(y, 1) = P(X^(j) ≤ y + 1 | X^(j) > y) and ℓ_j(y, 0) = P(X^(j) > y + 1 | X^(j) > y) for j ∈ {1, 2}. The Bellman optimality equations are given by

V_t^III(b, y) = min_{a ∈ {0, 1}} { c_p a + h(b, ŷ)(c_c + V_{t−1}^III( g(b, ŷ, 1), 0)) + (1 − h(b, ŷ)) V_{t−1}^III( g(b, ŷ, 0), ŷ + 1) },

for all t ∈ {1, …, T}, where ŷ = y(1 − a), h(b, ŷ) = b ℓ_1(ŷ, 1) + (1 − b) ℓ_2(ŷ, 1), and V_0^III(b, y) = 0 for all b ∈ [0, 1] and y ∈ {0, 1, …}. The optimal policy that is obtained via these Bellman equations is referred to as Policy III. We denote the optimal action under Policy III for state (b, y) and the remaining number of time periods t with a*_III(b, y, t). We also denote a*_III(b, y, t) = (a*_III(b, y_1, t), …, a*_III(b, y_n, t)) as the optimal policy vector for n systems for a given (b, y) under Policy III.
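The single-system recursion above can be sketched as follows; the two pmfs are illustrative placeholders, not the paper's lifetime distributions, and ages beyond a pmf's support are treated as failing with certainty.

```python
# Sketch of the Policy III value recursion for one isolated system.
PMFS = {1: {1: 0.5, 2: 0.5}, 2: {2: 0.5, 3: 0.5}}  # hypothetical pmfs
CP, CC = 0.1, 1.0

def haz(j, y):
    """Failure probability in the next period; ages past the support fail surely."""
    s = lambda t: sum(p for a, p in PMFS[j].items() if a > t)
    return 1.0 if s(y) == 0 else 1.0 - s(y + 1) / s(y)

def g(b, y, o):
    """Single-observation Bayes update of the belief in the weak type."""
    l1 = haz(1, y) if o else 1.0 - haz(1, y)
    l2 = haz(2, y) if o else 1.0 - haz(2, y)
    return b * l1 / (b * l1 + (1 - b) * l2)

def V3(t, b, y):
    """Policy III value: optimal cost for one system with belief b and age y."""
    if t == 0:
        return 0.0
    best = float("inf")
    for a in (0, 1):              # 0 = do nothing, 1 = preventive replacement
        yh = y * (1 - a)
        h = b * haz(1, yh) + (1 - b) * haz(2, yh)   # mixture failure probability
        cost = CP * a
        if h > 0.0:               # failure branch: corrective cost, age resets
            cost += h * (CC + V3(t - 1, g(b, yh, 1), 0))
        if h < 1.0:               # survival branch: age increases by one
            cost += (1 - h) * V3(t - 1, g(b, yh, 0), yh + 1)
        best = min(best, cost)
    return best
```

Compared with the multi-system recursion of Section 3, the observation set collapses from 2^n vectors to the two scalar outcomes, which is what makes the single-system problem cheap to solve.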

Single-system optimal policy with data pooling (Policy II)
In this section, we describe our benchmark policy (called Policy II) that is obtained by applying the optimal action of Policy III in each system independently from the other systems, while updating the belief variable by using the data collected from all the systems. We let a_II(b, y_i, t) denote the action taken by Policy II at state (b, y_i) and remaining number of time periods t. Thus, it follows that a_II(b, y_i, t) = a*_III(b, y_i, t) for each system i. The value function under this policy is given as

V_t^II(b, y) = c_p ∑_{i=1}^{n} a_II(b, y_i, t) + ∑_{o ∈ O} P(b, ŷ, o) [ c_c ∑_{i=1}^{n} o_i + V_{t−1}^II( g(b, ŷ, o), (ŷ + 1) • (1 − o) ) ],

with ŷ_i = y_i (1 − a_II(b, y_i, t)), and V_0^II(b, y) = 0 for all b ∈ [0, 1] and y_i ∈ {0, 1, …} for i ∈ {1, …, n}. Notice that Policy II follows exactly the same actions as Policy III; however, it uses the data collected from all the systems in a period to update the belief variable at the end of that period.

Structural results for a special case with deterministic lifetimes
In this section, we introduce a special case with a deterministic lifetime distribution for each population. For this special case, we can derive analytical results because the population type is learned perfectly after one 'do nothing' action for a component with an age that is one time unit less than the deterministic lifetime of a weak component (in that case, one finds out in the upcoming time period whether the component is weak or strong). We limit the number of scenarios such that at most two replacements would be needed for a weak component and one replacement for a strong component. The proofs of the propositions in this section can be found in Appendix A.
Assumption 1. We assume that we have n (> 1) systems. A component that comes from the weak population fails at age k (≥ 4) and a component from the strong population fails at age 2k. We let 2k < T < 3k − 3. The systems are newly installed; therefore, y = 0 at the beginning of the time horizon. Finally, we assume that p = 0.5 and 2c_p < c_c.
We introduce a new policy to examine the effect of joint optimization on the total expected cost.
Definition 1 (Perfect Learning Policy). We define a 'perfect learning policy' as follows: when the initial components reach age k − 1 (i.e., at t = T − (k − 1)), we apply the 'do nothing' action for one of the systems and we apply the 'do a preventive replacement' action for all other systems.
Under this policy, we learn the true population type perfectly at t = T − k, and we limit the risk of failure (i.e., the high cost of corrective maintenance) to only one system (e.g., risking failure for only one milking machine). After that, the optimal policy is applied.
Proposition 1. The total expected cost under the perfect learning policy for n systems is [(3/2)n − 1/2] c_p + (1/2) c_c.

We see that the preventive cost part, [(3/2)n − 1/2] c_p, increases linearly with the number of systems, but the corrective cost part, (1/2) c_c, is constant in n and is shared among all systems. From t = T − k until the end of the lifespan, the optimal policy can be followed for either the weak or the strong population.
Proposition 2. The total expected cost under Policy III for n systems is 2n c_p.
The total expected cost under Policy III thus increases linearly with the number of systems.

Proposition 3. It is never optimal to apply the 'do nothing' action to more than one system when the initial components reach age k − 1.
Proposition 3 shows that risking a failure is not desirable for more than one system (e.g., risking the failure of two milking machines) in this special case. Comparing Propositions 1 and 2, the perfect learning policy has a lower total expected cost than Policy III if and only if n c_p > c_c − c_p, i.e., n > c_c/c_p − 1. This implies that exploration is only beneficial for large enough values of n. As n goes to infinity, the relative cost difference between Policy III and the perfect learning policy goes to 1/3. For very large values of n, the relative cost increase when using the non-optimal Policy III instead of the optimal Policy I (the perfect learning policy in that case) can thus approach 1/3.
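The arithmetic behind this comparison can be checked exactly. The cost values below are hypothetical and only satisfy the assumption 2c_p < c_c; the two cost expressions are taken from Propositions 1 and 2.

```python
from fractions import Fraction

# Hypothetical costs with 2*cp < cc, as required by Assumption 1.
cp, cc = Fraction(1), Fraction(3)

def cost_perfect_learning(n):
    # Proposition 1: [(3/2)n - 1/2]*cp + (1/2)*cc
    return (Fraction(3, 2) * n - Fraction(1, 2)) * cp + cc / 2

def cost_policy_iii(n):
    # Proposition 2: 2*n*cp
    return 2 * n * cp

def crossover():
    # Policy III becomes costlier than the perfect learning policy once
    # n*cp > cc - cp, i.e. n > cc/cp - 1.
    return cc / cp - 1

def relative_increase(n):
    # Relative cost increase of Policy III over the perfect learning policy.
    return (cost_policy_iii(n) - cost_perfect_learning(n)) / cost_perfect_learning(n)
```

With these costs the two policies break even at n = cc/cp − 1 = 2, Policy III is strictly costlier for any larger n, and the relative increase approaches 1/3 as n grows.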

Numerical results
In this section, we present our numerical study to address the research questions introduced in Section 1. First, we determine a base instance, which we introduce in Section 6.1. We address research question (1) in Section 6.2 by showing the structure of Policy I and comparing it against the structure of Policy III. In Section 6.3, we compare the total expected cost per system under each policy for a test bed of 36 instances in order to answer research questions (2) and (3). Finally, we answer research question (4) in Section 6.4 by executing a sensitivity analysis to study how the total expected cost per system and the relative cost differences between policies change with respect to the input parameters n, T, p, c_p, and the coefficient of variation of the time-to-failure distribution.

Base instance
For our numerical analysis, we assume that the lifetime random variable of each population type has a right-truncated discrete Weibull distribution. This distribution is derived from the well-known continuous Weibull distribution with scale parameter λ and shape parameter k by first truncating it at a value M and then defining the probability mass function on the ages 1, …, M. We assume that the shape parameter is the same for both population types but the scale parameters are different. The scale parameter λ is equal to 6 for the weak type and equal to 12 for the strong type. For the shape parameter, we select k = 5. The value of M is chosen equal to 22; it holds that exp[−(t/λ)^k] becomes negligible for t > 22 under all the selected values of λ and k. The mean and coefficient of variation of the lifetime of the weak type are equal to 6.009 and 0.215, respectively. For the strong type, these values are equal to 11.518 and 0.221. This completes the description of the lifetime distributions for the base instance. We set the other input parameters as n = 2, T = 75, c_p = 0.1, c_c = 1 and p = 0.5 in the base instance. Notice that for c_p and c_c, only their relative difference matters. Hence, we can choose c_c = 1 during all experiments w.l.o.g.
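One plausible construction of this right-truncated discrete Weibull, which assigns to each age the survival mass lost over that period and reproduces the means and coefficients of variation reported above, can be sketched as follows. The exact pmf definition used in the paper may differ.

```python
import math

def trunc_discrete_weibull_pmf(lam, k, M):
    """P(X = t) proportional to S(t-1) - S(t) for t = 1..M, where
    S(t) = exp(-(t/lam)**k) is the continuous Weibull survival function."""
    S = lambda t: math.exp(-((t / lam) ** k))
    Z = 1.0 - S(M)                          # mass kept after truncation at M
    return [(S(t - 1) - S(t)) / Z for t in range(1, M + 1)]

def mean_cv(pmf):
    """Mean and coefficient of variation of a pmf on ages 1..M."""
    mean = sum(t * p for t, p in enumerate(pmf, start=1))
    var = sum(t * t * p for t, p in enumerate(pmf, start=1)) - mean ** 2
    return mean, math.sqrt(var) / mean

mean_w, cv_w = mean_cv(trunc_discrete_weibull_pmf(6, 5, 22))    # weak type
mean_s, cv_s = mean_cv(trunc_discrete_weibull_pmf(12, 5, 22))   # strong type
```

Evaluating this construction gives a mean of about 6.009 and CV of about 0.215 for the weak type, and about 11.518 and 0.221 for the strong type, matching the values above.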

The structure of Policy I and its comparison against Policy III
In this section, we answer the first research question by investigating the structure of the optimal policy (Policy I) and comparing it against the structure of Policy III. Let a*_I(b, y, t) denote the optimal action under Policy I for state variables b and y and remaining lifespan t.
In Fig. 1, we see how the optimal policy structure changes with respect to b and y_1 for some fixed values of y_2. The optimal policy for the first system is not affected by the age of the second system for most values of b and y_1 (the same behavior has also been observed for all other values of t). As b increases, the optimal action may become 'replace' because the time-to-failure is stochastically smaller for the weak population. Similarly, as y_1 increases, the optimal action becomes 'replace'. This is because of the increasing failure rate of the time-to-failure distribution, making a failure more likely as the age of a component increases.
In Fig. 2, we fix the value of b and observe the change in the optimal policy with respect to y_1 and y_2. Note that Fig. 2(b) corresponds to the base instance with initial belief 0.5. We see that the optimal action is almost symmetrical in the two systems. For small values of the age of a component, the optimal action is 'do nothing'. Beyond an age limit, it becomes 'replace'. Additionally, for larger values of b, the 'do nothing' area becomes smaller and the 'replace' area becomes larger.
In Fig. 3, the change in the structure of the optimal actions of Policy I with respect to T is shown for a particular belief state (i.e., when the belief state is equal to 0.5 for each T value). We observe that the optimal actions are the same for T ∈ {50, 60, 75} (see Fig. 2 for T = 75). For T = 20, only a small area differs due to the end-of-lifespan effect.
In the implementation of our solution approach for the base instance, we note that there are 158,739,675 combinations of the variables b, (y_1, y_2) and t (see Appendix B for our solution approach, including the details on the discretization of the belief space). Only in 1.9% of these combinations does the action under Policy I differ from the action under Policy III (the cost difference between these policies will be provided in Section 6.3). In order to visualize at which states the two policies differ, we introduce the metric

N(b, y) = ∑_{t=1}^{T} 1[ a*_I(b, y, t) ≠ a*_III(b, y, t) ],

where 1[·] denotes the indicator function. In Fig. 4, we visualize this metric, which represents the number of times that Policy I and Policy III take a different action in a particular state (b, y) over all possible t values. That is, if the metric at state (b, y) is equal to zero, the actions of Policy I and III are the same for all t values at that particular state. In Fig. 4, we see that the part of the state space where the actions of Policy I and Policy III differ is limited to a small number of states. In particular, the actions differ around specific age values that constitute an age limit between the 'do nothing' and 'replace' actions, and this limit appears to differ between Policy I and III.

Comparison of the expected cost per system under Policy I against Policy II and Policy III
In this section, we address research questions (2) and (3) by comparing the expected costs per system associated with the policies against each other. For this purpose, we generate a test bed of 36 instances with the following parameter values: initial belief p ∈ {0.25, 0.5, 0.75}, lifespan L ∈ {75, 150}, shape parameter β ∈ {3, 5}, corrective replacement cost equal to 1, and preventive replacement cost ∈ {0.05, 0.1, 0.2}. We continue to use the truncated discrete Weibull lifetime distribution with the scale parameters as in the base instance.

We let C_I = V_L(p, 0)/n and C_II = V^II_L(p, 0) denote the expected cost per system under Policy I and Policy II, respectively, starting with a new set of components at age zero. For notational convenience, we also introduce C_III = V^III_L(p, 0) to denote the expected cost under Policy III. Note that C_III is not normalized with respect to n, as it already represents the expected cost for a single system. Table 2 shows that, for 34 out of 36 instances, the relative difference Δ^rel_(II−I) between Policy I and Policy II is less than or equal to 0.20%; for the two remaining instances, it is less than or equal to 0.80%. This quantifies the maximum benefit that can be obtained by jointly making the replacement decisions for multiple systems. The relative difference Δ^rel_(III−II) between Policies II and III is less than or equal to 1% for six out of 36 instances, greater than 1% and smaller than 2% for 19 instances, and greater than 2% for the remaining instances. Policy III is up to 5.6% costlier than Policy II; this is the maximum benefit that can be obtained via data pooling in this test bed. Due to the small difference in the expected cost per system between Policies I and II, the statistics for Δ^rel_(III−I) are similar to those for Δ^rel_(III−II). Considering that the policy structures are mostly similar for Policies I and III, we conclude that the cost reduction from using Policy I instead of Policy III is mostly due to data pooling rather than to jointly making the replacement decisions for multiple systems.
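The relative differences reported in Table 2 are plain percentage gaps between policy costs; a minimal sketch, with illustrative numbers rather than values from the table:

```python
def rel_diff(cost_a, cost_b):
    """Relative difference (cost_a - cost_b) / cost_b in percent,
    as in the Delta^rel comparisons between policy costs."""
    return (cost_a - cost_b) / cost_b * 100.0

# If Policy III costs 1.056 per system and Policy II costs 1.0,
# Policy III is 5.6% costlier (illustrative numbers).
print(round(rel_diff(1.056, 1.0), 1))
```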

Sensitivity analysis
In this section, we perform a sensitivity analysis around the base instance to answer research question (4). We show how the costs and relative cost differences are affected by varying values of the input parameters: the lifespan L, the shape parameter β, the preventive replacement cost, the initial belief p, and the number of systems n. When we study the effect of a specific input parameter, all other input parameters are kept as in the base instance.
Effect of the lifespan. In Fig. 5(a), we show how the relative difference in costs changes with respect to L. We observe that the relative differences Δ^rel_(III−I) and Δ^rel_(III−II) are largest for 50 ≤ L ≤ 75, and the relative difference Δ^rel_(II−I) is largest for 35 ≤ L ≤ 50. The relative differences between policies become smaller for values of L larger than 75. This is intuitive: a long lifespan allows the population type to be learned accurately, and beyond some point the effect of not knowing the true population type disappears (i.e., the policies converge to the policy under the perfect-information setting).
To better see this, we compare in Fig. 5(b) the expected costs under Policy I and Policy III against the cost under perfect information. For a long lifespan such as L = 1000, the costs of both policies approach the cost under perfect information, meaning that both policies (with and without data pooling) learn the true population type well for a sufficiently long lifespan. We further see that the cost of Policy I converges to the perfect-information cost earlier than that of Policy III; this can be interpreted as the effect of data pooling under Policy I.
Effect of the coefficient of variation of the time-to-failure distributions. We let the shape parameter β of the time-to-failure distribution (which is common to both population types) take values from the set {3, 3.5, 4, 4.5}, in addition to the base-instance value of 5. As β increases, the expectation of the time-to-failure distribution increases slightly while its variance (and hence its coefficient of variation) decreases significantly. Table 3 provides the distributional properties of the time-to-failure distributions corresponding to the selected parameters. Fig. 6 shows how β affects the expected cost per system under each policy and the relative cost differences between the policies. As β increases (i.e., the coefficient of variation decreases), the expected cost per system under all policies decreases and the relative difference between the costs of Policy I and Policy III increases (the relative difference between the costs of Policies II and III behaves similarly). This shows that the benefit of data pooling is higher for larger β. The relative difference between the costs of Policies I and II remains small and does not vary much.
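The relation between the shape parameter and the coefficient of variation can be checked with a short computation. This is a sketch assuming one common discretization of the Weibull distribution (survival function exp(-(t/η)^β) evaluated at integer ages, with the tail mass folded into the last age); the paper's exact construction may differ in detail:

```python
import numpy as np

def discrete_weibull_pmf(eta, beta, t_max):
    """PMF on {1, ..., t_max} derived from S(t) = exp(-(t/eta)**beta),
    with the tail mass beyond t_max folded into the last point."""
    t = np.arange(0, t_max + 1)
    surv = np.exp(-(t / eta) ** beta)
    pmf = surv[:-1] - surv[1:]   # P(T = t) for t = 1..t_max
    pmf[-1] += surv[-1]          # truncation: fold the tail into t_max
    return pmf

def mean_and_cv(pmf):
    t = np.arange(1, len(pmf) + 1)
    mean = float(np.sum(t * pmf))
    var = float(np.sum((t - mean) ** 2 * pmf))
    return mean, var ** 0.5 / mean

for beta in (3, 3.5, 4, 4.5, 5):
    mean, cv = mean_and_cv(discrete_weibull_pmf(22, beta, 75))
    print(f"beta={beta}: mean={mean:.2f}, cv={cv:.3f}")
```

The printed values reproduce the qualitative pattern described above: the mean grows only slightly with β while the coefficient of variation drops markedly.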
Effect of the initial belief. We choose p ∈ {0, 0.025, …, 0.975, 1} for the sensitivity analysis with respect to the initial belief p. Fig. 7 shows how the expected cost per system associated with each policy, and the relative differences between them, change with respect to the initial belief on the true population type. We observe that the expected cost per system increases for all policies as p increases. This is because the expected number of failures is higher when the components come from the weak population rather than the strong population, resulting in more replacement activities and a higher expected cost per system. The relative cost differences Δ^rel_(III−I) and Δ^rel_(III−II) are largest when the uncertainty about the true population type is high (i.e., for p around 0.5); this is when the benefit obtained by resolving the population-type uncertainty early (via data pooling) is also high.
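The Bayesian updating that drives all of these policies can be sketched as follows, assuming the belief is the probability that the components come from the weak population and that each observed period contributes a survive/fail likelihood through the discrete hazard rate (the function and variable names are our own):

```python
def update_belief(p_weak, haz_weak, haz_strong, age, failed):
    """One Bayes update of the belief P(weak population).

    haz_weak / haz_strong: lists of discrete hazard rates, i.e.
    h[t] = P(fail at age t | survived to age t), per population."""
    lw = haz_weak[age] if failed else 1.0 - haz_weak[age]
    ls = haz_strong[age] if failed else 1.0 - haz_strong[age]
    return p_weak * lw / (p_weak * lw + (1.0 - p_weak) * ls)

# Illustrative hazards: the weak population fails four times as often.
posterior = update_belief(0.5, [0.20], [0.05], age=0, failed=True)
print(posterior)  # one failure pushes the belief from 0.5 up toward 0.8
```

Under data pooling, this update is applied once per system per period, which is why pooled data accumulates evidence about the population type faster.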
Effect of the cost of preventive maintenance. We choose the preventive replacement cost from {0.05, 0.1, 0.2, 0.25} to study its effect on the expected costs (see Fig. 8). Naturally, the expected cost under each policy increases as the cost of preventive replacement increases. Fig. 8 also shows that the relative cost differences between Policies II and III and between Policies I and III decrease as the cost of preventive replacement increases (excluding the case with cost 0.05). The instance with preventive cost 0.05 is an exception because the policies require a longer lifespan to learn the true population type when the preventive replacement cost is that low: preventive maintenance is so inexpensive that components are always replaced preventively in an early phase of their lifetime, which prevents distinguishing between the weak and strong populations from the historical data. As a result, the three policies perform very closely.
Effect of the number of systems. We compare the costs associated with the three policies in Fig. 9 for n ∈ {2, 3}. We observe that the expected cost per system under Policies I and II slightly decreases when we go from n = 2 to n = 3. On the other hand, the relative cost differences between Policies I and III and between Policies II and III increase in this case.
To better see how the number of systems affects the decrease in the expected cost per system for Policies I and II (the decrease is due to increased data pooling with a higher number of systems under these policies), we increase n up to 20. As expected, the state space of the POMDP then becomes too large to efficiently solve the Bellman equations characterized in Section 3 in order to obtain Policy I. Therefore, we do not report the performance of Policy I for n greater than 2 from now on. Instead, we use the optimal policy under the perfect-information setting (see Section 4.1) as a benchmark to quantify the maximum benefit that could have been obtained by Policy I. Notice that the optimal expected cost in the perfect-information setting can be interpreted as a lower bound on the expected cost under Policy I; thus, it allows us to quantify the maximum reduction in expected cost per system achievable via data pooling.
We let C_PI = V^PI_L(p, 0) denote the expected cost for a single system in the perfect-information setting at the beginning of the lifespan, at belief state p and component age zero. Furthermore, we introduce the notation Δ_j = C_j − C_PI and Δ^rel_j = (C_j − C_PI)/C_PI × 100% to denote the difference and the relative difference, respectively, between the expected cost per system under policy j ∈ {I, II, III} and the expected cost for a single system in the perfect-information setting. In Table 4, we list these differences for the test bed of Section 6.3.
The relative difference Δ^rel_I between Policy I and the perfect-information setting is greater than 1% and less than 3% for 13 out of 36 instances; for the remaining instances, it is greater than 3%. The largest relative difference between Policy I and the perfect-information setting is 11.3%, at instance 20. This means that the population heterogeneity leads to an increase in expected cost of up to 11.3% (i.e., this can be interpreted as the cost of not knowing the true population type). If Policy III is used instead of Policy I, this additional cost can be up to 15.8%.
The expected costs of Policies I and II are close to each other in all numerical experiments. However, the size of the discretized state space of the POMDP model associated with Policy I grows much larger than that of Policy II as the number of systems increases. We therefore evaluate the expected cost per system under Policy II for n ∈ {5, 10, 15, 20} via simulation and compare it against the expected costs per system under Policy III and under the perfect-information setting. For this purpose, we perform 5000 simulation runs and report the results in Table 5. We only report the average value from these simulations because the half-width of the 95% confidence interval (CI) built around the average rounds to 0.00 for every instance.
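The simulation-based evaluation can be sketched as follows. The policy is abstracted as any callable that returns one simulated lifespan cost, and the 1.96 factor is the usual normal-approximation critical value for a 95% CI; this mirrors the 5000-run setup, not the paper's actual simulator:

```python
import random
import statistics

def simulate_cost(policy, runs=5000, seed=0):
    """Monte Carlo estimate of the expected cost per system together
    with the half-width of a 95% confidence interval."""
    rng = random.Random(seed)
    costs = [policy(rng) for _ in range(runs)]
    mean = statistics.fmean(costs)
    half_width = 1.96 * statistics.stdev(costs) / runs ** 0.5
    return mean, half_width

# Usage with a dummy cost model in place of a real policy simulator:
mean, hw = simulate_cost(lambda rng: rng.uniform(0.9, 1.1))
print(f"cost per system ~ {mean:.3f} +/- {hw:.4f}")
```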
In Table 5, we see that, as the number of systems increases, the expected cost per system comes closer to the cost under perfect information. When there are 20 systems, the relative difference Δ^rel_(III−II) is more than 2% and less than 3% for five out of 36 instances; it is at least 3% and at most 5.3% for 17 instances; and it is between 5.6% and 14% for the remaining instances. That is, the largest benefit obtained by data pooling occurs for 20 systems and equals 14%. For 20 systems, when we compare the cost of Policy II against the cost under perfect information, the relative difference Δ^rel_II is less than 1% for 27 out of 36 instances, and between 1% and 1.7% for the remaining instances. That is, the cost under data pooling from 20 systems differs from the cost under perfect information by no more than 1.7%. As the number of systems increases, the expected cost per system under Policy II decreases and approaches the cost under perfect information. This shows that data pooling is effective in reducing costs even when the replacement decisions are made independently for the systems.
Note that the maximum cost reduction that can be obtained by data pooling is C_III − C_PI; we refer to this as the potential cost reduction. The actual reduction obtained by data pooling is C_III − C_II. We introduce the performance measure ρ = (C_III − C_II)/(C_III − C_PI) × 100% to quantify the percentage of the potential cost reduction obtained by data pooling. We calculate ρ for each instance based on the average cost obtained from simulation. In Fig. 10, we show the distribution of these ρ values for n ∈ {5, 10, 15, 20}. On average, data pooling from five systems achieves around 70% of the potential cost reduction; for 20 systems, the average is above 90%.
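A minimal sketch of the performance measure ρ, with illustrative cost values rather than numbers from the experiments:

```python
def pooling_efficiency(c_iii, c_ii, c_pi):
    """Percentage of the potential cost reduction (C_III - C_PI)
    that data pooling actually realizes (C_III - C_II)."""
    return (c_iii - c_ii) / (c_iii - c_pi) * 100.0

# If Policy III costs 1.10, Policy II costs 1.03, and the
# perfect-information cost is 1.00, pooling realizes 70% of the
# potential reduction (illustrative numbers).
print(pooling_efficiency(1.10, 1.03, 1.00))
```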
Finally, we show the effect of the number of systems on the expected cost per system in Fig. 11, where all input parameters are as in the base instance but n varies. We report the costs for n ∈ {2, 3, …, 20}. In Fig. 11(a), we visualize how the expected cost per system under Policy II decreases as a function of n; clearly, the marginal cost reduction due to data pooling is non-increasing in n. In Fig. 11(b), we visualize the percentage of the potential cost reduction achieved by data pooling.
There are two main insights on the benefit of data pooling. First, the Bayesian updates are the same for all policies, so the number of data points required to learn the true population type is also the same; however, by pooling data from multiple systems, we obtain the same number of data points in a shorter time span. Second, as the number of systems increases, the cost of learning (exploration) per system decreases (similarly to the special case analyzed in Section 5), because the learned information about the population type is exploited by a larger number of systems.
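The first insight amounts to simple arithmetic: with one observation per system per period, n pooled systems reach any target number of data points in roughly a factor n less time. A back-of-envelope sketch (the function is our own illustration):

```python
def periods_to_k_points(k, n_systems):
    """Periods needed to collect k data points when each of n_systems
    contributes one observation per period (ceiling division)."""
    return -(-k // n_systems)

print(periods_to_k_points(100, 1))   # a single system needs 100 periods
print(periods_to_k_points(100, 20))  # 20 pooled systems need only 5
```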

Conclusion
We have studied the optimal replacement policy for multiple single-component systems with a fixed lifespan under population heterogeneity. For this purpose, we built a POMDP model with Bayesian updating. We investigated the benefits of data pooling and of jointly making the component replacement decisions on the expected cost per system. As benchmarks to the optimal policy, we introduced two other policies that allow us to quantify these benefits. We further introduced a policy for the setting where the true population type is known, which serves as a bound on the maximum benefit that can be gained by data pooling. For a test bed with 36 problem instances with two systems, the maximum reduction in expected cost per system obtained via joint optimization is 0.80%, whereas the maximum reduction obtained via data pooling is 5.6%. These results indicate that data pooling is more effective in reducing the expected costs than jointly making the replacement decisions. Considering the computational complexity of the POMDP model for multiple systems with data pooling, applying the optimal policy of a single system combined with data pooling is a favorable approach in practice when there are many systems. We also investigated how the reduction in expected cost per system grows with the number of systems: for 20 systems, the maximum cost reduction obtained by data pooling is up to 14% (the upper bound for the cost reduction in this particular instance is 15.8%). In future research, the trade-off between data pooling and decoupling the systems (i.e., making the replacement decisions independently) can be investigated for the same problem but with heterogeneous systems (i.e., where the systems are not identical).
Investigating the effect of the number of populations on the obtained insights is another possible extension for the current problem.
For our numerical experiments, a belief discretization step of 0.00025 is selected because the change in the approximation compared to a step of 0.00005 is sufficiently small (for instance, it is less than 0.3% for preventive cost 0.1, shape parameter 5, lifespan 200, and initial beliefs 0 and 1). Notice that the rounding error accumulates through the backward recursion; therefore, the error is largest for L = 200.

Fig. 2 .
Fig. 2. Optimal action of Policy I at decision epoch t = 75 as a function of the pair of component ages.

Proposition 4. (i) The relative difference of Policy III with respect to the perfect learning policy is (+1)  −  (3−1)  +  . (ii) If ( + 1) >     , then Policy I is the same as the perfect learning policy. Otherwise, Policy I is the same as Policy III.

Proposition 4 shows that the perfect learning policy is cheaper than Policy III if and only if ( + 1) >
We denote the difference C_II − C_I by Δ_(II−I) and the relative difference (C_II − C_I)/C_I × 100% by Δ^rel_(II−I) to compare Policies I and II. Similarly, we define Δ_(III−I) = C_III − C_I and Δ^rel_(III−I) = (C_III − C_I)/C_I × 100% to compare Policies I and III, and we define Δ_(III−II) = C_III − C_II and Δ^rel_(III−II) = (C_III − C_II)/C_II × 100% to compare Policies II and III.

Fig. 4 .
Fig. 4. The number of times Policy I and Policy III take a different action for a particular decision epoch and pair of component ages, over all possible belief values.

Fig. 5 .
Fig. 5. Effect of the lifespan length L on costs; L = 75 corresponds to the base instance.

Fig. 6 .
Fig. 6. Effect of the shape parameter β on costs; β = 5 corresponds to the base instance.

Fig. 7 .
Fig. 7. Effect of the initial belief on costs; p = 0.5 corresponds to the base instance.


Fig. 8 .
Fig. 8. Effect of the preventive maintenance cost on costs; the value 0.1 corresponds to the base instance.

Fig. 9 .
Fig. 9. Effect of the number of systems n on costs; n = 2 corresponds to the base instance.

Fig. 10 .
Fig. 10. Percentage of the potential cost reduction obtained by Policy II.

Fig. 11 .
Fig. 11. Effect of the number of systems on the expected cost per system for Policy II (left) and on the percentage of the potential cost reduction obtained by Policy II (right).

Table 1
Literature review on (partially observable) Markov decision models under failure-model uncertainty.

Table 2
Comparison of the policies for the instances in the test bed.

Table 3
Properties of time-to-failure distributions (with scale parameter equal to 22).

Table 4
Comparison of the costs of Policies I, II and III against the cost under the perfect-information setting (n = 2).

Table 5
Comparison of the costs of Policies II and III against the cost under the perfect-information setting (n > 2).