Integrated optimization of maintenance interventions and spare part selection for a partially observable multi-component system

Advanced technical systems are typically composed of multiple critical components whose failure causes a system failure. Often, it is not technically or economically possible to install sensors dedicated to each component, which means that the exact condition of each component cannot be monitored, but a system level failure or defect can be observed. The service provider then needs to implement a condition based maintenance policy that is based on partial information on the system's condition. Furthermore, when the service provider decides to service the system, (s)he also needs to decide which spare part(s) to bring along in order to avoid emergency shipments and part returns. We model this problem as an infinite horizon partially observable Markov decision process. In a set of numerical experiments, we first compare the optimal policy with preventive and corrective maintenance policies: The optimal policy leads on average to a 28% and 15% cost decrease, respectively. Second, we investigate the value of having full information, i.e., sensors dedicated to each component: This leads on average to a 13% cost decrease compared to the case with partial information. Interestingly, having full information is more valuable for cheaper, less reliable components than for more expensive, more reliable components.


Introduction
Many operations in industrial and public organizations heavily depend on the functioning of expensive and technically complex capital goods that have a long life time and are used in the primary processes of their users. Examples include lithography equipment in the semiconductor industry, medical imaging machines in hospitals, and radar systems on vessels. Unexpected downtime of capital goods can lead to a significant loss of revenue and it can negatively affect health, safety, and the environment. Therefore, capital goods typically require a lot of maintenance to ensure high availability and reliability, which accounts for a significant part of the overall life cycle costs.
Condition based maintenance (CBM) is a maintenance policy that determines the optimal maintenance moment based on condition monitoring information such as vibration, temperature or power consumption. Applying CBM should help to reduce costs, increase systems' reliability and maximize components' useful life. In some cases, CBM achieves savings of more than 50% on the maintenance costs [45]. Due to its promises, CBM has attracted attention in most industries and it has led to growing attention by researchers from diverse disciplines. Examples include the studies addressing CBM optimization problems in the context of power generation systems [6,37] and heavy vehicles [10]. Recent reviews on CBM are [2,21,22,35]. We review the relevant literature for our problem in Section 2.
In most of the literature, it is assumed that an installed sensor gives information on the condition of one component. However, in practice it may not always be technically or economically possible to install sensors dedicated to each component, which means that the exact condition of each component cannot be monitored, but a system level failure or defect can be observed. In this case, it is difficult to decide when to perform a maintenance intervention. Furthermore, it is difficult to decide which spare parts to bring in case systems are dispersed in the field. This holds, for example, for industrial printers or manufacturing equipment that is serviced by the original equipment manufacturer. For instance, Océ-Technologies B.V., one of the global leaders in industrial printing, faces this problem. Océ has equipped its VarioPrint i300 (VPi300) printers with sensors that allow Océ to collect and analyze data from the printer remotely [13]. Some of this data is related to the condition of components, such as the temperature level in the maintenance box and the clogging levels in the filter and ink-heads. Observing a high temperature level in the maintenance box implies a defect in the system, which is caused by either a chiller, a roller, or a safety valve (see [9] for more details). After observing a defect, i.e., a high temperature level, the time-to-failure depends on the component(s) that is (are) defective. As a service provider, Océ needs to predict the exact state of the system from the current observation and the past data, in order to decide when to intervene for maintenance and which spare parts to bring to the machine.
A similar problem can be observed in water purifying systems being used in public water utility companies [44]. For water purifying systems, recirculating gravel filters (RGFs) are identified as the key component. Typically, the condition of this component is not directly observable but can be revealed through an inspection. The level of turbidity is recorded at key stages of the water treatment process to guarantee high water quality, as well as to track the condition of the RGF in use. When the ratio of outgoing to incoming water turbidity is close to zero, the RGF is likely to be in good condition. However, when the ratio is close to one, the RGF is likely to be in a poor condition [44]. The poor condition might appear due to a lack of chemicals used in the RGF, filter clogging, or a mechanical problem in the RGF itself. The actual condition of the RGF should be inferred from the observed turbidity level, and when maintenance is performed it should be decided which equipment to bring to maintain the RGF.
Since the service provider takes the spare part selection decision without knowing the exact deterioration level of each component, there is always a risk of bringing the wrong spare parts to the customer. When the service provider needs a component that has not been brought to the customer, this component can be delivered via an emergency shipment at a high cost. If the service provider brought a component that was not required, it is returned afterwards. It may seem that no costs are incurred in that case, but the more parts are being carried around, the more parts need to be in stock, which does incur costs. The spare parts selection decision is thus a crucial decision, next to the maintenance timing decision.
Although in practice there exists a need for CBM strategies addressing an integrated maintenance and spare part selection decision for partially observable multi-component systems, we are not aware of any literature on this topic. Our aim is to fill this gap and our main contribution is thus as follows. We propose a partially observable Markov decision process (POMDP) formulation to solve the joint problem of maintenance timing and spare parts selection. The objective is to minimize the expected total discounted cost over an infinite planning horizon. We employ a grid-based solution method [27] to derive the optimal policy. We then perform a numerical experiment in which we compare our policy with two maintenance policies that are often used in practice: a corrective (upon failure) or preventive (upon defect) policy. We show that using the optimal policy instead of the corrective and preventive policies leads to average cost decreases of 15% and 28%, respectively. We observe that the corrective policy is very costly when the corrective maintenance cost is high and/or the deterioration characteristics of the components in the system are significantly different from each other. The preventive policy leads to significant additional cost when the emergency order cost and/or the replacement costs are high. We next consider the case that we are able to observe each component's deterioration level exactly. Through a numerical experiment, we compare the optimal policy of this full information model with the optimal policy of our original model. The results show that having full information leads to, on average, a 13% cost decrease compared to the case where the service provider has only partial information on the components' deterioration levels. Interestingly, having full information is more valuable for cheaper, less reliable components than for more expensive, more reliable components. 
This is important to know for reliability engineers in the design phase of a system when making decisions on which sensors to install.
The rest of this article is organized as follows. Section 2 reviews the related literature and contextualizes our contribution. Section 3 contains the model formulation. Section 4 gives the results of the numerical experiment and the managerial insights derived from it. Finally, the conclusions and future research directions are provided in Section 5.

Literature review
There exists a lot of literature on CBM; we referred to some review papers in the previous section. Here, we only review the most relevant papers on CBM: single-component models with partial observability and multi-component models. As the joint optimization of CBM and spare part inventory decisions is beyond our scope, the studies in that research stream are not included in our review (we refer to the review paper by Van Horenbeek et al. [41]).
As one of the early studies within the stream of single-component CBM models, [32] address a system monitored with a sensor giving the decision-maker partial information about the system state. The authors reveal that the optimal inspection and replacement policy for the system is in the class of modified monotonic four-region policies. [33] extend the model of [32] by considering an action set including minimal-repair and failure-replacement actions. [28] investigates the problem of scheduling both perfect and imperfect observations and preventive maintenance actions for a multi-state, Markovian deterioration system with self-announcing failures. [3] study maintenance and operation policies that maximize the overall effectiveness of a single-component system with respect to availability, productivity, and quality. [19] address an availability maximization problem for a partially observable deteriorating system subject to random failures, employing a continuous-time Markov model. In [12], the problem of finding the optimal maintenance policy for partially observed systems is addressed, where only a limited number of imperfect maintenance actions can be performed. The authors prove the existence of an optimal threshold-type maintenance policy. Flory et al. [15] develop a condition-based maintenance policy for a deteriorating system with a partially observable environment, where the degradation rate is influenced by the operating environment. Van Oosterom et al. [43] examine a system having multiple spare part types that cannot be distinguished by their exterior appearance but deteriorate according to different transition probability matrices. Abdul-Malak et al. [1] extend the model in [43] by removing some of the restrictions on the system's time-to-failure distribution and considering both repair and replacement actions. Jin and Yamamoto [20] propose a non-stationary partially observable Markov decision model to study the optimal maintenance policy for an aging system with imperfect inspections. Van Oosterom et al. [42] examine the problem of finding the optimal maintenance policy for a safety-critical system and its deteriorating sensor. Nguyen et al. [31] focus on the benefit of adjusting the inspection quality in CBM optimization.
The stream of multi-component CBM models consists of only a limited number of papers. Barbera et al. [8] introduce a CBM model considering exponential failures and fixed inspection intervals for a two-component system in series, and derive the optimal solution minimizing the long-run average cost of maintenance actions and failures. Barata et al. [7] employ a Monte Carlo simulation approach to determine the optimal maintenance schedule for continuously monitored deteriorating systems, covering both non-repairable single components and multi-component repairable systems. Marseguerra et al. [29] formulate an optimization model with availability and net profit criteria to investigate the optimal CBM policy for a multi-component system, and they come up with a solution algorithm combining Monte Carlo simulation and genetic algorithms. Castanier et al. [11] introduce a stochastic model based on a semi-regenerative process to study the optimal maintenance scheduling of a two-component series system subject to continuous deterioration. Tian and Liao [39] deal with the problem of determining the optimal maintenance policy for a multi-component system whose components are economically dependent, using a proportional hazards model. Hong investigate the influence of dependent stochastic degradation of multiple components on the optimal maintenance decisions. They conduct an analysis of the effect of different risk attitudes of the decision maker on the selection of the optimal policy. Zhu et al. [46] study the optimal maintenance policy for a multi-component system with a high maintenance setup cost. The authors evaluate the cost-saving potential of the optimal policy by comparing it with a failure-based and an age-based policy. Arts and Basten [5] study a similar problem, but with minimal repairs, allowing them to exactly evaluate policies. Keizer et al. [23] develop condition based maintenance policies for k-out-of-N systems subject to both redundancy and economic dependencies.
Li et al. [25] examine a system whose components are both stochastically and economically dependent, using a Lévy copula modeling approach. Özgür-Ünlüakın and Bilgiç [34] assess the performance of two different maintenance optimization procedures for a Markovian deteriorating system under partial observations in a finite discrete time horizon. For a system with homogeneous components that follow the same stochastic degradation process, [24] examine a maintenance scheduling problem where all units in the system are renewed simultaneously. Eruguz et al. [14] extend the model of [32] by considering a setting in which the system contains multiple components. The authors model the system as an infinite horizon POMDP under the discounted cost criterion, but without a spare parts selection decision.

Model description
In this section, we describe our problem and formulate a partially observable Markov decision process.

Problem definition
We consider a system that consists of N critical components. The system is operational as long as all critical components are functioning. The critical components are subject to deterioration during the time that the system is operating. They deteriorate according to a discrete-time discrete-state space Markov chain (see, e.g., Giorgio et al. [16], Neves et al. [30], Si et al. [38], and [26], for a detailed discussion on how and why Markov chain models are employed to represent the component degradation and how the necessary parameters are estimated). For each component, there exists a predetermined defect level and a failure level. The component is referred to as being non-defective when its deterioration level is strictly less than the corresponding defect level; the component whose deterioration level is at or above the corresponding defect level but is strictly smaller than the corresponding failure level is called defective. The component is referred to as failed when its deterioration level is at the corresponding failure level.
There is a single sensor on the system that provides partial information about the condition of the system: Sensor information does not indicate the condition of the components, but indicates that a defect or a failure exists in the system. If at least one component is defective (has failed), a defective (failure) signal is observed. When the system has neither a defective nor a failed component the sensor displays a non-defective signal. The exact state of the components can be observed only through a complete and perfect inspection.
As the service provider cannot observe what the exact deterioration level of each component is, she introduces a belief state to determine the maintenance intervention moments. The belief state is a probability measure to estimate the current state of the system based on the signal being observed through the sensor. It is updated after each new signal observation. The belief state evolves according to a discrete-time continuous-state Markov process as the sensor signals directly depend on the components' deterioration levels.
The sequence of events in each period is as follows: At the beginning of each period, the service provider observes a new signal coming from the sensor. Using this new observation, she updates her belief regarding the components' deterioration levels. The service provider then decides whether or not to perform maintenance. She definitely performs maintenance when the sensor displays a failure signal. When the sensor displays a non-defective or defective signal the service provider may choose to intervene preventively. In case maintenance is performed, next the spare part selection decision is taken. Finally, costs are incurred.
The preventive and corrective maintenance interventions take a negligible time. For each corrective maintenance intervention being performed after a failure signal, she pays a fixed corrective maintenance intervention cost. Additionally, for each preventive maintenance intervention, she incurs a fixed preventive maintenance intervention cost. Typically, corrective costs are higher than preventive costs.
When performing a preventive or corrective maintenance intervention, the service provider observes the exact deterioration levels of all components through inspection and replaces each defective or failed component in the system with a new one. Each part replacement incurs a replacement cost. Non-defective components are never replaced even though they may not be as good as new.
For each component in the system, there is always a sufficiently large number of spare parts in stock. When deciding to perform a maintenance intervention, the service provider should also decide on which components to bring to the customer. If a component has not been brought to the customer but needs to be replaced, she employs an emergency procedure to immediately bring the necessary component to the customer. The service provider pays an additional emergency order cost for the relevant component. If a component has been brought to the customer but is not used in the maintenance, the service provider takes the component back to use in another maintenance intervention. She incurs an additional return cost for the relevant component.
The service provider seeks to find the optimal policy, i.e., to decide when to perform maintenance interventions and which spare parts to take along, that minimizes the expected total discounted maintenance cost over an infinite time horizon.

The POMDP model
The set of components is represented by $\mathcal{N} = \{1, 2, \ldots, N\}$. Each component has a finite number of deterioration levels. The deterioration level of component $i \in \mathcal{N}$ is represented by $s_i \in S_i = \{0, 1, \ldots, F_i\}$. The state numbers are ordered to reflect the deterioration level in ascending order, i.e., state 0 represents the perfect working condition and state $F_i$ represents the failed condition.
For each component $i \in \mathcal{N}$, there exists a predetermined defect level $\Delta_i$, where $0 < \Delta_i < F_i$. Based on the corresponding defect levels, the system components are classified into three groups: non-defective, defective, and failed. Component $i$ is classified as being non-defective, defective, or failed when its deterioration level satisfies $s_i < \Delta_i$, $\Delta_i \le s_i < F_i$, or $s_i = F_i$, respectively.

Core states
The set of core states consists of all possible states that the system can be in, i.e., $S = \prod_{i \in \mathcal{N}} S_i$ is a product of totally ordered sets, where the $i$th element of a vector $s \in S$ represents the deterioration level of component $i$.
The components deteriorate according to a discrete-time discrete-state space Markov chain with an $|S|$-by-$|S|$ one-step transition probability matrix $Q$. More specifically, the element $q_{s,s'}$ of the transition matrix describes the one-step transition probability from $s$ to $s'$. In order to avoid technical complications, all transition probabilities are assumed to be stationary over time.

Remark 1.
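Consider the special case in which the deterioration process of each component $i$ evolves according to an independent discrete-time discrete-state space Markov chain with an $(F_i + 1)$-by-$(F_i + 1)$ transition probability matrix $Q_i$ with elements $q^{i}_{s_i, s_i'}$. The element $q_{s,s'}$ of the matrix $Q$ is then as follows:
$$q_{s,s'} = \prod_{i \in \mathcal{N}} q^{i}_{s_i, s_i'}.$$
Note that it is possible to apply the proposed model to a system with stochastic dependence among the components since we consider an arbitrary matrix $Q$ to model the components' deterioration processes.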
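The construction in Remark 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `joint_transition_matrix` and the two-level component matrices are hypothetical and are not taken from the paper. For independent components, each entry of the joint matrix is the product of the corresponding component-level transition probabilities.

```python
from itertools import product

def joint_transition_matrix(component_matrices):
    """Build the joint matrix Q for independently deteriorating components.

    component_matrices[i][a][b] is the one-step probability that component i
    moves from level a to level b. Returns (states, Q), where states lists the
    core states as tuples and Q[x][y] is the probability of moving from
    states[x] to states[y].
    """
    level_ranges = [range(len(m)) for m in component_matrices]
    states = list(product(*level_ranges))
    Q = []
    for s in states:
        row = []
        for s_next in states:
            p = 1.0
            # Independence: multiply the component-level probabilities.
            for m, a, b in zip(component_matrices, s, s_next):
                p *= m[a][b]
            row.append(p)
        Q.append(row)
    return states, Q

# Hypothetical two-level chains (0 = working, 1 = failed), for illustration only.
Q1 = [[0.9, 0.1], [0.0, 1.0]]
Q2 = [[0.8, 0.2], [0.0, 1.0]]
states, Q = joint_transition_matrix([Q1, Q2])
```

Every row of the resulting matrix sums to one because each component matrix is itself stochastic. A system with dependent components would instead supply the joint matrix directly.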

Observations
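The system is periodically monitored through a sensor providing only partial information on the components' degradation levels. The possible outcomes coming from the sensor are denoted by $\theta \in \Theta$, where $\Theta = \{0, 1, 2\}$ and 0, 1, and 2 represent non-defective, defective, and failure signals, respectively. To visualize how the sensor works, an illustration for a system with two components is given in Fig. 1.
The system sensor works perfectly, as a result of which each core state $s$ can be matched with one of the observation states. The set of core states can thus be defined as three disjoint sets:
$$S^0 = \{s \in S : s_i < \Delta_i \ \forall i \in \mathcal{N}\}, \quad S^1 = \{s \in S : s_i < F_i \ \forall i \in \mathcal{N} \text{ and } \exists i \in \mathcal{N} : s_i \ge \Delta_i\}, \quad S^2 = \{s \in S : \exists i \in \mathcal{N} : s_i = F_i\}.$$
Note that $S^0 \cup S^1 \cup S^2 = S$ and $S^{\theta} \cap S^{\theta'} = \emptyset$ for $\theta \neq \theta'$. This structure also implies that the conditional probability of monitoring signal $\theta$ given that the core state is $s \in S^{\theta}$ equals 1.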
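The mapping from core states to sensor signals can be illustrated with a short Python sketch. The function name `signal` and the example defect and failure levels are hypothetical. The sketch also assumes that a failure signal takes precedence when one component has failed while another is only defective, which is how the signal definition is read here.

```python
from itertools import product

def signal(state, defect_levels, failure_levels):
    """Return 0 (non-defective), 1 (defective) or 2 (failure) for a core state."""
    # Assumption: a failed component dominates any defective one.
    if any(s == f for s, f in zip(state, failure_levels)):
        return 2
    if any(s >= d for s, d in zip(state, defect_levels)):
        return 1
    return 0

# Hypothetical two-component example: defect levels (1, 2), failure levels (3, 3).
defect_levels, failure_levels = (1, 2), (3, 3)
core_states = list(product(range(4), range(4)))
partition = {theta: [s for s in core_states
                     if signal(s, defect_levels, failure_levels) == theta]
             for theta in (0, 1, 2)}
```

Because every core state receives exactly one signal, the three lists in `partition` are disjoint and together cover all 16 core states of this example.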

Belief states
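The set of all possible belief states composes the state space of the problem. We denote the belief state by $\pi = (\pi_1, \pi_2, \ldots, \pi_{|S|})$, where $\pi_s$ represents the probability of the system being in core state $s$. As each core state leads to a unique observation signal $\theta$, the set of belief states can be described with three disjoint sets $\Pi^{\theta}$. That is, for a given signal $\theta \in \Theta$, one can define a unique set such that:
$$\Pi^{\theta} = \Big\{ \pi \in \mathbb{R}_{+}^{|S|} : \sum_{s \in S^{\theta}} \pi_s = 1, \ \pi_s = 0 \ \forall s \notin S^{\theta} \Big\},$$
where $S^{\theta}$ is the set of core states that produce signal $\theta$ and $\mathbb{R}_{+}$ is the set of non-negative real numbers. We can describe the belief space as $\Pi = \Pi^0 \cup \Pi^1 \cup \Pi^2$. For a given belief state $\pi \in \Pi$, the probability of observing $\theta \in \Theta$ in the subsequent period is:
$$\Pr(\theta \mid \pi) = \sum_{s \in S} \sum_{s' \in S^{\theta}} \pi_s \, q_{s,s'}.$$
If the observation being made in the subsequent period is $\theta \in \Theta$, belief state $\pi$ is updated to $T(\pi, \theta)$. The $s'$th argument in the vector $T(\pi, \theta)$ is:
$$T_{s'}(\pi, \theta) = \begin{cases} \dfrac{\sum_{s \in S} \pi_s \, q_{s,s'}}{\Pr(\theta \mid \pi)} & \text{if } s' \in S^{\theta}, \\ 0 & \text{otherwise.} \end{cases}$$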
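The belief update can be sketched as follows. This is a minimal illustration assuming a perfect sensor, so that each core state maps to exactly one signal. The function names, the four-state transition matrix, and the signal assignment are hypothetical and only serve to show the Bayes step.

```python
def observation_probability(belief, Q, signals, theta):
    """Probability of observing signal theta in the next period."""
    n = len(belief)
    return sum(belief[s] * Q[s][t]
               for s in range(n) for t in range(n) if signals[t] == theta)

def update_belief(belief, Q, signals, theta):
    """Posterior belief after one transition and the observation of theta."""
    n = len(belief)
    p_theta = observation_probability(belief, Q, signals, theta)
    if p_theta == 0.0:
        raise ValueError("signal has zero probability under this belief")
    # Perfect sensor: states that do not produce theta get probability 0.
    return [
        sum(belief[s] * Q[s][t] for s in range(n)) / p_theta
        if signals[t] == theta else 0.0
        for t in range(n)
    ]

# Hypothetical four-state example; signals[t] is the signal emitted by state t.
signals = [0, 1, 1, 2]
Q = [[0.6, 0.2, 0.1, 0.1],
     [0.0, 0.5, 0.2, 0.3],
     [0.0, 0.0, 0.7, 0.3],
     [0.0, 0.0, 0.0, 1.0]]
belief = [1.0, 0.0, 0.0, 0.0]
p_defective = observation_probability(belief, Q, signals, 1)
posterior = update_belief(belief, Q, signals, 1)
```

In this example the system starts in the first state with certainty, a defective signal has probability 0.3, and the posterior splits between the two defective states in the ratio 2:1.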

Actions
At the beginning of each period, the service provider observes a signal coming from the sensor and decides whether or not to visit the customer for maintenance. When the sensor displays a failure signal, she definitely performs a corrective maintenance intervention; when the sensor displays either a non-defective or defective signal, the service provider may choose either to perform a preventive maintenance intervention or not. The possible maintenance actions in belief state $\pi$ are thus described as follows:
$$A(\pi) = \begin{cases} \{0, 1\} & \text{if } \pi \in \Pi^0 \cup \Pi^1 \text{ (non-defective or defective signal)}, \\ \{1\} & \text{if } \pi \in \Pi^2 \text{ (failure signal)}. \end{cases} \qquad (8)$$
In Eq. (8), the decisions of performing and not performing maintenance are represented by $a = 1$ and $a = 0$, respectively. If the service provider prefers not to perform maintenance, she will not change any component in the system. That is, the spare part selection decision is irrelevant. On the other hand, if the service provider decides to perform a maintenance intervention, she needs to determine which spare parts to take along to the customer. We denote the spare part selection decision by a binary vector $g = (g_1, g_2, \ldots, g_N)$, where $g_i$ is 1 if the corresponding component is brought to the customer and 0 otherwise. Accordingly, in case the service provider decides to perform a maintenance intervention, there exist $2^N$ different options regarding the spare part selection decision:
$$g \in G = \{0, 1\}^N.$$
When performing a preventive or corrective maintenance intervention, the exact deterioration levels of all components are revealed through perfect inspections. Each defective or failed component found in the system is replaced with an as-good-as-new component. It is possible to use our model to capture structural dependencies among the components, with several basic changes in the action set and the component replacement rules.

Cost functions
The service provider incurs a fixed preventive maintenance intervention cost $C^p$ for each maintenance intervention being performed after a defective or non-defective signal, whereas she incurs a fixed corrective maintenance intervention cost $C^c$ for each maintenance intervention being performed after a failure signal, with $C^c \ge C^p$. So, the fixed cost function for the maintenance intervention actions is:
$$C^f(\pi, a) = \begin{cases} 0 & \text{if } a = 0, \\ C^p & \text{if } a = 1 \text{ and the signal is non-defective or defective}, \\ C^c & \text{if } a = 1 \text{ and the signal is a failure signal.} \end{cases}$$
Since we consider fixed costs for maintenance interventions, economic dependence is captured with our model. The service provider pays a replacement cost, $C_i^r$, when she replaces component $i$ with a spare part. While performing a maintenance intervention, the service provider may need a spare part that has not been brought to the customer. In such a case, she employs an emergency order procedure with zero lead time to immediately bring the spare part and incurs an emergency order cost, $C^e$. She also pays a return cost, $C^b$, for each spare part that has been brought to the customer but is not used in the maintenance. Accordingly, the variable cost function is defined as:
$$C^v(s, g) = \sum_{i \in \mathcal{N}} \Big[ C_i^r \, \mathbb{1}\{s_i \ge \Delta_i\} + C^e \, \mathbb{1}\{s_i \ge \Delta_i, \ g_i = 0\} + C^b \, \mathbb{1}\{s_i < \Delta_i, \ g_i = 1\} \Big]. \qquad (11)$$
In Eq. (11), $\mathbb{1}\{\cdot\}$ is an indicator function that returns 1 when the given condition holds and 0 otherwise.
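The variable cost described above can be illustrated with a small Python sketch. It assumes one reading of the text: a component whose level is at or above its defect level is replaced, a needed part that was not brought triggers the emergency order cost, and a brought part that is not needed triggers the return cost. The function name and all numbers except the return cost of 30 are hypothetical.

```python
def variable_cost(state, brought, defect_levels, repl_costs, c_emergency, c_return):
    """Replacement, emergency order and return costs of one intervention."""
    total = 0.0
    for s, g, d, c_r in zip(state, brought, defect_levels, repl_costs):
        needs_replacement = s >= d  # defective or failed component
        if needs_replacement:
            total += c_r
            if not g:
                total += c_emergency  # part was needed but not brought
        elif g:
            total += c_return  # part was brought but not needed
    return total

# Hypothetical instance: component 1 is defective, component 2 is not.
state, defect_levels = (2, 0), (1, 2)
repl_costs, c_e, c_b = (50.0, 80.0), 60.0, 30.0
costs = {g: variable_cost(state, g, defect_levels, repl_costs, c_e, c_b)
         for g in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

In this instance, bringing only the part for component 1 costs 50, whereas bringing only the part for component 2 costs 140 (replacement, emergency order, and return).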

Value function and operators
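Let $V_n(\pi)$ be the value function denoting the minimum expected total discounted cost using the optimal policy when there are $n \ge 0$ periods left. We set the initial value function $V_0(\pi)$ to 0. We describe the operators of which $V_n(\pi)$ is composed. Operator $\Gamma_0$ denotes the action of not performing maintenance:
$$\Gamma_0 V_n(\pi) = \gamma \sum_{\theta \in \Theta} \Pr(\theta \mid \pi) \, V_n\big(T(\pi, \theta)\big),$$
where $\Pr(\theta \mid \pi)$ is the probability of observing signal $\theta$ in the subsequent period and $T(\pi, \theta)$ is the updated belief state. We consider a discount rate $\gamma$ with $0 < \gamma < 1$ so that any cost incurred in a subsequent period is discounted by this factor. Let $s^0(s)$ denote the core state after the inspection and replacement actions are performed in state $s$. So, the $i$th element in this vector can be defined as follows:
$$s^0_i(s) = \begin{cases} s_i & \text{if } s_i < \Delta_i, \\ 0 & \text{otherwise.} \end{cases}$$
Let $u(s)$ be the $|S|$-dimensional unit vector with 1 on the element corresponding to $s^0(s)$. If the system is in core state $s$, the belief state becomes $u(s)$ after the maintenance intervention. So, with $C^v(s, g)$ denoting the variable cost of spare part selection $g$ in core state $s$, the optimal spare part selection action when the service provider decides to perform maintenance is:
$$g^*(\pi) = \arg\min_{g \in \{0,1\}^N} \sum_{s \in S} \pi_s \, C^v(s, g).$$
Operator $\Gamma_1$ denotes the maintenance intervention action, where $C^f(\pi, 1)$ is the fixed intervention cost:
$$\Gamma_1 V_n(\pi) = C^f(\pi, 1) + \sum_{s \in S} \pi_s \Big[ C^v\big(s, g^*(\pi)\big) + \Gamma_0 V_n\big(u(s)\big) \Big].$$
Using operators $\Gamma_0$ and $\Gamma_1$, the value function can be expressed as:
$$V_{n+1}(\pi) = \begin{cases} \min\{\Gamma_0 V_n(\pi), \ \Gamma_1 V_n(\pi)\} & \text{after a non-defective or defective signal}, \\ \Gamma_1 V_n(\pi) & \text{after a failure signal.} \end{cases} \qquad (16)$$
Our problem has a finite action space, includes strictly positive and bounded costs, and is discounted. From the standard argument of the theory of contraction mapping, the problem given in Eq. (16) converges to a solution function $V(\pi)$ as $n$ tends to infinity and there exists an optimal deterministic stationary policy for the considered problem (see, e.g., Ohnishi et al. [32] and Puterman [36]). Thus, the problem can be solved by a successive approximation procedure such as value iteration.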
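The spare part selection step can be illustrated by brute-force enumeration over the $2^N$ binary vectors. The sketch below is an assumption-laden illustration, not the authors' implementation: the function names, the belief, and all cost values except the return cost of 30 are hypothetical, and the variable cost is assumed to be additive over components as described in the cost section.

```python
from itertools import product

def expected_variable_cost(belief, g, defect_levels, repl_costs, c_e, c_b):
    """Expected variable cost of selection g; belief maps core states to probabilities."""
    total = 0.0
    for state, prob in belief.items():
        for s, g_i, d, c_r in zip(state, g, defect_levels, repl_costs):
            if s >= d:
                # Replacement is needed; add emergency cost if the part is missing.
                total += prob * (c_r + (0.0 if g_i else c_e))
            elif g_i:
                total += prob * c_b  # unused part is returned
    return total

def best_selection(belief, defect_levels, repl_costs, c_e, c_b):
    """Enumerate all 2^N selections and return the cheapest one."""
    n = len(defect_levels)
    return min(product((0, 1), repeat=n),
               key=lambda g: expected_variable_cost(
                   belief, g, defect_levels, repl_costs, c_e, c_b))

# Hypothetical belief after a defective signal, defect levels (1, 1).
belief = {(1, 0): 0.7, (0, 1): 0.1, (1, 1): 0.2}
defect_levels, repl_costs, c_e, c_b = (1, 1), (50.0, 80.0), 60.0, 30.0
g_star = best_selection(belief, defect_levels, repl_costs, c_e, c_b)
cost_star = expected_variable_cost(belief, g_star, defect_levels, repl_costs, c_e, c_b)
```

Under the additive cost assumption, the choice decomposes per component: a part is worth bringing when the probability that it is needed exceeds the ratio of the return cost to the sum of the return and emergency order costs. Here that threshold is 1/3, component 1 is needed with probability 0.9, and component 2 with probability 0.3, so only the first part is brought.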

Numerical experiment
This section summarizes our numerical experiment to assess, first, how system characteristics affect the value of using the optimal policy and, second, the value of having full information. For the analysis, we employ three different benchmark policies that are introduced in Section 4.1. Section 4.2 presents the setup we considered. Sections 4.3 and 4.4 report our numerical results for 2-component systems. Section 4.5 illustrates our approach for 3-component systems.

Benchmark policies
To examine the impact of system characteristics on the value of using the optimal policy, we consider two naive benchmark policies, a corrective (CP) and a preventive policy (PP). Under CP, the service provider performs maintenance only upon observing a failure signal; otherwise, she does nothing. Under PP, the service provider intervenes for maintenance when she receives a defective signal from the system. Under both policies, the service provider determines which spare parts to bring to the customer by solving the spare part selection decision problem as she does in the original problem formulation.
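The two benchmark policies depend only on the current sensor signal, which a short sketch makes explicit. The function and constant names are hypothetical. Signals are coded as 0 (non-defective), 1 (defective), and 2 (failure), and a returned 1 means that a maintenance intervention is performed.

```python
NON_DEFECTIVE, DEFECTIVE, FAILURE = 0, 1, 2

def corrective_policy(theta):
    """CP: intervene only upon a failure signal."""
    return 1 if theta == FAILURE else 0

def preventive_policy(theta):
    """PP: intervene upon a defective signal; a failure always forces maintenance."""
    return 1 if theta in (DEFECTIVE, FAILURE) else 0

decisions = {theta: (corrective_policy(theta), preventive_policy(theta))
             for theta in (NON_DEFECTIVE, DEFECTIVE, FAILURE)}
```

The optimal policy differs from both rules in that its decision after a non-defective or defective signal depends on the full belief state rather than on the signal alone.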
We employ the grid-based solution method proposed by [27] to obtain the optimal policy and to evaluate CP and PP. We note that our solution algorithm suffers from the curse of dimensionality (see Appendix A). Developing an efficient algorithm to solve large problem instances is not in the scope of our paper. Therefore, we limit ourselves to 2-component and 3-component problem instances in our numerical experiment.
To analyze the impact of system characteristics on the value of having full information, we consider a case where the service provider has sensors that provide information about the exact deterioration level of each component in the system, the full information policy (FI). Since the components' exact deterioration levels are completely observable, the FI case can be formulated as a standard Markov decision process and solved with the value-iteration algorithm [see, e.g., Puterman [36]].
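A value-iteration sketch for the full information case is given below. It rests on several assumptions that are not spelled out in the text: with full information the service provider brings exactly the parts that are needed, so no emergency order or return costs arise; maintenance is instantaneous and the system deteriorates from the post-maintenance state during the period. The function name, the single-component transition matrix, and the cost values are hypothetical.

```python
def full_information_values(states, Q, defect, fail, c_p, c_c, c_r, gamma, tol=1e-9):
    """Value iteration for the fully observable case.

    states: list of core-state tuples; Q[k][l]: transition probability
    from states[k] to states[l]. Returns the converged value per state.
    """
    index = {s: k for k, s in enumerate(states)}
    V = [0.0] * len(states)
    while True:
        def discounted_future(k):
            return gamma * sum(p * v for p, v in zip(Q[k], V))

        V_new = []
        for k, s in enumerate(states):
            failed = any(x == f for x, f in zip(s, fail))
            # State after replacing every defective or failed component.
            after = tuple(0 if x >= d else x for x, d in zip(s, defect))
            replacement = sum(c for x, d, c in zip(s, defect, c_r) if x >= d)
            maintain = ((c_c if failed else c_p) + replacement
                        + discounted_future(index[after]))
            # Maintenance is mandatory in a failed state.
            V_new.append(maintain if failed else min(discounted_future(k), maintain))
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Hypothetical single component with levels 0..3, defect level 2, failure level 3.
states = [(0,), (1,), (2,), (3,)]
Q = [[0.8, 0.2, 0.0, 0.0],
     [0.0, 0.7, 0.3, 0.0],
     [0.0, 0.0, 0.6, 0.4],
     [0.0, 0.0, 0.0, 1.0]]
V = full_information_values(states, Q, defect=(2,), fail=(3,),
                            c_p=100.0, c_c=500.0, c_r=(50.0,), gamma=0.95)
```

Because the discount factor is strictly below one, the iteration is a contraction and stops once successive value vectors differ by less than the tolerance.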
Solution algorithms have been coded in C++ and are run on a supercomputer with QEMU Virtual CPU clocked at 2.30 GHz with 6 cores, and a total RAM capacity of 8.00 GB for 2-component problem instances and with 12 cores and a total RAM capacity of 20.00 GB for 3-component problem instances.
We use the following performance indicators, respectively, in order to assess the value of using the optimal policy compared with a particular benchmark policy and the value of having full information:
$$RD_m^{P} = \frac{TC_m^{P} - TC_m^{O}}{TC_m^{P}} \times 100\%, \qquad RD_m^{FI} = \frac{TC_m^{O} - TC_m^{FI}}{TC_m^{O}} \times 100\%,$$
where $TC_m^{P}$ is the total discounted cost obtained for problem instance $m$ with the use of the corresponding benchmark policy $P$, $TC_m^{FI}$ is the total discounted cost obtained for problem instance $m$ with the use of the full information policy, and $TC_m^{O}$ is the total discounted cost obtained for problem instance $m$ with the use of the optimal policy with a single sensor. These indicators imply that for comparisons with CP and PP (for comparisons with FI), the higher RD, the higher the value of using the optimal policy (the higher the value of having full information).
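As a worked illustration of the performance indicators, the sketch below computes a percentage cost decrease. It assumes that each indicator is the cost decrease relative to the more expensive of the two policies being compared, which matches the way the results are reported as cost decreases. The function name and the total cost figures are hypothetical.

```python
def relative_decrease(tc_expensive, tc_cheaper):
    """Percentage cost decrease when moving from the expensive to the cheaper policy."""
    return 100.0 * (tc_expensive - tc_cheaper) / tc_expensive

# Hypothetical total discounted costs for one problem instance.
tc_cp, tc_optimal, tc_full_info = 1250.0, 1000.0, 900.0

rd_cp = relative_decrease(tc_cp, tc_optimal)         # value of the optimal policy vs. CP
rd_fi = relative_decrease(tc_optimal, tc_full_info)  # value of full information
```

With these numbers, the optimal policy reduces cost by 20% relative to CP, and full information reduces cost by a further 10% relative to the optimal policy with a single sensor.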

Setup
In this section, we present our setup for 2-component systems. The conversion to 3-component systems is explained in Section 4.5.
We set the preventive maintenance cost as $C^p = 100$ and the corrective maintenance cost parameters as in Table 1. We consider a return cost of 30. We evaluate three alternatives for the emergency order cost, see Table 2. With the considered setup, we ensure that emergency order and return costs do not exceed the preventive or corrective maintenance costs, as expected in practice. Emergency order costs are at least as high as return costs, as they include costs of delaying the maintenance or replacement. We consider three alternatives for each component's replacement cost so there exist nine different combinations, see Table 3. We set the replacement cost parameters such that they reflect different possible cases in practice: These costs can be both higher and lower than the preventive and corrective maintenance costs and the emergency order costs. We set the discount rate as $\gamma = 0.95$.
For each component, we consider failure levels $F_1 = F_2 = 3$. The defect levels considered are given in Table 4. First, we consider the case where the components deteriorate independently according to discrete-time discrete-state space Markov chains. We use Remark 1 to construct the transition matrix $Q$. For each component, we consider three different alternatives representing reliable, fair, and unreliable components. To avoid repetitive instances, we treat only the combinations given in Table 5. The considered deterioration rates are based on our observations from real-life cases.
Overall, to examine the value of having full information and value of using the optimal policy under the non-correlated degradation processes, we perform a full factorial experiment with $5 \times 3^4 \times 2^3 = 3240$ different instances. We thus provide a comprehensive numerical experiment that well represents systems in practice. The results related to this numerical experiment are reported in Tables 7-12. Second, we examine how the correlation between the components' deterioration processes affects the value of using the optimal policy and the value of having full information. We consider five different correlation coefficients $\rho$: 0, 0.1, 0.2, 0.4, and 0.8, as shown in Table 6. Since, in practice, it is unusual to observe a negative correlation between the components' deterioration processes, we only consider non-negative correlation coefficients.
We develop a procedure to form the transition matrices in such a way that the specified correlation coefficients are obtained (see Appendix B). The marginal transition matrices for each component and the correlation coefficient are utilized as inputs. The procedure allows the correlation coefficient to be set to a specified level, by fixing the marginal transition matrices. As such, we are able to observe the direct effects of the correlation coefficient.
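The idea of fixing the marginals and solving for one row of the joint transition matrix can be sketched as follows. This is a minimal illustration, not the procedure of Appendix B itself: it assumes that, from a given state, each component either stays at its level or advances by exactly one level, so that a row has only four unknown entries. The function name `joint_row` and the parameters `p` and `q` (the per-period probabilities that component 1 and component 2 advance) are hypothetical.

```python
import math


def joint_row(p, q, rho):
    """Return (p00, p10, p01, p11) for one row of a joint transition matrix.

    p, q : assumed probabilities that component 1 / component 2 advances one level.
    rho  : target correlation between the two advance indicators.
    pXY is the probability that component 1 advances X levels and component 2 Y levels.
    """
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("correlation is undefined for degenerate marginals")
    # Correlation equation: Cov = rho * sd1 * sd2 for two Bernoulli indicators.
    cov = rho * math.sqrt(p * (1.0 - p) * q * (1.0 - q))
    p11 = p * q + cov            # E[X * Y] = E[X] E[Y] + Cov
    p10 = p - p11                # marginal of component 1 is preserved
    p01 = q - p11                # marginal of component 2 is preserved
    p00 = 1.0 - p10 - p01 - p11  # row sums to one
    row = (p00, p10, p01, p11)
    if min(row) < -1e-12:
        raise ValueError("no valid joint row exists for these marginals and rho")
    return tuple(max(x, 0.0) for x in row)


print(joint_row(0.1, 0.1, 0.8))    # similar components: feasible
try:
    joint_row(0.05, 0.5, 0.8)      # very different components: infeasible
except ValueError as err:
    print("infeasible:", err)
```

Under these assumptions the four conditions (row sum, two marginals, correlation) determine the row uniquely, and a negative entry signals that the requested correlation cannot be reached with the given marginals.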
The alternatives for the component transition matrices presented in Table 5 are given as the input variables to the procedure. If the difference between the components' deterioration rates is large, it is not always possible to create a transition matrix having a high correlation coefficient. More specifically, for Reliable-Fair and Fair-Unreliable (Reliable-Unreliable), we cannot define a transition matrix having a correlation coefficient ρ > 0.4 (0.2); Table 6 shows all combinations of correlation coefficient and marginal transition matrix that we thus incorporate. For each given combination, we consider a full factorial experiment. We thus generate 3240 problem instances for ρ ∈ {0, 0.1, 0.2}, 2700 for ρ = 0.4, and 1620 for ρ = 0.8. The results are reported in Table 7 and in Fig. 2.
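The instance counts above can be checked with a short calculation. The sketch assumes that Table 5 lists the six unordered pairs of the three reliability classes and that every pair contributes the same number of instances; both are assumptions, since the table itself is not reproduced here. The names `MAX_RHO` and `instances` are hypothetical.

```python
from itertools import combinations_with_replacement

CLASSES = ["Reliable", "Fair", "Unreliable"]
# Assumed content of Table 5: all unordered pairs of reliability classes.
PAIRS = list(combinations_with_replacement(CLASSES, 2))

# Largest attainable correlation per pair, as stated in the text;
# pairs not listed here are taken to admit every tested value.
MAX_RHO = {
    ("Reliable", "Fair"): 0.4,
    ("Fair", "Unreliable"): 0.4,
    ("Reliable", "Unreliable"): 0.2,
}

TOTAL_INSTANCES = 3240
PER_PAIR = TOTAL_INSTANCES // len(PAIRS)  # assumed equal split across pairs


def instances(rho):
    """Number of problem instances that can be generated for correlation rho."""
    feasible = [pair for pair in PAIRS if rho <= MAX_RHO.get(pair, 1.0)]
    return PER_PAIR * len(feasible)


for rho in (0, 0.1, 0.2, 0.4, 0.8):
    print(rho, instances(rho))
```

Under these assumptions the calculation reproduces the reported counts of 3240, 2700, and 1620 instances.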

Value of using the optimal policy
Our numerical experiments show that when the components have independent deterioration processes, the cost decreases obtained by using the optimal policy instead of CP and PP are on average 15% and 28%, respectively. The minimum and maximum cost decreases achieved by using the optimal policy instead of CP are 0% and 80%; the minimum and maximum cost decreases obtained by using the optimal policy instead of PP are 0% and 74%. Moreover, the cost differences remain relatively stable for different correlation values (see Table 7). When the correlation increases, the components' degradation processes become more similar, leading to better performance for all policies (see Fig. 2). Table 8 shows that if the cost ratio of preventive maintenance to corrective maintenance is low, the benefit of using the optimal policy instead of CP increases, while the benefit of using the optimal policy instead of PP decreases. However, even with a very high corrective maintenance cost, PP still gives an average additional cost of 8%. This results from low defect levels: when the defect levels for both components are 1, PP performs preventive maintenance interventions in states (1,1), (1,0), and (0,1). However, many of the maintenance interventions performed in these states are unnecessary because in states (2,1), (1,2), and (2,2), the components are still functional. Therefore, under the optimal policy, the preventive maintenance interventions are performed when the probability of being in these states is positive, i.e., the corresponding belief state elements are greater than zero. When the defect levels for both components are 2, the difference between the optimal policy and PP does converge to 0. This also explains why the additional cost of using PP instead of the optimal policy decreases when the components' defect levels increase, as Table 9 shows.
By definition, changing the components' defect levels does not affect the cost of CP. On the other hand, since increasing defect levels leads to a decrease in the number of defective states that the system can be in, the service provider's belief on the system state becomes more accurate. This yields a significant cost reduction for the optimal policy. As a result, the relative cost difference between the optimal policy and CP increases, as shown in Table 9.
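The effect of the defect levels on the number of defective states can be made concrete with a small count. The sketch assumes two components with failure level 3 (levels 0, 1, and 2 are non-failed) and assumes that the system shows a defect signal whenever no component has failed and at least one component is at or above its defect level; the function name is hypothetical.

```python
from itertools import product

FAILURE_LEVEL = 3  # F_1 = F_2 = 3, as in the experimental setup


def defect_signal_states(defect_levels):
    """Non-failed states in which at least one component has reached its defect level."""
    states = []
    for state in product(range(FAILURE_LEVEL), repeat=len(defect_levels)):
        if any(level >= defect for level, defect in zip(state, defect_levels)):
            states.append(state)
    return states


for levels in [(1, 1), (1, 2), (2, 2)]:
    print(levels, len(defect_signal_states(levels)))
```

With higher defect levels, fewer states are consistent with a defect signal, so the same signal narrows down the system state more sharply.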
In order to avoid high emergency order costs due to ineffective spare part selection decisions, the optimal policy performs preventive maintenance interventions when the service provider is almost sure which components are defective. In this case, the optimal policy resembles CP. Therefore, when the emergency order costs increase, the cost difference between the optimal policy and CP decreases, whereas the cost difference between the optimal policy and PP increases (see Table 10).
Table 11 shows that the positive impact of employing the optimal policy instead of CP decreases when the replacement cost for one of the components increases. As the replacement cost increases, the service provider would prefer not to implement preventive maintenance interventions. This causes the optimal policy to resemble CP, leading to a decrease in the cost difference between these policies. Moreover, the positive impact of using the optimal policy instead of PP increases as the replacement cost for the unreliable component increases. Under PP, the preventive maintenance interventions are performed in case of a defect signal, leading to a large number of component replacements. Therefore, the relative cost difference between PP and the optimal policy is higher for expensive components.
In Table 12, as the difference between components' deterioration characteristics increases, the positive impact of using the optimal policy instead of CP increases whereas the positive impact of using the optimal policy instead of PP decreases. With an increase in the difference between the components' deterioration rates, the risk of going to a failure state after receiving a defect signal increases. The optimal policy avoids this risk by performing preventive maintenance interventions in early defective states. So, the optimal policy resembles PP and the cost difference between these two policies decreases. The same effect also leads to an increase in the cost difference between the optimal policy and CP.

Value of having full information
When the components deteriorate independently, having full information leads to, on average, a 13% cost decrease compared to having partial information with the minimum and maximum cost decrease of 0% and 51%, respectively. Moreover, the existence of a correlation between the components' deterioration processes affects the value of having full information. There exists a slight downward trend in the value of having full information with an increase in the correlation coefficient (see Table 7). As the correlation increases, both components will have increasingly similar deterioration characteristics. This makes the system similar to a single component deteriorating system so that the service provider's beliefs regarding the system condition get more accurate. As a result, having more information about the components' deterioration levels does not bring a lot of value to the service provider to plan maintenance interventions. Table 8 shows that the positive impact of having full information increases as the corrective maintenance cost increases. When the ratio between the preventive and corrective maintenance costs is close to 0, failures are very costly. Having more information on the components' deterioration levels would help the service provider to exploit each component's lifetime and maintain the components just in time.
We observe that the benefit of having full information decreases with an increase in the components' defect levels (see Table 9). In this case, the number of defective states in the system decreases and hence, system condition information gets more accurate. Therefore, having more information about the components' deterioration levels does not bring a lot of value to the service provider to plan maintenance interventions. Table 10 shows that the benefit of having full information increases when the emergency order cost increases. Having full information about the system would help the service provider to improve spare part selection decisions and to avoid high emergency order costs.
As shown in Table 11, the positive impact of having full information decreases when the replacement cost for one of the components increases. With an increase in the replacement cost, performing preventive maintenance interventions becomes more expensive, thereby decreasing its benefit for the service provider. In such a case, the service provider prefers not to perform preventive maintenance interventions frequently. Thus, having more information about the components' deterioration levels would not bring a lot of value. Table 12 shows that as the difference between the components' deterioration rates increases, the value of having full information first increases and then decreases. More specifically, for both components, the mean times to failure (defect) are almost the same when they have similar deterioration characteristics. Therefore, joint maintenance interventions are cost-effective, i.e., the service provider can save on fixed maintenance costs substantially. Moreover, since both components are likely to fail/defect in the same period, the service provider would bring both components to the customer when performing a maintenance intervention. In this case, component returns are unlikely. When the difference between the components' deterioration rates increases slightly, the likelihood of performing a joint intervention decreases, leading to an increase in fixed maintenance costs. Besides that, the risk of bringing the wrong component to the customer increases. With more information on the components' deterioration levels, the service provider is able to avoid these risks and to reduce the relevant costs. As a result, the value of having full information increases when the components' deterioration characteristics shift from Reliable-Reliable to Reliable-Fair or from Unreliable-Unreliable to Fair-Unreliable. When the difference between the components' deterioration rates increases considerably, performing joint maintenance interventions is not cost-effective anymore. This is mainly because the mean times to failure/defect are very different for the two components. This leads to an increase in fixed costs. However, since, most of the time, the unreliable component is the reason for performing maintenance interventions, the service provider does not have difficulty in selecting the correct spare part. This implies that the risk of bringing the wrong component to the customer decreases. In such a case, having full information on the components' deterioration levels does not significantly help the service provider to reduce the relevant costs. Therefore, the value of having full information decreases when the components' deterioration characteristics shift from Reliable-Fair to Reliable-Unreliable or from Fair-Unreliable to Reliable-Unreliable.

Note: Each transition matrix defined above is a 3-by-3 matrix. Except for the elements specified above, all other elements are zero.

Table 6
Alternatives for the correlation between the components' deterioration processes.

Illustrative examples for 3-component systems
We extend our numerical experiment to 3-component systems in order to illustrate the impact of components' reliability and cost differences on the value of the optimal policy and the value of information. In particular, we consider three systems:
• System 1: identical, unreliable, cheap components;
• System 2: identical, reliable, expensive components;
• System 3: non-identical components.
The input parameters we consider are in accordance with the setup given in Section 4.2. System 1 has Unreliable components with Low replacement costs. System 2 has Reliable components with High replacement costs. System 3 consists of non-identical components, i.e., one Unreliable component with Low replacement cost, one Fair component with Medium replacement cost, and one Reliable component with High replacement cost. The degradation processes of the components are uncorrelated. Each system has three deterioration levels, i.e., $F_1 = F_2 = F_3 = 2$, which implies that the optimal policy is either CP or PP. We consider High corrective maintenance cost and Medium emergency order cost, as expected in a realistic setting.
For . Setting the grid resolution at $M = 10$, we obtain more than 13

Table 8
The impact of fixed corrective maintenance cost on the optimal policy performance under non-correlated degradation processes.

Table 9
The impact of defect levels on the optimal policy performance under non-correlated degradation processes.

Table 10
The impact of emergency order cost on the optimal policy performance under non-correlated degradation processes.

Table 11
The impact of replacement cost on the optimal policy performance under non-correlated degradation processes.

Table 13 shows that the optimal policy is PP when components are cheap and unreliable (System 1) and CP when components are expensive and reliable (System 2). This can be explained by the trade-off between preventive and corrective part replacements. When components are non-identical, PP is optimal due to the existence of a cheap and unreliable component in the system (System 3).
We observe that the value of information is low (2.02%) when components are expensive and reliable. Since the optimal policy is either CP or PP, the value of information stems from optimizing spare parts selection decisions. When components are expensive, replacement costs dominate the cost of emergency shipments and returns. Hence, having full information does not bring significant benefits.

Conclusion and future research
We study an integrated maintenance and spare part selection decision for a partially observable multi-component system. The components deteriorate according to a discrete-time discrete-state space Markov chain. There is a single sensor on the system, which does not indicate the condition of each component, but it indicates if a defect or failure exists in the system. The service provider needs to infer the exact state of the system from the current condition signal and the past data, in order to decide when to visit the customer for maintenance and which spare parts to take along.
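How the service provider can infer the system state from the signal history can be illustrated with a single Bayesian belief update. The sketch below is a toy instance under stated assumptions, not the model of the paper: it uses two independent components with levels 0, 1, and 2, hypothetical transition probabilities, a common defect level of 1, a failure level of 2, and a sensor that reports only the worst category present. All names (`P1`, `P2`, `signal`, `update_belief`) are hypothetical.

```python
from itertools import product

# Hypothetical marginal transition matrices over levels 0, 1, 2 (rows sum to 1).
# These numbers are illustrative only and are not taken from Table 5.
P1 = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]
P2 = [[0.7, 0.3, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
DEFECT_LEVEL, FAILURE_LEVEL = 1, 2  # assumed, identical for both components
STATES = list(product(range(3), repeat=2))


def signal(state):
    """Assumed system-level sensor: reports only the worst category present."""
    if max(state) >= FAILURE_LEVEL:
        return "failure"
    if max(state) >= DEFECT_LEVEL:
        return "defect"
    return "ok"


def update_belief(belief, observed):
    """One Bayes step: predict with independent transitions, then condition on the signal."""
    predicted = {
        nxt: sum(belief[cur] * P1[cur[0]][nxt[0]] * P2[cur[1]][nxt[1]] for cur in STATES)
        for nxt in STATES
    }
    kept = {s: (p if signal(s) == observed else 0.0) for s, p in predicted.items()}
    total = sum(kept.values())
    if total == 0.0:
        raise ValueError("observed signal has zero probability under this belief")
    return {s: p / total for s, p in kept.items()}


prior = {s: 0.0 for s in STATES}
prior[(0, 0)] = 1.0  # system known to be as good as new
posterior = update_belief(prior, "defect")
for state, prob in posterior.items():
    if prob > 0:
        print(state, round(prob, 4))
```

In this toy instance, a defect signal right after a renewal points mostly to the less reliable second component, which is the kind of information the spare part selection decision relies on.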
For this problem, we propose a POMDP formulation and employ a grid-based solution method to find the optimal policy. We conduct an extensive numerical experiment to assess how system characteristics affect the values of using the optimal policy and of having full information. On the basis of this experiment, we provide both researchers and practitioners with a new understanding of how the performance of the optimal policy changes compared to two slightly naive policies, a corrective policy (CP) and a preventive policy (PP). Specifically, we find that using the optimal policy instead of CP and PP results in average cost decreases of 15% and 28%, respectively. The results further indicate that having full information on the components' deterioration levels leads on average to a 13% decrease in the cost obtained with the partial information policy. We observe that the service provider needs less information to manage the system effectively when the deterioration characteristics (i.e., the reliability) of the components in the system are very similar to or significantly different from each other. We also find that as the correlation between the components' deterioration processes increases, the value of having full information decreases and the cost performances of all policies improve. Interestingly, having full information is more valuable for cheaper, less reliable components than for more expensive, more reliable components. This is an important insight for reliability engineers in the design phase of new systems.
Our model considers economic and stochastic dependencies among components in series, and it can easily be adapted to parallel and series-parallel systems, as well as to capture structural dependencies among the components. For instance, for an n-component parallel system with a single sensor, the sensor might be such that it displays a defect signal when more than Δ components have failed (0 < Δ < n) and a failure signal when all n components have failed. It would be possible to capture this problem with our model after redefining the action set for the spare parts selection decisions (i.e., by including the number of spare parts to be taken along). Similarly, it is also possible to use our model for k-out-of-n systems. Additionally, our model is capable of capturing structural dependencies among the components if component replacement rules and spare parts selection decisions are adjusted accordingly. Hence, practitioners can use our model to study a very broad range of real-life maintenance problems and to derive insights.
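The sensor rule for the parallel system described above can be written down directly. This is only a sketch of that signal mapping; the function name `parallel_signal` and the signal labels are hypothetical, and the failure signal is given precedence when both conditions hold.

```python
def parallel_signal(failed, n, delta):
    """Signal of a single sensor on an n-component parallel system.

    failed : number of components that have failed (0 <= failed <= n)
    delta  : threshold with 0 < delta < n; more than delta failures show a defect
    """
    if not 0 < delta < n:
        raise ValueError("delta must satisfy 0 < delta < n")
    if not 0 <= failed <= n:
        raise ValueError("failed must lie between 0 and n")
    if failed == n:
        return "failure"   # all components are down
    if failed > delta:
        return "defect"    # partial loss of redundancy is visible
    return "ok"


print([parallel_signal(k, n=4, delta=1) for k in range(5)])
```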
Our work can be extended in several ways. First, the uncertainty in the components' reliability as well as the imperfect relation between sensor information and the components' actual condition can be incorporated into our problem. This requires a thorough understanding of the system and the physical failure behaviour of the components [see Tinga and Loendersloot [40]]. Second, efficient heuristics are required in order to deal with the curse of dimensionality. Such heuristics could be based on machine learning and artificial intelligence algorithms, e.g., Q-learning, reinforcement learning, and neural networks [see Andriotis and Papakonstantinou [4], Jansen et al. [18], Özgür-Ünlüakın and Bilgiç [34]]. Third, we assume that there exists a sufficiently large number of spare parts on stock at all times. Extending this work to a setting with inventory decisions would allow us to examine the impacts of inventory decisions on the system. Fourth, we assume that the service provider replaces all defective components in the system when performing a maintenance intervention. In practice, this may not be an effective way to reduce the cost because the service provider can continue to use some of the defective components for a while and replace them in subsequent maintenance interventions. Incorporating such a decision into the current model would be challenging because of its computational complexity. However, modeling this and developing a fast algorithm to solve this model would be an interesting topic for future research.

Table 12
The impact of transition matrices on the optimal policy performance under non-correlated degradation processes.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

To determine these probability values, we need to construct four unique equations. The first one comes from the fact that the sum of each row should be equal to one. The procedure should allow the correlation coefficient to be set at a specified level while keeping the marginal transition matrices the same. Therefore, the second and third equations are set in such a way that each joint probability distribution we find should give back the marginal distributions we use as inputs in the procedure. The fourth one is set by using the correlation formulation. For all $s_1 \in S_1 \setminus \{F_1\}$ and $s_2 \in S_2 \setminus \{F_2\}$, the explicit forms of the equations are given as follows, respectively. The vector on the right in the above equation represents the unknown variables and gives us the corresponding row in the joint transition matrix.

$E[P^{1}_{s_1}, P^{2}_{s_2}]$ is the covariance of the corresponding row in the joint transition matrix and its explicit form is:
By solving Eq. (B.7) for each row with the given marginal transition matrices and correlation coefficient, we can describe the rows of the joint transition matrix one by one, thereby constructing the whole matrix. Thus, we can create a joint transition matrix having the specified correlation coefficient and satisfying the given marginal transition matrices.