A framework for designing a degradation-aware controller based on empirical estimation of the state–action cost and model predictive control

Controlling the machine’s state of health (SoH) increases the accuracy of the remaining useful life estimation and enables the control of the failure time by keeping the system operational until the desired maintenance time is reached. To achieve system reliability through SoH control, the system controller must consider the impact of its actions on other parameters, such as degradation. This article proposes a structure for designing degradation-aware controllers for systems with available physical models. A system using this approach can learn autonomously, irrespective of the system’s physical structure and degradation model, and opt for control actions that enhance the system’s reliability and availability. To this end, first, a method is proposed to compute the cost associated with the actions taken by the controller. Second, a new cost function is introduced that incorporates the costs associated with degradation into the cost function utilized in model predictive control. In the third step, dynamic programming and deterministic scheduling are used to calculate the optimal action based on the defined cost function. Finally, the proposed control method is validated through simulation, demonstrating its ability to effectively manage machine degradation and achieve optimal performance according to production and maintenance plans.


Introduction
In the conventional manufacturing paradigm, production and maintenance activities have each been optimized separately [1][2][3]. Nevertheless, the advent of intelligent industry has brought about a paradigm shift, wherein the joint optimization of production and maintenance in the manufacturing process encompasses diverse criteria and constraints [4][5][6][7]. Accomplishing this joint optimization is challenging due to the dynamic nature of the manufacturing process and the intricate interdependencies among various parameters. For instance, the degradation rate and maintenance schedules are influenced by factors such as the production rate and quality, while the final product quality can be affected by the chosen maintenance policy [8][9][10]. Nonetheless, the significance and necessity of this research field are underscored by the substantial 40% increase in profit achievable through the implementation of joint optimization strategies [11].
Two key tasks must be undertaken to achieve a high level of reliability and certainty in the joint optimization of production and maintenance. First, production and maintenance activities should be planned considering the desired production cost and relevant constraints (high-level decision-making). Second, control over the production process itself is crucial to ensure the continuous availability and reliability of manufacturing machines throughout planned production. On the other hand, certain studies have explored enhanced flexibility in various production parameters, leading to the introduction of more adaptable optimization techniques. For instance, [16] presents a method that dynamically adjusts performance levels and mission abort decisions based on real-time degradation data to mitigate system failure risks in safety-critical systems. Similarly, [17] proposes a hybrid predictive maintenance strategy that integrates the degradation model with performance adjustments, emphasizing the importance of optimizing production efficiency while considering equipment health. In another study, [18] introduces a condition-based maintenance approach that leverages real-time data to optimize the balance between production rates and system reliability, providing a practical solution for managing degradation in continuous production environments. Finally, [19] explores the impact of adaptive control strategies on machine health, demonstrating that dynamic adjustments in performance can significantly extend machine lifespan while maintaining production quality. Through these diverse approaches, the ultimate aim is to achieve reliable control and effective degradation management in complex manufacturing systems.
Considering the recent advancements in the integrated optimization of production systems, two primary approaches exist for achieving the joint optimization of production and maintenance and the effective management of reliability. The first approach, illustrated in Fig. 1(a), lacks flexibility, as it assumes known and fixed values for all production variables. Consequently, this approach yields a single optimal production point based on factors such as production quality, rate, and maintenance costs. In contrast, the second approach, depicted in Fig. 1(b), allows for variations in the production plan, quality, or speed. This flexibility provides a certain degree of control or freedom in optimizing production and maintenance processes. However, given the dynamic nature of the production environment, with ever-changing priorities, costs, and conditions, the most flexible optimization requires adaptable production planning supported by field machine actions, as shown in Fig. 1(c).
There are three main challenges in supporting the joint optimization's decisions with field machine actions. First, most state-of-the-art methods rely solely on mathematical and statistical models of the system. Such models sever the link between the system's structure and its degradation as two distinct physical phenomena and disregard the physical interpretability of the results, which is necessary for control purposes. Second, existing approaches attempt to fit the degradation of the system into predetermined mathematical or physical models, which is a strong assumption, as degradation patterns may vary across different parts of the system or may not conform to recognizable mathematical forms [20,21]. Third, machine learning methods do not establish a connection to the system's underlying physics, resulting in the loss of valuable information regarding the specific type of degradation and the eventual fault type [22][23][24][25]. These three challenges prevent the establishment of a link between the system's structure and its degradation, treating them as separate entities. Consequently, while these methods and their assumptions reduce implementation costs by avoiding costly physical modeling, they impede the establishment of causal relationships through physical reasoning. This leads to the implicit assumption that the system's reliability results from unknown or, at best, partially known processes that can merely be approximated rather than enhanced or improved. As a result, such systems cannot actively support critical decision-making processes, such as deferring or scheduling failures or maximizing production rates based on the machines' health status.
A trade-off therefore arises between the physical interpretability of the outcomes and the generality of the techniques employed. This trade-off impedes the seamless alignment of high- and low-level methods, hindering both the online optimization of production and the close control of shop-floor actions needed to follow the optimal integrated plan.
Given the issues arising from the contrast between high- and low-level approaches, the main question is whether decisions can be supported by controlling the degradation of machines so that they remain reliable and available and reach their desired maintenance type at the appropriate time. Moreover, a universally applicable method must be capable of detecting system degradation without forcing it into predefined mathematical models. In addition, the cost of implementing the model should be considerably lower than that of physical modeling while maintaining physical interpretability, so that the controller can use it to optimize the process.
In this article, an innovative approach is proposed for the design of systems or controllers that are degradation-aware.This method is designed to maintain output quality while effectively managing the lifespan of the machinery.The contributions are as follows: First, a novel methodology is introduced for calculating state-action costs (SAC), capable of estimating the degradation cost associated with each potential action of the machine.This approach is unique as it does not rely on preconceived models of machine degradation.Instead, SAC is empirically derived from actual run-to-failure data.Second, degradation cost is incorporated as a new element in the production cost function.This integration transforms traditional controllers into degradation-aware or cost-aware controllers, enhancing their operational efficiency.Third, a solution is proposed for optimizing the cost-aware cost function using model predictive control (MPC).The efficacy of this method is thoroughly assessed across various configurations and settings.Finally, a strategy for production planning is presented, recommending optimal production, taking into account factors like desired quality, rate, time, and reliability.
This approach represents a pioneering step towards integrating reliability scheduling and control in constrained production planning within manufacturing systems. These advancements will assist decision-makers in forecasting machine degradation more accurately and planning production more reliably. Furthermore, the flexibility offered in managing various production parameters lays the groundwork for fully automated manufacturing. The remainder of the paper is organized as follows. The methods section discusses how the controller can be made aware of the impact of its actions on degradation. Next, the simulation section explains the models used to validate the method, followed by the results section, which presents the simulation outcomes confirming the proposed method's effectiveness. Then, the advantages and disadvantages of the method are discussed. Finally, the conclusion is presented in the last section.

Methods
The structure of the method for controlling the system's high-cost actions from the degradation point of view is shown in Fig. 2. In this section, the method is explained step by step. The first step explores the structure of MPC and optimal control, followed by a discussion of how the optimal control policy changes when degradation control is included. The third step is the generation of the state-action cost. Subsequently, the controller's structure is modified to filter out high-cost actions to extend the machine's lifespan. Finally, an optimal control strategy based on the predicted degradation cost of the production plan is proposed such that the planned production is finished before the system requires maintenance.

Model predictive control
For a controller to be suitable for degradation control, it must fulfill two key criteria. First, it should be adaptable enough to incorporate the most recent production priorities into the control policy at each moment. Second, it should ensure stability while also handling multivariable models efficiently.
MPC is a type of controller that aims to minimize the difference between a system's actual and desired output. It achieves this by computing, at each time step, the optimal solution to a general cost function based on predictions of the system's input and output over a limited, predefined period, known as the prediction and control horizons. Consider the state-space model [26]

$$x(k+1) = A\,x(k) + B\,u(k) + E\,d_1(k), \qquad y(k) = C\,x(k) + d_2(k) \tag{1}$$

where $x$ is the system state(s), $u$ represents the system input(s), $y$ is the control parameter, $A$ and $B$ are the nominal system's parameters, $C$ is the output gain, $E$ is the disturbance gain, $d_1$ is the disturbance, and $d_2$ is the measurement noise. Various cost functions can be used as the control objective for MPC. Among them, the commonly used quadratic cost function can be expressed as

$$J(k) = \sum_{i=1}^{N_p} \big\lVert \hat{y}(k+i\mid k) - r(k+i) \big\rVert^2_{Q_1} + \sum_{i=0}^{N_c-1} \big\lVert \Delta\hat{u}(k+i\mid k) \big\rVert^2_{Q_2} \tag{2}$$

where $N_p$ is the prediction horizon, $N_c$ is the control horizon, $\hat{y}$ is the predicted controlled output, $r$ is the desired output, $\Delta\hat{u}$ is the predicted control increment, and $Q_1$ and $Q_2$ are penalty matrices. The working principle of MPC is shown in Fig. 3. MPC is designed based on the initial system model with nominal parameter values. In contrast to linear quadratic regulators, which solve an algebraic Riccati equation to obtain the optimal solution over an infinite horizon, MPC optimizes the solution over a finite horizon at every instant.
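As a minimal sketch, assuming a discrete-time linear model with the disturbance and measurement noise set to zero and equal prediction and control horizons, the prediction step and the quadratic cost (2) can be written in Python with NumPy. The function names and the scalar integrator in the example are illustrative, not part of the original method.

```python
import numpy as np

def simulate(A, B, C, x0, U):
    """Predict outputs for an input sequence U (shape N x m) with the nominal model
    x(k+1) = A x(k) + B u(k), y(k) = C x(k); d1 and d2 are set to zero."""
    x = np.asarray(x0, dtype=float)
    Y = []
    for u in np.asarray(U, dtype=float):
        x = A @ x + B @ u
        Y.append(C @ x)
    return np.array(Y)                      # shape (N, p): y_hat(k+1|k) ... y_hat(k+N|k)

def quadratic_cost(Y, R, U, u_prev, Q1, Q2):
    """Tracking error weighted by Q1 plus control increments weighted by Q2, as in (2)."""
    E = np.asarray(Y, dtype=float) - np.asarray(R, dtype=float)
    U_all = np.vstack([np.atleast_2d(np.asarray(u_prev, dtype=float)),
                       np.asarray(U, dtype=float)])
    dU = np.diff(U_all, axis=0)             # delta u(k+i|k) = u(k+i|k) - u(k+i-1|k)
    tracking = sum(e @ Q1 @ e for e in E)
    effort = sum(d @ Q2 @ d for d in dU)
    return float(tracking + effort)

# Scalar integrator (hypothetical numbers) with N_p = N_c = 2.
A = np.array([[1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
U = np.array([[1.0], [1.0]])
Y = simulate(A, B, C, x0=[0.0], U=U)
J = quadratic_cost(Y, R=[[1.0], [1.0]], U=U, u_prev=[0.0], Q1=np.eye(1), Q2=np.eye(1))
```

For the integrator example, the predicted outputs are 1 and 2, so the tracking term contributes 1 and the single nonzero input increment contributes 1, giving a cost of 2.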
Various methods exist for implementing model predictive control [27]; however, the most comprehensive strategy uses the state-space model. Denoting by $\hat{x}(k+i\mid k)$ the estimate of the state at time $k+i$ given the state at time $k$, based on $x(k)$ and $u(k)$, and considering that $d_1$ and $d_2$ are unpredictable, the future state is estimated as

$$\hat{x}(k+i\mid k) = A^{i}\,x(k) + \sum_{j=0}^{i-1} A^{\,i-1-j} B\,\hat{u}(k+j\mid k), \qquad \hat{y}(k+i\mid k) = C\,\hat{x}(k+i\mid k) \tag{3}$$

Here, $\hat{y}$ is a linear function of $x(k)$ and $\hat{u}$. Thus, in principle,

$$\hat{Y}(k) = F\,x(k) + \Phi\,\hat{U}(k) \tag{4}$$

where $F$ and $\Phi$ are known matrices built from $A$, $B$, and $C$, and

$$\hat{Y}(k) = \begin{bmatrix} \hat{y}(k+1\mid k) \\ \vdots \\ \hat{y}(k+N_p\mid k) \end{bmatrix}, \qquad \hat{U}(k) = \begin{bmatrix} \hat{u}(k\mid k) \\ \vdots \\ \hat{u}(k+N_p-1\mid k) \end{bmatrix} \tag{5}$$

Considering (2), the quadratic criterion to minimize becomes

$$J(k) = \big\lVert F\,x(k) + \Phi\,\hat{U}(k) - R(k) \big\rVert^2_{\bar{Q}_1} + \big\lVert \Delta\hat{U}(k) \big\rVert^2_{\bar{Q}_2} \tag{6}$$

where $R(k)$ stacks the desired outputs, $\Delta\hat{U}(k)$ stacks the predicted control increments, and $\bar{Q}_1$ and $\bar{Q}_2$ are block-diagonal repetitions of $Q_1$ and $Q_2$. However, to include the effect of other costs (e.g., degradation), another term must be added to (2):

$$J(k) = \sum_{i=1}^{N_p} \big\lVert \hat{y}(k+i\mid k) - r(k+i) \big\rVert^2_{Q_1} + \sum_{i=0}^{N_c-1} \big\lVert \Delta\hat{u}(k+i\mid k) \big\rVert^2_{Q_2} + \sum_{i=0}^{N_d-1} Q_3\,\hat{f}\big(\hat{x}(k+i\mid k), \hat{u}(k+i\mid k)\big) \tag{7}$$

where $N_d$ is the degradation control horizon, $\hat{f}$ is the predicted cost of the degradation, and $Q_3$ is its penalty. The control cost function therefore contains one more term than the conventional quadratic cost function, and it evaluates each action based on three distinct parameters: the control parameter error, which signifies the output quality or rate, or both, depending on the system; the input or energy usage; and the degradation cost $f$, which quantifies the deterioration inflicted on the overall system health by a specific action applied in a certain state. Because $J(k)$ represents the total cost of a particular state-action combination, composed of these three costs, the entire procedure can be fine-tuned during each cycle, through the relevant penalty matrices, in accordance with manufacturing priorities and prevailing conditions. Degradation-aware control is thus a control strategy that optimizes system performance by incorporating the cost of degradation into the control process. Unlike conventional controllers, which focus primarily on achieving the desired performance and energy efficiency, a degradation-aware controller adds the degradation term in (7) and, through the degradation control horizon $N_d$, predicts the degradation cost over a specified period. This allows it to make more informed decisions that balance performance, energy consumption, and the long-term health of the system.
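The stacked prediction and the additional degradation term of (7) can be sketched as follows. This is an illustration under stated assumptions: the penalty matrices are taken as scalar multiples of the identity (written `q1`, `q2`, `q3`), the control horizon equals the prediction horizon, and the degradation cost is supplied as a hypothetical callable `f(x, u)`, which in practice would be the SAC lookup table or a trained estimator.

```python
import numpy as np

def prediction_matrices(A, B, C, Np):
    """Stack y_hat(k+i|k) = C A^i x(k) + sum_j C A^(i-1-j) B u(k+j|k) for i = 1..Np."""
    n, m = B.shape
    p = C.shape[0]
    F = np.zeros((Np * p, n))
    Phi = np.zeros((Np * p, Np * m))
    for i in range(1, Np + 1):
        F[(i - 1) * p:i * p, :] = C @ np.linalg.matrix_power(A, i)
        for j in range(i):
            Phi[(i - 1) * p:i * p, j * m:(j + 1) * m] = (
                C @ np.linalg.matrix_power(A, i - 1 - j) @ B)
    return F, Phi

def degradation_aware_cost(A, B, C, x0, U, R, u_prev, q1, q2, q3, f, Nd):
    """Tracking + input-increment + degradation terms, with Q_i = q_i * I (assumption)."""
    U = np.asarray(U, dtype=float)                 # shape (Np, m)
    x = np.asarray(x0, dtype=float)
    F, Phi = prediction_matrices(A, B, C, U.shape[0])
    Y = F @ x + Phi @ U.ravel()                    # stacked predicted outputs
    tracking = q1 * np.sum((Y - np.asarray(R, dtype=float).ravel()) ** 2)
    dU = np.diff(np.vstack([np.atleast_2d(u_prev), U]), axis=0)
    effort = q2 * np.sum(dU ** 2)
    degradation = 0.0
    for i in range(min(Nd, U.shape[0])):           # sum over x_hat(k+i|k), u_hat(k+i|k)
        degradation += f(x, U[i])
        x = A @ x + B @ U[i]
    return float(tracking + effort + q3 * degradation)

# Scalar integrator with a hypothetical degradation cost |u|.
A = np.array([[1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
F, Phi = prediction_matrices(A, B, C, 2)
J7 = degradation_aware_cost(A, B, C, x0=[0.0], U=[[1.0], [1.0]], R=[[1.0], [1.0]],
                            u_prev=[0.0], q1=1.0, q2=1.0, q3=0.5,
                            f=lambda x, u: abs(u[0]), Nd=2)
```

Because the degradation term enters additively, setting `q3 = 0` recovers the conventional quadratic criterion; in the example the three terms contribute 1, 1, and 0.5 × 2, for a total of 3.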

Optimality of control
Given that each action has a corresponding cost, optimizing the system according to a cost function that includes factors beyond the system's error and input, such as (7), requires paying an additional cost. Considering the limitations of control systems, this cost can only be paid in one of the following forms: energy, output quality, output rate, or degradation sharing (load sharing). In other words, once another term is added to the optimization cost function, the optimal point of (2) is, in general, no longer optimal for (7): the third term in (7) changes the optimality criteria, and, with high probability, the system output changes. This change in the output is the price paid for a lower degradation rate, and which parameter is changed as payment for degradation control is determined by the relations among $Q_1$, $Q_2$, and $Q_3$.
According to the formulation in (7), increasing $Q_1$ places priority on the quality of the final product or the production rate, granting the controller the freedom to allocate additional resources and/or impose greater degradation on the machine to enhance the output quality. Conversely, decreasing $Q_1$ assigns a lower priority to the quality of the final product relative to considerations such as energy consumption and system degradation. Real-life examples in which the output quality or rate can be manipulated to reduce energy usage and/or the degradation imposed on the system are systems whose control parameter of interest is a speed or rate (e.g., autonomous vehicles or drones). In these systems, higher velocity corresponds to higher energy consumption and greater system degradation. As a result, decreasing $Q_1$ enables the controller to deviate from a precise output rate and instead adjust the input $u$ to regulate the system's speed, effectively managing energy usage and/or imposed degradation. Fig. 4(a) illustrates the optimization of production costs when high priority is placed on both the production's quality and rate ($Q_1$ set to a high value).
Increasing $Q_2$ gives higher priority to minimizing input (energy) usage within the production plan. Consequently, the controller is permitted to generate lower-quality output and/or compromise the system's health to a greater extent in order to reduce input consumption. Conversely, decreasing $Q_2$ allows the controller to use more energy to compensate for other factors, such as output quality, output rate, or system health. To correct output errors through increased energy consumption, $Q_2$ needs to be smaller than $Q_1$. Similarly, to protect system health through increased energy usage, $Q_2$ should be smaller than $Q_3$. This scenario corresponds to real-life situations with high energy costs, where a trade-off between output quality, output rate, maintenance costs, and energy consumption must be optimized. A graphical representation of this concept is shown in Fig. 4(b).
The prioritization of system health is regulated by $Q_3$. Increasing $Q_3$ allows the controller to modify the output rate, quality, and/or energy consumption to preserve system health. Conversely, decreasing $Q_3$ permits the system to experience greater degradation in order to maintain the desired output quality, rate, and/or energy usage. In systems such as manufacturing machinery, an increase in $Q_3$ requires quality, rate, and energy to be manipulated so that the system's degradation is minimized. This scenario is illustrated in Fig. 4(c), where reducing the production quality and rate, together with increasing energy usage, results in minimized maintenance costs. A practical example of maintaining system reliability by controlling output quality and rate is a rolling mill, where a decrease in either output rate or quality notably extends the machine's operational lifespan.
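The effect of the three penalty weights of (7) (written here as `q1`, `q2`, `q3`) can be illustrated with a deliberately small, hypothetical example: three candidate actions for one decision step, each with a squared output error, an input (energy) term, and a degradation cost. The numbers are invented for illustration only; the point is that the relative weights, not the individual terms, decide which cost is paid.

```python
# Hypothetical candidates for one decision step:
# (name, squared output error, input/energy term, degradation cost)
CANDIDATES = [
    ("fast", 0.0, 4.0, 3.0),
    ("nominal", 1.0, 2.0, 1.0),
    ("slow", 4.0, 1.0, 0.2),
]

def best_action(q1, q2, q3):
    """Candidate minimizing q1 * error + q2 * energy + q3 * degradation."""
    return min(CANDIDATES, key=lambda c: q1 * c[1] + q2 * c[2] + q3 * c[3])[0]

for weights in [(10, 1, 1), (1, 1, 1), (1, 1, 10)]:
    print(weights, best_action(*weights))
```

With these numbers, a high quality weight selects the fast action, equal weights select the nominal one, and a high health weight selects the slow, low-degradation action, so the output quality is what gets paid for lower degradation.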
This discussion of the costs of degradation control is crucial because benefiting fully from its advantages requires an accurate assessment of its disadvantages. Notably, in some situations degradation control is impossible because none of the aforementioned costs can be paid. Once the penalties are configured, the SAC ($f$ in (7)) is calculated in the next step.

State-action cost calculation
The concept of SAC refers to the cost associated with a particular combination of the system's state and the action applied at that state.This cost represents the degradation or impact on the system's health due to the interaction between the state and action, which is critical in optimizing the overall performance and longevity of the system.The SAC quantifies how much each state-action pair contributes to the system's degradation over time, providing a comprehensive measure to guide decision-making in maintenance and operational strategies.The determination of SAC involves capturing data during system operation to evaluate the cost for each state-action pair, allowing the construction of a SAC table, or lookup table.This table serves as a reference to assess the degradation cost at any given instance, facilitating the optimization of the control process by solving optimization problems, such as the one described in (7).
In (7), the first term depends solely on the system states, and the second term depends only on the system input. In contrast, the third term is a function of both the system's state(s) and input(s):

$$f = f(x, u) \tag{8}$$

This does not mean that both parameters contribute to degradation in every case; however, to keep the solution general, this structure is adopted for $f$. Consequently, to minimize the cost function, the cost of every state-action combination must be identified. The most general approach for identifying the SAC, regardless of the system structure and operating conditions, is to capture data while the system is in operation and determine the cost associated with each recorded state-action combination. These costs can then be stored as a SAC table (lookup table), from which the degradation cost of the machine can be extracted at each instant.
Assuming the machine is fully healthy after each maintenance, the quantity to be calculated is $\Psi$, a vector containing the degradation costs of all recorded state-action combinations:

$$\Psi = \begin{bmatrix} \psi(c_1) & \psi(c_2) & \cdots & \psi(c_l) \end{bmatrix}^{\mathsf{T}}, \qquad c_j = (s_p, a_q), \quad p = 1, \dots, n_s, \quad q = 1, \dots, n_a, \quad l = n_s\,n_a \tag{9}$$

where $s$ is the system's state, $a$ is the system's action, $c$ is the state-action combination, $n_s$ is the number of all possible (combinations of) system states, $n_a$ is the number of all possible (combinations of) system actions, and $l$ is the number of all possible system state-action combinations.
After creating $\Psi$, the vector $\eta_i$ can be produced for each run-to-failure cycle $i$. This vector contains the number of occurrences of each state-action combination $(c_1, \dots, c_l)$ in a single run-to-failure cycle:

$$\eta_i = \begin{bmatrix} \eta_{i1} & \eta_{i2} & \cdots & \eta_{il} \end{bmatrix} \tag{10}$$

where $\eta_{ij}$ is the number of times state-action $c_j$ has appeared in run-to-failure cycle $i$ (i.e., the number of times the degradation cost $\psi(c_j)$ has been imposed on the system).
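Building the count vector for one run-to-failure cycle amounts to a histogram over combination indices. The sketch below assumes the states and actions have already been quantized to integer indices and uses a row-major numbering `c = s * n_actions + a`; both the function name and the numbering are illustrative choices.

```python
import numpy as np

def count_vector(log, n_states, n_actions):
    """Occurrences of each state-action combination in one run-to-failure cycle.
    `log` is a sequence of (state_index, action_index) pairs."""
    eta = np.zeros(n_states * n_actions, dtype=int)
    for s, a in log:
        eta[s * n_actions + a] += 1          # row-major combination index
    return eta

cycle_log = [(0, 1), (0, 1), (1, 0)]         # hypothetical recorded cycle
eta = count_vector(cycle_log, n_states=2, n_actions=2)
```

For long logs, `np.bincount(indices, minlength=n_states * n_actions)` on the precomputed combination indices gives the same vector without an explicit Python loop.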
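Then, the matrix $\Omega$, containing the records of $r$ run-to-failure cycles, can be generated:

$$\Omega = \begin{bmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_r \end{bmatrix} \in \mathbb{R}^{r \times l} \tag{11}$$

Finally, the cost of each state-action from the degradation viewpoint can be calculated using the following optimization:

$$\min_{\Psi} \; \big\lVert \Omega\,\Psi - K\,\mathbf{1}_r \big\rVert_2^2 \tag{12a}$$

$$\text{s.t.} \quad \Psi \ge 0 \tag{12b}$$

where $K$ can be any positive constant and $\mathbf{1}_r$ is the $r$-dimensional vector of ones. Optimization (12) is a categorical regression that calculates the SAC as a linear, constrained regression; it treats every state-action combination as a unique category.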
Conceptually, optimization (12a) treats the SoH of the system as a source with capacity $K$. This source is considered full after each maintenance and empty after each failure. Optimization (12) assigns a particular cost to every state-action combination according to the amount that the action consumes from that source; that is, every action consumes some of the machine's health. Because no state-action can improve the system's health, constraint (12b) enforces this by keeping every cost nonnegative.
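A minimal way to solve (12) is projected gradient descent on the least-squares objective, clipping negative entries to zero after each step so that (12b) holds. This is a swapped-in solver chosen to keep the example dependency-free (any nonnegative least-squares routine would serve). The four synthetic count vectors below were constructed so that the true costs 0.1, 0.2, and 0 each exhaust a capacity of K = 1 per cycle.

```python
import numpy as np

def estimate_sac(Omega, K=1.0, iters=5000):
    """Approximately solve min ||Omega psi - K 1||^2 subject to psi >= 0."""
    Omega = np.asarray(Omega, dtype=float)
    target = np.full(Omega.shape[0], K)
    L = np.linalg.eigvalsh(Omega.T @ Omega).max()    # Lipschitz constant of the gradient
    psi = np.zeros(Omega.shape[1])
    for _ in range(iters):
        grad = Omega.T @ (Omega @ psi - target)      # gradient of 0.5 * ||Omega psi - K 1||^2
        psi = np.maximum(psi - grad / L, 0.0)        # projection onto psi >= 0
    return psi

# Each row is a count vector from one (synthetic) run-to-failure cycle.
Omega = np.vstack([[10, 0, 0], [0, 5, 7], [2, 4, 3], [4, 3, 9]])
psi_hat = estimate_sac(Omega, K=1.0)
```

Because $K$ only fixes the scale of the health "source", rescaling $K$ rescales every estimated cost by the same factor.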
The precision of the SAC determined by (12) depends strongly on the quantity of recorded data. Presuming a consistent operating condition for the machine, which is not a strong assumption given that manufacturing machines typically operate in stationary settings with highly stable conditions, (12) can be solved using Lagrange multipliers $\lambda$ through the Karush–Kuhn–Tucker conditions, which reduce to a set of linear equations once the active constraints are fixed:

$$2\,\Omega^{\mathsf{T}}\big(\Omega\,\Psi - K\,\mathbf{1}_r\big) - \lambda = 0, \qquad \lambda \ge 0, \qquad \Psi \ge 0, \qquad \lambda_j\,\psi(c_j) = 0 \;\; \forall j \tag{13}$$

Then, because no state-action combination can improve the SoH of the machine in any way (actions either degrade the machine or do not), it can be inferred that the estimate of $\Psi$ is unbiased:

$$\mathbb{E}\big[\hat{\Psi}\big] = \Psi^{\ast} \tag{14}$$

where $\hat{\Psi}$ is estimated using (12) and $\Psi^{\ast}$ is the real degradation cost of each state-action combination.
To implement this method, the states and actions must be quantized to construct $\Psi$ in (9). The quantization level must be chosen according to the desired degradation control accuracy. One possible limitation of the method in (12) is its high dimensionality. If the desired output is restricted to a limited number of settings, the number of state-action combinations is also limited. However, if the desired output must cover a broad range with high-resolution quantization, constructing $\Psi$ becomes infeasible.
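A uniform quantizer and the mapping from a quantized (state, action) pair to an entry of $\Psi$ might look like the sketch below. The equal-width bins, the clipping of out-of-range values, and the row-major index are assumptions for illustration; the text only requires that the quantization level match the desired degradation control accuracy.

```python
def quantize(value, low, high, levels):
    """Index of the equal-width bin on [low, high] containing value (clipped to range)."""
    idx = int((value - low) / (high - low) * levels)
    return min(max(idx, 0), levels - 1)

def combination_index(state, action, s_range, a_range, n_s, n_a):
    """Row-major index of the quantized state-action pair in a table of n_s * n_a entries."""
    s = quantize(state, s_range[0], s_range[1], n_s)
    a = quantize(action, a_range[0], a_range[1], n_a)
    return s * n_a + a

idx = combination_index(0.6, 0.1, (0.0, 1.0), (0.0, 1.0), n_s=4, n_a=5)
```

The table size is the product of the level counts of every state and input dimension: for example, four states and one input, each quantized to 100 levels, already give 100^5 = 10^10 entries, which is why high-resolution quantization quickly makes $\Psi$ infeasible to construct.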
One possible solution to this problem is to use function estimators. Several linear and nonlinear regression methods are available for this type of estimation; however, if the function is highly non-smooth, neural networks are the most practical option. In this case, only a finite number of state-action combinations are recorded, and $\Psi$ is estimated by training a function estimator on these recorded combinations. In other words, the data obtained from (12) serve as the training set for the function estimator. Once trained, the estimator is used to compute the third term during the optimization of (7).
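As a lightweight stand-in for the neural network suggested in the text, the sketch below fits a quadratic polynomial in (state, action) by ordinary least squares to SAC values at recorded combinations and then evaluates it at an unrecorded point. The feature set, the synthetic target, and the clipping of negative predictions (to respect (12b)) are assumptions for illustration; a genuinely non-smooth SAC would call for a more flexible estimator.

```python
import numpy as np

def features(s, a):
    """Quadratic features of a scalar state s and scalar action a."""
    s = np.asarray(s, dtype=float)
    a = np.asarray(a, dtype=float)
    return np.stack([np.ones_like(s), s, a, s ** 2, s * a, a ** 2], axis=-1)

def fit_sac_model(S, A, psi):
    """Least-squares fit of the SAC values psi recorded at (S, A)."""
    w, *_ = np.linalg.lstsq(features(S, A), psi, rcond=None)
    return lambda s, a: np.maximum(features(s, a) @ w, 0.0)   # SAC is never negative

# Hypothetical recorded grid and SAC values (in practice these come from (12)).
S, A = np.meshgrid(np.linspace(0, 1, 6), np.linspace(0, 1, 6))
S, A = S.ravel(), A.ravel()
psi = 0.1 * A ** 2 + 0.05 * S * A
model = fit_sac_model(S, A, psi)
```

Here the synthetic target lies in the span of the features, so `model(0.37, 0.81)` reproduces it at a point that was never recorded; with real, noisy SAC estimates the fit only approximates them.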

Dynamic programming
When all the terms in (7) are known, the optimal input $u_k^{\ast}$ can be found. Because $f(x, u)$ may not have a closed form, being derived from discrete calculations or from a function estimator such as a neural network, it is likely to be non-smooth. Therefore, minimizing (7) requires a numerical approach.
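Dynamic programming and the principle of optimality [29] state that if $\pi^{\ast} = \{\mu_0^{\ast}, \mu_1^{\ast}, \dots, \mu_{N-1}^{\ast}\}$ is an optimal policy for the general problem and, when $\pi^{\ast}$ is applied, state $x_i$ occurs at time $i$, then for the subproblem of minimizing the "cost-to-go" from state $x_i$ at time $i$ to time $N$,

$$\min_{\mu_i, \dots, \mu_{N-1}} \; g_N(x_N) + \sum_{k=i}^{N-1} g_k\big(x_k, \mu_k(x_k)\big) \tag{15}$$

the truncated policy $\{\mu_i^{\ast}, \mu_{i+1}^{\ast}, \dots, \mu_{N-1}^{\ast}\}$ is optimal [30].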
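Thus, using dynamic programming, the extended version of (6) can be minimized:

$$J_k(x_k) = \min_{u_k} \Big[ g_k(x_k, u_k) + J_{k+1}\big(A\,x_k + B\,u_k\big) \Big], \qquad J_N(x_N) = g_N(x_N) \tag{16}$$

where

$$g_k(x_k, u_k) = \big\lVert C\,x_k - r_k \big\rVert^2_{Q_1} + \big\lVert \Delta u_k \big\rVert^2_{Q_2} + Q_3\,f(x_k, u_k) \tag{17}$$

and $f$ is taken from $\Psi$ or from the function estimator trained on the calculated $\Psi$. Thus, at each step, (16) can be evaluated for all possible state-actions, and the optimal action $u_k^{\ast}$ is the one that minimizes (16).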
The stability of the system is the final concern of this section, and three points must be considered. First, the proposed model predictive degradation controller (MPDC) is designed from data recorded on an already closed-loop system; consequently, stability conditions or constraints have already been established. Second, once $g_k$ has been computed as in (17), the minimization of (16) at each instant is a Markov decision process (MDP) in which the controller selects the lowest-cost option among all possible next states. Third, regardless of whether the optimization uses (2) or (7), the fundamental principles governing the system remain unchanged, and a known input invariably produces a known output; hence, the stability constraints do not change when switching from the previous controller to the MPDC. In light of these three considerations, the stability conditions or constraints remain unaltered during the transition to the MPDC. Consequently, because the existing controller has been verified as stable, the MPDC can be inferred to be stable as well: by formulating the problem as an MDP and transferring the stability constraints of the previous controller as high-cost states, unstable states are never chosen by the controller. Techniques such as those presented in [31,32] can also be employed to facilitate the design of a stable MPC. Thus, when the stability constraints of the existing controller are not accessible, the constraints defined in these methods can be used to design a stable MPDC without alteration.
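The backward recursion in (16) can be sketched on a small discrete grid. In this hypothetical example the state is a position from 0 to 3 that should reach 3, the actions are "hold" and "step up", stepping carries a unit degradation cost from an assumed SAC table, and states flagged as violating a stability limit receive infinite cost so that the controller never selects them. The time-invariant stage cost and the absence of an input-increment term are simplifications for brevity.

```python
import numpy as np

def backward_dp(next_state, stage_cost, terminal_cost, N):
    """Finite-horizon dynamic programming on a discrete state-action grid.
    next_state[s, a]: successor state index; stage_cost[s, a]: g(s, a)."""
    n_s = stage_cost.shape[0]
    J = np.asarray(terminal_cost, dtype=float).copy()   # J_N
    policy = np.zeros((N, n_s), dtype=int)
    for k in reversed(range(N)):
        Q = stage_cost + J[next_state]                   # g(s, a) + J_{k+1}(next state)
        policy[k] = np.argmin(Q, axis=1)
        J = Q[np.arange(n_s), policy[k]]                 # J_k
    return J, policy

def build_example(q3, forbidden=()):
    """Hypothetical 4-state problem: reach state 3; stepping up costs 1 unit of SAC."""
    n_s, target = 4, 3
    s = np.arange(n_s)[:, None]
    a = np.array([[0, 1]])                               # 0 = hold, 1 = step up
    next_state = np.minimum(s + a, n_s - 1)
    sac = np.broadcast_to(a, (n_s, 2)).astype(float)     # assumed SAC table
    stage = (s - target) ** 2 + q3 * sac
    terminal = (np.arange(n_s) - target) ** 2.0
    for bad in forbidden:                                # stability limits as high-cost states
        stage[bad, :] = np.inf
        terminal[bad] = np.inf
    return next_state, stage, terminal

J_free, pol_free = backward_dp(*build_example(q3=0.0), N=3)
J_health, pol_health = backward_dp(*build_example(q3=100.0), N=3)
J_safe, pol_safe = backward_dp(*build_example(q3=0.0, forbidden=(3,)), N=3)
```

Starting from state 0, free stepping climbs straight to the target (total cost 14); a large degradation weight makes holding position cheaper (cost 36), trading tracking error for health; and with state 3 marked unstable, the policy stops at state 2 (cost 15).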

Production planning
By breaking down the entire production process into its individual actions and estimating the degradation cost associated with each state-action in the system, it is feasible to approximate the degradation cost of a planned production. Assuming that the system has undergone maintenance and is now in good health, the expected production until the next maintenance is

$$P = \{p_1, p_2, \dots, p_M\} \tag{18}$$

where $p_i$ is the $i$th product to be produced and $P$ includes the $M$ products to be produced until the next maintenance.
In the following stage, every product must be transformed into individual state-action combinations (the entries of $\Psi$). As a result, a production plan can ultimately be broken down into its component costs from a degradation perspective:

$$P \;\rightarrow\; \big\{\psi(c_1^{P}), \psi(c_2^{P}), \dots, \psi(c_L^{P})\big\} \tag{19}$$

where $c_j^{P}$ is the $j$th state-action needed for the production of plan $P$, and $\psi(c_j^{P})$ is its cost according to $\Psi$ calculated in (12). Thus, the estimated cost of the production plan $P$ is obtained by summing all $L$ components it comprises:

$$C_P = \sum_{j=1}^{L} \psi\big(c_j^{P}\big) \tag{20}$$

Thus, if the production cost $C_P$ is larger than $K$, which was arbitrarily chosen in (12), the production cannot be completed without a maintenance break. To continue production until the desired maintenance time, which is after the production of $M$ products, the degradation cost must be decreased by eliminating the high-cost actions that contribute to system degradation. The set of actions to be filtered depends only on the relation between $C_P$ and $K$.
For example, if $C_P$ is calculated to be 1.15 for $K = 1$, the top 15% of the actions, ranked by degradation cost, should be removed. Notably, depending on the specific characteristics of the system and the production objectives, it may not be feasible to achieve an acceptable production quality and rate while also extending the system's lifespan. This is discussed further in the results section.
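Summing SAC values over a production plan, and applying the filtering rule from the example above, can be sketched as follows. The product-to-combination mapping, the table values, and the rounding of the number of removed combinations are hypothetical. Removing the top fraction of combinations by rank does not by itself guarantee that the plan cost falls below $K$, so the plan cost would need to be re-evaluated with the remaining actions.

```python
def plan_cost(plan, product_combos, psi):
    """Degradation cost of a plan: sum of SAC over every state-action each product needs."""
    return sum(psi[c] for product in plan for c in product_combos[product])

def combos_to_filter(psi, plan_cost_value, K):
    """Indices of the highest-cost combinations to exclude when the plan cost exceeds K.
    The excluded fraction equals the relative excess (e.g. 1.15 / 1 -> 15 %)."""
    excess = max(plan_cost_value / K - 1.0, 0.0)
    n_drop = int(round(excess * len(psi)))
    ranked = sorted(range(len(psi)), key=lambda j: psi[j], reverse=True)
    return set(ranked[:n_drop])

psi = [0.1, 0.2, 0.5]                                  # hypothetical SAC table
product_combos = {"A": [0, 1, 1], "B": [2]}            # hypothetical product recipes
C_P = plan_cost(["A", "B", "A"], product_combos, psi)
dropped = combos_to_filter(list(range(20)), 1.15, 1.0) # 20 combinations with costs 0..19
```

In the filtering example, a 15% excess over 20 combinations removes the three most expensive ones (indices 17, 18, and 19).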

Simulation
Simulations are used to validate the proposed methods for SoH control. These simulations aim to demonstrate how the controller reacts to variations in the system's physical parameters due to degradation and manages the actions that lead to increased system degradation, thereby increasing the mean time to failure (MTTF). To this end, the simulated degradation is designed to closely resemble the actual degradation process of real systems, and the machines chosen for the simulation are representative of degrading machines in general.
The first machine used in this simulation is a DC motor. DC motors are used for many tasks owing to their simplicity and reliability. However, a major form of degradation in these machines is the change in their internal parameters, especially resistance. The internal resistance of a DC motor changes for two reasons. The first is an inter-turn short circuit caused by insulation damage [33], which over time reduces the internal resistance of the motor. The second is brush wear [34]. The gradual wearing down of the brushes and the decrease in coil resistance continuously change the internal parameters of the motor. Because the controller is designed using the nominal parameters of the motor, the actual output gradually diverges from the desired output. At some point, this deviation exceeds a certain threshold, and the system is considered failed.
In cases where the system is critical, e.g., carving CNC machines or 3D printers, very small deviations significantly affect the final product and may lead to considerable costs due to material loss. Thus, precise prediction of the remaining useful life (RUL) is crucial for predictive maintenance: it saves the cost of spare parts by using them throughout their healthy lives while ensuring that, during hours-long tasks, the system stays reliable with respect to the desired outcome.
The second machine is employed to verify and assess the effectiveness of the proposed methodology on a more complex system with considerably more state-action combinations, given its number of states and inputs and their respective working ranges. The model used for this purpose is the hydraulically actuated arm (HAA) [35,36]. As shown in the diagram of the hydraulic arm in Fig. 5, an electrohydraulic actuator rotates the mechanical arm. The main degradation observed in these systems is the reduction in valve gain due to erosion of the flow nozzles [37].

DC motor model
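The state-space model of the DC motor can be written as

$$\frac{d}{dt}\begin{bmatrix} i(t) \\ \omega(t) \end{bmatrix} = \begin{bmatrix} -\dfrac{R}{L} & -\dfrac{K_b}{L} \\ \dfrac{K_t}{J} & -\dfrac{b}{J} \end{bmatrix}\begin{bmatrix} i(t) \\ \omega(t) \end{bmatrix} + \begin{bmatrix} \dfrac{1}{L} \\ 0 \end{bmatrix} V(t)$$

where $i$ is the motor current, $\omega$ is the angular velocity, $R$ is the resistance, $L$ is the inductance, $K_b$ is the back-emf constant, $K_t$ is the torque constant, $J$ is the motor inertia, $b$ is the friction coefficient, and $V$ is the input voltage [38]. The HAA is likewise described in state-space form, with the state vector

$$x(t) = \begin{bmatrix} \theta(t) & \omega(t) & \dot{\omega}(t) & q(t) \end{bmatrix}^{T}$$

where $\theta$ is the arm angle, $\omega$ is the arm angular velocity, $\dot{\omega}$ is the arm angular acceleration, and $q$ is the hydraulic valve displacement. The model parameters are the servo motor gain, the motor inertia, the motor natural frequency, the servo valve gain, and the differential pressure coefficient, and the input is the voltage applied to the servo valve. The parameters used for these simulations are listed in Table 1.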
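To make the DC motor model concrete, the standard DC motor state equations (written out in the code comments) can be integrated numerically. The sketch below uses forward Euler and illustrative parameter values chosen here (R = 1 Ω, L = 0.5 H, K_b = K_t = 0.01, J = 0.01 kg·m², b = 0.1 N·m·s), not the values from Table 1. `dc_motor_rhs` and `simulate_step` are hypothetical names.

```python
import numpy as np


def dc_motor_rhs(x, V, R, L, Kb, Kt, J, b):
    """Standard DC motor state equations, x = [i, omega]:
    di/dt = (V - R*i - Kb*omega) / L,  domega/dt = (Kt*i - b*omega) / J."""
    i, omega = x
    di = (V - R * i - Kb * omega) / L
    domega = (Kt * i - b * omega) / J
    return np.array([di, domega])


def simulate_step(V, R=1.0, L=0.5, Kb=0.01, Kt=0.01, J=0.01, b=0.1,
                  dt=1e-3, t_end=5.0):
    """Open-loop step response with forward Euler; returns omega at each step.
    Parameter values are illustrative only (not those of Table 1)."""
    x = np.zeros(2)                      # motor starts at rest: i = 0, omega = 0
    n = int(round(t_end / dt))
    omega = np.empty(n)
    for k in range(n):
        x = x + dt * dc_motor_rhs(x, V, R, L, Kb, Kt, J, b)
        omega[k] = x[1]
    return omega


w = simulate_step(V=1.0)
w_ss = 0.01 * 1.0 / (1.0 * 0.1 + 0.01 * 0.01)   # Kt*V / (R*b + Kt*Kb)
print(w[-1], w_ss)                               # both close to 0.0999
```

Forward Euler is used only for brevity; the step `dt` is well below the fastest time constant of these illustrative values (about 0.1 s).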

Degradation model
The degradation model serves solely to generate data from the simulation model and to evaluate the effectiveness of the proposed method. In practice, the mapping from the system's state-actions to degradation cost is learned from data recorded on the machine, regardless of the degradation model. The degradation cost used for both simulation models is computed over all of the system's states and inputs. For both models, the degradation is then assumed to follow the exponential degradation model [39,40]. In this model, the degrading parameter is the motor resistance for the DC motor simulation and the servo valve gain for the HAA; the model is expressed in terms of the working cycle, defined as the time it takes the machine to produce one product, and the length of that cycle; and the values of the three model coefficients are given in Table 2. This formulation makes the degradation of the motor resistance (first simulation) and of the servo valve gain (second simulation) depend on the cumulative sum of the numerical derivatives of the system's states and inputs. Consequently, the larger the changes in the system's states and inputs, the greater the degradation; in other words, the more the machine is used, the more it degrades [41]. For the HAA, however, the third coefficient is set to $4 - y$, where $y$ is the system output, so that the further the output is from four, the more degradation is imposed on the system. This mimics real systems, such as car engines, that are designed for an optimal working point and whose degradation grows in proportion to the distance between the operating point and that design point. In this way, the two simulations demonstrate the proposed method's flexibility and adaptability when applied to systems of different sizes with different degradation characteristics.
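The usage-driven degradation described above can be sketched as follows. The article's exact formula and the Table 2 coefficients are not reproduced here. The sketch assumes that a cycle's usage is the sum of the absolute first differences (numerical derivatives up to a constant time step) of all states and inputs, and that the degrading parameter decays exponentially with cumulative usage at a hypothetical `rate`. Both `usage_index` and `degrade` are illustrative names.

```python
import numpy as np


def usage_index(states, inputs):
    """Sum of |first differences| of every state and input over one cycle.

    states: array of shape (T,) or (T, n); inputs: shape (T,) or (T, m).
    """
    x = np.asarray(states, dtype=float).reshape(len(states), -1)
    u = np.asarray(inputs, dtype=float).reshape(len(inputs), -1)
    return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(u, axis=0)).sum()


def degrade(p0, cumulative_usage, rate=0.01):
    """Hypothetical exponential decay of the degrading parameter
    (e.g. motor resistance or servo-valve gain) with cumulative usage."""
    return p0 * np.exp(-rate * cumulative_usage)


t = np.linspace(0.0, 1.0, 101)
smooth = np.minimum(2.0 * t, 1.0)                 # gentle ramp to 1
oscillating = smooth + 0.2 * np.sin(40.0 * t)     # same target, more activity

calm = usage_index(smooth, smooth)
busy = usage_index(oscillating, oscillating)
print(degrade(1.0, 50 * calm), degrade(1.0, 50 * busy))  # busy machine degrades more
```

The same structure accommodates the HAA's operating-point dependence by scaling each cycle's usage with a weight such as $4 - y$.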

Degradation in the systems with MPC
Degradation is described as a monotonic change in the system's signal(s) [42], and a model of the degradation may be identified by analyzing changes in the input ($u$) and output ($y$) values over time [20]. Although these gradual changes are observed through $u$ and $y$, they are caused by changes in the system's characteristics brought on by deterioration. Figs. 3 and 6 illustrate the effects of degradation on MPC. As shown in Fig. 3, at each time step $k$, the future outputs over the prediction horizon and the inputs over the control horizon are generated. The optimal input for the defined horizon is then chosen and applied to the system, after which the whole process is repeated for the next step. A problem arises because the controller uses a model based on the nominal system values to predict future outputs from future inputs. Owing to degradation, the actual parameter values drift away from these nominal values over time, so the deviation between the predicted output and the actual output increases, as shown in Fig. 6. The deviation keeps growing until it exceeds the acceptable threshold, at which point the system's output quality is deemed inadequate and the system is regarded as failed.
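A back-of-the-envelope calculation shows why a model built on nominal values mispredicts once the resistance has drifted. Setting both derivatives of the standard DC motor equations to zero gives the steady-state speed ω_ss = K_t V / (R b + K_t K_b). The sketch compares this prediction for a nominal resistance and for a resistance reduced by 20%, using illustrative parameters chosen here (not Table 1 values). It is an open-loop illustration only and ignores the feedback correction that an MPC applies at every step.

```python
def steady_state_speed(V, R, Kb=0.01, Kt=0.01, b=0.1):
    """Steady-state angular velocity of the DC motor for constant voltage V.
    From di/dt = 0 and domega/dt = 0: omega = Kt*V / (R*b + Kt*Kb)."""
    return Kt * V / (R * b + Kt * Kb)


V = 1.0
predicted = steady_state_speed(V, R=1.0)   # nominal model used by the controller
actual = steady_state_speed(V, R=0.8)      # plant after a 20 % resistance drop
mismatch = actual / predicted - 1.0
print(f"relative prediction error: {mismatch:.1%}")   # about 25.0%
```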
For both simulations, the threshold on the deviation between the actual and desired outputs is set to 5%. Accordingly, the failure function of the motor is defined on the deviation of the angular velocity from the desired angular velocity $\omega_d(t)$ at time $t$, and the failure function of the HAA is defined on the deviation of the arm angle from the desired arm angle $\theta_d(t)$ at time $t$.
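The exact failure functions are not reproduced here. As a hedged sketch, assume that a cycle fails when the relative deviation |y − y_d| / |y_d| exceeds 5% anywhere in the steady-state part of the cycle, taken here as the last half of the samples. The parameter `steady_fraction` is hypothetical; restricting the check to the steady state avoids flagging the normal transient of every step. The same check applies to ω and ω_d for the motor and to θ and θ_d for the HAA.

```python
import numpy as np


def cycle_failed(actual, desired, threshold=0.05, steady_fraction=0.5):
    """Hypothetical failure check for one working cycle.

    Returns True if the relative deviation |actual - desired| / |desired|
    exceeds `threshold` anywhere in the last `steady_fraction` of the samples.
    """
    actual = np.asarray(actual, dtype=float)
    desired = np.asarray(desired, dtype=float)
    start = int(len(actual) * (1.0 - steady_fraction))
    rel_dev = np.abs(actual[start:] - desired[start:]) / np.abs(desired[start:])
    return bool(rel_dev.max() > threshold)


desired = np.full(100, 3.0)                     # e.g. a step of amplitude three
print(cycle_failed(desired * 1.04, desired))    # False: 4 % deviation
print(cycle_failed(desired * 1.06, desired))    # True: 6 % deviation
```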

Validation
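To assess the effectiveness of the proposed methodology for calculating the SAC according to Eq. (12), the root mean square error (RMSE) metric is used:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\widehat{\mathrm{SAC}}(i) - \mathrm{SAC}(i)\right)^{2}}$$

where $\widehat{\mathrm{SAC}}(i)$ is the SAC estimated using (12) for the $i$th state-action pair, $\mathrm{SAC}(i)$ is the actual SAC for the $i$th state-action pair measured from the simulation model, and $N$ is the number of state-action pairs.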
To evaluate the control methodology, both the MPC and the MPDC are subjected to process noise drawn from $\mathcal{N}(0, 0.2)$. The comparison is based on the RMSE between the actual and desired outputs; the settling time and the degradation cost are also compared.
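The output RMSE and the settling time used in this comparison can be computed as sketched below. The ±2% settling band is an assumption, since the band is not stated here, and `settling_time` and `output_rmse` are illustrative names.

```python
import numpy as np


def output_rmse(y, y_ref):
    """Root mean square error between the actual and desired outputs."""
    y, y_ref = np.asarray(y, dtype=float), np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean((y - y_ref) ** 2)))


def settling_time(t, y, y_final, band=0.02):
    """First time after which y stays within +/- band*|y_final| of y_final
    (a 2 % band is assumed). Returns inf if the response never settles."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    outside = np.abs(y - y_final) > band * abs(y_final)
    if not outside.any():
        return float(t[0])
    last = np.flatnonzero(outside)[-1]          # last sample outside the band
    return float(t[last + 1]) if last + 1 < len(t) else float("inf")


t = np.linspace(0.0, 10.0, 1001)
y = 1.0 - np.exp(-t)                            # first-order step response
print(settling_time(t, y, 1.0))                 # about 3.92 (ln 50 is about 3.912)
print(output_rmse(y, np.ones_like(y)))
```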
For a visual clarification of the control method, a graphical illustration of the SAC serves to validate the obtained results. The state-action cost map (SACM) contains all possible states of the machine, quantized within its working limits, along one axis and all possible inputs to the system along the other. In this way, the controller's response can be examined against the cost function it optimizes. The order in which the state combinations are arranged along the state axis may vary without affecting the result. For a system with two states, $x_1$ and $x_2$, the approach used in this article enumerates every combination of the quantized levels of $x_1$ and $x_2$, so the number of unique states equals the product of the numbers of quantized levels of $x_1$ and $x_2$, while the input axis contains the quantized levels of $u$. The value of each point in the SACM is the normalized cost of that state-action combination, computed according to (8). The final SACM is a single map; however, it can also be plotted as three separate maps, one for each term of (29).
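One way to lay out the SACM axes for a two-state system is sketched below. The row-major ordering of the state combinations is an assumption (the ordering does not affect the result), and `sacm_axes` and `fill_sacm` are hypothetical helpers. Combinations without a cost are left as NaN.

```python
import numpy as np


def sacm_axes(x1_levels, x2_levels, u_levels):
    """Index every (x1, x2) combination along the state axis (row-major:
    x2 varies fastest) and every input level along the input axis."""
    states = [(a, b) for a in x1_levels for b in x2_levels]
    state_index = {s: k for k, s in enumerate(states)}
    input_index = {u: k for k, u in enumerate(u_levels)}
    return state_index, input_index


def fill_sacm(costs, state_index, input_index):
    """costs maps ((x1, x2), u) -> normalized cost; returns an
    (n_inputs, n_states) map with NaN where no cost is available."""
    sacm = np.full((len(input_index), len(state_index)), np.nan)
    for (state, u), c in costs.items():
        sacm[input_index[u], state_index[state]] = c
    return sacm


# Three levels for x1 and two for x2 give 3 * 2 = 6 unique states.
s_idx, u_idx = sacm_axes([0, 1, 2], [0, 1], [-1, 0, 1])
sacm = fill_sacm({((1, 0), 0): 0.5, ((2, 1), 1): 1.0}, s_idx, u_idx)
print(sacm.shape)   # (3, 6)
```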

Results
This section presents the results of the simulations conducted to assess the effectiveness of the proposed method. The first part analyzes the MPC responses for the DC motor: first, the system's responses without the degradation controller are discussed; then, the MPDC's responses are shown and discussed; finally, it is demonstrated how the proposed method's capabilities can be used to improve production planning and maintenance management. The second part demonstrates the effectiveness of the proposed method on a more complex system with a very large number of state-action combinations by applying it to the HAA.

MPC responses of the DC motor
Fig. 7 shows the costs of various states and actions, together with the MPC responses and the degradation costs of different step responses. Three step responses represent three distinct outputs, i.e., three different products of the system: the unit step and steps of amplitude two and three, as shown in Fig. 7(b). In Fig. 7(a), the blue line presents the actual degradation costs of the system's states and inputs. The relationship between the states and inputs and the cost is highly nonlinear. Moreover, although the costs of different states and actions are presented individually in the figure, the degradation cost is estimated through a multivariable mapping; Fig. 7(a) is only intended to illustrate how nonlinear and irregular the degradation cost function is with respect to the system's parameters. The blue line in Fig. 7(a) shows a significant reduction in the computed degradation cost for the steady states of the system (outputs equal to one, two, and three) compared with the other states. This agrees with the expectation that applying Eq. (12) assigns the lowest cost to the steady states and their corresponding actions, because they are repeated frequently and therefore have larger repetition counts in Eq. (10) (the number of times the state-action of the system's steady state is repeated in one life cycle). Consequently, the steady states and their corresponding inputs receive a lower cost from the optimization in Eq. (12). This is also logically desirable: even if the steady states had very high degradation costs, the controller should filter them only with consideration of the PPC, which can be incorporated by configuring the differences among the coefficients indexed 1, 2, and 3.
Next, a deep neural network is trained to map the state-actions to their costs as a multivariate regression. It consists of one input layer, four fully connected hidden layers of 512 neurons each with rectified linear unit (ReLU) activations, and a regression layer as the output layer. The result of the regression is shown as the red line in Fig. 7(a). As mentioned earlier, the lines in Fig. 7(a) are generated only for illustration, and the final function estimation is a multivariable regression. Fig. 7(c) shows the three step responses plotted on the SACM, each of which incurs a different cost from a degradation perspective. The degradation cost associated with each system state during the transitions is represented by the shade of the color underlying that state, marked with the symbol "•". The darker the "•", the lower the degradation cost associated with that state, while a lighter color represents a higher degradation cost. The three responses start from their common initial state, where the three lines converge in Fig. 7(c), and move toward their final steady states, passing through different states and imposing varying degrees of degradation on the system. Notably, the behavior of the MPC controller is unaffected by the degradation cost of the states it passes through, as it is unaware of these costs. The proposed method aims to make the controller aware of the underlying degradation cost and to enable it to optimize its actions accordingly.
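The network architecture described above can be written down as follows. This sketch only builds the layers and runs a forward pass with NumPy; training (for example, with a mean-squared-error loss) is omitted. The input size of three (motor current, angular velocity, and input voltage as one state-action vector) and the He initialization are assumptions, and `init_mlp` and `forward` are illustrative names.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """He-initialized weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(params, x):
    """ReLU hidden layers followed by a linear regression output layer."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b


n_features = 2 + 1                       # two states (i, omega) plus one input (V)
sizes = [n_features] + [512] * 4 + [1]   # four hidden layers of 512 neurons
params = init_mlp(sizes, np.random.default_rng(0))
batch = np.random.default_rng(1).uniform(size=(8, n_features))
print(forward(params, batch).shape)                   # (8, 1)
print(sum(W.size + b.size for W, b in params))        # 790529 trainable parameters
```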

MPDC responses for the DC motor
Fig. 8 compares the MPC and MPDC responses to step inputs with varying amplitudes. The MPDC is designed to reduce degradation by 50% for all three step inputs. The results indicate that the effect of degradation control is less noticeable for the unit step response than for the steps with amplitudes of two and three. For these two larger steps, the MPDC traverses darker regions of the SACM, which correspond to states with lower degradation costs. Additionally, for step inputs with amplitudes of two and three, the MPDC response (red curve) avoids the high-cost state-action located at the bottom of the MPC response curve (blue curve). These eliminated states can also be seen in the top graphs, where the maximum of the MPDC responses is considerably lower than that of the MPC responses. Note that the distance between points in the SACM does not necessarily reflect the actual difference between the corresponding states.
Fig. 9 displays the MTTF of the machine and the final motor resistance prior to failure for both MPC and MPDC.
The simulation was run 100 times for both MPC and MPDC, with inputs drawn from a uniform distribution $\mathcal{U}[1, 3]$. The results show that the MTTF increases by over 50%. This suggests that the optimization procedure outlined in Eqs. (12) and (16) successfully eliminates the state-actions that contribute most to degradation and loss of control.

Production planning
To validate the effectiveness of the proposed method for production planning, a production plan is generated for 55 products chosen randomly from a uniform distribution $\mathcal{U}[1, 3]$. The system is then simulated using MPC, with degradation as defined in (23). The simulation results indicate that the system can operate for only 41 cycles, producing 41 of the 55 desired products before it passes the maintenance threshold and fails.
Furthermore, for the purpose of this simulation, the degradation control approach from the previous section, which yields a 50% reduction in degradation, is assumed to be unable to achieve the desired product quality or production speed. Therefore, an optimal level of degradation control is sought that allows the machine to operate at its maximum possible quality and speed until the 55th product is produced. Maintenance is executed after the 55th product without considering the SoH.
To address this problem, the entire production process of the plan, consisting of 55 individual products, is simulated, and all state-actions required by the production plan are recorded. The corresponding value from (19) is then generated. Based on the SAC previously computed from (12), the plan cost from (20) is determined to be 1.33. Since the reference value in (12) is set to 1, more than 33% of the highest-cost actions must be eliminated from the actions required to carry out the plan. Two specific state-actions are identified within the SACM that are responsible for 33% of the degradation in the desired plan. These state-actions are used only to produce the third product (a step function with an amplitude of three). Fig. 10 displays the system's responses, configured according to the desired production plan. A comparison of the control strategies reveals that the MPDC configured based on the production plan yields outputs identical to those of the MPC for the products whose degradation cost does not threaten the production plan (products 1 and 2). However, the MPDC completely changes its policy for the product with the high degradation cost, so that production can continue until the current plan is completed. This policy change is evident in the system's response to the step of three, where the controller avoids the region that imposes the highest degradation cost on the system and then returns to its normal working condition.
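A minimal sketch of how a plan-level cost could be aggregated from the recorded state-actions: each recorded state-action contributes its estimated SAC, and the total is compared with a degradation budget. The exact forms of (19) and (20) are not reproduced here; the `budget` normalization and the `plan_cost_ratio` name are assumptions.

```python
from collections import Counter


def plan_cost_ratio(plan_state_actions, sac, budget):
    """Sum the SAC of every state-action recorded while simulating the plan and
    divide by the degradation budget (hypothetical normalization)."""
    counts = Counter(plan_state_actions)
    total = sum(sac[sa] * n for sa, n in counts.items())
    return total / budget, counts


# Made-up example: state-actions are labelled by (state index, input index).
recorded = [(0, 1), (0, 1), (3, 2), (5, 2)]
sac = {(0, 1): 1.0, (3, 2): 2.0, (5, 2): 3.0}
ratio, counts = plan_cost_ratio(recorded, sac, budget=5.0)
print(ratio)   # 1.4 -> the plan exceeds its budget by 40 %
```

Keeping the per-pair `counts` makes it straightforward to attribute the excess to individual state-actions when deciding which ones to remove.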
Fig. 11 shows the cumulative degradation cost of the different control policies, i.e., the sum of the second term on the right-hand side of Eq. (24) over one cycle. For better illustration, the graph has been normalized. Corroborating the previous results and the analysis of the response dynamics, eliminating undesired, high-cost state-action combinations leads to a considerable reduction in the degradation imposed on the system. Fig. 12 illustrates the deviation between the desired and actual outputs for a given production plan; the graph is simplified for legibility. The production plan involves three products (labeled 1, 2, and 3), each exhibiting a distinct deviation from the desired output. Notably, product 3 (a step of three) experiences the largest deviation, and its quality is the primary determinant of system failure in this simulation; therefore, only the quality threshold for this product is displayed. The graph shows that the MPC output exceeds the failure threshold at the 41st cycle. In contrast, the MPDC with a 33% reduction in degradation cost remains operational until the final (55th) product of the same production plan is produced. Although the MPDC with a 50% reduction in degradation cost may continue working for several additional cycles, the resulting product quality and production rate may not optimally match the desired production and maintenance plan.
Table 3 lists the parameters used to compare the control quality of the different control methods. The degradation costs reflect the level of degradation control applied, and the price of this control is paid in settling time and output error. The "output RMSE" section of the table shows that, in general, the MPDC achieves better control quality. This was expected, because degradation is proportional to the system's overshoot, which the MPDC limits indirectly. However, the output error of the MPDC increases with the level of degradation control, which is the cost of that control. Similarly, the MPDC generally achieves a shorter settling time owing to its overshoot control, but the settling time increases for the MPDC with a 50% degradation reduction. This was also expected, because the undesired state-actions identified during production planning occur only in the production of the third product. Thus, eliminating these state-actions increases the settling time and thereby lowers the production rate.

MPDC and production planning for hydraulically actuated arm
The HAA example is used to study the method's effectiveness for systems with a larger number of state-action combinations. Compared with the DC motor, the HAA has a significantly higher number of state-action combinations. Table 4 gives comprehensive details of both simulations. For the DC motor, all unique state-action combinations recorded in the six run-to-failure training sets are used to generate the degradation cost set. For the HAA, however, the 160,000 most repeated state-action combinations from the first five run-to-failure records are used to generate this set. Fig. 13 illustrates the responses of the MPC controller for the HAA: the closed-loop step responses are depicted in Fig. 13(a), and the system's responses superimposed on the actual SACM are presented in Fig. 13(b).
For this simulation, a fully connected deep neural network was used as the function estimator. Its architecture comprises one input layer and seven fully connected hidden layers containing 2048, 1024, 512, 256, 128, 64, and 32 neurons, respectively, interconnected with rectified linear unit activation functions. Finally, a regression layer is used as the output layer.
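As a rough sense of scale, the number of trainable weights and biases of this network can be counted directly. The input size of five (the four HAA states plus the servo-valve input voltage) is an assumption, and `dense_param_count` is an illustrative helper.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


haa_sizes = [4 + 1, 2048, 1024, 512, 256, 128, 64, 32, 1]
print(dense_param_count(haa_sizes))   # 2809857, i.e. about 2.8 million parameters
```

About three quarters of these parameters sit in the 2048-to-1024 connection alone, so the width of the first hidden layers dominates the model size.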
Fig. 14 illustrates the accuracy of the SAC estimation as a function of the number of run-to-failure records used to train the function estimator. The accuracy improves as the number of run-to-failure records increases, as expected from (14). Moreover, despite the very large number of state-action combinations, the SAC estimation reaches an RMSE of around 0.06 after 40 run-to-failure cycles.
To study the effects of quantization and of the amount of run-to-failure data simultaneously, a production batch of 100 products randomly chosen from the distribution $\mathcal{U}[1, 3]$ is taken as the desired production and manufacturing optimization goal. Production batches were planned based on different SACs, estimated using various numbers of run-to-failure records and different quantization steps; for each SAC estimate, 50 production batches were tested. Fig. 15 shows the effectiveness of the production planning as a function of the number of run-to-failure records used to estimate the SAC, for different quantization steps. Three quantization settings are used. The first setting uses the quantization steps given in Table 4 for the HAA, while the second and third settings use steps two and five times larger than those of the first setting, respectively. With the first setting, the method reaches high reliability using only 40 run-to-failure records, and the system comes very close to the globally optimal production setting with 160 run-to-failure records. In this scenario, the system does not fail before production is finished, and the production rate is the fastest possible, allowing the system to reach its desired maintenance level at the intended time. With the second setting, the system also comes near the global optimum with 160 run-to-failure records; however, more than 80 run-to-failure records are needed to achieve acceptable reliability. Finally, with the third setting, where the quantization steps are five times larger than those of the first setting, the system never reaches the global optimum or reliable production plan control. In this case, the method can only extend the system's lifespan by preventing the system from entering very high-cost state-actions.
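The effect of the quantization step on the size of the state-action space can be quantified by simple counting. Assume, purely for illustration (these are not the Table 4 values), that each of the five HAA dimensions (four states and one input) spans 10 units and is quantized with a base step of 0.1. Doubling the step then shrinks the full grid by roughly 2^5 = 32 times, and a fivefold step by roughly 5^5 = 3125 times, which illustrates how much detail the coarsest setting discards. `n_combinations` is a hypothetical helper.

```python
import math


def n_combinations(ranges, steps):
    """Number of grid points when each dimension spans `range` and is quantized
    with `step`: product of floor(range / step) + 1 over all dimensions."""
    return math.prod(math.floor(r / s + 1e-9) + 1 for r, s in zip(ranges, steps))


ranges = [10.0] * 5                 # four states + one input (illustrative)
base = [0.1] * 5
for factor in (1, 2, 5):
    steps = [factor * s for s in base]
    print(factor, n_combinations(ranges, steps))
```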

Discussion
Irrespective of the specific methodology used, the main concept of the entire procedure is to identify the state-actions that are more detrimental to the machine's health than others and to avoid them during the machine's operation. This can be implemented using various mathematical techniques and control strategies, depending on the specific characteristics of each system. Nonetheless, the proposed approach has been structured so that it can be implemented across all forms of systems and degradation models. For instance, the SAC can be calculated for any system whose inputs and outputs are accessible. Similarly, the model predictive control scheme can be applied to any system for which a physical model is available.
Although the present article focuses on degradation awareness, the SAC can also be calculated for other crucial parameters, such as a battery's state of charge, energy cost, and the cost of specific spare parts. Doing so makes it possible to strike an optimal compromise between system performance and the cost associated with the chosen parameter. For instance, specific actions may degrade an expensive component of the machine. In such cases, the SAC can be determined using the same formula outlined in (12), but with the corresponding parameter in (10) representing only the cycles that lead to failure of that particular component. Consequently, the entire process can be configured to achieve the desired balance between system performance and the degradation of that component.
In practice, the challenges posed by quantization in large or continuous state-action spaces are more manageable than they may appear in theory, for the following reasons. First, degradation in real-world applications typically changes very slowly across the state-action domain, which implies that two nearby states are unlikely to have significantly different degradation rates. Consequently, the quantization levels can be chosen effectively from the available data. Second, when the system is accessible, not all states and actions need to be quantized with the highest precision. As shown in this study, the quality of control improves as more data become available. Thus, an SACM allows operators to optimize the quantization process and tailor it to their specific system without being overwhelmed by an excessive number of state-action combinations. To this end, states that have no effect on degradation can be removed, and different quantization levels can be used for different intervals of the states that do affect degradation.
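Interval-dependent quantization can be implemented with non-uniform bin edges. The sketch below assumes, purely for illustration, that a state affects degradation mainly between 2 and 3, so that interval gets 0.05-wide bins while the rest of the range uses coarse bins. The edges and the `quantize` name are hypothetical.

```python
import numpy as np

# Coarse bins on [0, 2], fine bins on [2, 3], one coarse bin on [3, 4].
edges = np.concatenate([np.linspace(0.0, 2.0, 5),
                        np.linspace(2.05, 3.0, 20),
                        [4.0]])


def quantize(values, edges):
    """Return the bin index of each value (bins are [edges[k], edges[k+1]));
    values outside the range are clipped to the first or last bin."""
    idx = np.digitize(values, edges) - 1
    return np.clip(idx, 0, len(edges) - 2)


print(len(edges) - 1)                          # 25 bins instead of 80 uniform 0.05-wide bins
print(quantize([0.3, 2.07, 3.5, 4.0], edges))  # [ 0  5 24 24]
```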
The contributions of this research extend beyond theoretical advances and bear directly on the future of automated manufacturing. By explicitly incorporating degradation costs into the control process, this methodology paves the way for more intelligent and adaptable manufacturing systems that can respond dynamically to changing operational requirements and conditions. This work not only improves the accuracy of predictive maintenance but also provides a framework for optimizing production processes in real time. These advances lay a strong foundation for future developments in smart manufacturing, where system health management and production optimization are seamlessly integrated to achieve superior efficiency and reliability.

Conclusion
This article proposes a method for actively supporting high-level decisions by controlling machine degradation. The approach involves calculating the SAC of the machine regardless of the degradation model or working condition. A new cost function that incorporates the cost of degradation, as well as input and error costs, is then used to design an MPC controller. This enables the controller to take into account the impact of its actions on system degradation. The SAC is then used to balance system performance and degradation optimally. The proposed methods are validated through simulation and through the use of the SACM.
The methodology developed in this research not only addresses current challenges in manufacturing but also paves the way for future advances in digital and intelligent manufacturing systems. By incorporating degradation costs into the control framework, this approach enhances the ability of manufacturing systems to operate autonomously and adaptively, responding to real-time data and varying operational conditions. This capability is crucial as the industry moves toward more interconnected and intelligent manufacturing environments, where systems must be capable of self-optimization and predictive maintenance without human intervention. Integrating degradation-aware control into the broader context of digital manufacturing ecosystems supports the evolution of smart factories, where machines can communicate, learn, and optimize their operations collaboratively, leading to improved efficiency, reduced downtime, and extended equipment lifespan. This research lays the foundation for such future systems, contributing to the ongoing transformation toward fully automated, intelligent manufacturing.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Niclas Bjorsell reports financial support was provided by Region Gävleborg. Niclas Bjorsell reports financial support was provided by Swedish Agency for Economic and Regional Growth.

Fig. 4. Total production cost according to different manufacturing priorities.

Fig. 14. Estimated SACM for different numbers of run-to-failure records for the HAA.

Fig. 15. Number of products accomplished under production plans based on different SACs for the HAA.

Table 3
Control quality parameters for the DC motor.

Table 4
Simulation data.