Maintenance Strategy Based on Reliability Analysis and FMEA: A Case Study for Hydraulic Cylinders of Traditional Excavators with ERRS

A simple, effective, and low-cost maintenance approach for mechanical parts is urgently needed by cost-sensitive manufacturers of traditional excavators (HE). This paper proposes a maintenance strategy for the hydraulic cylinders (HC) of HE with an energy regeneration and recovery system (ERRS). Reliability analysis and FMEA of historical failure data are applied to develop the maintenance strategy. In this study, the failure data required for reliability analysis were collected from manufacturers and users over two and a half years; Excel is used as the statistical tool, Minitab is used for parameter estimation, and the Kolmogorov–Smirnov test is used to reject or accept the hypothesis of the distribution model. The reliability parameters R(t), R_set, and R*(t) are determined, and the parameter β* is the reference value for choosing proper countermeasures and maintenance policies for the failure modes of the new HC of HE with ERRS. The purpose of this paper is to establish proper maintenance policies that maintain a high availability level and fulfill the user's needs for HC, which also paves the way for further reliability study of ERRS.


Introduction
Because of the high value they create, hydraulic excavators (HE) are widely used in the construction of houses, roads, and bridges, in water conservancy works, and in energy and mineral development. According to statistical analysis, HE complete up to 65–75% of the world's earthwork. As the global energy crisis and environmental pollution worsen, various energy-saving technologies are being widely applied to HE, which consume large amounts of energy. Many researchers focus on hybrid power systems applied in HE for recycling kinetic and self-gravity energy. The energy regeneration and recovery system (ERRS) based on the flow regeneration balance theory is popular among manufacturers because of its ease of assembly and controllability, and it has been applied in the actual production of medium and large excavators.
According to customer feedback, the HE with ERRS shows a remarkable energy-saving effect but a higher fault rate in the hydraulic system than excavators without energy recovery, particularly in the hydraulic cylinders (HC). In the current economic situation, competition between enterprises is more intense than ever. Machine maintenance is directly related to manufacturing companies' competitive ability in terms of cost, quality, and performance [1][2][3]. Construction equipment is normally sold with a maintenance package or other maintenance services to ensure that the products maintain a high availability level and fulfill the user's needs [4][5][6]. In some enterprises, the software Minitab is used to perform reliability analysis and obtain reliability values from quality data provided by the maintenance crew or from failure information uploaded by GPS or other real-time monitoring devices installed on the machines, which can help to improve machine reliability. Modern maintenance approaches aim to lower failure rates, because these directly affect machine downtime, and thereby to improve productivity. These modern techniques reflect a transition from corrective maintenance practices to proactive maintenance, which has the advantage of solving problems before they occur and replacing parts once a certain level of deterioration has been identified.
Because the HC is one of the key parts of the ERRS, a proper maintenance strategy is important for reducing the failure frequency and improving the availability of HC. However, it is challenging for equipment manufacturers to schedule maintenance efficiently for the HC of newly developed HE with ERRS, since the products are often subject to harsh usage and inadequate daily maintenance, which can easily lead to failures or downtime.
Researchers have proposed many maintenance methods. Corrective maintenance (CM) leads to high levels of system breakdown and high repair and replacement costs because failures occur suddenly [7]. Preventive maintenance (PM) is maintenance planning that requires a long-term strategy for executing maintenance actions within a predetermined interval, ensuring that a system continues to fulfill its intended function [8][9][10][11][12][13][14][15]. Predictive maintenance (PdM) is an advancement on PM that schedules maintenance based on the monitored condition and the prognosis of the future state of the system and its components [16][17][18][19][20]. Condition-based maintenance (CBM) is an extended version of PdM, in which the equipment is assessed by real-time continuous monitoring and periodic inspection, and maintenance actions are performed based on measurements of its condition and on maintenance logistics [12,21,22]. Reliability-centered maintenance (RCM) is usually related to maintenance actions such as repairing, replacing, overhauling, inspecting, servicing, adjusting, testing, measuring, and detecting faults to avoid any failure that would lead to interruptions in production operations [23,24]. It is a highly involved process in which each piece of equipment must be analyzed and prioritized. A very mature maintenance team with command of a large amount of existing data and analysis skills is required to make the maintenance plan.
In recent years, condition-based maintenance (CBM) and CBM+, an optimized approach based on the concept of CBM, have received considerable attention in machinery condition monitoring and prognostics [25]. The limitation of using CBM and CBM+ on complex equipment is the high cost when applied to civil industry machines. It is therefore imperative to propose an approach that costs less and is easy for manufacturers to operate. A reliability-centered maintenance (RCM) approach has been proposed and applied in some situations. It is an industrial improvement approach focusing on identifying and establishing the operational, maintenance, and capital improvement policies that would most effectively manage the risks of equipment failure. It is an engineering framework used to estimate time-related parameters to increase machine uptime, which can provide information for managing and controlling the preventive maintenance of equipment. It may increase cost, but it can reduce the amount of routine maintenance work by 40–70% if correctly applied [23]. RCM is, however, so complicated that it is not the best choice for enterprises that pursue high efficiency. Different policies with various features suit diverse situations and implementation stages. The basic purposes of maintenance policies are to reduce unplanned component or system breakdowns and to increase lifetime. In practice, different maintenance methods are combined to offset their weaknesses.
This paper aims to discuss the maintenance strategy for HC of HE with ERRS based on reliability analysis and failure mode and effect analysis (FMEA). FMEA is a forward-looking risk-management technique that is widely used in various industries for promoting the reliability and safety of parts, equipment, systems, and services [26][27][28]. In this study, the failure data required for reliability analysis are collected from manufacturers and users over two and a half years.
The original data include the failed parts, failure times, failure modes, manufacturers, delivery times, and other detailed information about the whole machine. In this study, failure times and failure modes are the key information for reliability analysis and FMEA; Excel is used as the statistical tool, Minitab is used for parameter estimation, and the Kolmogorov-Smirnov test is used to reject or accept the hypothesis of the distribution model. The reliability parameters R(t), R_set, and R*(t) are calculated for making maintenance decisions, and β* is the reference value for making proper countermeasures and policies for the failure modes of the new HC.

Description of the Utilization System Based on Energy Regeneration and Recovery.
The energy regeneration and recovery system (ERRS) is based on the hydraulic accumulator balancing theory, in which the hydraulic accumulator (HA) is used for storing and releasing energy, the force caused by the accumulator's pressure acting on the boom always shows itself as a balancing weight for the load [29], and the flow regeneration can be realized via the check valve within the main reversing valve. The schematic principle of HE with ERRS is shown in Figure 1.
When the boom goes down, reversing valves 6, 11, and 12 are all linked on the left side, and the hydraulic oil (HO) is pumped into the rod cavity of main boom cylinder (RCMBC) 10 through the reversing valve 6; one branch of HO is carried into the two rods cavity of balance cylinders (RCBC) 9 via the reversing valve 12; HO in the piston cavity of main boom cylinder (PCMBC) 10 returns to the tank by the reversing valve 11; and the self-gravity potential energy generated during the boom down is accumulated into the hydraulic accumulator (HA) as hydraulic energy via valve 7.
When the boom goes up, reversing valves 6, 11, and 12 are all linked on the right side, and HO is pumped into the PCMBC 10 through the reversing valve 6; the accumulated HO is released into RCBC 9, and HO in RCMBC 10 and RCBC 9 returns to the tank by the reversing valve 11 and valve 12, respectively. During the process of boom rising, the hydraulic energy accumulated in HA is released and supplied to the direction valve together with the oil transferred by the main pump.
Thus, the saved energy can be recycled and reused effectively [29,30]. According to a simulation analysis by other members of our research team, the 20-ton HE with ERRS achieves an energy saving of 41.6% during the typical working cycle [4]. The ERRS has gradually been adopted in medium and large excavators by various manufacturers. Figure 2 shows the 30-ton HE with ERRS, which is increasingly produced and used; the main cylinder is visible in the middle, with the balance cylinders on both sides. Because of its large volume, the HA is embedded in the counterweight and is not visible here. According to feedback, most users are satisfied with the energy-saving effect but not with the high failure rate of the new hydraulic system.
Although the HE with ERRS has been proven feasible and efficient by simulation and type tests, many problems still arise in use, such as cracking of the boom, leakage of the main pump, fracture of the hydraulic hose, and various failures of the boom and balance cylinders. In this paper, we mainly discuss a maintenance strategy for the hydraulic cylinders based on reliability analysis and FMEA, with the aim of reducing the occurrence of failures.

Proposed Methodology
Commonly used failure distribution models in reliability studies include the Normal, Weibull, Gamma, Logistic, and Exponential distributions. The Weibull distribution is a continuous distribution proposed in 1951 and is recommended in reliability theory as the preferred life model. Procedures are usually based on the assumption that the failure data follow a Weibull distribution because of its convenient mathematical properties. It is the most commonly employed model in reliability analysis owing to its ability to deal with small sample sizes and its flexibility in approximating a wide range of statistical distributions [31,32]; its distribution function is introduced in detail in several papers [33][34][35].
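For reference, the standard three-parameter Weibull forms can be written as below. The symbols β (shape), η (scale), and γ (location) are conventional notation assumed here and may differ from the notation used in the cited papers; setting γ = 0 gives the two-parameter case.

```latex
\[
\begin{aligned}
F(t) &= 1 - \exp\!\left[-\left(\frac{t-\gamma}{\eta}\right)^{\beta}\right], \qquad t \ge \gamma,\\
R(t) &= 1 - F(t) = \exp\!\left[-\left(\frac{t-\gamma}{\eta}\right)^{\beta}\right],\\
h(t) &= \frac{f(t)}{R(t)} = \frac{\beta}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{\beta-1}.
\end{aligned}
\]
```

Here F(t) is the cumulative failure distribution function, R(t) the reliability, and h(t) the failure rate, with f(t) = dF/dt.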
In actual production, it is common to choose a simple, effective, and low-cost approach, and the best choice depends on the usage stage. Sometimes a combination of methods is used for maintenance decision-making, for example, an optimized condition-based maintenance system using data fusion and reliability-centered maintenance [36]. To reduce maintenance cost and increase enterprise profit, an optimized approach based on reliability analysis and FMEA for HC of HE with ERRS is described in detail in this section.

Maintenance Decision-Making.
The framework in this article is as follows. Firstly, the failure data for the HC of the previous-generation HE have to be collected from real operating environments to obtain the reliability estimate of the HC. In this paper, the failure times and failure modes provided by several companies are recorded in Excel, parameter estimation and the Anderson-Darling test are performed in Minitab, and the Kolmogorov-Smirnov test is used to reject or accept the hypothesis of the distribution model.
Secondly, the reliability of the old HC, R(t), and the reliability of the newly designed HC, R*(t), can be determined from the failure time data. The reliability target R_set is set based on the value of R(t) before the new HC is designed: R_set is 20 percent higher than R(t), as described in detail in [37]. Note that R(t) and R*(t) represent the reliability of the old HC and the new HC at a given time, respectively, and are calculated in the same way from their historical failure data.
Thirdly, the maintenance decision is made by comparing R*(t) with R_set at the effective working hours of the machine. As shown in Figure 3, the dark spots from top to bottom denote R_set, R(t), and R*(t), respectively. If R*(t) < R_set, maintenance decisions should be made. The value of β is an important reference for making maintenance decisions, and it can be determined from the failure time data of the part, as explained in the case study. As shown in Figure 4, β < 1 defines the infant stage: failures occur frequently in the initial operation period, so the failure rate is high; then, as the operating time increases, the failure rate drops rapidly. In this stage, the time interval in which failure behavior occurs is not sufficiently developed due to unknown influences. The main failure causes are poor manufacturing methods and procedures, poor debugging, poor workmanship, substandard materials, inadequate processes, and human error, which generally stem from design problems or incorrect configuration. β = 1 defines the normal stage: the failure rate is low and approximately constant. Some of the causes of failure in this stage are undetectable defects, higher random stress than expected, poor maintenance, abuse, and low safety factors. β > 1 defines the wear-out stage: a quickly rising failure rate leads to scrapped products; this third stage occurs at the end of the equipment life cycle. Some of the principal causes of failure during this phase are inadequate maintenance, wear due to aging or friction, wrong overhaul practices, and corrosion [38][39][40][41].
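The relation between β and the three stages can be illustrated with a minimal Python sketch. It assumes the standard Weibull failure rate h(t) = (β/η)((t − γ)/η)^(β − 1); the function names, the scale η, the location γ, and the optional tolerance `tol` are illustrative assumptions, not part of the paper's method.

```python
def weibull_hazard(t, beta, eta, gamma=0.0):
    """Failure rate h(t) of a Weibull model (shape beta, scale eta, location gamma)."""
    if t <= gamma:
        raise ValueError("t must be greater than the location parameter gamma")
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1)


def failure_stage(beta, tol=0.0):
    """Map the shape parameter to the stage named in the text.

    tol is a hypothetical tolerance for treating an estimated beta as
    'equal to 1'; the default of 0.0 follows the text literally.
    """
    if beta < 1 - tol:
        return "infant"      # failure rate falls with operating time
    if beta > 1 + tol:
        return "wear-out"    # failure rate rises with operating time
    return "normal"          # failure rate approximately constant


if __name__ == "__main__":
    # beta = 1.341 is the value reported for the new HC in the case study;
    # eta = 500.0 is a made-up scale used only to show the rising trend.
    print(failure_stage(1.341))
    print(weibull_hazard(100.0, 1.341, 500.0) < weibull_hazard(200.0, 1.341, 500.0))
```

With β > 1 the second line prints `True`, reflecting a failure rate that increases with operating time.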

Maintenance Strategy for HC.
In this article, the early failure records and maintenance data are used to evaluate the reliability of the new HC, and countermeasures are then suggested according to β. FMEA requires expertise: it relies on feedback, brainstorming, and expert judgment, and the likely failure causes of the HC must be tested after they have been hypothesized. HC availability would be improved if effective countermeasures were established for the corresponding root causes [37,42]. The proposed solution can also be used to develop maintenance strategies for the other mechanical components of HE and is convenient for a company's maintenance crew.
There are several failure-based planning methods for deriving the best maintenance policies, which take into account information about component or system deterioration. To anticipate a failure mechanism, methods of critical analysis are commonly used, such as tree diagrams, FMEA, and critical analysis. Firstly, we incorporate the temporality of fault events of the old components to facilitate prognostics for the new components. The failure stage in which the new HC most likely lie is judged according to the slope parameter β*, which is estimated by Minitab from the historical failure data. Secondly, experts perform FMEA for the new HC and develop maintenance strategies. A simple example is shown in Table 1: the failure part, failure time, and failure modes can be obtained from maintenance records provided directly by the maintenance crew. The most important but hardest items to find are the failure causes of the failed part. There are usually several likely failure causes, so engineers should carry out on-site tests, or analysis after the part is returned to the manufacturer, to determine which one is true. If a cause is confirmed, the detection result is marked Y; otherwise, it is marked N. O_ij is the suggested ranking for the occurrence of failure modes, and S_ij is the suggested ranking for the severity of failure modes, as shown in Tables 2 and 3, respectively. The risk priority number is RPN = S_ij × O_ij; if any one of the three inequalities O_ij > 6, S_ij > 6, and RPN > 50 is true, countermeasures are taken to modify the part. The weight here means how important the failed part is relative to the other parts of the equipment. It serves as a vital reference in deciding which maintenance items to set for the failed part. After the failure cause is determined, the maintenance items, maintenance cycle, and executor can then be specified.
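The screening rule above can be sketched in a few lines of Python. The function names and the example ratings are hypothetical; only the formula RPN = S_ij × O_ij and the three thresholds come from the text.

```python
def rpn(severity, occurrence):
    """Risk priority number as defined in the text: RPN = S_ij * O_ij."""
    return severity * occurrence


def needs_countermeasure(severity, occurrence, rank_limit=6, rpn_limit=50):
    """True if any of O_ij > 6, S_ij > 6, or RPN > 50 holds."""
    return (
        occurrence > rank_limit
        or severity > rank_limit
        or rpn(severity, occurrence) > rpn_limit
    )


if __name__ == "__main__":
    # Hypothetical (S_ij, O_ij) ratings, used only to exercise the rule.
    for s, o in [(7, 2), (3, 4), (6, 6)]:
        print(s, o, rpn(s, o), needs_countermeasure(s, o))
```

Note that when both ratings are at most 6, the RPN is at most 36, so on integer ratings the RPN > 50 condition can only be met when one of the first two inequalities already holds.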

Implementation Steps.
The implementation steps for making the maintenance policy for HC of HE with ERRS are presented in Figure 5. The approach proposed in this article consists of the following steps:
(1) Collect and sort the failure data for the old HC of HE.
(2) Calculate the reliability R(t) and determine the reliability target value R_set of the new HC.
(3) Calculate the reliability R*(t) of the new HC based on the failure data from HE with ERRS, and perform parameter estimation to obtain the value of β*.
(4) Make the HC maintenance decision by comparing R*(t) with R_set.
(5) Make the HC maintenance strategy by FMEA with reference to the value of β*.
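Steps (2) and (4) reduce to simple arithmetic, sketched below in Python. The function names are hypothetical; the 20 percent uplift is the rule stated in the methodology, and the numerical values are those reported in the case study of this paper at 300 h.

```python
def reliability_target(r_old, uplift=0.20):
    """Step (2): target reliability set 20 percent above the old HC's R(t)."""
    return r_old * (1.0 + uplift)


def maintenance_policy_required(r_new, r_set):
    """Step (4): maintenance decisions are needed when R*(t) < R_set."""
    return r_new < r_set


if __name__ == "__main__":
    r_old = 0.7212   # R(t) of the old HC at 300 h, from the case study
    r_new = 0.6768   # R*(t) of the new HC at 300 h, from the case study
    r_set = reliability_target(r_old)
    print(round(r_set, 5))                            # 0.86544
    print(maintenance_policy_required(r_new, r_set))  # True
```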

Main Contributions of This Work.
The following points summarize the main contributions of the work: (1) A simple and practical maintenance policy is proposed for HC of HE with ERRS, which is newly developed from the previous generation. (2) The proposed approach is also appropriate for other cost-sensitive mechanical components in general and civil industries. (3) Although reliability analysis and FMEA methods have been developed for many years, they still have to be optimized when used in real-world applications, which is what we have done.

Historical Data Collection from Old Type of HC and Reliability Target of New HC.
Since the HC is the core component of HE with ERRS, its failure can degrade or impair the function of the whole hydraulic system. It is therefore important to regularly evaluate the previous excavators of the same series, which helps in setting maintenance policies that decrease the failure rate of the HC and partially enhance the reliability of the complete machine. R(t) should be determined from the historical failure data of the old type of HE. There are 900 HC in total from 450 old HE of the same series, with 269 failure times; because of the large volume of data, the specific failure times are omitted here. The result of sorting and calculating the historical failure data is shown in Table 4. R(t) at any time can be obtained using the parameter estimation. According to the GPS-uploaded data, most HE fitted with the new HC have worked no more than 1000 hours, and most of the failure times of the HC fall within 0–300 h. Here, we obtain R(t) = 0.7212 at 300 h, which is the effective working hours for HC. If the reliability is to increase by 20% compared with the old HC, the target reliability value is R_set = 0.86544.

Data Collection from HC of HE with ERRS.
The field failure data, including the failure times and failure modes of the new HC, were collected over one year from 99 HC of the same series with ERRS during operation. The total number of failures is 38, and the effective working hours of HC of different HE are listed in Table 5. The Anderson-Darling goodness-of-fit test for the failure time data of the new HC was performed in Minitab against 14 alternative distribution models. The fitting results are shown in Table 6. The 3-parameter Weibull distribution has the smallest AD statistic, with a value of 1.061, and is therefore the best fit. The cumulative failure distribution function of the 3-parameter Weibull distribution is given as equation (1). Figure 6 is the probability plot of the new HC, which also shows the values of the parameter estimates. The cumulative failure distribution function of the new HC is as given in equation (2). Finally, Kolmogorov-Smirnov hypothesis testing of the distribution model is required. At significance level α = 0.05, the critical value is D_{n,α} = 1.36/√n with n = 38, and the test statistic is D_n = max|F_n(t) − F_0(t)|. Since D_n < D_{n,α} holds, the hypothesis that the failure times of the new HC follow the Weibull distribution is accepted. The failure modes are shown in Figure 7. Leakage, creep, and abrasion are the main problems of HC, accounting for 36.8, 23.7, and 18.4 percent of failures, respectively. The failure causes of these failure modes will be analyzed and tested in detail in Section 4.4.
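The Kolmogorov-Smirnov check can be sketched as follows. This is a minimal illustration, not the authors' Minitab procedure: the function names are hypothetical, the 3-parameter Weibull CDF uses the conventional symbols β (shape), η (scale), and γ (location) as an assumption, and no failure times or parameter estimates from the paper are reproduced. Only the sample size n = 38 and the large-sample critical value 1.36/√n at α = 0.05 come from the text.

```python
import math


def weibull3_cdf(t, beta, eta, gamma):
    """Standard 3-parameter Weibull CDF F_0(t); zero at or below the location gamma."""
    if t <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((t - gamma) / eta) ** beta))


def ks_statistic(sample, cdf):
    """D_n = max |F_n(t) - F_0(t)|, checked on both sides of each jump of F_n."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


def ks_critical_value_05(n):
    """Large-sample critical value at alpha = 0.05, as used in the text."""
    return 1.36 / math.sqrt(n)


if __name__ == "__main__":
    print(round(ks_critical_value_05(38), 4))  # 0.2206
```

The fitted model is accepted when `ks_statistic(failure_times, fitted_cdf)` is below `ks_critical_value_05(len(failure_times))`, where `fitted_cdf` would wrap `weibull3_cdf` with the estimated parameters.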

Maintenance Decision-Making.
We obtain R*(t) = 0.6787 at 300 h, as shown in Table 7. The reliability of the new HC at an operation time of 300 hours is R*(t) = 0.6768 < R_set = 0.86544, so maintenance policies have to be proposed. The estimate of the parameter β* of the new HC from HE with ERRS is shown in Figure 6. The value of β* for the new HC is 1.341 > 1, so the failure causes of the HC can be analyzed according to the explanation of Figure 4 in Section 3.1. The FMEA of the new HC is shown in Table 8. According to the failure modes of the new HC, a few likely causes can be predicted from the value of β*. To further confirm the prediction, engineers carried out on-site tests and analysis after the damaged HC were returned to the manufacturer. The main cause of the crack is that the HC suffer external impact during the working cycle; the values of S_ij, O_ij, and RPN are 8, 8, and 64, respectively, so countermeasures should be taken. Here, adding a protective board on top of the HC that are prone to intense impact is suggested. The main cause of the leakage is that instantaneous high pressure damages the seal: when the boom goes up, the energy stored in the HA is released into the hydraulic system, so the system pressure with ERRS rises instantly. The values of S_ij, O_ij, and RPN are 7, 10, and 70, respectively. Here, a newly designed accumulator is suggested, which could provide constant pressure during the energy release process to reduce the instantaneous high pressure caused by energy released from the HA into the hydraulic system. The main cause of the abrasion is an unreasonable kinematic pair clearance between the cylinder and the piston caused by low assembly accuracy. The values of S_ij, O_ij, and RPN are 7, 9, and 63, respectively. Improving assembly accuracy and strengthening the final inspection are necessary to correct this kind of failure.
The main cause of the creep is that hydraulic oil pollution causes the valve spool to stick. The values of S_ij, O_ij, and RPN are 8, 10, and 80, respectively; if the cleanliness of the hydraulic oil is checked at regular intervals and the oil is replaced in time, this failure mode can be better controlled. The proposed maintenance policies, based on the countermeasures, can help improve the availability of HC, as shown in Table 9. The countermeasures are listed against the main failure causes of the new HC failure modes, with the aim of reducing the occurrence of failures once the measures are implemented on the machine. To some extent, reducing the occurrence of failures helps to improve the reliability of the part.

Conclusions.
The maintenance strategy for HC from HE with ERRS is proposed in this paper based on reliability analysis and FMEA. It is easy and effective for engineers to apply in actual production at low cost, and is therefore valuable for cost-sensitive industries. The maintenance decision is made by comparing the reliability of HC of the same series with and without ERRS based on the historical failure data. In this work, we use the reliability value at 300 hours as an example, which is the effective working hours for the new HC, because most HE with ERRS had worked fewer than 1000 hours at the time of data collection, and the failure times are no more than 400 hours.
In this study, the reliability of the new HC from HE with ERRS is R*(t) = 0.6768 < R_set = 0.86544, so the maintenance decision is made. Since β* = 1.341 > 1, FMEA is carried out based on β*, as shown in Table 8. The countermeasures and maintenance policies for the failure modes of HC are proposed and listed against the main failure causes of the new HC failure modes, with the aim of reducing the occurrence of failures once the measures are implemented on the machine. To some extent, reducing the occurrence of failures helps to improve the reliability of the part. The proposed maintenance policies, based on the countermeasures, can help improve the availability of HC, as shown in Table 9.
The results show that the proposed approach is easy to operate, as demonstrated in a concrete case study of HC from HE with ERRS. The approach can also be used to make maintenance policies for other mechanical parts, if optimized.

Future Work.
Many studies use the time between failures (TBF) or time to repair (TTR) for reliability analysis. In this study, the effective working hours were used instead, since we did not obtain TTR data from HE enterprises, and the failure data for the new HC were too few to calculate the TBF. If more data become available in the near future, a comparative reliability analysis of HC will be carried out based on the TBF, TTR, and effective working hours, respectively.
Improving the reliability of the hydraulic system of HE with ERRS is a long-term task. We will continue to track and collect more failure data for HC and other key parts of the hydraulic system of HE with ERRS to make our analysis more accurate.