Risk-based reliability allocation methodology to set a maintenance priority among system components: a case study in mining

This study aims to build a maintenance prioritization methodology for system components based on the existing literature on reliability allocation. The proposed methodology was applied to two high-capacity earthmovers using actual datasets collected during operations at Tuncbilek Coal Mine, Turkey. Maintenance priorities of components were obtained by adapting their operational risk factors to a generic reliability allocation algorithm. In this sense, the direct and indirect financial consequences of component failures were considered in estimating risk severity factors, whereas component reliabilities were assessed comprehensively with a top-to-bottom evaluation to determine risk occurrence factors. This paper is the first initiative on component maintenance prioritization in the mining sector, where machinery reliability is vital to production. In addition, previous studies have generally used reliability allocation as a weakness detection tool in the design and development of their systems. On this basis, this paper utilizes reliability allocation for the instantaneous measurement of component maintenance requirements during operation.


Introduction
The issue of reliability in production industries has become a growing concern in recent decades, since the availability and performance of systems employed in production cycles must be kept at desired levels to satisfy short- to long-term production goals. Reliability analysis supports the improvement of system performance by revealing the root causes of failures, their occurrence frequencies, their consequences, and maintenance-critical components. In this sense, reliability allocation, as an essential part of system reliability modelling, helps to determine the reliability growth required of individual system elements to reach a target system reliability at a specified time. In the literature, reliability allocation has mainly concentrated on understanding the performance factors of systems in the design and development stages. Various studies optimize early-stage reliability allocation for software [3,22,25,34,35], network systems [4,21,23], and other mechanical or electromechanical complex systems [1,2,13,17,18,31,33]. More general methodologies have also been proposed for the design and development of various systems. On this basis, the equal apportionment technique, the ARINC method, the feasibility-of-objectives technique, and the minimization of effort algorithm are extensively utilized conventional methods in reliability allocation [9]. The equal apportionment technique assigns the same goal reliability to every component for a target system reliability, whereas the ARINC method allocates reliabilities according to the weight of each component failure rate in the system failure rate. The feasibility-of-objectives technique considers factors such as system intricacy, state of the art, performance time, and environment when determining the allocation weights of components. The minimization of effort algorithm aims to minimize the total effort for improving component reliabilities while ensuring the target system reliability.

Besides the conventional methods, Mettas [19] developed a flexible formulation for reliability allocation which bounds system components between minimum and maximum reliability values and optimizes allocation weights by minimizing a cost function according to the feasibility of component improvements. Kim et al. [14] proposed a reliability allocation method for the early development of mission-critical systems, considering the maximum severity of failure modes and their approximate failure occurrence rates. Thomas and Richard [27] considered the warranty burden rate in their allocation method and obtained target reliability values under budget control. Sriramdas et al. [26] utilized fuzzy logic including expert opinions to evaluate reliability allocation factors in the early stages of engineering system design and development. In addition, reliability allocation for redundant systems was discussed in detail by Elegbede et al. [9], Li and Zuo [16], and Yalaoui et al. [32]. The main effort in the reliability allocation literature has been devoted to improving system reliabilities in pre-operation stages. Furthermore, these studies have commonly regarded cost as the financial response to reliability improvement in the design or testing stages. On this basis, the current study uses both comprehensive system reliability modelling and reliability allocation to develop a maintenance assessment methodology that operation or maintenance managers can use to update their maintenance policies by detecting and investigating critical components during operation. Frequencies of failure modes and their resulting direct and indirect economic consequences were included in the study to evaluate the risk levels of components and to decide their reliability allocation priorities.
In order to verify the developed model, the methodology was applied to two draglines currently operating in an open-cast coal mine in Turkey. Draglines are extensively utilized in overburden stripping, which is an integral part of open-cast coal production. Overall productivity in these mines is substantially affected by the availability and reliability of draglines. In the United States alone, almost half of the overburden stripping operations are performed by draglines with a bucket capacity of more than 30 m³ [10]. Draglines create an operational radius using booms with lengths varying between 37 and 128 meters [12]. System functionalities such as hoisting, dragging, swinging, and walking ensure the continuous and cyclic operation of draglines. Draglines can remove an estimated 30-35 million m³ of overburden annually [5]. Components in dragline subsystems have high functional and operational dependencies. Failures in these components can disrupt operational sustainability, and the resulting production losses can reach 1 million dollars per day [28]. Although various studies address the effect of failure breakdowns on dragline operability [6,24,29], component-based maintenance prioritization has not been discussed in the literature. Accordingly, the proposed methodology aims to highlight the dragline components with the highest financial risk, which should be maintained with priority to reduce failure-based breakdowns. In addition, no study has been observed on maintenance prioritization for other mining systems, although mine production efficiency is directly affected by the performance of machinery systems, which are generally capital intensive and of high capacity. Besides its contribution to mining, the methodology can also be applied to other production systems when determining maintenance priority levels among system components and measuring the component reliability growth required for various system reliability goals. The use of reliability allocation in this study differs from the literature by adapting risk factors to the estimation of component priority scores in an operating system.
The study methodology, as seen in Figure 1, briefly entails (i) acquisition of repair and lifetime datasets and their classification according to the failure modes in the target system, (ii) data independence and trend tests for the repair and lifetime datasets, (iii) determination of lifetime and repair time characteristic parameters, (iv) determination of severity and occurrence factors to obtain Risk Priority Numbers (RPNs) for individual failure modes, (v) conversion of RPNs to maintenance feasibility factors in the reliability allocation algorithm, and (vi) identification of maintenance-priority components and the required reliability growths for a target system reliability. In item (v), the feasibility factors of the generic model of Mettas [19] were estimated. This model considers the current and maximum achievable reliabilities of system components and prioritizes component reliability growths according to their feasibility factors through an exponential relation. Therefore, the financial risk factors of failures can be practically adapted to the model by treating these feasibility factors as priority factors.
Although the methodology is applicable to any production system, this paper is structured around a mining case study and is organized in six main sections. Following the introduction (Section 1), Section 2 gives brief information on the acquisition of the datasets, the classification of failure modes, and the data independence and trend tests applied to the datasets. Section 3 discusses the determination of the lifetime (uptime) and repair time (downtime) characteristics of individual system components. Section 4 includes the estimation of severity and occurrence factors of failure modes, their resulting RPN values, and the conversion of these values to maintenance feasibility (priority) factors. The detection of maintenance-priority components and their required reliability growths for a specific target system reliability is discussed in Section 5 with a numerical example. Finally, Section 6 provides the main conclusions drawn from this study.

Pre-processing of repair time/lifetime datasets for individual failure modes
Reliability allocation analysis initially requires a precise system decomposition to reveal the primary failure zones and their statistical profiles. On this basis, maintenance catalogues, previous maintenance records at plants, expert opinions, and the functional abilities of systems can be utilized to detect the major failure-inducing components in systems, their failure modes, and their recovery conditions. In the current research, two working draglines were examined in detail as a case study. Draglines perform stripping through the dragging, hoisting, swinging, and dumping actions of their working components. The dragline first creates an operational radius with its boom and positions the bucket, suspended from the boom, by casting it away from the main frame. Overburden is stripped by dragging the bucket toward the machinery house. Then, the filled bucket is hoisted and dumped onto the spoil area after the machine rotates around its own axis. The dragline continues its operation with successive cycles of these stripping, hoisting, swinging, and dumping actions. Considering the functional and structural dependencies in the dragline, the system was decomposed into seven main subsystems: hoisting, rigging, bucket, dragging, movement, machinery house, and boom. Operational and schematic views of a dragline are illustrated in Figure 2.
The draglines in the study are currently utilized in an open-cast coal mine in Turkey. They work in conjunction with an excavator-truck dispatching system to achieve overburden stripping of 60-65 million m³ annually. The draglines have bucket capacities of 20 yd³ (15.3 m³) and 40 yd³ (30.6 m³) and are referred to as Dragline-1 and Dragline-2 in the study, respectively. They have excavation depths of 20 and 35 meters, respectively, and complete a full stripping and dumping cycle in less than 60 seconds with an operational radius of more than 60 meters.
A 13-year investigation of the maintenance data sheets between 1998 and 2011 showed that Dragline-1 and Dragline-2 halted 938 and 903 times due to failures, which resulted in system breakdowns of 13,954 and 16,471 hours, respectively. The main failure-inducing components in the draglines were identified from the definitions in the maintenance records and from interviews with dragline maintenance experts. These components, their common failure modes, and their repair types are given in Table I. Components with different failure modes were separated using the denominations mode01 and mode02. Mode01 refers to failure conditions where the component can be recovered by replacement alone. On the other hand, mode02 indicates that the component is dislocated from its hosting mechanism and can be recovered without a complete replacement. Under these assumptions, a total of 30 failure modes were listed for analysis in this study. These failure modes are referred to in the analyses using the abbreviations given in Table I.
Once the draglines were decomposed into components, failure statistics were assigned to the individual failure modes. The contributions of subsystems to the number of system failures and to breakdown durations can be examined using the Pareto charts in Figure 3. These charts show that there is no strict correlation between the number of failures and the resulting breakdown durations of subsystems. For instance, the dragging and rigging units cause short downtimes although they fail frequently compared to the other units. On the other hand, the machinery house in both draglines leads to the longest production losses when any functional interruption takes place in it. Accordingly, 56 and 47 percent of the overall failure breakdowns are due to failures in the machinery house units of Dragline-1 and Dragline-2, respectively.
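The Pareto ordering used in Figure 3 can be sketched in a few lines: sort subsystems by the chosen quantity (failure count or breakdown hours) and attach each subsystem's share and the cumulative share. The function name `pareto_table` and the breakdown hours below are hypothetical illustrations, not the study's data.

```python
def pareto_table(values):
    """Return (name, value, share %, cumulative share %) rows, largest value first."""
    total = sum(values.values())
    rows = []
    cumulative = 0.0
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * value / total
        cumulative += share
        rows.append((name, value, share, cumulative))
    return rows


# Hypothetical breakdown hours per subsystem (illustration only, not Figure 3 data).
hours = {"machinery house": 500, "movement": 200, "hoisting": 150,
         "rigging": 100, "dragging": 50}
for name, value, share, cum in pareto_table(hours):
    print(f"{name:16s} {value:6d} h {share:5.1f} % {cum:6.1f} %")
```

Running the same function once on failure counts and once on breakdown hours gives the two orderings whose mismatch the Pareto charts highlight.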
The dataset of each failure mode contains sequential time-between-failures (TBF) and time-to-repair (TTR) values as a time series. Significant correlation between successive data, or an ascending/descending trend throughout a dataset, violates stationarity. Although best-fit distributions are sufficient to evaluate the reliability of components with stationary datasets, nonstationary datasets require stochastic methods able to capture this deviation. Therefore, examining the correlation and trend of individual repair and lifetime datasets is essential for precise forecasting of system behavior over different time intervals. In this sense, the lag-1 plot and the Pearson correlation coefficient were utilized to check data correlation in the study, whereas hypothesis tests, namely Crow-AMSAA and Laplace, were used to investigate trend behavior.
The lag-1 correlation plot supports a qualitative evaluation of data randomness in a time series. The plot is a scatter plot of the (i−1)th value on the horizontal axis against the ith value on the vertical axis for a sequentially ordered dataset. Serial correlation between successive data makes the scattered points form an identifiable pattern and reduces data randomness; otherwise, the points are scattered randomly without any specific pattern. In the study, the serial correlation inquiry was also verified quantitatively using Pearson correlation coefficients. The tests validated that the datasets are free of serial correlation. One representative example of the correlation tests is illustrated in Figure 4.
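A minimal sketch of the lag-1 check: build the (x[i−1], x[i]) pairs that the lag-1 plot scatters and compute their Pearson coefficient. The flagging threshold |r| > 1.96/√n is a large-sample approximation assumed here for illustration; the study does not state its significance criterion, and the function names are hypothetical.

```python
import math


def lag1_pairs(series):
    """(x[i-1], x[i]) points plotted in a lag-1 scatter plot."""
    return list(zip(series[:-1], series[1:]))


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lag1_correlation(series, z_crit=1.96):
    """Lag-1 Pearson r and a flag using the approximate bound |r| > z_crit / sqrt(n)."""
    pairs = lag1_pairs(series)
    r = pearson([a for a, _ in pairs], [b for _, b in pairs])
    return r, abs(r) > z_crit / math.sqrt(len(pairs))


# A strictly alternating series is perfectly (negatively) serially correlated.
print(lag1_correlation([1, 2, 1, 2, 1, 2]))
```

A constant series has zero variance and makes the coefficient undefined (division by zero), so such datasets need to be screened out first.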
In addition to data correlation, the presence of a data trend also changes the method for estimating repair time/lifetime parameters. On this basis, the Crow-AMSAA and Laplace methods were utilized in the study to test whether a regular deterioration or growth rate exists in the repair and lifetime intervals of failure modes. The Crow-AMSAA test accepts trend behavior of a repair time/lifetime dataset if

$$\frac{2N}{\hat{\beta}} < \chi^2_{2(N-1),\,\alpha/2} \quad \text{or} \quad \frac{2N}{\hat{\beta}} > \chi^2_{2(N-1),\,1-\alpha/2}$$

where $N$ is the total number of failures, $\hat{\beta}$ is the expected shape parameter, $\chi^2$ is the score of the chi-square distribution, and $1-\alpha$ is the confidence level. $\hat{\beta}$ can be estimated using Equation 1, where $T_i$ is the cumulative time between failures up to the ith failure [30]:

$$\hat{\beta} = \frac{N}{\sum_{i=1}^{N-1} \ln\left(T_N / T_i\right)} \qquad (1)$$
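A stdlib-only sketch of the Crow-AMSAA trend test in the failure-terminated form written above. Instead of looking up the two chi-square critical values, it computes the two-sided p-value and flags a trend when p < α, which is equivalent. The chi-square CDF for even degrees of freedom uses the exact Poisson identity, so no external package is needed. The default α = 0.10 (90% confidence) and the function names are assumptions; the study does not state the confidence level it used.

```python
import math


def chi2_cdf_even(x, df):
    """Chi-square CDF for even df = 2k, via P(chi2_2k > x) = P(Poisson(x/2) <= k-1)."""
    if x <= 0.0:
        return 0.0
    k, m = df // 2, x / 2.0
    survival = sum(math.exp(-m + j * math.log(m) - math.lgamma(j + 1)) for j in range(k))
    return 1.0 - survival


def crow_amsaa_trend(T, alpha=0.10):
    """Failure-terminated Crow-AMSAA test on cumulative failure times T_1 < ... < T_N.

    Returns (beta_hat, statistic, trend) with statistic = 2N / beta_hat, tested
    two-sided against a chi-square distribution with 2(N - 1) degrees of freedom.
    """
    N = len(T)
    T_N = T[-1]
    beta_hat = N / sum(math.log(T_N / t) for t in T[:-1])  # Equation 1
    statistic = 2 * N / beta_hat
    lower = chi2_cdf_even(statistic, 2 * (N - 1))
    p_value = 2 * min(lower, 1.0 - lower)
    return beta_hat, statistic, p_value < alpha


equal = [10.0 * i for i in range(1, 21)]              # evenly spaced failures
accel = [100.0 * math.sqrt(i) for i in range(1, 21)]  # shrinking intervals (wear-out)
print(crow_amsaa_trend(equal))
print(crow_amsaa_trend(accel))
```

With the evenly spaced synthetic sequence the estimate is about 1.14 and no trend is flagged; with cumulative times proportional to √i the estimate is about 2.28 and a deterioration trend is flagged.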
The Laplace method accepts a data trend if $|U_L| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the score of the standard normal distribution. The test parameter $U_L$ can be calculated using Equation 2 [30]:

$$U_L = \frac{\dfrac{1}{N-1}\sum_{i=1}^{N-1} T_i - \dfrac{T_N}{2}}{T_N\sqrt{\dfrac{1}{12(N-1)}}} \qquad (2)$$

Sample test scores for the time-between-failures (i.e., lifetime) datasets of the MH2 and MH3 components in the machinery houses, and the decisions on data trends, are given in Table II. The trend behavior of Dragline-1 MH2 was validated by both the Crow-AMSAA and Laplace tests. This trend can also be examined qualitatively in Figure 5, which illustrates the ordered cumulative times between failures of the components.
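The Laplace statistic in Equation 2 can be sketched with the standard library alone, using `statistics.NormalDist` for the normal critical value. As in the Crow-AMSAA sketch, α = 0.10 is an assumed default and the function name is hypothetical. A positive U_L means failures concentrate late in the record (deterioration); a negative value means improvement.

```python
import math
from statistics import NormalDist


def laplace_trend(T, alpha=0.10):
    """Laplace test (Equation 2) on cumulative failure times T_1 < ... < T_N."""
    N = len(T)
    T_N = T[-1]
    mean_earlier = sum(T[:-1]) / (N - 1)  # mean of the first N-1 failure times
    u_l = (mean_earlier - T_N / 2.0) / (T_N * math.sqrt(1.0 / (12.0 * (N - 1))))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return u_l, abs(u_l) > z_crit


print(laplace_trend([10.0 * i for i in range(1, 21)]))              # no trend
print(laplace_trend([100.0 * math.sqrt(i) for i in range(1, 21)]))  # deterioration
```

For the evenly spaced synthetic sequence U_L is exactly zero, while the √i sequence gives U_L of about 2.6, agreeing with the Crow-AMSAA decision on the same data.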
The tests also showed that the lifetime datasets coded DR1, HO1, RI1, and MO1 for Dragline-1 and HO2, HO4, BU2, BU4, BU5, RI6, MH1, MH3, MO1, and MO3 for Dragline-2 exhibit trend behavior. In addition, no significant trend was observed in the repair time datasets. The effects of the trend decisions on reliability assessment are discussed in Section 3.

Lifetime and repair time characterization of failure modes
System reliability analysis requires precise identification of the functional dependencies between components. A dragline can operate only if its seven subsystems perform their functions properly. Since the failure of any component in Table I leads to a compulsory breakdown of the dragline, the components in the subsystems are connected in series. Therefore, the subsystem and system reliability formulations can be generated as in Table III.
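The series logic behind Table III can be sketched as a product of reliabilities: a subsystem is the product of its component reliabilities, and the dragline is the product of its subsystem reliabilities. The constant-failure-rate components and their rates below are hypothetical placeholders, not the fitted distributions of Tables IV-V.

```python
import math


def exponential_reliability(rate):
    """Hypothetical helper: R(t) = exp(-rate * t) for a constant-failure-rate component."""
    return lambda t: math.exp(-rate * t)


def series_reliability(t, reliabilities):
    """Series structure: the (sub)system survives to t only if every block survives."""
    r = 1.0
    for rel in reliabilities:
        r *= rel(t)
    return r


# Two hypothetical subsystems, each a series of components; the system is a series of subsystems.
hoisting = [exponential_reliability(0.002), exponential_reliability(0.001)]
rigging = [exponential_reliability(0.004)]
# The default argument c=comps binds each subsystem's list at definition time.
subsystems = [lambda t, c=comps: series_reliability(t, c) for comps in (hoisting, rigging)]
print(series_reliability(24.0, subsystems))
```

Because every block is in series, nesting does not change the result: the system reliability equals the product over all components, which for these exponential placeholders is exp(−0.007·24).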
The reliability function R(t) can be derived from the cumulative failure function F(t), which is the integral of the failure density function f(t) over a time interval (Equation 3). The reliability function, also called the survival function, gives the probability that a component operates properly up to time t:

$$R(t) = 1 - F(t) = 1 - \int_0^t f(u)\,du \qquad (3)$$
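Equation 3 can be checked numerically: integrate a density with composite Simpson's rule and compare with the closed-form reliability. The Weibull parameters are hypothetical. The sketch assumes a shape parameter of at least one, since for β < 1 the Weibull density is unbounded at t = 0 and the function below would raise a division error there.

```python
import math


def weibull_pdf(t, beta, eta):
    """Two-parameter Weibull density f(t)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def reliability_from_density(pdf, t, n=2000):
    """Equation 3: R(t) = 1 - integral_0^t f(u) du, with composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of sub-intervals
    h = t / n
    total = pdf(0.0) + pdf(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 1.0 - total * h / 3.0


beta, eta, t = 1.5, 100.0, 24.0  # hypothetical Weibull parameters
numeric = reliability_from_density(lambda u: weibull_pdf(u, beta, eta), t)
closed_form = math.exp(-((t / eta) ** beta))
print(numeric, closed_form)
```

The numerical route matters mainly for densities without a closed-form cumulative function; for the Weibull case it serves as a consistency check.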
In this study, the failure density functions of the components were estimated according to the trend behaviors of the lifetime datasets discussed in Section 2. In this sense, a general renewal process (GRP) was fitted for components with trends, while the lifetime parameters of the other components were estimated from best-fit distributions of the times between failures. GRP offers flexible modelling of non-stationary datasets, since the process estimates the renewal success between as good as new and as bad as old by assigning a restoration factor between 1 and 0, respectively. In addition, GRP assumes two separate cases in estimating restoration factors: (i) maintenance repairs only the damage accumulated since the previous failure, called the Kijima-I model, and (ii) maintenance eliminates part of the damage accumulated since the beginning of the lifetime, called the Kijima-II model [20]. This study considers the Kijima-II model in GRP estimation, since maintenance works on dragline components recover accumulated damage proportionately. On the other hand, best-fit distributions assume that components are renewed to an as-good-as-new condition after maintenance, which makes them suitable for the reliability assessment of components without trends. In this study, the lifetime parameters of the dragline components were estimated using ReliaSoft Weibull++ 7, as shown in Tables IV-V.
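The Kijima-II virtual-age recursion can be sketched as follows, together with the conditional reliability for the next interval under the power-law intensity λβt^(β−1) mentioned for GRP. The sketch assumes the convention q = 1 − RF (RF = 1 as good as new, RF = 0 as bad as old); the intervals, λ, β and RF values are hypothetical, and this is not the estimation procedure of Weibull++ itself.

```python
import math


def kijima_ii_virtual_ages(intervals, rf):
    """Virtual age after each repair under Kijima-II: v_i = q * (v_{i-1} + x_i).

    Assumes q = 1 - rf, so rf = 1 is as good as new and rf = 0 is as bad as old.
    """
    q = 1.0 - rf
    v, ages = 0.0, []
    for x in intervals:
        v = q * (v + x)
        ages.append(v)
    return ages


def grp_conditional_reliability(t, v, lam, beta):
    """Probability of surviving t more hours from virtual age v, with power-law
    intensity lam*beta*t**(beta-1), i.e. cumulative intensity lam*t**beta."""
    return math.exp(-lam * ((v + t) ** beta - v ** beta))


ages = kijima_ii_virtual_ages([10.0, 20.0, 30.0], rf=0.5)
print(ages)  # [5.0, 12.5, 21.25]
print(grp_conditional_reliability(24.0, ages[-1], lam=0.001, beta=1.8))
```

With β > 1, a larger virtual age lowers the chance of surviving the next interval, which is how imperfect repairs carry deterioration forward in the model.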
Tables IV-V indicate that most components can be described by a Weibull distribution in lifetime characterization. GRP with a power-law intensity function ($\lambda\beta t^{\beta-1}$) also uses descriptive parameters similar to those of the Weibull distribution, in addition to a restoration factor [20]. Loglogistic, lognormal, normal, and exponential distributions were also found to fit the lifetimes of some components. The related failure density functions are given in Table VI.
The shape parameter β of the Weibull distribution and of GRP determines the slope of the data behavior curve. A shape parameter less than one indicates infant mortality in the mechanism, and the lifetime curve exhibits quasi-exponential behavior. A shape parameter higher than one points to potential wear-out problems, and the lifetime curve becomes bell-shaped. If the shape parameter is exactly one, the Weibull distribution reduces to the exponential distribution. The parameter η is the scale parameter: the failure probability of the relevant component reaches 63.2 % at t = η (or at t = γ + η when a location parameter is used). The last parameter, γ, locates the start of the curve with respect to the origin. Positive γ values are also referred to as failure-free time, i.e., a particular time prior to which the failure probability is zero.

The parametric estimations in Tables IV-V characterize the uptime of the operating draglines. The repair durations, i.e. times to repair (TTR), of the components were also estimated to reveal production losses due to failures. In this sense, the lognormal distribution was found to fit the TTR datasets well (Table VII). Mean time-to-repair (MTTR) values showed that failures in generators (MH1), motors (MH2), and hoisting rope-mode01 (HO2) induced the longest downtimes for both draglines, whereas rigging components were maintained in shorter periods than the other components.
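The parameter interpretations above can be illustrated with a short sketch: the three-parameter Weibull failure probability is zero during the failure-free time γ and reaches 1 − e⁻¹ ≈ 63.2 % at t = γ + η; the Weibull hazard falls, stays flat, or rises depending on β; and a lognormal TTR fit gives MTTR = exp(μ + σ²/2), with μ and σ the mean and standard deviation of ln(TTR). All parameter values are hypothetical.

```python
import math


def weibull3_cdf(t, beta, eta, gamma=0.0):
    """Three-parameter Weibull failure probability; zero during the failure-free time gamma."""
    if t <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((t - gamma) / eta) ** beta))


def weibull_hazard(t, beta, eta, gamma=0.0):
    """Failure rate: decreasing for beta < 1, constant for beta = 1, increasing for beta > 1."""
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1)


def lognormal_mttr(mu, sigma):
    """Mean of a lognormal TTR distribution, with mu and sigma of ln(TTR)."""
    return math.exp(mu + 0.5 * sigma ** 2)


print(round(weibull3_cdf(150.0, 2.0, 100.0, 50.0), 3))  # 0.632
print(lognormal_mttr(1.0, 0.5))  # hypothetical ln-scale parameters
```

Note that the lognormal MTTR exceeds the median repair time exp(μ), so long repairs in the tail raise the mean used in downtime-based cost estimates.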
This section estimated lifetime and repair time characteristics of the dragline components to form a basis for reliability allocation analysis.Effects of the downtime and the uptime behaviors on the allocation of component reliabilities for target system reliability are discussed in Section 4.

Setting component maintenance priorities in the reliability allocation model using risk factors
This study utilizes reliability allocation to develop effective maintenance policies by determining maintenance-critical components and the reliability increments they require to sustain system functionality at the intended level. In the study, a generic reliability allocation model [19] was selected to evaluate these reliability improvement rates. The model, given in Equations 4-7, allocates reliability values considering both the improvement convenience and the lifetime characteristics of the components within a system. The algorithm minimizes the cost of improving component reliabilities while meeting the goal system reliability. The cost parameter $c_i(R_i)$ in Equation 4 is dimensionless and rates the difficulty of raising the ith component reliability from its current value to $R_i$. In the constraints, $R_s$ and $R_G$ are the system reliability (obtained from the component reliabilities $R_i$) and the target system reliability at time t, respectively. Moreover, $R_{i,\min}$ and $R_{i,\max}$ refer to the current (minimum) and the maximum achievable reliabilities of the ith component at time t, respectively. Using $R_{i,\min}$ and $R_{i,\max}$ in the algorithm realistically makes a unit improvement more costly for components whose reliability is already close to its maximum than for lower-reliability components. The last parameter, $f_i$, is the feasibility parameter, which originally expresses how convenient a component is for reliability improvement in the system development stage and takes a comparative value between 0.01 and 0.99. This parameter is in effect a priority-setting factor among system elements when allocating reliability values. In the study, the algorithm was forced to allocate reliabilities according to the maintenance priority of components by evaluating the feasibility factors from the severity of failure modes and their occurrence frequencies:

$$\min \sum_{i=1}^{n} c_i(R_i) \qquad (4)$$

subject to:

$$R_s \geq R_G \qquad (5)$$

$$R_{i,\min} \leq R_i \leq R_{i,\max}, \quad i = 1, \dots, n \qquad (6)$$

$$c_i(R_i) = \exp\left[(1 - f_i)\,\frac{R_i - R_{i,\min}}{R_{i,\max} - R_i}\right] \qquad (7)$$

The expected emergence rates of failure modes (occurrence) and their financial consequences (severity) were specified as the main determinants when designating maintenance criticalities among the components of the operating system. On this basis, risk priority numbers (RPNs) were utilized in the study to estimate the feasibility factor of each failure mode (Equations 8-9):

$$\mathrm{RPN}_i = S_i \times O_i \qquad (8)$$

$$f_i = 0.99\,\frac{\mathrm{RPN}_i}{\max_j \mathrm{RPN}_j} \qquad (9)$$

In Equation 8, $S_i$ and $O_i$ are the severity and occurrence factors of the ith failure mode and take comparative rankings between 1 and 10. Each feasibility factor $f_i$ is calculated by proportioning the ith RPN to the highest RPN in the system, where 0.99 is the maximum achievable feasibility factor. The severity factor is generally estimated subjectively and considers one or more issues such as safety risks, environmental hazards, production losses, and damage to corporate image in case of failure. If failure records are available, the repair times of each failure mode can also be utilized to score the severity of failures [15]. This study evaluates the severity factor considering the economic consequences of failures, since cost is an effective and rational measure of failure severity in a system. In production industries, the economic consequences of failures can be measured by including direct and indirect costs (Equation 10):

$$C_i = C_{i,\text{direct}} + C_{i,\text{indirect}} \qquad (10)$$

Direct cost is the physical consequence of a failure, whereas production loss due to downtime can be considered an indirect cost. In this sense, Equation 11 gives the estimated production loss of a dragline due to failures, and it can also be utilized for any earthmover with bucket production. The indirect cost formulation uses the mean time-to-repair of failure modes ($\mathrm{MTTR}_i$), bucket volume ($V_{bucket}$), fill factor (F), swell factor (S), cycle time ($T_{cycle}$), operator efficiency ($\eta_{operator}$), and profit per …

Following the failure cost estimations, the severity factor of each failure mode can be rated using a severity ranking table specific to the system. On this basis, Table VIII is proposed for the severity evaluation of dragline failures. This table can be modified according to the economic aspects of the system being analyzed.
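Equations 8-9 translate directly into code. The sketch below multiplies severity and occurrence rankings and scales each RPN by the largest one so that the highest-risk mode receives 0.99. The function name and the rankings in the example are hypothetical, not the values behind Table X.

```python
def feasibility_factors(severity, occurrence, f_max=0.99):
    """Equations 8-9: RPN_i = S_i * O_i and f_i = f_max * RPN_i / max_j RPN_j."""
    for s, o in zip(severity, occurrence):
        if not (1 <= s <= 10 and 1 <= o <= 10):
            raise ValueError("severity and occurrence rankings must lie in 1..10")
    rpn = [s * o for s, o in zip(severity, occurrence)]
    top = max(rpn)
    return rpn, [f_max * r / top for r in rpn]


# Hypothetical rankings for three failure modes (not the values in Table X).
rpn, f = feasibility_factors([8, 5, 2], [5, 8, 3])
print(rpn, [round(x, 4) for x in f])  # [40, 40, 6] [0.99, 0.99, 0.1485]
```

Equation 9 applied alone can return a value just below the 0.01 lower bound of the feasibility range (for example, an RPN of 1 against a maximum of 100); the sketch leaves such values as computed.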
The frequency of failures can be measured using the mean time between failures (MTBF) of each failure mode given in Tables IV-V and included in RPN calculations as occurrence factors. Table IX lists the ranking values for expected failure rates (1/MTBF). These scores were specified by the US Department of the Army [7] for use in failure modes and effects analysis of systems.
The severity and occurrence factors of the dragline failure modes were converted to feasibility factors using Equation 9, as given in Table X. Generators (MH1) and motors (MH2) in the machinery house units and the walking and rotation mechanisms in the movement units were found to hold the highest priority in reliability allocation for both draglines. Most mechanical components have feasibility factors below 0.30, and this value rises with the complexity of the component.
Once the component lifetime characteristics in Tables IV-V and the feasibility factors in Table X are obtained, the reliability allocation algorithm in Equations 4-7 can be applied for target system reliabilities. The target reliability should be specified realistically, considering spare part policy and crew condition. Moreover, reliability improvement in the system should not conflict with production plans. If conditions are suitable for improving system reliability, the maintenance policy can be modified according to the reliability allocation results. To validate the success of the modified policy, the system should be monitored for a specified period and its reliability reassessed using up-to-date datasets. The applied policy can be retained if the reassessment confirms its success and validity. If not, the target system reliability can be modified considering the shortcomings of the recent policy. A sustainable maintenance policy using reliability allocation can be developed as shown in Figure 6.
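For a series structure (system reliability equal to the product of component reliabilities), Equations 4-7 form a convex problem. The cost in Equation 7 is convex and increasing, and the constraint Σ ln R_i ≥ ln R_G defines a convex set, so the KKT conditions identify the optimum. The sketch below solves them by bisection: each component sits where R_i·c_i′(R_i) equals a common multiplier μ (or stays at R_i,min), and μ is tuned until the target is met. This solver, the function names, and the example inputs are assumptions for illustration; the study does not state which optimizer produced Table XI.

```python
import math


def mettas_cost(r, r_min, r_max, f):
    """Equation 7: dimensionless cost of raising a component from r_min to r."""
    if r >= r_max:
        return math.inf
    exponent = (1.0 - f) * (r - r_min) / (r_max - r)
    return math.exp(exponent) if exponent < 700.0 else math.inf  # avoid overflow


def _marginal(r, r_min, r_max, f):
    """r * dc/dr: marginal cost per unit increase of ln(r)."""
    if r >= r_max:
        return math.inf
    c = mettas_cost(r, r_min, r_max, f)
    return r * c * (1.0 - f) * (r_max - r_min) / (r_max - r) ** 2


def _level(mu, r_min, r_max, f, iters=100):
    """Component reliability satisfying the KKT condition r * c'(r) = mu."""
    if _marginal(r_min, r_min, r_max, f) >= mu:
        return r_min  # too costly to improve at this multiplier
    lo, hi = r_min, r_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _marginal(mid, r_min, r_max, f) < mu:
            lo = mid
        else:
            hi = mid
    return lo


def allocate_series(r_min, r_max, f, r_goal, iters=200):
    """Minimise the sum of Equation 7 costs s.t. prod(R_i) >= r_goal, r_min <= R_i <= r_max."""
    log_goal = math.log(r_goal)
    if sum(map(math.log, r_min)) >= log_goal:
        return list(r_min)  # the current system already meets the target
    if sum(map(math.log, r_max)) <= log_goal:
        raise ValueError("target not reachable below the maximum reliabilities")

    def levels(mu):
        return [_level(mu, a, b, c) for a, b, c in zip(r_min, r_max, f)]

    def log_rel(mu):
        return sum(map(math.log, levels(mu)))

    lo, hi = 0.0, 1.0
    for _ in range(1100):  # bracket the Lagrange multiplier
        if log_rel(hi) >= log_goal:
            break
        hi *= 2.0
    else:
        raise ValueError("could not bracket the multiplier; target too close to r_max")
    for _ in range(iters):  # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if log_rel(mid) < log_goal:
            lo = mid
        else:
            hi = mid
    return levels(hi)


# Hypothetical two-component example (not the dragline data of Table XI).
r_min, r_max = [0.90, 0.90], [0.9999, 0.9999]
print(allocate_series(r_min, r_max, [0.9, 0.1], r_goal=0.90))  # risk-based factors
print(allocate_series(r_min, r_max, [0.5, 0.5], r_goal=0.90))  # constant factor
```

With equal feasibility factors the required growth is split evenly, while the component with the higher feasibility factor absorbs more of it. This mirrors the comparison between risk-based and constant feasibility factors in the numerical example.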
The following section, Section 5, presents a numerical example of reliability allocation for the draglines to achieve a specific target system reliability.

A numerical example: allocated reliability values for a target system reliability of 60% at the 24th operating hour
As discussed in Section 4, the target system reliability for a pre-defined time interval should be specified by the decision-maker considering conditions such as production rate, spare part inventory, and suitability of the maintenance crew. Once these pre-conditions are satisfied, the target system reliability can be specified. The success of the policy requires a long-term observation period on the suitability of the decided target reliability for the system itself. Following this observation period, if the target system reliability cannot be achieved by the modified maintenance policy, this decision can be reviewed again as described in Figure 6. In this numerical example, the observation time is selected as 24 operating hours. Using the component dependencies in Table III and the lifetime parameters in Tables IV-V, Dragline-1 and Dragline-2 were found to have reliabilities of 43% and 44% at the 24th operating hour. The target reliability, $R_G$, for both draglines was assumed to be 60% at the end of this operating period. Therefore, component reliabilities must be allocated to ensure a system reliability growth of 16-17 percentage points for both draglines. The maximum ($R_{i,\max}$) and minimum ($R_{i,\min}$) reliabilities of the ith component were set to 99.99% and to the actual component reliability at the 24th operating hour, respectively. Reliability allocation was carried out both with the estimated feasibility factors in Table X and with a constant feasibility factor, to reveal the effect of risk assessment. The resulting allocated reliabilities ($R_i$) from the formulation in Equations 4-7 are given in Table XI.

Table XI shows the components with reliability growth priority and the resulting increase rates required to satisfy a dragline reliability of 60% at the end of the 24-hour operating period. The allocation results based on the feasibility factors in Table X reveal that motors (MH2), rotation mechanism (MO1), bucket body (BU1), and rigging pulley-mode02 (RI6) for Dragline-1, and rigging pulley-mode01 (RI5), generators (MH1), rotation mechanism (MO1), and walking mechanism (MO2) for Dragline-2, require the highest reliability improvement. On this basis, the modified policy should provide reliability increases of at least 9.08% and 8.32% for the rotation mechanism (MO1) and motors (MH2) of Dragline-1, and 3.95% and 3.32% for rigging pulley-mode01 (RI5) and generators (MH1) of Dragline-2, respectively. With a constant feasibility factor of 0.5, the target components and their reliability growth values differ from those obtained with the actual feasibility factors. This shows that reliability allocation without any risk evaluation can lead to maintenance policies based on misleading decisions. Such a policy may not be efficient …

Conclusions
Production industries require effective maintenance policies to meet their production goals. Developing a risk-based maintenance strategy by identifying and characterizing failure modes and their effects on system functionality may help to prioritize the mechanism problems that need to be fixed. This study combines system reliability assessment and reliability allocation to reveal the reliability improvements of components required for a target system reliability. In this sense, two active draglines were analyzed as a case study. The reliability assessment covered data correlation and trend tests and the resulting evaluation assumptions. Reliability allocation analysis was performed using a cost minimization algorithm that considers both the reliabilities and the failure risk evaluations of individual components. Severity and occurrence of the failure modes were included in the risk evaluation. The effect of risk evaluation on the allocation scores and the resulting allocation values was examined with a numerical example for an observation period of 24 hours. Accordingly, the current survival probabilities of the draglines were found to be 43-44% at the end of the 24th operating hour. If the manager decides to upgrade the maintenance policy to ensure a system reliability of 60% for this operating period, the policy should focus more on the rotation mechanism and motors for Dragline-1 and on rigging pulley-mode01 and generators for Dragline-2. In addition, this policy should provide reliability growths of 9.08, 8.32, 3.95, and 3.32% for those components, respectively. The study methodology can be applied to any system in which reliability is important. Maintenance authorities at production plants can define their target system reliability values considering their crew, spare part, and production conditions and rearrange the framework of their maintenance policies according to the methodology of this study.

Note to Equation 11: MTTR values cover the total time required to detect, repair, and inspect failures.

Fig. 6. Methodology of sustainable maintenance policy cycle using reliability allocation.

Table I. Common failure modes and maintenance types of dragline components.

Table II. Data trend tests for sample lifetime datasets of motor and lubrication components.

Table IV. Lifetime parameters of Dragline-1 components (Not iid: not independently and identically distributed).

Table III. Reliability equations of the dragline subsystems and the main system.
Regarding the other distributions in Table VI, the lognormal and loglogistic distributions use the logarithmic mean and standard deviation in their expressions. The exponential distribution always has a monotonically decreasing density plot, indicating the accumulation of data near the starting point, i.e., early failures, and its failure rate (λ) remains constant. In a two-parameter exponential distribution, γ likewise denotes the presence of a failure-free time.

Table V. Lifetime parameters of Dragline-2 components (Not iid: not independently and identically distributed).

Table VI. Descriptions of some common distributions and general renewal process.

Table VII. Time-to-repair (TTR) characteristics of the dragline failure modes.

Table VIII. Severity scores for dragline failure modes.

Table IX. Ranking scores of the RPN occurrence factor.

Table X. Feasibility parameter calculations using risk priority numbers.

Table XI. Reliability allocation of dragline components for a target reliability of 60%.