A multinomial process tree for reliability assessment of machinery in autonomous ships

Maritime Autonomous Surface Ships have received a significant amount of attention in recent projects. They promise a reduction in marine accidents and mitigation of human errors. Most of the ongoing research effort is directed toward autonomous navigation and cybersecurity. However, the importance of a machinery plant in the engine room that can operate reliably without human attendance has hardly been investigated. To prevent failures in such systems and extend the interval between required human interventions, it is essential to improve their reliability. This paper presents a systematic approach to evaluate the reliability of an autonomous system under the influence of uncertain disruptions and to predict failure rates of unattended machinery plants. A Multinomial Process Tree is used to model failures in the main failure-sensitive components. Hierarchical Bayesian Inference is adopted to facilitate the prediction of frequencies of disruptive events and to estimate the entire system's failure rate. The outcome of this research enables design strategies to improve the reliability of autonomous ships and prevent Fatal Technical Failure during operation. This allows assessing whether a given machinery plant is sufficiently reliable to be used on unmanned ships. A case study is considered to demonstrate the application of the presented method.


Introduction
The concept of autonomous shipping has piqued considerable interest in recent years due to the potential of reducing operational cost, easing the difficulty of hiring onboard personnel, and reducing the number of incidents induced by human error in marine transportation. According to the safety and shipping review presented by Allianz, between 75% and 96% of maritime accidents are caused by human errors [1]. This is mainly due to human exhaustion, the complexity of managing operational activities, and the need to make complicated decisions based on the overall performance of the systems [2]. Autonomous ships should enable a reduction of the number of crew members and support human decision-makers by providing reliable operational planning [3]. Examples of relevant projects include the Maritime Unmanned Navigation through Intelligence in Networks (MUNIN) project [6], the Advanced Autonomous Waterborne Applications Initiative (AAWA) project [4], the recently initiated AUTOSHIP project [11] and the NOVIMAR project [12], which develops the concept of a vessel train consisting of a manned lead ship and following ships with reduced crew. DNV GL has participated in several projects revolving around ship automation and autonomous control. The ReVolt project is one example of an autonomous ship designed as a proof of this concept [5]. The project investigates a wide array of aspects relevant to commercial unmanned shipping, from technical development to safety, legal and economic aspects as well as societal acceptance [4]. These existing research projects are mainly focused on investigating advanced control systems [5][6][7], navigation, and communication [8][9][10] in autonomous ships. They make several statements regarding the importance of researching predictive maintenance of unattended machinery and evaluating the performance of unmanned systems, but they did not develop any predictive models to address these challenges. Ship systems contain many components that still require the hard-to-replace skills and expertise of experts. The high dependency of machinery on humans complicates the maintenance and repair of broken equipment in unmanned ships. In the absence of onboard experts, the operation will be susceptible to emerging risks, which will significantly impact the unattended machinery's performance [4]. These unanticipated events will negatively impact the assets' acceptability, thus hindering the widespread deployment of smart operations in future maritime trade.
The AAWA project states that onboard systems of an autonomous ship "need to be resilient to failure and extend maintenance intervals" [4]. The findings of MUNIN include that "Current preventative maintenance procedures should be able to ensure availability of unattended components during the voyage" [8]. According to the MUNIN findings, the power plant, propulsion system, and its auxiliaries are the most vital systems onboard in ensuring the completion of autonomous missions that may extend to more than a month. A systematic approach is needed to provide sufficient information about an unmanned engine room's availability and its ability to reliably perform independent missions for 500 hours without human intervention [6,15]. As stated by [12], the major challenge is that the experience with autonomous ships is too limited to evaluate the performance of unattended machinery plants (UMP) in the same way as a manned system. Different studies have elaborated the safety and risk assessment challenges that autonomous ships will face. A system-theoretic process analysis (STPA) was recently proposed for hazard identification of UMPs in maritime transportation [18][19][20]. However, STPA is mainly a qualitative approach that cannot model the operational uncertainty in an integrated framework. In [23], a supervisory risk control is developed for autonomous systems that integrates STPA and a Bayesian Belief Network to quantify the outcomes of STPA, though the demonstration of the model is limited to constructing the uncertainty network. Some of the research is devoted to evaluating the risks for the general concept of autonomous vessels, identifying concrete challenging aspects for the execution of operations, and preventing collision accidents [11,13,14,21-24].
The above is a selection of recent works investigating reliability engineering and safety issues in autonomous shipping and remotely operated systems. However, most of the research focuses on advanced control systems, navigation, and communication in autonomous ships. The high dependency of machinery on humans complicates the maintenance and repair of broken equipment in unmanned ships, but the challenge of ensuring sufficient reliability of this equipment is hardly addressed, making it an important knowledge gap in the development of autonomous ships. In the present study, a framework is proposed to enable a predictive tool for assessing the reliability of unattended machinery plants and estimating the trusted period of the unattended system. As an example, in the works conducted by [14-16, 23, 33-36], different systematic and theoretical approaches are developed to identify the hazards in autonomous ships, but they do not provide a solution for estimating the hazard rate in unmanned systems. However, if the industry intends to convert existing ships to unmanned vessels (as in the AUTOSHIP project [11] and the NOVIMAR project [12]), it is essential to develop a quantitative model that can predict the performance of unattended machinery subject to operational uncertainty. The industry needs to know how long it can confidently leave its systems without human intervention and what the hazard rate of critical failures in the unattended machinery is. Answering these questions is the main concern of the present paper. The outcome of this study will provide the industry with a quantitative basis for designing or reconfiguring current systems in a way that provides confidence that a mission can reliably be executed without human attendance. This article provides a method to predict how long the machinery plant can operate without human intervention before reaching a reliability threshold for the entire system. The presented framework is able to evaluate the trustworthiness of an unattended engine room under different operational scenarios and to predict the critical disturbances in an unmanned system that can put the operation at major risk of disruption.
To this end, this research aims at demonstrating a proof-of-concept of a framework to estimate the reliability of UMPs in autonomous ships. To make sure that the machinery plant operates reliably without human intervention during a voyage, the developed framework addresses the modeling of random failures in systems and the prediction of critical components' reliability. The objective is then to understand how the frequency of disruptive events of critical components will change without human attendance and to predict the hazard rate of the entire system. To overcome the lack of reliable failure data and anticipate a system's failure, an integrated probability model is designed based on a Multinomial Process Tree (MPT) and a Hierarchical Bayesian Model (HBM). As stated by [3], this lack of reliable data is the "Achilles heel" of every autonomous system due to the large uncertainties that are involved in operation before a failure happens. The lack of informative historical data occurs because traditional ships are often designed for a specific application and produced in limited series, which leads to a slow accumulation of relevant failure data compared to other industries [3]. The model is able to consider uncertain factors in the operation that can lead to major failures in the system. It can also accommodate the scarcity of event data in MASS operation caused by the lack of historical data. The advantage of the model is demonstrated by predicting the period of safe operation and estimating the hazard rate function of the UMP. To illustrate the application of the framework, the ship's main engine is considered as the case study, although the method is not limited to this area and the approach can be used in other fields to estimate the reliability of complex autonomous systems.

Problem definition and Solution Strategy
The focus of this study is on the machinery plant used in merchant ships. An overview of the machinery of the power plant analyzed in this paper is illustrated in Fig. 1. The engine itself consists of many parts; however, not all the components will be weak points if they are left unattended. Therefore, a criterion should be established to identify weak points in unattended engine rooms and then build a reliability model to estimate the frequency of disruptive events and related failure rates. Accordingly, the frequency of requested maintenance activities will play an important role in perceiving the health state of unattended machinery.
Event-data are recognized as a source of failure data representing the time, place, and reason that a particular event occurred [3,10]. The important thing with event-data is that it should always include the type of action performed as a part of the asset management strategy, such as the number of repairs of an item [7,14-16]. The amount of time an engineer spends on specific equipment will give a good indication of a system's health. As recommended by [3], event data will help predict failures in the design, especially if there is not much informative historical data for autonomous ships [15,16,17]. In general, crew tasks can be divided into two major categories that can be used to identify and analyze weak points in the machinery plant:
• Maintaining equipment, i.e., performing planned maintenance. This action represents performing maintenance of equipment according to the planning. Therefore, the frequency of 'Maintenance' will be used here as a conservative indication of non-critical failure in the system. This assumption implies that maintenance intervals are timed in such a way that maintenance is executed shortly before minor malfunctions may occur. Due to a lack of run-to-failure data, this is the best available estimate.
• Repairing equipment, i.e., performing unplanned maintenance. This action represents performing major repairs before maintenance is planned. Therefore, the frequency of 'Repair' will be used here to indicate a critical failure in the system which leads to immediate stopping of the operation.
The observation process structure for conducting the prediction will then be established according to the frequency of maintenance and repair activities observed for each component. To better represent the solution to the problem, an illustrative example is shown in Fig. 2. Let us assume that, from prior observations, the "Attached Pump" has been identified as the failure-sensitive component. The goal is now to monitor how the system's performance deviates from its nominal condition toward critical failure. For this example, two main paths stem from the top node (i.e., nominal condition) and end at the terminal nodes representing the relevant actions (coloured circles). The green circle is defined as "Continuing the operation since the system is performing according to the specs". The yellow circle is defined as "Performing non-critical maintenance without an urgent stop of the operation". The red circle is defined as "Do major repair and halt the operation as soon as possible". All individual branches in the tree have a chance of occurrence. The issue is that not only are the branch probabilities a and b unknown, but the resulting probability functions are also always nonlinear. These challenges can be addressed through a model that quantifies the uncertainty and in which the outcome of each action is characterized by a categorical distribution [8,14]. A Multinomial Process Tree (MPT) is a suitable choice for modelling the categorical problem, and a Hierarchical Bayesian Model (HBM) is a strong predictive tool for estimating the uncertainty of random variables. The details of adopting these two models for the problem are elaborated in Section 3.

MPT Model
An MPT model is built out of a set of $J > 1$ mutually exclusive and exhaustive observable categories, $C = \{C_1, C_2, \ldots, C_J\}$, and a vector of $S$ latent parameters, $\theta = \{\theta_1, \theta_2, \ldots, \theta_S\}$, representing the tree's branches [13]. Each parameter $\theta_s$ represents the probability of occurrence, and $(1 - \theta_s)$ the probability of non-occurrence, of a latent event. The processing tree consists of a single root and a collection of processing branches, each terminating in a particular response category [13,48]. In general, an MPT model can have $I_j$ branches (denoted by $B_{ij}$, $i = 1, 2, \ldots, I_j$) leading to category $C_j$. Each branch has a probability of occurrence that moves the system to a new condition, and these probabilities are required to be a specific function of the underlying parameters [13], which leads to the following form for the branch probabilities:
$$P(B_{ij}, \theta) = c_{ij} \prod_{s=1}^{S} \theta_s^{a_{ijs}} (1 - \theta_s)^{b_{ijs}} \qquad (1)$$
where $B_{ij}$ is the $i$th branch leading to category $C_j$, $c_{ij}$ is always a positive number, and $a_{ijs}$ and $b_{ijs}$ are nonnegative integers [12,13]. Once the tree structure of the model has been established, the category probabilities are given by
$$P(C_j, \theta) = \sum_{i=1}^{I_j} P(B_{ij}, \theta) \qquad (2)$$
where $I_j$ is the number of branches terminating in category $j$, and $j = 1, 2, \ldots, J$. The final probabilistic structure of the model must satisfy $\sum_j P(C_j, \theta) = 1$ over all processing branches $B_{ij}$, which allows each parameter to vary independently in $[0,1]$.
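As a minimal sketch of how branch and category probabilities can be evaluated, the Python snippet below uses the standard MPT product form, in which a branch probability is a positive constant c times a product of θ_s^a and (1 − θ_s)^b terms, and a category probability is the sum over its branches. The function names, the two-parameter example tree and the parameter values are hypothetical and are not taken from the paper.

```python
from math import prod

def branch_probability(c, a, b, theta):
    """Branch probability: c * prod_s theta_s**a_s * (1 - theta_s)**b_s."""
    return c * prod(t ** a_s * (1.0 - t) ** b_s for t, a_s, b_s in zip(theta, a, b))

def category_probabilities(tree, theta):
    """Sum the probabilities of all branches that terminate in each category."""
    return {
        category: sum(branch_probability(c, a, b, theta) for c, a, b in branches)
        for category, branches in tree.items()
    }

# Hypothetical tree: each branch is (c, exponents a, exponents b).
example_tree = {
    "C1": [(1, (1, 1), (0, 0))],  # theta1 * theta2
    "C2": [(1, (1, 0), (0, 1))],  # theta1 * (1 - theta2)
    "C3": [(1, (0, 0), (1, 0))],  # (1 - theta1)
}

# Assumed parameter values, for illustration only.
probs = category_probabilities(example_tree, theta=(0.3, 0.4))
```

Because the categories are mutually exclusive and exhaustive, the values in `probs` sum to one for any choice of parameters in [0, 1], which is a quick sanity check on a hand-built tree.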

Methodology
In the present study, a method is developed that adopts the HBM as a prediction tool for modeling the uncertainty involved in an MPT in order to estimate the probability of failure for unattended systems. The model serves as a frame of reference to predict failures of unattended machinery when there is no support from human intervention. This enables evaluation of system availability in connection with random failures, and thus prediction of how long a system can operate without calling for experts' attendance. To this end, a dynamic reliability model is constructed as a function of failure that is capable of updating itself based on feedback from newly observed conditions (see Fig. 3). Classic fault tree analysis is a static approach that depends on an initial condition for performing the reliability assessment. Static techniques are normally regarded as open-loop methods since they cannot update themselves if new observations become available [18-22, 31, 33]. In static approaches, failure rates cannot be tracked over time, and they are considered as a particular type of failure that indicates the reliability of system behavior. With a dynamic model, however, the failures can be categorized as unwanted events or disturbances. The approach can serve as an assistive tool for future risk mitigation strategies and help allocate adequate resources to prevent disruptions in the system.
Accordingly, the framework consists of four main steps. Step 1 consists of three parts to determine the failure-sensitive components for the application of unmanned operation. The number of sensitive components ($n$), the subdivision of human activities ($C$) and the number of observations ($K$) for each human activity are assigned accordingly. Step 2 consists of three parts for constructing the process tree for failure modeling of the system behavior. The unknown parameters ($\theta$) for each branch ($B$) in the tree are defined to model the branch probabilities $P(B, \theta)$. The accumulation of all branches for each specific path in the trees results in the categorical failure probabilities $P(C, \theta)$. Step 3 consists of four parts to construct the hierarchical Bayesian model that is used to run MCMC and predict the unknown parameters ($\theta$) in the MPT. This step results in estimates of the relevant posterior distribution functions for the categorical failures. Finally, Step 4 consists of one part for simulating the model and predicting failure events in the UMP operation.
The frequencies of human activities that serve as a crucial input for the model are defined using a frequency index. Recently, [32] and [7] created a Frequency Index (FI) that is used in the risk and reliability assessment of autonomous ships. The index is expressed as the frequency of an event per ship per year [32]. The FI is used here to set up the reliability strategies and assist in the identification of the failure-sensitive components in the system. The FI ranges from 1 to 5, where 1 indicates a low frequency and 5 indicates a high frequency, as defined by [7]. Table 1 shows the definition of each FI for three actions based on the number of occurrences per ship per year. To complete the definition, Table 2 describes how these frequencies relate to the frequency of activities as defined by the ship's engineers' actions.
For example, the failure-sensitive components of the main engine included in the assessment are selected according to their risk index. This is done by processing data acquired from several expert engineers by Colon [7]. A summary of the results is presented in Table 3. Subsequently, the components are ranked from high risk to low risk, and the components with the highest risk index are included in the analysis.
The concern is to estimate the failure events in a group of sensitive components in the system and predict how long they can operate safely without the need for human intervention. To do this, the MPT model is constructed to simulate the behavior of the system by addressing the categorical actions (i.e., planned maintenance or repair) for the unattended failure-sensitive components. To provide confidence in how this was done, the remainder of this section is dedicated to a detailed discussion of the model.
The general overview of the MPT model for predicting the functional capacity of the system is shown in Fig. 4a. The unknown parameters $\theta = \{\theta_1, \theta_2, \ldots, \theta_i, \theta_{i+1}, \ldots\}$ determine the probability of each processing branch, $P(B_{ij}, \theta)$, which in turn yields the probability of a category, $P(C, \theta)$. The action categories $C = \{C_1, C_2, \ldots, C_k\}$ are described by nonlinear functions that stand for the behavior of the system. As illustrated, in the MPT model the right-hand side of the terminal categories of set $C$ represents critical failures, the rest of the set describes non-critical failures, and $C_k$ represents the condition in which the operation can continue without any interruptions. The event-data collected from experts, i.e., frequencies of maintenance and repair, are fed to the MPT model. To model the uncertainty, the associated directed acyclic graph of the Bayesian Inference network for the process is depicted in Fig. 4b. In the figure, the circles show the uncertain parameters, double circles represent the uncertain categorical failure function and the square relates to the deterministic numbers. The probability model consists of two main steps, (3) and (4), as illustrated in Fig. 3: firstly, running the MCMC package to predict the unknown parameters from the available observation event data, and secondly, running a Monte Carlo simulation to predict the frequency of required actions (e.g., maintenance or repair) based on the posterior distribution derived from Bayesian Inference. Generally, a system may need $k$ activities to keep its components in a reliable condition. Therefore, the observation parameter will be a set of frequencies $K_{n,k}$ for each action connected with the related critical component. Parameter $n$ is defined as the total number of critical components. Each set of frequencies describes the outcome of the required action as $C_{jk}$ for a particular component $j = 1, \ldots, n$ and the $k$th observation. Node $C$ is the nonlinear function for predicting the probability of categorical actions $P(C, \theta)$, while $\theta$ denotes the unknown and unobservable parameters in the MPT. The Monte Carlo simulation is then run for $N$ trials, sampling from the posterior distribution obtained from MCMC to estimate failure events in each consecutive trial. The updated frequency of actions in each trial is stored in a set $M^{(t)}_{n,k}$, where $t = 1, \ldots, N$. The parameter definition and the structure of the observations for the proposed model are shown in Table 4.
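The second step, the Monte Carlo simulation over N trials, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the posterior draws, the number of components (9) and the three categories (repair, maintenance, safe operation) are hypothetical values, and one categorical outcome is drawn per component and trial, so the simulation produces N × n outcomes in total.

```python
import random

def simulate_actions(posterior_draws, n_components, n_trials, seed=0):
    """Return one n_components x n_categories count matrix per trial."""
    rng = random.Random(seed)
    n_categories = len(posterior_draws[0])
    history = []
    for _ in range(n_trials):
        # Propagate parameter uncertainty: pick one posterior draw per trial.
        probs = rng.choice(posterior_draws)
        counts = [[0] * n_categories for _ in range(n_components)]
        for j in range(n_components):
            # One categorical outcome per component and trial.
            k = rng.choices(range(n_categories), weights=probs)[0]
            counts[j][k] += 1
        history.append(counts)
    return history

# Hypothetical posterior draws of (P(repair), P(maintenance), P(safe)).
posterior_draws = [(0.20, 0.40, 0.40), (0.25, 0.35, 0.40), (0.18, 0.42, 0.40)]
history = simulate_actions(posterior_draws, n_components=9, n_trials=1000)

# Pool all outcomes to get the share of each category over the simulation.
totals = [sum(trial[j][k] for trial in history for j in range(9)) for k in range(3)]
shares = [t / sum(totals) for t in totals]
```

Each entry of `history` plays the role of one matrix of action frequencies for a trial, and `shares` gives the cumulative proportion of outcomes per category over the whole run.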

Table 3
FI, SI and RI of the main engine according to the expert crews' actions (taken from [7]).

The first step in Bayesian Inference is to determine prior distributions for the unknown parameters of the process branches, $\theta$. Due to a shortage of engineering data and physical information regarding the details of the entire process, the best choice is to adopt a model with a non-informative prior, as recommended by [18-26, 41-46]. A non-informative prior is a type of prior that can be derived directly from the sampling distribution. Following the suggestion of previous research such as [27][28][29], the inference starts with non-informative uniform priors that keep the process free of bias. Bayes suggested that when nothing is known about $\theta$ in advance, the prior should be a uniform distribution [36]. There is no informative prior knowledge for the branch probabilities $\theta$ in the MPT model in the present study, i.e., the model is not supported by physical and engineering information, by experts' judgments, or by historical data under the same or similar circumstances. The only available practical knowledge is the frequency of the terminal nodes in the MPT, which is related to the final categories for critical or non-critical failures. Setting a non-informative uniform prior means that the prior probability of occurrence for each unknown parameter $\theta$ is equally distributed in the tree [30]. More importantly, when used for Bayesian updating, a non-informative prior does not strongly influence the posterior distribution. This makes the Bayesian update depend almost entirely on the observation data [37]. Therefore, the predicted posterior distribution will reflect the uncertainty in the limited available observation data [37]. The uniform distribution, the Jeffreys prior, the diffuse gamma, and the diffuse normal distribution are typical choices of non-informative distribution for hyper-parameters suggested by previous research [38,43,47]. Following the suggestion made by [29][30][31], a Beta($\alpha$, $\beta$) prior distribution with $\alpha = 1$ and $\beta = 1$ is adopted, which is simply the uniform distribution. The uniform distribution is more suitable in inference problems that bring in external evidence through a multinomial distribution. Assuming independent branches, the probability for a category $C_k$ is given by the summation of the probabilities of the branches terminating in that category [39], using Eq. (2) and Bayes' Theorem. In the case of categorical data, the multinomial distribution is the most general and neutral statistical distribution; it is the natural generalization of the binomial distribution to more than two categories. In the multinomial distribution, observations are independent and identically distributed (iid) over the categories, and each category has a parameter representing the probability that a random observation falls into it [13]. Therefore, the MPT expresses the probability parameters as functions of the system behavior for different circumstances and re-parametrizes the multinomial distribution for the situation at hand. Each branch of the tree represents a different hypothesized sequence of the operation's processing stages, resulting in a specific response category given the knowledge obtained about the system. The model's likelihood function for the occurrence of disruptive events is obtained by inserting these category probabilities into the density function of the multinomial distribution [40], given by
$$P(K_1, \ldots, K_J \mid \theta) = \frac{\left(\sum_{j=1}^{J} K_j\right)!}{\prod_{j=1}^{J} K_j!} \prod_{j=1}^{J} P(C_j, \theta)^{K_j}$$
where $K_j$ is the observed frequency of category $C_j$. Finally, the marginalized posterior distribution $P(\theta)$ is derived using Bayes' Theorem. To perform Bayesian Inference, the MCMC software package WinBUGS [29] is employed to predict the marginal posterior distributions. Consequently, considering the expected posterior probabilities for each category, the number of occurrences of the different actions is defined in a specific period for a total number of $N$ consecutive intervals. For the $k$th category, a set $M_{n,k}$ can be created as shown in Table 4, where $m^{(t)}_{jk}$ is the number of occurrences of an event in each interval [37], i.e., performing $m_{jk}$ actions of observation type $k$ for component type $j$ in the $t$th trial. Therefore, the probability of occurrence for the frequency $m^{(t)}_{jk}$ in trial step $t$ is derived from the categorical probability distribution $P(m)$, where the total number of categorical occurrences over the whole simulation amounts to $N \times n$ outcomes, regardless of whether a failure event happens or not. By setting up an objective Critical Failure Limit (CFL) according to the expected probability of critical and non-critical failures obtained from the predicted $m$ actions, the system can be evaluated as to whether it passes the threshold for sufficient reliability. This allows for estimating the target hazard rate function, which is essential in identifying the reliable period of the UMP's operation according to the standards of autonomous shipping, e.g., [32]. The framework is demonstrated with a case study in the following sub-sections.
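The inference step described above (uniform Beta(1, 1) priors combined with a multinomial likelihood) can be illustrated with a small random-walk Metropolis sampler. This is a swapped-in technique for illustration only: the paper uses WinBUGS, not this sampler. The three-parameter tree, the observed counts, the step size and the chain length are all assumptions made for the sketch.

```python
import math
import random

def category_probs(theta):
    """Illustrative three-category tree: repair, maintenance, safe operation."""
    t1, t2, t3 = theta
    return (t1 * t2, t1 * (1 - t2) + (1 - t1) * t3, (1 - t1) * (1 - t3))

def log_posterior(theta, counts):
    # Beta(1, 1) priors are flat on [0, 1], so the log-prior is a constant
    # inside the unit cube and minus infinity outside it.
    if any(t <= 0.0 or t >= 1.0 for t in theta):
        return -math.inf
    # Multinomial log-likelihood, dropping the constant multinomial coefficient.
    return sum(k * math.log(p) for k, p in zip(counts, category_probs(theta)))

def metropolis(counts, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis; returns sampled category probabilities."""
    rng = random.Random(seed)
    theta = [0.5, 0.5, 0.5]
    current = log_posterior(theta, counts)
    samples = []
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        candidate = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(category_probs(theta))
    return samples

# Hypothetical observed frequencies of (repair, maintenance, safe operation).
observed = (4, 10, 16)
chain_a = metropolis(observed, seed=0)
chain_b = metropolis(observed, seed=1)
```

Running two chains from different seeds, as in `chain_a` and `chain_b`, gives a rough check that the sampled category probabilities agree across chains; a real analysis would also discard a burn-in period and use formal convergence diagnostics.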

Setup of the case study
To demonstrate the application of the framework, the failure-sensitive components of the Main Engine (ME) are considered as the case study. Table 3 shows which components are included in the analysis. The data is collected from short-sea ships in European waters. To model the MPT for evaluating the performance of a system, two scenarios are considered in this study. Accordingly, the root nodes in the two scenarios are "Functional capacity of the entire system" and "Functional capacity of individual components", respectively.
The MPT for the entire system is constructed based on the required expert actions for performing Repair ($C_1$) and Maintenance ($C_2$), which are categorized as $C = [C_1, C_2, C_3]$, while category $C_3$ represents safe operation. The branch probabilities for the operation deviating from its normal condition to critical failure are labelled as $\theta_1 \to \theta_2 : C_1$, and the branches for observing non-critical failure in the system are labelled as $\theta_1 \to (1 - \theta_2) : C_2$ and $(1 - \theta_1) \to \theta_3 : C_2$, respectively. The final path for continuing the operation, $(1 - \theta_1) \times (1 - \theta_3) : C_3$, is labelled as safe operation. Therefore, only the categorical functions $P(C_1)$ and $P(C_2)$ contribute to predicting failures (i.e., critical and non-critical conditions) in the entire system. The developed MPT model is illustrated in Fig. 5, and the associated categorical functions are given by:
$$P(C_1, \theta) = \theta_1 \theta_2$$
$$P(C_2, \theta) = \theta_1 (1 - \theta_2) + (1 - \theta_1) \theta_3$$
$$P(C_3, \theta) = (1 - \theta_1)(1 - \theta_3)$$

Table 4
Parameter definition for adopting in the proposed model. Column headings: Prior Observation for Frequency Matrix; Posterior Frequency of Categorical Actions in Trial 1; Posterior Frequency of Categorical Actions in Trial N.

The MPT model for the individual components is constructed on the premise that none of the sensitive components should be disrupted when left unattended. Components should be reliable enough that the chances of requesting either "Repair" or "Maintenance" remain below a given threshold during the operation. Therefore, the observation process is performed based on the accumulated frequencies of repair and maintenance of the components. This helps to consider the system performance for reliable design and to minimize the risk of system failure due to unexpected disruptions. The related MPT model is illustrated in Fig. 6, and the associated categorical functions are given by $P(C_i : \text{Repair or Maintenance}) = \theta_i \times \theta_{i+9}$, where $i = 1, \ldots, 8$, and where the categorical failures are selected as $C = [C_1, \ldots, C_9, C_{10}]$ according to the sensitive components listed in Table 3. The entries $C_1, \ldots, C_9$ account for failures in the items from "Cylinder cover" to "Turning gear and tuning wheel", respectively, categorized as immediate Repair or Maintenance, while $C_{10}$ is the safe condition in which the components can continue the operation. The branch probabilities of deviating from the nominal condition are labelled as $\theta_i \to \theta_{i+9} : C_i$, where $i = 1, \ldots, 9$, and the complementary branches correspond to staying in the nominal condition. The observations are entered into the HBM to develop likelihood functions and estimate the posterior distributions of the random variables. Two chains are considered in the MCMC simulation using WinBUGS to calculate the probabilities of the processing branches $\theta$. Each simulation is performed with a total of $300 \times 10^3$ iterations to predict the posterior distributions. It should be noted that $\theta$ is an operational predictive parameter that helps to model the associated uncertainty in the operation and to predict the behaviour of a system that deviates from its nominal condition, i.e., the conditions that lead to either repair or maintenance. To show the uncertainty involved in the frequency of repair and maintenance actions, a kernel distribution is used to represent the probability density function (pdf) of the random variables. The kernel distribution is useful when a parametric distribution cannot properly describe the data, or when it is intended to avoid making assumptions about the distribution of the data [49].
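A kernel distribution of this kind can be sketched with a Gaussian kernel density estimate. The paper does not state which kernel or bandwidth was used, so the Gaussian kernel, the bandwidth of 0.5 and the sample of frequency values below are assumptions chosen purely for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels centred on the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical simulated frequency values for one action type.
freq_samples = [1, 2, 2, 3, 4, 4, 4, 5]
density = gaussian_kde(freq_samples, bandwidth=0.5)

# Evaluate the estimated pdf on a grid, e.g. for plotting.
grid = [i * 0.1 for i in range(0, 61)]
pdf_values = [density(x) for x in grid]
```

The estimate makes no assumption about the parametric family of the data; the bandwidth controls how smooth the resulting curve is and would normally be tuned to the sample.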

Results and discussion
Using the model and case study setup discussed in the previous section, the reliability of the investigated main engine can now be analysed. The required actions of the sensitive components in the entire system are plotted in Fig. 7. The graph shows the probability density functions for the maintenance and repair of the system. It reveals the importance of maintenance compared to repair in preventing disruption of the operation, since repair is located further to the left side of the figure. The more thoroughly preventive maintenance of sensitive components is planned, the fewer the requests for repair actions. The safe scenario for the machinery's overall performance is when the number of required repair and maintenance actions moves toward the left tail (i.e., low frequency of occurrence) to minimize the risk of operational downtime. From the graph, it can be concluded that the most probable frequency index for the occurrence of Repair is estimated as two, which means the chance of critical failure in the system is anticipated to be remote, i.e., approximately once in the ship's lifetime. However, the average frequency index of a failure that requires maintenance is four. Referring to Tables 1 and 2, this means that the considered main engine, MAN B&W K98MC-C7-TII, is most likely to require planned maintenance for non-critical failures at least ten times per year, but there is a significant chance that such failures will occur more frequently. According to the data collected from [7], the cylinder cover and maneuvering system are more likely to experience a higher rate of critical failure events. In contrast, the clutch system, gearbox, and stern tube seal cover need more frequent planned maintenance and repair.
The predicted random process for the frequency of disruptions m(t) (i.e., events leading to required Repair and Maintenance for all sensitive components) is plotted in Fig. 8. The plots demonstrate the stochastic behavior of disruptions. The results plotted in Fig. 8 are the cumulative number of repair and maintenance activities per day, i.e., critical and non-critical failures in the sensitive components of the main engine. It should be noted that the plot represents the number of failure events derived from the multinomial process. For instance, the "stern tube seal cover" has an average number of failure events equal to two (E[m_3(t)] = 2), and the categorical probability of occurrence of this failure event over the whole simulation is P(C_3) = 9.693%, while the average daily failure occurrence is estimated as [...]. Attached pumps m_7(t) and Clutch m_6(t) are also prone to interrupt the process, with average daily probabilities of P(m_7 [...] 1e−4, respectively, while the categorical failure probabilities over the whole simulation for these two components are estimated as P(C_7) = 7.52% and P(C_6) = 7.930%. To better explain the number of occurrences, the predicted random set of outcomes demonstrates the uncertainty of disruptions in the system per day. The simulation is run for 1000 days of operation, and the cumulative percentages for the whole operation predict that 20.97% of the outcomes lead to critical failure and 38.67% lead to non-critical failure, while 40.37% lead to nominal operation. The results will assist in predicting the hazard rate function of the process degradation. The probability of occurrence for each set of outcomes per day is illustrated in Fig. 9. Each line shows the probability, expressed as a percentage per day, that the main engine experiences critical or non-critical failure events. The results indicate that the chance of observing non-critical failures in the system is always higher than that of critical events, although non-critical failures will not cause the operation to stop. To predict the most likely interval of disruptions in the system, the relative frequency of each event regarding the required actions for repair and maintenance for the entire process is plotted in Fig. 10. The categorical probability of occurrence for different combinations of failure outcomes is also presented in the same plot to highlight the possibility of each event. The combined plot for the different possible actions demonstrates that the number of requests for planned maintenance of failure-sensitive components is always higher than the number of requests for repair. This confirms that most of the disruptions in the main engine will result in non-critical failures. The plot also quantifies how the probability of observing both critical and non-critical failures increases over consecutive days. However, the first time to failure is expected to occur in specific uncertain intervals during the operation. The first interval for observing failure is predicted as days [31, 56] with an expected probability of 0.028, while the most expected failure events are estimated to occur during days [173, 191] with a cumulative probability of 0.225. The possible intervals for the first time to failure are shaded in grey, showing that four major disruption intervals are expected to occur in the first 200 days of operation. These separate intervals arise mainly from the previous observations of failure events in the system.
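The cumulative counting behind plots such as Fig. 8 can be sketched with a simple categorical draw per day. The example below is only a simplified stand-in: it assumes independent days with fixed probabilities equal to the overall shares reported above (20.97% critical, 38.67% non-critical, 40.37% nominal), whereas the paper's MPT and Hierarchical Bayesian model yield time-varying probabilities. The function names and the seed are hypothetical choices for this illustration.

```python
import random
from itertools import accumulate

# Overall outcome shares reported for the 1000-day simulation.
OUTCOMES = ["critical", "non_critical", "nominal"]
WEIGHTS = [20.97, 38.67, 40.37]  # relative weights; random.choices normalises them

def simulate_days(n_days, seed=0):
    """Draw one categorical outcome per day (one multinomial trial per day)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return rng.choices(OUTCOMES, weights=WEIGHTS, k=n_days)

def cumulative_counts(days, outcome):
    """Running total of days on which `outcome` was observed."""
    return list(accumulate(1 if d == outcome else 0 for d in days))

days = simulate_days(1000)
cum_critical = cumulative_counts(days, "critical")
cum_non_critical = cumulative_counts(days, "non_critical")

# Empirical share of each outcome over the simulated horizon.
shares = {o: days.count(o) / len(days) for o in OUTCOMES}

# First day (1-based) on which a critical failure is drawn.
first_critical_day = days.index("critical") + 1
```

Because the daily probabilities are held constant here, the sketch cannot reproduce the separated first-time-to-failure intervals discussed in the text; it only shows how daily categorical outcomes accumulate into repair and maintenance counts.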
To define the instantaneous rate of failure for the process at a given time, the Hazard Rate Function (HRF), H(t), is predicted for the entire system. Four different Critical Failure Limits (CFL) are considered for observing the time of disruptive events in the simulations, defined as CFL = [...]. Each limit represents an allowable threshold; e.g., 5/1000 means that the machinery is safe if it has fewer than five failures over 1000 days. If the system exceeds a CFL, the probability that the operation will encounter major disruptions that halt the whole system is unacceptably high. These safety thresholds are selected to show the effectiveness of the framework in deriving time-dependent failure rates, and they can be regarded as a real case according to the autonomous shipping standards [32]. The challenge is to change the operation of the manned vessel to an unmanned level without altering the design of the components. Therefore, before making any radical changes to a ship's engine room, the system's reliability should be evaluated to understand whether the existing machinery is reliable enough to be left unattended for a longer time. The simulation presented in Fig. 11 shows that the whole Main Engine is prone to fatal technical failures with a probability of occurrence in the range of 0 to 0.025 for 200 days. This means that the highest probability of failure lies in the interval [0.020, 0.025] in the first 200 days of operation; therefore, the CFL = 2/200 is set up for all critical systems related to the main engine to cover all possible rare disruptive events that may affect the performance of the current engine. This will help predict the time of safe operation and the time at which degradation starts in the system and causes failures. The HRFs for all limits are illustrated in Fig. 11. The graphs demonstrate the increasing hazard rates for the process. The system's degradation for the four safety levels starts from days 95, 88, 84, and 38, respectively. This means that the reliability of unattended machinery should always be higher than the minimum accepted safety limit, thus keeping the vessel operational for a more extended period without human interaction. According to the MUNIN project for autonomous commercial ships, "An engine should operate reliably at least for 500 hours (equal to 21 days) without physical interference from a person". Within the MUNIN objective and following the present method, the results of the case study emphasize that even if the main engine and associated systems of a UMP are designed based on the HRF with a safety limit of 2/1000, the expected time at which degradation starts and the safety threshold is exceeded is predicted to be 912 hours (38 days), which is almost twice the MUNIN limit. Considering the first 100 days of the operation, the expected failure rates for the system's critical disruptions are predicted as E[λ_100] = [49.80e−05, 14.80e−04, 18.0e−04, 94.00e−04], respectively. These results show that the current main engine is reliable enough to pass the MUNIN objective for leaving a machinery plant unattended. It should be noted that the current case study considers only the main engine, and the other failure-sensitive systems, such as auxiliary engines, are not included. Therefore, the current results do not guarantee that the overall engine room meets the reliability threshold. To keep the manuscript readable and demonstrate the application of the methodology, the complexity of the entire engine room is excluded from the case study.
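The comparison between a hazard curve, a Critical Failure Limit, and the MUNIN target can be sketched as follows. The hazard series is hypothetical: it is a straight line whose slope was chosen purely so that the 2/1000 limit is first exceeded on day 38, the onset reported in the text, and it is not the H(t) of Fig. 11. The helper names `first_exceedance_day` and `meets_munin` are likewise assumptions made for this example.

```python
HOURS_PER_DAY = 24
MUNIN_HOURS = 500  # unattended-operation target quoted from the MUNIN project

def first_exceedance_day(hazard_by_day, limit):
    """Return the first 1-based day whose hazard rate exceeds `limit`, else None."""
    for day, rate in enumerate(hazard_by_day, start=1):
        if rate > limit:
            return day
    return None

def meets_munin(onset_day):
    """True if degradation starts at or after the MUNIN period, or never starts."""
    return onset_day is None or onset_day * HOURS_PER_DAY >= MUNIN_HOURS

# Hypothetical, linearly increasing hazard curve over 200 days (assumption).
hazard = [day * 5.3e-5 for day in range(1, 201)]

cfl = 2 / 1000  # the 2/1000 safety limit discussed in the text

onset = first_exceedance_day(hazard, cfl)
onset_hours = None if onset is None else onset * HOURS_PER_DAY
margin = None if onset_hours is None else onset_hours / MUNIN_HOURS
```

For an onset on day 38 the conversion gives 38 × 24 = 912 hours, about 1.8 times the 500-hour target, which is the arithmetic behind the statement that the predicted onset is almost twice the MUNIN limit.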

Conclusion and future work
In this paper, a categorical failure model is developed to predict failure probabilities in the system that will lead to interruption of the operation of UMPs. First, an MPT model is adopted to model the system's behavior in terms of categorical failure functions with other equivalent critical levels. Second, HBM is employed to model the uncertainty of failure events in the unattended system. The event frequency data (Repair and Maintenance for the failure-sensitive components) were obtained from expert engineers. The results are summarized for a real case study of the main engine. Based on the available data and the considered safety threshold, for this particular main engine, the critical components of the system are reliable enough to be left unattended, since the probability of a critical failure in 500 hours of continuous operation is below the assigned thresholds. For the present case study, 912 hours is estimated as the time until the safety limit zone in the system for the CFL = 2/1000 is exceeded, which is well beyond the MUNIN reliability standard that requires the machinery to operate reliably for 500 hours without human interference. It should be noted, however, that only a selection of the ship's machinery has been modeled. Due to the scarcity of operational data for autonomous shipping, the present work is set up based on the three main assumptions described in Section 2 for evaluating the performance of the unattended machinery. As a limitation of the current work, planned maintenance is used as an indicator of failure intervals. This is conservative, since it does not represent the actual failures. This limitation can only be resolved if more actual failure data become available. Also, the present model does not consider the recovery of the UMP after observing critical failures, which will affect the operating conditions. As a part of this model, redundancy engineering should also be included to evaluate the system's state after encountering significant disruption, while suggesting recovery solutions to make it more realistic for the unattended engine room.
The method presented in this study is developed for failure modelling of unattended systems and for evaluating the functional capacity of the whole operation. This leads to an estimate of how long the machinery plant can operate without human intervention before reaching a reliability threshold for the entire system. The presented framework, which is one of the first attempts to evaluate UMP performance, may bring new insights into understanding the trustworthiness of an unattended engine room under different operational scenarios and into considering the impact of redundancy to improve the resilience of the system. The method may be used to support decision-making concerning the design of engine rooms for autonomous shipping purposes and the performance of asset integrity assessments to meet future goals for increasing the operation time of unmanned systems. The outcomes of this study demonstrate that the method can predict the trusted period without human intervention, which is an essential step for improving the reliability of autonomous systems. For future work, the approach can be integrated with practical Prognostic Health Monitoring (PHM) techniques and the design of Digital Twins (DT) of autonomous systems. To build up the recovery strategy, redundancy can be adopted with the model to increase the functional capacity of the system for unattended operation. Uncertainty quantification can be incorporated into redundancy modelling, for example by adopting a multi-state system level that integrates HBM with Markov cell-to-cell mapping techniques. Moreover, the optimization of a redundant system can be included in the last step as a framework for minimizing the associated cost and weight in the implementation of k out of n components in the unattended engine room. The future work of the authors will include these crucial actions to further address the challenges of safety in autonomous shipping.

Declaration of Competing Interest
None.

Fig. 2. An illustrative example of a real-time risk-based approach to improving the reliability of machinery in autonomous ships.

Fig. 4. Illustrative model for constructing (a) MPT and (b) HBM to enable the evaluation of failure events.

Fig. 5. MPT model for performance analysis of the entire system (the underlined letter 'p' in each branch represents the path label).

Fig. 6. MPT model for performance analysis of individual components (the underlined letter 'p' in each branch represents the path label).

Fig. 7. Probability density function for performing Repair and Maintenance for the sensitive components in the entire system. It should be noted that, according to the guidelines for Autonomous Shipping presented by Bureau [32], the highest value of FI = 7 represents an occurrence of once per month on one ship, and the value of FI = 6 represents an event likely to occur once per year on one ship. For values of FI ≤ 5, refer to Tables 2 and 3 described in Section 3.2.

Fig. 8. Random distribution of cumulative actions for repair (i.e., critical failure) and maintenance (i.e., non-critical failure) obtained from the categorical failure analysis of the MPT model designed for the individual components (Fig. 6).

Fig. 9. Probability of process randomness of experiencing non-nominal and nominal operation for different types of disruption (critical, non-critical, and wrong malfunction of the main engine's components). The result is obtained from the MPT model for the entire system.

Fig. 11. Hazard rate function H(t) of fatal technical failure for the entire system.

Table 1
Explanation of frequency index values for expected maintenance performance (values are in occurrences per ship year).

Table 2
Explanation of Frequency Index.
Index Check Maintain Repair