Hierarchical Bayesian reliability assessment of energy networks

The data used for systems reliability assessment often come from more than one information source, for example power plants at different geographical locations, gas transmission pipelines operating in different environments, or power transmission networks deployed in various areas. Different operating conditions, maintenance programs and efficiencies therefore all influence the vulnerability and variability of reliability data. In practice, however, this heterogeneity is usually neglected, which leads to underestimation of the uncertainty underlying the data. Unlike frequentist statistical methods, Bayesian models are capable of dealing with this kind of uncertainty. The hierarchical Bayesian modelling technique makes it possible to quantify not only within-source but also between-source uncertainty. It performs well even for small data samples, unlike, for example, the classical maximum likelihood method, which may provide degenerate estimates. In this paper the authors investigate how to incorporate this kind of uncertainty into systems reliability and vulnerability assessment within the Bayesian framework in two case studies: gas transmission networks and power transmission grids.


Introduction
There is an issue with systems reliability data that deserves wider attention from the scientific community. Consider the case in which data are collected from components or systems operating in various areas: for example, overhead electricity lines deployed across an entire country, pipelines operating in varying soil and environmental conditions, or some class of rare components for which data are collected from nuclear power plants operating in different countries or continents.
These are examples of cases in which, to assess reliability and vulnerability, the usual choice is to pool the failure data.
In the statistical analysis of reliability data it is quite common practice to pool statistical information from different sources. At first sight this decision seems natural: if systems are similar and perform the same function, then their samples should also be treated as similar. However, such aggregation discards information about the specific features of those systems and the impact of their environment. Moreover, similar components can age differently when operating under different environmental and maintenance conditions. There is another issue that leads to the decision to pool the data: highly reliable systems, especially age-dependent ones, do not supply sufficient statistical information for long-term reliability investigation, so aggregation would strengthen the statistical inferences. The drawback of pooling such heterogeneous and rare data is that one may obtain estimates with narrower uncertainty bounds than when between-source (separate-sample) variability is taken into account. Such overly optimistic uncertainty bounds lead to less strict safety margins, which in turn raises the risk of exceeding safety limits [2]. In this paper we are concerned with modelling the heterogeneity of reliability data, since it has a significant influence on overall system reliability and on its prediction under data uncertainty. Two approaches can be used for the statistical analysis of heterogeneous data: mixed-effect models [5] and hierarchical Bayesian models [1]. We chose the latter and use Bayesian methods throughout this paper, without resorting to frequentist concepts. We investigate how to incorporate heterogeneity into systems reliability and vulnerability assessment within the Bayesian framework in two case studies: gas transmission networks and power transmission grids.
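To make the pooling argument concrete, here is a minimal sketch on synthetic data (not data from this paper). It assumes Poisson failure counts with a common exposure per source, source-specific rates drawn from a lognormal distribution, and a Gamma(0.5, 0) prior on the single pooled rate; all function names are hypothetical. The pooled posterior interval typically comes out much narrower than the spread of the per-source rates, which is the over-optimism described above.

```python
import random


def poisson_draw(rng, mean):
    """Exact Poisson draw: count unit-rate exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_sources(n_sources, exposure, mu, sigma, seed):
    """Hypothetical heterogeneous sources: log-rates ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    rates = [rng.lognormvariate(mu, sigma) for _ in range(n_sources)]
    counts = [poisson_draw(rng, r * exposure) for r in rates]
    return rates, counts


def pooled_interval(counts, exposures, shape0=0.5, rate0=0.0, draws=20000, seed=1):
    """90% posterior interval for one common rate under a Gamma(shape0, rate0) prior.

    Conjugacy gives the posterior Gamma(shape0 + sum(counts), rate0 + sum(exposures)).
    """
    rng = random.Random(seed)
    shape = shape0 + sum(counts)
    rate = rate0 + sum(exposures)
    # random.gammavariate is parameterised by shape and *scale* (= 1 / rate)
    samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    return samples[int(0.05 * draws)], samples[int(0.95 * draws)]


rates, counts = simulate_sources(n_sources=10, exposure=20.0, mu=0.0, sigma=1.0, seed=2)
separate = [c / 20.0 for c in counts]  # per-source rate estimates
low, high = pooled_interval(counts, [20.0] * len(counts))
print(f"pooled 90% interval: ({low:.2f}, {high:.2f})")
print(f"range of per-source estimates: ({min(separate):.2f}, {max(separate):.2f})")
```

The pooled interval only reflects Poisson noise around one common rate, so it cannot express the between-source spread that the per-source estimates reveal.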

Hierarchical Bayesian modelling of reliability data
Given the widespread use of Bayesian methods, we assume the reader is familiar with the basic probabilistic language used to construct Bayesian models: likelihood function, prior distribution and posterior distribution. Consider N samples of failure event observations (reliability data), i.e. we observe N systems and register the failures of some particular group of components. For simplicity we assume that each sample contains the same number of data points, M. One option is to pool all samples into one, as already discussed in the introduction, but this would most likely lead to incorrect variance estimates. Another option is to build a hierarchy and introduce partial pooling. Suppose we have reason to believe that each source generates data from a similar, but not identical, probabilistic model. Denote the model of the i-th source by f(x|θ_i). This means that we do not assume that data from different sources are identically distributed. These models form the first level of the hierarchy. We now have N sets of unknown parameters θ_1, …, θ_N that need to be estimated. Estimating these parameters as though they were independent of each other would lead to large variances, especially if the samples are small. Instead, we assume that these parameters are related to each other by imposing a stochastic model on them. We treat the parameters θ_i as unobserved data (unobservables) and assume a model π(θ_i|ζ), where ζ is another unknown parameter that has to be estimated. This model forms the second level of the hierarchy. Introducing even more unknown parameters may seem to make the problem harder. However, the second-level model enables information to be shared between the N samples, so that they are partially related (partially pooled).
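As a minimal illustration of partial pooling, consider a hypothetical normal–normal special case of the two-level model: X_{i,j} | θ_i ~ N(θ_i, s²) and θ_i | µ ~ N(µ, τ²), with s, τ and µ treated as known (µ is replaced by the grand mean as a plug-in). Conditional on these, the posterior mean of each θ_i is a precision-weighted average of the source's own sample mean and µ. The function name and parameters are illustrative, not taken from the paper.

```python
from statistics import mean


def partial_pooling(samples, within_sd, between_sd, mu=None):
    """Conditional posterior means of theta_i in a normal-normal hierarchy.

    Assumed (hypothetical) model, with within_sd, between_sd and mu known:
        X_ij | theta_i ~ N(theta_i, within_sd**2),  j = 1..M
        theta_i | mu   ~ N(mu, between_sd**2),      i = 1..N
    """
    if mu is None:
        # Plug-in estimate of the second-level location: the grand mean of all data
        mu = mean(x for sample in samples for x in sample)
    estimates = []
    for sample in samples:
        data_precision = len(sample) / within_sd**2  # precision of the source mean
        prior_precision = 1.0 / between_sd**2        # precision of the second level
        weight = data_precision / (data_precision + prior_precision)
        estimates.append(weight * mean(sample) + (1.0 - weight) * mu)
    return estimates


samples = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]
print(partial_pooling(samples, within_sd=1.0, between_sd=1e6))   # close to no pooling
print(partial_pooling(samples, within_sd=1.0, between_sd=1e-6))  # close to complete pooling
print(partial_pooling(samples, within_sd=1.0, between_sd=1.0))   # partial pooling: [2.75, 7.25]
```

A very large between-source spread τ leaves each source on its own (about [2, 8]), a very small τ collapses everything to the grand mean (about [5, 5]), and intermediate values share information between the sources.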
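Putting the levels of the hierarchy together:

$$
\begin{aligned}
X_{i,j} \mid \theta_i &\sim f(x \mid \theta_i), && j = 1, \dots, M,\; i = 1, \dots, N,\\
\theta_i \mid \zeta &\sim \pi(\theta_i \mid \zeta), && i = 1, \dots, N,\\
\zeta &\sim \pi_0(\zeta),
\end{aligned}
$$

where X_{i,j} is the j-th observation from the i-th source and π_0 is the prior distribution for the parameter ζ. The posterior distribution of all unknowns is proportional to the product of the likelihood and the prior distributions,

$$
\pi(\theta, \zeta \mid X) \propto L(X \mid \theta) \prod_{i=1}^{N} \pi(\theta_i \mid \zeta)\, \pi_0(\zeta), \qquad L(X \mid \theta) = \prod_{i=1}^{N} \prod_{j=1}^{M} f(X_{i,j} \mid \theta_i),
$$

and all inferences are then made from this posterior distribution. Computation is another matter: the posterior will most likely not have an analytic form, so approximation algorithms must be employed. The most common choice is Markov chain Monte Carlo (MCMC) algorithms [4].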
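As a rough sketch of the MCMC step, the following random-walk Metropolis sampler works with any unnormalised log-posterior, such as the log of the likelihood-times-prior product described above. It is one common MCMC variant, chosen here for brevity, and not necessarily the algorithm used by the authors; the step size, chain length and burn-in are illustrative choices.

```python
import math
import random
import statistics


def metropolis(log_post, init, step, n_iter, seed=0):
    """Random-walk Metropolis with a Gaussian proposal on every coordinate.

    log_post must return the unnormalised log-posterior (or -math.inf outside
    the support) and must be finite at `init`.
    """
    rng = random.Random(seed)
    x = list(init)
    lp = log_post(x)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        lp_prop = log_post(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            accepted += 1
        chain.append(list(x))
    return chain, accepted / n_iter


# Usage: sample a standard normal "posterior" and check its moments
chain, acc_rate = metropolis(lambda th: -0.5 * th[0] ** 2, init=[3.0], step=1.0, n_iter=20000)
draws = [th[0] for th in chain[2000:]]  # discard burn-in
print(round(statistics.mean(draws), 2), round(statistics.stdev(draws), 2), round(acc_rate, 2))
```

Only differences of log-posteriors enter the acceptance test, so the normalising constant of the posterior is never needed.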

Power transmission grid
Thirteen overhead lines rated at 330 kV were observed over a period of 42 years. Strictly speaking, not all of the lines were observed for exactly 42 years, since some were put into operation several years later than others. We therefore have 13 data sources with registered outages. For more on the application of hierarchical Bayesian models in the power grid context, see the authors' recent work [3]. The variability of the data clearly indicates that one cannot assume the same stochastic model for all overhead lines: the outage frequencies are simply too different. We therefore build a hierarchical Bayesian model. Let X_i denote the number of outages over the entire observation period ∆_i for the i-th overhead line of length L_i. Assuming a Poisson distribution for the count data, the stochastic model is

$$
\begin{aligned}
X_i \mid \lambda_i &\sim \mathrm{Poisson}\big(e^{\lambda_i} L_i \Delta_i\big),\\
\lambda_i \mid \mu, \sigma^2 &\sim \mathcal{N}(\mu, \sigma^2), \qquad i = 1, \dots, 13,
\end{aligned}
$$

where exp(λ_i) denotes the outage rate of the i-th line per unit length and unit time. The exponential parametrization was chosen for convenience: the λ_i are no longer restricted to positive values, so a normal distribution with parameters µ and σ² can be imposed on them. The likelihood function can be expressed as

$$
L(X \mid \lambda, \mu, \sigma^2) = \prod_{i=1}^{13} \frac{\big(e^{\lambda_i} L_i \Delta_i\big)^{X_i} \exp\big(-e^{\lambda_i} L_i \Delta_i\big)}{X_i!} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(\lambda_i - \mu)^2}{2\sigma^2}\Big).
$$

Since the prior distribution for the parameters µ, σ² is chosen to be uniform, the posterior is proportional to this likelihood function. The estimates of the outage rates are presented in Fig. 1. If the outage rates were treated as coming from the same probabilistic model, the posterior outage rate estimate would be 2.07, clearly an inappropriate value given the level of variation present in the data.
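A minimal sketch of fitting this Poisson–lognormal hierarchy by component-wise (Metropolis-within-Gibbs) random-walk Metropolis is given below. The counts and exposures L_i·∆_i are synthetic placeholders, not the 330 kV outage data of this paper; the bounded uniform box on (µ, σ²), the step size and the chain length are assumptions made to keep the example small and the prior proper. Because σ is sampled directly, a uniform prior on σ² becomes a density proportional to σ, hence the log σ term. The last lines compare the line-specific posterior medians with the single pooled rate ΣX_i / ΣL_i∆_i.

```python
import math
import random
import statistics


def log_posterior(params, counts, exposures):
    """Unnormalised log-posterior of the Poisson-lognormal outage model.

    params = [lambda_1, ..., lambda_n, mu, sigma]; exposures[i] = L_i * Delta_i.
    """
    n = len(counts)
    lam, mu, sigma = params[:n], params[n], params[n + 1]
    # Assumed bounded uniform prior on (mu, sigma^2), which keeps the prior proper
    if not (0.0 < sigma <= 10.0) or abs(mu) > 20.0:
        return -math.inf
    lp = math.log(sigma)  # uniform on sigma^2 => density proportional to sigma
    for li, x, e in zip(lam, counts, exposures):
        lp += x * (li + math.log(e)) - math.exp(li) * e  # Poisson term, log(x!) dropped
        lp += -math.log(sigma) - (li - mu) ** 2 / (2.0 * sigma**2)  # normal term
    return lp


def metropolis_within_gibbs(log_post, init, step, n_iter, seed=0):
    """Update one coordinate at a time with a Gaussian random-walk proposal."""
    rng = random.Random(seed)
    x = list(init)
    lp = log_post(x)
    chain = []
    for _ in range(n_iter):
        for k in range(len(x)):
            proposal = list(x)
            proposal[k] += rng.gauss(0.0, step)
            lp_prop = log_post(proposal)
            if math.log(1.0 - rng.random()) < lp_prop - lp:
                x, lp = proposal, lp_prop
        chain.append(list(x))
    return chain


# Synthetic placeholder data (NOT the data analysed in this paper)
counts = [2, 5, 40, 12, 1, 25]
exposures = [30.0, 40.0, 35.0, 50.0, 20.0, 25.0]  # L_i * Delta_i
n = len(counts)
init_lam = [math.log((x + 0.5) / e) for x, e in zip(counts, exposures)]
init = init_lam + [statistics.mean(init_lam), 1.0]

chain = metropolis_within_gibbs(
    lambda p: log_posterior(p, counts, exposures), init, step=0.3, n_iter=4000, seed=42
)
kept = chain[1000:]  # discard burn-in
medians = [statistics.median(math.exp(p[i]) for p in kept) for i in range(n)]
pooled = sum(counts) / sum(exposures)  # single-model rate estimate (MLE)
print("line-specific posterior medians:", [round(m, 3) for m in medians])
print("pooled rate:", round(pooled, 3))
```

Updating one coordinate at a time keeps acceptance rates reasonable even as the number of lines grows, at the cost of one log-posterior evaluation per coordinate.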

Reliability of natural gas pipeline network
Now consider the problem of evaluating the reliability of a natural gas transmission pipeline network. International databases are often used for this task because of the large data samples they contain. The most widely known such databases are OPS PHMSA [2], EGIG [2], UKOPA [9] and NEB [10]. However, the OPS data sample has two breakpoints, at the years when the data collection criterion was changed. In addition, our case study includes a Lithuanian data sample, which is very small: it has only 7 data points and one breakpoint. We therefore have five data samples coming from different sources, albeit representing similar components. For ways to deal with such breakpoints and changing collection criteria, see [8,6,7]. The model constructed to deal with these breakpoints is called the Criteria-Dependent Poisson model (CDP model). The hierarchical model in this case is fairly complex, as it combines age dependency, changes in the collection criteria and the hierarchical structure:

Level I

$$
\begin{aligned}
X^1_t &\sim \mathrm{Poisson}\big(E^1_t\, \lambda(t; \theta^1)\big), && t = 1, \dots, 14,\\
X^1_t &\sim \mathrm{Poisson}\big(E^1_t\, \lambda(t; \theta^1)\, (1 - p_1 - p_2)\big), && t = 15, \dots, 33,\\
X^1_t &\sim \mathrm{Poisson}\big(E^1_t\, \lambda(t; \theta^1)\, (1 - p_1)\big), && t = 34, \dots, 42;
\end{aligned}
$$

… for the Lithuanian case;
… for the EGIG, UKOPA and NEB cases.

Unknown parameters are assumed to have lognormal distributions. Here X^k_t denotes the k-th (age-dependent) sample at time t, E^k_t is the exposure, and p_1 and p_2 are the probabilities of data falling under a certain collection criterion (think of them as adjustment factors resulting from the different data collection criteria). We also assume that λ(t; θ) has two unknown parameters, θ^k = (θ^k_1, θ^k_2), of which the first is positive and hence lognormally distributed, while the second is normally distributed. Normality and log-normality are modelling assumptions that could be replaced by others; the sensitivity of the results to them has not been investigated in this paper.

Now we turn to inference for the Lithuanian pipeline network. Because of the small size of the network and the incident criterion used until 2004, the data are quite scattered, and inferences based on them alone would be questionable. It is therefore an advantage to be able to support this small sample with the information contained in the data samples from other databases. Comparing the estimated trends (see Fig. 2, inferred zone), the hierarchical model generally provides wider credible bounds than the non-hierarchical variant of the CDP model fitted to the Lithuanian data alone. This is because the hierarchical structure of the model allows an additional level of uncertainty to be incorporated and quantified: the variation between the different databases is now accounted for as well. In addition, the two data points collected under the new incident criterion (since 2003) are underestimated less by the hierarchical model.
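The criteria-dependent part of the Level I model can be sketched as a Poisson log-likelihood whose mean is the exposure times the intensity times the fraction of incidents captured by the criterion in force. The three periods (t = 1..14, 15..33, 34..42) follow the model described above, but the intensity λ(t; θ) = θ_1·exp(θ_2·t) is a hypothetical choice, since the paper does not state the form of λ(t; θ); it merely respects the stated constraints (θ_1 positive, θ_2 real). All function names are illustrative.

```python
import math


def criteria_factor(t, p1, p2):
    """Fraction of incidents recorded under the collection criterion in force at year t."""
    if 1 <= t <= 14:
        return 1.0
    if 15 <= t <= 33:
        return 1.0 - p1 - p2
    if 34 <= t <= 42:
        return 1.0 - p1
    raise ValueError("t must lie in 1..42")


def intensity(t, theta1, theta2):
    """Hypothetical age-dependent intensity; the paper does not specify lambda(t; theta)."""
    return theta1 * math.exp(theta2 * t)


def cdp_log_likelihood(counts, exposures, theta1, theta2, p1, p2):
    """Poisson log-likelihood of yearly counts X_t with exposures E_t, t = 1, 2, ..."""
    if theta1 <= 0.0 or not (0.0 <= p1 and 0.0 <= p2 and p1 + p2 < 1.0):
        return -math.inf
    ll = 0.0
    for t, (x, e) in enumerate(zip(counts, exposures), start=1):
        mean = e * intensity(t, theta1, theta2) * criteria_factor(t, p1, p2)
        ll += x * math.log(mean) - mean - math.lgamma(x + 1)
    return ll


print(criteria_factor(10, 0.2, 0.3), criteria_factor(20, 0.2, 0.3), criteria_factor(40, 0.2, 0.3))
print(cdp_log_likelihood([3], [2.0], theta1=1.5, theta2=0.0, p1=0.2, p2=0.3))
```

In a hierarchical fit, this function would supply the first-level term for one database, with θ^k, p_1 and p_2 given their second-level distributions.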

Conclusions and further comments
In this paper we have tackled the task of including an additional level of uncertainty in models of systems reliability. We briefly described the mathematical machinery and illustrated it with quite different examples. The implications of including the heterogeneity of failure data are wide-ranging: uncertainty due to heterogeneity affects every reliability concept, not only the failure rate. The easy-to-interpret language of Bayesian modelling serves this purpose well and makes it possible to build complex models that take various degrees of uncertainty into account. We have demonstrated that modelling uncertainty due to data heterogeneity provides more realistic inferences, and thus offers a practical approach to making more informed decisions about the reliability and vulnerability of various systems.