Sample selection for extreme value analysis of inspection data collected from corroded surfaces

Abstract Inspection of corroded engineering components is vital for ensuring safety throughout the lifetime of infrastructure. However, full inspection can be infeasible due to time constraints, budgetary limits or restricted access. Consequently, there is growing interest in partial coverage inspection (PCI) techniques, which use data from the inspection of a limited area to assess the condition of larger areas of a component. Extreme value analysis (EVA) is a tool for PCI: it allows an inspector to build a statistical model of the smallest thicknesses across a component. Construction of extreme value models relies on the selection of the smallest thicknesses from the inspection data. Current methodologies rely on the judgement of the analyst to select sets of thickness minima, and frequently the inspection data is not checked to ensure that the assumptions made by EVA are reasonable. Consequently, the resulting models can be subjective and can provide inadequate models for extrapolation. In this paper, a framework for building extreme value models of inspection data is introduced. The method selects a sample of thickness minima such that the data is compatible with the assumptions of EVA. It is shown that this framework can select a suitable set of minima for a large number of correlated exponential and Gaussian surfaces, and the method is tested using real inspection data collected from an ultrasonic thickness C-scan of a rough surface.


Introduction
Corrosion costs the petroleum industry in the United States of America around $8 billion per annum [1]. Accurate assessment and tracking of corrosion related degradation is vital to ensure smooth operation of facilities and to prevent accidents [2]. The condition of a facility is assessed using regular inspections performed by experienced and independent contractors. Often, regular shut down periods are scheduled to allow for these inspections, some of which require access to hazardous areas of the plant. Furthermore, despite all efforts, full inspection is not always possible because of access problems (other plant components concealing the area, or scaffolding or excavation being required for the inspection), time constraints during shut down periods and limited inspection budgets.
Risk based inspection (RBI) strategies are becoming commonplace in asset management [3]. Certain areas are more safety critical, or degradation mechanisms (such as corrosion) are known to be more aggressive in particular parts of the plant. These areas are considered at higher risk than others. Therefore, to be most economical, asset owners prioritise inspections in these areas. Sometimes inspectors can only access a fraction of these areas. In this situation partial coverage inspection (PCI) can be used to estimate the worst case damage in the whole structure based on the data that is available. PCI builds a statistical model of the condition of an inaccessible area using the inspection data from accessible areas of a component (an example thickness map is shown in Fig. 1a) which are exposed to the same operational and environmental conditions. This approach is attractive as it has the potential to estimate the condition of very large areas of a component using small samples of data. The technique can be applied to data from conventional inspection techniques, such that all existing sensing technologies can be used.
Examples of applications of PCI to real ultrasonic thickness inspection data can be found in Stone [4]. Stone calculated the empirical cumulative distribution function (ECDF) of thickness measurements collected as part of real inspections (an example of which is shown in Fig. 1b). The ECDF is an estimate of the probability of measuring a thickness of less than a given value, which can be interpreted as the fraction of the area with a thickness of less than that value. However, the ECDFs calculated from different inspections of the same area can be very different [4]. These variations lead to different estimates of the fraction of the area of the component covered by the smallest thickness measurements. In order to build an accurate picture of the condition of the uninspected area, one needs to take into account the variation which arises from sampling the smallest thickness measurements.
A key part of this problem is that an inspector only has access to data from a small inspected area. In this area, there is only one minimum thickness, which does not provide enough information to build a model of the smallest thicknesses. An inspector can generate a sample of the smallest thickness measurements by partitioning the inspection data into a number of equally sized blocks. In each block the minimum thickness is recorded. This set forms a sample of the smallest thickness measurements. From this sample, one can build a model which takes into account the variations of the smallest thickness measurements. Extreme value analysis (EVA) provides a limiting form for this model. It states that, if the underlying thickness measurements in each block are taken from independent and identical distributions, then the sample of minimum thickness measurements will follow a generalized extreme value distribution (GEVD).
The GEVD makes it possible to calculate the probability of measuring a minimum thickness of less than a given value. This has inherent value to both the plant operator and the inspector. The model allows the inspector to report both the smallest thickness they have found and the probability of finding a minimum thickness less than this value in the uninspected areas of the structure. Potentially, a plant operator can make decisions about inaccessible areas of a plant. For example, Schneider used EVA to model the condition of an inaccessible area of a pipework system on an oil platform [5]. EVA allowed Schneider to calculate estimates of the probability of future leaks in the inaccessible area based on inspections of the accessible area. Kowaka and Shibata give similar examples of the application of EVA, ranging from generating a probability distribution for pit depths in steel piles in sea water to calculating the most likely maximum pit depth in an oil tank [6,7].
The problem with existing applications of EVA to corrosion data is that the analysis is dependent on the judgement of the analyst and does not necessarily check that the data is suitable for EVA (i.e. there is no check that there is evidence the assumptions made by EVA are fulfilled). For example, existing methods for selecting a suitable block size have focussed on examining the fit of the GEVD to the set of minima selected using that block size. Glegola selected a block size by extracting sets of thickness minima using multiple block sizes [8]. For each set of minima the quality of the fit to the GEVD was examined, and the block size which gave the best fit to the GEVD was used for the analysis. Another example is the work by Schneider, who selected a block size to ensure that the minima from each block were independent [5]; however, he did not confirm that the distributions in each block were identical. Schneider examined the two-dimensional autocorrelation function of the thickness map and chose a block size, L, such that thickness measurements separated by L were weakly correlated. In contrast to Glegola's method, this approach chooses a block size based on one of the assumptions of EVA.
However, in addition to the independence of thickness measurements, EVA also assumes that the probability distribution of thickness measurements in each block is identical. Ensuring that there is evidence that both of these assumptions are met is key to implementing an extreme value model for partial coverage inspection. Thickness maps from corroded components are often complex, as components can undergo damage from different modes (e.g. pitting corrosion as opposed to uniform corrosion). Each damage mode will produce a different thickness measurement distribution. If one naively builds an extreme value model, the complexity of the surface can violate the model's assumptions. Due to the risks involved in drawing conclusions from an inappropriate model (e.g. component failure leading to loss of life), an analyst should gather a body of evidence which supports their choice of model. This will reduce the chance that an inappropriate model is used for PCI. In summary, the current state of the art offers methods for building extreme value models from inspection data which can lead to subjective models, as they rely on the judgement of the analyst rather than compliance with an objective set of requirements.
The aim of this paper is to introduce a data analysis procedure that checks that all of the assumptions of EVA are met, and to which an analyst can refer when developing extreme value PCI models. This paper begins with a discussion of extreme value theory in relation to inspection data (Section 2), progressing to a description of the framework in which EVA can be applied to ultrasonic thickness maps of corroded engineering components (Section 3). Section 5 presents evidence that the proposed approach yields sensible results when applied to both simulated surfaces and data acquired by ultrasonic measurements. This is followed by a discussion of these results, and conclusions are drawn (Section 6).

Extreme value analysis of corrosion data
Most statistical models describe the average behaviour of the thickness distribution. These models are useful for predicting average corrosion rates or the average condition of a component. Extreme value analysis (EVA) is the study of extreme deviations of a random variable. In the context of this paper, where the assessment of a component with regards to corrosion damage is of interest, this random variable is the thickness of the component. Applications of EVA have been as varied as predicting the development of localised corrosion [9], calculating risk in insurance [10] or analysing the effect of different atmospheric conditions on surface morphology [11]. It is a promising tool for PCI as it provides a statistical model for the thinnest areas of a component.
Extreme value theory states that, if the underlying thickness measurements are drawn from independent and identical distributions (i.i.d.), the minimum values of thickness can be modelled by the generalized extreme value distribution (GEVD):

G(x|μ, σ, ξ) = 1 − exp{−[1 + ξ((μ − x)/σ)]^(−1/ξ)},   (1)

where μ ∈ R is the location parameter, σ > 0 is the scale parameter, ξ ∈ R is the shape parameter and G(x|μ, σ, ξ) is the probability of measuring a minimum thickness (in a block) of less than x. For the case of ξ = 0 the distribution takes the limiting form:

G(x|μ, σ) = 1 − exp[−exp((x − μ)/σ)].   (2)

In the past, for pitting corrosion, it has been shown that Eq. (2) provides a good model for the extremes [12-14]. In this paper, however, the model fitting was performed using Eq. (1) with no restriction placed on ξ: this form of the GEVD allows the fitting process to determine a suitable value for ξ, as an alternative to fixing ξ = 0. A more comprehensive description of extreme value theory can be found in Coles [15].
In an application of EVA an inspector will extract a sample of minimum thicknesses from the inspection data. Usually, the inspection data takes the form of an ultrasonically measured thickness map of an area of a component. In the thickness map there is only one minimum thickness. Consequently, the inspector partitions the thickness map into a number of equally sized blocks, from which a minimum thickness is extracted. This provides a sample of thickness minima from which the parameters of the GEVD can be extracted. Parameter estimates for the GEVD are calculated from the sample using maximum likelihood estimation [16].
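The procedure above can be sketched in a few lines of Python. The thickness map, its dimensions and the block size below are hypothetical, not the paper's data; note that scipy's genextreme distribution models block maxima, so the minima are negated before fitting, which is the standard route to a maximum likelihood GEVD fit for minima.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical thickness map: a 200 x 200 grid with nominal thickness 10 mm
# and uncorrelated Gaussian fluctuations (dimensions and values illustrative).
thickness_map = 10.0 + 0.3 * rng.standard_normal((200, 200))

def block_minima(t_map, block):
    """Partition the map into block x block tiles and record each tile's minimum."""
    ny, nx = t_map.shape
    return np.array([t_map[i:i + block, j:j + block].min()
                     for i in range(0, ny - block + 1, block)
                     for j in range(0, nx - block + 1, block)])

minima = block_minima(thickness_map, 40)   # 5 x 5 = 25 block minima

# Fit the GEVD to the negated minima (scipy's block-maxima convention).
shape, loc, scale = stats.genextreme.fit(-minima)

def prob_min_below(t):
    """G(t): probability of a block minimum being less than t."""
    return stats.genextreme.sf(-t, shape, loc=loc, scale=scale)
```

The fitted `prob_min_below` plays the role of G(x|μ, σ, ξ) in Eq. (1), evaluated through the survival function of the negated-data fit.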
Once an extreme value model has been constructed for the data, EVA can be used for direct extrapolation to a much larger area. The return period of a surface is the average number of blocks that would require inspection to measure a minimum thickness of less than a given value. It can be shown that the return period for a thickness measurement t can be calculated as [6]:

R(t) = 1 / G(t|μ, σ, ξ),

where R(t) is the average number of blocks one would need to inspect to measure a minimum thickness of less than t and G(t|μ, σ, ξ) is the probability of measuring a minimum thickness of less than t. For example, if the GEVD model gives the probability of measuring a minimum thickness of less than t as 0.01, the corresponding return period is 100. If the EVA model was constructed using 10 blocks, the return period indicates that (on average) an area of 10 times the initial inspection area would need to be inspected to measure a minimum thickness of less than t.
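The return period calculation reduces to a reciprocal of the GEVD probability; the sketch below reproduces the worked example from the text (a probability of 0.01 giving a return period of 100 blocks). The GEVD parameters in the second part are illustrative placeholders, not values from the paper.

```python
import numpy as np
from scipy import stats

def return_period(prob_min_below_t):
    """R(t) = 1 / G(t): the average number of blocks one would need to
    inspect to measure a minimum thickness of less than t."""
    return np.inf if prob_min_below_t == 0 else 1.0 / prob_min_below_t

# Worked example from the text: G(t) = 0.01 gives a return period of 100 blocks.
print(return_period(0.01))  # → 100.0

# With a fitted GEVD (parameters fitted to the negated minima, following
# scipy's block-maxima convention), G(t) is the survival function at -t.
shape, loc, scale = 0.1, -9.7, 0.2  # illustrative values only
G_t = stats.genextreme.sf(-9.2, shape, loc=loc, scale=scale)
R_t = return_period(G_t)
```

Guarding against a zero probability keeps the extrapolation well defined when t lies outside the support of the fitted distribution.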

Data analysis procedure to check EVA assumptions are met (Blocking algorithm)
Schemes to partition thickness maps must tread the line between ensuring there are a sufficient number of sample minima and that the thickness measurements selected are extreme deviations from the median thickness. Too large a block size and there will not be enough minima to extract parameters for the GEVD; too small and the sample will not be representative of the extremes of the thickness distribution. An effective scheme will balance these requirements whilst ensuring that there is evidence that the assumptions made by EVA are met by the inspection data.
In this light, this paper describes a framework for checking that all of the EVA assumptions are met prior to building a model. First, we check the independence of the underlying thickness measurements by calculating the autocorrelation function of the thickness map:

C(x′, y′) = ⟨(T(x, y) − T̄)(T(x + x′, y + y′) − T̄)⟩ / σ_T²,   (3)

where T̄ is the mean thickness, σ_T² is the variance of the thickness measurements and C(x′, y′) is the correlation between a thickness measurement T(x + x′, y + y′) and T(x, y). C(x′, y′) is a two-dimensional surface, reflecting the fact that the thickness map spans two horizontal dimensions, described by the x and y coordinates.
In this paper we restrict ourselves to isotropic surfaces. Consequently, the autocorrelation function for all of our test cases is symmetric in the x and y directions and all the information about the correlation structure can be obtained from C(x′, y′ = 0). The approach extends naturally to non-isotropic surfaces, as the autocorrelation function can be calculated in any direction. Fig. 2 shows an example of an autocorrelation function calculated from one of the test surfaces. The ordinate axis is the correlation between two measurements, where 1 indicates perfect correlation and 0 indicates no correlation, while the abscissa is the distance between the pair of measurements. Measurements separated by smaller distances are more strongly correlated, which indicates that measurements closer together are likely to be interdependent.
It is common practice to define a correlation length λ_c for a surface through C(λ_c) = e^(−1). Using the correlation length, one can define a distance at which two measurements are uncorrelated and likely to be independent. Fig. 2 shows that the autocorrelation function of the surface has dropped to zero at a distance of 2λ_c; it is therefore imposed that measurements must be at least 2λ_c apart in order to guarantee that data points are uncorrelated. Once a correlation length has been calculated, the surface is partitioned into a number of equally sized blocks. For the 200 mm square surfaces studied in this paper, we chose block sizes ranging from 10 to 60 mm. Starting with the smallest block size, a random sample of thickness measurements, including the minimum thickness measurement, is selected from every block. The sample is chosen such that every pair of thickness measurements is separated by at least 2λ_c, which ensures that the thickness measurements in the sample are independent of each other. The algorithm then checks that the random samples from every pair of blocks are from the same underlying thickness measurement distribution using a two-sample Kolmogorov-Smirnov (KS) test [17].
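A correlation length can be estimated directly from the data. The sketch below computes a normalised autocorrelation for a one-dimensional profile (sufficient for an isotropic surface) and reads off the first lag at which it falls below e^(−1). The profile, grid spacing and smoothing kernel are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)

dx = 0.5                         # assumed grid spacing in mm
n = 4096                         # number of samples in the profile
x = np.arange(-30, 31) * dx
# Gaussian smoothing kernel chosen so the profile's autocorrelation is
# approximately exp(-x'^2 / lambda_c^2) with lambda_c = 2.4 mm.
kernel = np.exp(-2.0 * x**2 / 2.4**2)
profile = np.convolve(rng.standard_normal(n), kernel / kernel.sum(), mode="same")

def autocorrelation(h):
    """Normalised autocorrelation C(lag), with C(0) = 1."""
    h = h - h.mean()
    c = np.correlate(h, h, mode="full")[h.size - 1:]
    return c / c[0]

C = autocorrelation(profile)
lag_c = np.argmax(C < np.exp(-1))  # first lag with C below 1/e
lambda_c = lag_c * dx              # correlation length estimate in mm
```

The estimated λ_c then sets the 2λ_c spacing used when subsampling each block.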
The algorithm performs a two-sample KS test on the random samples from every pair of blocks. If a single pair of blocks fails the test, the algorithm increases the block size and repeats the blocking process. Otherwise, if no pair of blocks fails the test, the algorithm has found a block size for which there is evidence that the distribution in each block is identical. The sample of thickness minima extracted using this block size can then be used to build an extreme value model for the thickness map. The parameters of the GEVD are then extracted from the sample of minima selected by the algorithm using maximum likelihood estimation (MLE) [16].
In summary, Fig. 3 shows a graphical form of the proposed algorithm. For the smallest block size, the thickness map is split into equally sized blocks and a two-sample Kolmogorov-Smirnov test is performed on the samples from every pair of blocks. This tests whether the thickness measurements in each block are from the same distribution. If the tests show that the thickness distribution is the same in every block, the algorithm terminates and this block size is selected. Otherwise, the algorithm repeats the process for the next largest block size until there are no remaining block sizes, in which case we conclude that the inspection data is not suitable for EVA. In this way, the algorithm selects a block size by looking for evidence that the partitioned data meets the assumptions made by extreme value theory.
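The loop described above can be sketched as follows. This is not the authors' implementation: regular decimation at a spacing of 2λ_c stands in for the random spatially separated sampling, and all names and parameters are illustrative.

```python
import numpy as np
from scipy import stats

def select_block_size(t_map, dx, lambda_c, block_sizes_mm, alpha=0.01):
    """For each candidate block size (smallest first): partition the map,
    draw a decimated sample (points ~2*lambda_c apart, plus the block
    minimum) from every block, and run pairwise two-sample KS tests.
    Return the first size at which no pair of blocks fails."""
    step = max(1, int(np.ceil(2 * lambda_c / dx)))   # decimation in samples
    for size_mm in block_sizes_mm:
        b = int(size_mm / dx)                        # block edge in samples
        ny, nx = t_map.shape
        samples = []
        for i in range(0, ny - b + 1, b):
            for j in range(0, nx - b + 1, b):
                block = t_map[i:i + b, j:j + b]
                sub = block[::step, ::step].ravel()  # ~independent points
                samples.append(np.append(sub, block.min()))
        if all(stats.ks_2samp(a, c).pvalue > alpha
               for k, a in enumerate(samples) for c in samples[k + 1:]):
            return size_mm       # evidence that block distributions are identical
    return None  # no evidence that the EVA assumptions hold
```

If every candidate size fails, the function returns None, mirroring the conclusion that the data is not suitable for EVA.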

Simulation set up
The algorithm was tested using a large number of both Gaussian and exponential surfaces. Gaussian surfaces were generated using sequences of uncorrelated random numbers drawn from a normal distribution, while the exponential surfaces were generated from sequences drawn from an exponential distribution. Correlated surfaces were generated from these sequences using a weighted moving average, with weights chosen such that points on the surface had a root mean squared (RMS) height of 0.1, 0.2 or 0.3 mm (examples of Gaussian and exponential surfaces with RMS = 0.3 mm can be seen in Fig. 1b), a mean position of 10 mm and a Gaussian autocorrelation function:

C(x′, y′) = exp(−(x′² + y′²)/λ_c²),

where C(x′, y′) is defined as in Eq. (3). For the surfaces generated in this paper a correlation length λ_c = 2.4 mm was used. A more detailed discussion of the surface generation can be found in [18], whose implementation is based on the method developed by Ogilvy [19]. For both the Gaussian and exponential height distributions, 1000 surfaces of 200 mm by 200 mm were generated and each surface was processed using the blocking algorithm, with candidate block sizes ranging from 20 to 60 mm (in steps of 5 mm). If the algorithm successfully chose a block size for a surface, an extreme value model was generated using that block size. This model was validated using the scan return period (SRP):

S = N_blocks G(t_min|μ, σ, ξ),

where S is the SRP, N_blocks is the number of blocks the thickness map has been partitioned into and t_min is the smallest thickness across the surface. If the model adequately describes the surface then S ≈ 1, as there is exactly one thickness measurement of t_min across the inspected blocks. Therefore, this metric provides evidence of the quality of the extreme value model for that surface.
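A minimal sketch of this set-up, under stated assumptions: white Gaussian noise is filtered with a Gaussian moving-average kernel (chosen so the autocorrelation is approximately exp(−(x′² + y′²)/λ_c²)), then rescaled to the target RMS height and mean position. The kernel size and the SRP parameters below are illustrative, and this is a simplification of the method in [18,19].

```python
import numpy as np
from scipy import stats, signal

rng = np.random.default_rng(2)

dx, lambda_c = 0.5, 2.4        # grid spacing and correlation length in mm
rms, mean_pos = 0.3, 10.0      # target RMS height and mean position in mm
n = 400                        # 200 mm / dx samples per side

# Moving-average weights exp(-2 r^2 / lambda_c^2), so that the filtered
# surface has autocorrelation approximately exp(-r^2 / lambda_c^2).
r = np.arange(-15, 16) * dx
X, Y = np.meshgrid(r, r)
w = np.exp(-2.0 * (X**2 + Y**2) / lambda_c**2)

surface = signal.fftconvolve(rng.standard_normal((n, n)), w, mode="same")
surface = mean_pos + rms * (surface - surface.mean()) / surface.std()

def scan_return_period(t_map, block, params):
    """S = N_blocks * G(t_min): close to 1 when the GEVD describes the map."""
    shape, loc, scale = params
    n_blocks = (t_map.shape[0] // block) * (t_map.shape[1] // block)
    return n_blocks * stats.genextreme.sf(-t_map.min(), shape, loc=loc, scale=scale)
```

The rescaling step pins the sample mean and RMS exactly, which is convenient for generating many surfaces with matched statistics.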

Performance of the algorithm
The statistics of the block sizes and scan return periods were calculated to examine the performance of the algorithm. Figs. 4a,b and 5a,b show histograms of the block sizes selected for each surface: for each block size, the height of a bar is proportional to the number of surfaces for which that block size was selected. Fig. 4a shows the distribution of the block sizes selected for Gaussian surfaces using a significance level of 0.01. The black, grey and white bars show the results from surfaces with RMS = 0.1, 0.2 and 0.3 mm respectively. The mode block size selected by the algorithm for the Gaussian surfaces is 40 mm. For comparison, the mode block size for the exponential surfaces (Fig. 5a) was 35 mm for both RMS = 0.2 and 0.3 mm, and 30 mm for RMS = 0.1 mm. This difference originates in the height distributions of the surfaces. For an exponential distribution, the average deviations from the median of the distribution are larger than for a Gaussian distribution. Consequently, a smaller number of thickness measurements, and therefore a smaller block size, is required to measure an extreme than with a Gaussian distribution.

Fig. 3. A flow chart of the proposed blocking method. A user will input an ultrasonic thickness map and the algorithm will split it into a number of equally sized blocks. The blocks will then be tested to ensure that there is evidence they meet the assumptions made by extreme value analysis.

Fig. 4. Histograms of the number of Gaussian surfaces against block size at different significance levels, showing the number of surfaces for which the algorithm has selected a given block size. With a significance level of 1%, the algorithm could not find a suitable block size for 1% of the surfaces; this increased to 20% with a significance level of 5%.

Fig. 5. Histograms of the number of exponential surfaces against block size at different significance levels, showing the number of surfaces for which the algorithm has selected a given block size. With a significance level of 1%, the algorithm could not find a suitable block size for 3% of the surfaces; this increased to 30% with a significance level of 5%.

Fig. 6. Histograms of the number of Gaussian surfaces against scan return period at different significance levels. With a significance level of 1%, scan return periods ranged as far as 14 scans, which corresponded to block sizes greater than 40 mm. On increasing the significance level to 5%, the algorithm did not select a block size for these surfaces and the range of scan return periods decreased.
The mode block size of 40 mm corresponds to thickness minima sample sizes of 25 for the Gaussian surfaces, and the mode block size of 35 mm corresponds to 32 sample minima for the exponential surfaces. In general this was a sufficient number of minima to be confident about the quality of the generated extreme value model. However, there is a fraction of surfaces for which a block size of greater than 50 mm has been selected, which corresponds to smaller samples of minima (16 for 50 mm and 9 for 60 mm). Consequently, the models generated using these block sizes will produce poor descriptions of the surface, as there is less information from which to estimate the model parameters. This is evident in Figs. 6a and 7a, which show the distributions of the SRPs for the models generated by the algorithm. The mode scan return period is around 1 for both types of surface, which is expected from our definition of the SRP. However, some of the models have very large SRPs. These models were generated using the larger block sizes (and the associated smaller sample sizes). In these cases the algorithm has required a much larger block size in order to find a sufficient level of evidence that the thickness measurements come from identical distributions.
This level of evidence is determined by the choice of the significance level of the KS test. A lower significance level means that the algorithm requires less evidence that the distributions are identical; increasing it raises the amount of evidence required. When the blocking algorithm fails to find a suitable block size, we conclude that there is insufficient evidence that the assumptions made by EVA are met by that surface. As with any method, there are circumstances in which EVA is suitable and those in which it is not. Although the assumptions made to generate the surfaces are congruent with those of EVA, each surface is a random process. Consequently, it will not necessarily show evidence that the assumptions of EVA are met.

Fig. 7. Histograms of the number of exponential surfaces against scan return period at different significance levels. With a significance level of 1%, scan return periods ranged as far as 14 scans, which corresponded to block sizes greater than 40 mm. On increasing the significance level to 5%, the algorithm did not select a block size for these surfaces and the range of scan return periods decreased.
Figs. 4b and 5b show the distributions of block sizes using a significance level of 0.05 for the Gaussian and the exponential surfaces respectively. The mode block sizes remain the same, however, there are no longer any surfaces for which a block size of greater than 50 mm has been selected. In fact, the algorithm has failed to find a suitable block size for around 20% of the Gaussian surfaces and 30% of the exponential surfaces, compared to 1% and 3% at a significance level of 0.01. These surfaces mostly correspond to the larger block sizes in Figs. 4a and 5a. As a result the distributions of SRPs at a significance level of 0.05 (Figs. 6b and 7b) do not show SRPs greater than 5. This suggests that a higher significance level leads to models which more accurately describe the surface.
Figs. 8 and 9 show box plots of the SRP for each block size for the Gaussian and the exponential surfaces. Box plots provide a visualisation of the distribution of the SRP calculated from each model. The interquartile range (IQR), represented by the length of each box, gives the bounds within which half of the values of the SRP lie. The median of the distribution is shown by the line in the middle of each box and the whiskers show the range (scan return periods within the 1% and 99% quantiles) which does not contain any outliers. Any values outside of this range are plotted individually as crosses. Fig. 8(a)-(c) show the box plots for the Gaussian surfaces with RMS = 0.1, 0.2 and 0.3 mm at a significance level of 0.01. For block sizes of 55 and 60 mm, the median scan return period deviates significantly from 1, as the models using these block sizes rely on a small number of minima. There are also a number of large outliers for some of the smaller block sizes. With a significance level of 0.05, there is a large reduction in the number of outliers shown in Fig. 8(d)-(f) and there are no longer any models generated using block sizes greater than 50 mm. The average deviation of the median from the black dashed line is also reduced. This is a consequence of the more stringent requirements for surfaces deemed suitable for EVA.
This pattern continues for the exponential surfaces. Fig. 9(a)-(c) shows the box plots for exponential surfaces with RMS = 0.1, 0.2 and 0.3 mm at a significance level of 0.01. The median of the SRP for the models generated for each block size was close to 1, with IQRs of around 2 scans. In a similar manner to the Gaussian surfaces, the median SRP deviates significantly from 1 for block sizes greater than 40 mm, which indicates that these models are poor descriptions of the data. At the larger significance level of 0.05, there is a reduction in both the IQR for each set of data and the number of outliers, as shown in Fig. 9(d)-(f). This suggests that making the test requirements more stringent increases the quality of the models produced by the algorithm.

Example of an extreme value model generated from inspection data
In addition to the numerical studies discussed in the previous section, an ultrasonic thickness C-scan of a steel plate machined with a Gaussian surface with RMS = 0.3 mm and λ_c = 2.4 mm was processed using the blocking algorithm. Exact details of the experimental set-up, including a discussion of the machining process and any errors from the measurement technique, can be found in [20]. Fig. 10a shows the empirical cumulative distribution function generated from the experimental ultrasonic thickness map. The diamonds represent the ECDF generated from the experimentally collected data and the stars are the ECDF from the point cloud of the actual surface condition. There are discrepancies between the ultrasonic thickness measurements and the actual condition of the surface, which arise from the random scattering of the ultrasonic pulse by the rough surface. This scattering manifests as a consistent underestimation of the smallest thicknesses across a component [20]; it is an effect rooted in the physics of the inspection technique rather than the statistical analysis of the data.
As the smallest thickness measurements from an ultrasonic scan are underestimates of the true thickness of a component, one would expect an extreme value model generated from ultrasonic thickness data to overestimate the severity of damage in an area: any estimates of the minimum thickness made using such a model will be smaller than the true minimum thickness. To investigate this effect the thickness map was processed using the blocking algorithm with a significance level of 0.05, and a block size of 25 mm was selected. A set of minima was extracted from each dataset using the selected block size and a GEVD was fitted to each sample of minima. Fig. 10b shows the extreme value models generated using the sample minima from both the ultrasonic data (circles) and the point cloud data (crosses). The extreme value models for each set of data are shown by the dashed lines. Graphically, the models provide a good fit to each set of minima. The SRP was calculated to be 1.16 for the model generated from the ultrasonic data and 0.93 for the model generated from the point cloud data.
The model generated from the ultrasonic data overestimates the differences from the mean thickness compared to the actual surface condition. As a consequence the extreme value model generated from the ultrasonic data overestimates the severity of the damage across a component. For example, the smallest thickness in the point cloud data is 8.7 mm, which corresponds to a deviation from the mean thickness of −1.3 mm. The return period of this thickness calculated from the ultrasonic extreme value model is 0.04 scans. The extreme value model generated from the ultrasonic data is very conservative compared to the true surface condition. This arises from the differences between the distribution of the minimum thickness for the point cloud data and the ultrasonic thickness data, which comes from the scattering of the ultrasound from the rough surface [20].

Conclusions
While it is known that extreme value analysis can be used to model the thinnest areas of a component and to extrapolate to the condition of much larger areas that are exposed to the same degradation mechanism, there is currently no standard methodology for sampling the minimum thickness from an ultrasonic inspection thickness map. This paper has addressed this problem and describes an approach to sampling the thickness minima by looking for evidence that the assumptions made by EVA are reasonable for the dataset being considered. The algorithm was applied to a large number of surfaces with both Gaussian and exponential height distributions. It successfully selected a block size for the majority of surfaces and generated extreme value models which provided good descriptions of the data.
Smaller block sizes correspond to larger samples of thickness minima (100 minima), whereas larger block sizes resulted in smaller sets of thickness minima (16 minima). It was found that the variation in the quality of models generated using the smaller block sizes is larger than that for the larger block sizes, which is the result of some of the minima not being extremes of the distribution. In contrast, larger block sizes ensured that all the thickness measurements in a sample were extremes. However, the smaller sample size led to increased uncertainty in the parameter estimates for the extreme value models. For the majority of surfaces the algorithm finds a balance between these two cases.
Additionally, the algorithm is capable of processing real ultrasonic thickness measurements. It successfully selected a block size for three ultrasonic thickness scans of correlated Gaussian surfaces with RMS height of 0.3 mm and a correlation length of 2.4 mm.
The model provided a good description of the inspection data. Furthermore, congruent with previous findings [20], it was shown that extreme value models constructed using ultrasonic inspection data can overestimate the severity of the damage across a component because of the physics of the inspection technique.