Autonomous ultrasonic inspection using Bayesian optimisation and robust outlier analysis



Introduction
Non-Destructive Testing (NDT) is an integral part of the assessment of quality and structural integrity in engineering components in-service and post-manufacture. Currently, most NDT is carried out manually; however, recent increased attention to robotic-based inspection [1] promises to change matters. This use of robotics is a major boon to inspecting high-value and/or safety-critical assets; it also removes humans from harm when inspections are needed in harsh and dangerous environments.
Whilst the motivations for robotic NDT are clear, and research is under way [1,2], little or nothing has been done in the way of performing autonomous, as opposed to automated, inspections. In an automated inspection scheme, the robot path is programmed prior to the inspection, and the robot makes no decisions regarding which areas of a component it should prioritise. The data collected have to be reported back to a human, or to further post-processing algorithms, in order to assess the state of the component. In a truly autonomous scheme, a robot would make its own decisions regarding the path it should take, and should also continuously perform its own critical assessment of the component: does it contain a defect or not, and with what probability? Has enough area been inspected, and are there critical areas that need more attention? The purpose of this paper is to formulate an algorithmic framework for such autonomous inspection. This paper will focus on the particular use of ultrasound-based NDT, due to its current widespread use within engineering industry. However, the reader will recognise that the ideas presented are applicable to other types of testing. The proposed algorithm is demonstrated here using an example dataset originating from robotic inspection of a carbon-fibre wing panel with two known areas of delamination, as described in detail in [1].

https://doi.org/10.1016/j.ymssp.2020.106897. © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
The rest of this introduction will provide a general overview of the algorithm, with the following sub-sections providing a general background to ultrasound-based NDT, the challenges of autonomous inspection, as well as spatial sampling, which is closely related to the algorithm presented here.

Algorithm overview
The first question to be asked is: what is the objective of the robotic inspection? Here, this objective is defined as: determining, with high confidence, whether the component being inspected contains a defect, flaw or damage of a given minimum target size ℓ, with the fewest possible number of measurements. Given such a clear objective, it is possible to cast the autonomous inspection problem as an optimisation problem. After an observation is taken, the optimiser should decide at what position the next observation should be taken, so as to maximise the given objective. These decisions are thus carried out sequentially as information is being gathered. One method that deals particularly well with this sequential design problem is Bayesian optimisation, which will be discussed in detail in Section 2. The general idea is to fit a Bayesian nonlinear regression model - such as a Gaussian process (GP) - to the objective function. This allows one to infer the objective at unobserved locations. As GP regression is a Bayesian method, it can not only predict the objective at unobserved locations, but also estimate the uncertainty in the prediction. This allows one to place observations in areas of both high uncertainty and high values of the objective function. This strategy maximises the information gain of every single observation that is taken, and it can be carried out sequentially and in an autonomous fashion.
For optimisation to work, a good objective function is required. Here, the objective of the problem is to detect defects, so the objective should capture information regarding whether a particular area contains a defect or not. This task can be cast as a problem of novelty detection [3]; the basic idea being to model the probability density of data known to belong to an undamaged class, and to then compute a distance between new observations and the mean of that density. Here, the data are assumed to be Gaussian, and the Mahalanobis squared-distance (MSD) will be used to provide a novelty index. The MSD deals well with multivariate data, and is thus useful in the context of analysing NDT data. The highest values of a novelty index correspond to the most abnormal observations with respect to the rest. One of the key novelties of this study is the use of novelty indices as the objective function to be optimised.
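As a concrete illustration of the MSD novelty index, the following sketch (in Python with NumPy; the data here are synthetic, not taken from the wing panel) computes novelty scores for a set of two-dimensional features against a Gaussian reference:

```python
import numpy as np

def mahalanobis_sq(Y, mu, cov):
    """Mahalanobis squared-distance of each row of Y from mean mu under covariance cov."""
    diff = Y - mu
    cov_inv = np.linalg.inv(cov)
    # (y - mu)^T  cov^{-1}  (y - mu), evaluated row-wise
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Reference (undamaged) feature set: rows are observations, columns are features
rng = np.random.default_rng(0)
Y_ref = rng.normal(size=(200, 2))
mu, cov = Y_ref.mean(axis=0), np.cov(Y_ref, rowvar=False)

scores = mahalanobis_sq(Y_ref, mu, cov)   # one novelty score per observation
```

For multivariate Gaussian data, the MSD grows with how far an observation sits from the bulk of the reference density, which is what makes it suitable as an objective function for the search.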
Combining Bayesian optimisation with a novelty index seems like a good idea; however, the novelty detection must be carried out on-line (since it must guide a robot) and must also ignore abnormal observations when capturing a reference undamaged set. If outlying observations are taken into account when training the reference set, the novelty detector will become biased towards them and subsequently mask potential damage sites. A remedy to this problem is to use robust statistics, and in particular, the Minimum Covariance Determinant (MCD): a robust method for estimating the parameters of a Gaussian distribution. The MCD is discussed in detail in Appendix A; it attempts to discriminate between outlying and inlying observations when outliers are present in the training set, and thus does not bias the reference set toward the outliers. This enables an autonomous inspection algorithm that is entirely data-driven - it does not use external information to form a model of a healthy component. An overview of the autonomous inspection scheme is illustrated in Fig. 1. Step one involves data collection and feature extraction from raw data.
Step two involves modelling these features using robust outlier analysis in order to establish a threshold. Steps three and four form the Bayesian optimisation step, first modelling the objective function with a GP, and then choosing a location that maximises information gain.
A major assumption made in this autonomous inspection strategy is that the physical area being scanned is nominally homogeneous. It is assumed that variability in the material and geometric characteristics, together with any measurement/computation noise, can be modelled by a Gaussian distribution. If a component did have two sections with different characteristics, this would add multiple modes to the underlying probability distribution of undamaged class features, and render the MSD ineffective. This issue could still be dealt with via a clustering analysis, and appropriate novelty indices exist that capture this situation well [4]. However, in this scenario of actively looking for data that are dissimilar, it would not be possible to distinguish between damage and a benign change in normal properties. The key here is to use prior knowledge. A good use of prior knowledge is to assume the robot knows which sections of a component are expected to be nominally the same. This can be achieved by performing independent inspections over areas that are known to be homogeneous in their material and geometrical properties. If a component is homogeneous, there is no problem. If not, one can elicit information from Computer-Aided Design (CAD) models, as already demonstrated in robotic path planning in NDT inspections [2]. In this paper, the emphasis is placed on achieving autonomous inspection, through the use of Bayesian optimisation and robust outlier analysis.
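To illustrate the robustness that the MCD provides, the sketch below (assuming scikit-learn is available; its MinCovDet class implements the FastMCD algorithm, and the contaminated data set here is synthetic) fits a reference Gaussian to features that include a small cluster of outliers:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
# Inlying (undamaged) features, contaminated with a cluster of outliers
inliers = rng.normal(0.0, 1.0, size=(190, 2))
outliers = rng.normal(8.0, 0.5, size=(10, 2))
Y = np.vstack([inliers, outliers])

mcd = MinCovDet(random_state=0).fit(Y)
d2 = mcd.mahalanobis(Y)   # robust Mahalanobis squared-distances

# mcd.location_ should sit near the inlier mean, not be dragged to the outliers
```

A plain sample mean and covariance would be dragged towards the outlying cluster, shrinking its novelty scores; the MCD estimate stays with the inliers, so the outliers retain large scores and remain detectable.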

Ultrasound signal features
This paper will use example data from ultrasound-based NDT, due to its current widespread use in practice. This section will provide the reader with an understanding of the type of signals, and the features extracted from such signals, as well as the type of physical testing required to collect ultrasound data.
A typical ultrasound pulse is shown in Fig. 2a, with two time indices marked as t_a and t_b. These times correspond to reflections from the front and back wall of the inspected specimen respectively. A pulse of this kind effectively constitutes an A-scan. The important information is the time difference t_f = t_b − t_a, and this is often referred to as the ultrasonic Time of Flight (ToF). The ToF can be related to the thickness of the specimen, if the propagation speed of bulk waves in the material is known. Another feature of interest is the ratio x(t_a)/x(t_b) (where x(t) is the measured amplitude of the ultrasound pulse), as this contains information about the attenuation of the wave as it travels through the thickness of the component. An A-scan thus gives information about a single physical coordinate on a surface. A B-scan can be formed by collecting a series of A-scans along a line (illustrated in Fig. 2b), while a C-scan is formed by collecting a series of B-scans, to give a two-dimensional grid of ultrasound pulse information (illustrated in Fig. 2c). Note that higher times of flight in Fig. 2c imply greater plate thickness. The scanned component in Fig. 2c is a wing panel; the salient features are two visible regions of delamination, the stringers, and the variable panel thickness. The stringers appear with a low ToF index, because their thickness exceeds the maximum value that can be captured within the collected data.

Towards practical autonomous inspection
In practice, A, B and C-scans are often performed manually, with the aid of engineering expertise, in order to carry out four key tasks:
1. Physically scan the structure with an ultrasound probe.
2. Interpret the results of the collected dataset and form a decision on the quality of the data, and the state of the component.
3. Decide which areas, if any, of the component need re-scanning.
4. Provide an assessment of the best course of action, and recommend an appropriate repair strategy.
Recently, there has been a move towards an increased use of robotics for automating the process of acquiring the necessary data, where ultrasound data acquisition hardware is combined with robotic interfaces in order to automate the physical scanning of the structure (task one in the hierarchy above). Examples of this can be found in [5][6][7] and more recently in [1,2], where problems such as path planning and integrating CAD geometries into the scanning process have been addressed. Other important issues, such as dealing with uncertainty in the positioning of autonomous robots, have been addressed in [8] using Bayesian filtering methods.
The interpretation of results, the second task in the hierarchy, has received a noteworthy amount of attention in the NDT research community. Research has evolved from investigations into extraction of useful features from the raw data [9][10][11], into the use of machine learning and novel statistical inference methods applied to those features in order to detect, cluster and classify defects [12][13][14][15][16].
The third task in the hierarchy involves deciding appropriate places to scan, based on information collected so far about the component. The continuous path-planning problem using optimisation has been investigated before in a general context [17], but significantly less research has been carried out towards this end in NDT. The state of the art in path planning for robotic ultrasound inspection, at the moment, is to use CAD models to plan the scanning trajectory, where research has focused around issues with the feedback control of the ultrasonic probe head [2].
The objective of performing a scan is to find defects that are larger than a minimum given size. The current solution in pulse-echo ultrasound measurements is to define a spatial grid with a high enough resolution to capture this minimum expected flaw size. Whilst this approach has worked so far, there are two major issues:
1. The scan can take a long time, and there are cost implications associated with this.
2. A large quantity of data needs to be stored and post-processed later.
These effects are exacerbated when large scanning areas are involved, as is typical in aerospace industries. Depending on the motivation for the inspection, the scan time may not present an issue; for high-value, safety-critical engineering components, large scanning times are acceptable. However, the data storage and processing requirements of a dense grid over a large area can make the problem intractable.
The data storage requirement can be alleviated by compressing the raw data for later post-processing, leveraging advanced signal processing techniques such as wavelet transforms [18], sparse signal modelling [19], and more recently compressive sensing [20].

Spatial sampling
Efforts in developing compression algorithms for time domain data will do little to address the problem of dealing with large quantities of spatial data. An important question is: what is the minimum number of observations required to capture the spatial statistics of the data collected from the component? This problem has been studied in a number of research fields (see [21] for an overview). The most active research communities in this field are those of geo-statistical modelling, environmental modelling [22], and disease mapping [23], where the motivations for studying spatial sampling schemes arise from the fact that measurements can be sparse; this raises the need to interpolate data in regions with low observation density, while quantifying the uncertainty around those predictions.
In spatial statistics, the features of interest are treated as random variables across space; they can be separated into two categories, those which can be assumed identically and independently distributed (iid), and those that are autocorrelated across space. NDT data falls in the latter category, where values of NDT data features in homogeneous materials will tend to be similar to those observed a small distance away (there is some experimental support for this [24]).
Modelling the spatial correlation of stochastic processes is central to the task of interpolation in spatial analysis [25]; the correlation quantifies the influence of a variable on itself at a distance r away. In spatial statistical analysis, this influence is modelled through the use of a covariance, or correlation function. The quality of spatial interpolation, based on correlations or covariances, relies heavily on the properties of the covariance function used; it should encode the overall smoothness of the random field being modelled, as well as the overall length-scale over which this random field correlates. Analysis of spatial variation has been of interest in NDT, and has been used as a tool to assess optimal resolution and scanning strategies [26,24].
When an application is risk-sensitive - which is the case in NDT data analysis - it is critical to use methods that are probabilistic. Gaussian Process (GP) regression is adopted here as a flexible tool for Bayesian, nonlinear and nonparametric regression; it will be discussed briefly in Appendix B. In previous work, GPs were used to replace more traditional empirical variograms when smoothing NDT data features across space. An interesting application of spatial modelling to outlier analysis is presented in [27], where the GP was modified to derive 'extreme spatial processes.' In this work, the GP is used to fit a surrogate model to novelty scores from a robust outlier analysis procedure; this approach is novel compared to other work in this area, where GPs are usually used to fit data features directly.
One important aspect of spatial variation and NDT data analysis is that of Probability of Detection (PoD); the likelihood that an object will be detected in a particular measurement scenario. Its origin can be traced back to the development of radar, but in the present day it is widely used to quantify the reliability of NDT processes [28]. PoD curves describe the probability that a particular NDT system will detect a flaw as a function of the flaw size. Traditionally, these curves are built experimentally, by performing blind scans of calibration pieces with flaws of various sizes, and quantifying the false negative rates [29]. PoD is affected by a number of factors: measurement noise, spatial scanning resolution and the scattering characteristics of the expected flaw with respect to the ultrasound wave. Evidently, the spatial resolution of a scan will affect the overall probability of detection. The problem of evaluating PoD curves from spatial random fields that exhibit autocorrelation has been investigated in [30]. This is interesting from the point of view of the current work, as it links the idea of using a predictive model to the computation of the more classical assessment of reliability of the NDT process based on PoD. While the surrogate used in [30] is a Polynomial Chaos model, this paper models spatial variation using a GP; the analysis of PoD, however, would follow similar lines to those of [30].

Contributions of this paper
The novelty of the work here stems from the combination of spatial statistics with novelty detection, which directly enables an optimised acquisition, in terms of minimum number of measurements, through the framework of Bayesian optimisation. The acquisition of data is sequential and performed fully autonomously, with the system deciding the next optimal point to scan, guided by estimates of uncertainty and high discordancy values.

Outline of the paper
As explained in Section 1.1, and illustrated in Fig. 1, the autonomous inspection algorithm consists of two main ingredients: 1) setting a robust threshold based on MCD, and 2) Bayesian optimisation to guide the next scan point. Following a short overview of some necessary theory, a detailed description of the proposed method is found in Section 4. Throughout the paper, illustrations of the various aspects of the procedure are drawn from ultrasound data collected from the aerospace panel discussed in Section 1.2. These provide subsets of the data illustrated in Fig. 2. Section 5 provides a more detailed experimental investigation of the autonomous inspection algorithm, using this data set. Finally, Section 7 will present some conclusions and directions for future work.

Bayesian optimisation
It is prudent to begin by discussing Bayesian optimisation and how such a scheme can be used to select appropriate scanning points, based on the features generated by the robust outlier analysis. Bayesian optimisation is used primarily in situations where it is expensive to evaluate the function being optimised, as is often the case in the field of Design of Experiments (DoE). The Bayesian optimisation technique presented is based upon the GP as a Bayesian nonlinear regression model. However, before delving into mathematical details, the motivation and reasoning for selecting this type of optimisation algorithm over others will be described first.

Reasoning and motivation
In a move toward autonomous inspection, the task becomes the selection of optimal points to scan, or 'query', in order to achieve an estimate of the condition of the component to within a specified confidence level. In this paper, this task is addressed as a problem of optimisation. As discussed in the introduction, the objective being modelled is the distribution of novelty indices across the specimen. These indices are calculated by means of a robust outlier analysis, the FastMCD algorithm described in Appendix A.
In general, the optimisation task is to find the arguments of a function that maximise (or minimise) an objective (or cost) function. If, at the end of a scan, one analyses the data and finds that the maximum novelty score falls under a novelty threshold, then confidence is established in the absence of anomalies. Alternatively, if a significant number of observations lie above the novelty threshold, this indicates a problem. The task of inspection is thus a search for the regions that differ most from the rest, i.e. those showing the highest novelty scores. The problem of searching for outliers in a spatial domain can, therefore, be cast as an optimisation problem, where the objective, or cost function, being maximised is the novelty score.
Optimisation is used throughout science and engineering, often to determine parameters in statistical or physics-based models. If the objective function of interest is differentiable, a popular technique is gradient descent, which takes steps in the steepest directions on the objective function surface. While this is often effective, it is not suitable for the problem here; there is no reason to believe that taking the direction of greatest change in the novelty score will lead to the location of greatest damage. The optimisation is, therefore, chosen to be a global and gradient-free method.
Because of the nature of inspection, it is desirable for the optimisation method to quantify uncertainty in its predictions; this would allow the user to tune the search according to the acceptable risk in a given inspection problem. Bayesian techniques are well suited to this end. One example of a Bayesian approach has previously been investigated for ultrasound test parameter optimisation [31], a related problem to that of finding an efficient search path. The issues here are that the search must be sequential, and without the aid of a forward model; Markov Chain Monte Carlo (MCMC) methods will not do well with these two requirements. An advantage of Bayesian approaches to optimisation that fit a surrogate model to the objective function [32,33] is the reduced number of evaluations of the objective required. These methods are important when function evaluations require expensive simulations or experimental campaigns.

Bayesian optimisation background
Bayesian optimisation can be cast as a sequential design problem: given all the data observed so far, D_t, what is the "optimal" point to observe next, such that maximum information gain is achieved? The information gain is estimated directly from the surrogate model, by forming a posterior distribution over the objective function, denoted here as f, given the data observed so far, D_t. This includes both the coordinates x and observations y, so D_t = {x_t, y_t}. This posterior can be written as p(f | D_t). This distribution contains useful information about where to place the next observation, in terms of both areas of high uncertainty and areas of high values of f. The sequential search needs to balance between exploration and exploitation. Exploration involves searching in areas of high uncertainty, whilst exploitation involves placing observations in areas of high function values, in order to narrow down to the values of x that lead to globally maximising f. The difference is between high-risk, high-payoff steps and low-risk, low-payoff ones. Bayesian optimisation balances exploration and exploitation through the use of an acquisition function that evaluates an expected utility u(x). The approach adopted here is described in Algorithm 1; it iterates over evaluations of the posterior over the objective, p(f | D_t), in order to find points that maximise u(x), and then acquires new data at those points.

Algorithm 1 Bayesian Optimisation Algorithm
The objective function being assessed is modelled by means of standard Gaussian process regression, a short review of which can be found in Appendix B. The choice of acquisition function is left to the user; two popular options are the probability of improvement (PI) [34] and the expected improvement (EI) [35]. PI seeks to maximise the likelihood that the next evaluation of the objective function improves on its current best value, while EI takes the expectation over an improvement function, which can be shown to balance exploration and exploitation better [35]. For this reason, EI is chosen for this work.
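A minimal sketch of the EI acquisition for maximisation follows (this is the standard closed form under a Gaussian predictive distribution; the candidate numbers are invented). With predictive mean m(x), standard deviation s(x) and current best value f_best, EI(x) = (m − f_best)Φ(z) + sφ(z), where z = (m − f_best)/s:

```python
import math
import numpy as np

def expected_improvement(mean, std, f_best):
    """EI for maximisation, from the GP predictive mean/std at candidate points."""
    mean, std = np.asarray(mean, float), np.asarray(std, float)
    ei = np.zeros_like(mean)
    ok = std > 0                      # EI is zero where the GP is certain
    z = (mean[ok] - f_best) / std[ok]
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))  # Gaussian CDF
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)            # Gaussian PDF
    ei[ok] = (mean[ok] - f_best) * Phi + std[ok] * phi
    return ei

# Candidate 0: high mean, low uncertainty (exploitation);
# candidate 1: low mean, high uncertainty (exploration)
ei = expected_improvement(mean=[1.2, 0.2], std=[0.1, 1.5], f_best=1.0)
next_point = int(np.argmax(ei))
```

Note how the second candidate, with a lower mean but much higher uncertainty, attains the larger EI here; this is the exploration behaviour described above.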

Experimental setup
An 800 mm × 550 mm composite panel was scanned using a six-axis robotic head, with a water-coupled ultrasound probe. The probe consists of 64 transducers, each of which fires a 5 MHz tone burst, and also acts as a receiver. The resolution of the scan can be adjusted, but for these results, the speed of the probe was set to yield a spatial resolution of 0.8 mm in the direction of the probe travel. The C-scan shown in Fig. 2c was generated using this data set. Details of the experimental procedure have been published in [1], to which the reader is referred for further details.
In order to capture the range of different depths of this composite specimen, an acquisition time of 9.6 µs was used. This equates to 480 samples at a sample rate of 50 MHz. The original ultrasound pulses collected through the water-coupled probe are susceptible to misalignment with respect to the arrival time of the first burst (the front wall reflection); this is evident, for example, in the illustrative B-scan shown in Fig. 2b.
Two data features extracted from the ultrasound pulses were used: the ToF and the attenuation between the front and back wall of the specimen. In order for the misalignment across B-scans not to affect the data features, the ToF and attenuation were both estimated using a Hilbert envelope of the autocorrelation of each pulse, where the lag of the envelope maximum specifies the ToF, and the attenuation is extracted from the value of the envelope at that time lag. Note that these computations could be performed directly in a compressed domain, thus requiring an even smaller quantity of data, adopting the techniques developed in [20].

Autonomous inspection
The proposed strategy for selecting scanning points in a sequential fashion is now discussed; this is presented as a Bayesian optimisation problem, where the objective function being maximised is the novelty index of the measured data. The novelty index describes the dissimilarity of a measured point against the rest of the group, and it is computed using robust measures to ensure that the reference model is not biased by potential outliers due to measurement noise or damage itself. The optimisation is performed over this novelty index to identify areas of interest. The result is a scan that minimises uncertainty across the spatial field while focussing on areas of potential damage. The goal is to achieve this with the least possible number of measurements. The implementation of the autonomous inspection algorithm involves important practical aspects such as initialisation, computation of damage thresholds and probability of damage.
The main iterations follow Fig. 1, where the four main elements of the iteration are: 1) data acquisition and feature extraction; 2) robust outlier analysis to compute novelty scores and a threshold; 3) modelling of the novelty scores with a GP; and 4) selection of the next scanning location via the acquisition function. The overall search algorithm, incorporating the robust estimates of the mean and covariance of NDT data features into the Bayesian optimisation framework, is shown in Algorithm 2. The feature vector y could be any damage-sensitive feature, such as the ToF or attenuation mentioned previously; however, the MSD accommodates multivariate features, so high-dimensional objects like waveforms could work in principle [36]. In this respect, the overall algorithm presented is quite general.
An illustration of the application of Algorithm 2 is presented in Fig. 3. The one-dimensional section shown was chosen to cut across two regions of the panel containing damage. The example shows the result of the algorithm over several iterations, initialised with five random samples. Overlaid on the plots, in light blue, is the EI; its maximum indicates the point where the next observation is to be taken. Fig. 3 shows that, as expected, the algorithm balances exploration and exploitation. In the initial stage, it places more observations at high novelty index values, but then moves away to explore areas of low novelty index values but high uncertainty. As more observations are gathered and areas of low novelty are explored (and their variance thus reduced), the EI starts to weigh more heavily those locations with high function values. In the last instance shown in Fig. 3, observations have been placed evenly in areas of low novelty indices, and there is a higher density of observations around the areas of delamination. Also in this last instance, the EI indicates a clear priority for high novelty indices, meaning that if the algorithm were to run for more iterations, it would concentrate around areas of damage.
The mean and covariance of the data are re-estimated at every iteration of the algorithm in a robust manner using FastMCD. One should note that the MCD method defines inliers as those representing at least 50% of the data. These two points together imply that placing more than 50% of the observations around damaged regions will result in a shift where the damage area defines the inliers, and the data for the rest of the undamaged component are defined as outliers. This outcome is of course undesirable; as a remedy, any observations flagged as outliers after the initialisation step must be excluded from the set used to re-estimate the mean and covariance of the data.

Algorithm 2 Autonomous Inspection
Initialise μ, Σ and the inlier set z_in with a low-resolution C-scan
while stopping criterion not met do
    Acquire data y_t at x_t
    z ← mahal(z_in; μ, Σ)        ▷ Compute novelty scores
    z_in ← z[y_t < T]            ▷ Update outlier-excluding set
    μ, Σ ← FastMCD(z_in)         ▷ Update robust statistics
    D_t ← {D, (x_t, y_t)}        ▷ Include new observations
end while
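An end-to-end sketch of this loop on an invented one-dimensional specimen is given below (assuming scikit-learn and SciPy are available; MinCovDet stands in for FastMCD, GaussianProcessRegressor for the GP of Appendix B, and the outlier-excluding refinement of Algorithm 2 is omitted for brevity):

```python
import numpy as np
from scipy.stats import norm
from sklearn.covariance import MinCovDet
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)

# Hypothetical 1-D specimen: two features (ToF, attenuation) over x in [0, 1],
# with a simulated damaged patch around x = 0.7 (all values invented for illustration)
def measure(x):
    base = np.array([5.0, 0.8])
    bump = 3.0 * np.exp(-((x - 0.7) / 0.06) ** 2)
    return base + bump + rng.normal(0.0, 0.05, size=2)

X_grid = np.linspace(0.0, 1.0, 200)
X_obs = list(rng.uniform(0.0, 1.0, size=8))   # random initialisation
Y_obs = [measure(x) for x in X_obs]

for _ in range(25):
    # Robust statistics and novelty scores (FastMCD, via scikit-learn's MinCovDet)
    mcd = MinCovDet(random_state=0).fit(np.array(Y_obs))
    z = mcd.mahalanobis(np.array(Y_obs))
    # GP surrogate over the novelty index
    gp = GaussianProcessRegressor(RBF(0.1), alpha=1e-3, normalize_y=True)
    gp.fit(np.array(X_obs)[:, None], z)
    m, s = gp.predict(X_grid[:, None], return_std=True)
    # Expected improvement; acquire the next observation at its maximiser
    u = (m - z.max()) / (s + 1e-9)
    ei = (m - z.max()) * norm.cdf(u) + s * norm.pdf(u)
    x_next = float(X_grid[np.argmax(ei)])
    X_obs.append(x_next)
    Y_obs.append(measure(x_next))

# Count observations that landed near the simulated damage
near = sum(abs(x - 0.7) < 0.12 for x in X_obs)
```

After a few iterations, the observations concentrate around the simulated damage at x = 0.7, mirroring the exploration/exploitation behaviour illustrated in Fig. 3.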

Estimating probability of damage
An observation is flagged as abnormal if its novelty score is higher than the damage threshold. The GP provides a probabilistic estimate of the novelty scores over a two-dimensional spatial field. The probability that a given novelty score at spatial coordinate x corresponds to damage, is given by the probability that its novelty score exceeds the threshold T; this can be obtained using the mean and variance estimates from the GP, which define a local Gaussian distribution for the novelty scores through m_t and v_t.
The probability that this uncertain measurement lies above the threshold is thus given by the integral of the Gaussian probability density function above T, which can be evaluated from the cumulative distribution function of the Gaussian,

P(damage | x) = 1 − Φ( (T − m_t(x)) / √(v_t(x)) )        (3)

where Φ denotes the standard Gaussian cumulative distribution function. This quantity is assessed at every spatial coordinate for which a GP prediction is available. When the variance of the GP is high, the probability of damage will be higher due to lack of information. Note that this measure is sensitive to the definition of T, so an estimate of the threshold should be used that truly captures the variability of the maxima of the novelty indices. The probability of damage here is the probability that the GP prediction exceeds the threshold set as the highest novelty index that would be attributed to a normal-condition process. Fig. 4 provides an illustration of the problem using a small subset of the ultrasound NDT data from the wing panel. On the left, Fig. 4a shows a set of MSDs corresponding to the training data. Note that no values above the threshold are shown because, as per Algorithm 2 and the above discussion, these observations are discarded for the purpose of building a reference covariance model. On the right, Fig. 4b shows four different predictions of a GP, corresponding to 1) a low mean with low variance, 2) a low mean with high variance, 3) a high mean with high variance and 4) a high mean with low variance. Overlaid on the probability densities is the probability of damage, computed using Eq. (3).
This figure provides an intuitive view of this quantity; it is simply the amount of total probability mass above the threshold. In the case of an inlier GP prediction with low variance, this quantity is close to zero, and one can be confident that the observation is not an outlier. If variance increases, a more significant amount of probability mass lies above the threshold, and hence the probability of damage rises, even though the mean may still be within the threshold. In the case of outlying means, a high variance will tend to lower the probability of damage, while a low variance will have the effect of moving it closer to unity.
Note that the GP prediction is characterised by a Gaussian distribution, whose tails extend to infinity; there will therefore always be some small probability mass on either side of the threshold, so the probability of damage never reaches exactly zero or unity.
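As a minimal sketch, the probability of damage of Eq. (3) can be evaluated directly from the Gaussian cumulative distribution function; the threshold and the four mean/standard-deviation pairs below are hypothetical values chosen only to mirror the four cases of Fig. 4b.

```python
import numpy as np
from scipy.stats import norm

def probability_of_damage(mean, std, threshold):
    """P(damage) = P(GP prediction > T) = 1 - Phi((T - mean) / std)."""
    return 1.0 - norm.cdf(threshold, loc=mean, scale=std)

T = 10.0  # hypothetical threshold on the novelty index

p1 = probability_of_damage(5.0, 1.0, T)   # low mean, low variance: near zero
p2 = probability_of_damage(5.0, 5.0, T)   # low mean, high variance: raised by uncertainty
p3 = probability_of_damage(15.0, 5.0, T)  # high mean, high variance: pulled below unity
p4 = probability_of_damage(15.0, 1.0, T)  # high mean, low variance: near unity
```

Because the Gaussian tails never vanish, every probability computed this way lies strictly between zero and one, in line with the remark above.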

Two-dimensional example
An illustration of the autonomous inspection procedure on a two-dimensional dataset is shown in Fig. 5 (a subset of the data shown in Fig. 2). As before, ToF and ultrasound signal attenuation were used as features. The section shown in Fig. 5 is the region around the delaminated area of the composite, on the top section.
The figure shows the evolution of the location of the observations as data is gathered, illustrating the mean and variance of the GP, and the probability of damage derived from them. Note that as soon as a region of high novelty values is discovered, the search prioritises exploitation around this region, until the uncertainty around it is significantly reduced, and it becomes more worthwhile to explore elsewhere. Note that the result of the search is close to a grid search in regions of low novelty, with a resolution given by ℓ, while a higher resolution of observations is placed around potentially-damaged regions.
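The exploration/exploitation behaviour described above is driven by the acquisition function. A minimal sketch, assuming an Expected Improvement (EI) acquisition over the GP posterior, is given below; the posterior means, standard deviations and incumbent best novelty value are hypothetical, and in the real algorithm they would come from the fitted GP.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """EI(x) = (mu - f*) Phi(z) + sigma phi(z), with z = (mu - f*) / sigma.

    Novelty is maximised, so improvement is measured over the highest
    novelty index f* observed so far."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# Hypothetical candidate scan locations:
mu = np.array([0.2, 0.9, 0.4])      # GP posterior means
sigma = np.array([0.05, 0.1, 0.8])  # GP posterior standard deviations
best = 1.0                          # highest novelty index seen so far

ei = expected_improvement(mu, sigma, best)
next_idx = int(np.argmax(ei))       # location to scan next
```

Note that the third candidate wins despite its modest mean, because its high uncertainty makes an improvement plausible; this is the mechanism by which under-sampled regions are explored.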

Full scanning procedure results
Specimens containing different types of realistic damage are rare, so, to demonstrate the full algorithm across the whole specimen, it is divided into eight regions with and without damage. These regions are shown in Fig. 6. Note that only regions from nominally-similar areas of the composite specimen are considered here; these are the thin-wall sections. There are two areas of known delamination in this region, as well as areas free of damage.
The probability of damage resulting from the application of the autonomous inspection algorithm is shown in Fig. 7, as contours overlaid on the spatial coordinates of each of the eight sub-sections, analysed individually. The damage probability contours correctly localise the two known areas of delamination with a near-100% probability of damage. The two delaminated regions have cross-sections of no less than 50 mm. As such, the result is not too surprising, given that a length-scale of ℓ = 10 mm was used for this search; any regions with observations at a spacing much greater than ℓ will develop high uncertainty, and thus be prioritised by the search. This makes the identification of damage in regions two and six almost guaranteed. More interesting is the fact that the small area of delamination caught in zone three has been detected with high probability, even though the delaminated area (as observed through the original high-resolution C-scan) has a maximum length of only 5 mm.
As observations are collected sequentially, one is interested in the evolution of the probability of damage for every region; this is shown in Fig. 8. For each region of inspection, the interesting metric is the overall maximum observed damage; in order to summarise this, the 99th percentile of the estimated damage probability per region is plotted in Fig. 8. Note that this 'test' was run with a maximum budget of 500 observations per region, although some terminated early, due to the lowest predicted GP variance reaching the estimated σ²_n. In zones two and seven, where a probability of damage of 100% was estimated, termination occurred as the algorithm attempted to collect data at a previously observed location with a high novelty index value. This is an artefact of applying the algorithm to pre-recorded data, as there is a finite discretisation grid. The only region that did not converge in this case is region eight, albeit with a very small probability of damage, less than machine precision.
During the first 100 observations, the damage probability tends to jitter; this is because the robust statistics were re-estimated throughout the first 100 observations, and held fixed afterwards. Also, in regions with small manufacturing imperfections (evident as small round features, predominantly in zones one and four), the 99th percentile of the damage probability tends to jitter, instead of converging to a small value.
The regions with damage were found here with a very low number of observations. Any more observations from that point onwards only helped to confirm the presence, location and extent of damage (exploitation, in the Bayesian optimisation context), as well as exploring the remaining space, once the damage had been characterised well. Conversely, regions without damage took more observations to converge to a low damage probability, as it simply took more observations to reduce the uncertainty along all spatial coordinates by means of exploring the space. It is also interesting to see that in zone six, where only a 5 mm region of damage exists, the damage probability did not reach a level close to 100% until around 300 observations were acquired. Note that in this case there is a much larger spread in the resulting maxima of the damage probability, and it would take further observations to reduce this. The advantage of applying this autonomous inspection scheme is clear when one considers that the original C-scan data for each zone consisted of approximately 39,000 observations (individual ultrasound pulses, or A-scans). Performing the data collection sequentially using the Bayesian optimisation scheme reduces this significantly, while at the same time providing the required information in the form of a damage probability.

Discussion
A number of key factors affecting the algorithm implementation are discussed in this section.

Fig. 9. Effect of length-scale hyperparameter on uncertainty predictions of the GP. Example shows 3σ bounds for four GP posteriors, conditioned on two observations, using increasing length-scales. A Matérn 3/2 covariance function was used.

Initialisation
A sensible implementation of the approach suggested here requires a number of initial scan points before optimisation can begin; this is a requirement of both the outlier analysis and the Bayesian optimisation.
Considering the robust estimation of the mean and covariance of the novelty indices, an issue arises from the dimensionality d of the features extracted from the data. The lower bound on the number of observations is twice the dimensionality of the feature vector, for the covariance to be well defined. From this point of view, at least 2d observations are required prior to commencing the autonomous scan. The issue is exacerbated by the fact that some of the observations may be discarded from the set used by the Fast-MCD algorithm to compute the covariance.
A number of initial observations are also required to begin the Bayesian optimisation. For the GP to contribute information, at least one observation is required. In this limit, however, such a one-shot sequential design results in an initial GP model that is performing pure extrapolation. The GP predictions of where high function values and uncertain regions lie will tend to be poor in this extrapolation setting. The result would be an optimiser that takes much longer to converge, i.e. many more scans would be required.
It is beneficial to initialise the algorithm with a small number of points, by performing a low-resolution scan. In the field of Optimal Experimental Design (OED), a popular choice is Latin Hypercube Sampling (LHS) [37]. The main feature of LHS is that it ensures that samples are well distributed over the space of each individual dimension. One drawback of LHS is that it tends to underestimate the variance of the objective function [38]; this can have a serious impact on the performance of the objective function as a novelty detector. For this reason, random sampling from a uniform distribution, in the interval between the minimum and maximum x, y coordinates, is used. This is sub-optimal in terms of space-filling, but is known to provide an unbiased estimate of the objective function variance [38], which is ultimately more useful here.
Given the discussion above, the problem of exactly how many samples are required for initialisation translates to the problem of finding the number of samples n required to correctly capture the parameters of a Gaussian distribution of dimension p. The recommendation in [39] is to use a sample size of at least 14p in order to avoid non-convexities in the log-likelihood function used for estimation of the data feature covariance matrix. This specifies the minimum number of samples necessary for a well-determined problem. The minimum number of samples required to correctly capture the variance of the novelty scores is a different question, and will in practice tend to be greater than 14p. The important aspect is to gather enough samples to correctly capture the variance of the novelty indices, so in this paper, Monte Carlo samples are gathered until the novelty index variance converges to a stable value. For the wing panel ultrasound data used in this paper, 100 samples appeared to yield a stable variance.
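The initialisation described above, drawing uniform random scan locations until the novelty-index variance stabilises, can be sketched as follows; `sample_fn` is a hypothetical stand-in for taking a measurement and returning its novelty index, and the batch size and convergence tolerance are illustrative choices, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)

def initialise(sample_fn, x_range, y_range, batch=10, tol=0.05, max_n=500):
    """Draw uniform random (x, y) scan locations until the running variance
    of the novelty indices stabilises (relative change below `tol`)."""
    scores = []
    prev_var = None
    while len(scores) < max_n:
        for _ in range(batch):
            x = rng.uniform(*x_range)
            y = rng.uniform(*y_range)
            scores.append(sample_fn(x, y))
        var = np.var(scores)
        if prev_var is not None and abs(var - prev_var) <= tol * prev_var:
            break  # variance estimate has stabilised
        prev_var = var
    return np.array(scores)

# Hypothetical stand-in for a real novelty-index measurement:
fake_scan = lambda x, y: rng.normal(0.0, 1.0)
scores = initialise(fake_scan, (0.0, 100.0), (0.0, 50.0))
```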

Hyperparameter selection for GPs
Even though the GP is a 'nonparametric' regression technique [40], its behaviour is dependent on a small number of hyperparameters in the covariance function: the signal variance, the length-scale and the noise variance. Their values can greatly influence the result of the optimisation, so it is essential that they are chosen in a sensible manner, since they will determine the sensitivity of the proposed method. It has become commonplace in the GP literature to set the hyperparameters through an optimisation of the marginal likelihood of the GP [40]. However, it is also possible to set these hyperparameters based upon prior knowledge about the function being modelled; the latter is the approach used here for the autonomous inspection problem.
The signal variance of the GP, σ²_f, can control the exploration-exploitation balance of the optimiser. Because σ²_f defines an upper limit on the predictive variance v_t, if it is set too low, even with few observations, most areas of the space will default to σ²_f. The result is that the acquisition function (EI, or a similar measure) will choose to scan only at points of high mean values, thus performing pure exploitation; this would lead to a high-resolution scan around areas where flaws are suspected (high m_t), but an under-exploration of the rest of the space. In contrast, setting a high value of σ²_f would result in pure exploration, as it would lead to extremely high uncertainty in the space between observations. A middle ground is clearly necessary, and one reasonable solution would be to scale the data to unit standard deviation, and set the scaling hyperparameter to a reasonable value such as σ²_f = 1. However, the data in this case are novelty scores, evaluated sequentially, so the rescaling of the data would also need to happen sequentially, introducing another layer of complexity.
The length-scale of the covariance function, ℓ, controls the rate at which the variance grows in the input space as one moves away from observed values, i.e. the rate of return to the prior. In the problem of spatial sampling, this mediates the size of the region for which a sample can be said to be informative, and as such is the most critical hyperparameter. A short length-scale implies that a sample only provides information about a very small region, and vice versa. In this problem of spatial sampling, short length-scales will result in the GP variance growing at higher rates between observations placed far apart, while long length-scales will result in the GP predicting a lower variance between those same observations. This point is illustrated in Fig. 9, where the variance is shown for a GP conditioned on two observations, for a range of increasing length-scales.
In view of this, the length-scale can be seen as effectively controlling the risk one is willing to accept regarding the detection of a specific flaw size. It is the tuning parameter of this algorithm with the largest impact on the resulting data acquisition and on the overall probability of damage estimates. The data acquisition is altered through the effect of the length-scale on the acquisition function: short length-scales mean observations will be placed closer together, and vice versa.
The recommendation in this work is that the length-scale used should be based on the minimum expected flaw size. The form of the covariance function can shed some light on what this implies. For instance, the (unscaled, σ²_f = 1) Matérn 3/2 function used in this paper yields a correlation of approximately 1/2 at a distance ℓ apart, and a sharp decay in correlation beyond that. This means that good interpolation is expected from the GP up to a distance of ℓ; placing observations closer than this is inefficient, whilst placing observations further apart increases risk. In this way, the length-scale of the process can be set a priori by the user on the basis of their requirements.
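This property is easy to check numerically, assuming the standard Matérn 3/2 form k(r) = (1 + √3 r/ℓ) exp(−√3 r/ℓ); the ℓ = 10 mm value mirrors the one used in the scanning results.

```python
import numpy as np

def matern32(r, lengthscale):
    """Unscaled Matérn 3/2 correlation: (1 + sqrt(3) r/l) * exp(-sqrt(3) r/l)."""
    a = np.sqrt(3.0) * np.abs(r) / lengthscale
    return (1.0 + a) * np.exp(-a)

l = 10.0                   # mm, set from the minimum flaw size one risks missing
c_at_l = matern32(l, l)    # ~0.48: roughly half correlation one length-scale apart
c_far = matern32(5 * l, l) # sharp decay well beyond the length-scale
```

The correlation one length-scale apart is ≈ 0.48 (close to the 1/2 quoted above), while five length-scales apart it is effectively zero, which is the sense in which observations spaced further than ℓ stop informing one another.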
If a component were to be scanned that contained no anomalies whatsoever, this autonomous inspection scheme based on Bayesian optimisation should result in a scan with an average spatial resolution of ℓ (the acquisition of data would change when high novelty scores are found).
The other two hyperparameters also play an important role in the uncertainty quantification given by the GP: the process variance σ²_f and the noise variance σ²_n. The process variance places an upper bound on the total possible uncertainty; it effectively specifies prior information on the uncertainty and defines the variance in the absence of any observations. The noise variance does exactly the opposite, placing a lower bound on the uncertainty by specifying the variance around regions where many observations exist. The noise variance thus provides an indication of the inherent variability in novelty indices that is to be expected when taking the same observation multiple times at the same location.
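These bounds can be verified with a minimal one-dimensional GP sketch (the hyperparameter values and observation locations are hypothetical; the paper's scan data are two-dimensional): the predictive variance of a new noisy observation never falls below σ²_n and never exceeds σ²_f + σ²_n.

```python
import numpy as np

def matern32(r, l):
    # Unscaled Matérn 3/2 correlation function.
    a = np.sqrt(3.0) * np.abs(r) / l
    return (1.0 + a) * np.exp(-a)

sf2, sn2, l = 4.0, 0.1, 10.0        # process variance, noise variance, length-scale
X = np.array([20.0, 30.0])          # two observed scan locations (1-D for brevity)
Xs = np.linspace(0.0, 100.0, 201)   # prediction grid

K = sf2 * matern32(X[:, None] - X[None, :], l) + sn2 * np.eye(len(X))
Ks = sf2 * matern32(Xs[:, None] - X[None, :], l)

# Predictive variance of a new (noisy) observation at each grid point:
v = sf2 + sn2 - np.einsum('ij,jk,ik->i', Ks, np.linalg.inv(K), Ks)
```

Near the two observations the variance approaches the noise floor σ²_n, while far from them it returns to the prior level σ²_f + σ²_n, exactly the behaviour described above.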

Convergence
The lower bound on the variance of the GP prediction given by σ²_n is clearly important from the point of view of convergence criteria. If all the regions of high function values have been explored and the uncertainty across the whole range of possible measurement locations has been reduced to a level close to σ²_n, then there is little more benefit to be gained from further observations, and the search should be stopped. An estimate of the noise is available from the Fast-MCD procedure.
The length-scale controls the total number of observations required to achieve a given spatial resolution (Section 6.2), shrinking the uncertainty across the specimen to the lower bound. Short length-scales lead to low convergence rates, and vice versa. This effect is illustrated in Fig. 10, where the average variance and its 3σ bounds are shown for 800 iterations of Algorithm 2 over a two-dimensional sub-section of a composite specimen.
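A minimal sketch of such a stopping rule is given below; the relative tolerance and observation budget are illustrative choices, not the paper's values (the results above used a budget of 500 observations per region).

```python
import numpy as np

def should_stop(predictive_variances, noise_var, n_obs, rtol=0.05, budget=500):
    """Stop once the largest predictive variance is within `rtol` of the
    noise floor sigma_n^2, or once the observation budget is exhausted."""
    converged = np.max(predictive_variances) <= (1.0 + rtol) * noise_var
    return bool(converged) or n_obs >= budget
```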
The final factor affecting when to stop scanning is the computational burden of the GP; this is not directly addressed in this paper, although it is discussed in detail (generally) in the literature [41,42], and specifically for spatial data in [43].

Conclusions
A fully data-driven framework for autonomous inspection based on a Bayesian optimisation over robust novelty indices has been presented and demonstrated on an NDT data set consisting of ultrasound measurements on a high-value aerospace composite specimen. The specific specimen was used as a demonstration as it contains both damage-free and delaminated sections. The algorithm is data-driven, making use of robust inference of the mean and covariance of the NDT data features. This means that outliers that are present in the data set are identified and excluded from the estimation of the data statistics. The inspection scheme presented here is probabilistic, and it is aimed at minimising risk against an expected flaw size. The risk minimisation is provided by utilising Bayesian optimisation techniques, which also allow this algorithm to run sequentially. The impact is that the damage inference can be performed on-line and terminated either when no further information can be gained from the scan or when damage has been found. Bayesian optimisation is designed to find optimal points using the minimum number of observations possible. In the context of inspection, this implies that if a specimen does contain damage, or other anomalies, they will be found quickly, and the inspection will focus around these areas. If no damage is present the inspection will focus on minimising risk by means of exploring the space. This paper has focussed on the presentation of the algorithm. The key avenues of future work will be to address fully how this methodology can be implemented in real time and to demonstrate this on a full-system example. For this to happen it will be necessary to rigorously demonstrate the improvement in scanning speed over a pre-planned inspection path, to show how a computationally efficient form of the algorithm may be implemented and to demonstrate the effectiveness of physical selection of the hyperparameters of the algorithm. 
Overall, this algorithm enables autonomous inspection: it removes human decision-making regarding the appropriate areas to scan, replacing it with sequential decision-making guided by on-line autonomous interpretation of the measured data.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

the smallest possible determinant that leaves out h observations from the training set, where h is the number of outliers. One advantage of the MCD over the MVE is that the MCD is affine equivariant: location and scale parameters remain consistent under rescaling and translation of the data. Furthermore, the estimates of μ under MCD schemes have been shown to be asymptotically normal [49]. Beyond these theoretical guarantees, one useful practical advantage comes from the existence of an efficient way to compute MCD parameter estimates: the Fast-MCD algorithm [50]. Exact MCD estimates are numerically expensive to compute, requiring an evaluation of all possible subsets of size h. Fast-MCD uses an iterative scheme to achieve an MCD estimate with efficient computation, as described in Algorithm 3. The key step is to iteratively form h-subsets of the data matrix Y which contain values of low Mahalanobis distance (Eq. (A.1)) against the current estimates of Σ and μ. While the Fast-MCD algorithm is naturally greedy, it is guaranteed never to increase the determinant of the covariance, det(Σ), at any iteration [50]; the stopping criterion for the algorithm is such that it continues as long as det(Σₖ) < det(Σₖ₋₁).
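The concentration step at the heart of Fast-MCD can be sketched as follows; this is an illustrative, simplified version (random restarts only, without the nested extraction and reweighting refinements of the full algorithm [50]), applied to synthetic contaminated data in the spirit of Fig. A.11.

```python
import numpy as np

def c_steps(Y, idx, h):
    """One run of concentration steps from a starting h-subset."""
    prev_det = np.inf
    while True:
        mu = Y[idx].mean(axis=0)
        S = np.cov(Y[idx], rowvar=False)
        det = np.linalg.det(S)
        if det >= prev_det - 1e-12:  # det(S) never increases; stop when it stalls
            return det, mu, S
        prev_det = det
        diff = Y - mu
        md2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
        idx = np.argsort(md2)[:h]    # concentrate on the h closest points

def fast_mcd(Y, h, n_restarts=20, seed=0):
    """Simplified Fast-MCD: best fit over several random starting subsets."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_restarts):
        idx = rng.choice(len(Y), size=h, replace=False)
        result = c_steps(Y, idx, h)
        if result[0] < best[0]:
            best = result
    return best[1], best[2]

# Synthetic data: 95 inliers plus a small cluster of outliers:
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(0.0, 1.0, size=(95, 2)),
               rng.normal(8.0, 0.5, size=(5, 2))])
mu_r, S_r = fast_mcd(Y, h=75)
```

The robust mean stays near the inlier centre and the outlying cluster receives a very large Mahalanobis distance under the robust fit, which is precisely the behaviour exploited by the novelty detector.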
An application of covariance estimation with the Fast-MCD algorithm is given in Fig. A.11 for an illustrative synthetic data set. The data are samples from a two-dimensional Gaussian random variable, contaminated with outliers. The original distribution of the uncontaminated data is shown in light-grey, the maximum-likelihood covariance is plotted with a dashed line, while the Fast-MCD fit is shown with a solid line. The bias that the outliers introduce to the covariance is clear: they increase the scatter. The robust covariance gives a better approximation to the original density.

A.2. Application to NDT data
The Mahalanobis-squared distance (MSD), using robust estimates of the data density, is central to the scanning procedure presented in this paper, which combines Bayesian optimisation with outlier analysis. The application of robust outlier analysis to NDT data will be briefly discussed here, in order to make clearer the application of Bayesian optimisation, presented in Section 2, to the problem of outlier analysis.
The goal is to assemble a mean and covariance of a reference, undamaged condition, even when there are outliers present in the data, which as discussed above, is often the case.
The NDT features discussed in Section 1.2, namely the Time of Flight (ToF) and attenuation, will be used here to demonstrate the idea of novelty detection using robust measures. In order to simplify the problem, a one-dimensional task is considered; data from a cross-section along the y-axis of the aerospace panel shown in Fig. 2 have been used. The ToF, together with the robust estimates (using Fast-MCD) of its mean and variance, are shown in Fig. A.12a. This cross-section was selected as it contains two small regions where delamination of the carbon-fibre composite has occurred; these are evident as areas of much lower ToF compared to the rest of the specimen, due to the reflection of the wave from the delaminated section. Fig. A.12b shows the resulting robust MSD for all of the observations. The red horizontal line shows the threshold, which has been defined as the MSD of the 3σ values of Y (just the ToF in this particular case). Any observations with an MSD above this threshold are defined as outlying or abnormal. The two regions where there is delamination show clear excursions above the threshold. Other outliers, visible on the left side of the plot, are not at known damage locations, so it is likely that they have arisen from electrical noise or bad contact between the ultrasound probe and the specimen.
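In the univariate case, the robust MSD reduces to a squared standardised distance, and a threshold set at the MSD of the 3σ point is simply 9. A small sketch with illustrative numbers (not the wing-panel data; the mean and variance stand in for Fast-MCD estimates):

```python
import numpy as np

# Hypothetical robust estimates (e.g. from Fast-MCD) of the ToF mean and variance:
mu, var = 12.0, 0.25

tof = np.array([12.1, 11.8, 12.4, 8.5, 12.0, 9.2])  # illustrative ToF readings
msd = (tof - mu) ** 2 / var                          # univariate robust MSD

# Threshold: the MSD of a point three robust standard deviations from the mean;
# in one dimension, (3*sigma)^2 / sigma^2 = 9.
T = (3.0 * np.sqrt(var)) ** 2 / var
outliers = msd > T
```

Here the two low-ToF readings exceed the threshold, mimicking the excursions at the delaminated regions in Fig. A.12b.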

A.3. Threshold estimation
One important aspect of the algorithm, in particular the novelty detection part, is the definition of a threshold over the MSD above which observations are defined as outliers. There is more than one way of addressing the threshold selection problem; methods commonly used in SHM include the use of percentiles, Monte Carlo sampling [3], and extreme value statistics [51,52]. For simplicity here, the threshold was determined by computing the MSD of points six robust standard deviations away from the robust mean, along each dimension. This thresholding methodology makes use of the robust covariance computed by Fast-MCD, which has already been tuned for the separation of inclusive outliers.

For comparison purposes, the GP predictions are shown using two different covariance functions: the Matérn 3/2 of Eq. (B.7) and the squared-exponential kernel [40]. The squared-exponential is known to produce smooth functions, since it is infinitely differentiable. Fig. B.13 zooms in on a region around the boundary between an undamaged and a damaged condition. The Matérn kernel deals better with this sudden change. The length-scales in this case were fixed to ℓ = 10 and the noise variance was set at σ²_n = 10. In this example, the squared-exponential kernel imposes unrealistic assumptions on the data, whereas the Matérn 3/2 kernel is better able to model the transition between undamaged and damaged conditions.