A Comparison of Linear Approaches to Filter Out Environmental Effects in Structural Health Monitoring

This paper discusses the possibility of using the Mahalanobis squared-distance to perform robust novelty detection in the presence of significant environmental variability in a multivariate feature vector. By performing an eigenvalue decomposition of the covariance matrix used to compute that distance, it is shown that the Mahalanobis squared-distance can be written as the sum of independent terms which result from a transformation from the feature vector space to a space of independent variables. In general, especially when the dimension of the feature vector is large, there are dominant eigenvalues and eigenvectors associated with the covariance matrix, so that a set of principal components can be defined. Because the associated eigenvalues are high, their contribution to the Mahalanobis squared-distance is low, while the contribution of the other components is high due to the low value of the associated eigenvalues. This analysis shows that the Mahalanobis squared-distance naturally filters out the variability in the training data. This property can be used to remove the effect of the environment in damage detection, in much the same way as two other established techniques, principal component analysis and factor analysis. The three techniques are compared here using real experimental data from a wooden bridge, for which the feature vector consists of eigenfrequencies and mode shapes collected under changing environmental conditions, as well as damaged conditions simulated with an added mass. The results confirm the similarity between the three techniques and their ability to filter out environmental effects while keeping a high sensitivity to structural changes. The results also show that, even after filtering out the environmental effects, the normality assumption cannot be made for the residual feature vector.
An alternative based on extreme value statistics is demonstrated here, resulting in a much better threshold which avoids false positives in the training data while allowing detection of all damaged cases.


Introduction
Vibration-based Structural Health Monitoring (SHM) techniques have been around for many years and are still today an active topic of research. Despite this fact, very few industrial applications exist. Two major trends coexist in the field: model-based and data-based techniques. Model-based techniques are often sophisticated and require a high degree of engineering knowledge and heavy hardware and software resources; they have however more potential to cover all levels of SHM, from damage detection to damage prognosis. On the other hand, data-based techniques are appealing as they require less engineering knowledge as well as limited hardware and software resources. From that point of view, they are ideal candidates for industrial applications. These methods are, however, generally limited to the lowest levels of SHM, typically damage detection.

The three basic elements of data-based damage detection are therefore (i) a permanent sensor network system, (ii) an automated procedure for real-time or periodic feature extraction, and (iii) a robust novelty detector [2]. The first element has received much attention in the last decade, and the enormous advances in sensors and instrumentation make it possible today to deploy very large sensor networks on structures and to gather the measured data in central recording units at high sampling rates. The second element is still a challenge today and, for the most widely-used features (eigenfrequencies and mode shapes), an active topic of research [3,4]. An alternative is to look at other features which can easily be extracted automatically from the time-domain data. Several efforts have been made in that direction, such as the use of residuals based on Hankel matrices [5], or peak indicators in the frequency output of modal filters [3].
For the third element, different approaches have been borrowed from statistics and machine learning, the most commonly used being control charts [6], outlier analysis using the Mahalanobis squared-distance [7] (which is similar to Hotelling T² control charts for individual measurements) and hypothesis testing [5].

This paper focuses on the third element of the data-based damage detection system in the presence of confounding effects due to environmental and operational variability. Significant variability of the dynamic properties of structures under ambient vibrations has been identified in a number of studies [8,9,10,11]. These studies highlight the fact that variations in the dynamic characteristics due to confounding effects (temperature, humidity, traffic, wind) can be of the same order of magnitude as, or greater than, the variations due to damage, which hinders detection of the onset of damage. In the context of data-based damage detection, which is the focus of this paper, several methods have been proposed in order to filter out these effects. The simplest methods are based on the identification of the linear subspace to which the environmental and operational conditions belong, in order to remove their effect on the monitored features. Such methods are well suited when the dimension of the feature vector is large enough for a linear subspace containing the confounding effects to be found. When this is not the case, non-linear methods, which consist of identifying a non-linear manifold instead of a linear subspace, can be used [12,13,14]. These techniques are however much more complex and computationally costly.
The present paper has two aims: (i) to show that the Mahalanobis squared-distance can be used to filter out confounding effects in a very similar way to the linear techniques, and (ii) to demonstrate the link between this technique and the two most commonly used linear techniques for filtering out confounding effects: principal component analysis [15] and factor analysis [16]. This paper is organised as follows: the second section details the mathematical foundations of the methodology proposed to filter out confounding effects based on the Mahalanobis squared-distance. The third section briefly presents the two most commonly used linear techniques for filtering out confounding effects, principal component analysis and factor analysis, and shows their link with the method proposed in Section 2. In the fourth section, the three techniques are applied to data acquired from a wooden bridge under changing environmental conditions.

The results clearly demonstrate the strong similarities between the three techniques.

The Mahalanobis squared-distance
Consider a set of N feature vectors {y_i} (i = 1...N) of dimension n, representing N samples of the 'healthy state' of a structure, from which the mean vector {ȳ} of size n × 1 and the covariance matrix [C] of size n × n can be estimated as follows:

\{\bar{y}\} = \frac{1}{N}\sum_{i=1}^{N}\{y_i\}, \qquad [C] = \frac{1}{N-1}\sum_{i=1}^{N}\left(\{y_i\}-\{\bar{y}\}\right)\left(\{y_i\}-\{\bar{y}\}\right)^T

The multivariate feature vectors correspond to features extracted from the vibration measurements, such as a set of eigenfrequencies, mode shapes, FRFs or transmissibility functions at given frequencies, etc. (Throughout this paper, curly braces denote vectors, and square braces denote matrices.)

The principle of outlier analysis [7] is, for each sample {y_ζ} of the multivariate feature vector, to compute the Mahalanobis squared-distance [17] given by:

D^2_\zeta = \left(\{y_\zeta\}-\{\bar{y}\}\right)^T [C]^{-1} \left(\{y_\zeta\}-\{\bar{y}\}\right)

By computing D²_ζ for all the data in the training set used to compute [C], it is possible to set a threshold. If a new sample {y_ζ} of the feature vector results in a value of D²_ζ above this threshold, it is considered an outlier. Outlier analysis is inclusive when the potentially damaged data are included in the computation of the mean vector and covariance matrix, and exclusive when only the undamaged data are used.
Outlier analysis using the Mahalanobis squared-distance has been used in various application fields such as tool condition monitoring [18,19], monitoring of cracking in concrete [20], monitoring of railroad tracks [21] and disease diagnosis [22].
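As an illustration, the exclusive outlier analysis described above can be sketched with numpy; the synthetic training data, the helper name `mahalanobis_sq` and the crude empirical threshold are all choices made here for the example, not part of the paper's method.

```python
import numpy as np

def mahalanobis_sq(Y_train, y_new):
    """Mahalanobis squared-distance of y_new w.r.t. the training set.

    Y_train: (N, n) array of N healthy-state feature vectors.
    y_new:   (n,) feature vector to test.
    """
    y_bar = Y_train.mean(axis=0)
    C = np.cov(Y_train, rowvar=False)           # (n, n) covariance matrix
    d = y_new - y_bar
    return float(d @ np.linalg.solve(C, d))     # d^T [C]^{-1} d

# Illustrative training data: 500 samples of a 3-dimensional feature vector
rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
# Inclusive distances on the training data, and a crude empirical threshold
D_train = np.array([mahalanobis_sq(Y, y) for y in Y])
threshold = np.quantile(D_train, 0.999)
print(mahalanobis_sq(Y, Y.mean(axis=0) + 10) > threshold)  # True: far outlier
```

A new sample well outside the training cloud exceeds the threshold, while healthy samples remain below it.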

Spectral decomposition
In most cases, the features in the data vector are not statistically independent, so that the covariance matrix is not diagonal. It is however possible to perform a transformation of the feature vector in order to diagonalise the covariance matrix. This is done by computing the eigenvectors {U_i} and eigenvalues σ_i² of [C]:

[C]\{U_i\} = \sigma_i^2 \{U_i\}

The orthogonality properties are given by:

[U]^T[U] = [I], \qquad [U]^T[C][U] = [S]

where [U] is the matrix whose columns contain all the eigenvectors, [S] is a diagonal matrix containing the eigenvalues σ_i² in descending order of magnitude on the diagonal, and [I] is the identity matrix. The spectral decomposition of the covariance matrix is given by

[C] = [U][S][U]^T = \sum_{i=1}^{n} \sigma_i^2 \{U_i\}\{U_i\}^T

and the spectral decomposition of the inverse of the covariance matrix by

[C]^{-1} = [U][S]^{-1}[U]^T = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \{U_i\}\{U_i\}^T

Assume now the following transformation:

\{\eta\} = [U]^T\left(\{y\}-\{\bar{y}\}\right)

The mean and covariance matrix estimated from the N transformed samples {η_i} (i = 1...N) are given by

\{\bar{\eta}\} = \{0\}, \qquad [C_\eta] = [U]^T[C][U] = [S]

and, by using the orthogonality condition, one sees directly that the covariance matrix of {η} is diagonal and that the standard deviation of each component η_i is given by σ_i. Using the inverse transformation, the Mahalanobis squared-distance reduces to

D^2_\zeta = \sum_{i=1}^{n} \frac{\eta_{\zeta i}^2}{\sigma_i^2}

This shows that the Mahalanobis squared-distance can be decomposed into a sum of independent contributions from each component of the transformed variables η_ζi = {U_i}^T({y_ζ}-{ȳ}).

The contributions are weighted by the inverse of the associated eigenvalues σ 2 i , which can be interpreted as the variances of the new, transformed variables. If the variance is large, the contribution to the distance is small.
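The decomposition above can be checked numerically: the classic quadratic form and the sum over the transformed variables agree to machine precision. This is a sketch with synthetic data; `numpy.linalg.eigh` is used since [C] is symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic feature vectors with very different spreads per direction
Y = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
y_bar = Y.mean(axis=0)
C = np.cov(Y, rowvar=False)

# Eigen-decomposition [C] = [U][S][U]^T (eigh returns ascending eigenvalues)
s2, U = np.linalg.eigh(C)

y = Y[0]
d = y - y_bar
D2_full = d @ np.linalg.solve(C, d)            # classic Mahalanobis form
eta = U.T @ d                                  # transformed variables
D2_sum = np.sum(eta**2 / s2)                   # sum of independent terms
print(np.isclose(D2_full, D2_sum))             # True: the two forms agree
```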

Filtering of the environmental effects
In many cases, when the number of features is large enough, the total variability in the feature vector extracted from the healthy condition can be explained by a smaller number of transformed features, usually called the principal component scores. Strictly speaking, this occurs when some of the eigenvalues of [C] are equal to zero. The associated eigenvectors form the null-space (or kernel) of the training data. In practice, due to noise and numerical precision issues, the eigenvalues are not strictly equal to zero, but a significant drop in the eigenvalues can be observed and can be used to define the number of principal components which account for most of the variability. An effective null-space is defined by putting a threshold on the singular values, assuming that the singular values below this threshold are non-zero only due to noise in the training data. In the following, the effective null-space or kernel will simply be called the 'null-space' or 'kernel'. A practical way to determine the number p of vectors in the principal subspace is to define the following indicator:

I(p) = 100\,\frac{\sum_{i=1}^{p}\sigma_i^2}{\sum_{i=1}^{n}\sigma_i^2}

and to determine p as the lowest integer such that I > e(%), where e is a threshold value (e.g. 99.9%). The meaning of this threshold is as follows: p principal components are needed in order to explain e% of the variance in the observed data. Assume that these p principal components have been identified. For a new sample of the feature vector {y_ζ}, the Mahalanobis squared-distance can be decomposed into two parts:

D^2_\zeta = D^2_{1\zeta} + D^2_{2\zeta}, \qquad D^2_{1\zeta} = \sum_{i=1}^{p}\frac{\eta_{\zeta i}^2}{\sigma_i^2}, \qquad D^2_{2\zeta} = \sum_{i=p+1}^{n}\frac{\eta_{\zeta i}^2}{\sigma_i^2}

where D²_{1ζ} is the Mahalanobis squared-distance of {y_ζ} projected onto the principal components, and D²_{2ζ} is the Mahalanobis squared-distance of {y_ζ} projected onto the null-space of the principal components.
Assume now that very large variability exists in the feature vector extracted from the healthy condition due to environmental effects. If this variability is more important than other sources such as noise, it will be captured by the set of the first p principal components. Because the Mahalanobis squared-distance scales each independent component by the inverse of its variance, the distance will have a very low sensitivity to the environmental changes. By including feature vectors measured in all possible environmental conditions in the computation of the covariance matrix, the Mahalanobis squared-distance is made insensitive to the environmental conditions. This idea is demonstrated in Section 4 on a laboratory experiment.
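The selection of p from the variance indicator, and the split of the distance into principal-subspace and null-space parts, can be sketched as follows. The synthetic data (three dominant "environmental" directions mixed into eight features, plus small noise) and all dimensions are illustrative; only the e = 99.9% threshold follows the text.

```python
import numpy as np

rng = np.random.default_rng(2)
# Healthy-state data: 3 dominant "environmental" directions + small noise
latent = rng.normal(size=(1000, 3)) * np.array([10.0, 5.0, 2.0])
mix = rng.normal(size=(3, 8))
Y = latent @ mix + 0.01 * rng.normal(size=(1000, 8))

C = np.cov(Y, rowvar=False)
s2, U = np.linalg.eigh(C)
s2, U = s2[::-1], U[:, ::-1]                   # descending eigenvalues

# Indicator I(p) = 100 * sum_{i<=p} sigma_i^2 / sum_i sigma_i^2
cum = 100.0 * np.cumsum(s2) / np.sum(s2)
p = int(np.searchsorted(cum, 99.9) + 1)        # smallest p with I > 99.9%
print(p)                                       # number of dominant components

d = Y[0] - Y.mean(axis=0)
eta = U.T @ d
D2_1 = np.sum(eta[:p]**2 / s2[:p])             # principal-subspace part
D2_2 = np.sum(eta[p:]**2 / s2[p:])             # null-space part
D2_full = d @ np.linalg.solve(C, d)
print(np.isclose(D2_1 + D2_2, D2_full))        # True: the split is exact
```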

Principal Component Analysis and Factor Analysis
Principal component analysis can be used to filter out confounding effects by projecting the feature vector onto the subspace of the minor components, once the major components have been identified on data containing variability due to the confounding effects [28,29]. If the Mahalanobis squared-distance is subsequently used to perform outlier analysis, it reduces to computing

D^2_{2\zeta} = \sum_{i=p+1}^{n}\frac{\eta_{\zeta i}^2}{\sigma_i^2}

which is equivalent to considering that σ_i = ∞ for i ≤ p. One can easily see that if there is a clear drop in the σ_i values, computing the full Mahalanobis squared-distance is equivalent to computing the Mahalanobis squared-distance of the feature vector projected onto the subspace of minor components.

This is illustrated with the following example. In Figure 1, the feature vectors in two dimensions have been built from two normal distributions with σ_1 = 3 and σ_2 = 1, using a rotation angle of π/3. Points equidistant from the mean lie on an ellipse which is rotated at an angle π/3 with respect to the feature axes. As σ_1 and σ_2 are of the same order of magnitude, there is no possibility to clearly define the linear subspace to which the confounding effects belong.

In Figure 2, consider now the same case but with σ_2 = 0.1. One can see clearly that the feature vectors almost lie on a line, which is a subspace of the feature vector space. The Mahalanobis squared-distance can be decomposed into two independent terms. The first term corresponds to the projection of the feature vector in the direction of the line (the direction with large σ_i). This term is only sensitive to a change of the feature vector in the principal direction. Looking only at this contribution corresponds to projecting the feature vector onto the first principal component. Note however that its contribution to the Mahalanobis squared-distance will be weak, because it is weighted by 1/σ_1² where σ_1 is large.
The second term corresponds to the projection of the feature vector on the direction perpendicular to the line (the direction with small σ_i). This term is only sensitive to a change of the feature vector in the minor direction (as opposed to the principal direction). Looking only at this contribution corresponds to projecting the feature vector onto the direction orthogonal to the principal component. Its contribution to the Mahalanobis squared-distance will be very strong, because it is weighted by 1/σ_2² where σ_2 is very low.

The procedure used to identify the linear subspace and to project the data onto the orthogonal subspace is slightly different in factor analysis [16]. In order to show this, the theory behind it and its link with principal component analysis are briefly recalled in the following.
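The two-dimensional example of Figure 2 can be reproduced numerically (σ_1 = 3, σ_2 = 0.1 and a rotation of π/3, as in the text; the shift magnitude is an arbitrary choice): a given shift along the minor direction contributes far more to the distance than the same shift along the principal direction.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Confounded direction has sigma_1 = 3, minor direction sigma_2 = 0.1
Y = (rng.normal(size=(2000, 2)) * np.array([3.0, 0.1])) @ R.T

C = np.cov(Y, rowvar=False)
s2, U = np.linalg.eigh(C)
s2, U = s2[::-1], U[:, ::-1]                   # descending: s2 ~ [9, 0.01]

# The same deviation applied along the minor vs. the principal direction
shift = 0.5
d_minor = shift * U[:, 1]
d_major = shift * U[:, 0]
D2_minor = d_minor @ np.linalg.solve(C, d_minor)  # ~ shift^2 / 0.01 = 25
D2_major = d_major @ np.linalg.solve(C, d_major)  # ~ shift^2 / 9 = 0.028
print(D2_minor > 100 * D2_major)               # True
```

This is exactly why the distance stays sensitive to damage (which perturbs the minor directions) while remaining insensitive to environmental drift along the principal direction.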
Assume, as previously, that all the features are arranged in a vector {y}, which can be written as a (generally non-linear) mapping of a set of unobservable factors {ξ}:

\{y\} = f\left(\{\xi\}\right) + \{\varepsilon\}

The terminology unobservable factors is used in factor analysis [30]. This non-linear mapping is generally unknown and there is no need to identify it in the present application. The unobservable factors {ξ} are assumed to be statistically uncorrelated variables with unit standard deviation, and {ε} is an error term. In linear factor analysis, a linear mapping [Λ] is used between the unobservable factors and the features, and the equation above becomes

\{y\} = [\Lambda]\{\xi\} + \{\varepsilon\}

Due to the statistical hypotheses on {ξ} and {ε}, the covariance matrix can be written as

[C] = [\Lambda][\Lambda]^T + [\Psi]

where [Ψ] is the covariance matrix of {ε}. A single iteration of the estimation then consists of the following steps:
- compute the eigenvalues σ_i² and eigenvectors {U_i} of the matrix [C];
- determine the p eigenvectors needed to represent e% of the variance in the data and take [Λ] = [U_p][S_p]^{1/2}, where [U_p] contains the first p eigenvectors and [S_p] the corresponding eigenvalues.

It is not obvious that the matrix [Ψ] should be a diagonal matrix in SHM applications.
Most of the features extracted for SHM contain some noise which is not added directly and independently on each feature, but results from the noise in the data acquisition chain transformed by the feature extraction process, very likely causing correlations between the noise on the different features (see for example the study on mode shapes in [31]).

Once the matrix [Λ] has been identified, the removal of the environmental effects consists of determining the best approximation of the feature vector {y} and removing it from the feature vector in order to build a residual:

\{\varepsilon\} = \{y\} - [\Lambda]\{\hat{\xi}\}

There are several choices for building the best approximation, but previous studies have shown that all of them lead to very similar results. The simplest approximation is based on ordinary least squares, giving the following estimate:

\{\hat{\xi}\} = \left([\Lambda]^T[\Lambda]\right)^{-1}[\Lambda]^T\{y\}

which reduces to

\{\hat{\xi}\} = [S^*]^{-1}[\Lambda]^T\{y\}

where [Λ]^T[Λ] = [S*] is a diagonal matrix with the eigenvalues σ_i², i = 1...p, on the diagonal. This solution is the one that minimises the L_2-norm of the error, given by

J = \left(\{y\}-[\Lambda]\{\xi\}\right)^T\left(\{y\}-[\Lambda]\{\xi\}\right)

An alternative to this approach is to use generalised least squares, where the norm contains a weighting matrix [W], such that

J_W = \left(\{y\}-[\Lambda]\{\xi\}\right)^T[W]\left(\{y\}-[\Lambda]\{\xi\}\right)

The minimum of this norm is given by

\{\hat{\xi}\} = \left([\Lambda]^T[W][\Lambda]\right)^{-1}[\Lambda]^T[W]\{y\}

which, choosing for example [W] = [C]^{-1}, reduces to an expression identical to the ordinary least-squares estimate. Other alternatives can be based on Bartlett's and Thomson's factor scores, which have been found to give very similar results [32]. In this study, the ordinary least-squares estimate is used. The residual can then be written

\{\varepsilon\} = \left([I] - [\Lambda]\left([\Lambda]^T[\Lambda]\right)^{-1}[\Lambda]^T\right)\{y\} = \left([I] - [U_p][U_p]^T\right)\{y\}

It is equal to the projection of the feature vector onto the null-space of the subspace spanned by the vectors of [Λ], which, as discussed earlier, is the subspace of the principal components if a single iteration is used in FA.
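These equivalences (the OLS residual equals the projection onto the complement of the principal subspace, and GLS with [W] = [C]^{-1} gives the same factor scores as OLS) can be verified numerically. The data and dimensions below are purely illustrative, with [Λ] built from a single principal-factor iteration as described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 10, 3
# Synthetic data with 3 dominant latent directions plus small noise
latent = rng.normal(size=(500, p)) * np.array([8.0, 4.0, 2.0])
mix = rng.normal(size=(p, n))
Y = latent @ mix + 0.05 * rng.normal(size=(500, n))

C = np.cov(Y, rowvar=False)
s2, U = np.linalg.eigh(C)
s2, U = s2[::-1], U[:, ::-1]                   # descending eigenvalues
Lam = U[:, :p] * np.sqrt(s2[:p])               # [Lambda] = [U_p][S_p]^{1/2}

y = Y[0] - Y.mean(axis=0)
# Ordinary least-squares factor scores and residual
xi_ols = np.linalg.solve(Lam.T @ Lam, Lam.T @ y)
eps = y - Lam @ xi_ols
# Residual = projection onto the complement of the principal subspace
eps_proj = (np.eye(n) - U[:, :p] @ U[:, :p].T) @ y
print(np.allclose(eps, eps_proj))              # True

# Generalised least squares with W = C^{-1} gives the same factor scores
W = np.linalg.inv(C)
xi_gls = np.linalg.solve(Lam.T @ W @ Lam, Lam.T @ W @ y)
print(np.allclose(xi_ols, xi_gls))             # True
```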
The above developments clearly show the link between PCA and FA, and how these two methods can be related to the use of the Mahalanobis squared-distance proposed in Section 2. This is further illustrated in the next section, where the three methods are applied to data from a laboratory experiment.

Application: Wooden Bridge
Consider the wooden bridge shown in Figure 5, equipped with a monitoring system, previously developed and investigated in [33]. The total mass of the bridge is 36 kg. A random excitation was applied using an electrodynamic shaker and output-only acceleration measurements were collected at 15 different locations. Mode shapes and eigenfrequencies were extracted from the measurements using output-only stochastic subspace identification. The monitoring was performed over several days, during which the modal properties varied significantly due to temperature and humidity changes. Out of the 16 modes identified, only modes 6-8, 10 and 12-16 are used in this study. The first five modes are low-frequency modes related to rigid-body motion of the bridge involving strain in the supports only, while modes 9 and 11 were not found consistently in all datasets, which is the reason for discarding them. In total, 1880 measurements were performed on the undamaged structure under changing environmental conditions; the remaining samples (2008 in total) correspond to damaged conditions simulated with an added mass.

Figure 6 shows the variation of the sixth natural frequency with respect to the sample number. It is clear that the environmental conditions are responsible for frequency changes of an order of magnitude larger than the damage (simulated here with an added mass).
The feature vector is made of nine natural frequencies and nine mode shapes measured at 15 locations. The mode shapes are complex and normalised with respect to the first component, so that 14×2 values are used in the feature vector for each mode shape, resulting in a feature vector {y} of dimension 261 (9 frequencies + 14×2×9 mode shape components). Each component of the feature vector is then standardised with respect to the mean and standard deviation computed on the training data.

Figure 7 shows the evolution of the Mahalanobis squared-distance computed for each of the 2008 samples when considering the first 300, 1000 and 1880 samples for the computation of the covariance matrix (training data). It is clear from the results that the Mahalanobis squared-distance acts as a filter for the environmental variations which have been included in the computation of the covariance matrix. When using the first 300 or 1000 samples, the value of the Mahalanobis squared-distance for some of the healthy samples (not used for the computation of the covariance matrix) is of the same order of magnitude as for the samples with the structural change. In this case, novelty detection is hindered by the environmental changes. When including all the variability from the environment in the computation of the covariance matrix, the Mahalanobis squared-distance has a low sensitivity to environmental changes while remaining very sensitive to structural changes.

Comparison with principal component analysis and factor analysis
The residual of the feature vector is computed for each sample using either PCA or FA. Figure 8a) shows the result when only the first iteration is used in FA and the first 1880 samples are used as training data, confirming that it is identical to PCA. Note that the same number of principal components/unobservable factors has been used with the two methods (e = 99.9% was used, giving p = 81). Figure 8b) compares the results of PCA with FA after convergence of the iterative process. The differences are very minor and difficult to detect.
In Figure 9, the results of FA and PCA are compared to the results previously obtained where no pre-processing of the feature vector is performed (raw data). Samples 1-1880 are used for the computation of the covariance matrix in all cases. As expected, PCA and FA do not bring a major improvement in the damage detection.

Threshold setting and damage detection using the Mahalanobis squared-distance
In order to assess the possibility of using the Mahalanobis squared-distance to perform damage detection under changing environmental conditions, 80% of the undamaged samples under changing environmental conditions (1504 samples) are used as the training set, the remaining samples forming the testing set. In order to detect damage, it is necessary to set a threshold above which the condition of the system will be considered abnormal. This value depends on both the number of observations and the dimension of the feature vector, as well as on whether an inclusive or exclusive threshold is calculated. Another important parameter is the confidence level α, which is set here to 0.001 (this sets the limit such that it is statistically normal to have 1 out of 1000 samples above the threshold).

The first approach to setting the threshold makes the assumption that the residual vector is normally distributed. A Monte-Carlo simulation is used, summarised as follows [7]:
- construct a (261×1504) matrix, each element being a randomly generated number from a zero-mean, unit-standard-deviation normal distribution;
- compute the Mahalanobis squared-distance of each of the 1504 samples with respect to the mean and covariance of the whole set;
- repeat the process 1000 times and store the largest value of the Mahalanobis squared-distance for each process.
The inclusive threshold T_inc is the largest of the 1000 values stored, and the exclusive threshold can be computed from it using the relation between inclusive and exclusive distances given in [7], where n is the number of training samples (n = 1504). The result is shown in Figure 10. It is clear that the threshold is too low, as there are numerous false positives in both the training and the testing sets. This is due to the fact that the normality assumption is not correct.
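The Monte-Carlo procedure for the inclusive threshold can be sketched as follows. The dimensions are deliberately reduced from (261×1504) and 1000 trials to keep the run short; the inclusive distances use the mean and covariance of the whole random set, as in the text.

```python
import numpy as np

rng = np.random.default_rng(5)
n_feat, n_samp, n_trials = 20, 500, 200   # reduced from (261, 1504, 1000)

maxima = np.empty(n_trials)
for t in range(n_trials):
    X = rng.normal(size=(n_samp, n_feat))       # N(0,1) pseudo training data
    mu = X.mean(axis=0)
    Cinv = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - mu
    # Inclusive Mahalanobis squared-distance of every sample at once
    D2 = np.einsum('ij,jk,ik->i', d, Cinv, d)
    maxima[t] = D2.max()                        # store the largest value

T_inc = maxima.max()     # inclusive threshold: largest of the stored maxima
print(T_inc > n_feat)    # True: well above the chi-square mean (= n_feat)
```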

Figure 10: Mahalanobis squared-distance and threshold for novelty detection using a normality assumption.
An alternative is to use extreme value statistics to compute the threshold. The methodology is described in [34,35]. The first step consists of fitting a parametric model to the right tail of the empirical cumulative distribution function (CDF) of the training data. It is shown in [36] that, when looking at the distribution of maxima, there are only three possible limit distributions: the Gumbel, Frechet and Weibull distributions. The approach consists of fitting the parameters of these three distributions and selecting the model which gives the lowest error. The fitting of the parameters is performed using a differential evolution algorithm described in [35], and in the present case the distribution giving the lowest error is the Frechet distribution:

F(x) = \exp\left(-\left(\frac{x-\lambda}{\delta}\right)^{-\beta}\right), \qquad x > \lambda

Once the parameters δ, λ and β are known, the threshold is computed for a confidence level α (= 0.001 in the present case) by inverting the expression F(T_EVS) = 1 - α, which gives the threshold:

T_{EVS} = \lambda + \delta\left(-\ln(1-\alpha)\right)^{-1/\beta}

This threshold is represented in Figure 11. The figure shows that this threshold is much more adequate than the one computed based on the normality assumption. There are no false positives in the training set and only one outlier in the testing set. All damaged samples lie above the threshold, showing that this type of threshold is adequate to perform damage detection under changing environmental conditions. Note however that after removing the point masses, the Mahalanobis squared-distance returns close to its original level but slightly above the threshold, at a level corresponding to the first level of damage. This is probably due to the fact that the training data did not include the full range of environmental conditions.
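The inversion of the fitted CDF can be sketched as follows. The standard three-parameter Frechet form is assumed, and the fitted parameter values below are purely hypothetical placeholders, since the paper's fitted values are not reproduced here.

```python
import numpy as np

def frechet_threshold(delta, lam, beta, alpha=0.001):
    """Invert F(x) = exp(-((x - lam)/delta)**(-beta)) at F = 1 - alpha.

    Parameter names follow the text (delta, lambda, beta); the threshold
    is the value T with a probability alpha of being exceeded.
    """
    return lam + delta * (-np.log(1.0 - alpha)) ** (-1.0 / beta)

# Hypothetical fitted parameters, for illustration only
delta, lam, beta = 50.0, 200.0, 4.0
T = frechet_threshold(delta, lam, beta)
# Check: the CDF evaluated at the threshold returns 1 - alpha
F = np.exp(-((T - lam) / delta) ** (-beta))
print(np.isclose(F, 0.999))                    # True
```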

Figure 11: Mahalanobis squared-distance and threshold for novelty detection using extreme value statistics (80% of the healthy condition used as training set).

Figures 12 and 13 show the influence of randomly selecting respectively 60% and 40% of the healthy state for the training set instead of 80%. One sees clearly that the number of false positives in the testing set strongly increases and exceeds the confidence level of the threshold, showing that there are not enough samples in the training set. In the present case, at least 60 to 80% of the healthy state should be included in the training data.

Figures 14 and 15 show the influence of the size of the feature vector. In the first case, only the real part of the mode shapes is considered and the imaginary part is discarded, resulting in a feature vector of size 135. In the second case, the imaginary part and the high-frequency modes (14-16) are discarded, resulting in a feature vector of size 90. In both cases, the number of false positives for the testing set is not strongly affected, but the first damage scenario now lies below the computed threshold. This illustrates the fact that, as expected, by reducing the size of the feature vector some information is lost and it becomes more difficult to differentiate the damage from the environmental conditions.

Conclusions
The Mahalanobis squared-distance is often used to perform novelty detection. By performing an eigenvalue decomposition of the covariance matrix used to compute that distance, it has been shown that the Mahalanobis squared-distance can be written as the sum of independent terms which result from a transformation from the feature vector space to a space of independent variables. In general, especially when the dimension of the feature vector is large, there are dominant eigenvalues and eigenvectors associated with the covariance matrix, so that a set of principal components can be defined. Because the associated eigenvalues are high, their contribution to the Mahalanobis squared-distance is low, while the contribution of the other components is high due to the low value of the associated eigenvalues. This analysis shows that the Mahalanobis squared-distance naturally filters out the variability in the training data. This property can be used to remove the effect of the environment in damage detection. It has also been shown that this method is closely related to the two main linear techniques used to remove confounding effects: principal component analysis and factor analysis. This has been demonstrated on real data from a wooden bridge for which the feature vector consists of eigenfrequencies and mode shapes collected under changing environmental conditions, as well as damaged conditions simulated with an added mass. The results confirm the ability to filter out environmental effects while keeping a high sensitivity to structural changes, and the equivalent performance of the proposed methodology and the two established techniques. When filtering environmental effects, attention should be paid to using a proper training set containing the full range of environmental conditions, and to using a large enough feature vector in order to ensure some separability between the damage effects and the environmental effects.
The results also show that the normality assumption does not hold in the case studied, and that an approach based on extreme value statistics is necessary to set a proper threshold for damage detection.