Weak adhesion detection – Enhancing the analysis of vibroacoustic modulation by machine learning

Adhesive bonding is a well-established technique for composite materials. Despite advanced surface treatments and preparations, surface contamination and application errors still occur, resulting in localised areas of reduced adhesion. The dramatic reduction of the bond strength limits the applicability of adhesive bonds and hampers further industrial adoption. This study aims to detect weak-bonds due to manufacturing errors or contamination by analysing and interpreting the vibroacoustic modulation signals with the aid of machine learning. An ultrasonic signal is introduced into the specimen by a piezoceramic actuator and modulated through a low-frequency vibration excited by a servo-hydraulic testing system. Tested samples are single-lap shear specimens, according to ASTM D5868-01, with artificial circular debonding areas introduced as PTFE-films or a release agent contamination. It is shown that an artificial neural network can identify various defects in the bonded joint robustly and is able to predict residual strengths, and hence demonstrates great potential for non-destructive testing of adhesive joints.


Introduction
Fibre reinforced composites are utilised in many load-bearing primary structures in aerospace and renewable energy production due to their superior strength-to-weight ratio, fatigue properties and resistance to corrosion [1]. Contrary to the idea of lightweight design, however, composite parts in primary structures are joined using traditional methods such as bolts or rivets, which cause high stress concentrations in the joined area and damage around the hole during machining. Consequently, this has to be compensated by an increased material thickness or metal inserts in the joined region [2].
Adhesive bonding aids in reducing this additional weight and leads to a homogeneous stress distribution in the joint [2,3]. For many applications, a high level of process control with surface activation and process automation for the bonding process is available. Despite this, areas of reduced adhesion persist due to the occurrence of contamination [2][3][4][5]. While other adhesive defects like voids and porosities can be detected with ultrasonic techniques, weak-bonds are determined by chemical interactions at the atomic level, which is orders of magnitude smaller than ultrasonic wavelengths [4]. Therefore, these defects are not detectable with conventional non-destructive testing (NDT) methods, yet they reduce the bond strength dramatically [3,5,6].
To create artificial weak-bonds, most studies either introduce a non-adhesive film [7][8][9][10][11][12][13] or contaminate the adherent's surface [11][12][13][14][15][16][17][18]. The insertion of a polytetrafluoroethylene (PTFE) film into the adhesive joint is straightforward to implement. Since the physical properties of the bond are locally altered by the introduction of a PTFE-film, those defects are typically clearly detectable by ultrasonic techniques or other NDT methods [7,11,13]. Creating weak-bonds by applying a small amount of release agent on one of the adherents resembles reality much more closely [17].
The detection of bond line flaws, especially weak-bonds, has been the subject of numerous works employing various NDT techniques, including linear and non-linear ultrasonic methods [4,11,12,19], X-ray [7], guided Lamb-waves [8,20,21], laser shock adhesion testing [14,22], digital image correlation [23] and highly non-linear solitary waves [18]. A summary of various NDT methods for the assessment of adhesive bonds is found in Ehrhart et al. [3]. Nevertheless, the Federal Aviation Administration currently approves no method to reliably detect a reduced adhesion, which emphasises the need for new methods that are applicable to real-life structures and sensitive to all types of defects [2,5,6,24].
Although some non-linear methods show promising results in the aforementioned works, the vibroacoustic modulation (VAM) analysis has shown a superior sensitivity for damage detection compared to other non-linear ultrasonic techniques [25]. It has already been used to detect fatigue damage in metallic specimens [25][26][27][28][29][30][31][32][33] as well as impacts and subsequent delaminations in composites [34][35][36]. Further applications are summarised in [37]. Nevertheless, to our knowledge, a VAM analysis has not yet been applied to adhesive joints. Although the analysis of Lamb-wave based experiments with machine learning methods has already been the subject of several other studies [38][39][40][41], applying machine learning methods to analyse a VAM was, to the authors' knowledge, only proposed by [32], who predicted the crack length and the remaining fatigue lifetime of aluminium specimens. In contrast to the method of [32], in this work we propose to use multiple modulation amplitudes as input values for an Artificial Neural Network (ANN) rather than a (linear) combination of them.
A unique aspect of this work is the first-time application of VAM to adhesively bonded composite parts. In addition, we train neural networks on several subsets of our measured data to investigate whether a change of the excitation frequency of the piezoceramic actuators or of their positions on the specimen affects the accuracy of the algorithm. Finally, we discuss the as yet unprecedented classification accuracy of the defect detection, especially for weak-bonds, and interpret the results in terms of true and predicted shear strengths of the various defects by means of sideband influences.

Vibroacoustic modulation
A high sensitivity is achieved in the VAM method due to the modulation of an ultrasonic probe wave (with frequency $f_{Ca}$) with an intense low-frequency pump vibration (with frequency $f_P$) at defect locations. The low-frequency vibration locally and nonlinearly alters the mechanical properties in the specimen at inhomogeneities or defects, which affects the ultrasonic wave propagation and can consequently be detected by another piezoceramic receiver. The general approach of a VAM analysis is illustrated in Fig. 1. Damages (and initial defects, i.e., discontinuities such as grain boundaries in metals, surfaces, interfaces and interphases, etc.) lead to non-linearities in the wave propagation [29]. It is hypothesised that due to the nonlinear behaviour in the weak-bond induced by a large-amplitude pump wave, the ultrasonic Lamb wave is further modulated. This ultimately leads to higher sensitivities than conventional NDT techniques such as pulse-echo ultrasound. Consequently, the introduced frequencies will reveal higher harmonics ($n f_P$ and $n f_{Ca}$, $n \in \mathbb{N}$) and sidebands will evolve around the higher frequency ($f_{Ca} \pm n f_P$, $n \in \mathbb{N}$) due to the modulation [37].
Relevant for this work are "damage indices", usually calculated from the amplitudes of the sidebands ($A_{n\pm}$, $n \in \mathbb{N}$), which are commonly used to evaluate the "severity" of a damage. Frequently used damage indices are the "modulation intensity coefficient" (R) [28,34,37], the "Modulation Index" (MI) [31], a "non-linearity parameter" [29,32] and the "sideband ratio" [13]. In this work, only the first two are considered. The non-linearity parameter and the sideband ratio are frequently used in a different experimental setup (i.e., pump and probe frequencies are introduced by either a single or two separate piezoceramic actuators in combination or alone), and since they did not yield a distinguishable differentiation with our experimental setup, they were not considered further in this work. It should be noted that since several sources only use the first sidebands [28,34] to calculate the modulation intensity coefficient, the dB scaling of the modulation index is essentially the relevant difference between the modulation index and the modulation intensity coefficient.
Recent studies have attributed amplitude modulation and phase or frequency modulation to different damage types in the sample [30,31]. In the frequency domain, a pure amplitude modulation only contributes to the occurrence of the first sidebands, while frequency modulation and phase modulation contribute effectively to an infinite number of sidebands [31]. To remedy this, Hu et al. [42] propose the separation by a Hilbert-Huang transformation, which suggested that the amplitude modulated signal has a higher correlation with the crack size. In contrast, after applying the so-called In-phase/Quadrature Homodyne Separation algorithm, the frequency modulated signal indicates fatigue damage for aluminium at an earlier point [30,31]. However, under certain conditions, narrow-band frequency modulation can cause just one sideband, and a distorted or over-modulated amplitude modulation potentially leads to several artificial sidebands [42]. Taking all these factors into account, the evaluation of the patterns in VAM sidebands is a complex endeavour. Thus, a data-driven analysis, as presented later, using an ANN to examine specific patterns in sideband amplitudes proves beneficial for such complex and highly correlated input data, provided the aforementioned limitations are taken into account.
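The difference between amplitude and phase modulation in the frequency domain can be illustrated with a small synthetic example. The following sketch is not part of the measurement pipeline described here. It uses scaled-down, hypothetical values (a 1 kHz carrier, a 5 Hz pump, a modulation depth of 0.2 and a phase deviation of 1 rad), chosen so that every spectral component falls exactly on an FFT bin.

```python
import numpy as np

FS = 10_000.0        # sampling rate in Hz (hypothetical, scaled down)
DURATION = 2.0       # seconds, gives a frequency resolution of 0.5 Hz
F_CARRIER = 1_000.0  # stand-in for the probe frequency f_Ca
F_PUMP = 5.0         # pump frequency f_P


def sideband_amplitudes(signal, n_max):
    """Linear amplitudes at F_CARRIER + n * F_PUMP for n = 0..n_max."""
    spectrum = 2.0 * np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)
    amps = []
    for n in range(n_max + 1):
        idx = int(np.argmin(np.abs(freqs - (F_CARRIER + n * F_PUMP))))
        amps.append(float(spectrum[idx]))
    return amps


t = np.arange(int(FS * DURATION)) / FS
carrier_phase = 2.0 * np.pi * F_CARRIER * t
pump_phase = 2.0 * np.pi * F_PUMP * t

# Pure amplitude modulation: only the first sideband pair appears.
am_signal = (1.0 + 0.2 * np.cos(pump_phase)) * np.cos(carrier_phase)
# Pure phase modulation: energy is spread over many sideband orders.
pm_signal = np.cos(carrier_phase + 1.0 * np.sin(pump_phase))

am = sideband_amplitudes(am_signal, n_max=3)
pm = sideband_amplitudes(pm_signal, n_max=3)
print("AM:", np.round(am, 4))
print("PM:", np.round(pm, 4))
```

In this idealised case, the amplitude-modulated signal shows a first sideband of half the modulation depth and essentially nothing at higher orders, whereas the phase-modulated signal has clearly non-zero second and third sidebands.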

Specimen preparation
The specimen geometry is based on the ASTM D5868-01 [43] standard as shown in Fig. 2. Glass-fibre reinforced polymer laminates were manufactured in a resin transfer moulding process with a $[0_4]_s$ layup of dry UT-E500 E-glass fibres from Gurit and a low-viscosity epoxy system RIMH135 with a RIMR137 amine-based hardener from Hexion. After infiltration, the plates were cured at 50 °C for 24 h, and postcured for 15 h at 80 °C outside the mould and in compliance with the datasheet. The glass-fibre reinforced polymer plates are further cropped to the required mould dimensions with a water-cooled corundum saw blade (ATM Brillant 265). The bond lines were finished with a 1000-grit sandpaper and cleaned with isopropanol to eliminate prior contamination.
Single-lap joints of three different bonding types, including pristine specimens, are produced by bonding the glass-fibre reinforced polymer substrates in a secondary-bonding process using a 2C-epoxy adhesive (SikaPower-1280). Both materials are commonly found in adhesively bonded parts of wind turbine blades. Manufacturing of the weak-bonds was adapted from Harder et al. [17]. Circular areas with a diameter of 12 mm (18% of the overall surface) were contaminated with release agent (Mikon W-64+ from Münch Chemie) according to the datasheet. For a precise application of the release agent, stencils (Hostaphan from Mitsubishi) were cut (using an Aristomat TL 1625) and fixed with adhesive tape (Tesa) outside of the adherent area to prevent further contamination due to the tape. The second class of defects was created by introducing a circular PTFE-film (Goodfellow FP301100) with a thickness of 0.01 mm and a diameter of 12 mm in the centre of the joint. Sheets were cut and positioned using a stencil. This procedure led to equal-sized and similarly located bond line flaws.
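The quoted area fraction of the defect can be checked with a short calculation. This is only a sketch: the overlap length is not stated above, so a square 25.4 mm × 25.4 mm overlap is assumed here for illustration.

```python
import math

DEFECT_DIAMETER_MM = 12.0   # circular defect diameter from the text
OVERLAP_WIDTH_MM = 25.4     # specimen width from the text
OVERLAP_LENGTH_MM = 25.4    # ASSUMPTION: overlap length, not stated in the text

defect_area = math.pi * (DEFECT_DIAMETER_MM / 2.0) ** 2
overlap_area = OVERLAP_WIDTH_MM * OVERLAP_LENGTH_MM
defect_fraction = defect_area / overlap_area

print(f"defect area:  {defect_area:.1f} mm^2")
print(f"overlap area: {overlap_area:.1f} mm^2")
print(f"fraction:     {defect_fraction:.1%}")
```

Under this assumption the defect covers roughly 17.5% of the overlap, which is consistent with the rounded 18% given above.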
A milled jig was used to ensure an optimal alignment of the adherents and spacers, separating the adherents by 0.2 mm, as shown in Fig. 3. The produced panels are cured in an autoclave for 4 h at 70 °C with a pressure of 2 bar and afterwards cut to their final dimension of 25.4 mm width with a water-cooled saw. Finally, all specimens were conditioned for two weeks according to ISO 291 at 23 °C and 50% relative humidity.
Subsequent tests evaluating the quality of all produced specimens were performed to detect defects from the manufacturing process, first by an optical analysis (EPSON V850 Pro transmission light scanner) and afterwards by a pulse-echo ultrasonic measurement of the bonded regions (USPC 3040 ultrasonic imaging system). In the ultrasonic C-scans shown in Fig. 4, inserted PTFE-films are clearly visible as red areas. In contrast, specimens with applied release agent are not distinguishable from pristine specimens. Proper application of the release agent was ensured by preparing different specimens in permutative order using a single stencil for all types of bonds (cf. Fig. 3). Indeed, in the fracture surfaces shown in Fig. 4, the introduced defects are clearly identifiable as circular areas where the reduced adhesion prevents fibre tearing and, more importantly, adhesion.

Vibration measurement
Vibroacoustic measurements were conducted similarly to the method described in Refs. [27,30,31,33,44,45]. A pump frequency of $f_P = 5$ Hz was applied by an 8801 servo-hydraulic testing system from Instron (max. load capacity of 63 kN) and controlled by the Instron WaveMatrix™ software. The pump frequency is limited by the hydraulic valve of the testing system, and the maximum possible value was hence 5 Hz. The hydraulic grips of the testing machine were closed with a constant pressure to eliminate boundary influences [36]. The amplitude of the pump vibration, $\sigma_{max}$, was set to 11 MPa with a stress ratio of R = 0.1.
A high-frequency Lamb-wave is introduced into the specimen by bonding piezoceramic actuator disks with dimensions 10 × 2 mm (PI-Ceramics) to the adherents with double-sided tape (Tesa). Preliminary tests using double-sided tape have shown only a slight reduction of the relative signal strength (when measured in dB) and equivalent information in the signal as compared to the more commonly used 2C adhesives, which offers the great advantage of reusing piezoceramics at almost no cost in terms of signal-to-noise ratio. The excitation and data acquisition were performed by a NI-USB 6366 data acquisition board (National Instruments) with a sampling rate of 2 MS/s and 16 bit resolution, controlled from MATLAB 2020a. The generated signal was amplified twofold using a BUF634 amplifier with an additional low-pass filter at 420 kHz and a 12 $V_{pp}$ sine.
The choice of the high frequency for the Lamb-wave excitation depends on the resonant frequency of the piezoceramic disks and the natural frequencies of the specimen, so the dimensions must be carefully chosen to account for the generation of Lamb waves for a given specimen dimension. A detailed summary of the conditions for the occurrence of signal modulation with Lamb waves is found in Ref. [29]. The sinusoidal signal was linearly swept between 1 and 300 kHz within 10 s to determine a suitable frequency range for the HF (more details on this frequency chirp are found in the Supplementary Information). The strongest signal on the receiving piezoceramic was found at around $f_{Ca} = 200$-$220$ kHz. Using this range, 39 VAM measurements were made with an equal frequency spacing of 500 Hz and a duration of 2 s each. Additionally, as shown in Fig. 2, signal paths between receiver and sender are alternated for each specimen to determine the influence of the piezoceramic location on the modulation signal. Hence, all piezoceramics are used for exciting and measuring the vibration, resulting in a total of 234 VAM samples (39 frequencies and 3 × 2 signal pathways) per specimen.
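The resulting measurement plan per specimen can be enumerated as follows. This is a minimal sketch under stated assumptions: the exact end points of the 39-point frequency grid are not given above, so 200.5 kHz to 219.5 kHz is used purely for illustration, and the dictionary keys are hypothetical names.

```python
from itertools import product

# ASSUMPTION: 39 points at 500 Hz spacing inside 200-220 kHz; the exact end
# points are not stated in the text, 200.5-219.5 kHz is used for illustration.
probe_frequencies_khz = [200.5 + 0.5 * k for k in range(39)]

# Three piezoceramic pairs, each used in both directions (3 x 2 pathways).
pairs = [("P1", "P3"), ("P2", "P4"), ("P2", "P3")]
signal_paths = [path for a, b in pairs for path in ((a, b), (b, a))]

# One entry per VAM sample: probe frequency plus sender/receiver assignment.
measurement_plan = [
    {"f_ca_khz": f, "sender": sender, "receiver": receiver}
    for f, (sender, receiver) in product(probe_frequencies_khz, signal_paths)
]

print(len(signal_paths), "signal paths,", len(measurement_plan), "VAM samples")
```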

Data preparation and processing
Data preparation and processing is shown by way of example in Fig. 5 for a high-frequency excitation of $f_{Ca} = 210$ kHz and a pump frequency of $f_P = 5$ Hz. As shown in Fig. 5 from top-left to top-right, the measured data after a transient regime was transformed to the frequency domain with a fast Fourier transformation (FFT) using the NumPy FFT package for Python 3.7. The transient regime (first 0.4 ms of each measurement) was removed from the analysis, as shown in the top-left panel of Fig. 5, to reduce the spectral leakage, which resulted in more pronounced sidebands. Moreover, it was necessary to apply a Hanning window to the signal to further reduce influences from data acquisition [46,47], effectively caused by the data acquisition converting the initially periodic vibration signal into a non-periodic signal. For each specimen $j$, sideband and carrier amplitudes (blue dots in the top-right panel of Fig. 5), $A_{j,i\pm}$ and $A_{j,Ca}$, respectively, are detected by searching for the maximum signal within a small range around the expected position $f_{Ca} \pm n f_P$. The lower-right panel of Fig. 5 shows how the amplitudes of the carrier and (by way of example) the first nine sidebands to both sides were stored in a Pandas DataFrame, denoted by the matrix X, and used as input for a neural network. Bonding type and shear strength are stored as labels for the classification and regression in a second Pandas DataFrame, denoted by the vector y, together with the sample identifier. Since amplitudes in dB-scale are not optimal for the optimisation of the ANN, the distribution of amplitudes is scaled so that the mean of all amplitudes is zero and the standard deviation is one (Z-score normalisation).
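The processing chain described above (windowing, FFT, peak search around the expected sideband positions and Z-score scaling) can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' code: the function names, the search range of 1 Hz and the scaled-down synthetic test signal are assumptions.

```python
import numpy as np


def sideband_features_db(signal, fs, f_carrier, f_pump, n_sidebands, search_hz=1.0):
    """Return dB amplitudes [A_n-, ..., A_Ca, ..., A_n+] around the carrier."""
    window = np.hanning(len(signal))
    # Scale by the window sum so that a unit-amplitude sine reads about 1.0.
    spectrum = 2.0 * np.abs(np.fft.rfft(signal * window)) / window.sum()
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    peaks = []
    for n in range(-n_sidebands, n_sidebands + 1):
        expected = f_carrier + n * f_pump
        in_range = np.abs(freqs - expected) <= search_hz
        peaks.append(spectrum[in_range].max())
    # A small floor avoids log10(0) for bins that happen to be exactly zero.
    return 20.0 * np.log10(np.maximum(peaks, 1e-12))


def z_score(features):
    """Scale all amplitudes jointly to zero mean and unit standard deviation."""
    features = np.asarray(features, dtype=float)
    return (features - features.mean()) / features.std()


# Synthetic, scaled-down test signal (hypothetical values, not measured data).
fs, duration, f_ca, f_p = 10_000.0, 2.0, 1_000.0, 5.0
t = np.arange(int(fs * duration)) / fs
signal = (1.0 + 0.2 * np.cos(2 * np.pi * f_p * t)) * np.cos(2 * np.pi * f_ca * t)

features = sideband_features_db(signal, fs, f_ca, f_p, n_sidebands=1)
scaled = z_score(features)
print(np.round(features, 2), np.round(scaled, 3))
```

For the synthetic signal, the carrier reads close to 0 dB and the first sidebands close to -20 dB, as expected for a modulation depth of 0.2. In practice the scaling statistics would be computed over the whole input matrix X rather than over a single feature vector.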

Machine learning application
The identification of defects can be implemented on the basis of a classification of the bond defect or, as shown afterwards, as a regression of the resulting shear strength. In this study, both approaches are trained using slightly different ANNs. All ANNs were implemented with Keras 2.3 [48] on top of a TensorFlow 2 [49] installation. Optimal network architecture and learning parameters are determined from a randomised grid search [50] in which the number of hidden layers, ranging between 1 and 4, and the number of neurons per layer are varied. Furthermore, the number of input values (as given by the input vector $s_j = [A_{j-}, \dots, A_{Ca}, \dots, A_{j+}]$ in Fig. 5), the range of probe frequencies used and the signal pathways between the piezoceramics were permuted. This procedure was done, on the one hand, to evaluate the information gained by adding more sidebands than are usually analysed and, on the other hand, to identify the most information-rich frequency range and piezo separations for classifying the bond (defect) and predicting the shear strength.
The evaluation metric for training a classification ANN is the classification accuracy and for a regression ANN the mean absolute error (MAE) of the prediction. These metrics are calculated as
$$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} 1_{y_i}(\hat{y}_i)$$
and
$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$
where $i$ runs over the total number of samples $n$, $\hat{y}_i$ is the predicted label of the $i$-th sample, $y_i$ the corresponding true label and $1_{y_i}(\hat{y}_i)$ is the indicator function, which returns 1 if the predicted label corresponds to the true label. Furthermore, reliability and robustness of the predictions are ensured by employing a ten-fold cross-validation, i.e., by training ten ANNs on differently composed training sets, where the data set is split randomly into a training set (80%) and a test set (20%). Evaluating ten ANNs trained on different training sets, where all measurements from one specimen are placed either in the test set or in the training set, increases the reliability of the resulting prediction. The ANNs used for classification and regression differed essentially in the output layer and the loss function. A softmax output layer (footnote 1) was chosen for the classification and a single neuron (footnote 2) was chosen for the regression. Over-fitting of the ANNs was inhibited by implementing a dropout of nodes with a probability of $p = 0.1$ for every hidden layer of the ANN and by early stopping (effectively stopping the training at the smallest error [52]) with a delay of 100 epochs on the validation loss. After the initial search for the hyperparameters, the neural network architecture was set to four hidden layers with a minimum of [40,40,30,20] neurons. The architecture was the same for all ANNs in each cross-validation, but between permutations, the minimum number of neurons in each layer was multiplied by a random number between one and two to test whether, for example, slightly larger networks performed better.
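The two evaluation metrics and a specimen-wise 80/20 split can be sketched in a few lines of NumPy. The helper names below are hypothetical, and the split routine is only one possible way to keep all measurements of a specimen on the same side of the split. It is not taken from the original implementation.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def mean_absolute_error(y_true, y_pred):
    """Mean absolute deviation between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def specimen_wise_split(specimen_ids, test_fraction=0.2, seed=0):
    """Boolean train/test masks with each specimen entirely in one set."""
    rng = np.random.default_rng(seed)
    specimen_ids = np.asarray(specimen_ids)
    unique_ids = np.unique(specimen_ids)
    n_test = max(1, int(round(test_fraction * len(unique_ids))))
    test_ids = rng.choice(unique_ids, size=n_test, replace=False)
    test_mask = np.isin(specimen_ids, test_ids)
    return ~test_mask, test_mask


# Toy data: 10 hypothetical specimens with 5 samples each.
ids = np.repeat(np.arange(10), 5)
train_mask, test_mask = specimen_wise_split(ids, test_fraction=0.2, seed=1)
print(accuracy(["a", "b", "c"], ["a", "b", "a"]), mean_absolute_error([1, 2], [2, 4]))
```

Repeating the split with ten different seeds would give ten differently composed training sets, in the spirit of the cross-validation described above.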

Feature importance
In this work, the influence of each sideband on the prediction is calculated to assess its individual importance. For this purpose, the feature importance is calculated with two methods. Firstly, game-theoretical Shapley values were calculated with the DeepExplainer from the SHapley Additive exPlanations (SHAP) package [53]. Here, the marginal contributions of each feature are calculated across permuted input values. The results obtained from SHAP are compared to Garson's algorithm [54,55] in which, explained rather briefly here, the weights of the neural network are analysed to obtain a feature importance of the input values. Since the feature importance differs between the ten cross-validation runs, a mean value was calculated.
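The idea behind Garson's algorithm can be sketched for the classic case of a network with a single hidden layer and a single output. This is an assumption made for brevity: the networks in this work have several hidden layers, and how the weights of deeper networks were combined is not described above. The weight values in the example are hypothetical.

```python
import numpy as np


def garson_importance(w_input_hidden, w_hidden_output):
    """Relative input importance from absolute connection weights.

    w_input_hidden:  array of shape (n_inputs, n_hidden)
    w_hidden_output: array of shape (n_hidden,) for a single output neuron
    """
    w_ih = np.abs(np.asarray(w_input_hidden, dtype=float))
    w_ho = np.abs(np.asarray(w_hidden_output, dtype=float)).ravel()
    # Contribution of input i through hidden neuron j to the output.
    contribution = w_ih * w_ho
    # Share of each input within every hidden neuron (columns sum to one).
    share = contribution / contribution.sum(axis=0, keepdims=True)
    importance = share.sum(axis=1)
    return importance / importance.sum()


# Hypothetical weights: three inputs, two hidden neurons, one output.
w_ih = [[1.0, 2.0], [0.0, 0.0], [3.0, 2.0]]
w_ho = [1.0, 1.0]
importance = garson_importance(w_ih, w_ho)
print(importance)
```

The returned values sum to one, and an input whose weights are all zero receives zero importance. Note that the sketch does not guard against hidden neurons whose incoming weights are all zero.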

Mechanical testing
The shear strength measurements were performed on a universal testing machine Z100 from ZwickRoell equipped with a 100 kN load cell. Due to the small fracture strain of the adhesive bond, the displacement rate was set to 2 mm/min. The strain was monitored with a multiXtens extensometer from ZwickRoell at a distance of 50 mm. The shear strength of each specimen is calculated from the measured maximum tensile force normalised by the (initial) cross-section of the adhesive joint overlap.
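As a worked example of this normalisation, the following sketch assumes that the shear strength is the maximum tensile force divided by the overlap area. The force value and the overlap length of 25.4 mm are hypothetical and serve only to illustrate the units.

```python
def lap_shear_strength_mpa(max_force_n, overlap_width_mm, overlap_length_mm):
    """Apparent shear strength in MPa (N/mm^2) from the maximum tensile force."""
    return max_force_n / (overlap_width_mm * overlap_length_mm)


# Hypothetical numbers for illustration only (not measured values).
tau = lap_shear_strength_mpa(
    max_force_n=14_387.0,      # assumed maximum force in N
    overlap_width_mm=25.4,     # specimen width from the text
    overlap_length_mm=25.4,    # ASSUMPTION: overlap length
)
print(f"{tau:.1f} MPa")
```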

Mechanical testing
Shear strength and modulus are shown in Fig. 6 for different defect types. As expected, shear strengths of the pristine specimens are highest with 22.3 ± 1.5 MPa, close to the 25 MPa specified in the datasheet. In the presence of a defect, the shear strengths of release agent specimens are reduced to 20.7 ± 1 MPa and in case of PTFE to 19.7 ± 0.9 MPa. Although standard deviations of the latter two are similar, it is assumed that the stress concentration due to the finite thickness of the PTFE-film leads to a further reduction of the shear strength as compared to the release agent. The shear strength reduction due to the release agent is only half as severe as compared to scarf bonded joints tested by Harder et al. [17] where a 26% decrease was observed. This difference can be explained by stress concentrations at the edges of the single-lap shear joints compared to the more homogeneous stress distribution in scarf bonded joints. Defects in the central region of scarf bonded joints have thus a higher influence on the shear strength of the bond. In contrast to the shear strengths, no significant differences in the shear modulus have been observed for contaminated specimens. Corresponding stress-strain curves can be found in the Supplementary Information.

Vibroacoustic measurements
The vibroacoustic measurements are evaluated firstly by calculating the "traditional" damage indices described in the methods section. Box plots illustrating the distributions of the modulation index and modulation intensity coefficient are shown in Fig. 7. As indicated by the position of the mean value of the modulation index or intensity, both indices would in principle enable a detection of the inserted PTFE-film if their standard deviation did not prohibit this. However, as expected, distinguishing between pristine and release agent specimens is impossible, regardless of whether the modulation index or the modulation intensity coefficient is used. The prominent difference of PTFE is in accordance with the work of Chen et al. [13], where a PTFE-film was detectable inside a laminate.

Adhesive bonding classification
To improve on the classification based on the damage indices presented in the previous section, a more sophisticated defect detection employing neural networks is proposed in the following. In general, accurate classifications from an ANN depend on the network architecture, the number of sidebands used, the frequencies and the signal path between piezoceramic actuator combinations, as shown in Fig. 8. Concerning the network architecture, the best results were achieved with four hidden layers with [60, 70, 50, 40] neurons. The inclusion of $j$ sidebands to each side of the carrier in the ANN analysis of the VAM signal is denoted by $s_j = [A_{j-}, \dots, A_{Ca}, \dots, A_{j+}]$, as given on the horizontal axis of Fig. 8 (left). Each point shown in Fig. 8 (left) is the mean accuracy resulting from a ten-fold cross-validation based on different train/test compositions (at a fixed ratio of 80/20). For each input parameter $s_j$, the ten-fold cross-validation process was repeated several times, each time with VAM samples from a different randomly selected probe frequency range. Each range was selected by choosing two values from the interval 200-220 kHz with an equal spacing of 500 Hz and at least five values in between. The results for the frequency range that provided the most accurate classifications are shown in Fig. 8. In general, a dependency on the signal pathway is observed and higher accuracies for the damage type classification are obtained using more sidebands, implying relevant information in the higher-order sidebands. However, a maximum is reached for $s_5$ (five sidebands on each side plus the carrier signal).
Footnote 1: with categorical cross-entropy loss and an Adam optimiser [51] with an initial learning rate set to 0.01.
Footnote 2: with ReLU activation, mean squared error loss and an Adam optimiser with an initial learning rate set to 0.0001.
One reason for the accuracy decreasing after $s_5$ is probably the amplitude decrease of the higher-order sidebands and the deterioration of the signal-to-noise ratio. The ANN eventually identifies artificial patterns in the noise and overfits the data, resulting in a reduced accuracy. On the other hand, an increasing number of input parameters negatively impacts the optimisation with a gradient descent method as well, which would in turn require more data. By splitting the data set into three distinct signal paths between the piezoceramics, a difference in information quality and resulting classification accuracies can be observed in the right panel of Fig. 8. Higher accuracy is achieved if the distances of the piezoceramics to the bonded area are asymmetric (P1-P3, P2-P4) with a wider spacing than with a symmetric positioning (P2-P3). Furthermore, asymmetric alignments appear to be more frequency dependent, which might be an effect of the signal path length, the distance to the bond or, more generally, the eigenmodes of the specimen. Note that in Fig. 8 (right), different probe frequency ranges (which possibly also overlap) were used to train the ANNs, as previously discussed, rather than individual frequencies as might be anticipated. Hence, each point in Fig. 8 (right) is the highest achievable accuracy of a frequency obtained for training a neural network with input data from a certain frequency range. Thus, if a wider frequency range is beneficial for training, this would in principle lead to more horizontal parts in Fig. 8, although such plateaus may also have other causes.
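The random selection of a probe frequency range with at least five grid values between its two end points can be sketched as follows. The function name is hypothetical, the grid is assumed to contain every 500 Hz step between 200 kHz and 220 kHz, and the sampling scheme is one simple possibility that does not weight all admissible ranges equally.

```python
import random


def random_frequency_range(grid_khz, min_between=5, rng=None):
    """Pick a contiguous sub-range with at least `min_between` inner values."""
    rng = rng or random.Random()
    n = len(grid_khz)
    # Lower end point: leave room for the inner values and the upper end point.
    lo = rng.randrange(0, n - min_between - 1)
    # Upper end point: at least `min_between` grid values above the lower one.
    hi = rng.randrange(lo + min_between + 1, n)
    return grid_khz[lo:hi + 1]


# ASSUMPTION: full 500 Hz grid from 200 kHz to 220 kHz (41 values).
grid_khz = [200.0 + 0.5 * k for k in range(41)]
selected = random_frequency_range(grid_khz, rng=random.Random(42))
print(selected[0], "to", selected[-1], "kHz,", len(selected), "frequencies")
```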
More detailed illustrations of the classification results (in percentage) using the above-mentioned optimised input parameters are shown in Fig. 9 (left) as a confusion matrix. For this classification, ten different frequencies per specimen (5 from P4-P2 and 5 from P2-P4 in the range 202.5-204.5 kHz) and, for each, 11 input values (i.e., $s_5$: 5 sidebands to each side and the amplitude of the carrier) are used for the corresponding ANN training, which resulted in the highest possible classification accuracy. Classification accuracies are again estimated from a ten-fold cross-validation, each using a randomly chosen train/test composition at a fixed 80/20 ratio. The diagonal represents the true-positive classifications while the off-diagonal values represent falsely classified samples. Measurements from specimens with a PTFE-film in the bonded area are detectable with an accuracy of 99%. As already suspected, distinguishing between pristine and a release agent contamination is less precise with an accuracy of 91.6%. Nevertheless, an overall accuracy of 93.4% is achieved for the ternary classification.
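A confusion matrix in percent can be computed from true and predicted labels as sketched below. Whether the matrices in Fig. 9 are normalised per true class (as assumed here) or over all samples is not stated above, and the toy labels are hypothetical.

```python
import numpy as np


def confusion_matrix_percent(y_true, y_pred, classes):
    """Row-normalised confusion matrix: rows are true, columns predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for true_label, predicted_label in zip(y_true, y_pred):
        counts[index[true_label], index[predicted_label]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Guard against empty classes to avoid a division by zero.
    return 100.0 * counts / np.maximum(row_sums, 1)


classes = ["pristine", "RA", "PTFE"]
y_true = ["pristine", "pristine", "RA", "RA", "PTFE", "PTFE"]
y_pred = ["pristine", "RA", "RA", "RA", "PTFE", "PTFE"]
cm = confusion_matrix_percent(y_true, y_pred, classes)
print(cm)
```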
Since PTFE-films can already be detected by ultrasonic testing without the use of a sophisticated ANN, it is natural to remove all PTFE-film samples from the training and test set and use a second, binary ANN solely to distinguish the remaining two classes. This further improves the detection of weak-bonds, as shown in the right confusion matrix of Fig. 9. This network is based on a similar randomised input parameter variation, as mentioned earlier. With input data from the same probe frequency range and number of sidebands, this ANN achieves an accuracy of 93.1% for a binary differentiation between pristine and a release agent contamination. Hence, combining the ability to detect PTFE-sheets from the ternary ANN (or even with ultrasonic inspection) with a further binary ANN, the overall classification is improved and reaches an accuracy of 96.7%.
Most false predictions can be attributed to samples from only a few specimens, as illustrated in Fig. 10 (upper), and only a single specimen is completely falsely classified when using all samples of the ten different frequencies. This specimen out of the 31 tested accounts for 24% of all falsely classified samples. Furthermore, the ANN from the binary classification also predicted this specimen poorly, with 8 out of 10 false classifications, and it again accounts for 23% of the falsely classified samples. Both the ternary and the binary ANN identified the actual pristine specimen as contaminated with release agent. Indeed, if we compare the actual shear strength of this specimen in Fig. 10 (star in lower panel), it actually behaves more like a specimen with a release agent contamination. Hence, this probably indicates that even non-obvious manufacturing errors in pristine specimens could be detected by our ANN. In addition to the one pristine specimen, seven other specimens are marked in Fig. 10 (triangles in lower panel) where more than ten percent of the classifications are wrong. The colour code of the upper triangle marks the predicted class while the colour code of the lower triangle marks the actual class. In general, the ANN mostly confuses samples from pristine specimens with a low shear strength as release agent contamination and release agent specimens with a relatively high shear strength as pristine.
In summary, the ANN is able to identify the actual contamination and even predicts the more general shear strength class of the specimen to which it truly belongs. When applying a majority decision with a threshold that 75% of all classifications from one specimen must be correct, only the first-mentioned pristine specimen is falsely classified as contaminated with release agent. More information on the training and robustness of the ANNs, e.g., learning curves, is found in the Supplementary Information.
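A specimen-level majority decision can be sketched as follows. The interpretation used here is an assumption: a specimen is assigned the most frequent predicted class only if at least 75% of its samples agree, and is otherwise left undecided. The function name and the toy predictions are hypothetical.

```python
from collections import Counter, defaultdict


def specimen_decisions(specimen_ids, predicted_labels, threshold=0.75):
    """Majority label per specimen, or None if the agreement is below threshold."""
    votes = defaultdict(list)
    for specimen_id, label in zip(specimen_ids, predicted_labels):
        votes[specimen_id].append(label)
    decisions = {}
    for specimen_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        decisions[specimen_id] = label if count / len(labels) >= threshold else None
    return decisions


# Toy example: ten sample-level predictions for each of two specimens.
ids = ["A"] * 10 + ["B"] * 10
predictions = ["RA"] * 8 + ["pristine"] * 2 + ["pristine"] * 6 + ["RA"] * 4
decisions = specimen_decisions(ids, predictions)
print(decisions)
```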

Shear strength prediction
Rather than just classifying the samples into three classes, the previous results suggest that the shear strength could be an additional measure to differentiate the specimens in more detail and, furthermore, provide an additional indication of the robustness of our ANN approach. The shear strength $\sigma_s$ of each sample was predicted in a regression using a similar ANN architecture as before but with a linear output unit. As with the classification mentioned above, different numbers of sidebands as input values, ranges of frequencies and signal paths between piezoceramic combinations were used, permuted and shuffled to evaluate robustness and the dependence of the information gain on these parameters.
Similar to the classification results, the MAE decreases when more sidebands are included in the ANN analysis (not shown). Moreover, an analogous frequency dependence is observed, although the areas of minimal MAE span more frequencies, as depicted in Fig. 11 (upper). This suggests that more frequency information is beneficial for a regression, which is in general much finer than a ternary classification. In contrast to the previous results, the lowest MAE was achieved with the symmetric piezoceramic layout (P2-P3). The shorter distance between the piezoceramics in this case possibly indicates that higher signal strengths are beneficial for a regression, as opposed to the better results with a larger distance in the classification. While the predicted $\sigma_s$ with the lowest MAE of 0.63 MPa shows a reasonable correlation ($R^2 = 0.79$ in Fig. 11, lower), a strong deviation between the predicted $\sigma_s$ for each individual specimen is observed (using in this case input data from 13 different probe frequencies in the range of 202-208 kHz for piezoceramics P2-P3). The pristine specimen, which was entirely misclassified before, is marked with dark stars in Fig. 11 (lower) and shows on average a reasonable agreement with the experimentally determined shear strength. Consequently, the regression of shear strengths is a complementary tool in interpreting classification results and aids in the identification of weak-bonds.
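The quality measures used for the regression can be sketched as follows. It is assumed here that $R^2$ denotes the coefficient of determination and that sample-level predictions are averaged per specimen; the helper names and toy values are hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mean_prediction_per_specimen(specimen_ids, y_pred):
    """Average the sample-level predictions belonging to each specimen."""
    specimen_ids = np.asarray(specimen_ids)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        sid: float(y_pred[specimen_ids == sid].mean())
        for sid in np.unique(specimen_ids).tolist()
    }


# Toy shear strengths in MPa (hypothetical values, not measured data).
y_true = [20.0, 21.0, 22.0, 23.0]
y_pred = [20.5, 20.5, 22.5, 22.5]
r2 = r_squared(y_true, y_pred)
means = mean_prediction_per_specimen(["A", "A", "B", "B"], y_pred)
print(round(r2, 3), means)
```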

Feature importance
The decision process in neural networks is highly complex, and it is not trivial to understand the causal relationship between input and output. To make this possible to some extent, two established methods are used and compared to assess and extract the information relevance of the sidebands. Results of the feature importance analysis are shown in Fig. 12, in which Shapley Additive Explanations (SHAP) [53] and Garson's algorithm [54] are compared. Shown in Fig. 12 are mean values of the feature importance calculated from a ten-fold cross-validation. For each cross-validation, sidebands of either s_3, s_5 or s_8 are used as inputs, which yielded the three highest classification accuracies. It is worth noting that the three ANNs are trained on slightly different frequency ranges (202.5-204.5 kHz, 202.5-204.5 kHz and 204-206.5 kHz). Surprisingly, according to both feature importance methods, the first sidebands around the carrier are not the most important features in Fig. 12; the subsequent sidebands are more important. This result underlines the importance of higher order modulations for detecting bond line flaws, in contrast to the amplitude modulation, which mainly affects the first sidebands. It further underlines the shortcomings of a traditional evaluation based on damage parameters, such as the modulation index or the sideband ratio, which are typically calculated using the carrier and the first sidebands.

Fig. 9. Confusion matrix from the most accurate prediction based on the mean of a 10-fold cross-validation when differentiating between all resulting samples, i.e., vibroacoustic modulation signals, using an ANN trained on all three defect classes (left). A second binary ANN (right) is trained solely to differentiate between samples from pristine and release agent (RA) specimens by removing all samples associated with a PTFE-film specimen from training and classification.
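To illustrate how a weight-based importance measure of the kind compared above can be obtained, the sketch below implements one commonly cited formulation of Garson's algorithm in plain Python. It assumes a network with a single hidden layer and a single output unit, which need not match the architecture used in this work; the weights are hypothetical and the function name `garson_importance` is chosen for illustration only.

```python
def garson_importance(w_in, w_out):
    """Relative input importance for a one-hidden-layer, single-output network.

    w_in[i][j]: weight from input i to hidden unit j
    w_out[j]:   weight from hidden unit j to the output unit
    Returns one importance value per input; the values sum to 1.
    """
    n_in, n_hid = len(w_in), len(w_out)
    # Sum of absolute input weights arriving at each hidden unit.
    col_sums = [sum(abs(w_in[i][j]) for i in range(n_in)) for j in range(n_hid)]
    # Share of input i in hidden unit j, weighted by |output weight| of unit j.
    raw = [
        sum(abs(w_in[i][j]) / col_sums[j] * abs(w_out[j]) for j in range(n_hid))
        for i in range(n_in)
    ]
    total = sum(raw)
    return [value / total for value in raw]


# Hypothetical weights: two inputs (e.g. two sideband amplitudes), two hidden units.
w_in = [[1.0, -2.0],
        [3.0, 2.0]]
w_out = [0.5, -1.0]

importance = garson_importance(w_in, w_out)
print(importance)
```

Because only absolute weight values enter the calculation, this measure indicates how strongly an input contributes but not in which direction, which is one reason why comparing it against a second method such as SHAP is informative.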

Conclusion
To the authors' best knowledge, there is currently no non-destructive testing (NDT) method available which can robustly identify weak-bonds or a low interfacial adhesion. Although PTFE-films introduced in the adhesive bond are detectable by ultrasonic C-scans and could eventually be detected by a global damage evaluation from a vibroacoustic modulation (VAM) analysis and corresponding damage indices alone, the contamination of the adherend surface by a release agent is much more challenging to distinguish from a pristine bond using traditional NDT methods or damage indices based on VAM. In case of the latter, according to the feature importance analysis, this can be attributed to the low relevance of the first two sidebands for detecting bond line flaws.
A new way of evaluating VAM results is proposed by training an artificial neural network (ANN) on the higher order sidebands in the modulated signal to overcome this limitation. The ANN approach exploits more information than conventional VAM approaches by considering more than a summed value or a linear combination of sideband amplitudes, enabling the detection of specific patterns in the sideband amplitudes. The information gain is evident from the increased classification accuracy when training the ANN on more sidebands. With this approach, a differentiation between the two defect types and pristine samples was achieved with an accuracy of 93.4%. Since PTFE-films are relatively easy to detect, an additional ANN was trained for a binary classification between surface contaminated and pristine samples. By combining both ANNs, an overall accuracy of 96.7% was reached. Moreover, a high accuracy of the algorithm typically implies the presence of fairly strong patterns in the measured signals. As a consequence, the analysis of the VAM with machine learning, if done thoroughly, also provides an indication of which of the specific vibrations interact more strongly with the introduced bond line defects and are more relevant for further experiments. Swapping the inputs of the most accurate previously trained ANN by interchanging the left and right sidebands results in an average decrease in classification accuracy of 28%. We are therefore convinced that the asymmetry in the sidebands is a further crucial factor for the detection of weak-bonds.
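The sideband swap test described above can be sketched as follows. This is only an illustration under stated assumptions: the feature layout (ascending frequency, carrier excluded, equal numbers of left and right sidebands) is hypothetical, and `toy_classifier` is a stand-in rule that replaces the trained ANN, which is not reproduced here. The sample values and labels are invented for the example.

```python
def swap_sidebands(features):
    """Interchange left and right sidebands by mirroring around the carrier.

    Assumed (hypothetical) layout: [L_n, ..., L_1, R_1, ..., R_n],
    i.e. ascending frequency with the carrier amplitude excluded.
    Reversing the list moves every L_k to the position of R_k and vice versa.
    """
    if len(features) % 2 != 0:
        raise ValueError("expected equal numbers of left and right sidebands")
    return features[::-1]


def accuracy(classify, samples, labels):
    """Fraction of samples for which the classifier returns the true label."""
    hits = sum(classify(x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


def toy_classifier(x):
    """Stand-in for a trained ANN: compares the first right and left sideband."""
    n = len(x) // 2
    return "defect" if x[n] > x[n - 1] else "pristine"


# Invented sideband amplitudes and labels (illustration only).
samples = [[0.1, 0.2, 0.6, 0.3], [0.1, 0.5, 0.2, 0.1], [0.2, 0.1, 0.4, 0.2]]
labels = ["defect", "pristine", "defect"]

baseline = accuracy(toy_classifier, samples, labels)
swapped = accuracy(toy_classifier, [swap_sidebands(x) for x in samples], labels)
print(f"accuracy drop after swapping: {baseline - swapped:.2f}")
```

A classifier that relied only on symmetric quantities, such as the sum of each sideband pair, would show no drop under this test, so a measurable decrease points to information carried by the asymmetry itself.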
The most precise classification results were achieved with an asymmetric placement of the piezoceramic actuators, in contrast to the shear strength prediction, where the symmetric placement led to the most accurate results. Unfortunately, it was not possible to elaborate on this further in this work due to the destructive testing of the specimens. Thus, an assessment of whether symmetrical or asymmetrical placement, or generally the distance to the bond (or even the different proximity to the clamping points), are relevant factors is a good starting point for follow-up work and will be investigated in the future. Moreover, instead of focusing on the importance of individual sidebands, possible correlations between several sidebands will be examined in more detail.
Although the prediction of the shear strength varies strongly for different frequencies, which is mostly associated with our rather small training set for this regression purpose (cf. learning curves in the SI), it shows promising capabilities for bonded structures. Due to the nature of Lamb-waves, it would be interesting to investigate if a similar accuracy is achievable for adhesively bonded joints which are not in the load path, like stringers or other stiffening elements.
In summary, combining the vibroacoustic measurements with a machine learning approach as presented in this work shows promising capabilities for the detection of bond line defects. This study confirms that nonlinear modulations of ultrasonic waves can detect damage or altered adhesive properties in the interface between adhesive and substrate although the wavelength is in the cm-range and hence much larger than the adhesive layer thickness [19]. This is probably associated with the wavelength (and the wave characteristics) of Lamb-waves being in the range of the defect cross-section, combined with the increased sensitivity due to the cyclic stress change in the bond caused by the high-strain pump frequency. It is expected that larger contaminated areas would increase the nonlinear amplitudes of the signal and ultimately improve the ANN accuracy, while contaminations smaller than the Lamb wavelength might be more challenging to detect. Nevertheless, we are optimistic about identifying even smaller defects with the same setup, since microcracks could be successfully detected with VAM despite being much smaller than the Lamb wavelengths, as shown in Ref. [27].
It is worth mentioning that this supervised ANN approach is well suited for analysing similar specimens, e.g., in mass production, once it is trained carefully. However, differences in specimen design, manufacturing process, or application in unique civil structures may make it necessary to train individual networks for each application or would require an even more sophisticated machine learning approach.

Funding
The authors thank the Hamburg University of Technology for funding the I³-Lab VAM.

Author contributions
BB: Design and execution of the experiments, analysis of results, writing of the draft; EW & BF: Help with development of the experiments, editing and reviewing of the draft; RM: Developing the concept, discussion of results, editing and reviewing of the draft. All authors read and approved the final manuscript.

Data availability
All data are stored on a secure server at the Institute of Polymer and Composites, Hamburg University of Technology. The raw data, the Python code for processing and the processed data required to reproduce these findings are available from the corresponding author on reasonable request.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.