On the hierarchical Bayesian modelling of frequency response functions

For situations that may benefit from information sharing among datasets, e.g., population-based SHM of similar structures, the hierarchical Bayesian approach provides a useful modelling structure. Hierarchical Bayesian models learn statistical distributions at the population (or parent) and the domain levels simultaneously, to bolster statistical strength among the parameters. As a result, variance is reduced among the parameter estimates, particularly when data are limited. In this paper, a combined probabilistic FRF model is developed for a small population of nominally-identical helicopter blades, using a hierarchical Bayesian structure, to support information transfer in the context of sparse data. The modelling approach is also demonstrated in a traditional SHM context, for a single helicopter blade exposed to varying temperatures, to show how the inclusion of physics-based knowledge can improve generalisation beyond the training data, in the context of scarce data. These models address critical challenges in SHM, by accommodating benign variations that present as differences in the underlying dynamics, while also considering (and utilising), the similarities among the domains.


Introduction
The current work is focussed on developing population-based structural health monitoring (PBSHM) techniques, for a group of nominally-identical structures (i.e., a homogeneous population [1,2]), with consideration for the effects of temperature changes, to improve understanding of the variability in the health states of the population and the individual members.(In contrast, a heterogeneous population [3,4] will have greater discrepancies among the members, such as different suspension bridge designs, and may require further processing such as domain adaptation [5][6][7][8]).Even among nominally-identical structures, variations caused by manufacturing differences, ageing parts, and changes in testing conditions can introduce uncertainty in the underlying dynamics.Cawley et al. [9] performed vibration testing on filament-wound carbon fibre-reinforced plastic (CFRP) tubes.Of the 18 tested, six tubes were considered 'normal' (i.e., having the same microstructure, with a ±45 • fibre winding angle) [9].They found that the first and second natural frequencies of the 'normal' tubes varied by as much as 4% [9].When the remaining 12 tubes were also considered (which had intentional defects in their microstructure, such as slightly misaligned fibres, changes in volume fraction, etc.), the natural frequencies varied by as much as 18% [9].Changes in bolt tightness can similarly cause variations in natural frequency.Zhu et al. [10] performed modal testing on an aluminium three-bay space-frame structure with bolted joints, before and after manually loosening several bolts to hand tight.They found that natural frequencies changed by as much as 8%, depending on the mode considered [10].
Global variations, such as changes in ambient temperature, can also affect modal properties.For example, increased temperature may reduce stiffness (depending on the specific material properties of the structure, and the duration/temperature of exposure), which may decrease natural frequency.Colakoglu [11] tested a polyethylene fibre composite beam from -10 • C to 60 • C, and found that the first natural frequency decreased by 12% over the measured temperature range.In addition, for the same polyethylene beam, Colakoglu [11] found that modal damping increased with rising temperature.They concluded that the relationships between temperature and natural frequency, and between temperature and damping, appeared to be functional and monotonic for the polyethylene beam, over the temperature range considered [11].Similar effects of temperature on stiffness/natural frequency and damping have been noted for a magnesium alloy [12] and a composite honeycomb structure [13].Accounting for these benign fluctuations is important for the practical implementation and generalisation of SHM technologies, as features commonly used for damage identification may be sensitive to harmless changes as well as damage [14][15][16][17].(Indeed, it is well known that structural damage can reduce stiffness, often manifesting as a reduction in natural frequency.Benign variations can then either mimic or mask damage, depending on whether they exhibit a stiffening or softening effect [15][16][17].Similar effects have been noted with damping, and multiple studies have shown that damping tends to increase with damage, particularly with crack growth [18].) Data scarcity and sparsity present additional challenges for SHM systems that rely on machine learning.Data scarcity refers to incomplete information regarding the damage-or normal-condition states of structures, particularly those newly in operation, and can impair model training and development.Likewise, sensing networks are prone to data loss (causing sparse data), because of sensor failure caused by harsh environmental conditions or insufficient maintenance.Transmission issues make wireless-sensing networks particularly susceptible to loss, and can be caused by large transmission distances between the sensors and base station [19], software/hardware problems [20], and other issues such as weather changes, interference from nearby devices, or installation difficulties [21].Furthermore, modern systems that produce large amounts of highresolution data can suffer losses resulting from a data transfer bottleneck [22].Significant losses higher than 30% have been reported [20,21,23], and a 0.38% data loss was found to have similar effects on power spectral density (PSD) as 5% additive noise [24].Differences in sample rate among sensors could have a similar presentation, with the data captured at a lower rate seemingly missing data relative to that captured at a higher rate.Again, PBSHM addresses these scarcity/sparsity issues via knowledge transfer among similar structures, so that data-rich members can support those with more limited information.
Homogenous populations can be represented using a general model, called a population form.The form attempts to capture the 'essential nature' of the population and benign variations among the members [1].The form was first introduced in [1,2], where a conventional or single Gaussian process (GP) was applied to frequency response functions (FRFs), to develop a representation for a nominally-identical population of eight degree-of-freedom (DOF) systems.Then, to accommodate greater differences among the nominally-identical members, an overlapping mixture of Gaussian processes (OMGP) [25], was used in [1,26], to infer multivalued wind-turbine power-curve data, with unsupervised categorisation of the data.The OMGP approach [25] was again used in [27] to develop a population form for real and imaginary FRFs, obtained from four nominally-identical, full-scale, composite helicopter blades.Recently, hierarchical modelling was used to improve the predictive capability of simulated [28] and in-service [29] truck-fleet hazard models, and wind-turbine power curves [29].Specifically, the work presented in [28,29] used partial pooling, where data groups are treated as belonging to different domains, and each domain can be considered a realisation from global or population-level distributions.(This approach is in contrast to complete pooling, where all data are considered as belonging to a single domain, or no pooling, where each domain is treated as fully independent of the other domains).It was shown that when populations of structures were allowed to share correlated information, model uncertainty was reduced [28,29].In addition, domains with incomplete data were able to borrow statistical strength from data-rich groups [28,29].In [30], multilevel modelling with partial pooling was used to learn a 2D map of arrival times for waves propagating through a complex plate geometry, via GP regression, using a series of acoustic-emission experiments performed on the same plate but with differing experimental designs.Domain expertise regarding the positive gradient for the expected intertask functions was encoded into the model, via a linear mean function and appropriate priors [30,31].GP kernel hyperparameters were learnt at the domain level, to allow variation in the response surface among the different tests, and were correlated via a higher-level sampling distribution [30].

Research aims of the current work
The current work addresses critical SHM challenges, including uncertainties in the underlying dynamics of structures, and data-scarcity/sparsity issues, which can impair generalisation of SHM technologies.In line with the population forms developed in [1,2,[26][27][28][29], this paper presents generalised, probabilistic FRF models that were developed for the helicopter blades in [27], using a hierarchical Bayesian structure.Two case studies are presented.The first case used FRFs collected from all four blades, at ambient laboratory temperature, with variations among the blades resulting from manufacturing differences (e.g., small discrepancies in material properties and geometry) and boundary conditions.Limited training data that did not fully characterise the resonance peaks were taken from two of the FRFs, while sufficient training data were taken from the remaining two FRFs, so that information could be shared with the data-poor domains via shared distributions over the parameters.This situation is representative of incomplete data in the time domain, which would reduce the number of spectral lines in the frequency domain.Independent models were generated for comparison, to visualise the variance reduction from the combined model.The second case considered vibration data from one of the helicopter blades collected at various temperatures in an environmental chamber.A probabilistic FRF model was again developed using a hierarchical approach with partial pooling.Functional relationships between temperature and natural frequency, and between temperature and damping, were approximated via Taylor series expansion, and polynomial coefficients were learnt at the population level.A subset of temperature-varied FRFs were used to train the model.Populationlevel modal parameters and polynomial coefficients were then used to generalise to temperature-varied FRFs not used to train the model, and model accuracy was evaluated by comparing the results to FRFs computed via measured vibration data.

Paper layout
The layout of this paper is as follows.Section 2 summarises existing research related to population-level monitoring of systems.Section 3 outlines the novelty and contribution of the current work.Section 4 presents the theoretical basis for this research, including modal analysis and hierarchical Bayesian modelling techniques.Section 5 briefly describes the datasets used to develop the models.Sections 6 and 7 discuss the models developed and analysis results for the first and second case studies, respectively.Conclusions are presented in Section 8, and acknowledgements are presented in Section 9.

Related work
Most literature related to population-level monitoring of engineering structures is focussed on transfer learning, which aims to improve predictions in a target domain given a source domain with more complete information.Some cases have involved fine-tuning the classifiers and weights of a pre-trained convolutional neural network (CNN) according to a new dataset.For example, CNNs have been used to detect cracks, as in [32][33][34][35], and other defects (e.g., corrosion, staining), as in [32,36].Other cases have used domain adaptation (DA), a subcategory of transfer learning, where the source and target domains have the same feature space, and the target domain is mapped onto a shared space.Predictions are then made from a single model.For example, domainadversarial neural networks have been implemented for condition monitoring of a fleet of power plants [37], and for fault detection in a gearbox and three-story structure [38].A kernelised linear projection approach to DA has also been studied, between simulated source and target structures [5,6], and between a simulated source structure and experimental target structure [7].DA has also been used to transfer damage detectors across systems using vibration data, in a group of tail planes [39].Lastly, statistic alignment has been shown to improve the performance of established DA methods, by making them more robust to class imbalance and data-sparsity issues [8].
Population-level modelling can also be viewed in the context of multitask learning (MTL), where multiple tasks are solved simultaneously, while considering similarities (and differences) across tasks.In MTL, a combined inference allows sharing of correlated information among domain-specific models, to improve accuracy in data-poor domains [29,40].In the context of modelling engineering infrastructure, MTL has been used as part of a multioutput Gaussian process (GP) regression to learn correlations between data obtained at adjacent sensors from the Canton Tower, to reconstruct missing information [41].(Note that using MTL in this context differs from the current work.In [41], missing time-domain data from temperature and acceleration sensors were reconstructed using MTL, and all data were sourced from a single structure.The current work considers frequency-domain data from multiple structures, as in the first case, and incorporates the relationships between temperature and domain-specific tasks into the model structure, as in the second case.)A similar approach was used in [42], where GP regression assisted with missing data recovery from a faulty sensor, by capturing the correlation among the remaining sensors on a hydroelectric dam.Likewise, in [43], GPs were used to transfer information between spatial temperature or pressure profiles at axial stations within an aeroengine.
Hierarchical Bayesian modelling is an MTL framework, where (lower-level) domain-specific tasks are correlated by conditioning on (higher or populationlevel), shared variables.The technique was first introduced in an SHM context in [44,45], where a multilevel hierarchical Bayesian model was developed to identify damage on a single structure, by inferring stiffness loss from changes in modal parameters, given noisy, incomplete modal data.(Note that this approach differs from the current work.In [44,45], modal parameters were learnt for different damage scenarios using a series of coupled linear regressions to approximate the eigenvalue problem.In [45], the similarities between acceleration responses at adjacent sensors were exploited for data-loss recovery, by modelling the measured signal as a linear combination of basis functions.Both cases considered data from a single structure.The work presented in the current paper, in the first case study, models the entire (band-limited) FRF from multiple structures, using a likelihood function based on modal parameters.The second case presented in the current work models the entire FRF using measurements at various temperatures from a single structure, and incorporates functional relationships between temperature and the modal parameters, enabling predictions at 'unseen' temperature conditions.)Hierarchical Bayesian models were again used in [46], to estimate corrosion rates via data from multiple sensors on the same test structure, to support decision-making in the absence of complete data.In [47], hierarchical Bayesian models were developed using strain measurements from multiple tensile tests to inform the material properties of the samples, in a manner that considered the inherent variability in material properties among the samples and uncertainties related to experimental repeatability.Likewise, in [48], hierarchical Gaussian mixture models were used for combined inference in a simulated population of structures, for the purpose of damage identification.

Novelty and Contribution
Unlike previous efforts, the current work is concerned with developing a probabilistic FRF model, using an FRF equation based on modal parameters (natural frequency, mode shape, and damping) as the likelihood function.As with [28][29][30], a hierarchical Bayesian (or MTL) framework was used.However, the focus is data sharing in the presence of low-resolution frequency information (i.e., given sparse data).The inability to localise both time and frequency to a fine resolution, per the uncertainty principle, means that missing time-domain data (e.g., acceleration, velocity) measured from SHM sensors will result in fewer spectral lines in the frequency domain.This decreased frequency resolution could impair proper characterisation/identification of modal peaks, which may be features of interest in a damage-detection application.By using a combinedinference approach, structures whose FRFs have many spectral lines in a band of interest can lend statistical support to those whose FRFs are limited by missing data.In addition, the current work incorporated functional relationships to describe changes in environmental conditions.This inclusion of functional relationships allows for extrapolation to temperature states not used in model training, which increases the amount of normal-condition information available, to address data-scarcity challenges.In contrast to [30], parameters were learnt over the experimental campaign, rather than hyperparameters, giving greater physical interpretability of the results.

Background theory
In this work, combined probabilistic FRF models were developed using a hierarchical Bayesian modelling approach, to support PBSHM research.This section provides an overview of the methodologies used, including hierarchical multilevel modelling from a Bayesian perspective, linear modal analysis, FRF estimation, and model evaluation.

Hierarchical Bayesian modelling
Hierarchical models can be used to make combined inferences, whereby domains are treated as separate; but, at the same time, it is assumed that each domain is a realisation from a common latent model.This modelling structure involves partial pooling, and is beneficial in that population-level distributions are informed by the full dataset, comprised of multiple domains.In partial pooling, certain parameters are permitted to vary between domains (i.e., varying parameters); which are correlated by conditioning on parent variables at the population level.In the current work, the natural frequency would be a varying parameter.Other parameters can be considered shared among members of a population (e.g., additive noise) and are learnt at the population level (these shared variables can still be sampled from parent distributions, which are also learnt at the population level).In contrast, a complete-pooling approach would consider all population data as having originated from a single source, while a no-pooling approach would involve fitting a single domain independently from the other domains.
Hierarchical models with partial pooling are particularly useful for PBSHM.Because parameters are allowed to vary at the domain level (as opposed to complete pooling), this approach can represent benign variations within a population.In addition, population-level variables are informed by the full dataset, rather than data from a single domain.This increase in statistical power is especially important in situations where one or more domains have limited data [28,29,49].In such cases, parameters from the data-poor domains exhibit shrinkage towards the population mean (therefore borrowing information from the other domains), which tightens the parameter variance [28,29,49].The differences among the data-pooling techniques are shown graphically in Figure 1, using a simple linear regression example.From a general perspective, data from a population comprised of K groups can be denoted via, where y k is the target response vector for inputs x k , and {x i,k , x i,k } are the ith pair of observations in group k [29].Each group is comprised of N k observations, giving a total of K k=1 N k observations [29].The objective is to learn a set of K predictors (one for each domain), related to the regression task, where the tasks satisfy, In other words, for each observation i, the output is determined by evaluating one of K latent functions, f k (x i,k ), plus additive noise, i,k [29].
While each of the k groups can be learned independently, a combined inference can be used to take advantage of the full K k=1 N k population dataset.For example, consider a population that can be expressed using K linear regression models, as in Figure 1, where α k and β k are the intercept and slope for domain k, respectively; x k is a vector of inputs for the kth domain, with length N k ; and k is the noise vector for the kth domain, with length N k , and is assumed to be normally-distributed.Then, the likelihood of the target response vector is given as, A shared hierarchy of prior distributions can be placed over the slopes and intercepts for the groups k ∈ {1, ... , K}, in line with a Bayesian framework.To allow information to flow between groups, parent nodes µ α , σ 2 α and µ β , σ 2 β can be learned at the population level.Note that sometimes, it may be appropriate to learn certain parameters at the population level rather than the domain level.If, for example, the same hardware was used for data collection within the population, one could assume a global noise variance σ 2 that is the same for each domain.Consider the directed graphical models (DGMs) shown in Figures 2a and 2b. Figure 2a shows the DGM for the linear regression example, given an independent (no-pooling) approach.Each model is learnt independently and no information is shared.In contrast, Figure 2b shows the DGM given a partial pooling approach.The slopes and intercepts are indexed by k, and plate notation is used to show that these nodes are repeated.Shared parent nodes µ α , σ 2 α and µ β , σ 2 β are outside of the plates and not indexed by k, meaning that these are population-level variables, and information is permitted to flow between these and the domain-specific parameters.The noise variance σ 2 is not indexed by k, and is also learnt globally, at the population level.The current work considers a hierarchical probabilistic FRF model, that uses an accelerance FRF estimate (based on modal parameters), as the mean of the likelihood function.A brief introduction to modal analysis and FRF estimation is provided.

Modal analysis and FRF estimation
The equation of motion for a multiple DOF system can be written as, where ü(t), u(t), u(t), and z(t) are acceleration, velocity, displacement, and force, respectively.In most cases, the mass M, damping C, and stiffness K matrices are coupled.For linear systems, and in the absence of viscous damping, the equation of motion can be decoupled, such that the system is represented by multiple single degree-of-freedom (SDOF) oscillators.This decoupling is performed via the eigenvalue expression, and yields the natural frequencies in radians, ω n , and mode shapes, Ψ, of the system.
The physical equation of motion can then be cast in modal space to give the uncoupled modal equations, with modal coordinates p(t), written as, where, FRFs aid visualisation of the natural frequency components of a system, and are computed by normalising the response signal at a given location to the excitation force.This work used the H 1 estimator to compute FRFs.For a response at location h resulting from excitation at j, the H 1 estimator is computed as, where, and, The asterisk * denotes complex conjugation, ω is frequency in radians, and F is a discrete Fourier transform (this work used a fast Fourier transform or FFT).The input force in the time domain at location j is z j (t), and the output response (i.e., acceleration, velocity, or displacement) in the time domain at h is u h (t).With assumed linear behaviour and proportional damping, the (complex) accelerance FRF (i.e., given acceleration response data) can also be estimated using modal parameters, where hj is the residue for mode m, defined as the product of the massnormalised mode shapes at locations h and j (A (m) hj = ψ hm ψ jm ) [50].The natural frequency associated with mode m is ω nat,m , and the modal damping associated with mode m is ζ m [50].The real and imaginary parts of the FRF can be computed independently, by multiplying Eq. ( 12) by its complex conjugate.This work considers FRFs from multiple domains (different structures in Case 1, and a single structure in Case 2 with data obtained at varying temperatures), with each domain indexed by k.In each case, only one FRF is assigned to each domain, from a given measurement location (so, subscripts h and j can be neglected).Thus, the real and imaginary components of the FRF from the kth domain, given a vector of frequency inputs ω k , can be estimated via, Note that residues A m are not indexed by k in Eq. ( 13).For the models presented in the case studies below, mode shapes were shared among the population, to address identifiability concerns during sampling, and because mode shapes are often less sensitive to global variations compared to other modal parameters.
The current work has focussed on the real part of the FRF, and Eq. ( 13) provided the mean of the likelihood function for the models developed.The residues, modal damping, and natural frequencies for each domain were then learnt as hyperparameters for the hierarchical model.Learning the real part of the FRF was sufficient for demonstrating the proposed technology.However, the imaginary part could be learnt using the same methods, with Eq. ( 14) providing the likelihood function, or could be inferred (at least in part), by exploiting the causal relationship between the real and imaginary components of the FRF [50].

Model evaluation for generalisation beyond training data (Case 2)
The normalised mean-squared error (NSME) was calculated to evaluate the accuracy of the extrapolation to temperatures beyond the training data, for the second case presented herein, via where M is the number of test data points, y is the test data, σ 2 y is the variance of the test data, and y * is the predicted function.Normalising to the test data variance allows for comparison of results on a consistent scale, regardless of signal magnitude.According to convention, an NMSE less than 5% suggests that the model fits the data well.

Dataset summary
The population dataset was comprised of vibration data collected from four healthy, nominally-identical, full-scale composite1 blades from a Gazelle helicopter (referenced in this paper as Blades 1-4).Data were collected using Siemens PLM LMS SCADAS hardware and software at the Laboratory for Verification and Validation2 (LVV) in Sheffield, UK.The first case used FRFs calculated from data that were collected on all four blades at ambient laboratory temperature, as described in [27].The second case used FRFs calculated from data that were collected on a single blade (Blade 1) at multiple temperatures in an environmental chamber.

Data collection at ambient laboratory temperature
The first case used data collected at ambient laboratory temperature, with the blades in a fixed-free boundary condition, which was approximated by placing the root end of each blade in a substantiated strong-wall mount.Ten uniaxial 100 mV/g accelerometers were placed along the length of the underside of each blade.Note that the same accelerometers were used on each blade, and care was taken to ensure that they were attached to approximately the same locations on each blade.An electrodynamic shaker with force gauge was mounted to a fixture bolted to the laboratory floor and attached to the blade 0.575 metres from the root.The shaker was attached to the underside of the blade in the flapwise direction.A continuous random excitation was generated in LMS (note: LMS refers to Siemens PLM LMS SCADAS hardware and software) and applied to excite the blade up to 400 Hz, with a step size of 4.88e-02 Hz.Throughput time data were collected for each test, and the data were divided into 20 blocks.Hanning windows were applied and FRFs were computed in LMS, which were then averaged in the frequency domain.The experimental setup is shown in Figure 3, and the accelerometer positions on the blades are shown in Figure 4.
To simplify the analysis, a narrow frequency band was selected between 24 and 61 Hz, with the fourth and fifth bending modes of the blades dominating the response in this band.Although other modes appear to have a small influence in this band, a 2DOF assumption was imposed.(This assumption results in smoothing of the FRF over the band, and might result in some loss of interpretability, but is acceptable for these preliminary analyses).The real part was modelled as a probabilistic FRF, using the FRF estimate from Eq. ( 13) as the mean of the likelihood function, as described in Case 1, presented in Section 6 of this paper.The real parts of the averaged FRFs for each blade, at the second accelerometer from the blade root (corresponding to the drive-point location), are shown in Figures 5a and 5b. Figure 5a shows the full measured bandwidth, and Figure 5b shows the FRF in the bandwidth of interest, between 24 and 61 Hz.Figures 5a and 5b show increasing variability with respect to frequency, which is an expected result, given that higher-frequency modes are more sensitive to small physical changes than lower-frequency modes.For modes less than 80 Hz, the maximum frequency difference among the blades was approximately 2.5 Hz; for modes greater than 80 Hz, the maximum frequency difference was approximately 6.3 Hz.Note the grouping visible at several of the peaks, where Blades 1 and 2 appear closely aligned in frequency while Blades 3 and 4 appear closely aligned.These results are quite relevant for PBSHM.All of the helicopter blades are healthy, and represent a normal-condition state of the population.Consider a situation where only FRFs from one of the groupings are available for training a model (or FRFs from the other groups are missing data).The normal condition could be heavily biased towards the training set, and incoming FRFs could be flagged as damaged, even if they are healthy.Further details regarding the data collection and processing for these tests can be found in [27].

Data collection at various temperatures in an environmental chamber
The second case used data from Blade 1, collected at temperatures ranging from -20 to 30 • C in increments of 5 • C in an environmental chamber.The blade was tested in an approximately fixed-free boundary condition, with the blade root mounted on a fixture that was substantiated with a large concrete block.These tests used the same accelerometers and sensor layout, as the previous tests at ambient laboratory temperature.Likewise, the same dataacquisition parameters and processing methods were used.The shaker was mounted to the bottom plate of the test fixture, and the shaker and force gauge were connected to the underside of the blades to excite the blades in the flapwise direction.A thermal jacket was used to protect the shaker when testing at lower temperatures.Prior to data collection, the environmental chamber was set to each of the temperatures of interest, after which the blades were allowed to soak for at least two hours to reach the desired temperature.Throughput force and acceleration data were collected for each test.The data were then divided into 20 blocks, and a Hanning window was applied to each data block.FRFs were then computed in LMS, and averaged in the frequency domain.The experimental setup, with the helicopter blade inside the environmental chamber, is shown in Figure 6.
To simplify the analysis, a narrow frequency band was selected between 135 and 155 Hz, with a higher-order bending mode of the blade dominating the response in this band.An SDOF assumption was imposed.(Again, this as-sumption was considered acceptable for these preliminary analyses, to avoid data-separability issues.)FRFs captured at temperatures -10, -5, 10, and 25 • C provided the training data.The remaining temperature-varied FRFs were used to evaluate the extrapolation results to other temperatures.As with the previous case, the real part was modelled as a probabilistic FRF, using the FRF estimate from Eq. ( 13) as the mean of the likelihood function.The real parts of the averaged FRFs for each temperature, from the fourth accelerometer from the blade tip, are shown in Figures 7a and 7b. Figure 7a shows the full measured bandwidth, and Figure 7b shows the FRF in the bandwidth of interest, between 135 and 155 Hz.

Figures 7a and 7b show a proportional decrease in frequency corresponding
to each incremental temperature increase, with discrepancies more noticeable at the higher modes than the lower modes.These results are expected, as the higher-frequency modes are more sensitive to environmental and other changes.The maximum frequency difference among the tests was approximately 15.3 Hz (for modes less than 400 Hz), found in the band between 335 and 375 Hz, as obtained via peak-picking.In the same band, the average frequency difference for each 5 • C increment was approximately 1.5 Hz.For the mode at approximately 145 Hz shown in Figure 7b, the maximum frequency difference was approximately 3.8 Hz, and the average frequency difference for each 5 • C increment was approximately 0.38 Hz.Further details regarding the data collection and processing for these tests can be found in [52].

Case 1: population-based modelling of FRFs for nominally-identical structures
The first case used FRF data from four nominally-identical helicopter blades, collected at ambient laboratory temperature.A population form for the FRFs of the helicopter blades was developed using the aforementioned hierarchical (partial-pooling) approach.Models were developed using the probabilistic programming language Stan.Analyses were performed using MCMC, via the no U-turn (NUTS) implementation of Hamiltonian Monte Carlo (HMC) [53,54].HMC uses approaches based in differential geometry to generate transitions that span the full marginal variance, which allows the algorithm to accommodate large curvature changes in the target distribution (which are common with hierarchical models) [54].This allows the sampler to efficiently explore the posterior distribution, without being susceptible to the random-walk behaviour that can occur with other samplers [54].

Model development
The population of helicopter blade FRFs, with frequency inputs, ω k , and accelerance outputs, H k , was re-written from the general representation in Eq. ( 1), (16) where {ω ik , H ik } was the i th pair of observations in domain k.Then, considering only the real component of the FRF, Eq. ( 2) was re-written as, (17) where for each observation i, the output was determined by evaluating one of K latent functions, f k (ω i,k ), plus additive noise, i,k .For this work, there were four helicopter blades and four domains.Therefore, the population model included four latent functions (i.e., K = 4).The (real) FRFs were modelled probabilistically with an assumed Gaussian-distributed likelihood, where f k (ω k ) was equal to the real component of the FRF, calculated using modal parameters via Eq.( 13).The additive noise variance σ 2 H of the FRFs was assumed to be the same for each blade.This assumption was reasonable, as the same data acquisition system and sensors were used among the different tests.Note that ω k was permitted to vary depending on the given FRF.This allowed for flexibility, to consider FRFs with different numbers of spectral lines or missing data points, and to represent population uncertainty.
Natural frequencies and modal damping were learnt at the domain level, and allowed to vary among the different helicopter blades.Residues were shared among the different domains and learnt at the population level, to mitigate model identifiability issues.Shared, population-level prior distributions (with hyperpriors) were also placed over the modal parameters to capture/infer the similarity among the FRFs.Domain-level natural frequencies, , were sampled from a truncated normal parent distribution, with higher-level expectation and variance sampled from truncated normal distributions, , was sampled from a beta parent distribution, with higher-level shape parameters sampled from truncated normal distributions, Likewise, shared modal residues, A = {A m } 2 m=1 , were sampled from a normal parent distribution, with higher-level expectation and variance sampled from normal and truncated normal distributions, respectively, For simplicity, normal distributions were chosen for most of the parameters (or truncated normal distributions, if the parameter was restricted to be positive), although other distributions could be used.A beta distribution was chosen for damping, as beta distributions have support x ∈ [0, 1], which is suitable for lightly-damped systems.Hyperpriors were chosen based on the physics that could be interpreted by visual inspection of the training data (e.g., natural frequency), or by fitting the data-rich domains independently (discussed below).Note that hyperpriors for ω nat are shown in rad/s.Shared noise variance, σ 2 H , was sampled from a truncated normal parent distribution, with higher-level expectation and variance also sampled from truncated normal distributions, Note that a non-informative prior was used for the noise variance, to stabilise computation.To incorporate more prior information, the half-t family of distributions (e.g., half-Cauchy distribution) can be used instead [55].A graphical model displaying the parameter hierarchy is shown in Figure 8. Prior to fitting the FRF model, random Gaussian noise was generated, with a magnitude equal to 5% of the absolute peak value of the FRF from Blade 2. (Noise was added to the training data to avoid over-fitting small deviations/lightly participating modes within the band of interest.)The noise was then added to each FRF, and 100, 100, 7, and 20 training points were randomly selected from the FRFs belonging to Blades 1 to 4, respectively.The intent was that the two data-rich FRFs (Blades 1 and 2) would lend statistical support to the sparser domains (Blades 3 and 4), thereby reducing uncertainty compared to an approach without pooling.Recall the grouping of the peaks from the FRFs shown in Figures 5a and 5b.Both data-rich FRFs belonged to one of the groupings; likewise, the two data-poor FRFs belonged to the other grouping.This represents a challenging situation where the model can be heavily biased towards the available training data.The hierarchical/partial-pooling modelling structure allows the target distributions to be informed by the data (for very data-poor domains, this will of course be limited), but with the help of the full population dataset (including the data-rich domains).
The Stan HMC sampler was run using four chains, with a target average proposal acceptance probability rate of 0.99 (this parameter controls the target acceptance rate of the NUTS algorithm, and setting it to a value close to 1 can reduce the number of false-positive divergences [56]).The sampler was run for 10000 samples per chain (and an additional 5000 warm-up samples were discarded per chain to diminish the influence of the starting values [49]), for each parameter.In addition, models were run for each set of blade data independently (no pooling), for comparison with the partial-pooling model.

Model results
FRFs for each blade were computed via Eq.( 13), from the samples of the modal parameters.Total variance was estimated by adding the standard deviation of the FRFs to the expectation of the noise variance.Posterior predictive mean and 3-sigma deviation for the partial pooling and independent models are plotted in Figures 9a to 10b, respectively.Indeed, Figures 9a to 10b show that the sparser FRFs improved significantly for the combined model, compared to the independent models.With only 7 data points, Blade 3, shown in Figure 10a in green, was missing most of the information necessary to characterise the two modes.The independent model for Blade 3 relied heavily on the user-specified prior, while the partial-pooling model borrowed information from the other three blades to inform the shared latent model.Similar (albeit less severe), results were seen for Blade 4, shown in Figure 10b in purple.With 20 data points, the FRF for Blade 4 was less sparse than that for Blade 3, but was still somewhat lacking near the resonance peaks, especially for the first mode.Figure 10b shows a significant reduction in variance with the partial-pooling approach.The improvements in the data-rich FRFs, Blades 1 (blue) and 2 (red) were not as apparent, as expected.
Variance reduction from the combined-inference (partial-pooling) approach can also be visualised by plotting the marginal distributions of the hyperparameters.For the population-level variables, marginal distributions were approximated by sampling the expectation and variance for each variable (or shape parameters, for damping), and then drawing from a distribution (normal for natural frequency and residue, and beta for damping) with the same statistics as the parameter draws.Kernel density estimation (KDE) was then used to approximate marginal distributions using the population-level draws.Marginal distributions for the domain-level variables were approximated via KDE of the samples.For the natural frequencies, the population-and domain-level distributions for each mode are shown in Figures 11a and 11b.For the (shared) residues, which were learnt at the population-level, the parent-and lower-level distributions for each mode are shown in Figures 12a and 12b.For damping, the population-and domain-level distributions for each mode are shown in Figures 11c and 11d.
Figures 11a to 12b show that when the domains were able to share information via common higher-level distributions, the variance was reduced compared to a no-pooling approach.In Figure 11a, for the first natural frequency, the population-level distribution of the partial-pooling model was taller and with lower variance than the independent models, which was likely because the datarich domains dominated the available data for that mode, compared to the data-poor domains (forcing the distribution towards the natural frequencies of the data-rich domains).This data sharing was also evident at the domain-level, where the KDEs of the data-poor domains aligned closely with those from the data-rich domains, for the partial-pooling models.The no-pooling KDEs of the data-poor FRFs differed significantly, and were primarily informed by the priors.In contrast, Figure 11b shows that for the second natural frequency, the population-level distribution for the partial-pooling model was flatter than those for the independent models.This was likely to have occurred because more data were available for the second peak from Blade 4, which was from the second grouping (widening the distribution to accommodate the greater differences in natural frequency).Again, this conclusion is supported at the domain-level, where the KDE for Blade 4 is very similar for the partial-pooling and independent models (although there is still a small variance reduction for the partial-pooling model).For Blade 3, which was quite data-poor at the second mode as well as the first, the KDE in Figure 11b shows that the natural frequency was estimated to occur in between those of the other, more data-rich blades.Again, for Blade 3, the no-pooling KDE differed significantly from the others, as this parameter was primarily informed by the priors, because of a lack of data at the second mode.Figures 11c and 11d show that the population-level distributions for damping did not vary significantly among the partial-pooling and independent models, likely because the priors were fairly strong and also accurate to the data.At the domain-level, KDEs for the data-rich FRFs showed high similarity for the different pooling models, while KDEs for the data-poor FRFs tended towards the priors.Similar results were seen for the residues, in Figures 12a and 12b, except that the residues were shared among the domains for the partial-pooling model, so there was only one lower-level distribution for each mode.Close alignment of the domain-level distributions, of the data-rich domains, suggests that sharing the residue among the different blades was a suitable assumption.
The results presented in this case demonstrate the development of a population form, where a combined hierarchical FRF model is learnt for a group of nominally-identical helicopter blades.Two domains (Blades 3 and 4) had limited data, and were especially sparse near the resonance peaks.Such a situation could occur if an insufficient frequency resolution was selected during the data acquisition process.By borrowing data from data-rich domains within the population, variance was reduced compared to an independent modelling approach.

Case 2: population-based modelling of FRFs with temperature variation
The second case also used a hierarchical modelling structure with partial pooling, but assumed non-exchangeable models whereby parameters of the FRF varied with respect to temperature, i.e., the parameters were conditioned on the data.The goal was to learn functional relationships (at the population level), between temperature and natural frequency, and between temperature and damping, so that inferences could be made at temperatures not used in model training.As with the first case, all models in the second case were developed using Stan, and analyses were performed using MCMC, via the no U-turn (NUTS) implementation of Hamiltonian Monte Carlo (HMC) [53,54].

Identification of appropriate functional relationships between temperature and the modal parameters
Prior to developing the partial-pooling model, it was useful to determine appropriate functions for the temperature relationships.Independent models for each (real) temperature-varied FRF from Figure 7b were fitted, and groundtruth values were estimated by computing the expectation for each modal parameter.(Note that in the second case, the modal parameters from the independent models for FRFs at all measured temperatures were considered to be the ground truth, as each FRF was data-rich.This process of using the same data to train and develop the model structure is called post-selection inference [57], and must be used cautiously, as models that employ the technique are at risk of overfitting [30,49].However, it was assumed for this work that because the number of candidate models was small, the bias resulting from using the data twice was also small [30,49].)The ground-truth estimates for natural frequency and damping are plotted against temperature, and shown with least-squares fits (polynomial for natural frequency, and linear for damping), in Figures 13a and  13b.In the figures, the ground-truth estimates are plotted as asterisks, and those associated with the FRFs used to train the combined-inference model are shown in red, while those at 'unseen' temperatures are shown in black.The residue\mode shape was assumed to be constant with respect to temperature.Figure 13a shows that a second-order polynomial appears to be an appropriate fit for natural frequency in the temperature range of interest.From Figure 13b, a linear fit was considered most appropriate for damping.Although there is likely a (weakly) nonlinear relationship between temperature and damping for this dataset, a linear assumption provided good results for the experiments used here.A higher-order Taylor series expansion for the temperature-damping relationship would require many coefficients, which would significantly increase the number of hyperparameters in the model.Further studies, including materialproperties evaluation, may help elucidate the nature of this relationship specific to the helicopter blades, which may allow for better inferences from increased physics-based knowledge.

Model development and results
As with the first case, the population model for the second case included four latent functions (i.e., K = 4), as four temperature-varied FRFs provided training data, i.e., those at -10, -5, 10, and 25 • C. Likewise, only the real part of the FRF was fitted for simplicity.The real components of the FRFs were modelled probabilistically with an assumed Gaussian distribution, using the same accelerance FRF estimation as a likelihood function, as in Eqs. 13 and 18.The additive noise variance σ 2 H of the FRFs was assumed shared among each of the domains (again, the same data acquisition system and sensors were used among the different tests), and the modal residue was assumed constant with respect to temperature.
Shared, higher-level distributions were placed over natural frequency, Note that hyperpriors for ω nat are shown in rad/s.Also note that for this preliminary model, normal distributions were assumed for all parameters (with truncated normal distributions assumed for variance, damping, and natural frequency).The residue, A, was assumed constant over all temperatures, with distributions, A second-order Taylor series expansion was used to approximate the functional relationship between temperature and natural frequency, with coefficients a = {a 1 , a 2 }, where the domain-level natural frequencies were defined as temperature-shifted realisations from the population-level distributions, via, Likewise, the relationship between temperature and modal damping was approximated as linear, with slope b, over the measured temperatures, via, where T k was the temperature associated with the kth FRF.Priors were placed over the shared polynomial coefficients, with assumed distributions, The shared noise variance σ 2 H was expressed via, A graphical model displaying the parameter hierarchy is shown in Figure 14.Prior to fitting the FRF model, random Gaussian noise was generated, with a magnitude equal to 5% of the absolute peak value of the FRFs.The noise was then added to each FRF, and 100 training points were randomly selected from each of the FRFs captured at temperatures -10, -5, 10, and 25 • C. The intent was to use a sub-set of the temperature-varied FRFs, to learn the populationlevel coefficients necessary to describe the relationships between temperature and the modal parameters, so that these population-level variables can be used to make predictions at 'unseen' temperatures (of course, these predictions were validated using experiments that did not contribute to the training data).As with the previous case, the Stan HMC sampler was run using four chains (and with a target average proposal acceptance probability rate of 0.99), for 10000 samples per chain (with an additional 5000 warm-up samples per chain, which were discarded), for each parameter.The expectation for the domain and population-level parameters were obtained from the samples.The expectation of the domain-level natural frequencies and damping, and shared residue and noise variance, were used to compute the FRFs via Eq.(13)  residue and noise variance were assumed to be temperature-invariant, they were also used to extrapolate to other temperatures).

Extrapolation to other temperatures
After training the model, predictions were made at 'unseen' temperatures, not used in training, using the expectation of population-level variables (shown in Table 1), and compared to measured FRFs to evaluate model accuracy.All shared variables (i.e., residue, polynomial coefficients, and noise variance), were assumed equal to the expectation of the variables computed using the model.Natural frequency and damping were estimated for all the measured temperatures, T = {T i } 11 i=1 , including the four used in training, via the relations, The extrapolated FRFs are shown in Figure 16, ranging from -20 to 30 • C in increments of 5 • C, with decreasing temperature from left to right, as natural frequency increases with decreasing temperature.Model accuracy was evaluated by calculating the NMSE via Eq.( 15) for each measured FRF with added noise, as shown in Table 2.Note that the NMSE computed for the prediction at temperatures -10, -5, 10, and 25 • C, which correspond to the data used in model training, are included in the table for completeness.However, the specific training points were excluded from the calculation.To further visualise the relationships between temperature and the modal parameters, the functions were computed for each sample.The mean and variance of the functions were plotted with the parameters estimated via independent modelling (in this case, these results were considered to be the best approximation of the ground truth, as each FRF was data-rich).These plots are shown in Figures 17a and 17b.
In Figure 16, the extrapolated FRFs are shown as solid blue lines, with a shaded blue region indicating the variance bounds.The measured FRFs used to train the model, without added noise, are shown as solid red lines, and the measured FRFs used to test the model at 'unseen' temperatures are shown as dashed black lines.From the figure, it is clear that excellent agreement was achieved between the extrapolated and measured FRFs at the temperatures used to train the model, as expected.Likewise, the FRFs at temperatures between those used to train the model (e.g., at 0, 15, and 20 • C) show excellent agreement.Notably, the FRFs at colder temperatures, which were further away from the training data, still show good agreement.The accuracy of the fit is further shown in Table 2, as all NMSE values were less than 5%.
Figures 17a and 17b further show that the extrapolated parameters accurately represented the measured data, as the measured parameters largely fell within the variance bounds of the approximations.Figure 17a shows that for natural frequency, parameter variance increased as the inferred parameters varied further from the training data (i.e., at temperatures further from those used to train the model), as expected.Figure 17b shows that for damping, the relatively consistent variance was the result of the linear assumption for the temperature-damping relationship, and the relatively high deviation of the measured data from the linear fit.Further model development could involve the incorporation of more physics-based knowledge, such as, forcing the relationship between natural frequency and temperature to be monotonically decreasing, or further investigation into the possibly nonlinear relationship between damping and temperature.

Concluding remarks
Current work has involved the development of probabilistic FRF models using a hierarchical Bayesian approach, that account for benign variations as well as similarities among nominally-identical structures.Two cases were presented, to demonstrate the usefulness of this modelling structure.The first case demonstrated how hierarchical Bayesian models with partial pooling can reduce variance in data-poor domains by allowing information transfer with data-rich domains, via shared population-level distributions.The second case showed that incorporating functional relationships (approximated via Taylor series expansion) into the modelling structure to describe temperature variations allows for prediction beyond the training data.The first case addresses data-sparsity challenges in SHM, as missing time-domain data resulting from sensor dropout and other causes will reduce the number of spectral lines in the frequency domain, which can impede dynamic characterisation.Likewise, the second case addresses data-scarcity issues, as encoding physics-based knowledge into the model via functional relationships can increase the amount of normal-condition information available.Future efforts may involve further investigation into the physical relationships between temperature and the modal parameters, particularly damping, to improve model accuracy.

Acknowledgements
The authors gratefully acknowledge the support of the UK Engineering and Physical Sciences Research Council (EPSRC), via grant reference EP/W005816/1.This research made use of The Laboratory for Verification and Validation (LVV), which was funded by the EPSRC (via EP/J013714/1 and EP/N010884/1), the European Regional Development Fund (ERDF), and the University of Sheffield.For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.

Figure 1 :
Figure 1: Comparison of data-pooling approaches, using a simple linear regression example.

Figure 3 :
Figure 3: Helicopter blade in a substantiated wall mount.

Figure 4 :
Figure 4: Sensor locations on the helicopter blades.

Figure 5 :
Figure 5: Real components of the helicopter blades FRFs, (a) full bandwidth and (b) bandwidth of interest, at the drive-point location, from vibration data collected at ambient laboratory temperature.

Figure 6 :
Figure 6: Helicopter blade in an environmental chamber.

Figure 7 :
Figure 7: Real components of the helicopter blades FRFs, (a) full bandwidth and (b) bandwidth of interest, at the fourth accelerometer from the blade tip, from vibration data collected at various temperatures in an environmental chamber.

Figure 8 :
Figure 8: Graphical representation of the hierarchical FRF model developed for the first case.

Figure 9 :
Figure 9: Posterior predictive mean and 3-sigma deviation for independent (no-pooling) and partial-pooling models, for data-rich domains (a) Blade 1 (blue), and (b) Blade 2 (red).Training data are shown in a black scatter plot.Solid lines are the posterior predictive means for the partial-pooling models, and dashed lines are the posterior predictive means for the independent models.The variance is represented by shaded regions, where the independent model variances are shown in a lighter colour than those for the partial-pooling models.

Figure 10 :
Figure 10: Posterior predictive mean and 3-sigma deviation for independent (no-pooling) and partial-pooling models, for data-poor domains (a) Blade 3 (green) and (b) Blade 4 (purple).Training data are shown in a black scatter plot.Solid lines are the posterior predictive means for the partial-pooling models, and dashed lines are the posterior predictive means for the independent models.The variance is represented by shaded regions, where the independent model variances are shown in a lighter colour than those for the partial-pooling models.

Figure 11 :Figure 12 :
Figure 11: Marginal distributions for natural frequency, (a) mode 1 and (b) mode 2; and for damping, (c) mode 1 and (d) mode 2. Population-and domain-level distributions are shown at the top and bottom, respectively.For the partial-pooling model, population-level distributions are shaded and grey, and domain-level distributions are shaded and blue (Blade 1), red (Blade 2), green (Blade 3), and purple (Blade 4).For the independent (no-pooling) models, population-and domain-level distributions are shown as dashed lines, using the same colour scheme as the partial-pooling model.

Figure 13 :
Figure 13: Ground-truth estimates for (a) natural frequency and (b) damping, plotted against temperature.Second-order polynomial (for natural frequency) and linear (for damping) leastsquares fits are shown as solid black lines.Parameters associated with FRFs used to train the model (-10, -5, 10, and 25 • C) are shown in red.

Figure 14 :
Figure 14: Graphical representation of hierarchical FRF model for the second case.

Figure 15 :
Figure 15: Posterior predictive mean and 3-sigma deviation for probabilistic FRF model, for training data at -10, -5, 10, and 25 • C. Training data are shown in a black scatter plot.Solid lines are the posterior predictive means, and the variance is represented by shaded regions.

Figure 16 :
Figure 16: Extrapolated FRFs using population-level variables, from -20 to 30 • C in increments of 5 • C, with decreasing temperature from left to right.The extrapolated FRF means are shown as solid blue lines, with shaded blue regions indicating variance bounds (3-sigma deviation).Solid red lines denote the FRFs that provided training data for the model, while dashed black lines denote the measured FRFs at 'unseen' temperatures.

Figure 17 :
Figure 17: Mean extrapolated (a) natural frequency and (b) damping for Blade 1, at training and 'unseen' temperatures, plotted against temperature, with 3-sigma deviation (shown in blue).Ground-truth model variables are shown as stars, with red indicating the temperatures used in training and black denoting 'unseen' temperatures.
, as shown in Figure 15.The expectation of the population-level variables which were used to extrapolate to other temperatures are shown in Table 1.(Because the shared

Table 1 :
Expectation of the population-level variables.

Table 2 :
Evaluation of model accuracy for extrapolation to 'unseen' data.