Deep learning for frequency response prediction of a multimass oscillator

Noise prevention in product development is becoming increasingly important due to health effects and comfort requirements. In the product development process, costly and time-consuming simulations are carried out over parameter spaces, for example, via the finite element method, in order to find quiet product designs. The solution of such dynamic systems is limited in the maximum frequency and in the size of the parameter space that can be covered. Therefore, substituting high-fidelity models with machine learning approaches is desirable. In this contribution, we consider an academic benchmark: training a neural network to predict the frequency response of a multimass oscillator. Neural network architectures based on multilayer perceptrons and transformers are investigated and compared with respect to their accuracy in frequency response prediction. Our investigations suggest that the transformer architecture is better suited, both in terms of accuracy and in terms of the capability to handle multiple system configurations in a single model. The code of this work is available at https://eckerlab.org/code/acoustics_mmo_2023


INTRODUCTION
Understanding and analyzing structure-borne sound is crucial in acoustical engineering. Structure-borne sound refers to vibrations that propagate through solid materials, such as buildings, vehicles, or even human bodies. In many engineering applications, for example, in the automotive or aerospace industries, motor vibrations cause vibrations of the vehicle frame, eventually leading to undesired audible sound inside the passenger compartment. To effectively reduce the transmission of structure-borne sound, advanced modeling and simulation techniques are needed. A widely used simulation technique is the finite element method (FEM), in which the domain is discretized into simple geometric elements and the solution is approximated on a finite-dimensional subspace consisting of simple ansatz functions. The FEM can provide sufficiently accurate results for complex systems, like full vehicle simulations or aircraft cabin noise simulations. However, analyzing such dynamical systems across a broad frequency range requires a fine discretization to be able to resolve the propagating waves. This fine discretization leads to large system matrices, which induce a high computational burden.
To mitigate this computational burden, machine learning (ML) models are promising candidates for surrogate models. After successful training, an ML model can replace the expensive-to-evaluate FEM simulation, possibly enabling many-query tasks like optimization or uncertainty quantification. ML models have already been successfully applied in several engineering fields, such as fluid dynamics [1], material science [2], or climate modeling [3]. In the context of acoustics, [4] gives an overview of recent efforts to incorporate ML techniques in acoustic engineering. Moreover, there are works focusing on the design of acoustic metamaterials [5] by conditional generative adversarial networks. However, the prediction of frequency response functions (FRFs) by deep neural networks is much less explored in the literature. In [6], the frequency response of a wooden plate is represented in modal space and the eigenfrequencies are estimated via multilinear regression, whereas a deep neural network is used to predict the amplitude for varying material parameters. In this contribution, we aim to apply purely data-driven deep learning models to predict the direct solution of dynamical systems in the frequency domain. We consider a common academic benchmark, a multimass oscillator, and train neural networks to predict the FRFs for different parameter configurations. A standard multilayer perceptron (MLP) network architecture is compared to the more recent transformer architecture, originally developed for language processing. Furthermore, we investigate two different approaches to handle the frequency domain: either the FRF is predicted on a fixed grid, or the frequency is passed as an input parameter to the neural network.

DYNAMIC BENCHMARK MODEL
Throughout this work, we consider a multimass oscillator (discrete mass-spring-damper system) as a dynamic benchmark model. Although this model mainly has academic relevance, it still contains important dynamical features such as parameter-dependent resonance and antiresonance peaks, which are observed in more complex models as well. In addition, the linear dynamical system (Equation (5)) is a general structure also arising after applying discretization techniques, such as the FEM, to linear continuous problems. Specifically, we consider a four degrees of freedom multimass oscillator, which is excited by a harmonic foot point excitation u(t) (Figure 1A). The equations of motion describing the dynamical system can be written in matrix-vector notation as

M ẍ(t) + D ẋ(t) + K x(t) = f(t),     (5)

where M, D, and K denote the mass, damping, and stiffness matrices. With harmonic excitations of the form f(t) = f̂ exp(iΩt), it is reasonable to also consider a harmonic solution x(t) = x̂ exp(iΩt). Here, f̂ and x̂ denote the complex-valued amplitudes of the force and displacement vector and Ω is the excitation frequency. By using this harmonic ansatz, we can transform Equation (5) to the frequency domain, yielding

(K + iΩ D − Ω² M) x̂ = f̂.     (6)

Solving this linear system for discrete frequencies Ω₁, Ω₂, …, Ωₙ is equivalent to evaluating the FRF x̂(Ω). The frequency response function for all four degrees of freedom is plotted in Figure 1(B). The FRF depends on the parameters μ = [m, d, k] of the system. The goal of this work is to construct an accurate surrogate of the system response amplitude |x̂(Ω, μ)| ≈ g_θ(Ω, μ), where x̂(Ω, μ) denotes the true model and g_θ(Ω, μ) the surrogate model parameterized by the parameter vector θ.
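As a minimal numerical sketch, the FRF evaluation above can be reproduced by solving Equation (6) over the frequency grid. The chain topology (both ends grounded, so 4 masses with 5 springs and 5 dampers) and the unit force on the first mass, used here as a stand-in for the foot point excitation, are assumptions; function names are illustrative.

```python
import numpy as np

def assemble(m, d, k):
    """Assemble M, D, K for a chain of n masses with n+1 springs/dampers
    (both ends grounded) -- an assumed topology."""
    n = len(m)
    M = np.diag(np.asarray(m, dtype=float))
    K = np.zeros((n, n))
    D = np.zeros((n, n))
    for i in range(n):
        K[i, i] = k[i] + k[i + 1]
        D[i, i] = d[i] + d[i + 1]
    for i in range(n - 1):
        K[i, i + 1] = K[i + 1, i] = -k[i + 1]
        D[i, i + 1] = D[i + 1, i] = -d[i + 1]
    return M, D, K

def frf(m, d, k, omegas):
    """Evaluate x̂(Ω) by solving (K + iΩD - Ω²M) x̂ = f̂ per frequency."""
    M, D, K = assemble(m, d, k)
    n = len(m)
    f = np.zeros(n)
    f[0] = 1.0  # unit force on the first mass (stand-in excitation)
    X = np.empty((n, len(omegas)), dtype=complex)
    for j, w in enumerate(omegas):
        X[:, j] = np.linalg.solve(K + 1j * w * D - w**2 * M, f)
    return X

omegas = np.logspace(-1.5, 1, 200)            # frequency grid from Table 1
X = frf([5.0] * 4, [0.05] * 5, [1.0] * 5, omegas)  # nominal values, Figure 1
amp = np.abs(X)                                # FRF amplitudes, shape (4, 200)
```

With the small nominal damping, the amplitude curves show pronounced resonance peaks well above the quasi-static level, matching the qualitative behavior in Figure 1(B).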

Neural network architectures
To predict the frequency response, we employ neural networks that receive the mass (m), spring (k), and damping (d) parameters of the multimass oscillator as input. Before processing in a neural network, the parameters as well as the frequencies are rescaled to the range [−1, 1] over the whole dataset by considering the sampling range of values. In a first step, each scalar parameter is embedded by a linear layer to a vector of size d_embed. Specifically, we have separate linear layers for mass, stiffness, and damping parameters, resulting in an n_parameters × d_embed sized embedding.
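The per-group linear embedding can be sketched as follows; the weights are random stand-ins for trained values, and the names (d_embed, embed) are ours, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_embed = 128  # token size (the value used by the grid-based transformer)

# one independent linear layer (weight + bias) per parameter group
W = {g: rng.standard_normal((1, d_embed)) for g in ("mass", "stiffness", "damping")}
b = {g: np.zeros(d_embed) for g in ("mass", "stiffness", "damping")}

def embed(params):
    """Map scalar parameters (already rescaled to [-1, 1]) to tokens,
    one d_embed-dimensional token per scalar parameter."""
    tokens = [p * W[g] + b[g] for g, vals in params.items() for p in vals]
    return np.vstack(tokens)  # shape: (n_parameters, d_embed)

params = {"mass": [0.1] * 4, "stiffness": [-0.3] * 5, "damping": [0.7] * 5}
tokens = embed(params)  # (14, 128) token matrix fed to the networks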
We differentiate between a grid-based and a query-based approach. The former predicts the frequency response on a fixed frequency grid at once, that is, g(μ) ∈ ℝ^(4×200) (four masses and 200 frequencies). The query-based approach receives the target frequency as an input and makes a prediction for this frequency only, that is, g(Ω, μ) ∈ ℝ^4. We obtain results for the full discretized frequency band by querying the network with all frequencies individually. The query-based approach follows implicit models designed for shape and radiance field representation [7,8].
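The two interfaces can be contrasted with stub models; the returned values below are placeholders, and only the input/output shapes reflect the description above.

```python
import numpy as np

omegas = np.logspace(-1.5, 1, 200)  # discretized frequency band

def grid_model(params):
    """Grid-based: one forward pass -> full response, shape (4, 200)."""
    return np.zeros((4, 200))       # placeholder output

def query_model(omega, params):
    """Query-based: one forward pass per frequency -> shape (4,)."""
    return np.full(4, omega)        # placeholder output

params = np.zeros(14)
full_grid = grid_model(params)                                            # (4, 200)
full_query = np.stack([query_model(w, params) for w in omegas], axis=1)   # (4, 200)
```

Note the cost asymmetry this implies: the grid-based model needs one forward pass per sample, the query-based model one pass per sample and frequency.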
Besides the distinction between grid-based and query-based frequency response prediction, we employ MLPs and transformers [9] as building blocks. Combining both variants, once with the query-based and once with the grid-based approach, yields four architectures:
• Grid-based MLP. We employ a six-layer MLP with a constant width of 256 neurons, mapping to 4 × 200 output values. The MLP receives as input the n_parameters × d_embed sized embedding.
• Query-based MLP. We employ a seven-layer MLP with a constant width of 256 neurons, mapping to four output values. This MLP additionally receives the query frequency encoded by a linear layer as input.
• Grid-based transformer. We take the embedded parameters as tokens and process them with a transformer encoder with depth 6, four heads, an embedding dimensionality of 128, and sinusoidal positional encoding. After processing in the transformer, we take the four tokens stemming from the mass parameters and separately perform a linear mapping from 128 to 200 frequency predictions.
• Query-based transformer. For query-based frequency response prediction, we first employ a transformer encoder in the same way as for the grid-based prediction, with three heads, a depth of four, and a token dimensionality of 66. Afterward, we again take only the tokens stemming from the mass parameters and process them in a second transformer. In this decoder, instead of using a positional encoding denoting the order of the tokens, we add after each transformer block a learnable linear encoding of the query frequency to the tokens. The decoder consists of three blocks with a depth of three and a token dimensionality of 99. After passing through the decoder, we separately apply a linear layer to the tokens, mapping from 99 values to one prediction for the query frequency (see Figure 2).
Notably, the transformer architectures are capable of processing parameters of differently configured multimass oscillators, that is, with different numbers of masses, with exactly the same architecture. This is possible because the transformer architecture can process arbitrary numbers of tokens. The specific configurations of the candidate architectures are obtained by choosing hyperparameters that yield the lowest test error.

FIGURE 2 Query-based transformer architecture. The parameters for mass (m), spring (k), and damping (d) are separately embedded with linear layers. Then, the embedded parameters are processed with a transformer. Only the tokens resulting from the mass parameters are then processed in a second transformer with a positional encoding generated from the query frequency. The processed mass tokens are then finally mapped onto a predicted frequency response.

Training
The networks are trained using the AdamW optimizer [10] with β = (0.9, 0.95). We apply a weight decay with a value of 5e-3. As a loss function, we employ the mean-squared error on the amplitude. We further choose a cosine learning rate schedule with a warm-up, which is known for its robustness to hyperparameters and good convergence properties [11]. In a direct comparison, cosine learning rate scheduling led to an order of magnitude smaller validation loss than a constant learning rate (0.045 × 10⁻³ vs. 0.415 × 10⁻³). The initial learning rate, batch size, and number of epochs vary for the different architectures and are chosen to minimize the validation loss.
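A minimal sketch of a cosine learning rate schedule with linear warm-up; the constants (warm-up length, minimum rate) are illustrative, not the paper's exact values.

```python
import math

def lr_at(step, total_steps, base_lr=1e-3, warmup_steps=500, min_lr=0.0):
    """Cosine learning-rate schedule with linear warm-up (a common variant)."""
    if step < warmup_steps:
        # linear ramp from base_lr/warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # cosine decay from base_lr down to min_lr over the remaining steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

In a training loop, the per-step rate would be assigned to each optimizer parameter group before the update.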

RESULTS
In this section, we present the results of the proposed deep learning models for predicting the FRFs. We consider the four degrees of freedom multimass oscillator as described in Section 2 and choose the experimental setup described in Table 1.
In total, the input space μ ∈ ℝ^14 consists of 14 input parameters: four masses, five damping elements, and five spring elements. The parameters are sampled uniformly in the given intervals, and a training dataset of 4000 samples as well as a validation and a test dataset of 1000 samples each are constructed. For each sample point in the design space, the FRF is computed at 200 equidistant steps on a log scale.
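The dataset construction can be sketched as follows. The sampling intervals lo/hi below are placeholder values, since the actual design-space bounds are given in Table 1; the split sizes and the frequency grid follow the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_val, n_test = 4000, 1000, 1000
n_samples = n_train + n_val + n_test

# placeholder bounds per parameter: 4 masses, 5 dampers, 5 springs
lo = np.array([1.0] * 4 + [0.01] * 5 + [0.5] * 5)
hi = np.array([10.0] * 4 + [0.1] * 5 + [2.0] * 5)

# uniform sampling of the 14-dimensional design space
samples = rng.uniform(lo, hi, size=(n_samples, 14))
train, val, test = np.split(samples, [n_train, n_train + n_val])

# 200 equidistant steps on a log scale, as in Table 1
omegas = np.logspace(-1.5, 1, 200)
```

For each row of `samples`, the FRF would then be evaluated by the linear frequency-domain solve, giving one (4, 200) amplitude target per design point.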
To evaluate the accuracy of the neural network prediction, we evaluate the root-mean-square error at each frequency on the test dataset. This error is referred to as the test error in the following.
TABLE 1 Design space variation: the mass, damping, and stiffness parameters are varied uniformly in the given intervals. For each point in the design space, the system response is calculated at 200 steps in a logspace: Ω = logspace(−1.5, 1, 200).

FIGURE 3 (A) Test error of the grid-based and query-based approaches over the frequency domain. The mean as well as the 90% interval is plotted. (B)-(E) Worst prediction of the deep learning models on the test set for all masses.

Query-based MLP versus grid-based MLP
We start by comparing a query-based MLP architecture against a grid-based MLP architecture. To this end, we compare the test error of both architectures. The mean value and the interval bounded by the 5% and 95% quantiles (called the 90% interval in the following) over the frequency domain are depicted in Figure 3(A). For both models, the error is largest in the frequency range where the resonance peaks occur due to parameter variations. Overall, the query-based approach yields more accurate predictions. This might be explained by the fact that the query-based architecture can use all its capacity to learn a single output, whereas the grid-based architecture has to predict all discretized frequency steps at once. Moreover, Figures 3(B)-(C) show the worst prediction (measured by the highest test error) of the frequency response of x₁. In Figures 3(D)-(E), the corresponding plots for the degrees of freedom x₂, x₃, and x₄ are shown. Again, we observe a higher accuracy achieved by the query-based approach.

Query-based MLP versus query-based transformer
In this experiment, we investigate the applicability of a transformer architecture to the prediction of FRFs of the multimass oscillator. Therefore, we train the transformer on the same dataset as used in the last section and compare its test error with that of the MLP architecture. Following the results from Section 4.1, we only investigate the query-based approach. In Figure 4, the test error as well as the worst predictions on the test dataset are depicted. The transformer architecture clearly outperforms the MLP architecture in terms of accuracy. We point out that the transformer architecture has even fewer parameters than the MLP (574k vs. 642k). However, training the transformer is computationally more expensive, as training for one epoch takes four times as long (approx. 4 s vs. 1 s with an Nvidia GeForce RTX 2080 Ti).

Predicting different multimass oscillators with a single transformer
A strength of the transformer architecture is its capability to process sequences of arbitrary size. This feature makes it possible to train a single transformer network on multiple multimass oscillators with different numbers of masses. In Figure 5(A), we show the test error from training a query-based transformer to predict the frequency response of four different multimass oscillators at once. As expected, the test error increases with the number of masses, because the number of resonance peaks increases and the dynamical behavior becomes more complex. Figures 5(B)-(C) depict the worst prediction of the FRF of x₁ for all mass configurations. The results are promising and open up new avenues for research, for example: Can a transformer generalize to unseen mass configurations?

CONCLUSION
In this contribution, we apply data-driven deep learning methods to predict the frequency response of a dynamical benchmark model. Neural networks are trained to predict the FRFs of all degrees of freedom in a single model. Our investigations show that a query-based incorporation of the frequency parameter yields a smaller error than prediction on a fixed frequency grid. In addition, we apply the transformer architecture to the dynamical benchmark model. The transformer architecture further reduces the prediction error compared to a standard MLP architecture, although the number of trainable parameters is of a similar order. Moreover, the transformer architecture is capable of handling different numbers of masses in a single model. This is a remarkable feature, because the characteristics of the FRFs change significantly if the number of masses changes, for example, in the number of resonance and antiresonance peaks.

Limitations and future work
As a next step, we will increase the complexity of the physical model. This includes continuous models on simple geometries, where analytical solutions exist, or continuous models on complex domains, where an FEM approximation is necessary. Switching to more complex models will also necessitate more complex input design spaces. These design spaces then parameterize geometry modifications, such as stiffening bead patterns, or acoustic measures, such as spatially distributed damping material. Since data-driven deep learning relies heavily on a large amount of training data, we will also investigate physics-informed deep learning techniques in the future.

ACKNOWLEDGMENTS
FIGURE 1 (A) Schematic description of the four degrees of freedom multimass oscillator and (B) frequency response function for all four degrees of freedom. The system is simulated for the nominal parameters m_i = 5, d_i = 0.05, and k_i = 1 for i = 1, …, 4.
FIGURE 4 (A) Test error of the query-based MLP and query-based transformer architectures over the frequency domain. The mean as well as the 90% interval is plotted. (B)-(E) Worst prediction of the deep learning models on the test set for all masses.
FIGURE 5 (A) Test error (RMSE) of the transformer architecture for the prediction of multimass oscillators with different numbers of degrees of freedom. (B)-(C) Worst prediction on the test dataset for all mass configurations.