An evaluation of noise reduction algorithms for particle-based fluid simulations in multi-scale applications

Filtering of particle-based simulation data can reduce computational costs and enable more efficient information transfer in multi-scale modelling. This paper compares the effectiveness of various signal-processing methods in reducing numerical noise and capturing the structures of nano-flow systems. In addition, a novel combination of these algorithms is introduced, showing the potential of hybrid strategies to further improve de-noising performance for time-dependent measurements. The methods were tested on velocity and density fields obtained from simulations performed with molecular dynamics and dissipative particle dynamics. Comparisons between the algorithms are given in terms of performance, quality of the results and sensitivity to the choice of input parameters. The results provide useful insights into strategies for the analysis of particle-based data and the reduction of computational costs in obtaining ensemble solutions.


Introduction
Numerical simulation is an essential tool for gaining a better understanding of many physical phenomena that are difficult to describe with analytical methods or experimental studies. The statistical mechanics of complex systems is often analysed with molecular dynamics (MD) [1], Monte Carlo methods, e.g. direct simulation Monte Carlo (DSMC) [2], or dissipative particle dynamics (DPD) [3]; a comprehensive summary of these modelling strategies can be found in Karniadakis et al. [4]. These procedures can accurately resolve the dynamics at atomistic, meso- and micro-scales, and are widely used to simulate nano/micro flows confined in channels such as carbon nanotubes [5,6]. In addition, information obtained from molecular simulations forms the basis of new and emerging hybrid multi-scale modelling methods for physical and biological applications (see [7] for a review). Examples demonstrating the ubiquity of multi-scale, multi-physics applications include the dynamics of complex fluid flows [8], the classical turbulence problem [9], meteorological predictions [10], and chemical and biological reactions [11]. Moreover, there is significant potential to apply multi-scale techniques to sociological problems, such as crowd and traffic flow [12].
The central problems with all particle-based and multi-scale modelling are the computational cost of the simulations and the accurate measurement and transfer of information across disparate length and time scales; there currently exist many sources of uncertainty and noise disturbing this inter-scale transfer, with an associated loss of simulation fidelity. Circumventing this problem often requires large samples and long averaging periods, resulting in computationally expensive calculations.
The objective of this paper is to investigate the capabilities of a number of mathematical algorithms introduced in the literature to assist noise reduction in particle-based modelling. A number of benchmark fluid flow problems, performed with molecular dynamics and dissipative particle dynamics, are used to investigate the usefulness of the considered methods and provide guidelines on how the algorithms can be successfully applied. The main focus of this paper is on novel procedures that provide rapid, adaptive, noise-free coarse-graining of micro-scale phenomena, and can further be employed in molecular-continuum simulations.
In this paper, new algorithms are proposed that combine the strengths of proper orthogonal decomposition (POD) with other techniques, hereafter referred to as POD+ methods, in order to achieve better efficiency in processing time-dependent fields. This work directly tackles the important challenge of extracting information from the data without significant additional computational cost.
The paper is organised as follows: the basic theory for the methods is described in Sec. 2. A comparison of the performance of each technique in de-noising particle-based simulations is presented in Sec. 3, followed by remarks and recommendations for the use of the methods under investigation.

Theoretical background
In the following section we briefly review the numerical methods employed. First, we discuss algorithms based on singular value decomposition (or eigenvalue decomposition) and QR decomposition. The second part of the review focuses on strategies using wavelet transforms, wavelet thresholding and the WienerChop filter [13]. We also discuss the application of empirical mode decomposition to noise reduction. Novel couplings of proper orthogonal decomposition to these algorithms are introduced at the end of the Section.

Proper orthogonal decomposition
Define an element A(τ_s, x) of the real N × M matrix A as a measurement from the x-th probe taken at the τ_s-th time instant. Proper orthogonal decomposition can be performed either by eigenvalue decomposition (EVD) of the symmetric matrix A†A, or by singular value decomposition (SVD) of A:

A = UΣV†,   (1)

where, in the case of full SVD, U is an N × N orthogonal matrix, V is an M × M orthogonal matrix, and Σ is an N × M diagonal matrix. The columns of U and V are the left and right singular vectors, respectively. The diagonal entries of Σ, called singular values, are the square roots of the eigenvalues, s_n = √λ_n, with s_1 ≥ s_2 ≥ ... ≥ s_n ≥ 0, where λ_n is the n-th eigenvalue of A†A.
If A is a collection of measurements corrupted by additive noise, it can be represented in the form A = A_t + B, where A_t is a matrix that contains the true signals, and B denotes the noise. Given the decomposition in Eq. (1), the rank-k approximation of A can be written in vector form as

A_k = Σ_{n=1}^{k} s_n u_n v_n†,

where 1 ≤ k ≤ min(N, M), and u_n and v_n are the orthonormal (temporal or spatial) POD modes corresponding to the n-th columns of the matrices U and V, respectively. The key property of POD is that A_k is optimal in the sense that

‖A − A_k‖_F = min_{rank(Ã)≤k} ‖A − Ã‖_F,

where ‖·‖_F is the Frobenius norm. In this paper, the rank k is referred to as the number of dominant modes. A practical implementation of POD based on time-windows (WPOD), described by Grinberg [14], can be used for particle-based simulations. The main challenge in the estimation of A_t is the choice of the truncation parameter. One possible approach for defining the rank k is to select a cumulative percentage of the total variation which the retained modes should contribute. Unfortunately, such a cut-off is often insufficient, as important aspects of the observables can be present in the direction of low-variance modes. Therefore, in addition to studying the energy content of the eigenvalues, most of the reported data is analysed here with the log-eigenvalue diagram (LEV), log10(λ_n = s_n²), based on the assumption that if the higher singular vectors represent uncorrelated noise, then the corresponding λ_n should decay exponentially with increasing n [15]. We also considered the smoothness of the temporal modes [14], as the ordering of the eigenvalues also corresponds to an ordering of the vectors u_n (for A(τ_s, x)) from low to high frequency. In some cases, the choice of k is verified with a recently proposed singular value hard threshold (SVHT) [16]; this threshold was derived by optimising the asymptotic mean square error for matrix sizes much larger than the underlying rank.
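As a concrete illustration, the truncated-SVD step behind the rank-k POD approximation can be sketched in a few lines of NumPy. The function name `pod_denoise` and the synthetic low-rank test field are ours, not the paper's:

```python
import numpy as np

def pod_denoise(A, k):
    """Rank-k POD approximation of an N x M data matrix via truncated SVD.

    A minimal sketch: keep the k dominant singular triplets and discard
    the rest, which are assumed to represent uncorrelated noise.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A rank-1 "true" field corrupted by additive white noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 400)
x = np.linspace(0.0, 1.0, 20)
A_true = np.outer(np.sin(np.pi * x), np.sin(t))
A_noisy = A_true + 0.3 * rng.standard_normal(A_true.shape)

A_k = pod_denoise(A_noisy, k=1)
err_noisy = np.linalg.norm(A_true - A_noisy) / np.linalg.norm(A_true)
err_pod = np.linalg.norm(A_true - A_k) / np.linalg.norm(A_true)
assert err_pod < err_noisy  # truncation reduces the relative error
```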

Singular spectrum analysis (SSA)
While proper orthogonal decomposition can successfully extract the signal from its noise-contaminated measurements, when applied to data that is temporally (or spatially) steady, it appears to be no more efficient in separating unwanted components from the mean ensemble than statistical averaging. In such cases, the SSA of Broomhead and King [17,18] can be applied. The method has been widely used in the filtering and forecasting of climatic, meteorological, geophysical, and electrical data series.
The basic scheme of SSA is described in [19]. The algorithm consists of four main steps (two for the decomposition stage, and two for the reconstruction): embedding, SVD, eigentriple grouping, and diagonal averaging. Instead of the matrix A ∈ R^{N×M}, we consider one of the vectors forming its columns, X = (x_1, ..., x_M) ∈ R^M. In the embedding stage, the series is broken into a sequence of K = M − L + 1 lagged vectors of size L by sliding a window along the series; these vectors form the columns of the trajectory (Hankel) matrix Y ∈ R^{L×K}. In the second key step of the decomposition process, the trajectory matrix is subject to SVD, forming a collection of elementary matrices, Y = Σ_{n=1}^{r} s_n u_n v_n†, where r is the rank of Y. In the following step, a low-dimensional approximation, Ỹ = Σ_{n∈I} s_n u_n v_n†, is formed, where I denotes a grouped subset of indices, e.g. I = {2, ..., 5, 10}. For simplicity, in this work the SSA approximation is formed for I = {1, ..., k}, similar to the POD analysis. The reconstructed matrix Ỹ does not exhibit the same elements along its descending diagonals. The subsequent averaging over its anti-diagonals replaces these differing entries and yields a new series X̃ = (x̃_1, ..., x̃_M) with reduced noise.
The window length, L, is the only parameter to be determined prior to the SSA. Its choice can result in a weaker (or stronger) separability between information and noise, influencing the effectiveness of the signal reconstruction process. There is no universal rule regarding the optimal value of L, but several principles have been described by Golyandina and Zhigljavsky [19]. In general, smaller window lengths are recommended for detailed trends, while relatively large values of L are preferred for simple profiles. In addition, the window should not be longer than half of the analysed series, L ≤ M/2 [20]. If there is a known periodic component in the processed data, then L should be chosen to be proportional to that period. For long signals and large values of L, the method is computationally very intensive.
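The four steps above can be sketched directly in NumPy. This is a minimal illustrative version with I = {1, ..., k}, as in the text; the function name `ssa_denoise` and the test signal are ours:

```python
import numpy as np

def ssa_denoise(x, L, k):
    """Basic SSA: embed, SVD, truncate to k eigentriples, diagonally average.

    L is the window length (L <= M/2); k is the number of retained
    components, grouped as I = {1, ..., k}.
    """
    M = len(x)
    K = M - L + 1
    # Embedding: trajectory (Hankel) matrix of lagged vectors.
    Y = np.column_stack([x[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Yk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Diagonal averaging over anti-diagonals restores a series of length M.
    out = np.zeros(M)
    counts = np.zeros(M)
    for i in range(L):
        out[i:i + K] += Yk[i, :]
        counts[i:i + K] += 1
    return out / counts

rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 200)
clean = np.sin(3 * t)                       # a Hankel matrix of rank 2
noisy = clean + 0.4 * rng.standard_normal(t.size)
rec = ssa_denoise(noisy, L=50, k=2)
assert np.mean((rec - clean) ** 2) < np.mean((noisy - clean) ** 2)
```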

Random QR de-noising (u/rQRd)
Random sampling has received considerable attention as an alternative dimensionality reduction tool which is significantly less expensive than SVD or EVD [21][22][23]. In the recent paper by Chiron et al. [24], a new method based on random sampling has been described which, similar to SSA, seeks a low-rank approximation of the trajectory matrix.
In random QR de-noising, a matrix Y′, containing most of the significant information of the trajectory matrix Y ∈ R^{L×K}, is obtained by multiplying Y with a set of random vectors stored in a matrix Ω ∈ R^{K×Z}, i.e. Y′ = YΩ. As the number of random vectors satisfies Z ≤ L, and L ≤ K, the matrix Y′ is smaller in size than the trajectory matrix. A factorisation Y′ = QR is performed in order to construct a projection Ỹ = QQ†Y of Y onto the reduced-rank orthonormal basis Q, where Ỹ has a rank equal to Z. A de-noised time series X̃ is then obtained by diagonal averaging. The approximation error for such a procedure satisfies ‖Y − Ỹ‖_2 ≤ (1 + 9√Z · √L) s_{k+1}, with probability at least 1 − 3ζ^{−ζ}, where ζ = Z − k is the oversampling parameter, s_{k+1} is the (k+1)-th greatest singular value of Y [22], and ‖·‖_2 denotes the L_2 operator norm. The recommendation is to keep ζ small; however, in real situations, the number of components k is unknown a priori.
The solution given by random QR de-noising (rQRd) may be less accurate than the approximation constructed with SVD, but it is obtained much faster, particularly for large data-sets [24]. An improved algorithm for very long signals, called uncoiled random QR de-noising (urQRd) [24] and based on fast Fourier transforms (FFT), is also used in this article.
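A minimal sketch of the rQRd projection step, reusing the Hankel embedding and diagonal averaging of SSA. The helper names are ours, and a production version would follow Chiron et al. [24]:

```python
import numpy as np

def rqrd(Y, Z, rng):
    """Random QR de-noising of a trajectory matrix Y (L x K).

    Project Y onto the range of Y @ Omega, where Omega holds Z random
    vectors; the QR factorisation supplies the orthonormal basis Q.
    """
    L, K = Y.shape
    Omega = rng.standard_normal((K, Z))
    Q, _ = np.linalg.qr(Y @ Omega)     # orthonormal basis, rank <= Z
    return Q @ (Q.T @ Y)               # low-rank projection of Y

def hankel(x, L):
    K = len(x) - L + 1
    return np.column_stack([x[i:i + L] for i in range(K)])

def diag_average(Y, M):
    L, K = Y.shape
    out = np.zeros(M)
    counts = np.zeros(M)
    for i in range(L):
        out[i:i + K] += Y[i, :]
        counts[i:i + K] += 1
    return out / counts

rng = np.random.default_rng(2)
t = np.linspace(0, 2 * np.pi, 400)
clean = np.cos(5 * t)                      # Hankel rank 2, so k = 2
noisy = clean + 0.5 * rng.standard_normal(t.size)
Y = hankel(noisy, L=100)
rec = diag_average(rqrd(Y, Z=4, rng=rng), len(noisy))   # zeta = Z - k = 2
assert np.mean((rec - clean) ** 2) < np.mean((noisy - clean) ** 2)
```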

Signal processing with wavelet transforms
The orthogonal wavelet transform uses a shifted and scaled mother wavelet, ψ, to form a basis (children)

ψ_{j,m}(t) = 2^{−j/2} ψ(2^{−j} t − m),

where j, m ∈ Z represent resolution and translation, respectively, and ψ satisfies the admissibility condition [25,26]. To ensure that the admissibility integral is finite, the wavelet must have zero mean, ψ̂(0) = ∫_{−∞}^{+∞} ψ(t) dt = 0. The Fourier transform ψ̂ is continuously differentiable if ψ has sufficient time decay, which implies smoothness, ∫_{−∞}^{+∞} (1 + |t|) |ψ(t)| dt < +∞. By enforcing the vanishing of higher-order moments, ∫_{−∞}^{+∞} t^b ψ(t) dt = 0, polynomials up to degree b are reproduced. A complement in L²(R) of the linear space spanned by the wavelets is determined by the translations and dilations of a scaling function, φ_{j,m}(t) = 2^{−j/2} φ(2^{−j} t − m). The decomposition of a signal f(t) can then be written as

f(t) = Σ_m c_{j_0,m} φ_{j_0,m}(t) + Σ_{j≤j_0} Σ_m d_{j,m} ψ_{j,m}(t),

where c_{j_0,m} and d_{j,m} are the scaling (approximation) and wavelet (detail) coefficients, respectively, J is the maximum number of decompositions, and j ≤ j_0.
Through multiresolution theory, Mallat [27] and Meyer [28] established a link between an orthogonal wavelet basis and the conjugate mirror filters that describe the interaction between wavelet and scaling function. This allows for the fast implementation of a discrete wavelet transform (DWT), where the signal is convolved with the low-pass and high-pass filters, followed by a dyadic decimation, after which the procedure is recursively repeated on the decimated low-pass output.
To overcome the lack of translation-invariance of the DWT, the stationary wavelet transform (SWT) was introduced [29]. In this strategy, an undecimated transform is performed with upsampled filter coefficients at each level of the decomposition. The scheme contains redundancy, which is seen as advantageous in de-noising, but increases the computational cost. We note that the SWT is closely related to cycle-spinning [30].

Wavelet thresholding
Motivated by the idea that a wavelet transform provides a sparse representation of data, estimated functions are obtained by inverting thresholded coefficients whose magnitudes exceed an estimated noise level. Donoho and Johnstone [31] proposed simple procedures for recovering information from noisy data, namely hard and soft thresholding. The non-linear hard threshold estimator is defined by

f̂_w = f_w 1(|f_w| > T_u),

while the soft estimator is f̂_w = sgn(f_w) max(|f_w| − T_u, 0), where 1 and sgn are the indicator and sign functions, respectively, and T_u is a threshold value. In this paper, to avoid over-smoothing [25], the first approach is applied together with a universal threshold,

T_u = σ_n √(2 ln M).

The white-noise level estimate is defined as σ_n = MAD/0.6745, with MAD being the median absolute deviation of the fine-scale wavelet coefficients [32]. As most of the simulation results reported later are corrupted by correlated noise, a level-dependent variance estimation introduced by Johnstone and Silverman [33] is used in our paper.
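Hard thresholding with the universal threshold can be sketched as follows. For self-containedness, a hand-rolled Haar filter stands in for the sym2 wavelet used later in the paper, and white noise is assumed (no level-dependent variances):

```python
import numpy as np

def haar_dwt(x, levels):
    """Decimated DWT with a Haar filter (a stand-in for sym2)."""
    details, approx = [], np.asarray(x, float)
    for _ in range(levels):
        a = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
        d = (approx[0::2] - approx[1::2]) / np.sqrt(2.0)
        details.append(d)
        approx = a
    return approx, details

def haar_idwt(approx, details):
    for d in reversed(details):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

rng = np.random.default_rng(3)
M = 512
clean = np.sin(2 * np.linspace(0, 2 * np.pi, M))
noisy = clean + 0.3 * rng.standard_normal(M)

approx, details = haar_dwt(noisy, levels=4)
# Noise estimate from the finest-scale coefficients, then the universal
# threshold T_u = sigma_n * sqrt(2 ln M); hard thresholding keeps only
# coefficients whose magnitude exceeds T_u.
sigma_n = np.median(np.abs(details[0])) / 0.6745
T_u = sigma_n * np.sqrt(2.0 * np.log(M))
details = [d * (np.abs(d) > T_u) for d in details]
rec = haar_idwt(approx, details)
assert np.mean((rec - clean) ** 2) < np.mean((noisy - clean) ** 2)
```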

WienerChop
Wiener [34] formulated an optimal estimation analysis of time series which has been used in a range of applications, such as signal detection and noise reduction [35]. However, the main practical problem in the implementation of a Wiener filter is that the desired signal needs to be known a priori. Ghael et al. [13] proposed a straightforward estimate of the signal and noise by using wavelet transforms. Wavelet-based empirical Wiener filtering (referred to as WienerShrink or WienerChop) performs two wavelet transforms in order to estimate the filter parameters and perform de-noising. In the wavelet domain, a noisy signal takes the form f_w = f_{t,w} + B_w. The goal is to estimate the true signal's wavelet coefficients, f_{t,w}, from the noisy observation, f_w. An approximation, f_{t,w1}, of the signal's coefficients is obtained in the wavelet space by thresholding the coefficients of the first transform, WT1. The noise level, σ_n, is calculated from the finest details [32], and an inverse transform is applied to the data, f̂_t(t) = IWT1(f_{t,w1}). A second transform is then performed, and the estimate of the signal in WT2, together with the noise variance, is used to construct the wavelet-based Wiener filter:

H_{w2} = f²_{t,w21} / (f²_{t,w21} + σ_n²).   (8)

In Eq. (8), the subscript w21 indicates that WT2 was applied to the signal estimate obtained from WT1, f_{t,w21} = WT2(f̂_t(t)). Applying WT2 to the original signal, f_{w2} = WT2(f(t)), allows the coefficients to be filtered with WienerChop. After de-noising, the inverse transform is applied to recover the final estimate, f̂_t(t). The rationale behind applying Wiener filtering to wavelet coefficients arises from the fact that the wavelet transform decorrelates data [36]. Assuming perfect decorrelation of noisy coefficients, the filter is optimal in the sense of minimising the mean squared error.
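The two-transform scheme can be sketched as below. For brevity, a Haar DWT stands in for both WT1 and WT2 (the paper uses sym2 and sym4), and the approximation coefficients are passed through unfiltered; the helper names are ours:

```python
import numpy as np

def dwt(x, levels):
    """Haar DWT (stand-in for the sym2/sym4 pair used in the paper)."""
    details, a = [], np.asarray(x, float)
    for _ in range(levels):
        details.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a, details

def idwt(a, details):
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a

rng = np.random.default_rng(4)
M = 512
clean = np.sin(np.linspace(0, 4 * np.pi, M))
noisy = clean + 0.3 * rng.standard_normal(M)

# WT1: pilot estimate via hard thresholding with the universal threshold.
a1, d1 = dwt(noisy, 4)
sigma = np.median(np.abs(d1[0])) / 0.6745
T_u = sigma * np.sqrt(2.0 * np.log(M))
pilot = idwt(a1, [d * (np.abs(d) > T_u) for d in d1])

# WT2: empirical Wiener weights H = s^2 / (s^2 + sigma^2), built from the
# pilot's coefficients and applied to the noisy signal's WT2 coefficients.
a2p, d2p = dwt(pilot, 4)
a2n, d2n = dwt(noisy, 4)
d2 = [dp**2 / (dp**2 + sigma**2) * dn for dp, dn in zip(d2p, d2n)]
rec = idwt(a2n, d2)
assert np.mean((rec - clean) ** 2) < np.mean((noisy - clean) ** 2)
```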

Empirical mode decomposition
Huang et al. [37] pioneered a nonlinear technique, referred to as empirical mode decomposition (EMD), for adaptively decomposing an unsteady signal into a finite sum of zero-mean, amplitude-modulated and frequency-modulated elements, called intrinsic mode functions (IMFs). The method is adaptive, with the basis of the decomposition purely data-driven. Empirical mode decomposition looks at the evolution of a signal, f(t), between two consecutive extrema, e.g. two minima occurring at t⁻_1 and t⁻_2, and defines a local high-frequency part (detail), h(t), and a low-frequency element (trend), r(t) [38,39], such that f(t) = h(t) + r(t). The algorithm of EMD can be summarised in the following steps.
Step 1: Identify the times t⁻ and t⁺ of the local minima and maxima of f(t).
Step 2: Obtain the envelopes, e_min(t) and e_max(t), by interpolating between t⁻ and t⁺, respectively.
Step 3: Compute the mean of the two envelopes, m_1(t) = (e_min(t) + e_max(t))/2.
Step 4: Extract the detail, h¹_1(t) = f(t) − m_1(t).
Step 5: Examine whether the residual, h¹_1(t), satisfies the definition of an IMF according to a stopping criterion. If not, repeat Steps 2 → 5 n_s times until the conditions are met (the sifting process). Then the first IMF is found, c_1(t) = h^{n_s}_1(t), and the procedure is iterated on the residual, r_1(t) = f(t) − c_1(t).
The necessary conditions for the existence of intrinsic mode functions (in Step 5) are that the functions are symmetric with respect to the local zero mean, and have the same numbers (or differing at most by one) of zero crossings and extrema. Flandrin et al. [40] proposed that the average number of zero-crossings in a mode is a meaningful way of defining its mean frequency. The original signal can be recovered by summing all n extracted modes and the final residual, f(t) = Σ_{i=1}^{n} c_i(t) + r_n(t). Multi-dimensional versions of the empirical mode decomposition are still an active research area [41,42].
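The sifting and decomposition steps above can be sketched as follows. This is a toy version: linear interpolation replaces the usual spline envelopes, a fixed number of sifting passes replaces the stopping criterion (the paper uses n_s = 7), and the function names are ours:

```python
import numpy as np

def sift(f, n_s=7):
    """Extract one EMD mode via n_s sifting passes."""
    t = np.arange(len(f))
    h = f.copy()
    for _ in range(n_s):
        # Step 1: locate interior local maxima and minima.
        up = np.flatnonzero((h[1:-1] > h[:-2]) & (h[1:-1] > h[2:])) + 1
        lo = np.flatnonzero((h[1:-1] < h[:-2]) & (h[1:-1] < h[2:])) + 1
        if len(up) < 2 or len(lo) < 2:
            break
        # Steps 2-4: envelopes, their mean, and the detail.
        e_max = np.interp(t, up, h[up])
        e_min = np.interp(t, lo, h[lo])
        h = h - (e_max + e_min) / 2.0
    return h

def emd(f, n_modes=3, n_s=7):
    """Decompose f into IMFs plus a residual trend r(t)."""
    imfs, r = [], f.copy()
    for _ in range(n_modes):
        c = sift(r, n_s)
        imfs.append(c)
        r = r - c
    return imfs, r

rng = np.random.default_rng(5)
t = np.linspace(0, 2 * np.pi, 1024)
f = np.sin(t) + 0.2 * rng.standard_normal(t.size)
imfs, r = emd(f)
# The signal is exactly recovered as the sum of the IMFs and the residual.
assert np.allclose(f, sum(imfs) + r)
```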
In this work, a wavelet-inspired thresholding of IMFs proposed by Kopsinis and McLaughlin [43] is used. In the procedure, referred to as EMD interval thresholding (EMD-IT), the intervals between zero-crossings in each function are analysed to determine whether they are noise- or signal-dominant, based on the single extremum that corresponds to each interval. In this paper, multiples of the universal threshold are applied, i.e. T_multi × T_u (see Eq. 7 in [43]), with the noise estimation obtained from the energy of the first IMF.

POD+
We propose an improvement in processing non-stationary fields by combining POD with all the algorithms described in this paper. Filtering procedures, e.g. wavelet thresholding, are applied within the domain of SVD to decrease the number of observations required for clean data recovery, while preserving dimensionality reduction. These hybrid strategies are called POD+ techniques, such as POD + SSA; POD + wavelet thresholding is referred to as WAVinPOD (or SWAVinPOD if based on SWT) and was first introduced by Zimoń et al. [44]. In cases where N ≪ M, the left singular vectors arising in the POD+ techniques are left unchanged so as not to weaken the orthogonality property of the shorter modes. However, for less thin matrices, modifying both sets of vectors can add to the performance. For large data sizes, the results obtained with POD and POD+ methods converge, and improved filtering can be obtained by processing both temporal and spatial modes [44]; this is shown in Fig. 1, where the error in the L_2 norm, δ_2 = ‖A_t − A_k‖_2/‖A_t‖_2, of different approximations is plotted for synthetically generated signals.
The following steps summarise all POD+ procedures.
Step 1: Perform SVD on the matrix of data A.
Step 2: Define the number k and set s n = 0 for n > k.
Step 3: Perform additional filtering with e.g. wavelet thresholding of the retained k modes.
Step 4: Reconstruct the de-noised data according to Eq. (1).
The methods can be applied during simulation runs by following the strategy described by Grinberg [14].
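A minimal sketch of the four steps for a WAVinPOD-like filter. Haar thresholding stands in for the paper's sym2 wavelet, only the right singular vectors are modified (as recommended above for N ≪ M), and the function names are ours:

```python
import numpy as np

def haar_threshold(v, levels=3):
    """Hard-threshold a single mode with a Haar DWT (universal threshold)."""
    details, a = [], v.astype(float)
    for _ in range(levels):
        details.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    sigma = np.median(np.abs(details[0])) / 0.6745
    T_u = sigma * np.sqrt(2.0 * np.log(len(v)))
    for d in reversed(details):
        d = d * (np.abs(d) > T_u)
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a

def wavinpod(A, k):
    """POD+ sketch: SVD (Step 1), truncate to k modes (Step 2),
    wavelet-filter the longer right singular vectors (Step 3),
    reconstruct (Step 4)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = np.vstack([haar_threshold(Vt[n]) for n in range(k)])
    return U[:, :k] @ np.diag(s[:k]) @ Vk

rng = np.random.default_rng(6)
t = np.linspace(0, 4 * np.pi, 1024)
x = np.linspace(0, 1, 16)
A_true = np.outer(np.sin(np.pi * x), np.sin(t))
A_noisy = A_true + 0.3 * rng.standard_normal(A_true.shape)

A_hat = wavinpod(A_noisy, k=1)
err = lambda B: np.linalg.norm(A_true - B) / np.linalg.norm(A_true)
assert err(A_hat) <= err(A_noisy)
```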

Simulation results
In this section, we report the results of applying the filtering methods to data from several typical particle-based flow problems. The aim is to investigate the benefits of using de-noising techniques for filtering simulation measurements. The performance of POD+ is compared with POD for extracting information from molecular simulations [14] and, where applicable, with widely used statistical averaging. Unless specified otherwise, the following input parameters were used for the filtering methods: the oversampling coefficient for the rQRd procedure was kept constant, ζ = Z − k = 2, while the number of dominant modes, k, for the SSA and POD analyses was established based on the tests discussed in Sec. 2.1.1. De-noising with empirical mode decomposition was performed with n_s = 7 sifting procedures and the hard thresholding approach. Following Donoho and Johnstone [31], wavelet de-noising was performed using Symlet filters; for both wavelet transforms, sym2 was employed. For all MD results, the wavelet-based algorithms used hard thresholding with level-dependent variance estimation [33] for coloured noise (as opposed to a single value established from the finest details); only in the case of the DPD simulations, due to coarse-graining and the use of local random thermostats, did we assume that the data was corrupted with white noise rather than correlated fluctuations. For estimating the coefficients of WienerChop, we performed a decimated wavelet transform with the same parameters. The second filter, for WT2 in the WienerChop algorithm, contained double the number of vanishing moments of WT1, i.e. sym4, and its frequency resolution was one level higher to aid the filtering. For MD data containing correlated noise, we introduced our modification of the WienerChop algorithm by employing level-dependent thresholding.
The quality of the extracted information was verified with an averaged signal-to-noise ratio (SNR); for a de-noised matrix A_k ∈ R^{N×M} and a set of true signals A_t ∈ R^{N×M}, the SNR is calculated as

SNR = (1/N) Σ_{n=1}^{N} 10 log10( ‖a_{t,n}‖² / ‖a_{t,n} − a_{k,n}‖² ),   (9)

where a_{t,n} and a_{k,n} denote the n-th signals of A_t and A_k, respectively. In order to relate the results to an initial noise level, we introduce the overall dimensionless gain in signal-to-noise ratio,

Gain = (SNR_approx − SNR_noisy) / SNR_noisy,

where SNR_noisy and SNR_approx are the SNR values of the original corrupted signal and the de-noised data, respectively. In addition, the processing time required to obtain a de-noised solution was assessed by recording the internal time at execution of each procedure. We note that this measure was biased, as it depended on the implementation of the algorithms and the operating system used. For clarity, we report the most successful application of POD+ for each case (based on SNR values or visual quality), with recommendations for treating particle data resembling the analysed problems. In many cases, we limited the use of the wavelet transform to the decimated strategy, DWT, which is computationally the least expensive.
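The two measures can be computed as follows. This is a sketch: the row-wise averaging in `snr_db` is our reading of the "averaged" SNR, and the mock de-noised estimate is synthetic:

```python
import numpy as np

def snr_db(A_true, A_est):
    """Average per-signal SNR (dB): the mean over the N rows of
    10 log10(||a_t||^2 / ||a_t - a_est||^2)."""
    num = np.sum(A_true**2, axis=1)
    den = np.sum((A_true - A_est)**2, axis=1)
    return float(np.mean(10.0 * np.log10(num / den)))

def gain(snr_noisy, snr_approx):
    """Dimensionless gain in SNR relative to the initial noise level."""
    return (snr_approx - snr_noisy) / snr_noisy

rng = np.random.default_rng(7)
A_t = np.outer(np.ones(5), np.sin(np.linspace(0, 2 * np.pi, 200)))
A_noisy = A_t + 0.2 * rng.standard_normal(A_t.shape)
A_denoised = A_t + 0.05 * rng.standard_normal(A_t.shape)  # mock estimate

g = gain(snr_db(A_t, A_noisy), snr_db(A_t, A_denoised))
assert g > 0.0  # de-noising raised the SNR above the noisy baseline
```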

Simulation description
The molecular dynamics simulations were carried out using the open-source mdFOAM solver, built in OpenFOAM [45][46][47]. Modelling with DPD was performed with DL_MESO [48]. As a first test case, we considered molecular dynamics simulations of unsteady argon flow through a converging-diverging nanochannel, as shown in Fig. 2. The channel had a length L_x = 68 nm in the flow direction, a depth of L_z = 5.44 nm, and a channel height varying between 3.4 nm and 3.04 nm, as shown in Fig. 2(a). More details on this case can be found in Borg et al. [47]. We applied three types of body force to the argon molecules with the same magnitude F = 0.487 pN to achieve various time-scale-separated flow conditions (Fig. 2(b)-(d)) to which statistical averaging would be difficult to apply:
1. Harmonically pulsating flow. An oscillating force with time period t = 108 ps was used.
In addition to liquid argon, a water flow case was also considered:
4. Water flow through a carbon nanotube (CNT). Each water molecule inside the single-wall CNT (see Fig. 3) was subject to a harmonically oscillating force with an amplitude of F = 0.034 pN and a period of t = 0.199 ns. We used the rigid TIP4P/2005 water model described by Abascal and Vega [49]. The Lennard-Jones parameters of the water-carbon interactions for the CNT were taken from Werder et al. [50].
A DPD simulation was also performed to show the versatility of the considered algorithms:
5. Phase separation phenomenon. A DPD simulation of a solvent mixture of oil, water and surfactant was undertaken. The system consisted of 192000 DPD beads, made up of three bead types: water, a hydrophobic oil group and a hydrophilic head group. The oil molecules were made of three bonded oil beads; the surfactant molecules consisted of three bonded beads, two oil beads followed by a head bead; and each water molecule was represented by a single non-bonded bead. For both the oil and surfactant molecules, the bonds consisted of a harmonic potential on the bond length.
A snapshot of the simulation system is presented in Fig. 4. In all the problems considered, the spatial distribution of the observables was calculated using the binning method [51].

Converging-diverging channel
The first three simulations produced sets of velocity profiles varying in both time and space. Such measurements impose restrictions on the length of time and the number of spatial micro-elements (bins) over which the atomistic data can be averaged. One way of enhancing the processing of non-stationary fields is to replicate the computational domain with identical flow conditions, which is an inherently expensive solution. It is also possible to perform phase averaging, if the flow exhibits a limit cycle, and to integrate over a large number of repeating periods of oscillation. However, constructing the results from a number of realisations, N_r, improves the accuracy only by a factor of √N_r [52]. To overcome this difficulty, Grinberg [14] introduced the window proper orthogonal decomposition, which was also employed by Borg et al. [47] to extract smoother distributions (see Fig. 7 in [47]). We therefore compared the performance of POD+ with the results obtained by applying POD alone.
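The √N_r scaling of ensemble averaging is easy to verify numerically (a toy illustration with a synthetic signal, not one of the paper's flow cases):

```python
import numpy as np

rng = np.random.default_rng(8)
clean = np.sin(np.linspace(0, 2 * np.pi, 256))

def realisation():
    """One noisy 'measurement' of the clean profile."""
    return clean + 0.5 * rng.standard_normal(clean.size)

def rms_error(n_r):
    """RMS error of the mean over n_r independent realisations."""
    avg = np.mean([realisation() for _ in range(n_r)], axis=0)
    return np.sqrt(np.mean((avg - clean) ** 2))

# The error of the ensemble mean shrinks roughly as 1/sqrt(N_r), so
# a 100x larger ensemble buys only a ~10x reduction in noise.
e1, e100 = rms_error(1), rms_error(100)
assert e100 < e1 / 3
```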
Harmonically pulsating flow. The velocity profiles generated with simulation no. 1 formed a data-set consisting of 200 spatial measurements over 160000 time samples; this was equivalent to 864 ps of real time. Calculations were performed on the matrix A(x, τ_s) with N = 200 signals of length M = 160000. Proper orthogonal decomposition over the entire set recovered mostly low-frequency signals which described well the dynamics of the system. Fig. 5 shows how the POD approximation with k = 3 compares with the original noisy signal. The previously described tests were applied to recover the optimal rank of the matrix. As the velocity was changing over time and across the domain, statistical averaging could not be applied without loss of information. We assumed that the POD solution for N = 200 and M = 160000 was the targeted result and used it as A_t (Eq. (9)) for further analysis. To reduce the cost of modelling the flow in the channel, we examined whether similar velocity profiles could have been obtained with only N = 20 micro-elements spread across the domain (10 times fewer); we also analysed how the POD and POD+ techniques performed with smaller time-windows over which the filtering was performed. The shorter the time after which an optimal approximation can be obtained, the smaller the communication bottleneck in multi-scale modelling. Figs. 6(a) and 6(b) compare POD and WAVinPOD for N = 20, showing that the latter was more efficient in extracting information from the smaller noisy data-set, which had an average SNR = 9.24 dB. The reconstructed profile from the first bin obtained with WAVinPOD had SNR = 34.07 dB and Gain ≈ 3, almost twice the SNR of POD, for which SNR = 17.44 dB. It took WAVinPOD t ≈ 0.48 s to produce results comparable to the information extracted with POD from the 10× larger system in t ≈ 7.82 s. Having a well-predicted velocity distribution from a smaller number of bins or micro-elements also results in a less expensive simulation.
Applying wavelet thresholding alone took double the time of WAVinPOD and provided only an average Gain = 1.5. Comparably good approximations to WAVinPOD were obtained with the other POD+ methods. However, as expected, POD + EMD-IT and POD + SSA were the most expensive. In addition, for the empirical mode decomposition, a larger multiple of the universal threshold, T_multi, had to be applied. As there is no general rule for determining an optimal value of T_multi, the threshold was chosen based on experience. All the measures are summarised in Fig. 7(a), where the overall Gain in SNR is plotted against the processing time. To show how much improvement is achieved by combining the algorithms with POD, we report the results of applying the methods alone in Fig. 7(b).
Fig. 7. Comparison of all the methods applied to noisy velocity data from the simulation of a periodically oscillating flow. Note that SSA is not listed as the processing time was too high.
Additional analysis examined how much improvement could be obtained by applying the WAVinPOD method, in comparison to POD, for smaller moving time-windows during the simulation run (similar to WPOD [14]). Each method was applied to a velocity matrix formed from N = 20 spatial measurements, with the data length divided into M ∈ {50, 100, 200, 400} blocks, i.e. the shortest time-length consisted of M = 160000/400 = 400 instances. The window moved until the profiles of the full length were reconstructed. Fig. 8 compares the averaged gain in SNR recovered with WAVinPOD and POD for the same input parameters as used before and k = 2 (a smaller number of modes was needed for the decreased data size). The hybrid method outperformed POD, allowing for a much higher gain in SNR with only t = 2 s of processing time for the biggest window size.
In the second flow case, statistical averaging over all the bins resulted in unwanted over-smoothing, losing the step-change present in the instantaneous data (see Fig. 9(b)). Small, abrupt changes in the time-dependent flow state can be much less energetic than the long-time trend and might not be represented by the first POD eigenmode (for non-mean-centred data). The change in the mass flow rate in Fig. 9(a) carried only 0.06% of the energy, while the first eigenvalue, representing the mean, contained over 90%. Therefore, applying POD analysis can outperform averaging in terms of information extraction. In the case of unsteady simulations, further improvement in filtering high frequencies is provided by the POD+ methods. The techniques were applied to velocity measurements with amplitude varying along the channel. For the large system, all methods generated comparable outputs (only visually smoother). However, when less spatial sampling was available, POD clearly started recovering more noise, burying the small-scale dynamics. In order to extract the acoustic response from the velocity measurements, k = 10 modes were needed from the smaller ensemble with N = 20. Figs. 10(a) and 10(b) compare the velocity profiles from the first and 5th bins filtered with POD and WAVinPOD with the same input parameters as in the previous simulation. Clearly, the latter performed better, producing an approximation with SNR = 37.93 dB (the whole data-set had Gain = 1.04), while POD obtained SNR = 24.43 dB with an overall Gain = 0.29. This example showed that POD+ methods allow small changes in the flow to be retained, which can otherwise be lost in POD analysis or statistical averaging.
Mixed period flow. The last simulation of the converging-diverging channel generated M = 920000 time samples, equivalent to t = 4969 ps. No averaging was applied prior to POD. The de-noised profile, reconstructed with only k = 3 modes, is plotted with the original velocity distribution in Fig. 11(a). Again, POD successfully separated unwanted fluctuations from the correlated flow of particles. It should be stressed that POD does not require any a priori information regarding the nature of the oscillations. All the methods were tested on the original signal length but with only N = 20 bins. Apart from the window size for SSA and urQRd, L = M/2000 = 460, all the other input parameters were kept the same as in the previous simulations.
For the calculation of SNRs, the POD approximation obtained from the full matrix was used as A_t. It should be noted that this was a biased estimation, as the POD+ algorithms, even for smaller systems, were returning smoother profiles than POD. Fig. 11(b) presents how well the dynamics of the system were recovered with WAVinPOD; the two velocity profiles show the acceleration of the flow in the first half of the channel. The recovered SNR = 29.16 dB, translating to Gain ≈ 1.4, was higher than for POD, SNR = 21.64 dB (Gain = 0.75); the processing time of WAVinPOD, although 1.5× slower than POD, was only t = 2.72 s. Again, WAVinPOD extracted similar information from the smaller data-set as POD did with N = 200; processing the full-size matrix took POD 20× longer. The results are summarised in Fig. 12. In addition, Fig. 13 shows that a 1.5× higher Gain was achieved with WAVinPOD than with POD constructed from shorter signal portions.

Water flow through a carbon nanotube
Molecular dynamics can be applied to many real-life problems, e.g. the study of water flow in nanostructured membranes. However, such modelling is quite challenging: the system requires small time-steps, and the simulations often contain substantial noise that is computationally demanding to reduce. To assess how de-noising techniques can improve the analysis of such data, we performed an additional study on velocity measurements from oscillating water flow through a CNT. The initial matrix contained M = 4000 observations with N = 512 velocity measurements at each time-step. There were 4 complete oscillations in the ensemble, which were used to obtain the mean solution. The noisy velocity distribution from the 10th bin is shown in Fig. 14(a), while Figs. 14(b) and 14(c) compare the quality of the POD and POD+ results. The latter clearly extracted smoother profiles. Calculating a cumulative mean (the average of 4 cycles, M = 4000) did not reduce the noise as well as POD applied to only M = 1000 velocity measurements (see Fig. 15(a)). Further improvement was achieved with the POD+ techniques, which removed the remaining fluctuations, as shown in Fig. 15(c). To obtain a comparable level of de-noising with statistical averaging alone, much more data would have to be collected, increasing the computational cost.
In this case, k = 7 modes were used to extract the velocity field, after performing the previously introduced analysis.
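One plausible reading of the POD+wavelet coupling (WAVinPOD) is to wavelet-threshold the dominant temporal POD modes before reconstructing the field. The sketch below assumes the PyWavelets package and uses the universal soft threshold with a MAD noise estimate as a stand-in for the thresholding rule defined earlier in the paper; the function name and default parameters are illustrative.

```python
import numpy as np
import pywt

def wavinpod(X, k, wavelet="db4", level=4):
    """Sketch of a POD+wavelet hybrid: de-noise the k dominant left
    singular vectors (temporal modes) of X with soft wavelet
    thresholding, then reconstruct the rank-k field."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_f = U[:, :k].copy()
    for j in range(k):
        coeffs = pywt.wavedec(U_f[:, j], wavelet, level=level)
        # Noise level from the finest-scale detail coefficients (MAD),
        # then the universal threshold sigma * sqrt(2 log M)
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        thr = sigma * np.sqrt(2.0 * np.log(U_f.shape[0]))
        coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                                for c in coeffs[1:]]
        U_f[:, j] = pywt.waverec(coeffs, wavelet)[: U_f.shape[0]]
    return (U_f * s[:k]) @ Vt[:k, :]
```

Filtering only the temporal modes keeps the extra cost small (k one-dimensional transforms of length M), which is consistent with the modest 1.5× overhead over plain POD reported above.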

Phase separation phenomenon
In the last simulation, the system evolved from a mixed state until visible separation was achieved, so statistical averaging could not be used to analyse how the particle distribution changed over time. First, we applied POD to three density fields of size 128 × 128 at instances τ_s = {1, 50, 100} to extract density contour plots. A visible improvement in smoothness was achieved with the POD+ methods; the contours of the approximations obtained with SWAVinPOD are compared with POD in Fig. 16, and additional results generated with POD + SSA, POD + EMD-IT and POD + rQRd are shown in Fig. 17 for completeness. In this case, filtering was performed on both the left and right singular vectors, as no significant aliases were expected for N = M. The data size was small, therefore only J = 4 levels of decomposition were performed for SWAVinPOD. In the case of EMD-IT, the threshold multiplier was similar to the CNT simulation, T_multi = 1. The dominant modes for POD and SSA were established automatically using the previously described SVHT analysis, as the noise present in DPD simulations was assumed to be less correlated than in MD modelling. Coarse-graining in DPD reduces the number of degrees of freedom of the particles, neglecting some of the atomistic details that are captured in MD simulations. Modelling a number of MD particles as one DPD bead results in fewer statistically dependent measurements, which should be easier to process for POD and POD+ methods. Moreover, in DPD the temperature is controlled locally (unlike with the commonly adopted Berendsen thermostat used in MD simulations) through the random and dissipative forces. The variable that defines the strength of the random force follows a Gaussian distribution. This stochastic local thermostat helps to relax the correlations in the data and enables more efficient de-noising than in a globally thermostatted system.
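The automatic rank selection via SVHT can be sketched with the optimal singular value hard threshold of Gavish and Donoho for an unknown noise level; the cubic polynomial approximation of the threshold coefficient is the standard one, while the function name below is illustrative and may differ from the exact variant used in the paper.

```python
import numpy as np

def svht_rank(X):
    """Estimate the number of dominant singular values of X using the
    optimal singular value hard threshold (SVHT) with unknown noise:
    tau = omega(beta) * median(singular values), where beta is the
    matrix aspect ratio (omega(1) ~ 2.858 for square matrices)."""
    m, n = X.shape
    beta = min(m, n) / max(m, n)
    s = np.linalg.svd(X, compute_uv=False)
    omega = 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43
    tau = omega * np.median(s)
    return int(np.sum(s > tau))
```

For the 128 × 128 density fields above, beta = 1 and the threshold reduces to roughly 2.86 times the median singular value, which cleanly separates the few correlated modes from the noise bulk when the noise is close to white.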

Conclusions and remarks
The main goal of this paper was to review, develop and evaluate methods for noise reduction in computational nanofluidics. The filtering tools were studied and applied to a wide range of numerical results. The fluid flows under consideration were modelled separately via particle-based simulations using MD and DPD. Novel procedures were also proposed that outperform the existing techniques in extracting significant information from instantaneous measurements. This paper has shown that applying sophisticated de-noising tools to particle-based simulations can reduce the computational time and memory required to obtain acceptable ensemble solutions.
All the noise-reduction methods involve trade-offs between different properties. Rather than choosing a universal approach, this paper aimed to give a comparative overview and guidelines on how to benefit from each procedure. Furthermore, to mitigate certain common weaknesses (e.g. computational intensity), all of the algorithms have been combined with POD. In our proposed POD+ methods, additional de-noising is performed on the dominant modes of the SVD to enable more efficient filtering of unwanted frequencies, which would not be possible with POD alone for the same number of observations. Each coupling provides different benefits; the common feature is that the POD+ approaches are fast, and more successful in recovering signals buried in noise than the same techniques applied separately. The hybrid of POD and wavelet-based thresholding appeared to be the most flexible and efficient.