On the extraction of instantaneous frequencies from ridges in time-frequency representations of signals

The extraction of oscillatory components and their properties from different time-frequency representations, such as windowed Fourier transform and wavelet transform, is an important topic in signal processing. The first step in this procedure is to extract the ridge curve: a sequence of amplitude peak positions (ridge points), corresponding to the component of interest. This is not a trivial issue, and the optimal method for extraction is still not settled or agreed. We discuss and develop procedures that can be used for this task and compare their performance, based on both simulated and real data. Additionally, we propose a fast algorithm for optimization of a path functional of particular form over all possible ridge sequences. By achieving this in O(N) steps, it runs much faster than the computationally expensive simulated annealing used previously. Finally, we investigate the advantages and drawbacks that synchrosqueezing offers in relation to curve extraction. The codes used in this work are freely available for download.


I. INTRODUCTION
Separation of amplitude and frequency-modulated components (AM/FM components) in a given signal, and estimation of their instantaneous characteristics, is a classical problem of signal analysis. It can be approached by projecting the signal onto the time-frequency plane, on which the changes of its spectral content can be followed in time. Such projections are called time-frequency representations (TFRs). Typical examples are the windowed Fourier transform (WFT) and the wavelet transform (WT). If the construction of the TFR is well-matched to the signal's structure, then each AM/FM component will appear as a "curve" in the time-frequency plane, formed by a unique sequence of TFR amplitude peaks -ridge points. By extracting these curves, i.e. finding the appropriate peak sequences, one can recover the time-varying characteristics of the corresponding components (such as amplitude, phase and instantaneous frequency), an idea that was first expressed in [1].
Methods for the reconstruction of the component's characteristics from its ridge curve are discussed in [2], and their performance is compared in [3]. However, to estimate the component's parameters one first needs to extract the corresponding ridge curve. This is not a trivial issue, since in real cases there are often many peaks in the TFR amplitude at each time, and their number often varies. In such circumstances it can be unclear which peak corresponds to which component, and which are just noise-induced artifacts. In the present paper, we concentrate solely on the problem of curve extraction, i.e. finding an appropriate sequence of amplitude peaks in a given TFR.
Identification of the ridge curve represents an inseparable part of ridge analysis, which is widely used for e.g. machine fault diagnosis [4], fringe pattern analysis [5], studies of cardiovascular dynamics [6] and system classification [7], [8].
Although curve extraction has been addressed explicitly in the past [6], [9], [10], [11], [12], there seems to be no agreement as to the optimal procedure to be used for this task. Here we discuss and generalize some existing algorithms, present new ones, and compare their performance; the effects of the recently proposed synchrosqueezing [13], [14], [12], [15] on curve extraction are also studied.
The plan of the work is as follows. After reviewing the background and notation in Sec. II, we discuss different schemes for curve extraction in Sec. III. In Sec. IV we compare the performance of these schemes, while the advantages and drawbacks of synchrosqueezing in relation to curve extraction are studied in Sec. V. We draw conclusions and summarize the work in Sec. VI. A computationally efficient O(N ) algorithm for fast optimization of a path functional of particular form over all possible peak sequences is presented in the Appendix.

II. BACKGROUND AND NOTATION
In what follows, we denote byf (ξ) the Fourier transform of the function f (t), with f + (t) as its positive frequency part; and we denote its time-average and standard deviation as f (t) and std[f (t)], respectively: (II.1) where T denotes the overall time-duration of f (t) (in theory T → ∞).
By an AM/FM component (or simply component) we will mean a signal of the form: x(t) = A(t) cos φ(t) (∀t : A(t) > 0, ν(t) ≡ φ (t) > 0) (II.2) In real cases, a signal usually contains many such components, and the goal of ridge analysis is to extract them, either all or only those of interest, from the signal's TFR.
The two main linear TFRs suitable for components extraction and reconstruction are the windowed Fourier transform (WFT) G s (ω, t) and the wavelet transform (WT) W s (ω, t).
The main difference between the two TFRs mentioned is that the WFT distinguishes the components on the basis of their frequency differences (linear frequency resolution), while the WT does so on the basis of ratios between their frequencies (logarithmic frequency resolution). In effect, while the timeresolution of the WFT is fixed, for the WT it is linearly proportional to frequency, so that the time-modulation of the higher frequency components is represented better than that for the components at lower frequencies. Detailed information about our implementation of the WFT/WT computation, practical aspects of their use and related issues is provided in [2], while the respective codes can be found in [16].
In numerical simulations we use a Gaussian window for the WFT and a lognormal wavelet for the WT: where f 0 is the resolution parameter determining the tradeoff between time and frequency resolution of the resultant transform (in simulations we use f 0 = 1). While the methods developed below are applicable for any window/wavelet with real, positive and unimodal (single-peaked)ĝ(ξ) andψ(ξ > 0), the forms (II.4) seem to be the best choice [2], at least for the extraction and reconstruction of components.
To better match the methods with the resolution properties of a given window/wavelet, we will also use the minimal distinguishable time-lags ∆τ g,ψ and frequency difference ∆ξ g (WFT) or log-difference ∆ log ξ ψ (WT) [2]. Thus, due to the finite time and frequency resolution of the WFT/WT, the time variations occurring on time scales smaller than ∆τ g (WFT) or ω∆τ ψ /ω ψ (WT) cannot properly be represented. The same applies to components with frequency difference < ∆ξ g (WFT) or log-difference < ∆ log ξ ψ (WT), which then appear as a single component in the TFR. Following [2], we determine the minimal distinguishable differences as the widths of the regions in time and frequency encompassing 50% of the window/wavelet functions: where τ 1,2 ( ) and ξ 1,2 ( ) are defined as: (II.6) For the Gaussian window and lognormal wavelet (II.4) one has {∆τ g , ∆ξ g } ≈ {1.35f 0 , 1.35/f 0 } and ∆ log ξ ψ = 1.35/2πf 0 , respectively. Note, that the values (II.5) represent the minimal differences for which two time or frequency events can be distinguished in the TFR, while to resolve them reliably, i.e. separate and accurately reconstruct them, one needs larger values, obtained by setting 0.5 → 0.05 in (II.5) [2].
In practice, a typical signal consists of many AM/FM components x i (t) of the form (II.2), and some noise ζ(t) (that can be of any form, not necessarily white and Gaussian): If the noise is not very strong, and the resolution properties of the WFT/WT (determined by the parameters of window/wavelet used) are appropriate for the characteristic timemodulation and frequency separation between components then, at each time, there is a unique peak in the TFR amplitude for each x i (t). Thus, in the signal's TFR the components are represented as "curves", i.e. time sequences of close peaks, as illustrated in Fig. 1. Having found such a sequence, the value of x i (t) can then be reconstructed, either directly from the TFR at the peaks (ridge reconstruction), or from the widest unimodal regions around them (direct reconstruction): see [2] for the corresponding formulas. Therefore, the problem of ridge curve extraction lies in selecting from among all possible trajectories the sequence of peaks that corresponds to a single component. This sequence will be called the ridge curve, while the corresponding frequency profile will be denoted as ω p (t). It can also be defined as that peak sequence from which the component can be reconstructed in the most precise way, and it is usually represented by the peaks closest to the component's actual frequency (but the latter is not usually known a priori for real signals). In practice, it is convenient to extract the most dominant among x i (t) (II.7), i.e. that having largest x 2 i (t) . One can then reconstruct the corresponding component, subtract it from the signal and repeat the procedure to find other components.
In the following, we denote the ridge points, i.e. positions of the amplitude peaks at each time, as ν m (t), the corresponding TFR amplitudes as Q m (t), and their number as N p (t): where H s (ω, t) is the chosen TFR of a given signal (WFT G s (ω, t) or WT W s (ω, t)). Then the ridge curve ω p (t) = ν mc(t) (t), where m c (t) is the sequence of selected peak indices at each time t, which we need to find. Note, that the number of peaks N p (t) can vary in time and in practice is often greater than the number of components present in the signal, with the additional peaks being attributable e.g. to noise.
Remark II.1. Because in practice the frequency scale for WFT/WT is discretized, the positions of the peaks ν m (t) also take discrete values. As a result, e.g. the time-derivative dω p (t)/dt cannot be reliably calculated, because its numerical estimate become "quantized" in steps determined both by the width of the frequency bins and the signal sampling frequency f s . Thus, given high enough f s , numerical dω p (t)/dt will be zero for most of the time (and exceedingly high at some moments). Therefore, the performance of the curve extraction methods, which are often based on the differences between the frequencies of the consecutive ridges, might depend on numerical parameters such as the sampling frequency. To avoid this, we use a parabolic (three point) interpolation to better locate the precise peak positions ν m (t). The numerical dω p (t)/dt then become continuous (rather than discretized), making all procedures more meaningful and universal.

III. CURVE EXTRACTION SCHEMES
Some schemes for ridge extraction were suggested in [9], [10], [6], [11], [12]. In the present work, however, we will consider only procedures with computational cost not higher than O(N ), but we will show below that some computationally expensive schemes based on simulated annealing can be also performed in O(N ) operations.
The most straightforward way to extract the ridge curve is to first choose some starting point ω p (t 0 ), and then follow from it forward and backward in time, selecting next ridges as those maximizing some suitably chosen functional of the corresponding peak amplitudes and the previously selected ridges. This approach, which we will call one-step optimization, can be formulated mathematically as for n = n 0 + 1, . . . , N do: (III.1) and similarly backwards in time, for n = n 0 −1, n 0 −2, . . . , 1. In (III.1), n 0 denotes the discrete index of the starting time t 0 (for which ω p (t n0 ) is known), and the F ... is the chosen functional of the current discrete time t n , the peak positions ν m (t n ) and amplitudes Q m (t n ) at this time, and all previously selected ridge points {ω p (t n0 ≤ t ≤ t n−1 )}. For scheme (III.1) to be O(N ), the functional F ... should either depend on the finite number of previously selected ridges, or on the set of parameters which can be updated in O(1) steps whenever new points becomes available (e.g. the moments [ω p (t)] a ).
To implement (III.1), one needs to choose the starting time index n 0 and the corresponding ridge ω p (t n0 ). A popular idea is to use n 0 = 1 and select ω p (t n0 ) as the frequency corresponding to the maximum among Q m (t n0 ), m = 1, . . . , N p (t n0 ). However, such an approach might be inaccurate due to boundary errors, as the WFT/WT is not well-defined near time ends [2]. Additionally, components might undergo amplitude variations, so that the one which is dominant overall might not be of maximum amplitude at particular times, e.g. t 1 . A more accurate approach is therefore to select the starting point, among all times and ridges, as being that for which the functional in (III.1) is likely to attain its maximum. This can be done as where F 0 [. . . ] denotes a "zero-step" version of the original functional, obtained from the latter by taking its maximum among all the other parameters. Thus, if one has ; if additionally f (Q m (t n ), ν m (t n )) does not depend on ν m (t n ) and is proportional to Q m (t n ), then (III.2) will correspond to the highest TFR amplitude peak over all times. The criterion (III.2) works well in most cases, although it could still provide a "bad" starting point when the sharp time events are present or the noise is too strong. A serious drawback of the outlined one-step approach (III.1) is that even a single wrongly selected point might completely change all the following curve being extracted. Consequently, it is more accurate to optimize the functional not over each consecutive point, as in (III.1), but over the whole profile ω p (t), selecting the ridge curve as that which maximizes the full integral of F [. . . ] over time: 3) This approach, where the optimization is performed over all possible sequences of peak numbers {m 1 , m 2 , ..., m N }, will be referred to as the path optimization. Generally, however, it is computationally very expensive and so the optimization is often carried out by simulated annealing [9]. Nevertheless, it appears that if the functional depends on only a finite number of previous points {ω p (t n−i ), ..., ω p (t n−1 )} rather than the full history, then the optimal path in terms of (III.3) can be selected in O(N ) computations: the algorithm for doing so is presented in the Appendix. As will be seen, the path optimization approach (III.3) is usually much more accurate than one-step optimization (III.1), and should therefore always be preferred to the latter. Furthermore, path optimization has no problem associated with the selection of the starting point (III.2), as all the trajectories are explored.
What remains is to select an appropriate functional in (III.3). Below we consider a few curve extraction schemes, defined by a particular classes of F [. . . ]. In all cases, the fast path optimization algorithm discussed in the Appendix remains applicable, so that the following procedures are each of O(N ) computational complexity. We first consider the schemes for the WFT, and then discuss their adjustment for the WT.

A. Scheme I(α): penalization of frequency jumps
A popular approach is to penalize the frequency difference between two consecutive ridge points, so that where w(∆ξ, α) is some weighting function, suppressing frequency jumps, and α is its set of adjustable parameters. In (III.4), one can choose another function of Q m (t n ) instead of the logarithm, e.g. |Q m (t n )| 2 ; however, the logarithm seems to be the most appropriate because, in this case, the path functional (III.3) depends on the product of all amplitudes and thus can be significantly influenced even by a single "wrong" point, making selection of the latter less probable.
The class of functionals (III.4) is the most widely used one. For example, the approach of [11] corresponds to w(∆ξ, α) = 0 for ∆ξ ∈ [−1/α, 1/α] and = −∞ otherwise, while the implementation of [12] is based on w(∆ξ, α) = −α|∆ξ|; the scheme used in [6] can be also regarded as a modified variant of (III.4). In all the cases mentioned, however, a less accurate one-step approach (III.1) is used for optimization. Additionally, the choice of parameters for these methods can be crucial and is usually non-trivial.
To make the parametrization more universal, the weighting function should utilize the resolution properties of the WFT, determined by the window function being used. Thus, if the WFT has high frequency resolution (large f 0 in (II.4)), then smaller time-variability is allowed for the components (and therefore one expects smaller frequency jumps), and vice versa. The characteristic time-derivative of the ridge frequency can be estimated from the window resolution properties as ∆ξ g /∆τ g (= 1/f 2 0 for the Gaussian window (II.4)), where ∆ξ g and ∆τ g are defined in (II.5). We therefore penalize the frequency differences based on their relationship to the characteristic value for the current WFT, choosing where f s is the signal sampling frequency. The parameter α in (III.5) is then expected to be quite universal, and the same value should work well for different window functions. In (III.5), we choose a simple w(r) = −|r|, but one can instead use another function. However, for any reasonable choice, the method remains qualitatively the same, i.e. one expects it to suffer from the same drawbacks and have similar issues. Note, that scheme I corresponds to the simple cases of "global maximum" and "nearest neighbour" curve extraction at α = 0 and α → ∞, respectively: , so that the maximum peak will be selected at each time, taking no account of the previous ridge points. • Nearest Neighbour (α → ∞). This case differs for onestep optimization (III.1) and path optimization (III.3). The former approach corresponds to selecting at each new step the peak which is nearest to the previous one, taking no account of its amplitude. The latter approach will give simply the least frequency-varying curve.

B. Scheme II(α,β): adaptive parametrization
In the previous scheme, there is an adjustable parameter α that determines the suppression of the frequency variations. Although some choices (e.g. α = 1) appear to be relatively universal, they still remain highly non-adaptive, so that a particular parameter value might be suitable for one type of the signal, and a different value for another type. For example, in the case of chirps it is clear that one should penalize not the frequency jumps, but their difference from the actual frequency growth rate. To make the scheme adaptive, the parameters of the functional should be matched to the properties of the component being extracted, such as the typical variations of its instantaneous frequency. The latter can be characterized by the averages and standard deviations of the ridge frequencies ω p (t) and their differences ∆ω p (t n ) ≡ ω p (t n ) − ω p (t n−1 ). Based on these parameters, one can construct an adaptive functional by suppressing not the absolute frequency jumps, as before, but the relative deviations of the component frequency and its derivative from their typical values: (III.6) where we choose the penalization functions to be By maximizing the path integral (III.3) based on the functional (III.6), one is in fact trying to extract the curve which is most consistent with itself. Thus, the strength of the respective frequency variations becomes not important, and it is only their agreement and similarity at different times that matters. Even the most adaptive method can be parametrized to tackle special cases, and in (III.6) we have introduced the adjustable parameters α and β controlling the strengths of suppression of the corresponding relative deviations. However, although there are now two parameters, they are in fact more universal than the single parameter of scheme I. Thus, the particular choice of α, β for scheme II is expected to work well for a larger class of signals than the particular choice of α in the scheme I, as will be seen below. This is because in (III.6) we take explicitly into account the actual properties of the component being extracted, penalizing deviations from its typical behavior rather than simply the frequency jumps. Additionally, by suppressing the relative deviations of the component's frequency from its mean, scheme II stabilizes the curve in its characteristic frequency range (thus lowering the possibility that it will "escape" and switch to another component), while there is no such mechanism in scheme I. Note, that w 2 (. . . ) (III.7) is specially selected to be of higher order in its argument than w 1 (. . . ) to avoid rapid "breakouts" of ω p (t) from its typical range.
The functional (III.6), however, depends on the whole timeevolution of ω p (t), so the path optimization (III.3) cannot be performed using a fast O(N ) algorithm (see Appendix); nor is it evident how to update the functional at each step if using the one-step method (III.1). Nevertheless, one can approach the approximate optimum curve ω p (t) iteratively. Namely, given some initial guess ω  p (t) (in this case the path optimization algorithm discussed in the Appendix is applicable). Then the (fixed) averages and deviations are updated to those for the ω (1) p (t) and, based on them, the next approximation ω (2) p (t) is found. The procedure is repeated until the curves obtained in two consecutive iterations coincide, indicating convergence (usually requiring only a few iterations).
As an initial guess ω (0) p (t), we take a simple Global Maximum curve, formed by the positions of the highest peaks at each time. However, because such a trajectory might be composed from the parts of curves belonging to different components, it is better to calculate initial averages and deviations using the medians m[. . . ] instead of means . . . . Thus, at first iteration we take and similarly for ∆ω p . This makes the initial parameters more meaningful, while at the next iterations we switch back to means. Note, that the choice (III.8) works well only in conjunction with peak interpolation (see Remark II.1), as it requires ω p and ∆ω p to take continuous values; otherwise, the usual means should be used.
Remark III.1. Apart from the frequency and its difference, one can additionally suppress the relative deviations of the component's other parameters (e.g. the amplitude and its timederivative, higher order differences etc.) by introducing the corresponding terms into (III.6). However, as will be seen below, the performance of the scheme II is already very good, so there is no need to complicate it.
C. Scheme III(α,β): adaptive functional One of the drawbacks of the preceding scheme is that, while the parametrization of the functional is indeed chosen adaptively, the choice of the form of penalization functions (III.7) remains rather empirical. To make the scheme fully adaptive, instead of functions of ridge frequencies and their differences one can use the distributions of frequencies P 2 (ν, {ω p (t)}) and their differences P 1 (∆ν, {ω p (t)}), based on the full timeevolution of ω p (t). The functional for scheme III then takes the form where P 1,2 are estimated as (III.10) with D[x, δx] = 1 if x ∈ [−δx, δx] and = 0 otherwise. Since the frequencies at which the WFT is calculated are discretized as ω k = ω min + (k − 1)∆ω, k = 1, .., N f , it seems natural to take the widths of the bins in (III.10) as the step of this discretization: δξ = δ∆ξ = ∆ω. Then, since ω 1 ≤ ω p ≤ ω N f and |∆ω p | ≤ ω N f − ω 1 , the P 1 and P 2 appear as a vectors of lengths N f and 2N f − 1, respectively.
Similarly to scheme II, for (III.9) we iteratively approach an approximate optimum ω p (t) by repeating the extraction of the ridge curve using the distributions obtained at previous iteration. As the initial trajectory ω (0) p , here we also take the Global Maximum curve (scheme I(0)). However, while we calculate the initial P This is needed in order to not over-restrict possible values of ω p (t) at the beginning and also to better select the curve's characteristic frequency band.
Three additional adjustments are needed to make the performance of the method adequate and stable. They are: (i) set P 1 (|∆ξ| > ∆ξ g ) = 0; (ii) set all unit entries P 1,2 = 1 to zero (which does not, however, apply to the initial P 2 (III.11)); (iii) in (III.9), interpolate the infinite values of log P 1,2 (corresponding to zero entries of P 1,2 ) to make them small but finite. The first modification is useful because frequency jumps larger than ∆ξ g (II.5) often (though not always) indicate that the curve has switched to another component; it is especially advantageous for the first iteration, when the initial maximumbased ω (0) p (t) can contain many such "switches". The second adjustment avoids making statistics out of one point, which might decrease the accuracy of the scheme. Finally, the third, most important, improvement is needed to avoid prohibiting completely any values of the ridge frequency or its difference if ω p (t) obtained at some iteration does not contain them (or if the corresponding entries are set to zero due to adjustments (i) and (ii)). We interpolate infinities in log P 1,2 based on the two previous and the two next finite values, so that two linear interpolations are constructed, and their maximum is taken at each point. For example, if originally log P 1 = {−∞, 1, 3, 2, −∞, −∞, −∞, 2, 4}, then interpolated version would be {−1, 1, 3, 2, 1, 0, 0, 2, 4}.

D. Adjustments for WT
Due to the logarithmic frequency resolution of the WT, one should consider not the frequencies but their logarithms, which is the only significant difference from the WFT case. Thus, in the case of the WT one uses the same schemes and functionals, but now everything is taken on a logarithmic frequency scale (ω p (t n ) → log ω p (t n ), ∆ω p (t n ) ≡ ω p (t n ) − ω p (t n−1 ) → ∆ log ω p (t n ) ≡ log ω p (t n ) − log ω p (t n−1 ), and similarly for all the other frequency variables). We now summarize briefly the required adjustments.
Scheme III. One uses the distributions of frequency logarithms and their differences: Since frequencies for the WT are also discretized on a logarithmic scale, log ω k = log ω min + (k − 1)∆ log ω, k = 1, ..., N f , one chooses in (III.10) the bin widths δ log ξ = δ∆ log ξ = ∆ log ω. Then nothing changes in terms of the length of numerical P 1,2 or the assignment of values to their entries.

A. Test signals
We now test the relative performances of the different methods on two signals. The first test signal is an AM/FM component with simple sinusoidal amplitude modulation and two-sinusoidal frequency modulation, plus a weaker component: The second test signal is taken from real life, representing the central 200 s part of a 30 min electrocardiogram (ECG) signal recorded from a 30 year old man. The WFTs for both signals are shown above in Fig. 1.
The main complications that arise in curve extraction relate to the appearance of other WFT amplitude peaks near ω p (t), which can be due to noise or to other components. We model these complications by corrupting the signal with colored noise η(t): where the noise in the Fourier domain has amplitude while the phases of its Fourier coefficients are random. Being asymmetric, the noise amplitude at frequency 0.5 Hz is around 2.5 times higher than at 1.5 Hz, corrupting the simulated components (which have a mean frequency around 1 Hz) unequally in frequency on both sides. This gives an opportunity to study reliably the relative performance of the different methods, as colored noise can additionally model the effect of other components that are asymmetrically distributed in frequency around the component of interest. The WFTs of the two test signals corrupted with noise are presented in Fig. 2.
Because the ridge points are not exactly equal to the true instantaneous frequency profile, even in the absence of noise [3], we compare the extracted ridge curves ω p (t), not with the true instantaneous frequencies, but with the ridge curves extracted in the noise-free case. Thus, we characterize the quality of the extracted frequency profile by the error f : where ω p (t) is the ridge curve extracted from the original signal, without noise, which is well-defined in this case and the same for almost all methods. Since the noise changes the ridge profile as it appears in the WFT, there is always a small deviation between the extracted profiles with and without noise; this deviation is unrelated to the inaccuracy of curve extraction. Therefore, we only compare the performance of different methods, without aiming to extract the profile as it would be without noise. Note that, while (IV.4) quantifies the error of the component's frequency estimation, results qualitatively similar to those presented below were also obtained for the error of the full signal reconstruction from the extracted ridge curve. In simulations, both test signals are sampled at 20 Hz for 200 s. We will test only curve extraction from the WFT, but qualitatively the results are the same for the WT as well. To eliminate boundary effects in the WFT (see e.g. [2]), we simulate the first test signal (IV.1) for t < 0 and t > 200 and use the corresponding values for padding; the same procedure is applied for the ECG signal (we calculate the WFT of only its central 200 s section, while the rest is used for padding). We use a Gaussian window (II.4) with f 0 = 1 and calculate the WFTs at frequencies ω k = k∆ω ∈ [0.25, 2.25], where ∆ω = ∆ξ g /25 ≈ 2π × 0.008. For both signals, we use 40 noise realizations, which are the same for each method, parameters and deviations σ being tested.
Remark IV.1. Note that, for the first test signal (IV.1), if one extracts ω p (t) corresponding to the weaker component 0.8 cos 2π ×1.75t+0.5 sin 2πt 5 , this can also be regarded as a not-bad result. However, we are mainly interested in testing the accuracy with which the parameters of the dominant component (around 1 Hz) can be recovered. Therefore, if the ridge profile ω p (t) extracted from the WFT of the first test signal lies closer to the frequency of the non-dominant component, we discard the corresponding peaks and re-extract the curve. These considerations do not apply to the second test signal.

B. Results
Results for the first test signal (IV.1), showing curve extraction from the WFT, are presented in Fig. 3. The performance of each method can be quantified by its maximum tolerable noise level σ max , indicated by vertical dotted lines in Fig. 3: this is the level at which the mean error f (IV.4) plus its standard deviation over noise realizations reaches 0.5, implying that in many cases the resultant ω p (t) is inaccurate. Note that, in each case, the default path optimization (III.3) approach has clear and significant advantages over the one-step approach (III.1), with the mean errors for the latter being shown by dashed gray lines in Fig. 3(b,d,f). From Fig. 3, it can be seen that the worst performance in the case of the first test signal (IV.1) is exhibited by the I(0) (Global Maximum) method, which is to be expected, given that the amplitude of the weaker component is sometimes higher than that of the dominant one. With increasing α above zero, the performance of the method I(α) greatly improves, reaching its optimum at some 0 < α < 10, and then deteriorating again. Thus, for scheme I and the parameters tested, the best results are achieved at α = 1.
Nevertheless, much better results are obtained with schemes II(1,1) and II (10,10), which can trace the ridge curve reliably even in the presence of very strong noise. Methods II(10,1) and II(1,10) do not work so well, indicating that large asymmetries between α and β are not advantageous. Note that, although scheme II(1,10) appears as good in terms of f (Fig. 3(d)), its performance is not really satisfactory because of too many frequency jumps (Fig. 3(c)). This situation arises as a result of giving excessive weight to the typical value of frequency, which leads to selection of the noise-induced ridges in its vicinity and thus causes large inaccuracies in the reconstructed amplitude and phase.
Scheme III, on the other hand, demonstrates relatively poor performance for each of the parameter choices considered. Thus, despite its "complete" adaptivity and all the improvements made (see Sec. III-C), this method seems to be highly unstable in practice and does not even outperform scheme I.
Results for the second test signal, the ECG, are presented in Fig. 4. Clearly, the situation there is similar to what was seen  Fig. 2(a)). for the first test signal in Fig. 3. However, now the performance of methods I and II is almost independent of their parameters (except I(0)), at least for the parameter values considered.
Summarizing, by far the best results were achieved with schemes II(1,1) and II (10,10), each of which significantly outperformed all other methods. Scheme I(α) seem to be most accurate for α = 1 (at least for the parameters tested), while the Global Maximum method, corresponding to I(0), is largely useless and should not be used. Apart from the latter, the worst performance was demonstrated by method III. In all cases, the path optimization (III.3) approach was superior to the one-step optimization (III.1).  Fig. 2(b)) .
We have reviewed and further studied the properties of SWFT and SWT in [2], [3], so the reader is referred to these works for all information about our implementation and related issues. Although synchrosqueezed TFRs look much more concentrated and are more appealing visually, it has been found [3] that they do not seem to possess better time or frequency resolution, i.e. do not allow for better reconstruction of components that are close in frequency or have high time variability (as compared to original WFT/WT). Thus, in this respect they are somehow similar to the WFT/WT skeletons (the corresponding transforms with only their amplitude peaks left). Nevertheless, it still remains to be established whether or not synchrosqueezing provides any advantages in terms of curve extraction, i.e. allows more accurate identification of the amplitude peak sequences.
Evidently, the schemes developed for WFT/WT can be straightforwardly applied for tracing the ridge curves in SWFT/SWT. Nothing qualitatively changes, except that now one uses the amplitude peaks of the synchrosqueezed transforms. However, an immediate drawback is that, because of the non-smoothness of SWFT/SWT, one cannot use the parabolic interpolation to better locate the peaks (see Remark II.1), so that ω p (t) take discrete values. This decreases the accuracy of all the procedures and makes them less meaningful, as well as restricts the choice of techniques that can be used (e.g. one can no longer utilize the medians in (III.8), at least for ∆ω p (t)).
A more serious drawback is that, in contrast to the case of the WFT/WT, the amplitudes of the peaks in the synchrosqueezed transforms are generally not universally proportional to the amplitudes of the corresponding components, and depend on the frequency discretization [2]. This is illustrated in Fig. 6 by snapshots of the WFT and SWFT amplitudes of the first test signal at a noise level of σ = 0.6. It can be seen that, even if one component has a smaller amplitude than the other, it may still have a much higher peak in the SWFT (see Fig. 6(a,c)); additionally, the relationship between the peaks will generally depend on the discretization step ∆ω, as can be seen from Fig. 6(b). The characteristics of (nonlinear) proportionality between the width of the frequency bin, and the SWFT/SWT amplitude within it, vary with time, being determined by the instantaneous amplitude and frequency modulation of the component, as well as by interference with other components and noise. This latter feature greatly increases the likelihood of inadvertently jumping from the ridge curve of one component to that of another during the process of curve extraction, as the corresponding SWFT/SWT peaks might be amplified/reduced in comparison to each other at different times. Furthermore, the performance of different methods when applied to synchrosqueezed transforms will generally depend on the widths of frequency bins used. Therefore, the use of SWFT/SWT peak amplitudes for discriminating between the components is in general not appropriate and can lead to unpredictable results, introducing considerable instability.
We have found most of the methods considered to perform significantly worse if applied directly to the SWFT/SWT instead of the original WFT/WT (related results for the first and second test signals can be seen in Supplementary Figs. 1  and 2, respectively). The exceptions are the schemes I(1,1) and I(10,1) which still work well (for the second test signal even slightly better than without synchrosqueezing). However, for the first test signal these schemes might sometimes select the ridge curve formed from the parts corresponding to different components (in Supplementary Fig. 1, this is indicated by "spikes" in the ensemble-mean of f (IV.4)). Although such situation is very rare, occurring for only one or few out of 40 noise realizations, it nicely reflects the fact that the use of the SWFT/SWT peak amplitudes for curve extraction is not appropriate in general, even if it works well in particular cases (e.g. when there is only one pronounced component, as for the second test signal).
In the case of synchrosqueezed transforms, the physically meaningful amplitude is that of the overall sum of the SWFT/SWT over the (time-dependent) frequency region where the component is concentrated, and not the amplitude at the peak [2]. The problems attributable to use of the latter can therefore be solved by using a more appropriate estimate. Hence, at each time t we break the SWFT/SWT into the widest regions of non-zero amplitude [ω Then, instead of peak values, as Q m (t) and ν m (t) we use in all procedures the amplitudes and frequencies reconstructed from these regions by the direct method [2]: This modification makes curve extraction much more meaningful, because Q m (t) (V.2) do not depend on the frequency discretization and reflect the true amplitudes of the components, so that they can appropriately be compared; additionally, ν m (t) now take continuous values, as desired. However, the amplitude and frequency estimates (V.2), obtained from the full component time-frequency support in SWFT/SWT, are more sensitive to noise than the estimates (II.8) obtained from the WFT/WT peaks [3]. In effect, even with a modification (V.2), we have found that the extraction of the curves from the synchrosqueezed transforms by any method gives results that are at best comparable with, but usually worse than, extraction from the usual WFT/WT. The corresponding results for the SWFT are shown in Supplementary Figs. 3 and 4, to be compared with Figures 3 and 4 here. The best performance is still demonstrated by methods II(1,1) and II (10,1).
Note that, in contrast to the usual WFT/WT, synchrosqueezed transforms often contain a lot of "spikes" with small Q m (t), separated by regions of zero amplitude (see Fig. 6(a,b)). These small peaks occur both due to noise and as a side effect of amplitude/frequency modulation or interference. Consequently, at any given time, there are many more candidate ridge points ν m (t) for the SWFT/SWT than for the originating WFT/WT. Due to this, the computational cost of curve extraction from synchrosqueezed transforms is considerably higher than from conventional smooth TFRs. Furthermore, such structure might have an adverse impact on the performance of different methods, confusing them by suggesting many unphysical peaks as a candidates for the next ridge point.

VI. CONCLUSIONS
We have presented and compared a few techniques for ridge curve extraction from the WFT/WT. Additionally, we have devised a fast O(N ) algorithm for finding the frequency profile that maximizes the path functional (III.3) (if it belongs to a particular class, see Appendix), a problem that had previously been solved by computationally expensive simulated annealing. The MatLab codes for curve extraction are available in [16] together with other useful programs and information.
Our results suggest that the most accurate scheme in general appears to be II(α,β) with α ≈ β. Scheme I(α) is less accurate and lacks adaptivity, while scheme III(α,β), although adaptive, usually gives inferior results. In scheme II(α,β), the parameters β and α control the strengths of suppression of the relative deviations of frequency and its time-derivative from the corresponding average values, respectively. Although they can be adjusted to better match any specific problem, the choice α = β = 1 (default in the codes), seems to work well in the majority of cases. Thus, scheme II(1,1) appears be of almost universal utility, being a type of "just apply" method which does not require any tuning from the user.
We have also tested the effects of synchrosqueezing [13], [14], [12], [15] in relation to curve extraction, and found that the drawbacks heavily outweigh the advantages. Thus, only the schemes II(1,1) and II(10,1) work reasonably well if applied to SWFT/SWT, and in some specific cases synchrosqueezing might even improve slightly their performance. Generally, however, the structure of the SWFT/SWT seems to be less suitable for curve extraction compared to that of the WFT/WT, at least for the methods considered. F Q mn (t n ), ν mn (t n ), ν mn−1 (t n−1 ) , (VI.1) after which the ridge curve is recovered as ω p (t n ) = ν mc(tn) (t n ). This is the case utilized in all schemes presented in this work.
It is clear that at each time t n for each ridge ν m (t n ) there exists a unique history of previous peaks { m c (m, t n , t 1 ), . . . , m c (m, t n , t n−1 )} which maximizes the integral to this point (VI.2) What makes a fast path optimization possible is that, for functionals depending only on the current and previous points, if the profile {m c (t)} maximizing (III.3) includes ν m (t n ), then it should include the best path to ν m (t n ) as well: {m c (t 1 ), . . . , m c (t n )} = { m c (m, t n , t 1 ), . . . , m c (m, t n , t n−1 ), m}. This is because the behavior of m c (t i=n+1,..,N ) does not influence the integral over the previously extracted points m c (t i=1,..,n−1 ). Therefore, at each step we can leave only the best paths to each peak ν m (t) and discard all the others.
It is useful to express m c (m, t n , t i ) through the matrix q(m, t n ) which maps the peak number m at time t n to the previous peak number in such a way that (VI.2) is maximized. We therefore introduce 3) What remains is to find at each time t n (starting from t 1 ), and for each ridge m = 1, . . . , N p (t n ), the maximum value U (m, t n ) of the integral up to this point and the index of the previous ridge q(m, t n ) for which this maximum is achieved: for n = 1, . . . , N and m = 1, . . . , N p (t n ) do: Then U (m, t N ) represents the full integral (VI.1) to each of the last ridges ν m (t N ), and one has m c (t N ) = argmax m U (m, t N ), with the sequence corresponding to this index being the optimal path: For example, for the functional F [. . . ] = log Q m (t n ) + w(ν m (t n ) − ω p (t n−1 ), α) (scheme I (III.4)), we calculate t 1 : for m = 1, .., N p (t 1 ) q(m, t 1 ) = 0, U (m, t 1 ) = log Q m (t 1 ), (VI.5) where q(m, t 1 ) is set to zero because there are no peaks before the starting time.
Numerically, the q(m, t n ) and U (m, t n ) represent M p × N matrices, updated at each step, where M p = max n N p (t n ) is the maximum number of peaks; the excess entries q({N p (t n ) + 1, .., M p }, t n ) and U ({N p (t n ) + 1, .., M p }, t n ) are set to Not-a-Numbers (NaNs). Since at each time t n we need to calculate for each of the N p (t n ) peaks the functional with each of the N p (t n−1 ) of the previous peaks (to find the one maximizing it), the overall computational cost of the procedure is O(M 2 p N ). The outcome of the algorithm is illustrated below on a schematic example: Note, that in this example there are two ways of going from the second peak at time t 1 : either to the second row (m c (t 2 ) = 2), corresponding to U (2, t 2 ) = 2.0, or to the third one, corresponding to U (3, t 2 ) = 2.4. The one-step scheme (III.1) would select the third peak, but using the path optimization scheme we explore all the possibilities, and find out that going through the second one leads at the end to the higher path functional (III.3). The algorithm outlined for optimizing (VI.1) can be adapted for functionals depending on any finite number of previous peak positions. However, the longer the history that one needs to take into account, the more computationally expensive it becomes. For example, if functional F [...] depends on two previous points ω p (t n−1 ) and ω p (t n−2 ), then one will need to apply the same procedure but instead of single ridges treat their one-step sequences. Thus, in this case one selects the trajectory maximizing the path functional (III.3) to each of the N p (t n−1 ) × N p (t n ) point combinations {ν k (t n−1 ), ν m (t n )}.  Fig. 4 in the manuscript, but the curve is extracted from the synchrosqueezed WFT using "integrated" ridge points (see Eq. (5.2) in the manuscript).