Maximum-likelihood parameter estimation in terahertz time-domain spectroscopy

We present a maximum-likelihood method for parameter estimation in terahertz time-domain spectroscopy. We derive the likelihood function for a parameterized frequency response function, given a pair of time-domain waveforms with known time-dependent noise amplitudes. The method provides parameter estimates that are superior to those of other commonly used methods, together with a reliable measure of the goodness of fit. We also develop a simple noise model that is parameterized by three dominant sources, and derive the likelihood function for their amplitudes in terms of a set of repeated waveform measurements. We demonstrate the method with applications to material characterization.


Introduction
At the heart of most applications of terahertz time-domain spectroscopy (THz-TDS) is a mathematical procedure that relates raw THz-TDS waveform measurements to parameters of scientific and technological interest [1][2][3]. Typically this analysis is formulated in the frequency domain, since it provides the most natural description of any linear, time-invariant system of interest. But since a THz-TDS waveform describes the electric field as a function of time, it must be Fourier-transformed for use in any frequency-domain analysis. The Fourier transform scrambles the THz-TDS measurement noise, which is more naturally represented in the time domain, and produces artifacts that can degrade the overall measurement quality and yield misleading results [4][5][6][7][8][9][10][11][12][13]. Furthermore, the standard approaches to parameter estimation in THz-TDS involve the ratio of two noisy waveforms, which further distorts the noise spectrum and can yield noise distributions that are far from Gaussian. Other approaches have emerged that improve on the standard procedures [14][15][16], but so far all of the existing approaches to THz-TDS analysis lack clear, model-independent statistical criteria for establishing parameter confidence intervals or for deciding whether a given physical model describes the data well or not. Here, we introduce a general framework to remedy this [17]. We describe a maximum-likelihood approach to THz-TDS analysis in which both the signal and the noise are treated explicitly in the time domain, together with a frequency-domain constraint between the input and the output signal. We show that this approach produces superior results to existing analysis methods.

Signal and noise in THz-TDS
A Monte Carlo simulation of an elementary THz-TDS analysis illustrates the basic problems that our method solves; it also allows us to develop notational conventions that we will use to describe our main results. Figure 1(a) shows an ideal, noiseless THz-TDS waveform μ(t), normalized to its peak value μ_p [18], together with a time-domain Gaussian noise amplitude σ(t) given by σ²(t) = σ_α² μ²(t) + σ_β² + σ_τ² [dμ(t)/dt]², with amplitudes σ_β = 10⁻⁴ μ_p, σ_α = 10⁻², and σ_τ = 1 fs. Physically, the additive noise term σ_β is produced by the detection electronics (in units of μ_p); it dominates at times when both μ(t) and dμ(t)/dt are small, while the multiplicative noise term σ_α|μ(t)| tracks the signal itself, although in the figure it is barely discernible from zero. Such structured, signal-dependent, time-domain noise is common in THz-TDS waveform measurements, and leads to well-known ambiguities in defining the signal-to-noise ratio and dynamic range for them [5,6,8].
We simulate a single THz-TDS waveform measurement by computing x(t_k) = μ(t_k) + σ(t_k)ε(t_k) at equally spaced times t_k = kT, k ∈ {0, 1, . . ., N − 1}, with N = 256 and T = 50 fs, where each ε(t_k) is an independent random variable with a standard normal distribution. This sequence has the discrete Fourier transform (DFT) x̃(ω_l) = μ̃(ω_l) + [σ̃ ⊛ ε̃](ω_l) at the discrete frequencies ω_l = 2πl/(NT), l ∈ {0, 1, . . ., N − 1}, where μ̃(ω_l), σ̃(ω_l), and ε̃(ω_l) denote the DFTs of μ(t_k), σ(t_k), and ε(t_k), respectively, and the ⊛ symbol denotes circular discrete convolution. Figure 1(b) shows the power spectral density S(ω_l) = (T/N)|x̃(ω_l)|² at the unique frequencies given by l ≤ ⌊N/2⌋ for ten such simulations, where each spectrum is normalized to its peak value. The signal decreases exponentially with frequency, falling by a bit more than 40 dB from its peak power before reaching the noise floor near 5 THz. The red-highlighted spectrum in Fig. 1(b) shows that while the noise is relatively flat between 5 THz and 10 THz, it exhibits oscillatory fluctuations with a period of about 0.75 THz. These also appear in all of the other spectra in Fig. 1(b), and arise because the convolution [σ̃ ⊛ ε̃](ω_l) smooths the noise over a frequency scale given by the inverse width of σ(t). The resulting correlations blur the distinction between the signal and the noise, and their presence implies that the true uncertainty in x̃(ω_l) is significantly smaller than the noise floor suggested by S(ω_l), which neglects the information that noise at one frequency provides about the noise at neighbouring frequencies.
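The single-waveform simulation described above can be sketched in a few lines of NumPy. The pulse shape thz_pulse below is a hypothetical stand-in for the ideal waveform μ(t), which is not given in closed form here; the noise amplitudes follow the variance model σ²(t) = σ_α²μ²(t) + σ_β² + σ_τ²[dμ/dt]² with the values quoted in the text.

```python
import numpy as np

def thz_pulse(t, t0=2.5e-12, tau=0.25e-12):
    """Toy THz pulse (derivative-of-Gaussian), peak-normalized; illustrative only."""
    x = (t - t0) / tau
    p = -x * np.exp(-x**2)
    return p / np.abs(p).max()

def noise_amplitude(mu, dmu_dt, sigma_alpha=1e-2, sigma_beta=1e-4, sigma_tau=1e-15):
    """sigma^2(t) = sigma_alpha^2 mu^2 + sigma_beta^2 + sigma_tau^2 (dmu/dt)^2."""
    return np.sqrt(sigma_alpha**2 * mu**2 + sigma_beta**2 + sigma_tau**2 * dmu_dt**2)

N, T = 256, 50e-15                       # 256 points sampled every 50 fs
t = np.arange(N) * T
mu = thz_pulse(t)                        # ideal waveform, mu_p = 1
dmu = np.gradient(mu, T)                 # numerical time derivative
sigma = noise_amplitude(mu, dmu)         # time-dependent noise amplitude

rng = np.random.default_rng(0)
x = mu + sigma * rng.standard_normal(N)  # one simulated measurement

# Power spectral density at the N//2 + 1 unique (non-negative) frequencies
X = np.fft.rfft(x)
S = (T / N) * np.abs(X)**2
nu = np.fft.rfftfreq(N, d=T)             # 0 ... Nyquist = 10 THz
```

Repeating the last four lines with fresh noise draws reproduces the spread of spectra in Fig. 1(b).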

Transfer function estimation in THz-TDS
To measure the response of a system requires measurements of two THz-TDS waveforms: an input, μ(t), and an output, ψ(t) = [h * μ](t), where h(t) is the impulse response of the system and the * symbol denotes continuous-time convolution. Fourier transforming the ideal relationship ψ(t) = [h * μ](t) converts the time-domain convolution to a frequency-domain multiplication, ψ̃_c(ω) = H(ω)μ̃_c(ω), where μ̃_c(ω) and ψ̃_c(ω) denote the continuous-time Fourier transforms of μ(t) and ψ(t), respectively, and H(ω) = h̃_c(ω) is the continuous-time transfer function of the system. Proceeding as we did for a single waveform, we simulate input and output waveform measurements x(t_k) and y(t_k), respectively, by computing μ(t_k) and ψ(t_k) at the discrete times t_k and adding time-domain noise σ_μ(t_k)ε(t_k) and σ_ψ(t_k)δ(t_k), where each ε(t_k) and δ(t_k) is an independent random variable with a standard normal distribution. After applying the discrete Fourier transform to obtain x̃(ω_l) and ỹ(ω_l), we can determine the empirical transfer function estimate (ETFE) [19], Ĥ_ETFE(ω_l) = ỹ(ω_l)/x̃(ω_l) = {ψ̃(ω_l) + [σ̃_ψ ⊛ δ̃](ω_l)}/{μ̃(ω_l) + [σ̃_μ ⊛ ε̃](ω_l)}, where μ̃(ω_l) and ψ̃(ω_l) are the DFTs of the ideal input and output signals, respectively, that we would obtain in the absence of noise, and [σ̃_μ ⊛ ε̃](ω_l) and [σ̃_ψ ⊛ δ̃](ω_l) are the corresponding frequency-domain noise terms. The ETFE is a common starting point for THz-TDS analysis: it is easy to compute from the raw data, and it provides an estimate of H(ω) for any linear, time-invariant system. Frequently, though, the focus of interest is not H(ω) itself but a relatively small number of parameters that specify H(ω) through a physical model, such as the thickness and refractive index of a material. In this case, fitting the model directly to Ĥ_ETFE(ω) can yield ambiguous results, because Ĥ_ETFE(ω) is biased and has noise that is both correlated and non-Gaussian.
To illustrate these deficiencies, Fig. 2 shows 250 simulations of Ĥ_ETFE(ω) with nominally identical input and output pulses, corresponding to H(ω) = 1. The red-highlighted curves show correlations similar to those shown in Fig. 1, but that extend to frequencies where the signal is much stronger. The average over all simulations reveals the bias in Ĥ_ETFE(ω): in Fig. 2(a), Re{Ĥ_ETFE} falls from the correct value of 1 to the biased value of 0 as the frequency increases above 5 THz, where the signal reaches the noise floor in Fig. 1(b). Also, the noise distribution for Ĥ_ETFE(ω) develops wide, non-Gaussian tails at frequencies where the signal is small, because noise fluctuations that cause |x̃(ω)| → 0 become more likely, and these in turn cause Ĥ_ETFE(ω) to diverge [20]. If the noise is uncorrelated, such large fluctuations are only expected when the signal is small. But in the more typical case that the noise is correlated, they can influence frequencies well within the primary signal bandwidth, as indicated by the red-highlighted curves in Figs. 2(c) and 2(d).
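A minimal sketch of this Monte Carlo experiment is given below, assuming for simplicity purely additive noise and an illustrative derivative-of-Gaussian pulse (stand-ins for the full noise model and waveform used in the paper). With the true transfer function equal to 1, the mean ETFE stays close to 1 where the signal dominates, while bins near the noise floor become heavy-tailed ratios of noise.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 256, 50e-15
t = np.arange(N) * T
xc = (t - 2.5e-12) / 0.25e-12
mu = -xc * np.exp(-xc**2)                  # illustrative pulse shape

H_est = []
for _ in range(250):
    x = mu + 1e-4 * rng.standard_normal(N)   # noisy input measurement
    y = mu + 1e-4 * rng.standard_normal(N)   # noisy output; true H(omega) = 1
    H_est.append(np.fft.rfft(y) / np.fft.rfft(x))   # empirical transfer fn estimate
H_mean = np.mean(H_est, axis=0)

# At low frequency the signal dominates and the average estimate sits near the
# true value 1; near the noise floor the estimate is biased and heavy-tailed.
```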
Figure 3 shows how these problems distort standard measures of fit quality. It displays the results of weighted least-squares fits to the ETFE simulations in Fig. 2 with the model H(ω; θ) = θ₁ exp(iωθ₂), which rescales the input by an amplitude θ₁ and shifts it by a delay θ₂ when we adopt the exp(−iωt) convention for harmonic time dependence. For each fit, the estimated parameter vector θ̂ is chosen to minimize Q_ETFE(θ) = Σ_l |Ĥ_ETFE(ω_l) − H(ω_l; θ)|²/σ_l², where each σ_l² = Var{Re[Ĥ_ETFE(ω_l)]} + Var{Im[Ĥ_ETFE(ω_l)]} is determined from the Monte Carlo samples. These fits return accurate estimates for the model parameters: over all simulated samples, we obtain θ̂₁ = 1.000 ± 0.005 and θ̂₂ = (0.0 ± 0.8) fs, which are consistent with the true values, θ₁ = 1 and θ₂ = 0, used in the simulation. But the normalized fit residuals, given by r_ETFE(ω_l; θ̂) = [Ĥ_ETFE(ω_l) − H(ω_l; θ̂)]/σ_l, show strong deviations from the standard normal distribution, exhibit structure that could easily be mistaken for real physical features, and yield a goodness-of-fit (GOF) statistic Q_ETFE = Q_ETFE(θ̂) that looks nothing like the χ² distribution that we would normally use to evaluate the fit quality. Also, the uncertainty estimates, σ̂_θ₁ = 0.001 and σ̂_θ₂ = 0.2 fs, obtained by the usual method of inverting the curvature matrix of Q_ETFE(θ) around the minimum of each fit, significantly underestimate the values that are actually observed over the set of Monte Carlo samples, σ_θ₁ = 0.005 and σ_θ₂ = 0.8 fs. In short, while an ETFE fit with Eq. (6) may provide accurate parameter estimates when the underlying model is known in advance, it offers poor discrimination between good models and bad ones, and it yields parameter uncertainty estimates that are unrealistically precise.
An alternative approach is to use a least-squares (LS) procedure that minimizes the sum of the squared differences between the output signal and the transformed input signal [14][15][16], Q_LS(θ) = Σ_k {y(t_k) − [h(θ) ⊛ x](t_k)}². Here, h(t_k; θ) is the model impulse response, equal to the inverse DFT of H(ω; θ), and [h(θ) ⊛ x](t_k) represents the circular convolution of the impulse response with the input vector at the time t_k. The equivalence between the frequency-domain sum and the time-domain sum is assured by Parseval's theorem. The LS procedure avoids the division by x̃(ω) that distorts the noise distribution of the ETFE in the small-signal limit, and the time-domain residuals r_LS(t_k; θ̂) = y(t_k) − [h(θ̂) ⊛ x](t_k) are a sensitive indicator of fit quality. But the uniform weighting in Eq. (8) implicitly assumes that this noise is constant in time, which as we have noted is frequently not the case.
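The LS idea can be sketched as follows: apply the model transfer function to the input via the DFT and minimize the sum of squared output residuals. The pulse shape and noise level are illustrative assumptions, and the delay phase uses NumPy's forward-DFT sign convention (which differs from the paper's exp(−iωt) convention only by the sign of the delay parameter). For each trial delay, the optimal amplitude follows from a one-parameter linear least-squares solve.

```python
import numpy as np

N, T = 256, 50e-15
t = np.arange(N) * T
xc = (t - 2.5e-12) / 0.25e-12
mu = -xc * np.exp(-xc**2)                    # illustrative pulse shape
nu = np.fft.rfftfreq(N, d=T)

rng = np.random.default_rng(2)
x = mu + 1e-4 * rng.standard_normal(N)       # input measurement
y = mu + 1e-4 * rng.standard_normal(N)       # output; true A = 1, eta = 0

def apply_model(x, A, eta):
    """Model output [h(A, eta) ⊛ x](t_k): amplitude A, delay eta, via the DFT."""
    return np.fft.irfft(A * np.exp(-2j * np.pi * nu * eta) * np.fft.rfft(x), n=N)

def ls_cost(A, eta):
    r = y - apply_model(x, A, eta)
    return np.dot(r, r)                      # uniform (unweighted) LS cost

# Brute-force scan over the delay; amplitude solved analytically at each step.
etas = np.linspace(-5e-15, 5e-15, 201)
best = min(
    ((np.dot(y, xe) / np.dot(xe, xe), eta)
     for eta in etas
     for xe in [apply_model(x, 1.0, eta)]),
    key=lambda p: ls_cost(*p),
)
A_hat, eta_hat = best
```

A practical implementation would replace the grid scan with a standard nonlinear least-squares optimizer; the scan keeps the sketch self-contained.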
In principle, we should be able to account for time-dependent noise by assigning appropriate weights to the sum, but this is not as simple as it might seem. The problem is that Eq. (8) treats the input and output noise asymmetrically, which we can see more clearly if we express it explicitly in terms of the time-domain signal and noise sequences, Q_LS(θ) = Σ_k {ψ(t_k) + σ_ψ(t_k)δ(t_k) − [h(θ) ⊛ (μ + σ_μ ε)](t_k)}². This asymmetry results from the implicit assumption that all noise is restricted to the output variable in an LS fit, so that the term [h(θ) ⊛ σ_μ ε](t_k) can be ignored. When input noise is present, as it always is in THz-TDS, the optimal fit weights will depend not just on the measurement noise sequences but also on h(t; θ). Any modification of Eq. (8) with constant weights will generically bias the parameter estimates toward values that minimize the input noise contribution, and will cause the GOF statistic, Q_LS = Q_LS(θ̂), to have a distribution that also depends on h(t; θ). As we discuss below, these problems can all be solved with a fit procedure derived from the maximum-likelihood principle.

Maximum-likelihood estimation of a parameterized transfer function model
The likelihood function is equivalent to the probability density for obtaining the measured data under the assumptions of a given model, and forms the theoretical basis for the standard least-squares fit procedure. To define it in the THz-TDS context, we express the measured input and output signals in vector notation as x = [x(t₀), x(t₁), . . ., x(t_{N−1})]^T and y = [y(t₀), y(t₁), . . ., y(t_{N−1})]^T, respectively, and assume that they represent noisy measurements of bandlimited but otherwise unknown ideal signal vectors μ = [μ(t₀), μ(t₁), . . ., μ(t_{N−1})]^T and ψ = [ψ(t₀), ψ(t₁), . . ., ψ(t_{N−1})]^T with noise covariance matrices Σ_μ and Σ_ψ, respectively. We further assume that μ and ψ satisfy the linear constraint ψ = H(θ)μ, where H is a known matrix function of an unknown P-dimensional parameter vector θ. The likelihood function is then a product of two multivariate Gaussian distributions, where the dependence on θ on the right side of the expression is implicit in the constraint between μ and ψ. The signal vectors μ and ψ appear as arguments in the likelihood function but are otherwise unimportant; the statistical literature aptly describes these as nuisance parameters, which we must eliminate to estimate θ, the parameters of interest [21].
We can use discrete-time Fourier analysis to obtain explicit expressions for H(θ), Σ_μ, and Σ_ψ. The time-domain transfer matrix H(θ) is given by the inverse DFT of H(ω_l; θ), where the frequency index l now extends to negative values, and N is assumed even; for odd N, the sum runs from l = −(N − 1)/2 to (N − 1)/2 and the anomalous term at the Nyquist frequency is absent. Similarly, we can define the discrete-time matrix representation of the time-derivative operator, D, by recognizing that the transfer function for the continuous time-derivative operator is H(ω) = −iω. Note that Eq. (12) is valid for all values of N because Re[−iω_{N/2}] = 0, so the anomalous term in Eq. (11) is zero. Equation (12) allows us to express the noise variance function σ²(t) in Eq. (1) as a matrix function in discrete time, Σ²(μ) = σ_α² diag(μ)² + σ_β² I + σ_τ² diag(Dμ)². The covariance matrices for x and y are then Σ_μ = Σ²(μ) and Σ_ψ = Σ²(ψ). The maximum-likelihood (ML) estimate for the model parameters is given by the vectors θ̂, μ̂, and ψ̂ that maximize the likelihood subject to the constraint ψ = H(θ)μ. Alternatively, we can minimize the negative-log likelihood, also subject to the constraint, where the last term now has the familiar least-squares form. The dependence of the covariance matrices on the signal vectors prevents us from using a standard least-squares optimization algorithm to minimize Eq. (15), but to a very good approximation we can substitute Σ(μ) ≈ Σ(x) and Σ(ψ) ≈ Σ(y). The first three terms are now all constant, so we can focus on minimizing the last term, which we multiply by two to obtain a cost function in what is known as a total least-squares form [22], Q_TLS(θ, μ, ψ) = (x − μ)^T Σ⁻²(x)(x − μ) + (y − ψ)^T Σ⁻²(y)(y − ψ), which is also subject to the constraint ψ = H(θ)μ. Note that the total least-squares cost function treats the input and output variables symmetrically, unlike the conventional least-squares cost function in Eq. (8). Geometrically, it can be interpreted as the sum of the squared distances between each measurement point (x_k, y_k) and its associated model point (μ_k, ψ_k), using the metric tensors Σ⁻²(x) and Σ⁻²(y), respectively, for the input and output variables. As discussed in Sec.
3, the LS cost function in Eq. (8) focuses entirely on the deviations in the output variable. Introducing an N-dimensional vector λ of Lagrange parameters to implement the constraint, we can define a modified cost function Q̃_TLS(θ, μ, ψ, λ), which we may now minimize with respect to λ, μ, ψ, and θ. The parameters of interest are θ; minimizing with respect to the remaining parameters gives equations with closed-form solutions μ̂, ψ̂, and λ̂. Evaluating Q̃_TLS at μ = μ̂, ψ = ψ̂, and λ = λ̂ yields a simplified cost function, Q_TLS(θ), that involves only the parameter vector θ and the data vectors x and y. We can simplify these expressions further by defining the combined covariance matrix W(θ) = Σ²(y) + H(θ)Σ²(x)H^T(θ). Substituting this definition into the solutions reveals that μ̂ is just the weighted average of x and H⁻¹(θ)y, where the matrices H(θ)Σ²(x)H^T(θ) and Σ²(y) are approximately equal to the true (but unknown) covariance matrices Cov[H(θ)x] and Cov(y), respectively. We emphasize here that although ψ = H(θ)μ, the covariance matrices Cov[H(θ)x] and Cov(y) are generally different, since H(θ) transforms the noise in x as well as the signal. For example, if H(ω) = a with a ≠ 1 and the noise is purely additive, we have Cov[H(θ)x] = a²σ_β²I, while Cov(y) = σ_β²I. The simplified cost function then takes the form Q_TLS(θ) = [y − H(θ)x]^T W⁻¹(θ)[y − H(θ)x]. Like the simpler LS cost function in Eq. (8), the TLS cost function in Eq. (28) is expressed in terms of the deviations y − H(θ)x, but is now properly normalized with respect to the associated covariance matrix, Σ²(y) + H(θ)Σ²(x)H^T(θ). Introducing the matrix square root, W^{1/2} (with W = W^{1/2}W^{1/2}), we may define the normalized residual vector as r̄_TLS(θ) = W^{−1/2}(θ)[y − H(θ)x], which allows us to express the TLS cost function in the compact form Q_TLS(θ) = r̄_TLS^T(θ)r̄_TLS(θ). The TLS estimate for the parameter vector, θ̂, may now be determined by minimizing Q_TLS(θ) using a standard least-squares optimization algorithm. Figure 4 shows fit results for the same model and data shown in Fig. 3, but obtained by minimizing Q_TLS(θ) in Eq. (30) instead of Q_ETFE(θ) in Eq.
(6). The statistical properties obtained with Q_TLS(θ) are clearly superior to those with Q_ETFE(θ). The GOF statistic, Q_TLS = Q_TLS(θ̂), approximates a χ² distribution with ν = N − P degrees of freedom, as expected. The normalized residual vector r̄_TLS(θ̂), shown for one of the fits, exhibits a standard normal distribution with no discernible correlations among neighboring time points. The distribution for Q_TLS is more than 7 times narrower than the distribution for Q_ETFE, making it that much more sensitive to a poor fit. Similarly, the lack of structure in r̄_TLS(θ̂) makes it more useful for residual analysis than r_ETFE(ω; θ̂), since structure associated with poor fits may be discerned more readily. Finally, unlike the ETFE fits, the TLS fits yield estimates for the parameter uncertainties that agree with the values observed over the set of Monte Carlo samples, σ_θ₁ = 0.002 and σ_θ₂ = 0.4 fs, which are both about a factor of 2 smaller than the values observed in the ETFE parameter estimates. In summary, when compared with the standard ETFE fit procedure, the TLS fit procedure offers better discrimination between good models and bad ones, more precise values for the parameters, and more accurate estimates of the parameter uncertainties.

Maximum-likelihood estimation of the noise model
We have assumed so far that the noise amplitudes σ_α, σ_β, and σ_τ in Eq. (13) are known, but in general they must also be determined experimentally. One way to do this is to measure the noise at three different points on the THz waveform: as we saw in Fig. 1, the σ_α term dominates near the peak, the σ_τ term dominates where the signal crosses zero with a large slope, and the σ_β term dominates where both the signal and its slope are small. Another approach is to determine the variance as a function of time from a set of repeated waveform measurements [5], then obtain the noise parameters from a fit with Eq. (1). But both of these methods are susceptible to systematic errors from drift, which produces excess signal variability over the timescale of multiple measurements [8,9,23]. In this section, we describe a likelihood method for estimating the noise parameters that accounts for this drift. We consider M repeated measurements of an ideal primary waveform, μ(t), which has an amplitude and a delay that drift on a timescale longer than the waveform acquisition time. We can then associate each measurement with an ideal secondary waveform, μ_m(t) = A_m μ(t − η_m), where A_m is the relative amplitude, η_m is the delay, and m ∈ {0, 1, . . ., M − 1}. We also set A₀ = 1 and η₀ = 0, so that μ₀(t; A₀, η₀) = μ(t). As before, we sample these waveforms at the nominal times t_k = kT, k ∈ {0, 1, . . ., N − 1}, to obtain the ideal primary waveform vector μ = [μ₀, μ₁, . . ., μ_{N−1}]^T and measurement vectors x_m = [x_m(t₀), x_m(t₁), . . ., x_m(t_{N−1})]^T, which we can arrange in an N × M matrix X = [x₀, x₁, . . ., x_{M−1}]. We also write the amplitudes and delays in vector form, A = [A₀, A₁, . . ., A_{M−1}]^T and η = [η₀, η₁, . . ., η_{M−1}]^T, respectively. With these definitions, we can express the sampled secondary waveforms in vector form, μ_m = A_m S(η_m)μ, where the matrix S(η) is defined to impart a delay by η. Using Eq. (11) with the transfer function H(ω; η) = exp(iωη), we can write this matrix explicitly for even N, with changes for odd N as described for Eq. (11). Following Eqs.
(13) and (14), and arranging the secondary waveform vectors in an N × M matrix [μ₀, μ₁, . . ., μ_{M−1}], we also have the covariance matrices Σ_m = Σ²(μ_m; σ_α, σ_β, σ_τ), where D is defined in Eq. (12), the μ_m depend implicitly on A and η, and the dependence of Σ_m on the noise amplitudes is now expressed explicitly. The likelihood function for observing X under these assumptions is a product of multivariate Gaussians in which the noise amplitudes σ_α, σ_β, and σ_τ are the parameters of interest and the signal vectors μ, A, and η are nuisance parameters. As with Eq. (10), it is more convenient computationally to work with the negative-log likelihood. Ignoring the constant first term, multiplying the remaining terms by 2, and expressing the matrix elements explicitly, we can define the ML cost function, Q_ML, which we minimize to obtain ML estimates for all of the free parameters in the model. The resulting estimates for the noise parameters (σ_α, σ_β, and σ_τ) are biased below their true values, which we can attribute to the presence of the nuisance parameters (μ, A, and η) [21]. For example, in 1000 Monte Carlo simulations of M = 10 repeated measurements using the waveforms described in Sec. 2, the ratios of the ML estimates to their true values are σ̂_α/σ_α = 0.95 ± 0.02, σ̂_β/σ_β = 0.94 ± 0.03, and σ̂_τ/σ_τ = 0.89 ± 0.09, all significantly below unity. Increasing the number of observations to M = 50 reduces the bias to σ̂_α/σ_α = 0.990 ± 0.007, σ̂_β/σ_β = 0.98 ± 0.01, and σ̂_τ/σ_τ = 0.93 ± 0.04, but some bias remains in σ̂_β and σ̂_τ even in the limit M → ∞. This residual bias arises because the number of elements in A and η both grow with the number of observations, which allows us to account for drift but dilutes some of the information that the data provide about the noise [21,24]. In principle, we can resolve this problem by integrating out all of the nuisance parameters in Eq.
(35) to obtain a marginal likelihood that depends only on σ_α, σ_β, and σ_τ [21], but this involves computationally expensive integrations for a relatively small benefit. As we discuss below, we can remove much of the bias by simply rescaling the noise parameters by a suitable correction factor.
To determine the bias correction factor it is helpful to consider a simplified example in which there is no drift and only additive noise, so that A_m = 1 and η_m = 0 for all m, and σ_α = σ_τ = 0. The ML cost function then reduces to a simple sum of squares, which is minimized by μ̂_k = (1/M) Σ_m x_m(t_k) and σ̂_β² = (1/N) Σ_k σ̂²(t_k), with σ̂²(t_k) = (1/M) Σ_m [x_m(t_k) − μ̂_k]². This result for σ̂_β² is biased because it is the waveform average of σ̂²(t_k), which in turn is just the biased sample variance of the measurements at t_k. To remove this bias we can apply Bessel's correction, which replaces the factor of 1/M with 1/(M − 1) in Eq. (40). Alternatively, we can multiply σ̂_β² by M/(M − 1) in Eq. (39). Returning to Eq. (37) and following similar reasoning, we can justify rescaling all of the ML noise parameter estimates by the same factor, σ̂* = σ̂ [M/(M − 1)]^{1/2}. With these corrections, the Monte Carlo simulations with M = 10 yield σ̂*_α/σ_α = 1.00 ± 0.02, σ̂*_β/σ_β = 0.99 ± 0.03, and σ̂*_τ/σ_τ = 0.94 ± 0.09, which are all within the statistical uncertainty of the true values. The simulations with M = 50 yield lower statistical uncertainty, but with the same average values: σ̂*_α/σ_α = 1.000 ± 0.007, σ̂*_β/σ_β = 0.99 ± 0.01, and σ̂*_τ/σ_τ = 0.94 ± 0.04. For all practical purposes, the remaining bias in σ̂*_β and σ̂*_τ is negligible.
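The additive-noise special case above can be sketched directly: the ML estimate of σ_β² averages the biased sample variance over the waveform, and multiplying by M/(M − 1) (Bessel's correction) removes the bias on average. The values below are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 10, 256                                 # M repeated waveforms of N points
sigma_beta = 1e-4
mu = np.zeros(N)                               # signal cancels out of the variance
X = mu + sigma_beta * rng.standard_normal((M, N))

mu_hat = X.mean(axis=0)                        # ML estimate of mu at each t_k
s2_ml = np.mean((X - mu_hat)**2)               # biased: E[s2_ml] = (M-1)/M * sigma^2
s2_corrected = s2_ml * M / (M - 1)             # Bessel-corrected, unbiased on average
```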

Experimental verification
In this section we present experimental results that verify our analysis. Figure 5(a) shows the raw data that we use to specify the noise model, X, which comprises M = 50 waveforms, each with N = 256 points sampled every T = 50 fs. With this data as input, we minimize Eq. (37) to obtain the ML estimates σ̂_α, σ̂_β, σ̂_τ, μ̂, Â, and η̂ for all of the free parameters in the model. The resulting time-dependent noise amplitude is corrected for bias using Eq. (41). In Fig. 5(b) we compare this model to the observed time-dependent noise, which we estimate by using Â and η̂ to correct for the drift, then computing the unbiased standard deviation at each time point over the set of drift-corrected waveforms. The model quantitatively describes most features of the measurements, with small deviations near some of the peaks. As a further consistency check, Fig. 5(c) shows the normalized residuals for one of the waveforms, which demonstrates that they approximately follow a standard normal distribution, with small but statistically significant correlations among neighboring points. Figure 6 shows fits to two sequential waveforms from this set. A fit with Eq. (5) in the time domain, obtained by minimizing Q_TLS in Eq. (28), yields normalized residuals that likewise show small but statistically significant correlations. But as we also found with the idealized Monte Carlo simulations, the residuals of the ETFE fit in the frequency domain are much more structured than the residuals of the TLS fit to the same data in the time domain.
To illustrate the analysis problem that this raises, in Fig. 6(c) we compare an ETFE fit with Eq. (5) to an ETFE fit with a more flexible transfer-function model, Eq. (45). Since the two input waveforms are nominally identical, we know that the downturns in Re[Ĥ_ETFE(ω)] and Im[Ĥ_ETFE(ω)] with increasing frequency near 2 THz are spurious. But if we did not know this in advance, we might conclude from Fig. 6(c) that Eq. (45) describes the measurements better than Eq. (5). We would also be able to support this conclusion by comparing the GOF statistic for the fit with Eq. (5), Q⁽¹⁾_ETFE ≈ 223, with ν = 254 degrees of freedom, to that obtained with Eq. (45), Q⁽²⁾_ETFE ≈ 191, with ν = 252 degrees of freedom. By adding only two additional fit parameters, we reduce Q_ETFE by 33, which erroneously suggests that the added complexity of Eq. (45) captures a real physical effect. The TLS method is more robust against such overfitting: the GOF statistic with Eq. (5) is Q⁽¹⁾_TLS ≈ 198, while with Eq. (45), Q⁽²⁾_TLS ≈ 196. In this case, adding two free parameters reduces Q_TLS by only two, so Occam's razor favors the simpler model, Eq. (5). [Fig. 6 caption, continued: (c) Real and imaginary parts of Ĥ_ETFE(ω) − 1 (dots), fitted with Eq. (5) (solid lines) and Eq. (45) (dotted lines) by minimizing Q_ETFE in Eq. (6). (d) Normalized frequency-domain residuals, r_ETFE(ω; θ̂_ETFE), for the fit with Eq. (5).] More formally, to select from a set of transfer-function models H_i(ω; θ_i), i = 1, 2, . . ., N_model, each with P_i free parameters, we can minimize a modified cost function
based on the Akaike information criterion [21], AIC_i(θ_i) = Q⁽ⁱ⁾_TLS(θ_i) + 2P_i, where Q⁽ⁱ⁾_TLS is the TLS GOF statistic for the model H_i(ω; θ_i). As we discussed in Sec. 4, this ability to discriminate among competing models is one of the major advantages of the TLS method.
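The model-selection rule can be sketched with the GOF values quoted above: each model is scored as AIC = Q_TLS + 2P, and the smallest score wins. A drop in Q of only 2 for 2 extra parameters does not pay for the added complexity, so AIC selects the simpler model, as the text concludes.

```python
def aic(q_tls, n_params):
    """Akaike information criterion with the TLS GOF statistic as the deviance."""
    return q_tls + 2 * n_params

# Q_TLS values from the comparison above: 198 for the 2-parameter model,
# 196 for the 4-parameter model (amplitude + delay plus two extra parameters).
models = {
    "simple (P=2)": aic(198.0, 2),     # 202.0
    "flexible (P=4)": aic(196.0, 4),   # 204.0
}
best = min(models, key=models.get)     # "simple (P=2)"
```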
If we divide the 50 experimental waveforms into 25 sequential pairs and fit each pair with Eq. (5), the results are qualitatively consistent with the Monte Carlo simulations and support the conclusion that Q_TLS offers the better absolute measure of fit quality. The distribution of Q_ETFE over the experiments has a mean Q̄_ETFE ≈ 235 and standard deviation s(Q_ETFE) ≈ 113, while the distribution for Q_TLS has a mean Q̄_TLS ≈ 246 and standard deviation s(Q_TLS) ≈ 39. For the simulated distributions shown in Fig. 3(a) and Fig. 4(a), we obtain Q̄_ETFE ≈ 235, s(Q_ETFE) ≈ 160, Q̄_TLS ≈ 250, and s(Q_TLS) ≈ 22. As we discussed at the end of Sec. 4, a narrower GOF distribution provides better sensitivity to fit quality. And while the experimental distribution for Q_TLS is nearly twice as broad as it is in the simulations, it is still nearly a factor of 3 narrower than the experimental distribution for Q_ETFE. Despite the quantitative differences between the simulations and the experiment, the TLS method consistently shows better performance than the ETFE.

Conclusion
In summary, we have developed a maximum-likelihood approach to THz-TDS analysis and demonstrated that it has numerous advantages over existing methods. Starting from a few simple assumptions, we derived a method to determine the transfer function relationship between two THz-TDS measurements, together with a method to specify a parameterized noise model from a set of repeated measurements. In each case, we also derived expressions for fit residuals in the time domain that are properly normalized by the expected noise. Through a combination of Monte Carlo simulations and experimental measurements, we verified that these tools yield results that are accurate, precise, responsive to fit quality, and generally superior to the results of fits to the ETFE.
We focused on simple examples to emphasize the logical structure of the method, but we can readily apply the same approach to more complex problems. For example, we have successfully used the framework presented here to measure a weak frequency dependence in the Drude scattering rate of a metal, which is predicted by Fermi liquid theory; we have also used it to measure small variations in the THz-frequency refractive index of silicon with temperature [17]. In both of these applications we found that maximum-likelihood analysis in the time domain provided better performance than a least-squares analysis based on the ETFE in the frequency domain. We expect similar improvements in other applications, and provide MATLAB functions in Code Repository 1 (Ref. [25]) for others to try.
Funding. JSD acknowledges support from NSERC and CIFAR, and DGS from an NSERC Alexander Graham Bell Canada Graduate Scholarship.

Fig. 1. (a) Simulated time-domain signal (black solid line) and noise amplitude (gray dashed line). (b) Power spectral density (relative to peak) obtained from ten simulated time-domain measurements, with one shown in red and the rest shown in gray.

Fig. 2. Gray dots show the real (a,c) and imaginary (b,d) parts of the empirical transfer function estimate Ĥ_ETFE, Eq. (4), for 250 pairs of simulated time-domain measurements of the waveform shown in Fig. 1(a). One estimate is highlighted in red, with the dots connected by a thin red line. The solid black line shows the average over all simulations. Panels (a,b) show the full bandwidth and (c,d) show the same data over the primary signal bandwidth.

Fig. 3. Measures of fit quality for ETFE fits with Eq. (5), obtained by minimizing Eq. (6). (a) Cumulative distribution of the GOF statistic, Q_ETFE, for the Monte Carlo simulations shown in Fig. 2. The χ² distribution for the same number of degrees of freedom (ν = 254) is shown for comparison. The red × marks a fit with Q_ETFE ≈ 180, which is just above the median value. The normalized residuals for this fit, r_ETFE(ω; θ̂), are shown in (b) as a function of frequency, and in the inset to (a) with a normal probability plot (black dots, which represent both the real and the imaginary parts of r_ETFE). The gray dash-dotted line in the inset represents the standard normal distribution.

Fig. 4. Measures of fit quality for TLS fits with Eq. (5), obtained by minimizing Eq. (30); compare with Fig. 3. (a) Cumulative distribution of the GOF statistic, Q_TLS, for the Monte Carlo simulations shown in Fig. 2. The χ² distribution for the same number of degrees of freedom (ν = 254) is shown for comparison. The red × marks a fit with Q_TLS ≈ 252, which is just above the median value. The normalized residuals for this fit, r̄_TLS(θ̂), are shown in (b) as a function of time, and in the inset to (a) with a normal probability plot (black dots). The gray dash-dotted line in the inset represents the standard normal distribution.