Source noise suppression in attosecond transient absorption spectroscopy by edge-pixel referencing

Attosecond transient absorption spectroscopy (ATAS) is used to observe photoexcited dynamics with outstanding time resolution. The main experimental challenge of this technique is that high-harmonic generation sources show significant instabilities, resulting in sub-par sensitivity when compared to other techniques. This paper proposes edge-pixel referencing as a means to suppress this noise. Two approaches are introduced: the first is deterministic and uses a correlation analysis, while the second relies on singular value decomposition. Each method is demonstrated and quantified on a noisy measurement taken on $\text{WS}_2$ and results in a fivefold increase in sensitivity. The combination of the two methods ensures the fidelity of the procedure and can be implemented during live data collection as well as on existing datasets. The results show that edge-referencing methods bring the sensitivity of ATAS near the detector noise floor. An implementation of the post-processing code is provided to the reader.


I. INTRODUCTION
Transient absorption experiments are now frequently performed in the extreme ultraviolet (XUV) and X-ray regimes, taking advantage of the element specificity and the sensitivity to local structural and electronic environments offered by this radiation. The earliest of these experiments emerged more than 30 years ago [1,2] and used laser-produced plasmas as X-ray sources. Now, experiments are commonly performed on gas, liquid and solid targets, for instance at large-scale facilities such as synchrotrons [3,4] and free-electron lasers [5]. While these sources offer freely tunable and high-flux X-ray radiation, they suffer from a probe bandwidth which is intrinsically narrow (<1 eV) compared to absorption edges. This forces experimentalists to scan the central frequency of the probe pulse at a given time delay, resulting in prohibitively long acquisition times. On the other hand, high-order harmonic generation (HHG) sources provide lower flux but extremely wide bandwidth radiation, routinely covering >30 eV. This enables attosecond transient absorption spectroscopy (ATAS), in which multiple absorption edges are covered in one laser shot and where time resolutions are on the order of attoseconds [6,7].
While ATAS is becoming an increasingly useful technique to study both molecular [8,9] and solid-state processes [10,11], the sensitivity of hitherto published experiments has been limited. The detection limit of these experiments, expressed as a change of optical density, is typically larger than $\Delta\text{OD} = 10^{-3}$, which has to be compared to experiments performed in the visible and mid-infrared spectral ranges, where optical density changes of $\Delta\text{OD} = 10^{-5}$ to $10^{-6}$ can be observed [12,13]. This limitation is mainly due to the high non-linearity of the HHG process, which results in strongly correlated spectral noise.
* romain.geneaux@cea.fr † hugo.marroux@epfl.ch
Recently, two experimental approaches were proposed to tackle this issue. Volkov et al. proposed to use the strong correlation between the fluctuations in the intensity of the driving laser and shifts in the XUV spectrum to correct for correlated noise, which improved their detection limit by a factor of two [14]. Two other works implemented a parallel synchronized measurement of a reference spectrum, which allows for normalization of the probe spectrum. This was performed using either a single spectrometer measuring both transmitted and reference XUV beams [15], or a dual spectrometer geometry [16]. In the latter experiment, sensitivities below 10 mOD were obtained, showing the promise of the approach. However, because this requires additional hardware and reduces the available XUV flux, it is not readily applicable to every experiment.
Here we take a different approach and explore two different post-processing procedures based on edge-referencing [17,18]. The methods use a general noise suppression scheme taking advantage of strong correlations in the probe spectra, which are first characterized on a calibration dataset. We present two complementary methods and demonstrate their efficiency in an attosecond transient reflectivity experiment, measured on a sample of $\text{WS}_2$. The edge-referenced data obtained using either method shows a drastic fivefold increase in sensitivity, which in turn allows for the observation of features previously hidden by spectral fluctuations. Our approach is significant in four respects: (1) it radically filters out the correlated part of the noise without requiring any new hardware, (2) it does not require sacrificing a part of the XUV flux for reference detection, and is therefore applicable to virtually any ATAS experiment, (3) the combination of the two methods ensures that no artefacts are introduced during the edge-referencing and (4) the noise calibration step can be constructed from information contained in existing datasets, which makes it valuable for analyzing already measured data. We provide a code implementing the procedure, which can be readily applied to existing datasets.

II. ATTOSECOND TRANSIENT ABSORPTION AND REFLECTIVITY SPECTROSCOPY

II.1. Formalism
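We consider the most frequent ATAS scheme, in which the sample is excited by a pump field in the visible/near-infrared (NIR) range and probed by a delayed XUV pulse. It is customary to measure the spectrum of the transmitted light twice, with and without the pump beam. From the pump-on $I_1$ and pump-off $I_0$ measurements, the change in optical density is obtained as:

$$\Delta\text{OD}_\text{measured} = -\log_{10}\left(\frac{I_1}{I_0}\right) = -\log_{10}\left(\frac{I^1_\text{XUV}}{I^0_\text{XUV}}\right) + F\,\Delta\text{OD}_\text{signal}, \quad (1)$$

where the measured $\Delta\text{OD}$ was factorized into additive noise (the first term, set by the XUV probe spectra $I^1_\text{XUV}$ and $I^0_\text{XUV}$ of the pump-on and pump-off measurements), multiplicative noise $F$, and the signal of interest $\Delta\text{OD}_\text{signal}$. Optical densities are preferred to transmission because $\Delta\text{OD}_\text{signal}$ scales linearly with concentration or thickness.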
Multiplicative noise is inherently much smaller than the signal itself and will henceforth be neglected [18]. Additive noise disappears provided that $I^1_\text{XUV} = I^0_\text{XUV}$, meaning that the XUV fluctuations between the pump-on/off measurements must be negligible and the measurement infinitely precise. It is therefore customary to measure pump-on and pump-off spectra at the highest frequency possible in order to limit the effect of additive noise. Yet, additive noise is never fully mitigated because the XUV probe has limited stability: it is generated via HHG, whose non-linearity amplifies the driving laser intensity, pointing and mode fluctuations [14,19]. For short driving pulses as used here, carrier-envelope phase noise is an additional source of noise. In the following, two procedures are proposed to suppress this XUV noise so that the measurement becomes limited by the readout noise of the detector, similarly to visible and mid-infrared transient absorption.
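The effect of the additive term can be illustrated numerically. The following is a minimal Python/NumPy sketch on synthetic spectra, assuming a multiplicative factor equal to one and a pixel-independent 1% change of the probe intensity between the two exposures. The function name `delta_od` and all numerical values are hypothetical and chosen for illustration only.

```python
import numpy as np

def delta_od(i_on, i_off):
    """Change in optical density, -log10(I_1 / I_0), evaluated per pixel."""
    return -np.log10(np.asarray(i_on, dtype=float) / np.asarray(i_off, dtype=float))

n_pixels = 200
probe = np.full(n_pixels, 1.5e5)          # hypothetical flat probe spectrum (counts)
true_signal = np.zeros(n_pixels)
true_signal[90:110] = 2e-3                # hypothetical pump-induced dOD_signal

# Pump-off exposure sees the probe as is; pump-on exposure sees a probe that
# is 1% brighter (source fluctuation) and attenuated by the sample response.
source_drift = 1.01
i_off = probe
i_on = probe * source_drift * 10.0 ** (-true_signal)

measured = delta_od(i_on, i_off)
additive_noise = -np.log10(source_drift)  # about -4.3e-3, larger than the signal

# With identical probe spectra the additive term vanishes exactly.
ideal = delta_od(probe * 10.0 ** (-true_signal), probe)
```

In this toy case a 1% source drift produces an offset of about 4 mOD, which already exceeds the assumed 2 mOD signal.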

II.2. Experiment
To demonstrate our method, we use a dataset of attosecond pump-probe measurements obtained in bulk $\text{WS}_2$. Carrier-envelope phase stabilized few-cycle (5 fs) pulses of visible/near-infrared light are used to photoexcite the sample and to generate the broadband XUV radiation which serves as probe. Using krypton as a generating medium, the XUV spectrum is optimized to overlap with the 4f (31.4/33.6 eV) and 5p (36.8 eV) core-levels of tungsten, as shown in Fig. 1(a). The XUV pulse probes the transitions from these core-levels to the valence and conduction band regions. The sample was produced by plasma-enhanced atomic layer deposition of 40 nm thick tungsten oxide on a silicon wafer capped with 250 nm thick $\text{SiO}_2$, and subsequently converting the tungsten oxide to $\text{WS}_2$ by reacting it with $\text{H}_2\text{S}$, with Ar as buffer gas, at 550 °C [20]. As the sample is opaque to the XUV, a reflectivity [21] rather than an absorption experiment was performed. While dynamics are in this case usually reported in terms of reflectivity changes, we will keep using changes of optical density, following Equation 1. Both quantities are proportional to each other in the small signal limit, but optical densities are a more common unit to report experimental sensitivities; we therefore keep using them for the sake of generality.
The XUV probe reflected from the sample is dispersed and measured by a 1340×400 pixel CCD detector (PIXIS 400B, Princeton Instruments) cooled to -40 °C. We begin by determining the noise floor of the experiment, which comprises the detector dark noise (due to thermally generated electrons in the CCD), readout noise (arising when measuring the voltage induced by the electronic charge) and the photon shot noise (associated with the random arrival of photons on the sensor). With the camera settings used (500 ms exposure, 2 MHz readout speed), the average signal is $1.5 \times 10^5$ counts/channel, the readout noise is measured to be 79 counts/channel RMS and the dark noise is < 0.1 counts/pixel, which is negligible. The photon shot noise is $\sqrt{N_\text{photon}}$, with $N_\text{photon}$ the number of detected photons converted from measured counts using the camera quantum efficiency, gain and number of electrons generated as a function of photon energy. These noise contributions are propagated using the optical density equation and shown in Fig. 1(a). We see that readout noise is the major contribution to the noise floor, which lies between 200 and 500 µOD in the tungsten core-level region. Fig. 1(b) shows the raw pump-probe scan, where each time delay was averaged 40 times. The only clear features in the raw data are a narrow negative feature at 35.39 eV and two weaker signals at slightly lower photon energies. The atomic-like narrow linewidth of the signal and the fact that it starts appearing at -10 fs suggest a core-excitonic nature [22-24]. However, $\text{WS}_2$ has a ∼1.35 eV bandgap [25], and as such carriers will be photoinjected into the conduction band by the pump pulse, which has a photon energy of $\hbar\omega_\text{pump}$ = 1.2 to 2.5 eV. We therefore expect transient signals in the valence and conduction band regions [21,26], but here the experimental noise is too large to distinguish these features. In addition, the transient spectrum displays very structured noise, which is typical of data taken with poor HHG stability. In the following section we will use edge-pixel referencing methods to retrieve the hidden information.
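As an order-of-magnitude check of the readout contribution, the quoted camera figures can be propagated through the optical density definition. The sketch below is a simplified first-order estimate, not the authors' procedure: it assumes independent readout noise on one pump-on and one pump-off exposure, a flat spectrum at the average signal level, and it ignores shot noise because the photon conversion factors are not given in the text. The function name is hypothetical.

```python
import numpy as np

def od_noise_from_counts(sigma_counts, mean_counts):
    """First-order RMS noise on dOD = -log10(I_1 / I_0).

    Assumes I_1 and I_0 fluctuate independently with the same RMS
    sigma_counts around the same mean level mean_counts.
    """
    relative_noise = sigma_counts / mean_counts
    # d(log10 I) = dI / (I ln 10); two independent exposures add in quadrature.
    return np.sqrt(2.0) * relative_noise / np.log(10.0)

readout_od = od_noise_from_counts(sigma_counts=79.0, mean_counts=1.5e5)
print(f"readout-limited noise: {readout_od * 1e6:.0f} uOD")
```

Under these assumptions the estimate is roughly 0.3 mOD, which falls inside the 200 to 500 µOD range quoted above. Channels with fewer counts than the average would sit higher.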

III. METHODS

III.1. Edge-pixel referencing
The XUV spectrum used here probes changes across 25 eV at once. This large bandwidth is a unique asset of HHG-based sources, but is actually much larger than the region where pump-induced signals are expected. Even considering excitonic effects and non-linear excitation, the pump pulse will modify the XUV spectrum at most a few eV away from the energies of the core-levels used in the probe step. Hence, a large part of the XUV spectrum does not contain relevant information for the chemical or physical dynamics investigated.
As the spectral noise of the probe pulse is strongly correlated, the intensity fluctuations of the signal-free region contain information on the fluctuations in the spectral region where the signal is located. This means that the regions without signal (hereinafter referred to as edge-pixels regions) can be used to remove fluctuations in the regions with signal (now called the signal-pixels region). This idea was recently used in IR spectroscopy by Robben et al. [18], who adapted the procedure originally developed by Feng et al. [17] for dual detector referencing.

III.2. First approach: correlation matrix
In this approach, a correlation matrix is used to map the XUV fluctuations in the edge-pixels onto the signal-pixels. We denote by $n$ the number of pixels in the spectrum and by $m$ the number of pixels chosen as edge-pixels, with $m < n$. For each pair of pump-on and pump-off spectra, $\Delta\text{OD}_\text{measured}$ is obtained. Then, $\Delta\text{OD}^\text{edge}_\text{measured}$, which is a subset of $\Delta\text{OD}_\text{measured}$, is used to correct the noise in the entire measurement region:

$$\Delta\text{OD}_\text{corrected} = \Delta\text{OD}_\text{measured} - B\,\Delta\text{OD}^\text{edge}_\text{measured}. \quad (2)$$

Here $\Delta\text{OD}_\text{measured}$ and $\Delta\text{OD}^\text{edge}_\text{measured}$ are vectors of dimensions $n$ and $m$ respectively, while $B$ is the ($n \times m$) correlation matrix, which is measurement specific and needs to be constructed for each experiment. The first step is to compute $B$ using a calibration dataset, which is a series of probe spectra acquired while blocking the pump beam. This series of transient spectra is denoted $\Delta\text{OD}_\text{calib}$ and is used to compute the matrix as follows [18]:

$$B = \frac{\langle \Delta\text{OD}_\text{calib}\,(\Delta\text{OD}^\text{edge}_\text{calib})^{T} \rangle}{\langle \Delta\text{OD}^\text{edge}_\text{calib}\,(\Delta\text{OD}^\text{edge}_\text{calib})^{T} \rangle}, \quad (3)$$

where $\langle\cdot\rangle$ denotes the mean over the calibration spectra and the quotient represents the matrix inversion operation. The $B$ matrix can only be computed when the denominator of Eq. 3, which is the covariance matrix of $\Delta\text{OD}^\text{edge}_\text{calib}$, has full rank. Since $\Delta\text{OD}^\text{edge}_\text{calib}$ has size $p \times m$, with $p$ the number of observations, the rank of its covariance matrix is at most $p - 1$. Therefore it only has full rank for $p > m$. In other terms, the calibration dataset needs more observations than there are edge-pixels for Eq. 3 to be used. If this condition is met, computing $B$ is extremely efficient.
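The two steps described above (building the matrix from pump-free calibration spectra, then subtracting the mapped edge-pixel fluctuations) can be sketched in a few lines of Python/NumPy. This is an illustrative sketch on synthetic data, not the implementation of Ref. [27]: the function names, the synthetic noise model and the pixel ranges are all assumptions. Spectra are stored as rows, so the calibration array has shape (p, n).

```python
import numpy as np

def compute_b(dod_calib, edge_idx):
    """Correlation matrix B of shape (n, m) from pump-free calibration dOD spectra."""
    calib = np.asarray(dod_calib, dtype=float)
    calib = calib - calib.mean(axis=0)            # centre each pixel
    edge = calib[:, edge_idx]                     # (p, m)
    p, m = edge.shape
    if p <= m:
        raise ValueError("need more calibration spectra than edge-pixels (p > m)")
    cross = calib.T @ edge / p                    # (n, m) all-pixel vs edge covariance
    cov_edge = edge.T @ edge / p                  # (m, m) edge covariance
    # Solve B @ cov_edge = cross; cov_edge is symmetric, so no explicit inverse.
    return np.linalg.solve(cov_edge, cross.T).T

def edge_reference(dod_measured, b, edge_idx):
    """Subtract the edge-pixel fluctuations mapped through B from each spectrum."""
    dod_measured = np.asarray(dod_measured, dtype=float)
    return dod_measured - dod_measured[..., edge_idx] @ b.T

def synthetic_dod(rng, n_spectra, n_pixels):
    """Hypothetical source noise: two correlated spectral shapes plus weak white noise."""
    x = np.linspace(-1.0, 1.0, n_pixels)
    shapes = np.vstack([np.ones(n_pixels), x])
    amplitudes = rng.normal(0.0, 1e-2, size=(n_spectra, 2))
    white = rng.normal(0.0, 1e-4, size=(n_spectra, n_pixels))
    return amplitudes @ shapes + white

rng = np.random.default_rng(1)
n_pixels = 60
edge_idx = np.r_[0:20, 40:60]                     # m = 40 edge-pixels (assumed)
signal_idx = np.arange(20, 40)                    # pixels to be corrected

b = compute_b(synthetic_dod(rng, 500, n_pixels), edge_idx)   # p = 500 > m
measured = synthetic_dod(rng, 200, n_pixels)                 # independent pump-free test set
corrected = edge_reference(measured, b, edge_idx)

raw_noise = measured[:, signal_idx].std()
residual_noise = corrected[:, signal_idx].std()
```

In this toy model the correlated part of the noise is almost entirely removed from the signal-pixels, and the residual is set by the uncorrelated white noise, which plays the role of the detector floor.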
For experiments with long acquisition times and possibly varying noise structures, this calibration dataset can be updated periodically during the measurement. The transient data of Fig. 1 was taken without such a calibration step, so a dedicated calibration dataset is unavailable. However, the data was collected by alternating between pump-off and pump-on measurements. We can therefore use all the pump-off spectra as our calibration dataset for building the $B$ matrix. This approach will not be as accurate as an experiment where the $B$ matrix is periodically computed, but it has the advantage of being applicable to any dataset at the post-treatment stage. In our case, pairing the pump-off spectra two by two yields a total of $p = 1220$ measurements for the matrix $\Delta\text{OD}_\text{calib}$. The next step is to define the edge- and signal-pixels regions in the spectrum. These ranges are the only parameters that must be chosen in the procedure. Figure 2(a-b) shows the correction obtained using the 766 pixels corresponding to energies ranging from 26 to 32 eV and from 42 to 51 eV as edge-pixels. The improvement in signal quality is drastic, with most of the source noise disappearing and revealing salient transient signals. The sharp features at 33.17 eV, 33.5 eV and 35.38 eV become prominent, which makes it possible to identify their lineshapes and to confirm that they appear slightly before the pump-probe overlap. Broader features are now also resolved above 36 eV, at energies where valence and conduction band signals are expected.
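A calibration set of this kind can be assembled from the pump-off spectra alone. The sketch below is one possible reading of "pairing two by two", assuming consecutive, non-overlapping pairs; the exact pairing used for the data of Fig. 1 is not specified here. The energy axis, the synthetic spectra and both function names are hypothetical.

```python
import numpy as np

def pair_pump_off(pump_off):
    """Turn 2k pump-off spectra (rows) into k pump-free dOD spectra.

    Assumption: spectra 0 and 1 form the first pair, 2 and 3 the second,
    and so on; a trailing unpaired spectrum is dropped.
    """
    pump_off = np.asarray(pump_off, dtype=float)
    n_pairs = pump_off.shape[0] // 2
    first = pump_off[0:2 * n_pairs:2]
    second = pump_off[1:2 * n_pairs:2]
    return -np.log10(second / first)

def edge_mask(energy_ev, ranges=((26.0, 32.0), (42.0, 51.0))):
    """Boolean mask selecting the edge-pixels from inclusive energy ranges in eV."""
    energy_ev = np.asarray(energy_ev, dtype=float)
    mask = np.zeros(energy_ev.shape, dtype=bool)
    for low, high in ranges:
        mask |= (energy_ev >= low) & (energy_ev <= high)
    return mask

rng = np.random.default_rng(0)
energy_ev = np.linspace(26.0, 51.0, 101)          # hypothetical calibrated energy axis
# 2440 synthetic pump-off spectra with 1% uncorrelated intensity noise
pump_off = 1.5e5 * (1.0 + 0.01 * rng.standard_normal((2440, energy_ev.size)))

dod_calib = pair_pump_off(pump_off)               # 1220 calibration spectra
mask = edge_mask(energy_ev)                       # True on edge-pixels
```

The boolean mask can be used directly to index the edge-pixel columns of the calibration and measurement arrays.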
While the observed features are intriguing and might bring insight into the dynamics of $\text{WS}_2$, which has never been studied using transient XUV spectroscopy, their analysis goes beyond the scope of this work and will be the subject of a future publication. Nonetheless, the improvement of the data quality demonstrates the strength of the edge-referencing method. Its performance is now assessed quantitatively as a function of both the size of the calibration dataset $p$ and the number of edge-pixels $m$. The remaining noise level is defined as the root-mean square of the optical densities measured at negative delays ($t < -10$ fs). Figure 2(c-d) illustrates the evolution of the noise when varying the number of edge-pixels (darker blue shaded areas indicate fewer edge-pixels) and restricting the number of calibration $\Delta\text{OD}$ spectra in $\Delta\text{OD}_\text{calib}$. The trend is very similar to the one identified in Ref. [18]: larger edge-pixels regions increase the performance of the method, but require more calibration measurements to reach the lowest noise. Robben et al. suggested that the asymptotic noise limit was attained for $p \gtrsim 10 \times m$, meaning that more calibration measurements might further increase the sensitivity of our experiment. An implementation of the procedure, which can be employed to process most ATAS datasets, is available at [27].
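The noise metric defined above is simple to evaluate. The following Python/NumPy sketch assumes the scan is stored with one row per time delay and that the RMS is taken over all pixels passed in; whether the metric is restricted to a particular spectral window is not stated here, so that choice is left to the caller. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def remaining_noise(dod_scan, delays_fs, cutoff_fs=-10.0):
    """RMS of dOD over all spectra recorded at delays earlier than cutoff_fs."""
    dod_scan = np.asarray(dod_scan, dtype=float)
    before_pump = np.asarray(delays_fs, dtype=float) < cutoff_fs
    return float(np.sqrt(np.mean(np.square(dod_scan[before_pump]))))

# Toy scan: two negative-delay spectra and one spectrum after time zero.
delays_fs = [-50.0, -20.0, 5.0]
dod_scan = [[3e-3, -3e-3],
            [4e-3, 4e-3],
            [1.0, 1.0]]      # large post-pump values must not enter the metric
noise = remaining_noise(dod_scan, delays_fs)
```

Evaluating this quantity while varying the number of calibration spectra and edge-pixels gives the kind of curves described for Figure 2(c-d).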

III.3. Second approach: singular value decomposition
The correlation matrix method is optimal by definition: given the definition of the edge-pixels region, and provided that enough spectra are collected to build the $B$ matrix, it provides the best noise correction [17]. The only assumption is that no pump-induced signal is present in the edge-pixel region. However, should this assumption fail, it is important to stress that the correlation matrix approach can create artefacts. Going back to Equation 2, we see that any unforeseen pump-induced signal appearing in $\Delta\text{OD}^\text{edge}_\text{measured}$ will be multiplied by $B$ and directly transferred to other spectral regions. For XUV and X-ray spectroscopy, the element edge structure restricts signals to defined edge and pre-edge regions, but in the case of systems where multiple edges are covered, knowing where signals will appear is difficult, which could make it troublesome to rely on the $B$ matrix correction alone.
For these reasons, we present a variant of the edge-referencing method which is less efficient at correcting source noise but can be applied in a more controlled manner. Instead of computing $B$, we perform a singular-value decomposition (SVD) of the calibration dataset. This yields the singular vectors, together with their respective singular values, which describe the structure of the noise. The number of significant singular values, i.e. values clearly higher than noise-related ones, helps to identify the dominant noise components. At the correction step, only the components corresponding to the $N$ highest singular values are fitted to the edge-pixels by performing a least-squares minimization. Results from the fit are then subtracted from $\Delta\text{OD}_\text{measured}$ in order to remove noise in the signal region. Thus, the impact of each singular vector of the noise can be separated, and their shapes can be compared with real features. The results are shown in Figure 3 for $N$ varying from 1 to 50.
The spectra obtained using the SVD correction show a reduction of the structured noise as the number of singular values used increases. With $N > 10$, features similar to the ones obtained using the correlation matrix method become more pronounced, confirming that no artefacts were induced by the $B$ matrix correction. In the following section a quantitative analysis of the two correction schemes is discussed.
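The SVD variant can be sketched along the same lines in Python/NumPy. This is a minimal illustration on synthetic data under stated assumptions, not the authors' code: the function names, the two-component noise model and the pixel ranges are hypothetical, and spectra are stored as rows.

```python
import numpy as np

def noise_components(dod_calib, n_components):
    """Leading spectral singular vectors of the calibration set, shape (N, n)."""
    _, singular_values, vt = np.linalg.svd(np.asarray(dod_calib, dtype=float),
                                           full_matrices=False)
    return vt[:n_components], singular_values

def svd_reference(dod_measured, components, edge_idx):
    """Least-squares fit of the components on the edge-pixels, subtracted everywhere."""
    dod_measured = np.asarray(dod_measured, dtype=float)       # (k, n)
    design = components[:, edge_idx].T                         # (m, N)
    coeffs, *_ = np.linalg.lstsq(design, dod_measured[:, edge_idx].T, rcond=None)
    return dod_measured - coeffs.T @ components                # (k, n)

def synthetic_dod(rng, n_spectra, n_pixels):
    """Hypothetical source noise: two correlated spectral shapes plus weak white noise."""
    x = np.linspace(-1.0, 1.0, n_pixels)
    shapes = np.vstack([np.ones(n_pixels), x])
    amplitudes = rng.normal(0.0, 1e-2, size=(n_spectra, 2))
    white = rng.normal(0.0, 1e-4, size=(n_spectra, n_pixels))
    return amplitudes @ shapes + white

rng = np.random.default_rng(2)
n_pixels = 60
edge_idx = np.r_[0:20, 40:60]
signal_idx = np.arange(20, 40)

components, singular_values = noise_components(synthetic_dod(rng, 500, n_pixels), 2)
measured = synthetic_dod(rng, 200, n_pixels)
corrected = svd_reference(measured, components, edge_idx)

raw_noise = measured[:, signal_idx].std()
residual_noise = corrected[:, signal_idx].std()
```

Here two singular values stand far above the rest, which mirrors the way the number of significant components is chosen. Because each component is fitted separately, its shape can be inspected before deciding whether to subtract it.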

IV. DISCUSSION
The performance of the two edge-referencing approaches is illustrated in Figure 4. In a signal-free region (below all W core-levels), we report the fluctuations of optical density for various analysis approaches. We obtain the probability density functions (PDFs), which allow us to compare the noise present in each analysis method. The crudest measurement procedure, used in the early days of ATAS, consisted of measuring the time-dependent transmitted spectrum $I_1$ with the pump on at each time delay, and dividing each time step by the probe spectrum measured at a negative time delay ($\langle I_0 \rangle$). Fig. 4(a) shows that this approach results in a 40 mOD-wide PDF with a bimodal distribution. The noise power spectral density (Fig. 4(c)) shows very clear $1/f$ flicker noise at low sampling frequencies. This noise is removed by modulating the pump using a mechanical shutter to alternate between pump-on and pump-off data at each time delay. This yields an approximately two-fold decrease in the width of the PDF and brings the distribution closer to a normal distribution. The power spectrum shows that noise contributions at frequencies lower than the chopping frequency (2 Hz) are removed. However, the correlation between two different signal-free pixels (Fig. 4(b)) shows that the remaining noise is highly correlated between different probe energies.
When edge-referencing methods are applied, the PDFs further narrow to σ < 6 mOD and the noise spectrum is uniformly reduced by 40 dB. The correlation between pixels is removed which leads to an improvement of the baseline in the transient spectra. The final PDFs for both SVD and correlation matrix methods show a ∼4 to 5 fold width reduction compared to the sole chopping method. These plots represent sensitivities without averaging several scans. Since edge-referencing removed all correlated noise, the width of the PDF is expected to diminish as the square root of the number of averages.
Finally, going back to the experimental transient spectrum of Figure 1, which was averaged 40 times, the performance of edge-referencing is ultimately quantified by studying the negative time delays (where no signal is present). As shown in Fig. 4(d), the noise in the corrected spectra at negative delays is flat and has an average standard deviation of 787 µOD, which is consistent with a $\sqrt{40}$ improvement on the single shot PDF of Fig. 4(a) and is remarkably close to the camera noise floor.
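The square-root scaling invoked here can be checked with a short calculation. The Python/NumPy sketch below back-computes the single-scan width implied by the quoted 787 µOD and verifies the scaling on simulated uncorrelated Gaussian noise. The Gaussian model is an assumption made for illustration, not a statement about the measured distribution.

```python
import numpy as np

sigma_averaged = 787e-6        # quoted standard deviation after averaging
n_scans = 40                   # number of averaged scans

# If the residual noise is uncorrelated between scans, averaging reduces the
# width by sqrt(n_scans), so the implied single-scan width is:
sigma_single = sigma_averaged * np.sqrt(n_scans)
print(f"implied single-scan width: {sigma_single * 1e3:.2f} mOD")

# Numerical check with simulated uncorrelated noise of that width.
rng = np.random.default_rng(0)
single_scans = rng.normal(0.0, sigma_single, size=(n_scans, 100_000))
sigma_simulated = single_scans.mean(axis=0).std()
```

The implied single-scan width is about 5 mOD, in line with the σ < 6 mOD reported for the edge-referenced PDFs. Correlated noise would not average down in this way, which is why removing the correlations first matters.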

V. CONCLUSION
We have presented two general noise suppression schemes for ATAS that can be implemented either during data collection or applied at the post-treatment stage. The two noise reduction schemes operate by correcting for the variations in source intensity in spectral regions where transient signal is located, using the fluctuations observed in the signal-free regions. Using either a correlation matrix or an SVD method, noise originating from fluctuations in the spectrum of the XUV probe was reduced by a factor of five. This brings HHG-based ATAS closer to the detection limit of more established visible or mid-infrared transient absorption experiments.
The correlation matrix scheme (code made available at [27]) is particularly powerful, as it does not depend on any fitting and is therefore completely deterministic. The only two parameters that have to be decided by experimentalists are the spectral regions to be used for the referencing and the number of spectra to collect to accurately calibrate the B matrix. This is a decisive advantage compared to directly applying filters with arbitrary cutoff frequencies on the measured lineshapes.
While the number of spectra to be collected is easily decided by inspecting the improvement of the transient spectrum baseline, the choice of the reference region can be a source of artefacts if a region containing pump-induced signal is selected. In order to check that no artefacts were introduced, a second method based on SVD can be used. If the reference region for the $B$ matrix is properly selected, the two methods show similar performance, although the $B$ matrix method is significantly faster to implement.
By implementing these methods in ATAS, the measurement quality is no longer limited by the instabilities of HHG. With enough calibration datapoints, the detection limit will only be set by the detector. This might provide incentives to develop novel XUV detectors optimized for extremely low readout noise. Not only does this allow us to observe dynamics with greater accuracy, but it also relaxes the experimental constraints on flux and averaging times, which might prove key in the study of fragile solid-state samples or even solvated molecules in the liquid phase [28,29]. These procedures can be employed in other regions of the XUV and X-ray spectrum, and they will be particularly important for new generation sources operating at higher photon energies, whose lower photon fluxes make accumulating statistics more difficult [30-32].
Finally, our approach can be directly applied to other types of broadband X-ray sources. Laser-plasma sources present similar noise characteristics [33], albeit at much higher photon energies. Likewise, recent developments at free-electron laser sources reported spectral bandwidths up to 15 eV wide [34,35]. Edge-referencing should therefore be directly applicable to such ultrashort and high-power X-ray sources and might become crucial in alleviating their inherently large intensity fluctuations.