High-dynamic-range arrival time control for flexible, accurate and precise parametric sub-cycle waveform synthesis

We introduce a simple all-inline variation of a balanced optical cross-correlator (BOC) that allows one to measure the arrival-time difference (ATD) over the full Nyquist bandwidth, with increased common-mode rejection and long-term stability. An FPGA-based signal processing unit performs real-time signal normalization and enables locking to any setpoint with an unprecedented accuracy of 0.07 % within an increased ATD range of more than 400 fs, resulting in attosecond-resolution locking. An out-of-loop measurement verifies the setup precision, with less than 80 as of residual jitter, paving the way for highly demanding applications such as parametric waveform synthesizers. © 2017 Optical Society of America OCIS codes: (320.7160) Ultrafast technology; (140.3425) Laser stabilization; (190.4970) Parametric oscillators and


Introduction
The ability to synchronize multiple events with ever-improving precision has allowed human beings to accomplish increasingly complex tasks. An orchestra conductor, for instance, synchronizes and controls dozens of musicians with a precision of tens of milliseconds [1], optimizing one of the most universally acclaimed forms of art. Nowadays, global navigation satellite systems allow synchronization over the whole surface of the globe with an accuracy of 100 nanoseconds, bringing inestimable benefits to experimental science as much as to everyday life. At the extremely short end of the time scale, mode-locked femtosecond lasers and high-harmonic generation in gases keep pushing the limit of the shortest man-made events, light pulses with durations down to a few tens of attoseconds [2]. Such short pulses have also pushed the requirements on synchronization systems down to time scales that conventional electronics cannot reach.
Moreover, ultrafast time-resolved spectroscopy, which allows studying the dynamics of light-matter interactions with picosecond to attosecond resolution, requires different pulses (with central frequencies ranging from less than a THz to a few EHz) to be synchronized within a fraction of their pulse durations. For instance, in large-scale photon-science facilities such as X-ray free-electron lasers, the need to stabilize the arrival-time difference (ATD) between pulses generated by different sources has driven the development of large-scale timing distribution systems [3,4] capable of providing an absolute reference with sub-femtosecond resolution (corresponding to hundreds of nm in length) over distances of several km [5]. If it is not possible or practical to stabilize the ATD, time-tagging techniques have been demonstrated that measure the ATD with sub-fs resolution over a range of hundreds of fs [6], allowing the data to be sorted according to their ATD in post-processing. A purely data-analytical approach was also recently demonstrated that can extract, from noisy pump-probe data acquired with large timing jitter, the dynamics of complex systems on time scales much shorter than the timing uncertainty [7]. Although such post-processing techniques are extremely valuable in many experimental situations, we believe that real synchronization is still far superior: this is particularly true for dynamics occurring on sub-fs to few-fs time scales, since all snapshots with an ATD outside the time range of interest must be discarded, unnecessarily wasting expensive beam time at large-scale photon-science facilities. Moreover, some experiments might simply be incompatible with simultaneous time-tagging because of experimental details (e.g., the pump or probe pulses are already strongly absorbed in the main experiment, or multi-shot averaging is necessary), and the approach used in [7] still requires a valid pre-characterization of the jitter via some timing technique.
As researchers aim to propel ultrafast light-matter interactions into uncharted territories with pulses of ever-increasing energy and ever-shorter duration, laser systems are growing in complexity. Optical waveform synthesizers based on optical parametric (chirped-pulse) amplification, OP(CP)A, are among the most promising candidates for the next generation of ultrashort, high-energy pulse sources [8][9][10]. To achieve pulse energies in the multi-mJ range (and above) jointly with sub-cycle pulse durations (< 3 fs at 1 µm) and repetition rates in the kHz range, both the pump laser and the OP(CP)A system typically consist of several amplification stages. In parametric waveform synthesizers, the various seed, pump and amplified pulses need to be tightly synchronized with respect to each other, with a required accuracy down to a small fraction of the carrier period.
In 2003, Schibli et al. [11] demonstrated a two-arm balanced optical cross-correlator (BOC) that paved the way for high-precision ATD measurements between different ultrashort laser pulses. Later, an inline, single-crystal implementation of that scheme was designed for timing-distribution systems [12], taking advantage of ultra-low-noise, high-repetition-rate fiber laser oscillators.
Here, we introduce a novel inline BOC scheme, dubbed RAM (relative arrival-time measurement), developed specifically to synchronize high-energy ultrashort pulses at kHz (or lower) repetition rates with high accuracy. The scheme is designed to minimize the influence on the measured ATD of beam-pointing and intensity fluctuations and drifts, which are present in ultrabroadband laser sources such as OP(CP)As. By employing stretched pulses, the scheme even takes advantage of the chirp involved in OP(CP)As rather than suffering from it.
Moreover, the RAM setup extends the high common-mode rejection ratio (CMRR), which is typically found only near the zero-crossing point [13] of the BOC curve, to any point of the RAM curve, allowing for unprecedented precision in the ATD determination over an extended measurement range and thus for more flexible options in waveform synthesis. Last but not least, this scheme could be used to measure two (or more) ATDs between three (or more) pulses in a single RAM device. All these features constitute a first step toward even higher-precision ATD control of the different outputs of a 3-channel parametric waveform synthesizer [14].

RAM method
The basic idea is sketched in Fig. 1. The red and yellow pulses 1 and 2 have to be synchronized in order to achieve a shot-to-shot stable synthesis (or a stable pump-seed temporal overlap in an OP(CP)A). Small portions of the two pulses, obtained from a leakage of the beam combiner (BC), are stretched by a first dispersive element $D_{\mathrm{RAM1}}$. In particular, we choose $D_{\mathrm{RAM1}}$ to stretch the yellow pulse significantly more than the red one. The nonlinear crystal NLC1, tuned to phase-match the sum frequency between the red pulse and the short-wavelength part of the yellow pulse, generates the first RAM signal S1. Afterwards, a second dispersive element $D_{\mathrm{RAM2}}$ (less dispersive than $D_{\mathrm{RAM1}}$) changes the relative delay between the two pulses, and a second nonlinear crystal (NLC2) generates the second RAM signal S2 via sum-frequency generation (SFG) of the red pulse with the long-wavelength part of the yellow pulse. Alternatively, difference-frequency generation (DFG) can be used instead of SFG to obtain the RAM signals. When the ATD between the two pulses changes, either the long-wavelength or the short-wavelength components of the stretched yellow pulse overlap more in time with the red pulse, increasing the corresponding SFG signal and decreasing the competing one.
Fig. 1. (…) An FPGA-based signal processing unit (SPU) computes the normalized difference of the PD1 and PD2 signals (see Eq. (9)) to extract the arrival-time difference (ATD), which can then be used to actively stabilize the ATD by moving a piezoelectric actuator (PZT). The main output of the BC experiences the dispersion $D_{\mathrm{main}}$, after which the two pulses can be temporally overlapped and compressed. To match the group-delay (GD) difference between the main output and the leakage, a birefringent plate (BP) can be used before $D_{\mathrm{RAM1}}$.
When the ATD is scanned linearly, both S1 and S2 exhibit a bell-shaped behavior, with the two bells peaking at different ATD values. The difference between the two signals is then an S-shaped curve, or S-curve, that defines the measurement range. By scanning the ATD with a calibrated delay line, every point of the S-curve can be assigned to a unique ATD value. Similar results can be obtained with a single broadband nonlinear crystal placed after $D_{\mathrm{RAM1}}$, as discussed in detail below; in this case the two signals are derived from the broadband (and chirped) SFG via spectral filtering. The two signals S1 and S2 are then directed to a home-made detector that digitizes them. The digital signals are fed to a home-made signal processing unit (SPU), based on a field-programmable gate array (FPGA), which calculates the ATD and uses it to generate the feedback applied to a piezo-actuated delay line in order to stabilize the ATD.
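The S-curve construction and its calibration can be illustrated with a toy model. The sketch below assumes Gaussian cross-correlation bells with hypothetical peak positions and widths, builds the S-curve as their difference, and inverts its monotonic branch (the part between the two peaks) by linear interpolation, so that each measured signal value maps to a single ATD. The names `bell` and `signal_to_atd` are illustrative and not part of the authors' software.

```python
import numpy as np

# Toy model (assumption): each cross-correlation is a Gaussian "bell" in the
# ATD; the peak positions and widths below are hypothetical values in fs.
def bell(atd_fs, center_fs, width_fs):
    return np.exp(-(((atd_fs - center_fs) / width_fs) ** 2))

# Calibration scan with a calibrated delay line: 1-fs steps from -300 to 300 fs.
atd_scan = np.linspace(-300.0, 300.0, 601)
s1 = bell(atd_scan, -100.0, 150.0)   # S1 peaks at negative ATD
s2 = bell(atd_scan, 100.0, 150.0)    # S2 peaks at positive ATD
s_curve = s2 - s1                    # difference of the two bells: the S-curve

# Between the two peaks the S-curve is strictly increasing, so every signal
# value corresponds to exactly one ATD; this branch is the calibration table.
branch = (atd_scan >= -100.0) & (atd_scan <= 100.0)
cal_signal = s_curve[branch]
cal_atd = atd_scan[branch]

def signal_to_atd(signal):
    """Convert a measured S-curve value back to an ATD (fs) by interpolation."""
    return np.interp(signal, cal_signal, cal_atd)
```

Outside the calibrated branch the mapping is no longer unique, which is why the usable measurement range is bounded by the two bell peaks.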
Compared to the conventional two-arm BOC, the new RAM scheme features three major advantages. First, it is a compact all-inline scheme, which makes it intrinsically more robust against beam-pointing and intensity fluctuations of the laser beam and against misalignment of the optical system due to thermal drifts. Second, the simplicity and compactness of the setup make it suitable for applications that demand multiple ATD measurements. This is the case, for instance, in multi-stage, multi-channel parametric waveform synthesizers [9,10], where both the pump-seed synchronization in the parametric amplifiers and the synchronization of the different channel outputs require ATD measurement and stabilization. Last but not least, the FPGA-based SPU calculates the normalized difference (see Eq. (9)) in real time, allowing locking to any point of an extended range (> 3 times the range without normalization) with constantly high accuracy. This makes it possible to control the delay between the different OP(CP)A channel outputs over a broad range of values, thus enabling a more flexible sculpting of the synthesized waveform E(t).
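Locking to an arbitrary point of the calibrated range can be sketched with a shot-to-shot proportional-integral (PI) loop. The model below is a hypothetical toy: it assumes that the calibrated RAM reading equals the true ATD plus white detection noise, that the piezo delay adds directly to the ATD, and that the free-running ATD drifts as a random walk. The gains and noise levels are illustrative values, not parameters of the authors' FPGA-based SPU.

```python
import numpy as np

def simulate_pi_lock(setpoint_fs, n_shots=5000, kp=0.2, ki=0.5, seed=0):
    """Toy shot-to-shot PI lock of the ATD to an arbitrary setpoint (fs).

    Hypothetical model: RAM reading = true ATD + white detection noise,
    the piezo delay adds directly to the ATD, and the free-running ATD
    drifts as a random walk. Returns the free-running and locked ATD traces.
    """
    rng = np.random.default_rng(seed)
    free_running = np.cumsum(rng.normal(0.0, 0.5, n_shots))  # drift (fs)
    locked = np.empty(n_shots)
    piezo = 0.0      # delay applied by the piezo (fs)
    integral = 0.0   # running sum of errors (integral term)
    for k in range(n_shots):
        locked[k] = free_running[k] + piezo               # ATD of shot k
        reading = locked[k] + rng.normal(0.0, 0.3)        # calibrated RAM reading
        error = setpoint_fs - reading
        integral += error
        piezo = kp * error + ki * integral                # applied to shot k + 1
    return free_running, locked

free_running, locked = simulate_pi_lock(setpoint_fs=50.0)
settled = locked[1000:]   # discard the initial transient
print(f"mean {settled.mean():.2f} fs, rms {settled.std():.2f} fs")
```

With these gains the discrete loop has closed-loop poles at about 0.62 and -0.32, so it settles within a few shots; the setpoint can be any value inside the calibrated range, which is what the normalization makes practical.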
Furthermore, for ultrabroadband pulses, the dispersion that is required in one of the two arms of the conventional BOC in order to flip the ATD between the two pulses can strongly change the temporal profile of the pulses, so that the SFG response to ATD variations differs from one arm to the other. This limits the common-mode rejection ratio (CMRR) away from the balanced point located at the zero crossing. In the RAM setup, on the contrary, the two SFG signals can be tuned independently to different spectral regions of the stretched pulse, which yields two signals with almost identical responses to the ATD and hence a high CMRR over the whole range. Tuning the SFG signals to different spectral regions also allows one to optimize the crossing point of the two cross-correlations, in order to achieve a steep slope near the zero crossing while simultaneously ensuring a large range of the S-curve.
Moreover, in RAM the second SFG signal does not suffer from the depletion of the two laser pulses caused by the first SFG process, since the stretched pulse is used in two separate spectral regions, and the depletion of the short laser pulse is limited by the small amount of energy of the stretched pulse that lies within the phase-matching bandwidth and overlaps in time with it.
Let us consider the synthesis of two broadband femtosecond pulses with carrier frequencies $\omega_1$ and $\omega_2$, with $\omega_1 \neq \omega_2$. In the context of a parametric waveform synthesizer, the two pulses in general will neither overlap in time nor be compressed at the beam combiner (BC), since these conditions must be achieved at the experimental point [9,15], after propagation through a number of additional dispersive elements, such as a vacuum-chamber window and chirped mirrors for final compression, here represented by $D_{\mathrm{main}}$.
If we require perfectly overlapped and compressed pulses (ATD = 0 s) at the experimental point, i.e., after $D_{\mathrm{main}}$, then the group-delay (GD) difference between $\omega_1$ and $\omega_2$ introduced by $D_{\mathrm{RAM1}}$ must be close to the one introduced by $D_{\mathrm{main}}$, i.e.,
$$\mathrm{GD}_{D_{\mathrm{RAM1}}}(\omega_1) - \mathrm{GD}_{D_{\mathrm{RAM1}}}(\omega_2) \approx \mathrm{GD}_{D_{\mathrm{main}}}(\omega_1) - \mathrm{GD}_{D_{\mathrm{main}}}(\omega_2). \tag{1}$$
At the same time, if the two pulses are to be fully recompressed after $D_{\mathrm{main}}$ while one of them is chirped in the RAM setup, we require
$$\mathrm{GDD}_{D_{\mathrm{RAM1}}}(\omega_1) \neq \mathrm{GDD}_{D_{\mathrm{main}}}(\omega_1) \quad \text{and} \quad \mathrm{GDD}_{D_{\mathrm{RAM1}}}(\omega_2) \approx \mathrm{GDD}_{D_{\mathrm{main}}}(\omega_2). \tag{2}$$
Conditions (1) and (2) can generally be fulfilled simply by employing different materials and thicknesses for $D_{\mathrm{RAM1}}$ and $D_{\mathrm{main}}$. If no convenient dispersion arrangement can be found, conditions (1) and (2) can be decoupled by means of a birefringent crystal (or a wedge pair as in [16]), which introduces a significant GD between the pulses with small intra-pulse dispersion. Once $D_{\mathrm{RAM1}}$ is chosen, the RAM setup contains stretched replicas of the two pulses with an ATD similar to that of the main output.
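Condition (1) reduces to simple arithmetic once the group indices are known: a plate of length L adds a group-delay difference L[n_g(ω1) − n_g(ω2)]/c between the two pulses. The sketch below solves Eq. (1) for the thickness of $D_{\mathrm{RAM1}}$. The group-index values in the example are hypothetical placeholders rather than tabulated material data, and the GDD condition (2) still has to be checked separately.

```python
# Speed of light in vacuum expressed in micrometres per femtosecond.
C_UM_PER_FS = 0.299792458

def gd_difference_fs(length_um, ng_w1, ng_w2):
    """GD(omega_1) - GD(omega_2) accumulated in a plate of given length (fs)."""
    return length_um * (ng_w1 - ng_w2) / C_UM_PER_FS

def matching_length_um(target_gd_diff_fs, ng_w1, ng_w2):
    """Plate length whose GD difference equals the target, i.e. Eq. (1)."""
    if ng_w1 == ng_w2:
        raise ValueError("material gives no GD difference between the pulses")
    length = target_gd_diff_fs * C_UM_PER_FS / (ng_w1 - ng_w2)
    if length <= 0:
        raise ValueError("material gives a GD difference of the wrong sign")
    return length

# Hypothetical example: D_main is 3 mm of a glass with placeholder group
# indices 1.48 (omega_1) and 1.50 (omega_2); D_RAM1 uses a second glass with
# placeholder group indices 1.70 and 1.76.
target_fs = gd_difference_fs(3000.0, 1.48, 1.50)
ram1_length_um = matching_length_um(target_fs, 1.70, 1.76)
```

In this example the RAM material is three times more dispersive in group delay, so 1 mm of it matches the 3-mm main-path element.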
Let us next consider two possible implementations of the RAM setup, named I-crystal RAM and II-crystal RAM, and discuss the benefits of each.

RAM method: I-crystal
The I-crystal RAM makes use of a single dispersive element, $D_{\mathrm{RAM}}$, and a single broadband nonlinear crystal (NLC) to perform SFG between the $\omega_1$ and $\omega_2$ pulses (see Fig. 2). The two signals S1 and S2 required to determine the ATD are generated by spectrally filtering the broadband SFG, as described in detail below. We now present a simple model of the experiment in order to derive some guidelines for setting up a proper ATD measurement. For this, we make the following assumptions: (i) the phase-matching condition of the nonlinear crystal does not significantly limit the interacting bandwidths of the two pulses in the SFG process; (ii) the nonlinear crystal introduces no significant dispersion, and there is no group-velocity mismatch between the three pulses (the $\omega_1$ pulse, the $\omega_2$ pulse and the SFG pulse).
As a rule of thumb, both (i) and (ii) are fulfilled for optical pulses in thin BBO crystals: for pulses in the 100-fs range one needs BBO crystals with ≈ 100 µm thickness, while for pulses in the 10-fs range a thickness of ≈ 10 µm is appropriate. Both type-I and type-II phase-matching crystals were successfully tested and can be used, depending on the specific details of the pulses involved.
Fig. 2. Optical scheme of RAM: (a) The left-hand side of each row shows the two pulses after being combined with an ATD = T. In the main arm the pulses are compressed and overlapped in time (ATD = 0) after $D_{\mathrm{main}}$. In the I-crystal RAM setup, $D_{\mathrm{RAM}}$ stretches the $\omega_2$ pulse such that $\tau_{2\mathrm{RAM}} \gg \tau_{1\mathrm{RAM}}$. The two signals S1 and S2 can be obtained by spectrally filtering the broad (and chirped) sum frequency generated by the NLC. In the II-crystal RAM setup, $D_{\mathrm{RAM1}}$ modifies the durations of the two pulses such that $\tau_{2\mathrm{RAM}} \gg \tau_{1\mathrm{RAM}}$; then NLC1, phase-matched for $\omega_{2B} + \omega_1$, generates the first sum frequency S1. $D_{\mathrm{RAM2}}$ slightly delays one pulse with respect to the other without significantly affecting their durations. NLC2 generates the second sum frequency S2 of $\omega_{2A} + \omega_1$. In the top-right corner, the corresponding position on the S-curve is marked in red. (b) Same scheme as in panel (a) when the ATD is reduced by δT: the signal S1 decreases, while the signal S2 increases.
Adopting the standard notation for linearly chirped pulses, the electric field of the $\omega_\alpha$ pulse ($\alpha = 1, 2$) is given by $E_\alpha(t) = A_\alpha(t)\, e^{i(\omega_\alpha t + \dot{\omega}_\alpha t^2/2)}$, where $A_\alpha(t)$ is the temporal field envelope, $\omega_\alpha$ is the carrier frequency of the spectrum, and $\dot{\omega}_\alpha$ is the (linear) chirp parameter. We now consider that the pulse at $\omega_2$ has a broader bandwidth ($\Omega_2 > \Omega_1$) and, due to the chirp, a significantly longer duration ($\tau_{2\mathrm{RAM}} \gg \tau_{1\mathrm{RAM}}$) than the pulse at $\omega_1$, implying that $\dot{\omega}_1 > \dot{\omega}_2$. This is the typical case for a parametric waveform synthesizer, since the outputs of the different OP(CP)A channels usually have different bandwidths. It is also the case for a pump-seed pair of pulses in a broadband OP(CP)A. Under these assumptions, the temporal envelope of the SFG cross-correlation has a duration comparable to $\tau_{2\mathrm{RAM}}$.
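It is now possible to express the electric field of the SFG signal as a function of the ATD T between the two pulses as
$$E_{\mathrm{SFG}}(t;T) \propto E_1(t)\,E_2(t+T) = A_1(t)\,A_2(t+T)\,e^{i[\omega_1 t + \dot{\omega}_1 t^2/2 + \omega_2 (t+T) + \dot{\omega}_2 (t+T)^2/2]} \approx A_1 A_2(T)\,\delta(t)\,e^{i[(\omega_1+\omega_2)t + (\dot{\omega}_1+\dot{\omega}_2)t^2/2 + \dot{\omega}_2 T t + \omega_2 T + \dot{\omega}_2 T^2/2]}, \tag{3}$$
where, in the last step, we have approximated the envelope of the shorter $\omega_1$ pulse as $A_1(t) = A_1\,\delta(t)$, with $\delta(t)$ the Dirac delta function. From this expression we conclude that the instantaneous angular frequency of the SFG pulse is
$$\omega_{\mathrm{SFG}}(t;T) = \omega_1 + \omega_2 + (\dot{\omega}_1 + \dot{\omega}_2)\,t + \dot{\omega}_2 T. \tag{4}$$
Since in this derivation the SFG pulse is comoving with the reference frame of the $\omega_1$ pulse, we obtain its central angular frequency as
$$\bar{\omega}_{\mathrm{SFG}}(T) = \omega_{\mathrm{SFG}}(0;T) = \omega_1 + \omega_2 + \dot{\omega}_2 T, \tag{5}$$
which explicitly shows how the central frequency of the SFG signal is shifted by the ATD T between the two pulses.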
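Eqs. (4) and (5) can be checked numerically by differentiating the phase of the product of two linearly chirped fields. The sketch keeps only the phases (real, positive envelopes do not change them) and uses hypothetical carriers (800 nm and 550 nm) and chirp parameters; at t = 0 the numerical instantaneous frequency should coincide with ω1 + ω2 + ω̇2 T.

```python
import numpy as np

def sfg_instantaneous_frequency(t_fs, T_fs, w1, w2, chirp1, chirp2):
    """Instantaneous angular frequency (rad/fs) of E1(t) * E2(t + T).

    Only the phases of the linearly chirped fields are kept: real, positive
    envelopes do not change the phase, so they are omitted here.
    """
    phase_1 = w1 * t_fs + 0.5 * chirp1 * t_fs**2
    phase_2 = w2 * (t_fs + T_fs) + 0.5 * chirp2 * (t_fs + T_fs) ** 2
    field = np.exp(1j * (phase_1 + phase_2))
    unwrapped = np.unwrap(np.angle(field))     # continuous SFG phase
    return np.gradient(unwrapped, t_fs)        # d(phase)/dt

# Hypothetical parameters: carriers in rad/fs, chirp parameters in rad/fs^2.
w1 = 2 * np.pi * 0.299792458 / 0.800
w2 = 2 * np.pi * 0.299792458 / 0.550
chirp1, chirp2 = 5e-3, 1e-3
t = np.linspace(-50.0, 50.0, 10001)   # 0.01-fs sampling
i0 = int(np.argmin(np.abs(t)))        # sample closest to t = 0

for T in (0.0, 100.0):
    numeric = sfg_instantaneous_frequency(t, T, w1, w2, chirp1, chirp2)[i0]
    analytic = w1 + w2 + chirp2 * T    # Eq. (5)
    print(f"T = {T:5.1f} fs: numeric {numeric:.6f}, Eq. (5) {analytic:.6f} rad/fs")
```

The shift between the two printed values equals ω̇2 T, i.e., only the chirp of the long pulse converts the ATD into a frequency shift of the SFG.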
We can now obtain the two signals S1 and S2 required to determine the ATD by spectrally filtering the broadband, chirped SFG. Imagine cutting the SFG spectrum in halves at its central frequency for T = 0 (i.e., $\omega_1 + \omega_2$): the low-frequency side, named S1, is directed to photodetector PD1, and the high-frequency side, named S2, to PD2. The two photodetectors convert the SFG light into electric charge, integrating it over the whole SFG pulse duration. Hence, for T = 0 we can assume the two signals to be balanced (S1 = S2) and at approximately half of their dynamic range.
We now derive a simple model that predicts the ATD measurement range of the RAM as a function of the spectral bandwidths of the two pulses, $\Omega_1 = \dot{\omega}_1 \tau_{1\mathrm{RAM}}$ and $\Omega_2 = \dot{\omega}_2 \tau_{2\mathrm{RAM}}$, and of their durations in the RAM setup, $\tau_{1\mathrm{RAM}}$ and $\tau_{2\mathrm{RAM}}$. The RAM range is defined as the ATD interval ΔT that brings the signal S1 (or S2) from its maximum value to zero. Consequently, assuming a symmetric behavior, an ATD change of ΔT/2 brings the signal from half of its dynamic range to zero (or to its maximum, depending on the sign). The condition S1 = 0 is reached when the lowest spectral component of the SFG signal equals $\omega_1 + \omega_2$, meaning that all generated components now fall into the S2 signal. Since the SFG pulse lasts about $\tau_{1\mathrm{RAM}}$, Eq. (4) shows that its spectrum spans $(\dot{\omega}_1 + \dot{\omega}_2)\tau_{1\mathrm{RAM}}$ around the central frequency of Eq. (5). Assuming $\dot{\omega}_2 > 0$, the condition can thus be written as
$$\omega_1 + \omega_2 + \dot{\omega}_2 \frac{\Delta T}{2} - (\dot{\omega}_1 + \dot{\omega}_2)\frac{\tau_{1\mathrm{RAM}}}{2} = \omega_1 + \omega_2, \tag{6}$$
which leads to
$$\Delta T = \tau_{1\mathrm{RAM}} + \tau_{2\mathrm{RAM}}\,\frac{\Omega_1}{\Omega_2}. \tag{7}$$
This equation is useful to determine how much chirp is required to achieve a desired measurement range. Nevertheless, one needs to keep in mind its limits of validity, namely $\Omega_2 > \Omega_1$ and $\tau_{2\mathrm{RAM}} \gg \tau_{1\mathrm{RAM}}$. Eq. (7) also applies only to the case of splitting the SFG bandwidth in halves. In practice, S1 and S2 can be obtained with different kinds of spectral filtering. For instance, by blocking the central part of the SFG spectrum and using the left and right wings as S1 and S2, one obtains faster signal variations with respect to the ATD, i.e., a shorter measurement range but a higher time resolution, as shown in the measurement section below.
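Eq. (7) and its inverse can be wrapped in two small helpers for a quick design estimate. This is a sketch that only enforces the stated validity limits; the bandwidth ratio used in the example is illustrative and not taken from the experiment.

```python
def ram_range_fs(tau1_fs, tau2_fs, bw1, bw2):
    """I-crystal RAM measurement range from Eq. (7).

    tau1_fs, tau2_fs: pulse durations in the RAM setup (fs)
    bw1, bw2: chirped bandwidths Omega_1, Omega_2 (any common unit)
    """
    if not (bw2 > bw1 and tau2_fs > tau1_fs):
        raise ValueError("Eq. (7) assumes Omega_2 > Omega_1 and tau_2RAM >> tau_1RAM")
    return tau1_fs + tau2_fs * bw1 / bw2

def required_tau2_fs(target_range_fs, tau1_fs, bw1, bw2):
    """Invert Eq. (7): stretched duration of the omega_2 pulse for a target range."""
    return (target_range_fs - tau1_fs) * bw2 / bw1

# Illustrative numbers: a 160-fs omega_1 pulse, an 800-fs stretched omega_2
# pulse and a hypothetical bandwidth ratio Omega_1 / Omega_2 = 1/4.
print(ram_range_fs(160.0, 800.0, 1.0, 4.0))
print(required_tau2_fs(360.0, 160.0, 1.0, 4.0))
```

The inverse form is the one used in practice: it tells how much the broadband pulse must be stretched (and hence how much glass is needed) for a desired range.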
In general, the I-crystal RAM is better suited to pulses with bell-shaped spectral intensity profiles, since the phase-matching condition is optimal for the central k vectors of the pulses, i.e., $\Delta k = k_1 + k_2 - k_{\mathrm{SFG}} = 0$. On the other hand, ultrabroadband OP(CP)As (such as degenerate or noncollinear OP(CP)As) typically yield M-shaped amplified spectra when the spectral bandwidth is pushed to the extreme; in this case, the II-crystal RAM can come in handy.

RAM method: II-crystal
This variation of the RAM setup is almost identical to the I-crystal one, but after $D_{\mathrm{RAM}}$ (here renamed $D_{\mathrm{RAM1}}$) and the NLC (here NLC1), a second dispersive element $D_{\mathrm{RAM2}}$ adjusts the ATD between the two pulses, and a second nonlinear crystal (NLC2) generates a second, differently tuned SFG. As mentioned before, this scheme is particularly suitable for ultrabroadband pulses generated by OP(CP)As with an M-shaped spectrum. Moreover, this configuration opens up the unprecedented opportunity to measure the relative arrival-time differences between more than two pulses with different spectral contents at once. In this case, the longer, stretched pulse in the RAM setup could be overlapped with two (or more) shorter pulses. The phase-matching conditions of the two crystals would have to permit SFG (or DFG) between the long pulse and each of the two short pulses, producing two signal pairs in two different spectral regions that are easily separable with spectral filters. To this end, our detector already features four independent PDs. This application is of particular interest for parametric waveform synthesizers, since it would give access to all relative timing information between the different spectral channels in a single RAM device, thereby avoiding the systematic errors and drifts that may arise when the different ATDs are measured at different positions of the optical setup with different BOC or RAM devices.
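In the II-crystal RAM, the phase-matching conditions of NLC1 and NLC2 can be tuned to the strongest spectral components of the $\omega_2$ pulse, denoted $\omega_{2A}$ and $\omega_{2B}$, i.e.,
$$\Delta k_{\mathrm{NLC1}} = k(\omega_1) + k(\omega_{2B}) - k(\omega_1 + \omega_{2B}) = 0, \qquad \Delta k_{\mathrm{NLC2}} = k(\omega_1) + k(\omega_{2A}) - k(\omega_1 + \omega_{2A}) = 0.$$
This way it is possible to avoid the problems that would arise from a limited phase-matching bandwidth and to optimize the conversion efficiencies of the two SFG processes.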
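The two signals S1 and S2 can now be obtained by bandpass filtering, with bandwidth $\Omega_3$, around $\omega_1 + \omega_{2A}$ and $\omega_1 + \omega_{2B}$, respectively. To avoid cross-talk between the two signals, it is necessary that $(\omega_1 + \omega_{2B}) - (\omega_1 + \omega_{2A}) > \Omega_3$; since in the chirped $\omega_2$ pulse $\omega_{2B} - \omega_{2A} = \dot{\omega}_2 T_{AB} = \Omega_2 T_{AB}/\tau_{2\mathrm{RAM}}$, this leads to $T_{AB} > \Omega_3 \tau_{2\mathrm{RAM}}/\Omega_2$, where $T_{AB}$ is the ATD between the $\omega_{2B}$ and $\omega_{2A}$ components. This condition is easily fulfilled by choosing $\Omega_3 \ll \Omega_2$. To set up the correct working point of the RAM (at the zero crossing of the S-curve), we need to shift the relative ATD between the $\omega_1$ and $\omega_2$ pulses such that the lowest frequency component of the higher-frequency SFG (the one generated by NLC1, for instance) coincides with $\omega_1 + \omega_{2B}$, while the highest frequency component of the lower-frequency SFG (the one generated by NLC2, for instance) coincides with $\omega_1 + \omega_{2A}$. Such an ATD shift, $T_{\mathrm{shift}}$, can be introduced by the dispersive element $D_{\mathrm{RAM2}}$. The overall shift can be estimated by observing that, according to Eq. (4), the lowest frequency component of the higher-frequency SFG at an ATD T (measured from the delay at which the $\omega_1$ pulse overlaps the $\omega_{2B}$ component) is
$$\omega_{\mathrm{SFG,min}}(T) = \omega_1 + \omega_{2B} + \dot{\omega}_2 T - (\dot{\omega}_1 + \dot{\omega}_2)\frac{\tau_{1\mathrm{RAM}}}{2},$$
which, when set equal to the central frequency $\omega_1 + \omega_{2B}$ of the filter, leads to $T = \frac{1}{2}\left(\tau_{1\mathrm{RAM}} + \tau_{2\mathrm{RAM}}\,\Omega_1/\Omega_2\right)$. The same considerations apply to the highest frequency component of the lower-frequency SFG, with the opposite sign relative to the $\omega_{2A}$ component, so that we conclude
$$T_{\mathrm{shift}} = T_{AB} + \tau_{1\mathrm{RAM}} + \tau_{2\mathrm{RAM}}\,\frac{\Omega_1}{\Omega_2}.$$
We can now estimate the required thicknesses of the two dispersive elements $D_{\mathrm{RAM1}}$ and $D_{\mathrm{RAM2}}$ as
$$L_{D_{\mathrm{RAM1}}} \approx \frac{c\,T_{AB}}{|n_g(\omega_{2B}) - n_g(\omega_{2A})|}, \qquad L_{D_{\mathrm{RAM2}}} \approx \frac{c\,T_{\mathrm{shift}}}{|n_g(\omega_{2B}) - n_g(\omega_1)|}, \tag{8}$$
where c is the speed of light in vacuum, and $n_g(\omega_1)$, $n_g(\omega_{2A})$ and $n_g(\omega_{2B})$ are the group refractive indices of the $\omega_1$ pulse and of the $\omega_{2A}$ and $\omega_{2B}$ components in the $D_{\mathrm{RAM1}}$ or $D_{\mathrm{RAM2}}$ dispersive media.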
Since typically $|n_g(\omega_{2B}) - n_g(\omega_1)| \gg |n_g(\omega_{2B}) - n_g(\omega_{2A})|$, we can expect $L_{D_{\mathrm{RAM2}}}$ to be much smaller than $L_{D_{\mathrm{RAM1}}}$, so that the temporal broadening (or compression) of the pulses due to propagation through $D_{\mathrm{RAM2}}$ can be neglected. Finally, we observe that the measurement range of the II-crystal RAM is $\Delta T = \Omega_3/\dot{\omega}_2$, which shows that a higher time resolution can be achieved by narrow filtering.
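The II-crystal design relations can be collected into a few helper functions for a first estimate. This is a sketch: the shift is computed as T_shift = T_AB + τ1RAM + τ2RAM Ω1/Ω2, following the derivation above, the thicknesses follow Eq. (8), and all bandwidths, durations and group indices in the example are hypothetical placeholders.

```python
C_UM_PER_FS = 0.299792458  # speed of light in vacuum (µm/fs)

def min_separation_fs(omega3, tau2_fs, omega2):
    """Smallest T_AB that avoids cross-talk: Omega_3 * tau_2RAM / Omega_2."""
    return omega3 * tau2_fs / omega2

def shift_fs(t_ab_fs, tau1_fs, tau2_fs, omega1, omega2):
    """ATD shift for D_RAM2: T_AB + tau_1RAM + tau_2RAM * Omega_1 / Omega_2."""
    return t_ab_fs + tau1_fs + tau2_fs * omega1 / omega2

def plate_length_um(delay_fs, ng_a, ng_b):
    """Plate thickness producing the group-delay difference delay_fs, Eq. (8)."""
    return C_UM_PER_FS * delay_fs / abs(ng_a - ng_b)

def ii_crystal_range_fs(omega3, chirp2):
    """Measurement range Delta T = Omega_3 / (chirp parameter of omega_2 pulse)."""
    return omega3 / chirp2

# Hypothetical example values (bandwidths in rad/fs, durations in fs).
omega1, omega2, omega3 = 0.1, 0.4, 0.05
tau1, tau2 = 160.0, 800.0
t_ab = 1.5 * min_separation_fs(omega3, tau2, omega2)   # 50 % margin over the minimum
t_shift = shift_fs(t_ab, tau1, tau2, omega1, omega2)
ng_w1, ng_w2a, ng_w2b = 1.70, 1.79, 1.80               # placeholder group indices
l_ram1 = plate_length_um(t_ab, ng_w2b, ng_w2a)
l_ram2 = plate_length_um(t_shift, ng_w2b, ng_w1)
delta_t = ii_crystal_range_fs(omega3, omega2 / tau2)    # chirp2 = Omega_2 / tau_2RAM
```

Note that with ω̇2 = Ω2/τ2RAM the range Ω3/ω̇2 equals the minimum separation Ω3 τ2RAM/Ω2, so narrowing the filters shortens the range and relaxes the cross-talk condition at the same time.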

RAM method: detection and processing
The optical signals S1 and S2 are converted into the electrical signals A and B by two large-area photodetectors (10 × 10 mm²), which accommodate the whole optical mode in order to prevent errors in the pulse-energy determination due to beam-pointing instabilities. To obtain the ATD between the two pulses, we then take the difference between the two electrical signals.
In the RAM, unlike in previous BOC implementations, the outputs of the photodetectors are not subtracted in the analog domain by an operational amplifier. Instead, the amount of charge in each PD is determined on a single-shot, every-shot basis by a home-made detector comprising a gated integrator (triggered by the laser) and a high-resolution analog-to-digital converter (ADC; effective number of bits, ENOB ≈ 17 bits). The digital signals are available for processing less than 100 µs after the signals S1 and S2 arrive at the PDs.
An FPGA-based signal processing unit (SPU) performs real-time operations on the two PD outputs, such as the generation of a proportional-integral (PI) control signal with a custom finite-impulse-response (FIR) filter, achieving low-latency, high-bandwidth active stabilization with a customized frequency response. More importantly, the SPU can calculate the normalized difference (ND) between the two electrical signals, defined as
$$\mathrm{ND}(T) = \frac{A(T) - B(T)}{A(T) + B(T)}, \tag{9}$$
which improves the CMRR over the whole measurement range, whereas the unnormalized difference $A(T) - B(T)$, commonly used in BOCs, exhibits a good CMRR only at the zero crossing of the S-curve. This feature brings several distinct advantages, as discussed in the measurement section. Additionally, this detector has proven very useful while optimizing the ATD signals, because each cross-correlation can be individually plotted in real time, making it easy to optimize its intensity and shape by tuning the optical components of the RAM setup.
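The common-mode rejection provided by Eq. (9) can be seen in a short numerical check. The sketch assumes pure common-mode noise, i.e., a shot-to-shot energy fluctuation that scales A and B by the same factor, at a hypothetical working point far from the zero crossing (A = 0.8, B = 0.2). In this case the normalized difference cancels the fluctuation, while the plain difference does not; noise that affects A and B differently (such as detector noise) is not cancelled.

```python
import numpy as np

def normalized_difference(a, b):
    """Eq. (9): (A - B) / (A + B), evaluated shot by shot."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)

# Hypothetical working point far from the zero crossing with 5 % rms
# shot-to-shot energy noise that scales both signals by the same factor.
rng = np.random.default_rng(1)
energy = 1.0 + 0.05 * rng.standard_normal(1000)
a, b = 0.8 * energy, 0.2 * energy

plain_noise = np.std(a - b)                              # about 0.6 * 5 %
normalized_noise = np.std(normalized_difference(a, b))   # rounding-error level
print(plain_noise, normalized_noise)
```

At the zero crossing (A ≈ B) the plain difference also rejects this noise, which is why unnormalized BOCs perform well only there.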

ATD measurement using the RAM scheme
We used an exemplary benchmark system (see Fig. 3) to measure the ATD between the pulses from a Ti:sapphire chirped-pulse amplifier (800 nm, 150 fs, 20 mJ, 1 kHz) and the pulses from a white-light-seeded visible noncollinear OPA (VIS-NOPA) [17]. The NOPA output and a portion of the laser fundamental are overlapped with a dichroic mirror and split into two beams by a neutral-density metallic beam splitter; each beam is then directed to an independent RAM setup. The two RAM setups are virtually identical; the second one serves for out-of-loop validation of the achieved locking performance. Each of the two beams in the RAM contains ≈ 300 µW of NOPA power and ≈ 3 mW of laser fundamental. Both pulses of each beam travel through a 20-mm-thick SF10 glass plate, which stretches the VIS-NOPA pulse to ≈ 800 fs, while the 800-nm pulse is stretched only from 150 to 160 fs. These pulse parameters fulfill the assumptions underlying Eq. (7), from which we calculate an expected measurement range of ≈ 400 fs.
The pulses are then loosely focused by a lens (f = 300 mm; beam waist ≈ 100 µm FWHM) and overlapped in a 100-µm-thick type-I beta-barium-borate (BBO) crystal. The I-crystal implementation of the RAM was chosen for its simplicity in this proof-of-principle experiment.
The sum frequency (centered at 345 nm) emerging from the BBO is spectrally dispersed with a grating (600 grooves/mm, blaze wavelength 407 nm, blaze angle 7°, aluminum coated), and the two RAM signals (S1 and S2) are obtained by spatial filtering with razor blades. S1 and S2 are directed to the two photodetectors A1 and B1 with UV-enhanced aluminum pick-up mirrors.
To check the SFG dynamics with respect to the ATD, we measured the spectra of the SFG signal and of the VIS-NOPA after the RAM during a triangular scan of one PZT, as shown in Fig. 4. As expected from Eq. (5), the central frequency of the SFG signal shifts with the ATD between the two pulses. The spectrum of the VIS-NOPA shows a corresponding behavior: a narrow depleted line, caused by the energy transferred to the SFG signal, moves with the ATD.
The II-crystal configuration was tested as well and delivered almost identical results (green curve, plotted for comparison in Fig. 5), with the only differences that less dispersive glass was required (5 mm of SF10 for $D_{\mathrm{RAM1}}$, 1 mm of SF10 for $D_{\mathrm{RAM2}}$) and that two narrowband interference filters (340 nm and 350 nm, both 10 nm FWHM) were used to separate the two signals instead of a grating and a spatial filter.
By applying different spectral filters to the different RAM schemes, it is possible to modify the shape of the S-curve quite significantly and to optimize the setup for measurement range or for temporal resolution, as shown by the comparison of the S-curves in Fig. 5. In general, splitting the SFG spectrum in halves and directing the two halves to the two PDs leads to a longer measurement range, while blocking a wide band in the center of the SFG spectrum and directing the two opposite wings to the two PDs leads to a shorter range but a higher resolution (provided that the available energy is sufficient to fill the dynamic range of the PDs).
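The trade-off between range and resolution can be reproduced with a toy model of the I-crystal RAM. The sketch assumes a Gaussian SFG power spectrum whose center moves according to Eq. (5), with S1 and S2 collected below and above a central blocked band of width `gap`; all parameter values are hypothetical. A wider gap steepens the normalized S-curve at the zero crossing, at the price of less energy reaching the PDs.

```python
import math

def nd_with_gap(T_fs, chirp2, sigma, gap):
    """Normalized difference (S2 - S1) / (S2 + S1) for a toy I-crystal RAM.

    Assumptions (illustrative): the SFG power spectrum is a Gaussian with
    standard deviation sigma (rad/fs) centered at chirp2 * T (Eq. (5),
    frequencies relative to omega_1 + omega_2); S1 collects the spectrum
    below -gap/2 and S2 the spectrum above +gap/2.
    """
    mu = chirp2 * T_fs
    scale = sigma * math.sqrt(2.0)
    s1 = 0.5 * math.erfc((mu + gap / 2) / scale)   # fraction below -gap/2
    s2 = 0.5 * math.erfc((gap / 2 - mu) / scale)   # fraction above +gap/2
    return (s2 - s1) / (s2 + s1)

def slope_at_zero(chirp2, sigma, gap, dT=0.01):
    """Central-difference slope of the S-curve at the zero crossing (1/fs)."""
    upper = nd_with_gap(dT, chirp2, sigma, gap)
    lower = nd_with_gap(-dT, chirp2, sigma, gap)
    return (upper - lower) / (2 * dT)

for gap in (0.0, 0.1, 0.2):
    # Fraction of the SFG energy reaching the PDs at T = 0.
    collected = math.erfc(gap / (2 * math.sqrt(2.0) * 0.1))
    print(gap, slope_at_zero(1e-3, 0.1, gap), collected)
```

Without a gap the model reduces to an error-function S-curve whose slope at zero is √(2/π) ω̇2/σ; increasing the gap raises the slope while the collected fraction drops.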
Once the two signals have been digitized by the ADC, the FPGA-based SPU can perform complex mathematical operations within a few tens of µs (for example, a 1024-point FFT can be performed in less than 100 µs).
With the commonly used plain subtraction of the two PD signals, $A_1(T) - B_1(T)$, a high CMRR can only be achieved if both signals have similar intensities, which restricts the operating range of the BOC to the vicinity of the zero crossing of the S-curve. Our SPU, however, calculates the normalized difference (see Eq. (9)) in real time.
From one dataset obtained during linear ATD scans (n = 326 scans, 1000 data points per scan), the unnormalized S-curve and the normalized S-curve are calculated, and the rms noise of each ATD value from all S-curves is plotted in Fig. 6 for comparison.This yields the residual jitter function for each ATD position in the RAM range.
The benefits of the normalized difference according to Eq. (9) are remarkable. The range of the RAM is increased by a factor of > 3, while the noise level is kept constantly low over the measuring range.
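The per-ATD rms analysis can be sketched as follows, assuming the scans are stored as an array of shape (number of scans, points per scan) sampled at calibrated ATD positions. The mean scan serves as the calibration curve, and its local slope converts signal noise into timing noise; this small-noise linearization is an assumption of the sketch, not necessarily the exact procedure used for Fig. 6. The data below are synthetic.

```python
import numpy as np

def atd_jitter_per_point(scans, atd_axis_fs):
    """RMS ATD noise (fs) at each calibrated scan position.

    scans: (n_scans, n_points) signal values (plain or normalized difference).
    The mean scan acts as the calibration S-curve; signal noise is converted
    to timing noise through its local slope (small-noise assumption).
    """
    scans = np.asarray(scans, dtype=float)
    mean_curve = scans.mean(axis=0)                 # calibration S-curve
    slope = np.gradient(mean_curve, atd_axis_fs)    # signal units per fs
    return scans.std(axis=0) / np.abs(slope)        # rms ATD noise (fs)

# Synthetic stand-in for the measured data set: 326 scans of 1000 points, a
# tanh-shaped S-curve and white detection noise (all values hypothetical).
rng = np.random.default_rng(0)
atd_axis = np.linspace(-300.0, 300.0, 1000)
scans = np.tanh(atd_axis / 100.0) + rng.normal(0.0, 1e-3, size=(326, atd_axis.size))
jitter = atd_jitter_per_point(scans, atd_axis)
```

Where the S-curve flattens, the same signal noise maps onto a much larger timing error; keeping the slope useful over a wider span is what extends the usable range.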
With RAM and the applied normalization, it is possible to determine the ATD and to lock to any desired ATD value within the measurement range with an accuracy as low as 400 as rms at any point of the curve, spanning a range of 425 fs, in agreement with our estimate from Eq. (7). This corresponds to a 1 : 1000 ratio between residual noise and locking range, which is, to our knowledge, the best ratio ever demonstrated for a synchronization method based on balanced detection.
By tuning the RAM to a higher sensitivity (see red curve in Fig. 5), it is possible to obtain a residual jitter as low as 287 as rms, integrated over the full Nyquist bandwidth (see Fig. 7). To compare these numbers with previously published results [18], which covered only a limited bandwidth, we integrated the noise spectrum from DC (1 Hz) to 30 Hz, resulting in 19 as of residual rms jitter, and from DC to 100 Hz, resulting in 56 as. These values compare favorably with the previous state of the art, proving the enhanced accuracy of the RAM scheme.
Fig. 5. Influence of spectral filtering on the S-curve (normalized difference output versus relative arrival time delay [fs]). Blue curve: S-curve from the I-crystal RAM obtained by splitting the SFG spectrum in two with a narrow spectral gap in between. Red curve: same as the blue curve, but for a larger spectral gap. Green curve: S-curve from the II-crystal RAM setup with narrowband interference filters. The measurement range, and thus the temporal resolution, can be tuned simply by applying different spectral filters. All S-curves shown here were obtained by measuring and averaging a few tens of curves.
Fig. 6. ATD rms for the cases of plain subtraction (black) and normalized difference (red) according to Eq. (9). Data extracted from n = 326 S-curve scans (hence a higher residual noise than for an active lock). The normalized difference allows for low detection noise over a much wider measurement range (plot annotation: range 134 fs).
In the context of a parametric waveform synthesizer [9,10], the RAM setup can be employed to stabilize the pump-seed overlap in the different OP(CP)A stages as well as for the actual synthesis from the different channel outputs. The benefits of the normalization are of particular interest since, in order to custom-tailor the synthesized electric field E(t), it is desirable to control the ATD with high and constant accuracy over a wider range [10]. When dealing with shorter pulses [9], we also expect a higher accuracy (and an even lower residual jitter) of the ATD measurement, at the expense of the range. Moreover, the simplicity and compactness of the RAM setup make it particularly suitable for implementation in complex systems that require multiple ATD measurement and control devices. Last but not least, as mentioned in the Methods section, a single RAM device could also be used to synchronize three (or more) synthesizer channels.
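The band-limited jitter values quoted above (DC to 30 Hz, DC to 100 Hz, full Nyquist bandwidth) follow from integrating the jitter spectral density over the chosen band and taking the square root. The sketch below assumes a one-sided power spectral density in as²/Hz sampled on a frequency grid, uses the trapezoidal rule, and also reproduces the two integration directions (from DC and from Nyquist) shown in Fig. 9. It is an illustration under these assumptions, not the authors' analysis code; the flat test spectrum is hypothetical.

```python
import numpy as np


def band_jitter_rms(freqs_hz, psd_as2_per_hz, f_lo_hz, f_hi_hz):
    """RMS jitter (as) from a one-sided PSD (as^2/Hz) integrated over [f_lo, f_hi]."""
    f = np.asarray(freqs_hz, dtype=float)
    s = np.asarray(psd_as2_per_hz, dtype=float)
    keep = (f >= f_lo_hz) & (f <= f_hi_hz)
    f, s = f[keep], s[keep]
    # Trapezoidal rule written out explicitly (independent of NumPy version).
    variance = np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(f))
    return float(np.sqrt(variance))


def cumulative_jitter_rms(freqs_hz, psd_as2_per_hz, from_nyquist=False):
    """Cumulative rms jitter at every grid point, integrated from DC or from Nyquist."""
    f = np.asarray(freqs_hz, dtype=float)
    s = np.asarray(psd_as2_per_hz, dtype=float)
    segments = 0.5 * (s[1:] + s[:-1]) * np.diff(f)
    if from_nyquist:
        # Accumulate from the highest frequency downwards.
        cum = np.concatenate([np.cumsum(segments[::-1])[::-1], [0.0]])
    else:
        # Accumulate from the lowest frequency upwards.
        cum = np.concatenate([[0.0], np.cumsum(segments)])
    return np.sqrt(cum)


freqs = np.arange(0, 501)       # 1 Hz grid up to the 500 Hz Nyquist frequency of a 1 kHz system
psd = np.full(freqs.size, 4.0)  # hypothetical flat PSD of 4 as^2/Hz
print(band_jitter_rms(freqs, psd, 1, 30))  # sqrt(4 * 29), about 10.77 as
```

Integrating from Nyquist downwards, as in Fig. 9, makes it easy to read off how much jitter sits above any chosen frequency.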

ATD stabilization and out-of-loop validation of locking performance
To evaluate the absolute accuracy of the RAM, two identical I-crystal RAM setups are operated simultaneously, one in an in-loop configuration (IL, signals A1/B1) and the other in an out-of-loop configuration (OOL, signals A2/B2), and both record the single-shot ATD of every laser shot. Figure 8 shows the individual signals of the two PDs and the corresponding normalized S-curves during a linear ATD scan, for both the IL and the OOL RAM setups.
Both RAM setups exhibit similar behavior, as is apparent from the virtually identical S-curves in Fig. 8. The normalized difference strongly suppresses the shot-to-shot amplitude noise, which is still clearly present in the individual PD traces.
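Why the normalized difference suppresses shot-to-shot amplitude noise can be seen in a short simulation. The sketch assumes that the normalized difference of Eq. (9) has the common form (S1 − S2)/(S1 + S2), and it models the two photodiode signals as hypothetical Gaussian responses scaled by a common, fluctuating pulse energy. Real detectors also add uncorrelated noise, which this toy model ignores, so the cancellation shown here is idealized.

```python
import numpy as np


def photodiode_signals(atd_fs, amplitude, width_fs=150.0, offset_fs=100.0):
    """Hypothetical RAM photodiode responses S1, S2 versus ATD (Gaussian model)."""
    s1 = amplitude * np.exp(-(((atd_fs - offset_fs) / width_fs) ** 2))
    s2 = amplitude * np.exp(-(((atd_fs + offset_fs) / width_fs) ** 2))
    return s1, s2


def plain_subtraction(s1, s2):
    return s1 - s2


def normalized_difference(s1, s2):
    # Assumed form of the normalization: the common-mode amplitude cancels in the ratio.
    return (s1 - s2) / (s1 + s2)


rng = np.random.default_rng(seed=1)
n_shots = 10_000
atd = np.full(n_shots, 20.0)                        # constant true ATD of 20 fs
energy = 1.0 + 0.05 * rng.standard_normal(n_shots)  # 5 % rms common-mode energy noise
s1, s2 = photodiode_signals(atd, energy)

plain = plain_subtraction(s1, s2)
norm = normalized_difference(s1, s2)
print("relative rms, plain subtraction:", np.std(plain) / abs(np.mean(plain)))
print("relative rms, normalized difference:", np.std(norm) / abs(np.mean(norm)))
```

The plain subtraction inherits the 5 % energy noise one-to-one, while the normalized difference is insensitive to it in this model.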
The ATD measurement and locking results from the IL RAM have been validated by comparison with the OOL measurement. The timing jitter spectral density and the integrated timing jitter from a 100-s long measurement (10^5 laser shots) are shown in Fig. 9.
The IL RAM yields an integrated residual jitter of 387 as rms, while the OOL residual jitter is 429 as rms. The difference between the IL and OOL ATD measurements has a residual noise of < 80 as rms when integrated from 0.3 Hz to the Nyquist frequency. This clearly shows the high level of correlation between the two independent IL and OOL RAM measurements, resulting in an unprecedented precision for ATD determination, i.e., a ratio of rms residual jitter to range lower than 1 : 5000. Any noise component at frequencies below 0.3 Hz can be attributed to thermal drifts of the optical table and optomechanical components, which can be compensated with proper environmental isolation [5]. We believe that the simplicity of the RAM setup is what enables such a high degree of reproducibility. Furthermore, the RAM setup has been operated for more than 2 weeks without requiring any realignment.
Unlike ATD measurements of laser oscillators operating at repetition rates of tens to hundreds of MHz, few-kHz repetition-rate laser amplifiers impose severe demands on the realization of a stabilization system designed to reduce the ATD jitter. In fact, any noise component whose natural frequency is above the Nyquist frequency is down-sampled, producing aliasing [19]. Moreover, the higher energy of amplified pulses compared with oscillator pulses forces one to use large-aperture (thus heavy) mirrors to transport and delay the beams, a circumstance that severely limits the delay-actuator bandwidth. On the other hand, the low-kHz repetition rate makes it feasible to perform single-shot analysis and feedback calculation in an FPGA.
Fig. 10. An in-loop and an out-of-loop RAM setup measure the ATD between two pulses from our benchmark laser system. A gated integrator determines the energy of each RAM signal separately, and an ADC with 24-bit resolution (ENOB ≈ 17 bits) digitizes the signals. The normalized difference, a PI controller and a custom finite-impulse-response (FIR) filter are implemented in an FPGA, which generates a feedback signal to actively stabilize the ATD with piezo actuator PZT2. A monitor attached to the FPGA produces real-time data plots, allowing intuitive system optimization. A microcontroller provides an Ethernet interface between the FPGA and a computer for streaming data and controlling all system parameters. Furthermore, a function generator is implemented to introduce artificial noise on a second piezo actuator, PZT1.
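The aliasing argument can be made concrete: with one measurement per laser shot, a disturbance at frequency f appears in the sampled ATD at the folded frequency, i.e., the distance from f to the nearest multiple of the repetition rate. A minimal sketch for a 1 kHz system; the function name and the 1.2 kHz vibration are hypothetical examples.

```python
import numpy as np


def aliased_frequency(f_hz: float, rep_rate_hz: float = 1000.0) -> float:
    """Apparent frequency (in [0, rep_rate/2]) of a tone sampled once per laser shot."""
    f = f_hz % rep_rate_hz
    return min(f, rep_rate_hz - f)


# A hypothetical 1.2 kHz mechanical vibration sampled once per shot at 1 kHz ...
shots = np.arange(1000)
t = shots / 1000.0
sampled = np.sin(2 * np.pi * 1200.0 * t)
# ... is indistinguishable, shot by shot, from a 200 Hz disturbance.
alias = np.sin(2 * np.pi * aliased_frequency(1200.0) * t)
print(aliased_frequency(1200.0), np.allclose(sampled, alias))  # prints: 200.0 True
```

Such a folded component lands inside the feedback band and cannot be distinguished from genuine low-frequency ATD noise.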
Our FPGA-based signal processing unit (SPU, see Fig. 10), for instance, generates a custom FIR-filtered PI feedback signal ≈ 200 µs after the RAM signal pulses impinge on the photodetectors.
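A per-shot control update of the kind described above (normalized difference → error → FIR filter → PI → piezo command) can be sketched in a few lines. This is a hypothetical software model, not the SPU firmware: the class name, the tap values, the gains and the clamping-based anti-windup are all illustrative assumptions.

```python
from collections import deque


class FirPiController:
    """Hypothetical per-shot FIR-filtered PI controller (illustrative, not the SPU firmware)."""

    def __init__(self, fir_taps, kp, ki, out_min, out_max):
        self.taps = [float(t) for t in fir_taps]
        # Error history, most recent first, pre-filled with zeros.
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))
        self.kp = kp
        self.ki = ki
        self.out_min = out_min
        self.out_max = out_max
        self.integral = 0.0

    def _clamp(self, x):
        return min(max(x, self.out_min), self.out_max)

    def update(self, normalized_diff, setpoint):
        """Process one laser shot and return the new actuator command."""
        error = setpoint - normalized_diff
        self.history.appendleft(error)
        # FIR filter on the error sequence.
        filtered = sum(t * e for t, e in zip(self.taps, self.history))
        # Integrator with simple clamping anti-windup.
        self.integral = self._clamp(self.integral + self.ki * filtered)
        return self._clamp(self.kp * filtered + self.integral)


ctrl = FirPiController(fir_taps=[0.5, 0.5], kp=0.3, ki=0.05, out_min=-10.0, out_max=10.0)
for measured in (0.10, 0.08, 0.05):
    print(ctrl.update(measured, setpoint=0.0))
```

Keeping the whole update to a handful of multiply-accumulate operations per shot is what makes a sub-millisecond latency realistic in hardware.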
Real-time operation with low latency is crucial for achieving the highest possible bandwidth of the active stabilization. The sum of the latencies of the measurement, the feedback computation and the piezo-actuator settling time defines the highest possible feedback bandwidth. Due to causality, a measured error from a random noise source can only be fed back to the next pulse, which limits the maximum feedback bandwidth to half the Nyquist frequency, i.e., 250 Hz for our repetition rate of 1 kHz. This can be understood by considering that the proportional (P) component of a feedback loop can only damp a given noise frequency if the correction lags that frequency by less than 90°. For a P feedback acting on the next pulse, the Nyquist frequency corresponds to a 180° phase shift; in our system the 90° phase shift therefore occurs at 250 Hz, meaning that, even with minimum latency, only frequencies below this value can be damped. The currently used piezo actuator (Physik Instrumente, 753.1CD), with a standard 1-inch mirror attached, can move to a desired position within its 20 µm range in 1.25 ms. Because this exceeds the 1 ms pulse spacing, the feedback takes effect one pulse later, so that the theoretical cut-off frequency in our experiment is 125 Hz.
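The 250 Hz and 125 Hz limits follow from the phase lag of a pure delay: a latency of N pulses at repetition rate f_rep shifts a component at frequency f by 360° · N · f / f_rep, so the 90° criterion used above is reached at f = f_rep / (4N). A small Python check of this arithmetic (the function names are ours):

```python
def delay_phase_lag_deg(f_hz, latency_pulses, rep_rate_hz=1000.0):
    """Phase lag (degrees) that a pure delay of N laser pulses imposes at frequency f."""
    return 360.0 * f_hz * latency_pulses / rep_rate_hz


def p_feedback_limit_hz(latency_pulses, rep_rate_hz=1000.0, max_lag_deg=90.0):
    """Highest frequency whose correction still lags by less than max_lag_deg."""
    return max_lag_deg / 360.0 * rep_rate_hz / latency_pulses


print(delay_phase_lag_deg(500.0, 1))  # 180.0: Nyquist frequency, one-pulse latency
print(p_feedback_limit_hz(1))         # 250.0: correction applied to the next pulse
print(p_feedback_limit_hz(2))         # 125.0: one extra pulse of piezo settling time
```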
To verify the performance of the active stabilization, we used PZT1 to introduce a sinusoidal ATD swept from 0.01 to 200 Hz, with an amplitude of 4.7 fs rms (13.3 fs peak-to-peak), while the feedback loop locked the ATD by acting on PZT2.
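For a sinusoid, the peak-to-peak value is 2√2 times the rms value, which links the 4.7 fs rms and 13.3 fs peak-to-peak figures. The sketch below also generates a swept sinusoidal ATD sampled once per laser shot; the text does not specify the sweep law, so the logarithmic (exponential) sweep, the 600 s duration and the function names are assumptions for illustration.

```python
import numpy as np


def peak_to_peak_from_rms(rms):
    """Peak-to-peak amplitude of a pure sinusoid with the given rms value."""
    return 2.0 * np.sqrt(2.0) * rms


def log_sweep(f_start_hz, f_stop_hz, duration_s, rep_rate_hz=1000.0, amplitude=1.0):
    """Exponential sine sweep from f_start to f_stop, sampled once per laser shot."""
    t = np.arange(int(duration_s * rep_rate_hz)) / rep_rate_hz
    k = np.log(f_stop_hz / f_start_hz)
    # Instantaneous frequency f_start * exp(k * t / duration) reaches f_stop at t = duration.
    phase = 2.0 * np.pi * f_start_hz * duration_s / k * (np.exp(k * t / duration_s) - 1.0)
    return t, amplitude * np.sin(phase)


amp_rms_fs = 4.7
t, atd_fs = log_sweep(0.01, 200.0, duration_s=600.0, amplitude=np.sqrt(2.0) * amp_rms_fs)
print(peak_to_peak_from_rms(amp_rms_fs))  # about 13.29 fs
```

The 200 Hz upper limit stays well below the 500 Hz Nyquist frequency, so the injected disturbance is not itself aliased.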
Figure 11 compares the theoretical damping limit of a PI feedback with our experimental results. The theoretical damping limit was calculated for a feedback system with a proportional gain of 0 dB (P gain = 1) and a latency of two pulses. The feedback reduces noise components in the range from 10 mHz (DC) to 80 Hz (−3 dB corner frequency). The excellent agreement with theory suggests that a piezo actuator with a 30 % shorter settling time would make it possible to lock up to a −3 dB frequency close to 167 Hz. This could be realized, for instance, by means of two different actuators: one with a travel range of a few µm but a shorter settling time (< 800 µs), which would handle the P component of the feedback, and a second actuator with a wider travel range (e.g., a long-range motorized delay stage), which would handle the slow I component.
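One simple way to compute a damping curve of this kind is the sensitivity function of a sampled loop: with a controller C(z) and a pure latency of N shots, a disturbance is transmitted as S(z) = 1 / (1 + C(z) z^(−N)), evaluated at z = exp(i 2π f / f_rep). The sketch below assumes exactly this model with a textbook discrete PI controller C(z) = kp + ki / (1 − z^(−1)); it is not necessarily the model behind the theoretical curve in Fig. 11. Note that in this idealized model kp = 1 with N = 2 places the closed-loop poles on the unit circle (z = ±i, i.e., at f_rep/4), so it is a limiting case rather than a practical setting.

```python
import numpy as np


def sensitivity_db(f_hz, kp=1.0, ki=0.0, latency_pulses=2, rep_rate_hz=1000.0):
    """Disturbance transmission |S| in dB of a sampled PI loop with an N-shot delay.

    Assumed model: S(z) = 1 / (1 + C(z) z^-N), C(z) = kp + ki / (1 - z^-1).
    Negative values mean damping; f_hz must be > 0 whenever ki != 0.
    """
    z = np.exp(1j * 2.0 * np.pi * np.asarray(f_hz, dtype=float) / rep_rate_hz)
    integral_term = ki / (1.0 - 1.0 / z) if ki != 0.0 else 0.0
    controller = kp + integral_term
    s = 1.0 / (1.0 + controller * z ** (-latency_pulses))
    return 20.0 * np.log10(np.abs(s))


freqs = np.logspace(-2, np.log10(200.0), 400)  # 0.01 Hz to 200 Hz, as in the sweep
p_only_db = sensitivity_db(freqs, kp=1.0, ki=0.0, latency_pulses=2)
# In this model, a pure P loop (kp = 1, N = 2) stops damping at f_rep / 6:
print(sensitivity_db(1000.0 / 6.0))  # about 0 dB
```

In this model the pure P loop gives a flat −6 dB at low frequencies, and adding an integral term deepens the low-frequency damping, which is the role the text assigns to the slow I component.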
If the noise sources of the system to be locked are known and at least partly coherent, an active stabilization system with feedforward or predictive algorithms can be implemented, which can achieve a much higher damping ratio and bandwidth.

Conclusions
In conclusion, we have proposed and experimentally demonstrated an ATD measurement and control method that exhibits a reproducibility of < 80 as (integrated from 0.3 Hz to Nyquist) and allows locking 150-fs long pulses with a residual integrated jitter of < 400 as rms over a > 400-fs range. The residual jitter is expected to scale to even lower values when employing shorter pulses [9,10].
The simplicity, compactness and high CMRR of the RAM scheme make it especially suitable for implementation in parametric waveform synthesizers, where several ATD measurement devices are needed. The FPGA-based signal processing and active stabilization unit is a powerful tool that performs all the signal processing required for sophisticated active stabilization in real time and at the full repetition rate of kHz laser systems. Moreover, we have envisaged synchronizing more than two pulses with a single RAM setup, a capability that could allow the final timing control of three (or more) channels of a parametric waveform synthesizer with a single, intrinsically drift-free device.

Funding
We gratefully acknowledge funding from the European Research Council (grant no. 609920) and from the Hamburg Centre for Ultrafast Imaging (Structure, Dynamics and Control of Matter at the Atomic Scale) of the Deutsche Forschungsgemeinschaft (grant EXC 1074).

Fig. 3. Stabilization of the ATD using the I-crystal RAM scheme: a portion of the pulses generated by a Ti:sapphire laser (150 fs, 800 nm, 20 mJ, 1 kHz) is split off and sent to drive a white-light-seeded visible NOPA. The other part is sent to two piezo-actuated mirrors (PZT1 & PZT2) and recombined with the VIS-NOPA output on a dichroic beam combiner (DBC). A leakage from the DBC is sent to the two RAM setups. The first RAM setup is used inside the feedback loop, acting on PZT2 to stabilize the ATD. The out-of-loop RAM is used to verify the achieved accuracy of the measurement. All experimental data are acquired by a computer. To characterize the transfer function of the active stabilization, PZT1 can be driven by an arbitrary waveform.

Fig. 4. (a) Spectral evolution of the VIS-NOPA due to depletion from the SFG, during a triangular ATD scan.(b) Corresponding spectral evolution of the SFG.The shaded areas indicate the spectral regions that were blocked by the filter, while the integral over frequency of each of the bright areas corresponds to the S1 and S2 signals.

[Plot labels. Fig. 6: plain subtraction; normalized difference; range: 425 fs; axes: residual jitter rms [as] vs. arrival time difference ATD [fs]. Fig. 7: jitter spectrum; jitter spectral intensity; integrated jitter [as]; axes: arrival time difference [fs] vs. time [s].]

Fig. 7. (a) Single-shot ATD measurement of an open/closed active stabilization loop over 50 seconds.(b) Zoomed-in portion of the ATD trace with 287 as residual jitter rms.(c) Jitter spectral density and integrated jitter with activated stabilization loop.

Fig. 8. The outputs of the individual detectors of the IL and OOL RAM setups (IL: PD1/PD2 (A1/B1), top, red/blue; OOL: PD3/PD4 (A2/B2), bottom, red/blue, with inverted sign) are plotted versus ATD (left vertical scale, n = 326 scans superimposed). The two corresponding normalized S-curves are plotted in green and violet (right vertical scale; the sign of the violet S-curve has been inverted for the sake of clarity).

Fig. 9. ATD spectral density of the IL RAM (grey) and of the difference between the IL RAM and OOL RAM (dark grey). Integrated ATD rms jitter of the IL and OOL RAM (red, blue) and of the difference between IL and OOL (green). Each jitter spectrum is integrated both starting from DC and starting from the Nyquist frequency (the integration direction is indicated by colored arrows).
[Plot labels, Fig. 11: residual jitter rms (with lock activated); temporal jitter rms [fs] vs. frequency [Hz]; theoretical damping limit for PI; measured damping of lock; damping of lock [dB].]

Fig. 11. Characterization of the frequency response of the RAM setup with and without an active PI lock: a sinusoidal ATD of 13.3 fs peak-to-peak (4.7 fs rms) is introduced with PZT1, while PZT2 is used in the feedback loop to lock the ATD. The frequency-dependent damping of such active locking exhibits a corner frequency of 80 Hz (−3 dB point) when optimized for high feedback bandwidth. The performance of the PI lock is compared with the theoretically achievable performance calculated for a 0 dB proportional component and a latency of two laser pulses (2 ms).
Scheme of the RAM setup: two laser pulses (1 and 2), whose spectra are centered at different wavelengths, are overlapped with the beam combiner BC.A leakage from the BC is directed to the RAM setup.The dispersive element D RAM1 is used to apply a significantly larger chirp on pulse 2 with respect to pulse 1. Signal S1 is generated in the nonlinear crystal NLC1 by SFG or DFG.A second dispersive element D RAM2 is used to adjust the time relation between the two pulses.A second signal S2 is generated in NLC2 at a wavelength different from S1.The two signals are separated spectrally and directed to two photodetectors (PD1 & PD2