A New Beam Synchronous Processing Architecture with a Fixed Frequency Processing Clock. Application to Transient Beam Loading Compensation in the CERN SPS machine

New projects being implemented or planned at CERN, such as the LHC Injectors Upgrade (LIU), the High-Luminosity LHC (HL-LHC) or the Future Circular Collider (FCC), motivate the change of several architectural paradigms in the Low Level RF systems (LLRF). In the upgraded Super Proton Synchrotron (SPS) LLRF, the distribution of the RF, the revolution frequency, settings and synchronization timings will rely on a deterministic link, the White Rabbit (WR), exploiting a distributed topology. The digital electronics (FPGA) will now use a fixed frequency clock extracted locally in each node from this data stream. Algorithms that treat beam-induced perturbations must tune their processing to the instantaneous revolution frequency; we call this Beam Synchronous Processing (BSP), and the paper presents a new solution for it. We use dynamic resampling of uniformly sampled data as the tuning element between a sweeping spectrum and static processing implemented with fixed frequency clocks. The paper applies the novel BSP method to transient beam loading compensation. It presents a One Turn delay FeedBack (OTFB) for the SPS based on a cascade of two variable ratio resamplers.


Architectural change of paradigm
The introduction of this new uTCA standard brings some architectural paradigm changes. These aim to exploit all the features offered by state-of-the-art digital electronics. The classic master-slave architecture [7,8] used in timing/synchronization and RF reference clock/phase distribution is now replaced by a distributed network topology. The reference clock and instantaneous value of the RF frequency are transmitted as a numerical word, using a deterministic network, the White Rabbit (WR) [9], and no longer as point-to-point analog or optical signals. The WR project [10] is a collaboration between several laboratories and universities, including CERN, where many groups are active in its development [11] and applications [12]. The main characteristic of this Ethernet-based network is its full determinism, which enables general purpose data transfer and sub-nanosecond precision. Similar architectures are already in use in the accelerator world, for instance the Brookhaven National Laboratory (BNL) uses its own deterministic protocol called the Update Link [13], and GSI Helmholtzzentrum für Schwerionenforschung (GSI) uses WR [14]. In our architecture, the receiving slave nodes (cavities, injectors, beam instrumentation, kickers and dampers) get the beam information (RF frequency, cavity amplitude and phase, etc.) from this digital network and extract a 125 MHz reference clock signal from the data stream. The reference clock is now decoupled from the RF and is fixed in frequency. It is also possible for any node to distribute to the network, in an absolute time, data acquired locally. All the information in the network is synchronized among all the nodes thanks to the determinism granted by the WR network, and coherent actions can be performed uniformly at machine level. This makes synchronized synthesis of the RF signals for acceleration in distant RF cavities easy [15]. 
The use of these new paradigms and common platform makes it possible to join the efforts within different groups in one laboratory, and in different laboratories, for the development of similar hardware and systems [16,17,18].

Applications
The LLRF algorithms aim at controlling the beam longitudinal and transverse positions, and at mitigating instabilities [19]. New complex schemes can now be supported with the new platform and topology. The beam dynamics are affected by two families of perturbations: The first is caused by the particles interacting with their surroundings (beam loading in the longitudinal plane). The second is due to hardware imperfections unrelated to the beam (RF noise, TX gain and phase droop, etc.). The latter perturbations appear in the spectrum at fixed frequency bands. The beam-induced perturbations (such as transient beam loading) will appear in the spectrum at the revolution frequency and its multiples. In hadron synchrotrons, the speed of the particles changes significantly during the ramp, and so does the RF frequency. In this case, algorithms treating beam-induced perturbations need to tune their processing to the beam energy. We call this Beam Synchronous Processing (BSP).
Until now, CERN LLRF has used the master-slave architecture, clocking the digital electronics with a reference synchronous with the RF [20]. This sweeping clock locked the processing to the beam by design. A classic example of this method is the Transient Beam Loading Compensation by means of the One Turn delay FeedBack algorithm (OTFB) installed in the SPS in the mid-1980s [3]. This algorithm implements a feedback loop filtering the revolution frequency harmonics and an exact one turn delay in the processing path. The variable sampling period resulting from the swept clock automatically matches the frequency response of the filter and the one turn delay to the beam velocity.
With the new fixed frequency clock, a novel solution is required for transient beam loading compensation (and in general for BSP) which reconfigures the processing as a function of the beam energy. This is the object of this paper: a new solution for BSP using a fixed frequency clocking scheme. It presents an architectural solution that automatically matches the processing response to the beam parameters (as the swept clock did in the original solution) and avoids the real-time reconfiguration of the processing elements (the solution employed in other engineering fields for variable frequency responses). The solution is based on a resampling sandwich in which the BSP is encapsulated. This architecture is targeted at FPGA implementation. The new OTFB has been chosen for prototyping, and the solution has been implemented in a Xilinx Kintex-7 FPGA as proof of concept.
The paper is structured as follows: Section 2 presents the state of the art for BSP focusing on transient beam loading compensation. Section 3 presents the new proposed solution. Section 4 describes the implementation of the new OTFB and presents a proof of principle of our proposal. Section 5 presents some hardware results of the proposed architecture, and section 6 comments on the results.

Problem analysis
Beam loading results from the interaction of the beam current with the cavity impedance [21]. The current created by each bunch is its longitudinal profile in the time domain, which can be observed with a wideband pick-up for example. Mathematically the single-bunch signal can be modeled as a train of short longitudinal profiles, spaced by the revolution period, as the bunch crosses the pick-up at each turn. The LLRF systems are nowadays digital signal processors and FPGAs, which require digitization of the analog pick-up signal. When sampling a signal at frequency F [Hz] with a clock at fs [Hz] we obtain a discrete representation of the analog signal in a normalized digital angular frequency $\Omega$ [radian/sample], with $\Omega = 2\pi F / f_s$ (1). Assuming a periodic signal with slowly varying revolution frequency, the normalized frequency changes if the sampling frequency is kept constant. As a consequence, the processing needs to be reconfigured to track the frequency sweep, and to match the bunch passage in the time domain.
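As a small numerical illustration of this point, the sketch below evaluates the normalized frequency 2πF/fs of one revolution harmonic at both ends of a frequency sweep while the sampling clock stays fixed. The sampling rate, harmonic index and revolution frequencies are illustrative assumptions chosen to resemble the SPS values quoted later in the paper; the function name is hypothetical.

```python
import math


def normalized_frequency(f_hz: float, fs_hz: float) -> float:
    """Normalized digital angular frequency in radian/sample."""
    return 2.0 * math.pi * f_hz / fs_hz


FS = 62.5e6              # fixed sampling clock [Hz] (assumed for illustration)
HARMONIC = 116           # revolution harmonic under study (assumed)
F_REV_START = 43.342e3   # revolution frequency at start of ramp [Hz] (assumed)
F_REV_END = 43.364e3     # revolution frequency at end of ramp [Hz] (assumed)

w_start = normalized_frequency(HARMONIC * F_REV_START, FS)
w_end = normalized_frequency(HARMONIC * F_REV_END, FS)

# With a fixed clock the same beam line moves in the digital spectrum,
# so a filter designed for w_start no longer sits on the line at w_end.
print(f"start: {w_start:.6f} rad/sample, end: {w_end:.6f} rad/sample")
print(f"shift: {w_end - w_start:.3e} rad/sample")
```

With a swept clock, fs would scale with the revolution frequency and both values would coincide, which is the behavior the rest of the paper reproduces by resampling.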
We call Beam Synchronous Processing a processing that adapts the system response to the variations of the processed signal during the energy ramp. These processing techniques are generally used in feedback systems, such as transverse dampers [22] or beam loading compensation [20].

Current solutions
The accelerator world employs several solutions to cope with transient beam loading effects. In the early digital systems introduced in the mid-1980s the sampling clock was swept proportionally to the revolution frequency [20]. This locks the processing on the spectral content of the beam signal, and the processing algorithm need not be changed during the acceleration. This swept clock philosophy has been extensively used in the accelerator world, at CERN and other labs [23]. However, this easy solution is not optimal. When the processing is not related to the beam energy ramp, for example to compensate an amplifier frequency response, the processing should not change with beam energy. In that case Beam Asynchronous Processing (BAP) would be preferred. If the system uses a swept clock, this requires complicated implementations with limitations [24]. Furthermore, modern FPGAs are intended for use with a fixed clock, as swept clocks pose problems in the FPGA clocking logic and PLLs, and also limit the use of the FPGA serial interfaces. The old swept clock scheme would therefore limit the exploitation of the new CERN distributed LLRF architecture and of the state of the art uTCA based processing systems.
Alternative approaches have been implemented. In small machines (high revolution frequency and small number of bunches), where the spectral content of the beam signal is limited to a small number of revolution frequency lines, a common solution is to decompose the problem into different processing systems, one per revolution line, where a fixed sampling clock can be used. This requires multiple demodulators (one per revolution harmonic) for baseband down-conversion and several processing systems [25]. The amount of resources grows linearly with the number of revolution lines to be treated. This solution removes the constraint of the swept clock, as it can be implemented with a fixed frequency clock. However, the resources required when extending the regulation bandwidth limit its applicability. Examples of this strategy are found at CERN [26] and J-PARC [27], but it is not applicable to larger machines where many revolution lines are to be covered.
A generic solution is therefore desirable, one which makes use of a fixed clock and can be extended to smaller or larger machines with different regulation bandwidths. This paper presents a novel LLRF architecture with a new solution for efficient implementation of BSP. It demonstrates this solution for transient beam loading compensation. The solution is generic and portable to different machines with different sizes and parameters, and can be extended to other applications requiring the synchronous processing of beam signals, such as longitudinal and transverse dampers.
3 Proposed LLRF distributed topology. Beam synchronous processing application to transient beam loading compensation

New LLRF distributed architecture
This section presents the architecture of the new SPS LLRF system. It is based on the exploitation of the White Rabbit link.

Network architecture: Clocks, synchronization and data distribution
The proposed solution for BSP relies on new distributed data-broadcasting and synchronization architectures. These architectures use a digital deterministic protocol for communication and information distribution. Such protocols ensure known propagation delays between nodes by calibrating the propagation time of the different paths at initialization of the system, which eases synchronization schemes. Thanks to this, it is possible to ensure time precision and accuracy among all the nodes of the network. By exploiting this timing determinism, the information and reference signals can arrive at all the nodes at the same absolute time instant [15]. Similar approaches are implemented at GSI [14] and BNL [28]; however, they use a separate dedicated link for transmission of the fixed clock. In our solution, the fixed frequency system clock is extracted from the WR data stream. To avoid phase uncertainties resulting from the clock recovery circuitry, our system uses a "Start cycle" pulse that all stations receive in sync. This pulse is used to reset the clock managers and the phase accumulator of the Numerically Controlled Oscillators (NCO) so that the RF reconstructed at each station remains in sync. Efforts are ongoing to reduce the jitter of the recovered clock. The latest prototypes achieve 5 ps jitter, which corresponds to 0.36° at an RF frequency of 200 MHz [12,29]. Thanks to the synchronized absolute time reference among the nodes and the fixed frequency clock (with a frequency common to all nodes), any signal or phase advance can be reproduced and computed locally at the same instant in the entire system. For RF regeneration applications or RF related processing, one node distributes the Frequency Tuning Word (FTW, the instantaneous RF frequency), which is then used synchronously by the entire network. Each node can then locally regenerate any analog signal with an NCO, for instance the RF or the Local Oscillators (LO) used in RF front-ends.
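The two mechanisms described above can be sketched numerically. The first function converts a timing jitter into an RF phase error, reproducing the 5 ps to 0.36° conversion at 200 MHz. The second part is a toy model of two nodes whose NCO phase accumulators are reset by a common start pulse and then fed the same FTW stream. The 32-bit accumulator width, the 10 MHz output frequency and all names are assumptions made for illustration only, not details of the SPS hardware.

```python
ACC_BITS = 32      # phase accumulator width (assumed)
F_CLK = 125e6      # fixed clock recovered from the link [Hz]


def jitter_to_phase_deg(jitter_s: float, f_rf_hz: float) -> float:
    """Phase error in degrees caused by a timing error at a given RF frequency."""
    return 360.0 * f_rf_hz * jitter_s


def frequency_tuning_word(f_out_hz: float) -> int:
    """Phase increment per clock cycle for the requested output frequency."""
    return round(f_out_hz / F_CLK * (1 << ACC_BITS))


class Nco:
    """Minimal phase accumulator; the sine lookup stage is omitted."""

    def __init__(self, initial_phase: int = 0) -> None:
        self.mask = (1 << ACC_BITS) - 1
        self.phase = initial_phase & self.mask

    def reset(self) -> None:
        """Action taken on the common start pulse."""
        self.phase = 0

    def step(self, ftw: int) -> int:
        self.phase = (self.phase + ftw) & self.mask
        return self.phase


phase_error_deg = jitter_to_phase_deg(5e-12, 200e6)
print(f"5 ps at 200 MHz -> {phase_error_deg:.2f} deg")

# Two nodes power up with unrelated accumulator contents...
node_a = Nco(initial_phase=123456)
node_b = Nco(initial_phase=987654321)
# ...both are reset on the same pulse and then receive the same FTW stream.
for node in (node_a, node_b):
    node.reset()
ftw_stream = [frequency_tuning_word(10e6 + 25.0 * k) for k in range(1000)]
phases_a = [node_a.step(ftw) for ftw in ftw_stream]
phases_b = [node_b.step(ftw) for ftw in ftw_stream]
print("nodes in phase:", phases_a == phases_b)
```

Because every node runs at the same clock frequency and starts from the same accumulator state, identical FTW words yield identical phases regardless of where the node sits in the network.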
Data processing synchronized to the beam is also possible. The new CERN SPS LLRF presented in [12] is based on this philosophy.

Node architecture: Platform, interfaces, and processing
The nodes of the LLRF network host a hardware processing platform within an FPGA, and application specific front-ends. The network interface is a common port receiving information and synchronization from the WR network. If the node also produces information to be broadcast to the network or transmitted to another specific node, the network interface is bidirectional. The hardware in this interface is responsible for the implementation of the deterministic network protocol and the recovery of a 125 MHz clock. A simplified schematic is depicted in Fig. 1.

Resampling as solution to avoid processing reconfiguration
With the fixed frequency clock there is still the need for a solution that avoids reconfiguring the processing elements (filtering for instance) to follow the beam revolution frequency (and the spectrum of the sampled signal). Real time reconfiguration in complex processing schemes or algorithms can require many parameters to change as a function of the beam frequency. To avoid this burden in the processing algorithms, we propose to resample the digital signal while keeping the sampling clock frequency fixed. Resampling is a discrete operation transforming a sequence x[n] acquired at a given sampling rate fs into a new sequence y[m], which approximates the acquisition of the original time-continuous signal at another rate f's. By exploiting resampling on the acquired data, the spectral content can be mapped to the frequency response for which the processing algorithm is defined, still using a fixed frequency clock for the sampling and processing. The resamplers used in BSP must support a real time variable resampling ratio derived from the beam revolution frequency (FTW).

Processing regions: Beam Synchronous and Asynchronous Processing
LLRF applications usually require BSP and BAP algorithms on a single platform. The presented work addresses these needs, offering the flexibility to implement both types of processing inside the FPGA. The objective is to integrate as many functionalities as possible in a single platform, such as compensation for the amplifier or pick-up response (BAP), in which no beam information is needed, or BSP algorithms such as the transient beam loading compensation presented previously. In our solution, the processing platform running with a fixed clock is a global BAP region, while an island hosting the BSP region is defined between resamplers. The synchronous processing is conducted in this dedicated resampler-sandwich region of the platform. Two resamplers are the key elements interfacing this region. The input resampler converts the fixed sampling rate, at which the data arrives into the BSP region, to a new sampling period proportional to the beam revolution period (as the swept clock does in the old system). At the output port, a second resampler brings the signal back to the original fixed rate. The resampling ratios of the two resamplers are reciprocal and vary dynamically during the acceleration ramp. These FPGA regions are depicted in Fig. 2, together with the network and application specific interfaces introduced above.

Real time FPGA resampler
The resampler is the element relating the BAP and BSP regions in the FPGA. A new solution has been developed and implemented in an FPGA [30], where the resampling ratio can take any real value and a Variable Fractional Delay (VFD) filter [31] is used for interpolation. Such an architecture is suited for variable resampling ratios that can be modified online. The principle relies on filtering the input samples with an all-pass filter to produce output samples that are estimates of the original signal sampled at the desired output rate. The VFD uses the FTW information received from the WR to calculate the time position of the desired sample. It then estimates the value of the signal delayed by a fraction of a sampling clock period. Refer to [30] for more details.
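The principle can be illustrated with a deliberately simplified sketch: for each output sample the resampler computes its time position in units of input samples, splits it into an integer index and a fractional delay, and interpolates. Plain linear interpolation is swapped in here as a stand-in for the fractional-delay filter of [30], so the accuracy is far below that of the real design; the function name, the test tone and the ratio value are assumptions.

```python
import math
from typing import Callable, List


def resample(x: List[float], ratio: Callable[[int], float]) -> List[float]:
    """Resample x with a ratio R = f_out / f_in that may change per output sample.

    ratio(m) returns the ratio to apply after output sample m.
    Linear interpolation stands in for a proper fractional-delay filter.
    """
    y: List[float] = []
    t = 0.0                      # output sample position, in input samples
    m = 0
    last = len(x) - 1
    while t <= last:
        n = min(int(t), last - 1)    # integer part: index of the left neighbor
        mu = t - n                   # fractional delay between 0 and 1
        y.append((1.0 - mu) * x[n] + mu * x[n + 1])
        t += 1.0 / ratio(m)          # advance by one output sampling period
        m += 1
    return y


F0 = 0.01        # test tone, cycles per input sample (assumed)
R = 1.0005       # resampling ratio (assumed, similar in size to an SPS ramp)
x = [math.sin(2.0 * math.pi * F0 * n) for n in range(2000)]

up = resample(x, lambda m: R)            # input resampler
back = resample(up, lambda m: 1.0 / R)   # reciprocal output resampler

max_up_error = max(
    abs(v - math.sin(2.0 * math.pi * F0 * (m / R))) for m, v in enumerate(up)
)
max_roundtrip_error = max(abs(b - a) for a, b in zip(x, back))
print(f"input resampler error: {max_up_error:.2e}")
print(f"round trip error:      {max_roundtrip_error:.2e}")
```

Passing the ratio as a callable mirrors the requirement that the ratio follow the FTW in real time: a ramp would simply return a different value at each call. The second call with the reciprocal ratio shows the sandwich returning the data to the original sampling grid.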

RF processing
The LLRF control algorithms perform the processing in baseband. In the lower radio-frequency range the RF signals can be sampled directly. Above a few hundred MHz, demodulation is used first, preferably in I/Q coordinates where beam loading is an issue: A narrowband RF signal is demodulated with a Local Oscillator (LO), resulting in two components (in phase and in quadrature) whose bandwidth is much smaller than the Nyquist rate of the digital processing. An output stage with a complementary modulator brings the signal back from baseband to the required RF frequency. In our case, a varying LO frequency is required to bring the sweeping RF to baseband. This LO is reconstructed locally in the processing nodes with the clock and information extracted from the deterministic link (FTW). Demodulation can be implemented by means of an analog RF front-end (the acquired signal is mixed with a real analog swept LO), or digitally by means of a digital mixer (CORDIC) driven from a Numerically Controlled Oscillator (NCO). We use both solutions in the SPS 200 MHz system.

Cavity controller
A cavity controller is responsible for regulating the cavity field [12]. It probes the cavity voltage using one or several antenna(s) that couple(s) to the accelerating field, processes the signal and generates the drive sent to the amplifier. This node receives the FTW, the voltage set point and other machine data from the beam controller via the WR link (Fig. 1). The problem is well suited for implementation in the proposed platform, where both BAP and BSP are supported. Thanks to the distributed philosophy, the cavities can be located far apart around the machine and controlled by different cavity controllers. Although this is not relevant to the SPS (with all cavities in the same machine straight section), it can be very attractive for the Future Circular Collider (FCC) with a one-hundred kilometer circumference and cavities in two opposite locations [32].

One Turn FeedBack
The OTFB is a feedback around the cavity-amplifier in which the loop delay has been intentionally extended to one exact turn, and the gain is limited to narrow frequency bands around the revolution harmonics [3]. It reduces the beam loading, including the transients caused by the gaps in beam current, thereby equalizing the bunch parameters (length) and increasing the longitudinal coupled-bunch instability threshold. First introduced in the early 1980s for the SPS [20] it has since been installed on many machines, sometimes as a complement to a Direct RF feedback [4,33]. Two main elements are present in the processing performed in an OTFB: A filter matching the revolution frequency harmonics, and a delay element to properly match the corrective action in the cavity RF field with the next passage of the same beam portion.
The one-turn delay is implemented in the BAP region of the processing platform. Any delay can be efficiently synthesized with a fixed clock using the FTW information. The filter is implemented in the BSP region of the processing architecture. It tracks the beam energy ramping which results in a change in position and spacing of the revolution harmonics in the voltage perturbation caused by the beam. Thanks to the resampler sandwich, the BSP region automatically tunes the filter response to the beam. Fig. 3 depicts the different elements of the system.

One turn delay matching
The algorithm must implement an exact one-turn delay between measurement and corrective action. The total latency of the LLRF-Amplifier-Cavity system consists of two parts: A fixed delay (introduced by the cabling, amplifiers, LLRF fixed latency) and a second variable component synthesized by the LLRF to match the variable revolution frequency. In the original system this was easily done using a FIFO memory of depth M, clocked with the swept clock (harmonic of the revolution frequency). A swept clock multiple of the revolution frequency makes the delay M a constant integer. The fixed-depth FIFO implemented the variable delay, and the fixed delay was compensated by inserting a delay between the LO used in the demodulator and modulator mixers [3].
In the fixed clock solution presented here this is no longer applicable. A dual port memory is used, but it is now clocked with a fixed frequency signal, in the BAP region. The one turn delay is achieved by dynamically updating the offset between the read and write pointers according to the revolution period (taking the known fixed delay into account). The revolution frequency information is received in the node via the WR. To compute this offset, the FPGA implements an algorithm that divides the variable delay currently needed by the FPGA processing clock period. The resulting value contains an integer part and a fractional part. The integer part is synthesized with the memory, achieving a time resolution of one clock cycle (maximum error of half a clock cycle, i.e., 0.5 · 8 ns = 4 ns at 125 MHz). The fractional part is synthesized with a VFD filter placed behind the memory, with an architecture similar to the one used in the resampler [30]. This VFD filters the corrective signal and modifies its value, recreating the fractional delay contribution, which takes a value between -0.5 and 0.5 clock cycles.
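A minimal sketch of this delay computation follows, assuming the variable delay is simply the revolution period minus the known fixed delay, and that the integer part is obtained by rounding to the nearest clock cycle so that the remainder lies between -0.5 and 0.5 cycle. The 5 µs fixed delay and the function names are hypothetical; the 8 ns clock period and the harmonic number 4620 come from the paper.

```python
import math
from typing import Tuple

CLOCK_PERIOD_S = 8e-9      # 125 MHz processing clock
HARMONIC_NUMBER = 4620     # RF buckets per turn in the SPS
FIXED_DELAY_S = 5e-6       # cables, amplifier, LLRF latency (hypothetical value)


def split_delay(variable_delay_s: float,
                clock_period_s: float = CLOCK_PERIOD_S) -> Tuple[int, float]:
    """Split a delay into whole clock cycles and a remainder in [-0.5, 0.5)."""
    cycles = variable_delay_s / clock_period_s
    whole = math.floor(cycles + 0.5)   # pointer offset for the dual port memory
    frac = cycles - whole              # left to the fractional-delay filter
    return whole, frac


def pointer_offset(f_rf_hz: float) -> Tuple[int, float]:
    """Delay setting for a given RF frequency, as derived from the FTW."""
    t_rev = HARMONIC_NUMBER / f_rf_hz
    return split_delay(t_rev - FIXED_DELAY_S)


for f_rf in (200.242e6, 200.342e6):
    whole, frac = pointer_offset(f_rf)
    print(f"f_RF = {f_rf / 1e6:.3f} MHz: {whole} cycles, {frac:+.3f} cycle")
```

Rounding to the nearest cycle, rather than truncating, is what keeps the residual handled by the fractional-delay filter centered on zero.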

Comb filter sandwich
The frequency response of the filter in the OTFB is a comb, with large gain on the revolution frequency harmonics (Fig. 4). The z-transform of its simplest form is shown in Eq. (2). In the old system, the implementation was very easy, with the ratio of sampling clock to revolution frequency being the integer number N [3].
The parameter a in Eq. (2) governs the bandwidth of the filter around each revolution frequency harmonic, and G is the gain of the filter on the resonances. The filter has zero phase shift on the revolution frequency harmonics. Fig. 4 depicts the magnitude of the filter frequency response for a unit gain. In the proposed architecture the filter is implemented in the BSP region after resampling to a clock f's that is a multiple of the revolution frequency. The delay N is again an integer number of clock cycles. The data stream to the filter has a variable sampling rate, as in the swept clock solution. The system reproduces the old behavior with a fixed filter (N does not need to be updated), but the implementation uses a fixed clock.
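For reference, a first-order comb of the following form has all the properties listed above: gain G and zero phase shift on the resonances, a bandwidth governed by a, and a delay of N samples. It is written here as an assumption consistent with those properties, not as a transcription of Eq. (2).

```latex
H(z) = G\,\frac{1-a}{1-a\,z^{-N}}, \qquad
H\big|_{z^{-N}=1} = G, \qquad
H\big|_{z^{-N}=-1} = G\,\frac{1-a}{1+a}
```

The condition $z^{-N}=1$ is met at every multiple of the sampling rate divided by N, that is, on the revolution harmonics when the sampling rate is N times the revolution frequency; $z^{-N}=-1$ corresponds to the valleys halfway between two harmonics.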

Resampling ratio computation
In this fixed clock implementation with a resampler, the parameter N must be chosen in conjunction with the resampling parameters. The resampler modifies the sampling period of the input signal x[n] to synthesize an output signal y[m]. The resampling ratio R is defined as the ratio of output to input sampling frequency: $R = f'_s / f_s$. The resampling tunes the discrete representation of the acquired signal x[n] to the filter response defined in the resampled domain. There, the filter in Eq. (2) has its resonances at the multiples of $f'_s / N$, so matching them to the revolution harmonics requires $f'_s = N f_{rev}$ and therefore $R = N f_{rev} / f_s$ (7). N must be chosen depending on the filtering bandwidth, that is, it must be at least twice the number of revolution harmonics to be filtered (see section 4). fs is the fixed input sampling clock. As the acceleration proceeds, the revolution frequency increases and so does the resampling ratio. At the output of the BSP region, a second resampler performs downsampling. The downsampling ratio is the inverse of the upsampling ratio.
The chain used to validate the proposal consists of the following parts: The cavity field, measured with a series of antennas, enters an RF front-end, is down-converted to baseband and is passed to the LLRF. There the signal is compared to a set-point value and the resulting error is filtered and delayed (OTFB) so that the resulting correction is applied in the cavity at the next bunch passage. When the system behaves as expected, in closed loop, the field in the cavity shows rejection of the beam-induced voltage, and the total voltage approximates the set-point. The OTFB is implemented with a fixed frequency processing clock (the 125 MHz clock regenerated from the White Rabbit data stream) by means of the newly presented BSP architecture based on resampling [30].

Testbench architecture
The testbench is presented in Fig. 5. The blocks directly synthesizing the BSP region and the OTFB delay are implemented in Xilinx System Generator hardware primitives. The remaining elements used for validation of the OTFB have been modeled in Simulink. In the development of the testbench, all the accelerator plant blocks have been validated with MATLAB simulations first. Complex elements such as the cavity, amplifier and filters have been studied and modeled as single blocks and later integrated into the system level model. The transfer functions of these elements have been validated and the results have shown that the models behave as expected. The BSP region has also first been modeled in MATLAB, to assess the functional behavior of the blocks. Once the behavior of all the elements was understood, Simulink was used for architectural exploration and verification. The blocks and the signal processing architecture were then migrated and verified in Xilinx System Generator. Finally, the BSP architecture has been integrated into the system level simulation testbench reproducing the transient beam loading compensation with the new OTFB. The system level simulations are presented here.

SPS parameters
The model of the accelerator plant mimics and emulates the new CERN 200 MHz SPS system [1]. The regulation must cover a 5 MHz single-sided bandwidth around the center frequency of the cavity, covering around 116 revolution frequency lines [12]. The simulation can be programmed to reproduce a given accelerating cycle. The central frequency of the cavity is set to 200.242 MHz and the maximum and minimum values of the RF within a cycle are parameters of the simulation.
In the simulation presented below, the RF value ramps from 200.242 MHz to 200.342 MHz. The simulation emulates a ramping time of 92 ms. This is much faster than any SPS ramp but the limit is caused by the computation load. The ramping is linear with steps of 25 Hz per turn, which gives a dF/dt of 1.08 MHz/s, a value more than seven times faster than the maximum SPS ramping rate of 142 kHz/s.
The accelerating structure can be configured to match any SPS cavity configuration in number of sections and cells. The present simulation uses a single four-section cavity.
The simulation calls for a 1 MV set-point in the cavity. The RF component of the beam current is set at 1.14 A so that the beam-induced voltage also equals 1 MV. For simplicity, the stable phase is set to 0 degrees (synchrotron convention); this is an unrealistic value for acceleration (above transition) but eases the simulations. The parameters are in accordance with the specifications after the upgrade [1] (total voltage of 6 MV at injection, set point of 1 MV per cavity), but the RF component of the beam current will peak at 2 A in the HL-LHC era.
The loop delay including cables, amplifier, cavity and LLRF is similar to the real conditions. According to the RF values presented above and given the use of a harmonic number equal to 4620, the revolution frequency of the simulated beam ramps from 43.342 kHz to 43.364 kHz.
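The figures quoted for the simulated cycle can be cross-checked with a few lines of arithmetic. The sketch below derives the revolution frequencies, the ramp rate and the ramp duration from the RF frequencies, the harmonic number and the 25 Hz per turn step given in the text; the variable names are chosen here for illustration.

```python
HARMONIC_NUMBER = 4620
F_RF_START = 200.242e6        # [Hz]
F_RF_END = 200.342e6          # [Hz]
STEP_PER_TURN = 25.0          # RF frequency step applied every turn [Hz]
MAX_SPS_RAMP_RATE = 142e3     # [Hz/s]

f_rev_start = F_RF_START / HARMONIC_NUMBER
f_rev_end = F_RF_END / HARMONIC_NUMBER

# One step per turn: the ramp rate is the step divided by the revolution period.
ramp_rate = STEP_PER_TURN * f_rev_start
ramp_time = (F_RF_END - F_RF_START) / ramp_rate

print(f"f_rev: {f_rev_start / 1e3:.3f} kHz -> {f_rev_end / 1e3:.3f} kHz")
print(f"ramp rate: {ramp_rate / 1e6:.2f} MHz/s "
      f"({ramp_rate / MAX_SPS_RAMP_RATE:.1f} x the maximum SPS rate)")
print(f"ramp time: {ramp_time * 1e3:.0f} ms")
```

The revolution period changes by about 0.05 % over the ramp, so using its initial value for the ramp rate does not affect the quoted figures.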

Clocking
The clock extracted from the WR link is a 125 MHz signal. This clock is used in ADCs and DACs for acquisition and regeneration of the analog signals (cavity antenna and amplifier drive signal). The processing clock of the FPGA is also this 125 MHz signal. The sampled data stream of the cavity field, after conditioning in the RF front-end, arrives at a rate of 62.5 Msps at the input of the OTFB.

Comb filter and regulation
The OTFB is implemented as a baseband Cartesian (I and Q) RF feedback around the cavity. The filter used to track the revolution frequency harmonics is the comb filter presented in Eq. (2). The filter is implemented and encapsulated in the resampling architecture presented in [30]. As explained in section 3.4.5 the integer number N must be larger than twice the number of revolution frequency lines to be covered (N>232). In any resampler the error grows significantly when the input signal frequency approaches the Nyquist rate. In the test N is chosen to be 1442 so that the frequency band of interest (5 MHz) extends to about one sixth of the Nyquist rate (31.25 MHz). The resampling ratio is calculated according to Eq. (7). This gives an upsampling ratio at injection for the input resampler of R = 1 and at extraction of R = 1.000499. The downsampler uses the inverse values.
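The parameter choices in this paragraph can be reproduced numerically. The sketch assumes that the resampled rate is N times the revolution frequency, so that the ratio relative to injection is simply the ratio of revolution frequencies; this is an interpretation of the text, and the variable names are hypothetical.

```python
FS_IN = 62.5e6            # input sampling rate of the OTFB [samples/s]
N_LINES = 116             # revolution lines in the 5 MHz regulation band
N = 1442                  # comb delay in resampled clock cycles
HARMONIC_NUMBER = 4620
BAND_HZ = 5e6

f_rev_inj = 200.242e6 / HARMONIC_NUMBER
f_rev_ext = 200.342e6 / HARMONIC_NUMBER

min_n = 2 * N_LINES                        # lower bound on N
nearest_n = round(FS_IN / f_rev_inj)       # integer closest to fs / f_rev

ratio_inj = N * f_rev_inj / FS_IN          # assumed form: R = N * f_rev / fs
ratio_ext_vs_inj = f_rev_ext / f_rev_inj   # growth of R over the ramp
band_fraction = BAND_HZ / (FS_IN / 2.0)    # band of interest / Nyquist rate

print(f"N must exceed {min_n}; nearest integer to fs/f_rev is {nearest_n}")
print(f"R at injection:        {ratio_inj:.6f}")
print(f"R at extraction / inj: {ratio_ext_vs_inj:.6f}")
print(f"band / Nyquist:        {band_fraction:.3f}")
```

With these numbers N = 1442 is the integer closest to the ratio of the input sampling rate to the injection revolution frequency, which is why the ratio at injection is so close to unity.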
Note that in the absence of resampling the maximum frequency error (offset between the 116th filter resonance and the 116th harmonic of the revolution frequency) at the end of the ramp would be $\Delta f = 116\,(f_{rev,end} - f_{rev,start}) \approx 116 \times 21.6\ \mathrm{Hz} \approx 2.5\ \mathrm{kHz}$ (8), resulting in the beam spectrum falling outside the filter resonances for a narrow bandwidth of the comb peaks.
The parameter a governs the filter bandwidth around each resonance in Eq. (2). It is set to 15/16 which gives around 30 dB peak to valley magnitude (Fig. 4). The gain G is set to 10.
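The quoted peak to valley figure can be checked under the assumption that the comb has the first-order form G(1-a)/(1 - a z^-N). The sketch evaluates that assumed response on a revolution harmonic and halfway between two harmonics; the function name and the phase variable theta are introduced here for illustration.

```python
import cmath
import math


def comb_response(theta: float, a: float, gain: float = 1.0) -> complex:
    """Assumed comb response G(1-a)/(1 - a exp(-j theta)).

    theta = 2*pi*f/f_rev is the phase advance over the N-sample delay,
    so theta = 2*pi*k sits on the k-th revolution harmonic.
    """
    return gain * (1.0 - a) / (1.0 - a * cmath.exp(-1j * theta))


A = 15.0 / 16.0

peak = abs(comb_response(0.0, A))          # on a revolution harmonic
valley = abs(comb_response(math.pi, A))    # halfway between two harmonics
peak_to_valley_db = 20.0 * math.log10(peak / valley)
phase_on_harmonic = cmath.phase(comb_response(2.0 * math.pi * 3, A))

print(f"peak gain:      {peak:.3f}")
print(f"peak to valley: {peak_to_valley_db:.2f} dB")
print(f"phase on 3rd harmonic: {phase_on_harmonic:.1e} rad")
```

For this form the ratio is (1+a)/(1-a), which equals 31 for a = 15/16, in line with the roughly 30 dB stated in the text.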
As we count on the OTFB to regulate the voltage set point, the static field in the cavity will reach the value $V_{cav} = \frac{G}{1+G}\,V_{set} = \frac{10}{11}\,V_{set} \approx 0.909\,V_{set}$ (9). A feedforward could be added for improved static precision in the operational system.

VFD in the resampler
The VFD sets the spectral performance of the resampler. It is presented in detail in ref. [30]. The design required a maximum squared-modulus error of $10^{-9}$ in the 5 MHz passband used. This results in a design with six filters containing fifteen taps each [30].

SPS Travelling Wave Cavity
The effective voltage V seen by the beam upon one traversal of an n-cell travelling wave cavity, Fig. 6, is the sum of the contributions from all cells. Similarly, the beam-induced voltage must be summed over all cells. The result is not trivial as it strongly depends on the phase slip between the beam and the accelerating wave travelling in the cavity [2].
Let Ig and Ib be the generator current and beam current respectively. The effective voltage for excitation at frequency f (Hz) is given by $V(f) = Z_g(f)\,I_g + Z_b(f)\,I_b$ (10), with Zg the impedance for the generator induced voltage and Zb the impedance for the beam induced voltage. These two impedances are not proportional for a Travelling Wave cavity. Refer to [34,35] for more details on the cavity model.

Amplifier (generator)
The amplifier block models the response of one of the tetrode amplifiers in baseband around its center frequency. It is implemented with a Butterworth filter having single-sided passband and stopband edge frequencies at 1.5 MHz and 4 MHz respectively. The attenuation is 3 dB at the passband edge and 15 dB at the start of stopband.
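The text does not state the order of this filter. As a rough guide, the textbook Butterworth design formula gives the lowest order that meets the four numbers above; the sketch below applies it and then checks the resulting stopband attenuation, assuming the 3 dB cut-off sits at the passband edge. Function names are hypothetical.

```python
import math


def butterworth_min_order(f_pass: float, f_stop: float,
                          att_pass_db: float, att_stop_db: float) -> int:
    """Lowest Butterworth order meeting the passband and stopband limits."""
    num = (10.0 ** (att_stop_db / 10.0) - 1.0) / (10.0 ** (att_pass_db / 10.0) - 1.0)
    return math.ceil(math.log10(num) / (2.0 * math.log10(f_stop / f_pass)))


def butterworth_attenuation_db(f: float, f_cut: float, order: int) -> float:
    """Attenuation of a Butterworth low-pass with 3 dB cut-off at f_cut."""
    return 10.0 * math.log10(1.0 + (f / f_cut) ** (2 * order))


F_PASS = 1.5e6     # single-sided passband edge [Hz]
F_STOP = 4.0e6     # single-sided stopband edge [Hz]

order = butterworth_min_order(F_PASS, F_STOP, 3.0, 15.0)
att_pass = butterworth_attenuation_db(F_PASS, F_PASS, order)
att_stop = butterworth_attenuation_db(F_STOP, F_PASS, order)

print(f"minimum order: {order}")
print(f"attenuation at {F_PASS / 1e6:.1f} MHz: {att_pass:.2f} dB")
print(f"attenuation at {F_STOP / 1e6:.1f} MHz: {att_stop:.2f} dB")
```

This is only a consistency check on the specification; the model in the paper may use a different order or design procedure.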

HCAV filter
The 5 MHz regulation bandwidth covers several zeros of the cavity frequency response [34,35]. The first side lobes of this response have a phase shift of 180 degrees and for stability the feedback frequency response must also change sign at these frequencies. A filter HCAV is included in series with the OTFB, which adds 180 degree extra phase shift to these lobes. HCAV is implemented in baseband, with the same response as Zg to properly match the cavity zeroes.

Beam
The beam is modeled as a DC current lasting for part of the turn (the populated buckets). This current is modulated at the RF frequency and injected into the cavity model (beam impedance). The simulation is performed for zero-degree stable phase (synchrotron convention). The cavity voltage and the RF component of the beam current are therefore in quadrature. As the feedback is implemented in I/Q coordinates, the performance in terms of beam loading compensation does not depend much on the stable phase. This beam model ignores longitudinal beam dynamics and is therefore valid in static conditions only. It is not valid during the injection transient caused by phase, momentum and bucket mismatch between the PS (injector) and SPS buckets. The current intensity is adjusted to 1.14 A so that the beam-induced voltage is 1 MV, equal to the generator-driven voltage. See section 4.3.

Open-loop response of the regulation
The magnitude of the open-loop transfer function of the model is presented in Fig. 7, after phase alignment of the system. The model is excited with an impulse (delta) injected in the I channel only. The open-loop response (I and Q channels) is measured. The transfer function of the model is then computed as the Fourier transform of this response. The response is obtained with an RF frequency at the cavity center frequency (the plot depicts the I to I+jQ open-loop response with a sampling frequency of 62.5 Msps). The zeros of the cavity are located at 1.6 MHz and its multiples, as shown in Fig. 7(a). Fig. 7(b) enlarges the low-frequency part of the response to show the comb filter peaks at the revolution frequency harmonics. The spacing is 43.342 kHz.
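The impulse-then-Fourier-transform procedure described above can be illustrated on a toy system. The sketch below uses a generic first-order comb filter as a stand-in; it is not the OTFB model of the paper, and the delay `N_TURN`, the coefficient `A` and the record length are hypothetical values chosen to keep the example small.

```python
import cmath

def comb_impulse_response(n_turn, a, length):
    """Response of y[n] = (1-a)*x[n] + a*y[n-n_turn] to a unit impulse."""
    y = [0.0] * length
    for n in range(length):
        x = 1.0 if n == 0 else 0.0                 # impulse (delta) excitation
        fb = y[n - n_turn] if n >= n_turn else 0.0  # one-"turn" delayed feedback
        y[n] = (1.0 - a) * x + a * fb
    return y

def dft_bin(h, k):
    """Single DFT bin k of the sequence h."""
    size = len(h)
    return sum(v * cmath.exp(-2j * cmath.pi * k * n / size)
               for n, v in enumerate(h))

N_TURN, A, LENGTH = 8, 0.9, 1024                   # hypothetical toy values
h = comb_impulse_response(N_TURN, A, LENGTH)

# Peaks sit at multiples of fs/N_TURN, i.e. at DFT bins k = m * LENGTH / N_TURN.
peak = abs(dft_bin(h, LENGTH // N_TURN))
trough = abs(dft_bin(h, LENGTH // (2 * N_TURN)))
print(peak, trough)
```

In this toy case the response peaks at every multiple of the inverse delay and drops to (1-a)/(1+a) halfway between peaks, which is the same qualitative comb structure as the peaks at the revolution harmonics in the enlarged low-frequency plot.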

Cavity voltages and transient beam loading compensation
In Fig. 8 the feedback is closed. The figure shows the cavity voltage and the RF frequency ramp, which is updated once per revolution period. The RF frequency ramps from 200.242 MHz (cavity center frequency) to 200.342 MHz over the 92 ms simulation time; the figure is zoomed around simulation time 64 ms. The beam spans one third of the ring (6.6 µs). The gaps in the signal correspond to the empty buckets spanning the remaining two thirds of the ring. When the beam crosses the structure, the voltage in the cavity is perturbed by the beam loading. The steps of the linear frequency ramp are shown in Fig. 8(a), the real component of the cavity voltage in Fig. 8(b) and the imaginary component in Fig. 8(c). The cavity filling time is 620 ns, for a revolution period around 23 µs. Therefore, the full voltage is induced at each turn and does not accumulate over the turns.
The voltage in the cavity is depicted in Fig. 9 for the first 1.1 ms. The left column depicts the real component (I) and the right column the imaginary component (Q). Fig. 9(a) shows the beam-induced voltage. The beam is injected after 0.48 ms, inducing -1.061 MV in the reactive Q channel. Fig. 9(b) shows the voltage in the cavity driven by the generator. The set point is 1 MV and the regulation loop is closed after 0.07 ms. The voltage magnitude reaches a stable value of 0.909 MV after 0.2 ms. This value is consistent with Eq. (9). The different steps correspond to the response of the OTFB turn after turn. After one turn, the voltage in the cavity reaches around 0.65 MV. The time constant of the regulation in this case is hence one turn, 0.023 ms. This validates the regulation, as this reaction time is much shorter than the synchrotron period, which is on the order of 2.5 ms. Fig. 9(c) depicts the total voltage in the cavity, the sum of the generator-induced and beam-induced voltages. At the first passage, the beam induces -1.061 MV reactive (Q channel).
The beam current is in quadrature with the cavity voltage (zero-degree stable phase). At the beginning of the simulation the cavity is on tune (RF frequency equal to the cavity center frequency), so there is no beam loading in the I channel. The first correction of the OTFB arrives on the second turn. After five turns the beam-induced voltage has been reduced to -0.1 MV in the middle of the beam batch. The factor-of-ten reduction is consistent with the feedback gain at low offset frequencies (G=10). Larger transients remain at the head and tail of the beam batch, caused by the reduced gain at large offset frequencies. As visible in Fig. 7(b), the open-loop gain has dropped to 0 dB at 1.2 MHz offset from the cavity center frequency.
The voltage in the cavity for the last 0.25 ms of the simulation is depicted in Fig. 10. Fig. 10(a) shows the demodulated beam-induced voltage, which now reaches 0.14 MV and -1.047 MV in the I and Q channels respectively, in the middle of the beam batch. The RF is now offset by 100 kHz with respect to the cavity center frequency, resulting in beam loading in both channels. Fig. 10(b) shows the generator-induced voltage in the cavity set by the regulation. When the beam crosses the cavity, the regulation reduces the voltage in the I channel from 0.9 MV to 0.78 MV to compensate the beam loading in that channel. The same behavior is observed in the Q channel, where the regulation increases the voltage from 0 MV to 0.95 MV in the middle of the beam batch. The total voltage in the cavity, the sum of the beam-induced and generator-induced voltages, is depicted in Fig. 10(c). When there is no beam crossing the cavity, the regulation sets 0.9 MV in the I channel. In the middle of the beam batch, the I channel reaches 0.92 MV while the Q channel has -0.1 MV. Again, there is a factor-of-ten reduction of the beam-induced transients, consistent with the feedback gain at low offset frequencies (G=10).
This section presents measurements performed in the laboratory on the uTCA platform and with a real cavity. The resampling sandwich with the comb filter has been successfully implemented in an FPGA, targeting a 125 MHz processing clock on a high-grade Xilinx Kintex-7 XCKU040-1FFVA1156C. No hardware optimization has been done, resulting in a maximum achievable FPGA clock of 200.8 MHz. The place and route strategy was "Optimization of Area". The data-paths at the input and output of the resamplers are 16 bits wide, with 15 fractional bits. The filter coefficients are 18 bits wide, with 16 fractional bits. The data-path within the resampler is extended to 24 bits. See [30] for implementation details.
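To make the word lengths quoted above concrete, the sketch below quantizes values onto the corresponding fixed-point grids. Signed two's-complement words, round-to-nearest and saturation are all assumptions made for illustration; the text only states the total and fractional bit counts, and the actual rounding and overflow behavior is documented in [30].

```python
def quantize(value, total_bits, frac_bits):
    """Map value onto a signed fixed-point grid, rounding and saturating.

    Assumes a two's-complement word of total_bits with frac_bits fractional bits.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))        # most negative integer code
    highest = (1 << (total_bits - 1)) - 1    # most positive integer code
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale

# Resampler input/output data: 16 bits, 15 fractional -> range [-1, 1 - 2**-15].
data_max = quantize(1.0, 16, 15)
data_min = quantize(-1.0, 16, 15)

# Filter coefficients: 18 bits, 16 fractional -> range [-2, 2 - 2**-16].
coef_example = quantize(1.5, 18, 16)
print(data_max, data_min, coef_example)
```

With these formats the data resolution is 2^-15 and the coefficient resolution is 2^-16, with one extra integer bit of headroom available for the coefficients.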

Inverse ratios and fixed point arithmetic
The resampling sandwich is part of a real-time feedback system. The input and output resampling ratios must therefore be exact inverses. In practice this is impossible for values that vary in real time and are quantized to fixed-point arithmetic: truncation errors appear in the computation of the inverse ratio. The consequence is a difference in the number of samples per second at the input and output of the sandwich, leading to a desynchronization between the resamplers. A Frequency Locked Loop has been included in the hardware implementation to control the output resampling ratio. For this, a FIFO memory is inserted after the output resampler. This memory absorbs the fluctuations in the number of samples per second due to the quantization error of the ratios. When the ratios are exactly inverse, the filling level of the memory remains constant; when they are not, the FIFO level increases or decreases. This level is compared against a reference value and the error is used as a correction for the ratio. This Ratio Locked Loop is depicted in Fig. 11.
Fig. 12 shows the magnitude of the measured transfer function of the LLRF OTFB only, from RF input to RF output, for different RF frequencies and spans. The laboratory set-up contains the implementation of the proposed architecture in a uTCA crate. The results in Fig. 12 have been measured with a Vector Network Analyzer (VNA). The stimulus output of the VNA is connected to the antenna input of the LLRF (uTCA crate). The output drive signal of the crate is connected to the VNA input. The clock and LO signals of the crate are generated with external precision RF synthesizers. In Fig. 12 [...] Fig. 3. Fig. 12(b) presents an enlargement around the RF, covering the first revolution harmonic, which is 43.33 kHz away, while Fig. 12 [...]
The cavity (Fig. 6) is placed in a test stand that uses the 1 kW pre-driver of the SPS amplifier as its amplifier. We could not use the full 1 MW amplifier as the cavity was not yet conditioned to high field.
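The effect of the truncated inverse ratio on the FIFO level, and its correction by the level-based loop described above, can be sketched with a simple fluid model. Everything specific here is an assumption made for illustration: the proportional-only controller, the non-integer FIFO level, the ratio word length, and all numerical values. The loop actually implemented in hardware may differ.

```python
import math

def simulate_fifo(r_in, frac_bits, loop_gain, ticks, block=1000, ref_level=512.0):
    """Fluid model of the FIFO level behind the output resampler.

    r_in      : input resampling ratio (hypothetical value)
    frac_bits : fractional bits used to store the inverse ratio (assumed)
    loop_gain : proportional gain of the level loop (0 disables the loop)
    """
    scale = 1 << frac_bits
    r_out_q = math.floor(scale / r_in) / scale     # truncated inverse ratio
    level = ref_level
    for _ in range(ticks):
        # The level error corrects the output ratio (proportional control).
        r_out = r_out_q + loop_gain * (ref_level - level)
        written = block * r_in * r_out             # samples entering the FIFO
        read = block                               # samples read at the fixed clock
        level += written - read
    return level

REF = 512.0
open_loop_level = simulate_fifo(0.92, 16, 0.0, 10000, ref_level=REF)
closed_loop_level = simulate_fifo(0.92, 16, 1e-4, 10000, ref_level=REF)
print(open_loop_level, closed_loop_level)
```

Without the loop, the small truncation error accumulates and the level drifts steadily away from the reference; with the loop closed, the level settles at a small constant offset from the reference, which a FIFO of modest depth can absorb.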
The figure shows the spectrum of the effective voltage, obtained from the sum of the antennas of all cells (RF summing network) [34]. The SPS accelerator is stopped until April 2021, so beam tests are not possible before then. The beam was therefore emulated by injecting a perturbation, periodic at the revolution frequency, into the cavity drive signal. The OTFB is then expected to reduce the effect of this beam loading perturbation. The figure corresponds to an RF at 199.898 MHz (center frequency of the Spectrum Analyzer span). It shows the main accelerating frequency and the revolution harmonics induced by the beam signal on the revolution frequency sidebands (43.3 kHz spacing). The traces in red were captured when the compensation was not active (OTFB off), while the blue traces have the OTFB active. The reduction in the revolution harmonics reaches 20 dB for the first few harmonics, a figure consistent with the gain set in the system (linear factor of 10).

Conclusions and future work
The paper has presented a new solution for Beam Synchronous Processing (BSP) in an FPGA with a fixed frequency system clock. The solution is well suited for new distributed network architectures, such as the one being commissioned at CERN for the SPS upgrade. The SPS architecture exploits deterministic protocols for data distribution and uses a network-synchronous clock extracted from the deterministic link for network-wide synchronization and synchronous processing. This paper has validated the proposed BSP architecture with the implementation of a One Turn delay FeedBack (OTFB) system in an FPGA.
This new OTFB implementation incorporates both BSP and Beam Asynchronous Processing (BAP) regions. The BAP contains filters whose functionalities are not related to the beam (compensation of amplifier frequency responses, antenna calibration, etc.), while the BSP tunes the algorithms to the beam revolution frequency automatically. The BSP and BAP regions support the porting of any new or existing algorithm without specific modifications. The BSP region adapts the data sampling rate to the beam revolution frequency. This avoids the reconfiguration of BSP algorithms in real time. The data-path interfaces with the BSP region by means of two resamplers. The input resampler converts the fixed sampling rate, at which the data arrives at the BSP region, to a new rate proportional to the beam revolution frequency. At the output port, a second resampler brings the signal back to the original fixed rate. The resampling ratios of the two resamplers are reciprocal (inverse) and vary dynamically during the acceleration ramp.
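The role of the two reciprocal resamplers can be illustrated with a minimal round trip. The sketch below swaps in plain linear interpolation for the VFD-based resamplers of the paper and uses a constant ratio rather than one that varies along the ramp; the ratio and the test signal are hypothetical.

```python
import math

def resample_linear(x, ratio):
    """Resample x by ratio (output rate / input rate) with linear interpolation."""
    out = []
    k = 0
    while True:
        pos = k / ratio                  # output sample k on the input grid
        i = int(pos)
        if i + 1 >= len(x):              # stop when the next input sample is missing
            break
        frac = pos - i
        out.append((1.0 - frac) * x[i] + frac * x[i + 1])
        k += 1
    return out

RATIO = 0.92                             # hypothetical input resampling ratio
x = [math.sin(2 * math.pi * 0.01 * n) for n in range(1000)]

y = resample_linear(x, RATIO)            # fixed rate -> beam-synchronous rate
# ... processing tuned to the revolution frequency would operate on y here ...
z = resample_linear(y, 1.0 / RATIO)      # beam-synchronous rate -> fixed rate

max_err = max(abs(a - b) for a, b in zip(x, z))
print(len(x), len(y), len(z), max_err)
```

The signal inside the sandwich has a different number of samples per second than the fixed-rate input, while the output returns to the original grid. The residual error here is that of linear interpolation, which is far coarser than the squared-modulus error target quoted for the VFD design.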
The new OTFB has first been validated with system-level simulations. The filter is implemented in the BSP while the one-turn delay is in the BAP region. The performance is similar to that of the classic swept-clock implementation. The resampler sandwich and the one-turn delay have also been validated in hardware. A new uTCA platform has been used, and laboratory tests have validated the correct tuning of the processing to the instantaneous RF value. No significant degradation of the signals' spectral purity was observed as a consequence of the two resamplers. The CERN SPS accelerator is stopped until the beginning of 2021 and no test with real beam will be possible before mid-2021. The beam was therefore emulated by injecting a perturbation, periodic at the revolution frequency, into the drive signal of a spare CERN SPS cavity, demonstrating the compensation of transient beam loading with a performance that agrees with the simulations.
biomedical signal filtering, etc. That problem is normally addressed by reconfiguring the processing elements in real time. The alternative solution presented here keeps all processing constant, using resampling to cope with the sweep in the signal's fundamental frequency.