A high-resolution, low-latency, bunch-by-bunch feedback system for nano-beam stabilization

We report the design, operation and performance of a high-resolution, low-latency, bunch-by-bunch feedback system for nano-beam stabilisation. The system employs novel, ultra-low quality-factor cavity beam position monitors (BPMs), a two-stage analogue signal down-mixing system, and a digital signal processing and feedback board incorporating an FPGA. The FPGA firmware allows for the real-time integration of up to fifteen samples of the BPM waveforms within a measured latency of 232 ns. We show that this real-time sample integration improves significantly the beam position resolution and, consequently, the feedback performance. The best demonstrated real-time beam position resolution was 19 nm, which, as far as we are aware, is the best real-time resolution achieved in any operating BPM system. The feedback was operated in two complementary modes to stabilise the vertical position of the ultra-small beam produced at the focal point of the ATF2 beamline at KEK. In single-BPM feedback mode, beam stabilisation to 50$\pm$5 nm was demonstrated. In two-BPM feedback mode, beam stabilisation to 41$\pm$4 nm was achieved.


I. INTRODUCTION
The Accelerator Test Facility (ATF) [1] is a 1.3 GeV electron beamline complex located at the High Energy Accelerator Research Organization (KEK) in Tsukuba, Japan. The facility is a test-bed for technologies required for a future linear electron-positron collider. The ATF comprises a linear accelerator, damping ring (DR) and final focus system. Ultra-low emittance beams can be produced with the 138.6 m circumference DR via the process of radiation damping. After extraction from the DR, the beam passes through an extraction line and a final focus system (ATF2 [2,3]) (Fig. 1), which is a scaled prototype of the final focus system of the International Linear Collider (ILC) [4] or the Compact Linear Collider (CLIC) [5]. The vertical focal point of the ATF2 beamline is designated the 'interaction point' (IP). The two main goals [6,7] of the ATF2 Collaboration are to produce nano-beams with an IP beam size of 37 nm and position stabilization at the nanometer level.
The ATF is typically operated with an extracted beam pulse repetition rate of 3.12 Hz, a beam charge in the range $(0.1$–$1.0) \times 10^{10}$ e, and with one bunch per pulse. Multi-bunch trains can also be produced by accumulating two or three bunches in the DR and extracting them as a single train with one pulse of the DR extraction kicker.
The Feedback On Nanosecond Timescales (FONT) group [8] has developed several generations of prototype bunch-by-bunch beam-stabilization feedback systems which have been tested at the ATF. A feedback system was deployed in the upstream section of the ATF2 extraction line, using high-resolution bunch-position measurements from stripline beam-position monitors (BPMs) [9], to demonstrate [10] the resolution, correction-range and latency requirements for the ILC IP beam collision feedback system [11]. An extended feedback system based on this hardware was recently used to stabilize the beam trajectory before its entrance to the final-focus region, and yielded a significant reduction in the impact of 'wakefields' on the beam-size growth [12].
FIG. 1. Schematic of the ATF layout, with the ATF2 focal point indicated as the IP [12].
Here we report the design and performance of a high-resolution, high-precision, low-latency, beam-position feedback system located around the ATF2 IP (Fig. 2), which is aimed at stabilizing directly the IP vertical beam position to the nanometer level. This system incorporates five cavity BPMs similar to those reported in [13], but with a much lower 'quality factor'. The down-mixed BPM signals are digitized using a custom FPGA-based feedback controller, the 'FONT5A' board [10,14], and the feedback calculation is performed on an FPGA mounted on the board. An analogue correction signal is output from the board, amplified using a custom power amplifier with a fast rise-time (35 ns) [15], and used to drive a stripline kicker, IPK.
The cavity-BPM system is described in Section II, and its resolution performance is presented in Section III. The bunch-by-bunch feedback system is described in Section IV, and its beam stabilization performance is reported in Section V. A summary of results, and conclusions, is given in Section VI.

II. IP CAVITY BPM SYSTEM
The IP BPM system incorporates five C-band cavity BPMs [16,17], IPA, IPB, IPC, Ref x and Ref y (Fig. 2). Throughout this paper x and y refer respectively to the horizontal and vertical beam position coordinates in the plane transverse to the beam propagation direction. For beam-size measurements using the IP Beam Size Monitor [18] the IP is placed longitudinally between IPB and IPC. However, for the nano-beam stabilization studies reported here, the IP can instead be placed at any one of IPA, IPB or IPC; this is discussed in Section IV. The cavity BPM design and operation is described below.

A. Cavity BPM design and operation
As a bunch of charged particles passes through a cavity BPM, its electromagnetic eigenmodes are excited [19]. The transverse magnetic (TM) modes can be used to determine both the bunch charge and the bunch offset w.r.t. the cavity's electrical axis. Separate cavities were designed for sensitivity to the monopole and dipole TM modes, referred to as 'reference' and 'dipole' cavities respectively; Ref x and Ref y are reference cavities and IPA, IPB and IPC are dipole cavities (Fig. 2).
In the circular cylindrical x and y reference cavities the dominant excited mode is the monopole mode, illustrated in Fig. 3(a), which is sensitive to the bunch charge and, for small offsets, insensitive to its position offset w.r.t. the cavity electrical center. A schematic of the reference cavity is shown in Fig. 4(a), with the coupling slot and antenna indicated [20]. The x and y reference cavities have diameters of 42.95 and 38.65 mm respectively, designed to yield monopole-mode frequencies equal to the respective dipole-mode frequencies of the dipole cavities (see below). Using dedicated tuning pins, the monopole-mode frequency of each reference cavity was fine-tuned to match the respective dipole-mode frequency of the dipole cavities (see Table I).
The dipole cavity principle is illustrated in Fig. 3(b). The cavity design is of rectangular cylinder form and uses spatial filtering to suppress the dominant monopole mode so that the higher-frequency dipole mode can be extracted [21]; this mode is sensitive to the bunch position offset as well as its charge. A schematic of the cavities used is shown in Fig. 4(b), with the coupling slots and waveguides indicated. The cavities were designed with different vertical and horizontal dimensions so as to decouple the horizontal and vertical dipole modes; the positioning of the coupling slots and waveguides allows these modes to be extracted separately from the same BPM. There are pairs of output x-ports and y-ports in each cavity; the respective output signals are combined (see Fig. 6) so as to double the signal from the antisymmetric dipole mode and cancel the unwanted symmetric monopole mode. A 700 MHz bandwidth bandpass filter (BPF) removes the residual monopole signal.
The cavities are fabricated from aluminium and were designed [16] to have ultra-low quality-factor values so as to be suitable for resolving in time individual particle bunches in trains with bunch separations of order 100 ns. The design and measured values of the cavity resonant frequencies are given in Table I. Since the ATF2 is designed to focus the beam to c. 37 nm in the vertical (y) plane, and our aim is nano-beam stabilization in this plane, for the remainder of this paper we consider only vertical beam position measurements and hence discuss only those signals from the reference cavity Ref y and the y-ports of the dipole cavities.
FIG. 3. Schematic of the electric and magnetic field lines of (a) the TM010 mode for a circular cylindrical cavity BPM and (b) the TM210 mode (or x-dipole mode) for a rectangular cylindrical cavity BPM. The waveguides which couple to the TM210 mode are shown. There are also corresponding waveguides which couple to the y-dipole TM120 mode.
The position resolution of a dipole cavity BPM is primarily limited by the signal-to-noise ratio, where the signal level is determined both by how much energy is transferred from the beam to the dipole modes and by how well these modes are coupled out of the BPM through the waveguides. Sources of noise in the system include thermal and electronic noise, as well as signal contamination from the monopole mode [19]. The resolution performance is discussed in Section III.
A variable attenuator on the combined output of each dipole BPM (Fig. 6) can be used to increase the dynamic range of the position measurement but at the expense of the resolution. For typical operating bunch charges of 1 nC, 10 dB attenuation was added to the dipole signal, yielding a dynamic range for vertical position measurements of ±3 µm.
The dipole cavity BPMs are mounted on two piezomover systems (Fig. 5) within the vacuum chamber which allow horizontal, vertical and angular BPM alignments w.r.t. the beam trajectory [23]. IPA and IPB are mounted on a single 'IPAB' mover block and, therefore, cannot be moved independently. The IPAB movers were manufactured by Cedrat Technologies and have a working range of 248 µm, while the IPC mover was manufactured by PI and has a working range of 300 µm. The movers incorporate feedback systems designed to ensure a position stability of better than 2 nm.
FIG. 5. Schematic of the IP BPM configuration, showing the 'IPAB' mover block, with submovers m1, m2 and m3, on which IPA and IPB are mounted, and the IPC mover block with submovers mC, mD and mE. The nominal IP location, as used for beam-size measurements, is indicated.

B. BPM analogue signal processing
The cavity BPM signals undergo two stages of frequency down-mixing [13,20] (see Fig. 6) so as to produce baseband signals that can be digitized with the FONT5A board. In the first stage, both the reference and dipole cavity signals are down-mixed to an intermediate frequency (IF) centered at 714 MHz using a common Local Oscillator (LO) signal so as to retain the phase relation between the signals. The 5.712 GHz LO signal is generated using frequency multiplication of the Master Oscillator signal [24] and hence is phase-locked to the beam. The LO signal, $V_{LO}$, can be written as in Eq. 1, where $f_{LO}$ = 5.712 GHz, $\Delta\phi_{LO}$ is the phase difference between the LO signal and the dipole signal, and L is a constant. The output signals from the y-port of a dipole cavity ($V_{dip}$) and from the y reference cavity ($V_{ref}$) can be represented as in Eq. 2, where q is the bunch charge, y is the vertical beam position offset w.r.t. the cavity electrical axis, y′ and α are the bunch pitch-angle and angle-of-attack, respectively, $f_{dip}$ or $f_{ref}$ is the respective signal frequency, and $\Delta\phi$ is the difference in phase between the monopole and dipole signals; $D_y$, $D_{y'}$ and $D_\alpha$ are constants. It can be seen that the signals excited by a y′ or α offset are 90° out of phase with those excited by a y offset. For small bunch offsets the reference-cavity signal is independent of the beam position.
After the first stage of down-mixing, the signals $V_{dip} \otimes V_{LO}$ and $V_{ref} \otimes V_{LO}$ are produced (Fig. 6) at both the IF (714 MHz) and at higher frequencies. In the second-stage processing, the latter are removed with a 150 MHz bandwidth bandpass filter centered at c. 700 MHz. The $V_{ref} \otimes V_{LO}$ IF signal is then split, with one of the outputs passing through a diode detector to produce a pulse whose magnitude is proportional to the bunch charge. This signal, subsequently denoted q, is used for bunch-charge normalization to obtain the bunch position (see below). The other $V_{ref} \otimes V_{LO}$ output passes through a limiting amplifier to remove its charge dependence. This signal is then used as the LO signal for the second stage of down-mixing of the dipole signals, from the IF to baseband.
In the second stage, the reference and dipole signals are mixed in-phase and in-quadrature to produce I and Q signals, respectively. These signals are orthogonal components that together include the full amplitude and phase information of the BPM waveform [25]. If the BPM is well-aligned in y′ and α, the contributions to $V_{dip}$ from these terms are much smaller than those from y, such that I and Q take the forms given in Eqs. 3 and 4, where the phase angle, $\theta_{IQ}$, is defined in Eq. 5. Since the reference cavity is tuned such that $f_{ref} \simeq f_{dip}$ (see Table I), the I and Q signals are at baseband. Before digitization these signals are amplified so as to reduce the effect of quantization noise.
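The in-phase and in-quadrature mixing described above can be illustrated numerically. The following is a minimal sketch, assuming ideal unit-amplitude sinusoidal references and an ideal low-pass filter (modelled here as an average over an integer number of periods); the function name and parameters are illustrative and are not taken from the FONT electronics.

```python
import math


def iq_demodulate(amplitude, phase_rad, n_periods=50, samples_per_period=64):
    """Mix a sinusoid with in-phase and quadrature references and average.

    Averaging over an integer number of periods stands in for the low-pass
    filter that removes the sum-frequency product of the mixer.
    """
    n = n_periods * samples_per_period
    i_sum = 0.0
    q_sum = 0.0
    for k in range(n):
        wt = 2.0 * math.pi * k / samples_per_period
        signal = amplitude * math.sin(wt + phase_rad)
        i_sum += signal * math.sin(wt)  # in-phase reference
        q_sum += signal * math.cos(wt)  # quadrature reference (90 degrees shifted)
    # The factor 2 undoes the 1/2 from the product-to-sum identity.
    return 2.0 * i_sum / n, 2.0 * q_sum / n


if __name__ == "__main__":
    i_val, q_val = iq_demodulate(amplitude=2.0, phase_rad=math.pi / 6)
    print(f"I = {i_val:.4f}, Q = {q_val:.4f}")
    print(f"amplitude = {math.hypot(i_val, q_val):.4f}")
    print(f"phase = {math.atan2(q_val, i_val):.4f} rad")
```

In this idealised picture I and Q are the cosine and sine projections of the signal onto the reference, so the pair retains both the amplitude and the phase of the original waveform.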

C. Signal digitization and digital processing
The signal digitization is performed on a FONT5A board [10], a custom feedback controller with a Xilinx Virtex-5 XC5VLX50T FPGA at its core [26]. The primary inputs and outputs of this board are shown in Fig. 7. The FPGA firmware is written in the Verilog hardware description language, and the configuration bitstream is stored on a non-volatile Xilinx XCF32P Programmable Read-Only Memory (PROM), from where it is loaded on power-up or system reset. The inputs and outputs of the PCB are via Micro-Coaxial connectors (MCX) which patch to BNC connectors on the case which houses the board [27]. The FONT5A board is shown in Fig. 8 with the case removed. The board contains nine Texas Instruments 14-bit ADS5474 [29] analogue-to-digital converters (ADCs) grouped into separately-clocked banks of three. Seven ADCs are used to digitize the I and Q waveforms from IPA, IPB and IPC, and the q waveform from the Ref y cavity; the least significant bit is removed as it corresponds to the noise level of the signals [27]. The ADC channels contain an inherent offset on their baseline signal which can be zeroed by coupling each with the output of a 16-bit DAC, referred to as a trim DAC [30]. The values used for the trim DAC can be set using the associated FONT LabVIEW DAQ which is used to transmit values to the board through an RS-232 Universal Asynchronous Receiver/Transmitter (UART) via an Ethernet serial device server.
FIG. 6. Simplified block diagram of the two-stage down-mixing process of the dipole and reference cavity signals from GHz-level to baseband. Diagram adapted from [17].
A clock at 357 MHz is used for the time-critical FPGA logic. It is derived from the LO, meaning it is phaselocked to the beam, and used to clock the ADCs, so that the I, Q and q signals are digitized at 357 MHz.
The start of the sampling window is set with respect to the trigger, which is internally delayed on the board and can be adjusted. The sampling window can be varied within the firmware but for one- or two-bunch operation typically consists of 164 samples each separated by 2.8 ns, meaning that a complete DR beam-circulation period (462 ns) can be digitized within a single window. Representative digitized waveforms for 2-bunch-train operation (Section V A) are shown in Fig. 9. The difference in the I and Q signals between the two bunches derives from the transverse position offset between them.
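The timing figures quoted above follow from the clock frequency. As a small worked example, using only the 357 MHz clock, the 164-sample window and the 462 ns DR period stated in the text (the function names are illustrative):

```python
CLOCK_HZ = 357e6        # ADC and fast-logic clock quoted in the text
N_SAMPLES = 164         # typical sampling-window length quoted in the text
DR_PERIOD_NS = 462.0    # damping-ring circulation period quoted in the text


def sample_period_ns(clock_hz: float = CLOCK_HZ) -> float:
    """Time between consecutive ADC samples, in nanoseconds."""
    return 1e9 / clock_hz


def window_length_ns(n_samples: int = N_SAMPLES, clock_hz: float = CLOCK_HZ) -> float:
    """Duration spanned by n_samples sample periods, in nanoseconds."""
    return n_samples * sample_period_ns(clock_hz)


if __name__ == "__main__":
    print(f"sample period: {sample_period_ns():.2f} ns")
    print(f"window length: {window_length_ns():.1f} ns "
          f"(DR period {DR_PERIOD_NS:.0f} ns)")
```

With these inputs the sample period is about 2.8 ns and the window spans roughly 459 ns, i.e. approximately one DR circulation period.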
The firmware includes the functionality to provide a constant offset to the I, Q and q signals before they are used to calculate the bunch position. This is used to remove the position-independent baseline signals that are generated on each I and Q waveform at the second stage of the signal processing (Fig. 9).
FIG. 9. Representative digitized I, Q and q waveforms from IPC, for two-bunch-train operation with a bunch spacing of 280 ns. The waveforms were sampled at intervals of 2.8 ns.

D. Position measurement
A linear combination of I and Q can be chosen to produce a signal, I′, with an amplitude proportional to the bunch position y [13] (Eq. 6). By substituting Eqs. 3 and 4 into Eq. 6,
$$\frac{I'}{q} = k\,y, \qquad (7)$$
where k [µm$^{-1}$] is a constant, found by calibration of the BPM. A signal orthogonal to I′ can also be generated, Q′, that is proportional to the beam pitch (Eq. 8). Each dipole BPM is calibrated w.r.t. position by vertically scanning the beam across a known range by changing the position of quadrupole QD0FF (Fig. 2) and measuring the corresponding BPM response. Calibrations w.r.t. the beam angle y′ are performed by tilting the BPMs through a known range using the submovers shown in Fig. 5.
For each measured bunch in the beam, the calibration calculation can be performed using either single or multiple samples of the I and Q waveforms. For convenience a single sample of the q signal from Ref y is used for charge normalization of the I and Q signals from all three dipole BPMs. The requirements for low-latency feedback preclude the direct implementation of division for the charge normalization within the firmware and, instead, a method of lookup tables (LUTs) is employed using block RAM resources in the FPGA. The charge, q, is used as an address to the LUTs, for which the elements are preloaded with 1/q scaled by the appropriate feedback coefficient, i.e. $C_i/q$ (Eq. 9), where $C_i$ incorporates the terms involving $\theta_{IQ}$, k and the feedback gain G (see Section IV C); there are four instances of the LUT logic (1 ≤ i ≤ 4), each loaded with the respective value of G, allowing up to two of the BPMs to be used as input to the feedback system [31].
The position resolution can be significantly improved (see Section III) by integrating over multiple samples of the I and Q signals as this both increases the signal level and averages over thermal and electronic noise. The integration range is chosen around the peak of the I and Q signals, as samples significantly in advance of the peak may contain transient effects from unwanted modes and samples late in the waveform have a poorer signal-to-noise ratio. This integration is performed in real time on the FONT5A board: on every rising fast-clock edge within the selected integration window, the most recent I and Q value is summed with the previous respective sum. As an example, an IPA position calibration using 11-sample integration is shown in Fig. 10.
Representative position and angle calibration constants for the three dipole BPMs, calculated using 11-sample integration, are presented in Table II. It can be seen that IPA and IPB have similar sensitivities, whereas IPC has a lower sensitivity; this is due to a minor fabrication difference.
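The position calculation described in this section (sample integration, rotation of I and Q, and division-free charge normalization) can be sketched in software as follows. This is a floating-point illustration only, not the firmware: the rotation convention I′ = I cos θ + Q sin θ is an assumption, the firmware folds θ, k and G into the LUT contents whereas they are kept separate here for readability, and all names and numerical values are hypothetical.

```python
import math


def build_inverse_charge_lut(n_addresses, scale=1.0):
    """Precompute scale/q for each integer charge address (address 0 maps to 0)."""
    return [0.0] + [scale / q for q in range(1, n_addresses)]


def rotate_iq(i_val, q_val, theta_iq):
    """Rotate (I, Q) by theta_iq, returning (I', Q') under the assumed convention."""
    c, s = math.cos(theta_iq), math.sin(theta_iq)
    return i_val * c + q_val * s, -i_val * s + q_val * c


def position_um(i_wave, q_wave, charge_counts, first, n_int, theta_iq, k_per_um, lut):
    """Charge-normalised position from an integration window of I and Q samples."""
    i_sum = sum(i_wave[first:first + n_int])   # running sum over the window
    q_sum = sum(q_wave[first:first + n_int])
    i_rot, _ = rotate_iq(i_sum, q_sum, theta_iq)
    # Multiplication by a table entry replaces the division by q.
    return i_rot * lut[charge_counts] / k_per_um


if __name__ == "__main__":
    theta, k, q_counts, n_int = 0.3, 2.0, 1000, 11   # hypothetical values
    per_sample = 1000.0 / n_int
    i_wave = [0.0] * 5 + [per_sample * math.cos(theta)] * n_int + [0.0] * 4
    q_wave = [0.0] * 5 + [per_sample * math.sin(theta)] * n_int + [0.0] * 4
    lut = build_inverse_charge_lut(4096)
    print(position_um(i_wave, q_wave, q_counts, 5, n_int, theta, k, lut))
```

The table lookup trades block RAM for latency: one memory read and one multiplication replace a multi-cycle divider.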

III. CAVITY BPM SYSTEM POSITION RESOLUTION
The resolution of the BPM system was evaluated using measurements of the bunch trajectory at all three dipole BPMs. Since the bunch follows a straight-line trajectory which can be characterized with measurements from only two BPMs, measurements from the third BPM can be used to estimate the resolution of the system.
The beam position at BPM i, $y_i$, can be represented as a linear combination of the positions of the beam at the other two BPMs, $y_j$ and $y_k$:
$$y_i = A_{ij}\,y_j + A_{ik}\,y_k, \qquad (10)$$
where $A_{ij}$ and $A_{ik}$ are 'geometric' coefficients defined by the relative separations of the three BPMs (Fig. 5). The predicted beam position at BPM i, $y_i^{\mathrm{pred}}$, can therefore be written in terms of the measured positions at BPMs j and k:
$$y_i^{\mathrm{pred}} = A_{ij}\,y_j^{\mathrm{meas}} + A_{ik}\,y_k^{\mathrm{meas}}. \qquad (11)$$
The difference between this and the measured position, $y_i^{\mathrm{meas}}$, yields a residual. Under the assumption that all three BPMs have the same resolution, $\sigma_{\mathrm{res.}}$, the resolution is derived from the standard deviation of the distribution of residuals measured over a batch of sequential beam pulses:
$$\sigma_{\mathrm{res.}} = \frac{\mathrm{std}\left(y_i^{\mathrm{meas}} - y_i^{\mathrm{pred}}\right)}{\sqrt{1 + A_{ij}^2 + A_{ik}^2}}. \qquad (12)$$
Detailed studies of the experimental setup to optimize the resolution, including the BPM alignment procedure, are given in [31]. For a data set with bunch charge $0.5 \times 10^{10}$ e, Fig. 11 shows the resolution as a function of the number of I and Q samples integrated in real time for the position calculation. It can be seen that the resolution improves from 41 nm (single sample) to an optimal value of 19 nm with 11 samples. No improvement is seen by integrating additional later samples as the BPM waveforms have decayed and the signal levels are low.
As a cross-check, an alternative, 'fitting', method was employed. Here the coefficients $A_{ij}$ and $A_{ik}$ (Eq. 10) are fitted to the measured position data set so as to minimise empirically the resolution (Eq. 12). The fitting method may be applied separately to each of the three BPMs, giving three correlated estimates of the resolution. Were the resolution effectively degraded via the influence of uncontrolled correlated parameters, the empirical fit could yield an improvement over the geometric method [31]. The fitted resolution results for the same data set are also given in Fig. 11; the results are in good agreement with the geometric method and confirm that the real-time resolution of 19 nm is the best that could be obtained for these BPMs with the given beam conditions. The resolution results are summarised in Table III.
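The geometric three-BPM method can be illustrated with simulated data. The sketch below assumes straight-line trajectories, equal and uncorrelated Gaussian noise at each BPM, and the normalisation factor sqrt(1 + A_ij^2 + A_ik^2) that follows from that assumption; the BPM positions, jitter amplitudes and function names are hypothetical and do not correspond to the ATF2 layout or data.

```python
import math
import random
import statistics


def geometric_coefficients(z_i, z_j, z_k):
    """Coefficients that interpolate a straight line at z_i from z_j and z_k."""
    span = z_k - z_j
    return (z_k - z_i) / span, (z_i - z_j) / span


def estimate_resolution(y_i, y_j, y_k, a_ij, a_ik):
    """Per-BPM resolution from residuals, assuming equal resolution at all BPMs."""
    residuals = [yi - (a_ij * yj + a_ik * yk) for yi, yj, yk in zip(y_i, y_j, y_k)]
    return statistics.stdev(residuals) / math.sqrt(1.0 + a_ij**2 + a_ik**2)


def simulate(n_pulses, sigma_res_nm, z_positions, seed=1):
    """Straight-line trajectories with independent noise added at each BPM."""
    rng = random.Random(seed)
    data = tuple([] for _ in z_positions)
    for _ in range(n_pulses):
        offset = rng.gauss(0.0, 100.0)   # hypothetical position jitter (nm)
        slope = rng.gauss(0.0, 50.0)     # hypothetical angle jitter (nm per unit z)
        for idx, z in enumerate(z_positions):
            data[idx].append(offset + slope * z + rng.gauss(0.0, sigma_res_nm))
    return data


if __name__ == "__main__":
    z = (0.0, 0.32, 1.0)                 # hypothetical BPM positions, arbitrary units
    y_a, y_b, y_c = simulate(20000, 20.0, z)
    a_ba, a_bc = geometric_coefficients(z[1], z[0], z[2])
    print(f"estimated resolution: {estimate_resolution(y_b, y_a, y_c, a_ba, a_bc):.1f} nm")
```

Because the beam jitter cancels in the residual, the estimate recovers the injected 20 nm noise independently of the assumed position and angle jitter.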

A. System design
The high-resolution real-time vertical beam position information from the cavity BPM system was used as input to a closed-loop feedback. For two-bunch trains (see Section V A) the position of the first bunch was measured and used to correct the position of the second bunch. Two feedback operating modes were used, represented functionally in Fig. 12. In single-BPM mode (Fig. 12(a)) the position signal from one BPM was used to derive the correction signal supplied to the kicker IPK, such that the vertical beam position was stabilised at the chosen BPM. For this mode the IP was moved longitudinally from the nominal IP to the center of the chosen BPM so as to directly stabilise the vertical position there. In two-BPM mode (Fig. 12(b)) the IP was placed longitudinally at one BPM; the position signals from the other two BPMs were used to derive a correction signal such that the nano-beam was stabilised vertically at the chosen BPM, which hence served as an independent witness of both the corrected and uncorrected beam positions. Four multiplexers within the firmware allow selection among the three BPMs for their input either individually (mode (a)) or as a pair (mode (b)).
For both modes the correction signal to the kicker is output from the FONT5A board (Fig. 7) via a Linear Technology 14-bit LTC2624 [32] digital-to-analogue converter (DAC).

B. Kicker and kicker amplifier
The correction signal from the FONT5A board requires amplifying before it can be used to drive the kicker (see Fig. 12). The stripline kicker (Fig. 13) is a modified stripline BPM [9] and consists of two conducting strips, ∼12.5 cm in length and separated by 24 mm, at the top and bottom of the inside of the beam-pipe. The custom-made kicker amplifier (see e.g. [27]) was manufactured by TMD Technologies Ltd [15]. In order to meet the low-latency requirements the amplifier was designed with a fast rise time of 35 ns to reach 90% of the peak output. The amplifier is capable of providing a drive current of ±30 A.

C. Feedback calculation
For either feedback mode, the signal sent to the kicker, V (DAC counts), is derived from the measured position offset at the chosen BPM(s) as
$$V = -\frac{G}{M}\,y + c, \qquad (13)$$
where G is the feedback gain, M (µm/DAC counts) is the kicker response calibration constant, and c is an arbitrary offset. If the beam is being stabilized at a location in between BPMs, y refers to the interpolated position. For the case in which the vertical positions of bunches 1 and 2 are 100% correlated, optimal stabilization of bunch-2 is obtained with G set to unity. For the case of uncorrelated position components G can be adjusted empirically so as to achieve optimal stabilization of bunch-2. The value of c can be controlled via the firmware settings so as to place the stabilised bunch-2 at any desired vertical position within the feedback dynamic range. The firmware is designed so that the kicker drive signal is output at the same time relative to the beam arrival regardless of the number of samples integrated in the digital signal processing, up to a maximum of 15 samples. The firmware is also set up to allow a selectable constant kicker drive signal from the DAC with the same timing structure as for a real feedback pulse; this feature is used to evaluate M directly by measuring the response of the beam as a function of the kicker drive signal.
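The feedback calculation can be illustrated with a short numerical sketch. It assumes a negative-feedback sign convention, V = -(G/M) y + c, with M defined as the beam displacement per DAC count; the sign convention, the function names and the value of M used below are assumptions made for illustration.

```python
def kicker_drive_counts(y_um, gain, m_um_per_count, offset_counts=0.0):
    """Kicker drive (DAC counts) for a measured bunch-1 offset y_um (assumed sign)."""
    return -gain * y_um / m_um_per_count + offset_counts


def corrected_position_um(y2_um, drive_counts, m_um_per_count):
    """Bunch-2 position after the kick, for a linear kicker response M."""
    return y2_um + m_um_per_count * drive_counts


if __name__ == "__main__":
    m = 1.0e-4   # hypothetical kicker calibration, um per DAC count
    drive = kicker_drive_counts(y_um=0.10, gain=0.8, m_um_per_count=m)
    print(f"drive = {drive:.0f} DAC counts")
    print(f"corrected bunch-2 position = {corrected_position_um(0.10, drive, m):.3f} um")
```

For fully correlated bunches with equal offsets, a gain of 0.8 removes 80% of the offset in this model, and a gain of unity removes it entirely.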

D. Latency Measurement
The closed-loop feedback latency is defined as the time interval between bunch-1 passing through the longitudinal center of IPK and the derived kicker correction pulse (for bunch-2) reaching 90% of its final output value. The latency was measured directly with the beam by adding a controlled delay to a constant kicker drive signal (of 2000 DAC counts) and measuring the resulting position deflection of the second bunch. The principle is illustrated in Fig. 14. For large added delay (small ∆t) the kick arrives too late and bunch-2 is undeflected. For small added delay (large ∆t) the kick arrives in time to fully deflect bunch-2. When the kick arrives in time to kick the bunch by 90% of the maximum value, then ∆t is equal to the latency. Sequential triggers were toggled between feedback 'off' and 'on' to allow running baseline subtraction. Fig. 15 shows the beam deflection as a function of ∆t from which the latency is measured to be 83 samples, i.e. 232 ns.
FIG. 14. Schematic illustrating the principle of the direct latency measurement by adding a controlled delay to the kicker drive output signal.
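The extraction of the latency from such a delay scan can be sketched as a threshold-crossing interpolation. The example below assumes a positive deflection that rises monotonically with ∆t and takes the plateau as the largest measured deflection; the scan values are invented for illustration and are not the data of Fig. 15.

```python
CLOCK_HZ = 357e6   # fast clock quoted in the text


def samples_to_ns(n_samples, clock_hz=CLOCK_HZ):
    """Convert a number of clock samples to nanoseconds."""
    return n_samples * 1e9 / clock_hz


def crossing_time(delta_t_ns, deflection, fraction=0.9):
    """Linearly interpolate the delta_t at which the deflection reaches
    `fraction` of its plateau (taken here as the maximum deflection)."""
    threshold = fraction * max(deflection)
    points = sorted(zip(delta_t_ns, deflection))
    for (t0, d0), (t1, d1) in zip(points, points[1:]):
        if d0 < threshold <= d1:
            return t0 + (t1 - t0) * (threshold - d0) / (d1 - d0)
    raise ValueError("deflection never crosses the requested fraction")


if __name__ == "__main__":
    scan_dt = [200.0, 210.0, 220.0, 230.0, 240.0, 250.0]   # hypothetical scan (ns)
    scan_defl = [0.0, 0.0, 0.4, 0.8, 1.0, 1.0]             # hypothetical, normalised
    print(f"90% crossing at {crossing_time(scan_dt, scan_defl):.1f} ns")
    print(f"83 samples = {samples_to_ns(83):.1f} ns")
```

The second line of output reproduces the conversion used in the text: 83 samples at 357 MHz correspond to approximately 232 ns.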

A. Accelerator and feedback setup
For the operation of the IP bunch-by-bunch feedback system, the ATF DR was configured to deliver two-bunch trains to ATF2 with a bunch separation of 280 ns. The train repetition rate was 1.56 Hz. This setup provides a high degree of correlation between the vertical positions of the two bunches in each train [20], which yields the conditions for optimal feedback performance in stabilizing the second bunch.
The limited dynamic range of the dipole BPMs for optimal resolution necessitates both their good transverse centering w.r.t. the beam trajectory and, ideally, small beam jitter at each BPM. Each final-focus-system quadrupole is mounted on transverse movers, which allows for adjustments to both the incoming beam position and angle. In particular, moving QD0FF (see Figure 2) vertically adjusts the vertical IP position, while moving QF1FF (or the upstream QF7FF) adjusts the vertical beam incoming angle [33,34]. The beam trajectory is first globally aligned with the electrical centers of the IP BPMs, and fine adjustments are then made to center each BPM w.r.t. the beam by using the BPM movers (Fig. 5).
For single-BPM feedback ( Fig. 12(a)), small beam jitter is achieved by setting the IP at the longitudinal center of the feedback BPM [35]. For two-BPM feedback ( Fig. 12(b)) the situation is more difficult as the extreme IP angular divergence produces increasingly large beam jitter as longitudinal distance from the IP increases. With the nominal optics configuration the jitter at the feedback BPMs can exceed the dynamic range for best resolution. Therefore, for two-BPM feedback operation an optics configuration with a reduced angular divergence at the IP was used. This yields a reduced beam jitter at the feedback BPMs, although at the expense of increasing the IP beam jitter. These optics are designed such that the ATF2 beamline has the same magnet strength as for the nominal optics except within the matching section. With these optics, the vertical β-function at the IP is 12 cm.
The BPMs were set up for optimal performance, and calibrated, as described in Section II D. In order to make a direct comparison between the data with feedback 'on' and 'off' within a given dataset, the feedback was toggled between on and off on alternate bunch trains.

B. Single-BPM IP Feedback Results
Single-BPM feedback was operated with a bunch charge of $0.8 \times 10^{10}$ e$^-$, with the IP set at IPC. The feedback gain was set to 0.8 to account for the imperfect bunch-to-bunch position correlation, as determined from correlation measurements taken at the start of the shift (Table IV). Further analysis has suggested, however, that the correlation decreased during the shift. A 10-sample integration window was found empirically to optimize the resolution. The feedback performance is illustrated in Fig. 16 and summarised in Table IV, where we compare feedback-on and feedback-off results. Since bunch-1 provides the input to the feedback its position is unaffected by the correction. By contrast, the bunch-2 mean position is zeroed by the feedback and its jitter is substantially reduced, from 119 nm to 50 nm. The same dataset is used in Fig. 17, which shows the effect of the feedback on the bunch-to-bunch correlation as well as on the time-sequence of the bunch-2 position.
The expected level of beam stabilization can be computed from the bunch jitter and the incoming bunch-to-bunch correlation. The corrected bunch-2 position, $Y_2$, in terms of the uncorrected bunch-1 and bunch-2 positions, $y_1$ and $y_2$, respectively, is
$$Y_2 = y_2 - G\,y_1. \qquad (14)$$
Taking the variance of Eq. 14 gives
$$\sigma_{Y_2}^2 = \sigma_{y_2}^2 + G^2\sigma_{y_1}^2 - 2G\rho_{12}\,\sigma_{y_1}\sigma_{y_2}, \qquad (15)$$
where $\rho_{12}$ is the bunch-to-bunch correlation and $\sigma_{Y_2}$, $\sigma_{y_1}$ and $\sigma_{y_2}$ represent the jitters on positions $Y_2$, $y_1$ and $y_2$, respectively. The measured incoming position correlation between bunches 1 and 2 (feedback off) is about 85% (Table IV); hence, from Eq. 15, the expected feedback-corrected jitter for bunch-2 is 65 ± 11 nm, which is in reasonable agreement with the measured performance. With feedback on, the measured correlation between bunches 1 and 2 is $-26.0^{+9.8}_{-8.8}$% (Table IV), which implies a slight over-correction. This naively suggests that an improved feedback-corrected jitter would have been possible with a slightly lower gain.
Limited beam operation availability at the facility did not allow this to be verified at the time, but it could be investigated in future beam studies.
The theoretically optimum performance is obtained for 100% correlation between bunches 1 and 2, i.e. $\rho_{12} = 1$, comparable bunch jitters, $\sigma_{y_1} = \sigma_{y_2}$, and feedback gain G = 1. With these conditions fulfilled, the ultimate limit to stabilization is determined by the BPM resolution, $\sigma_{\mathrm{res.}}$:
$$\sigma_{Y_2} = \sqrt{2}\,\sigma_{\mathrm{res.}}. \qquad (16)$$
For a real-time BPM resolution of c. 19 nm (Section III) the ultimate feedback performance in single-BPM mode (Eq. 16) would hence be stabilization of bunch-2 to c. 27 nm, so there is in principle still a margin for improvement of the feedback performance reported here, subject to improved beam conditions.
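The expected corrected jitter and the resolution limit can be evaluated numerically. The sketch below assumes the variance relation for a correction of the form Y2 = y2 - G y1 and, because Table IV is not reproduced here, takes the bunch-1 jitter to be equal to the quoted bunch-2 jitter of 119 nm; that equality is an assumption, so the result is indicative only. The gain that minimises the variance, G = ρ σ_y2/σ_y1, is included as a standard consequence of the same relation and is not a result quoted in the text.

```python
import math


def corrected_jitter(sigma_y1, sigma_y2, rho, gain):
    """Jitter of Y2 = y2 - G*y1 for input jitters sigma_y1, sigma_y2 and correlation rho."""
    variance = (sigma_y2**2 + (gain * sigma_y1)**2
                - 2.0 * gain * rho * sigma_y1 * sigma_y2)
    return math.sqrt(max(variance, 0.0))


def optimal_gain(sigma_y1, sigma_y2, rho):
    """Gain that minimises the corrected variance in this model."""
    return rho * sigma_y2 / sigma_y1


def resolution_limit(sigma_res):
    """Stabilization limit for G = 1, rho = 1 and equal jitters: sqrt(2) * sigma_res."""
    return math.sqrt(2.0) * sigma_res


if __name__ == "__main__":
    # sigma_y1 = sigma_y2 = 119 nm is an assumption; rho and G are quoted in the text.
    print(f"expected jitter: {corrected_jitter(119.0, 119.0, 0.85, 0.8):.0f} nm")
    print(f"gain minimising the jitter: {optimal_gain(119.0, 119.0, 0.85):.2f}")
    print(f"resolution limit for 19 nm: {resolution_limit(19.0):.0f} nm")
```

Under the equal-jitter assumption the expected corrected jitter is about 63 nm, compatible with the 65 ± 11 nm quoted above, and the resolution limit for 19 nm is about 27 nm.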

C. Two-BPM IP Feedback Results
Two-BPM feedback was operated with a bunch charge of $0.5 \times 10^{10}$ e$^-$, with the IP set at IPB and with IPA and IPC used as inputs to the feedback; hence IPB was used as an independent witness of the feedback performance. The longitudinal separations of IPA and IPC from the beam waist yield much larger position signal levels and higher signal-to-noise ratios. The sample window was chosen empirically to optimize the resolution, here with a measured resolution of ∼31.2 nm, for a five-sample window. Feedback was operated with a gain of 0.8 to account for the differences in the position jitter between the two bunches and the imperfect bunch-to-bunch position correlation (Table V).
The feedback performance is illustrated in Fig. 18 and summarised in Table V, where we compare feedback-on and feedback-off results. Since bunch-1 provides the input to the feedback its position is unaffected by the correction. By contrast, the bunch-2 jitter is substantially reduced, from 96 nm to 41 nm. The measured incoming position correlation between bunches 1 and 2 (feedback off) is about 92% (Table V); hence, from Eq. 15, the expected feedback-corrected jitter for bunch-2 is 40±11 nm, which is in excellent agreement with the measured value. In Fig. 18 it can be seen that the mean corrected bunch-2 position was at ∼0.5 µm, which simply arises from the residual relative transverse offsets between IPB and IPA/IPC; if desired this offset can trivially be removed with a compensating constant offset term, shown as c in the feedback algorithm (Eq. 13).
With feedback on the measured correlation between bunches 1 and 2 is about 41% (Table V), which implies an under-correction. This suggests that an improved feedback performance would have been possible with a higher gain. As previously noted, at the time, beam operation availability was limited and this could not be verified but it could be confirmed with further beam studies.
In this feedback mode, with stabilization at IPB, the feedback BPMs, IPA and IPC, contribute position information in the ratio 32:68, determined by their relative distances from IPB (Fig. 5). Hence, the theoretically best-possible resolution on the corrected beam position at IPB is given by:
$$\sigma_{\mathrm{IPB}} = \sqrt{0.32^2\,\sigma_{\mathrm{res.}}^2 + 0.68^2\,\sigma_{\mathrm{res.}}^2} = 0.75\,\sigma_{\mathrm{res.}};$$
i.e. beam stabilization at IPB as low as c. 23 nm would have been achievable in principle given the measured resolution of 31 nm. Correspondingly, with the best achieved resolution of 19 nm, and a perfect feedback correction, stabilization to 15 nm would be theoretically possible. Hence there is still a margin for improvement of the feedback performance reported here, subject to improved beam conditions.
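The weighting argument above amounts to adding the two BPM resolutions in quadrature with the interpolation weights. A minimal sketch, assuming uncorrelated and equal resolutions at the two feedback BPMs (the function name is illustrative):

```python
import math


def interpolated_resolution(weights, sigma_res):
    """Resolution of a weighted sum of positions with equal, uncorrelated resolution."""
    return sigma_res * math.sqrt(sum(w * w for w in weights))


if __name__ == "__main__":
    weights = (0.32, 0.68)   # interpolation weights quoted in the text
    print(f"scale factor: {interpolated_resolution(weights, 1.0):.3f}")
    print(f"for 31 nm resolution: {interpolated_resolution(weights, 31.0):.1f} nm")
    print(f"for 19 nm resolution: {interpolated_resolution(weights, 19.0):.1f} nm")
```

The scale factor evaluates to about 0.75, giving roughly 23 nm for a 31 nm resolution and roughly 14 nm for 19 nm, close to the c. 15 nm quoted above.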

VI. SUMMARY AND CONCLUSIONS
We have reported the design, operation and performance of a high-resolution, low-latency, bunch-by-bunch feedback system for beam stabilization. The system includes high-resolution cavity BPMs, a two-stage analogue signal down-mixing system, and a digital board incorporating an FPGA. The FPGA firmware allows for the real-time integration of up to fifteen samples of the BPM waveforms so that feedback can be performed within a latency of 232 ns. We have shown that this real-time sample integration improves the beam position resolution, with measured resolutions as good as 19 nm, which consequently improves the feedback performance.
In [13] results were reported using similar cavity BPMs, but with a higher design quality factor: data were recorded and the resolution was determined in a subsequent offline analysis using a function that included 10 free parameters to account for uncontrolled effects; a resolution of ∼9 nm was thereby obtained. In addition, the position-calibration constant was not measured directly at the most sensitive resolution setting, but was interpolated from measurements made with added signal attenuation, at lower position sensitivity. Furthermore, no attention was paid to signal processing latency as the BPMs were not used for bunch-by-bunch feedback.
We have made several significant advances since this earlier study: 1) the BPMs were calibrated w.r.t. position directly at the most sensitive resolution setting and the respective calibration factors were applied in the subsequent BPM operations; 2) the signal processing was done in real-time and with low latency, so as to permit the BPMs to be used for bunch-by-bunch feedback; 3) the resolution was measured directly, in real-time, without fitting any extra parameters. The high BPM resolution was hence utilised directly for stabilization of the beam, and is not merely an impressive offline performance figure of merit.
The feedback was operated in two complementary modes to stabilise the vertical position of the ultra-small beam produced at the focal point of the ATF2 beamline at KEK. In single-BPM feedback mode, beam stabilization to 50 ± 5 nm was demonstrated. In two-BPM feedback mode, beam stabilization to 41 ± 4 nm was achieved, in good agreement with the predicted value, given the incoming beam conditions, of 40 nm.
Some margin remains to improve the feedback performance by increasing the degree of bunch-to-bunch position correlation in the incoming beam, and suitably optimising the gain. For the best achieved position resolution to date, and for 100% bunch-to-bunch correlation, an ultimate beam stabilization to about 15 nm is in principle achievable with the current hardware. Should ATF/ATF2 beam operations resume, this will be the subject of future feedback studies.