Multimode capacity of atomic-frequency comb quantum memories

Ensemble-based quantum memories are key to developing multiplexed quantum repeaters, able to overcome the intrinsic rate limitation imposed by finite communication times over long distances. Rare-earth ion doped crystals are main candidates for highly multimode quantum memories, where time, frequency and spatial multiplexing can be exploited to store multiple modes. In this context the atomic frequency comb (AFC) quantum memory provides large temporal multimode capacity, which can readily be combined with multiplexing in frequency and space. In this article, we derive theoretical formulas for quantifying the temporal multimode capacity of AFC-based memories, for both optical memories with fixed storage time and spin-wave memories with longer storage times and on-demand read out. The temporal multimode capacity is expressed in terms of key memory parameters, such as AFC bandwidth, fixed-delay storage time, memory efficiency, and control field Rabi frequency. Current experiments in europium- and praseodymium-doped Y$_2$SiO$_5$ are analyzed within this theoretical framework, and prospects for higher temporal capacity in these materials are considered. In addition, we consider the possibility of spectral and spatial multiplexing to further increase the mode capacity, with examples given for both rare-earth ions.


Introduction
arXiv:2202.12383v1 [quant-ph] 24 Feb 2022
The realization of quantum networks relies on the distribution of entanglement over remote quantum nodes using photons. In ground-based networks the photons travel between the nodes in optical fibers, which causes the entanglement rate to decrease exponentially with the distance. The loss is due to optical fiber attenuation, which limits ground-based and repeater-less entanglement distribution schemes to a few hundred km [1], while still allowing quantum key distribution using weak coherent states up to about 600 km [2][3][4][5][6].
To overcome this limitation, quantum repeaters have been proposed [7][8][9][10][11]. Near-term quantum repeaters are based on creating heralded entanglement within elementary links. These are the individual segments into which the network branch is divided. Each elementary link has two so-called "quantum nodes", with the ability to generate entanglement and store a part of it in a quantum memory, while the other part is used to perform entanglement swapping operations with neighboring links. Repeating this swapping operation through the whole chain of elementary links will eventually lead to entanglement between the two end nodes of the network branch. Heralding and storing the entanglement between remote quantum memories in an elementary link is the key to the sub-exponential scaling of entanglement distribution rate with distance in quantum repeaters.
The generation of remote heralded entanglement usually relies on a measurement-induced process that requires the detection of photons, generated by the quantum nodes, at a central station located between the two nodes. These photonic modes, each entangled with the node that stores the other part of the entangled state, are mixed at a beam splitter (BS). If the modes are indistinguishable, the BS erases the information about their origin, so that the detection of photons after the BS projects the two quantum nodes onto an entangled state. The heralding of the entanglement then requires photonic modes traveling from the quantum nodes to the central station and the result of the photon detection traveling back to the nodes, i.e., a two-way communication. If the two quantum memories can store only a single mode and are connected by a communication channel of length $L_0$ and refractive index $n$, the duration of an entanglement creation trial is bounded by the communication time $\tau_{\mathrm{comm}} = n L_0/c$, such that the overall repetition rate of the entanglement generation in one link is limited to $R = 1/\tau_{\mathrm{comm}}$ [9]. For long distances, $R$ decreases significantly. For example, for two single-mode quantum memories separated by 100 km of optical fiber the repetition rate of the entanglement generation trials is limited to $R = 1/\tau_{\mathrm{comm}} = 2$ kHz, which seriously constrains the achievable entanglement rate.
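The rate limit quoted above can be checked with a few lines of Python. This is only a sketch: the refractive index of 1.5 is an assumption (the text quotes 2 kHz for 100 km without stating the index), and the function names are illustrative. The `n_modes` argument anticipates the first-order gain from multiplexing.

```python
# Speed of light in vacuum (m/s).
C_VACUUM = 299_792_458.0

def communication_time(link_length_m, refractive_index=1.5):
    """tau_comm = n * L0 / c for one elementary link (n = 1.5 is assumed)."""
    return refractive_index * link_length_m / C_VACUUM

def trial_rate(link_length_m, n_modes=1, refractive_index=1.5):
    """Entanglement-trial rate: n_modes trials per communication time."""
    return n_modes / communication_time(link_length_m, refractive_index)

single_mode = trial_rate(100e3)             # 100 km link, one mode
multimode = trial_rate(100e3, n_modes=100)  # same link, 100 stored modes
print(f"{single_mode:.0f} Hz, {multimode:.0f} Hz")
```

With these assumptions the single-mode rate comes out close to 2 kHz, and the rate scales linearly with the number of stored modes.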
The entanglement heralding rate can be significantly increased by the use of so-called multimode memories, which allow for multiplexing of the entanglement generation. A multimode quantum memory allows for the storage of various photonic modes in different degrees of freedom, e.g., temporal, frequency or spatial modes. By using a multimode memory that is able to store $N$ modes it is possible to perform $N$ entanglement creation trials during a communication time, therefore increasing the entanglement generation rate by a factor $N$, to first order [9]. Moreover, the increase of entanglement rate has the beneficial side effect of relaxing the requirement on the storage time of the quantum memories [12].
Quantum memories based on ensembles of atoms are well suited for developing multimode quantum memories. Cold atomic clouds have been so far mostly used to investigate spatially multimode memories [13][14][15][16], but time multiplexing has been demonstrated as well [17]. Rare-earth doped crystals are particularly promising for the realization of massively multiplexed quantum memories. Their static inhomogeneous broadening can be used as a resource for time and frequency multiplexing, a precious capability that could be combined with spatial multiplexing. Among the many protocols that have been proposed to store photonic qubits in rare-earth doped crystals, the atomic frequency comb scheme [18] is naturally suited for temporal multiplexing [18,19]. Temporal multiplexing is particularly attractive because it can be used in a simple manner in quantum repeater architecture, just by detecting the arrival time of the photon using a single detector. Following the first demonstration of an AFC memory [20], several proof-of-principle demonstrations have been performed with photonic qubits and single photons enabling light-matter [21][22][23] and matter-matter entanglement [24][25][26][27]. AFC spin-wave memories with on-demand read-out have also been demonstrated for photonic qubits [28,29] and single photons [23,30]. Temporal multimodality has been shown in several experiments [31][32][33][34]. Spectral multiplexing has also been achieved by creating several AFCs at different frequencies within the inhomogeneous broadening [35,36].
In this paper, we analyze in detail the multimode capacity of AFC quantum memories in rare-earth (RE) doped crystals, taking into account realistic parameters. We develop a model to infer the maximum number of temporal modes for a given efficiency as a function of the coherence of the optical transition, the bandwidth of the AFC and the Rabi frequency of the control pulses in case of spin-wave storage. We also estimate the maximal number of spectral and spatial modes that could be stored using realistic parameters. In addition, we provide experimental demonstrations of multimode storage in Eu$^{3+}$:Y$_2$SiO$_5$ and Pr$^{3+}$:Y$_2$SiO$_5$, where we report the largest number of temporal modes stored both in the optical and spin transition to date.

Atomic frequency comb quantum memories
Atomic frequency comb (AFC) quantum memories [18] are based on a periodic atomic absorption profile in the form of a comb structure, with a given periodicity $\Delta$ and total bandwidth $\Gamma$, see Figure 1(a). The comb can be created on an inhomogeneously broadened optical transition $|g\rangle$-$|e\rangle$ through the use of spectral hole burning techniques, i.e. by frequency-selective optical pumping. RE-doped crystals are ideal systems in this respect, given their large, static inhomogeneous broadening, narrow homogeneous broadening and low spectral diffusion [37]. To efficiently burn the periodic structure while maintaining high spectral resolution one can use optimized optical pumping sequences based on multi-frequency adiabatic pulses [32].
Input pulses absorbed by the comb on the $|g\rangle$-$|e\rangle$ transition result in AFC echoes after a fixed-delay storage time of $1/\Delta$, owing to the transient response of the medium, see Figure 1(b). If the minimum input pulse duration set by the AFC bandwidth is much shorter than the fixed storage time $1/\Delta$, then temporal multimode storage is possible. Let us consider that a smooth temporal mode occupies a truncated (cut-off) duration of $T_m$; the temporal multimode capacity is then given by $N_t = 1/(\Delta T_m)$ ‡. In this article we show that the multimode capacity is related to the mode size $T_m$ and the bandwidth $\Gamma$, while the exact shape and full-width at half-maximum (FWHM) of the mode are of less importance provided its total energy is mostly contained within the cut-off duration $T_m$. A detailed analysis assuming Gaussian intensity mode profiles will be given. We will also consider the impact of the optical coherence time $T_2$ (between $|g\rangle$ and $|e\rangle$) on the temporal multimode capacity.

Figure 1. (a) The AFC fixed-delay memory is based on the absorption and reemission of an echo by a comb structure of periodicity $\Delta$ and bandwidth $\Gamma$ written into an inhomogeneous absorption profile. The AFC spin-wave memory is based on a coherent and reversible transfer to a second ground state, allowing on-demand read out and long storage times. (b) Time sequence of an AFC echo memory with a fixed delay of $1/\Delta$. The input consists of a train of modes with mode duration $T_m$, each containing pulses with a FWHM of duration $T$ in intensity. Note that the train can be a sequence of pulses with random intensities, but with otherwise identical properties. (c) In the case of an AFC spin-wave memory the control pulse occupies a bin size of $T_c$, such that the total duration in which input modes can be defined is now reduced to $1/\Delta - T_c$. In this illustration the control pulse consumes exactly two input mode bins. In this article smooth adiabatic pulses with a constant intensity duration of $T_s$ are treated analytically, see text for details. The total spin-wave storage time is $T_{\mathrm{spin}}$.
On-demand read-out and longer storage times can be achieved through the reversible transfer of the optical excitation to a second ground state $|s\rangle$ using two control fields, see Figure 1(c). The storage time $T_{\mathrm{spin}}$ in the spin state is limited by dephasing due to inhomogeneous spin broadening and spectral diffusion. By applying a spin echo sequence the storage time can be extended to the regime of seconds, as demonstrated when storing strong laser pulses [38][39][40]. In the quantum regime one must deal with technical noise generated by the imperfect spin echo sequences, where up to 100 ms of spin-storage time has been achieved so far [41].

‡ Formulas given throughout the article for calculating mode numbers $N_t$ will generate non-integer numbers in general. Mathematically the maximum mode number should be the integer part of $N_t$, while in practice if $N_t$ is close to the next-highest integer another mode can be stored with negligible loss of efficiency.
Efficient mapping to and from the $|s\rangle$ state requires high transfer probability over the entire bandwidth $\Gamma$ of the AFC. This is a challenge with RE ions due to their low optical oscillator strengths [37]. To circumvent this one can use adiabatic, frequency-chirped pulses [42][43][44] having a flat transfer profile over $\Gamma$. However, these pulses are much longer than the $\pi$ pulse duration set by the optical Rabi frequency $\Omega$. If the cut-off duration assigned to a single control pulse is $T_c$ (assuming two identical control pulses), then as a consequence the temporal multimode capacity is reduced to $N_t^{\mathrm{sw}} = (1/\Delta - T_c)/T_m$. In this article we will calculate the required duration $T_c$ assuming a specific adiabatic pulse proposed by Tian et al. [45], which is particularly efficient given a constraint on the total cut-off duration $T_c$.
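The two counting rules introduced in this section, $N_t = 1/(\Delta T_m)$ and $N_t^{\mathrm{sw}} = (1/\Delta - T_c)/T_m$, can be written as a minimal Python sketch. The function names and the numerical values are illustrative assumptions, not part of the original analysis.

```python
def fixed_delay_modes(storage_time_s, mode_size_s):
    """N_t = 1 / (Delta * T_m), with storage_time_s = 1 / Delta."""
    return storage_time_s / mode_size_s

def spin_wave_modes(storage_time_s, control_bin_s, mode_size_s):
    """N_t^sw = (1/Delta - T_c) / T_m: the control-pulse bin T_c
    is no longer available for input modes."""
    return (storage_time_s - control_bin_s) / mode_size_s

# Illustrative values (assumed): 1/Delta = 25 us, T_c = 5 us, T_m = 625 ns.
n_fixed = fixed_delay_modes(25e-6, 625e-9)
n_spin = spin_wave_modes(25e-6, 5e-6, 625e-9)
print(f"fixed delay: {n_fixed:.1f} modes, spin wave: {n_spin:.1f} modes")
```

In this example the control pulse costs $T_c/T_m = 8$ mode bins. As noted in the footnote, the results are generally non-integer and the integer part is the strict capacity.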
3. Theoretical temporal multimode capacity
3.1. Capacity limit of the AFC fixed-delay memory due to Nyquist-Shannon sampling theorem
The preservation of temporal information is closely related to the Nyquist-Shannon sampling theorem. As argued by Shannon [46], a temporal signal $f(t)$ that contains a maximum frequency $W$ has its Fourier spectrum within the frequency range $-W$ to $W$, leading to a minimum required sampling rate of $2W$, i.e. samples spaced $1/(2W)$ apart. In our case information is encoded into a train of input modes separated by $T_m$ in time, meaning the maximum frequency of the time signal is $W = 1/T_m$ and it requires a minimum frequency range of $2W = 2/T_m$ to be accurately captured.
This point can be illustrated by looking at the Fourier transform of a train of pulses of equal amplitude. In Fig. 2 the Fourier spectrum of a sequence of 5 modes with $T_m = 1$ µs is shown. The mode spacing of $T_m$ results in the two Fourier peaks at $\pm W = \pm 1$ MHz. To preserve the information encoded at the modulation frequency $W$, one must clearly have a total bandwidth of at least $\Gamma = 2W$, which encapsulates the Nyquist-Shannon sampling theorem. Note that a sequence of modes with random amplitude and phase modulation has its Fourier information encoded within the Nyquist limits. Hence, there is a strict minimum mode size $T_m$ imposed by the AFC bandwidth $\Gamma$, independently of the exact temporal shape of the mode inside the interval $T_m$.
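The position of the modulation peak can be checked numerically. The sketch below evaluates the power spectrum of a train of 5 equal Gaussian pulses analytically and searches for its maximum around $1/T_m$. The Gaussian pulse shape, the FWHM of $T_m/2.38$ and the assumption of transform-limited pulses are illustrative choices, not taken from Fig. 2.

```python
import cmath
import math

def train_power_spectrum(freq_hz, n_modes, mode_size_s, fwhm_s):
    """Power spectrum of n_modes equal Gaussian pulses spaced by mode_size_s."""
    # Interference term of the pulse train: peaks at multiples of 1/T_m.
    comb = sum(cmath.exp(-2j * math.pi * freq_hz * k * mode_size_s)
               for k in range(n_modes))
    # Power spectrum of one transform-limited Gaussian pulse whose
    # intensity FWHM is fwhm_s.
    envelope = math.exp(-(math.pi * freq_hz * fwhm_s) ** 2 / math.log(2))
    return abs(comb) ** 2 * envelope

T_M = 1e-6           # mode size (s)
FWHM = T_M / 2.38    # assumed intensity FWHM of each pulse (s)

# Scan 0.5 MHz to 1.5 MHz in 1 kHz steps and locate the strongest peak.
freqs = [0.5e6 + 1e3 * i for i in range(1001)]
powers = [train_power_spectrum(f, 5, T_M, FWHM) for f in freqs]
peak_freq = freqs[powers.index(max(powers))]
print(f"modulation peak near {peak_freq / 1e6:.2f} MHz")
```

The peak sits close to $1/T_m = 1$ MHz. The falling single-pulse envelope pulls it slightly below that value, which is one way to see why a bandwidth somewhat larger than $2/T_m$ is useful for capturing the full width of the peaks.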
Based on this argument we propose the following relation between the AFC memory bandwidth $\Gamma$ (in Hz) and the input mode size $T_m$,

$$\Gamma = \frac{2.5}{T_m}, \tag{1}$$

where we have rather arbitrarily chosen a factor of 2.5 instead of 2 to fully capture the Fourier peaks at $\pm 1/T_m$, whose widths depend on the total duration of the train of pulses. Following this definition, the temporal multimode capacity $N_t$ for an AFC fixed-delay memory is

$$N_t = \frac{1}{\Delta T_m} = \frac{\Gamma}{2.5\,\Delta}. \tag{2}$$

One can also rewrite the formula as $N_t = N_{\mathrm{tooth}}/2.5$, where $N_{\mathrm{tooth}} = \Gamma/\Delta$ is the number of teeth in the AFC, showing that each temporal mode requires about 2.5 additional peaks in the AFC.

The temporal mode capacity does not depend directly on the choice of pulse shape, nor on the FWHM of the pulse within the cut-off duration $T_m$. However, one should clearly also consider the pulse energy contained in the mode, given the mode profile. In general one can optimize the relation between the mode FWHM, $T$, and the mode size $T_m$, where we define $T_m = \kappa T$. In Appendix A the case of a Gaussian mode is treated in detail. It is shown that for $\kappa \approx 2.38$, 99.5% of the energy is contained both in the time cut-off $T_m$ and in the power spectrum cut-off $\Gamma$. This choice is illustrated as the dashed curve in Fig. 2. Other choices of $\kappa$ might increase the energy content in either the time or the frequency domain, at the expense of less energy content in the reciprocal domain. In practice any choice in the range of $\kappa = 2$ to $\kappa = 2\sqrt{2}$ preserves at least 98.1% of the energy in either domain for Gaussian modes.
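The energy fractions quoted for Gaussian modes can be reproduced with the error function. The sketch below assumes the bandwidth relation $\Gamma = 2.5/T_m$, a Gaussian intensity profile of FWHM $T$ and transform-limited pulses, so that the power spectrum is Gaussian with FWHM $2\ln 2/(\pi T)$. The function names are illustrative, and the detailed treatment is the one in Appendix A.

```python
import math

BANDWIDTH_FACTOR = 2.5  # assumed product Gamma * T_m

def mode_size(bandwidth_hz):
    """Mode size T_m for a given AFC bandwidth."""
    return BANDWIDTH_FACTOR / bandwidth_hz

def temporal_capacity(bandwidth_hz, storage_time_s):
    """N_t = Gamma / (2.5 * Delta), with storage_time_s = 1 / Delta."""
    return bandwidth_hz * storage_time_s / BANDWIDTH_FACTOR

def gaussian_fraction(window_over_fwhm):
    """Energy fraction of a Gaussian inside a centered window,
    given the ratio of window width to FWHM."""
    return math.erf(window_over_fwhm * math.sqrt(math.log(2)))

def energy_in_time_window(kappa):
    """Fraction of pulse energy inside T_m = kappa * T."""
    return gaussian_fraction(kappa)

def energy_in_bandwidth(kappa):
    """Fraction of the power spectrum inside Gamma = 2.5 / (kappa * T)."""
    spectral_fwhm_times_t = 2.0 * math.log(2) / math.pi
    return gaussian_fraction(BANDWIDTH_FACTOR / (kappa * spectral_fwhm_times_t))

for kappa in (2.0, 2.38, 2.0 * math.sqrt(2.0)):
    print(f"kappa = {kappa:.2f}: time {energy_in_time_window(kappa):.4f}, "
          f"frequency {energy_in_bandwidth(kappa):.4f}")
```

Increasing $\kappa$ trades energy content in the frequency domain for energy content in the time domain, and the two fractions are equal near $\kappa \approx 2.38$.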

Effects of finite optical coherence time
The AFC echo memory efficiency is ultimately limited by the dephasing caused by decoherence on the optical transition. The optical coherence time, or equivalently the homogeneous linewidth, limits the AFC memory efficiency both in the optical pumping step (the AFC creation step), and during the actual storage time $1/\Delta$. State-of-the-art measurements of the AFC efficiency as a function of the fixed-delay storage time $1/\Delta$ show exponential decays [26,32,47]. In Ref. [32] it was argued, based on Maxwell-Bloch simulations of the AFC preparation step, that ultimately the AFC efficiency is limited by

$$\eta_{T_2} = \exp\left(-\frac{4}{\Delta T_2}\right), \tag{3}$$

where $T_2$ is the optical coherence time of the transition [32]. Note that the relative efficiency $\eta_{T_2}$ only accounts for the loss due to the optical $T_2$ limitation, given the choice of optical storage time $1/\Delta$. See Ref. [32] for a more comprehensive discussion of the total AFC memory efficiency.

Figure 3. [...] The PE data has been corrected for instantaneous spectral diffusion (ISD) in the conventional manner [48]. The PE data by Könz et al. [48] is shown for reference (solid line). The coherence times are also reported in Appendix B.
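As a quick numerical illustration of the $T_2$ limit, the sketch below assumes that the relative efficiency takes the form $\eta_{T_2} = \exp(-4/(\Delta T_2))$, which is consistent with the values quoted later in the article. The function name and the two parameter sets are illustrative.

```python
import math

def relative_efficiency(storage_time_s, t2_s):
    """eta_T2 = exp(-4 / (Delta * T2)), with storage_time_s = 1 / Delta."""
    return math.exp(-4.0 * storage_time_s / t2_s)

# Illustrative parameter sets: (1/Delta, T2) in seconds.
print(round(relative_efficiency(50e-6, 250e-6), 2))  # 0.45
print(round(relative_efficiency(25e-6, 92e-6), 2))   # 0.34
```

The factor of 4 in the exponent means that a storage time of only $T_2/4$ already reduces the efficiency by a factor $1/e$.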
State-of-the-art AFC experiments [26,32,47] so far have not reached the $T_2$ limit given by Eq. (3). To further investigate the effect of $T_2$ we have performed both AFC efficiency decay measurements and photon echo (PE) measurements in $^{151}$Eu:Y$_2$SiO$_5$ as a function of temperature, see Appendix B for experimental details. The $T_2$ data from the PE measurements are taken as the reference, which the effective $T_2$ extracted from the AFC measurements should ideally reach. As shown in Figure 3, the PE and AFC coherence times do indeed converge at temperatures above 6.5 K, supporting the $\eta_{T_2}$ limit introduced in Ref. [32]. The AFC coherence time of $T_2 = 300 \pm 30$ µs obtained at low temperatures is the longest reported AFC coherence time so far. Yet, the PE data results in $T_2 = 707 \pm 204$ µs, which is a significant difference that negatively affects the current temporal multimode capacity in $^{151}$Eu:Y$_2$SiO$_5$. We believe that the lower AFC $T_2$ is due to technical issues, such as laser frequency drifts and/or dephasing induced by vibrations in the employed closed-cycle cryostats [49]. In Figure 3 the temperature dependence of the $T_2$ measured by Könz et al. [48] in an Eu-doped Y$_2$SiO$_5$ sample with natural isotopic abundance is shown as reference, where a slightly longer $T_2$ of 1.27 ms was reached, which we attribute to sample differences. It should also be noted that a coherence time up to 2.6 ms has been measured with PE in an Eu-doped Y$_2$SiO$_5$ sample under a weak magnetic field [50]. Pr-doped Y$_2$SiO$_5$ crystals generally have shorter optical coherence times, reaching 111 µs at 1.4 K at zero magnetic field and 152 µs with a weak magnetic field [51]. The longest measured AFC $T_2$ of $92 \pm 6$ µs is close to the PE coherence time (see Sec. 5.1), lending further support to Eq. (3).
In the following we assume that the loss of efficiency due to the optical coherence time can be modeled with Eq. (3), and we express $\Delta$ as a function of $\eta_{T_2}$ and insert the expression into Eq. (2), which gives a temporal multimode capacity of

$$N_t = \frac{\Gamma T_2}{10} \ln\left(\frac{1}{\eta_{T_2}}\right). \tag{4}$$

The factor $\Gamma T_2$ is the ultimate upper limit of the mode capacity, as it represents the intrinsic time-bandwidth product of the memory. However, in practice only a small fraction of the memory can be exploited if an efficient storage is to be achieved. For $\eta_{T_2} = 0.9$, only about 1% of $\Gamma T_2$ can be used, while for $\eta_{T_2} = 0.5$ that factor goes up to 7% of $\Gamma T_2$ but at the cost of significantly reduced efficiency. Furthermore, it should be pointed out that $\Gamma$ is often limited by hyperfine and/or Zeeman splittings, which are generally much narrower than the entire optical inhomogeneous broadening.

It is instructive to plot a contour map of the AFC echo multimode capacity as a function of optical $T_2$ and relative efficiency $\eta_{T_2}$, as shown in Fig. 4(a) and (b) for a bandwidth of $\Gamma = 5$ MHz and 15 MHz, respectively. These bandwidths are compatible with those achievable in the RE systems $^{151}$Eu:Y$_2$SiO$_5$ and Pr:Y$_2$SiO$_5$ (5 MHz), and $^{153}$Eu:Y$_2$SiO$_5$ (15 MHz), respectively. The plots clearly show that achieving both high multimode capacity and high relative efficiency requires long optical coherence times, given the limitations in bandwidth.
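The trade-off between capacity and relative efficiency can be tabulated directly. The sketch below assumes the capacity formula $N_t = (\Gamma T_2/10)\ln(1/\eta_{T_2})$, obtained by combining $\eta_{T_2} = \exp(-4/(\Delta T_2))$ with $N_t = \Gamma/(2.5\Delta)$. The bandwidth of 5 MHz and the coherence time of 250 µs are illustrative inputs, and the function names are hypothetical.

```python
import math

def usable_fraction(eta_t2):
    """Fraction of the time-bandwidth product Gamma*T2 available
    at relative efficiency eta_t2."""
    return math.log(1.0 / eta_t2) / 10.0

def capacity_at_efficiency(bandwidth_hz, t2_s, eta_t2):
    """N_t = (Gamma * T2 / 10) * ln(1 / eta_T2)."""
    return bandwidth_hz * t2_s * usable_fraction(eta_t2)

for eta in (0.9, 0.8, 0.5):
    n_t = capacity_at_efficiency(5e6, 250e-6, eta)
    print(f"eta_T2 = {eta}: {100 * usable_fraction(eta):.1f}% of Gamma*T2, "
          f"N_t = {n_t:.1f}")
```

Because the dependence on $\eta_{T_2}$ is only logarithmic, the capacity at high relative efficiency is gained far more effectively through a longer $T_2$ or a larger $\Gamma$ than by accepting lower efficiency.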

Multimode capacity of the AFC spin-wave memory
AFC spin-wave memories require efficient transfer of the optical excitation over the entire bandwidth $\Gamma$ of the AFC. This can be achieved by chirped adiabatic control pulses [44]. In NMR research, inversion pulses based on complex hyperbolic secant pulses, or sech pulses, were proposed for selective inversion of a flat frequency spectrum [42]. In the adiabatic regime the sech-pulse bandwidth is entirely determined by its frequency chirp range, which follows a smooth tanh function, with a flat transfer efficiency over that bandwidth that can approach 100% with the appropriate pulse parameters [43]. However, the sech pulse has a smooth, almost Gaussian intensity profile, hence it does not make very efficient use of the cut-off duration $T_c$ allocated to the control pulse. More recently Tian et al. [45] proposed an "extended" sech pulse, called a hyperbolic-square-hyperbolic (HSH) pulse, which has a flat intensity profile of duration $T_s$ in the center and smoothed sech pulse edges, see Figure 1(c). The frequency chirp is still described by a tanh function, but with an extended linear regime in the center. As a result the HSH pulse reaches significantly higher efficiency over the same bandwidth, given a cut-off duration $T_c$. The analysis here will be based on HSH control pulses, as efficient transfer using the shortest possible duration $T_c$ is paramount for the temporal multimode capacity of AFC spin-wave memories.
For the analysis it is assumed that the HSH chirp width matches exactly the AFC bandwidth $\Gamma$. We further assume that the square part of the HSH pulse, of duration $T_s$, is much longer than the edges of the HSH pulse, and that the chirp width $\Gamma$ is significantly larger than the Rabi frequency $\Omega$ of the pulse, the natural working regime for chirped, adiabatic pulses. Under these conditions the population transfer efficiency of the pulse can be written as (see Supplemental Material of Ref. [52])

$$\eta_{\mathrm{tr}} = 1 - \exp\left(-\frac{\pi^2 T_s \Omega^2}{\Gamma}\right). \tag{5}$$

Note that $\Omega$ and $\Gamma$ are defined in natural frequency (Hz) and not in angular frequency (rad/s) as in Ref. [52]. Now, by setting $\pi^2 T_s \Omega^2/\Gamma = 4$ it is assured that the transfer efficiency is at least 98%. In practice it will be slightly more efficient, as the smooth sech edges of the HSH pulse will also contribute to the transfer efficiency. If we introduce the relation $T_c = \chi T_s$, where $\chi \geq 1$, it follows that the spin-wave multimode capacity $N_t^{\mathrm{sw}}$ can be expressed as

$$N_t^{\mathrm{sw}} = \frac{1/\Delta - T_c}{T_m} = \frac{\Gamma}{2.5\,\Delta} - \frac{4\chi}{2.5\,\pi^2}\,\frac{\Gamma^2}{\Omega^2}, \tag{6}$$

where we have used Eq. (1). As it is assumed that $\Gamma > \Omega$, it follows that a certain number of modes will necessarily be consumed by the HSH pulse, where the exact number depends on the ratio $\Gamma^2/\Omega^2$. The AFC spin-wave multimode capacity can also be expressed as a function of $\eta_{T_2}$ and $T_2$ to account for the finite optical coherence time, by simply modifying the first term in Eq. (6) to yield

$$N_t^{\mathrm{sw}} = \frac{\Gamma T_2}{10} \ln\left(\frac{1}{\eta_{T_2}}\right) - \frac{4\chi}{2.5\,\pi^2}\,\frac{\Gamma^2}{\Omega^2}. \tag{7}$$

Europium-doped Y$_2$SiO$_5$ features long optical and spin coherence times, hence it is particularly favorable for temporal multimodality and long-duration spin storage. Storage experiments in the quantum regime have so far utilized the $^{151}$Eu isotope [29,33,41,53]. Given the long AFC $T_2$ obtained in the $^{151}$Eu:Y$_2$SiO$_5$ system, it is particularly interesting to compare its experimental multimode storage capacity to the theoretical results obtained in Section 3.
The bandwidth of $^{151}$Eu:Y$_2$SiO$_5$ AFC memories is limited by the overlap between optical-hyperfine transitions, see for instance [54], which in turn depends on the choice of three-level system used for spin-wave storage. So far AFC experiments have used either the 35 MHz or 46 MHz spin transitions, which limits the bandwidth to less than 5.7 MHz. Here we set the memory bandwidth to $\Gamma = 5$ MHz and the fixed storage time to $1/\Delta = 50.7$ µs, which results in a mode size of $T_m = 500$ ns and a temporal multimode capacity of $N_t = 100$ modes according to Eqs. (1) and (2), respectively. The intensity FWHM of the Gaussian modes was set to about 210 ns, giving a $\kappa$ parameter close to the theoretical optimum of 2.38. The mean photon number in the input modes was $\bar{n} = 0.99 \pm 0.05$, integrated over the mode size $T_m$. The experimental photon counting histograms are displayed in Figure 5(a-c) and the zoom on the first 20 input/output modes shows clearly distinguishable modes with these mode settings. The average storage efficiency was $18 \pm 2$%.
In Figure 5(d) the AFC echo efficiency for a single input mode is shown as a function of $1/\Delta$. The input pulse was a bright laser pulse and the AFC echo was detected by a photodiode. The zero-time efficiency of $41.3 \pm 1.7$% is the highest reported AFC echo efficiency without cavity enhancement. It is consistent with the measured peak optical depth of $d = 5.8$, which gives an optimal theoretical efficiency of 40.1% [55]. The AFC $T_2 = 250$ µs is slightly shorter than the value reported in Figure 3, which we attribute to larger sample vibrations in this experiment. The relative efficiency at $1/\Delta = 50$ µs is then $\eta_{T_2} = 0.45$. Given the bandwidth limitation of $\Gamma = 5$ MHz, one can store $N_t = 13$ modes at a higher relative efficiency of $\eta_{T_2} = 0.9$, and $N_t = 28$ modes at $\eta_{T_2} = 0.8$, according to Eq. (4).

[...] [29,32,41,53]. Multimode storage of bright coherent input modes containing large numbers of photons has reached up to 50 modes [32]. The experiment involved a storage time $1/\Delta = 41$ µs, and the AFC coherence time was relatively short at $T_2 = 110$ µs, as compared to the state-of-the-art values reported here and in [41]. In addition, the HSH pulse was too short, $T_c = 14$ µs, for the memory bandwidth of $\Gamma = 5$ MHz, resulting in a poor transfer efficiency of 55% per HSH pulse. These factors contributed to an efficiency of only 1.6%. The theoretical capacity with these values is 54 modes, according to $N_t^{\mathrm{sw}} = (1/\Delta - T_c)/T_m$ (see Eq. (6)) and Eq. (1), in close agreement with the experiment reported in [32].

Current
Storage of weak coherent input pulses with a mean photon number of around 1 generally requires higher efficiencies given the read-out noise of the memory, particularly when spin-echo and dynamical decoupling techniques are employed to achieve long storage times [32,41,53]. Refs. [29,53] showed storage of 5 temporal modes using the 35 MHz spin transition in $^{151}$Eu:Y$_2$SiO$_5$, with a spin-storage time of about 1 ms. Recently, storage of 6 temporal modes for a duration of $T_{\mathrm{spin}} = 20$ to 100 ms was demonstrated using the 46 MHz transition, based on dynamical decoupling (DD) of the spin transition. The main difficulty in combining spin-wave storage and DD of the spin transitions lies in the read-out noise generated by errors in the DD sequence [41,53,56]. Achieving a high signal-to-noise ratio in this context requires a spin-wave storage efficiency in the range of 5-10%, which in turn reduces the multimode capacity.
The theoretical spin-wave capacity can be compared to the latest experiment in Ref. [41]. The AFC parameters were $1/\Delta = 25$ µs and $\Gamma = 1.5$ MHz. The bandwidth was smaller than the maximum limit of 5 MHz, in order to optimize the HSH transfer pulse efficiency given the limited Rabi frequency of $\Omega = 230$ kHz, obtained with about 500 mW of power before the cryostat. The experimentally optimized HSH pulse had parameters $T_c = 15$ µs and $T_s = 11$ µs (i.e. $\chi = 1.36$). Using these values Eq. (6) predicts a storage capacity of $N_t^{\mathrm{sw}} = 5.6$ modes, in accordance with the experimentally optimized value of 6 modes. The small difference most likely stems from the strong dependence of $N_t^{\mathrm{sw}}$ on the Rabi frequency, which is the parameter with the largest experimental error. For example, with $\Omega = 250$ kHz one finds $N_t^{\mathrm{sw}} = 7.1$ modes. The spin-wave storage efficiency at $T_{\mathrm{spin}} = 20$ ms was $\eta_{\mathrm{sw}} = (7.39 \pm 0.04)$%.
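The comparison above can be reproduced with a short Python sketch. It assumes that the HSH transfer efficiency has the form $1 - \exp(-\pi^2 T_s \Omega^2/\Gamma)$ and that the spin-wave capacity is $N_t^{\mathrm{sw}} = \Gamma/(2.5\Delta) - 4\chi\Gamma^2/(2.5\pi^2\Omega^2)$, i.e. the square part of the pulse is chosen such that $\pi^2 T_s\Omega^2/\Gamma = 4$. The function names are illustrative, and all frequencies are in Hz.

```python
import math

def hsh_transfer_efficiency(t_s, rabi_hz, bandwidth_hz):
    """Population transfer efficiency of the flat part of an HSH pulse."""
    return 1.0 - math.exp(-math.pi ** 2 * t_s * rabi_hz ** 2 / bandwidth_hz)

def spin_wave_capacity(bandwidth_hz, storage_time_s, rabi_hz, chi):
    """N_sw for a control pulse with pi^2 * T_s * Omega^2 / Gamma = 4."""
    fixed_delay_term = bandwidth_hz * storage_time_s / 2.5
    control_pulse_term = (4.0 * chi / (2.5 * math.pi ** 2)
                          * (bandwidth_hz / rabi_hz) ** 2)
    return fixed_delay_term - control_pulse_term

# Parameters quoted in the text: 1/Delta = 25 us, Gamma = 1.5 MHz, chi = 1.36.
for rabi in (230e3, 250e3):
    n_sw = spin_wave_capacity(1.5e6, 25e-6, rabi, 1.36)
    print(f"Omega = {rabi / 1e3:.0f} kHz: N_sw = {n_sw:.1f}")

# Transfer efficiency of the experimental pulse with T_s = 11 us.
print(f"{hsh_transfer_efficiency(11e-6, 230e3, 1.5e6):.3f}")
```

The second term scales as $\Gamma^2/\Omega^2$, which explains why a change of less than 10% in the Rabi frequency shifts the predicted capacity by more than one mode.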
It is clear that the Rabi frequency seriously limits the currently achievable multimode capacity for AFC spin-wave memories in $^{151}$Eu:Y$_2$SiO$_5$. Currently the control pulse is applied on the weak transition of the lambda system [29,32,41,53], in order to favor the input transition in terms of optical depth. However, it is known that impedance-matched cavities can achieve 100% memory efficiency using weak input transitions [57][58][59][60][61]. Hence, a way forward is to use the strong transition for the control pulse, and use a cavity to compensate the low absorption on the input transition. According to the table of transition strengths in Eu$^{3+}$:Y$_2$SiO$_5$ [62], this approach would boost the HSH Rabi frequency by a factor of 2.7, giving about $\Omega = 620$ kHz. In Fig. 6 the spin-wave multimode capacity is plotted as a function of memory bandwidth, over a range of excited state storage times $1/\Delta$, for a higher HSH Rabi frequency of $\Omega = 620$ kHz. In general there is a maximum mode capacity for an optimum bandwidth $\Gamma$, given $\Omega$ and $1/\Delta$, according to Eq. (6). However, one also needs to consider the maximum bandwidth supported by the physical system. The dashed lines in Fig. 6 show the approximate maximum bandwidths achievable in $^{151}$Eu:Y$_2$SiO$_5$ and Pr$^{3+}$:Y$_2$SiO$_5$ (5 MHz), and in $^{153}$Eu:Y$_2$SiO$_5$ (15 MHz). It follows from this plot that with an increased Rabi frequency, for $^{151}$Eu:Y$_2$SiO$_5$ and Pr$^{3+}$:Y$_2$SiO$_5$ the multimode capacity is chiefly limited by the system bandwidth. Reaching a temporal multimode capacity of 100 or more with europium would likely require shifting to $^{153}$Eu:Y$_2$SiO$_5$ and using excited state storage times $1/\Delta$ of 40 µs or more. This in turn requires long AFC coherence times to simultaneously achieve high efficiencies, according to Eq. (3), which is in principle possible given the long optical coherence times in Eu$^{3+}$:Y$_2$SiO$_5$ crystals.
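The optimum bandwidth mentioned above follows from differentiating the spin-wave capacity with respect to $\Gamma$. Assuming the capacity has the form $N_t^{\mathrm{sw}} = \Gamma/(2.5\Delta) - 4\chi\Gamma^2/(2.5\pi^2\Omega^2)$, setting the derivative to zero gives $\Gamma_{\mathrm{opt}} = \pi^2\Omega^2/(8\chi\Delta)$ and a maximum capacity of $\Gamma_{\mathrm{opt}}/(5\Delta)$. The sketch below evaluates this; the closed form is a derivation added here for illustration, and $\chi = 1.36$ is an assumed value taken over from the europium experiment.

```python
import math

def spin_wave_capacity(bandwidth_hz, storage_time_s, rabi_hz, chi):
    """Assumed form of the spin-wave capacity (frequencies in Hz)."""
    return (bandwidth_hz * storage_time_s / 2.5
            - 4.0 * chi / (2.5 * math.pi ** 2) * (bandwidth_hz / rabi_hz) ** 2)

def optimum_bandwidth(rabi_hz, storage_time_s, chi):
    """Bandwidth at which the capacity above is maximal."""
    return math.pi ** 2 * rabi_hz ** 2 * storage_time_s / (8.0 * chi)

def max_capacity(rabi_hz, storage_time_s, chi):
    """Capacity at the optimum bandwidth: Gamma_opt / (5 * Delta)."""
    return optimum_bandwidth(rabi_hz, storage_time_s, chi) * storage_time_s / 5.0

# Omega = 620 kHz and 1/Delta = 40 us as in the text; chi = 1.36 is assumed.
gamma_opt = optimum_bandwidth(620e3, 40e-6, 1.36)
n_max = max_capacity(620e3, 40e-6, 1.36)
print(f"Gamma_opt = {gamma_opt / 1e6:.1f} MHz, N_max = {n_max:.0f}")
```

Under these assumptions the optimum lies just below 15 MHz with slightly more than 100 modes, in line with the statement that 100 or more modes would require the bandwidth available in $^{153}$Eu:Y$_2$SiO$_5$ and storage times of 40 µs or more.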
Finally, we emphasize that waveguide-based quantum memories could provide a paradigm shift for higher bandwidth memories, as the mode confinement allows Rabi frequencies significantly higher than 1 MHz [63,64]. A particularly interesting system in this context is $^{171}$Yb:Y$_2$SiO$_5$ [65], where memory bandwidths of at least 100 MHz are possible due to the larger hyperfine splittings [52].

Temporal multimode storage experiments in Pr
Compared to europium-doped Y$_2$SiO$_5$, praseodymium-doped Y$_2$SiO$_5$ has shorter optical and spin coherence times. In the case of the optical transition, the inferred coherence time from literature is 111 µs at 1.4 K, and up to 152 µs with an applied magnetic field, in the most commonly used crystallographic site [51]. However, the transition strength is larger, which can allow for shorter control pulses when performing spin-wave storage. In the following two sections we will discuss the temporal multimode capacity of AFC fixed-delay and spin-wave memories in Pr:Y$_2$SiO$_5$.

AFC fixed-delay multimode storage in Pr
The longest reported AFC storage time in Pr$^{3+}$:Y$_2$SiO$_5$ was $1/\Delta = 25$ µs in Ref. [26]. The bandwidth of the memory was $\Gamma = 4$ MHz, limited by the hyperfine level spacing in Pr-doped Y$_2$SiO$_5$. Following Eq. (2), the maximum number of modes that can be stored is $N_t = 40$ (where $T_m = 625$ ns). Ref. [26] contained an analysis of the concurrence and heralding rate for an experiment entangling two Pr$^{3+}$:Y$_2$SiO$_5$ memories versus the number of possible stored temporal modes. It was shown that the heralding rate increases with the number of modes stored, demonstrating the advantage of temporal multimode storage.
In the same experiment, the $T_2$ measured from the AFC storage efficiency was $92 \pm 6$ µs, which is very close to the $T_2$ measured with photon echoes in the same sample of Pr$^{3+}$:Y$_2$SiO$_5$. The resulting relative efficiency for $1/\Delta = 25$ µs is then $\eta_{T_2} = 0.34$. The application of a magnetic field and lower temperatures would be needed to increase the measured $T_2$ and to reach longer storage times. Improvements in efficiency are also still possible, which would be beneficial for obtaining high rates of entanglement distribution.

Spin-wave storage in Pr:Y$_2$SiO$_5$
The advantage of using Pr$^{3+}$:Y$_2$SiO$_5$ for temporal multimode storage becomes more apparent when performing spin-wave storage. The higher optical transition dipole moment produces a higher Rabi frequency for the same optical intensity, as compared to Eu, thereby reducing $T_c$ and thus allowing a larger temporal mode capacity given a similar $\Gamma$ and $\Delta$.
We here present experimental results of temporal multimode spin-wave storage using Pr$^{3+}$:Y$_2$SiO$_5$. The experimental setup is similar to that of Ref. [23], except we used attenuated laser pulses (weak coherent states) as input modes. These had a Gaussian intensity profile, where the FWHM was $T = 180$ ns and the input mode size was $T_m = 620$ ns, resulting in $\kappa = 3.4$. The average number of photons per mode was $0.90 \pm 0.05$. The AFC delay was set to $1/\Delta = 25$ µs (the maximum storage time reported in the previous section), for which the AFC echo had an efficiency of $(7.7 \pm 0.3)$%. The AFC bandwidth was $\Gamma = 4$ MHz, and the control pulse Rabi frequency was $\Omega = 410$ kHz with a measured power outside the cryostat of 8.5 mW. The control pulses were not HSH pulses, but less optimal pulses with a Gaussian profile and a hyperbolic secant frequency chirp spanning 4.2 MHz, where $T_c = 5$ µs and the FWHM was 2.5 µs. The resulting transfer efficiency was $(72 \pm 2)$%. While spin-echo techniques could be applied to Pr$^{3+}$:Y$_2$SiO$_5$ systems to extend the storage time [39], none are applied in the following analysis.

Figure 7 shows the retrieved temporal modes after spin-wave storage, where 20 modes were stored in (a) and 30 modes in (b). For 20 modes, the average signal-to-noise ratio (SNR) was $17.4 \pm 0.4$, with a corresponding spin-wave storage efficiency $\eta_{\mathrm{sw}}$ of $(1.88 \pm 0.03)$%. For 30 modes, SNR $= 6.7 \pm 0.1$, with a corresponding efficiency of $(0.63 \pm 0.01)$%. There is a drop in SNR and $\eta_{\mathrm{sw}}$ when storing more modes because different $T_{\mathrm{spin}}$ were used: 14.1 µs and 20.7 µs for 20 and 30 modes, respectively. There will be a residual signal from the AFC echoes, due to the finite efficiency of the control pulses, so $T_{\mathrm{spin}}$ has a minimum length required to avoid the signal from the spin-wave output modes overlapping with the residual AFC echoes, which depends on the number of modes stored.
The efficiency follows $\exp\left[-\frac{(\pi T_{\rm spin}\gamma_{\rm spin})^2}{2\ln 2}\right]$, where $\gamma_{\rm spin}$ is the inhomogeneous broadening of the transition used for spin-wave storage, so $\eta_{\rm sw}$ decreases for longer $T_{\rm spin}$. In this measurement, $\gamma_{\rm spin}$ was measured to be 26.3 kHz. We can reduce this to 16.1 kHz in more ideal experimental conditions [23], which would increase the average efficiencies to approximately 3.5% and 2.4% for 20 and 30 modes, respectively.
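The projected efficiencies can be checked numerically from the Gaussian dephasing factor quoted above. The following is a minimal sketch, assuming that only the spin-dephasing factor changes when the spin linewidth is reduced and that all other contributions to the efficiency stay fixed; the function names are illustrative and not taken from the experiment's analysis code.

```python
import math


def spin_dephasing_factor(t_spin_s: float, gamma_spin_hz: float) -> float:
    """Gaussian decay exp(-(pi * T_spin * gamma_spin)^2 / (2 ln 2))."""
    exponent = (math.pi * t_spin_s * gamma_spin_hz) ** 2 / (2.0 * math.log(2.0))
    return math.exp(-exponent)


def rescaled_efficiency(eta_measured: float, t_spin_s: float,
                        gamma_measured_hz: float, gamma_target_hz: float) -> float:
    """Scale a measured efficiency to a different spin linewidth.

    Assumption: every other efficiency factor is unchanged, so the ratio of
    the two dephasing factors is the only correction applied.
    """
    gain = (spin_dephasing_factor(t_spin_s, gamma_target_hz)
            / spin_dephasing_factor(t_spin_s, gamma_measured_hz))
    return eta_measured * gain


# Values quoted in the text: measured efficiencies, spin storage times,
# measured (26.3 kHz) and improved (16.1 kHz) spin linewidths.
eta_20_modes = rescaled_efficiency(0.0188, 14.1e-6, 26.3e3, 16.1e3)
eta_30_modes = rescaled_efficiency(0.0063, 20.7e-6, 26.3e3, 16.1e3)

print(f"20 modes: {100 * eta_20_modes:.1f} %")
print(f"30 modes: {100 * eta_30_modes:.1f} %")
```

Under this assumption the rescaled values come out close to the 3.5% and 2.4% stated in the text, which suggests that the improvement is dominated by the reduced spin dephasing.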
Given the $T_c$ of 5 µs, the ideal $T_m$, and the AFC storage time 1/∆ = 25 µs, it should be possible to store $N_t^{\rm sw}$ = 32 modes, in agreement with the 30 modes stored in Fig. 7(b). However, the control pulses have neither the ideal profile nor the optimum transfer efficiency. Using Eq. (6), which assumes a higher transfer efficiency (and taking χ = 1.36 as in the Eu experiments), the maximum number of possible modes is 19. Although we can currently store more modes by using control pulses with a lower transfer efficiency, longer pulses are required to increase the total storage efficiency. Nonetheless, this still shows the benefit of using materials allowing strong Rabi frequencies for temporally multimode spin-wave storage.
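The figure of 32 modes is consistent with a simple time budget in which one control-pulse duration is reserved out of the AFC delay and the remainder is divided into mode slots. The sketch below illustrates that reading only: the budget formula is an assumption made here, it is not Eq. (6), and the measured mode size of 620 ns is used as a stand-in for the ideal $T_m$.

```python
import math


def spin_wave_mode_budget(afc_delay_us: float, control_pulse_us: float,
                          mode_size_us: float) -> int:
    """Hypothetical helper: number of whole mode slots that fit in the AFC
    delay after reserving one control-pulse duration (assumed budget)."""
    usable_window_us = afc_delay_us - control_pulse_us
    return math.floor(usable_window_us / mode_size_us)


# 1/Delta = 25 us, T_c = 5 us, T_m = 0.62 us (values quoted in the text).
n_modes = spin_wave_mode_budget(25.0, 5.0, 0.62)
print(n_modes)
```

In this picture, shortening $T_c$ (through a higher Rabi frequency) or lengthening 1/∆ directly enlarges the usable window, which is the qualitative point made in the text.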

Spectral multimodality
The wide inhomogeneous broadening of RE-doped crystals offers a significant intrinsic advantage for spectral multimodality. The inhomogeneous broadening, which allows for the implementation of the AFC protocol, arises from a static effect that widens the absorption spectrum of the ensemble by several orders of magnitude compared to the absorption line of a single ion. For example, in Pr$^{3+}$:Y$_2$SiO$_5$ it increases the absorption linewidth from about 1 kHz to 10 GHz, while the bandwidth of the AFC quantum memory is only 4 MHz. In Pr$^{3+}$:Y$_2$SiO$_5$, frequency multiplexing has been demonstrated in the case of AFC fixed-delay memories, where 15 frequency bins of a photon pair, separated by 261 MHz and spanning almost 4 GHz, have been stored simultaneously in a waveguide-integrated quantum memory [36]. The storage of weak coherent states multiplexed in 26 frequency modes in a Ti:Tm:LiNbO$_3$ waveguide has also been demonstrated, followed by feed-forward-controlled frequency manipulation [35], as part of a proposal for quantum repeaters based on frequency multiplexing and AFC fixed-delay memories. A demonstration for spin-wave storage has also been realized, with the storage of weak coherent states in two spectral modes separated by 80 MHz [66].
The maximum number of spectral modes that can be stored in the inhomogeneous profile of RE-doped crystals is limited by the number of independent AFCs that can be realized. This can be quantified by considering the specific energy level structure of the ions. The position of holes and anti-holes during spectral hole burning in the inhomogeneous profile will be dictated by the spacing between hyperfine levels, with the most distant ones appearing at frequencies ∆g + ∆e and −(∆g + ∆e) from the central hole [43], where ∆g (∆e) is the total hyperfine splitting of the ground (excited) level. If the spectral width of the feature that we are generating is ∆f, then this feature will involve ions within the frequency range ±(∆g + ∆e + ∆f/2). The actual width ∆f varies depending on the system, and can be as small as the AFC width Γ (as in Eu), or larger due to the details of the optical pumping sequence (as in Pr). Therefore, independent AFCs, each dedicated to a different frequency mode, will have to be realized in the inhomogeneous broadening at a distance larger than 2(∆g + ∆e) + ∆f from each other (see Fig. 8), so as to avoid addressing the same ions and thus degrading the quality of one AFC while preparing the others. For a square inhomogeneous broadening of width $\Gamma_{\rm inhom}$, the number of frequency modes that can be stored, $N_f$, shown in Fig. 8(b), can be defined as

$$N_f = \frac{\Gamma_{\rm inhom}}{2(\Delta g + \Delta e) + \Delta f}. \qquad (8)$$

This approximation is valid if only the central portion of the inhomogeneous broadening of a RE-doped crystal is considered.

Figure 8. (a) Sketch illustrating the minimum spacing required to prepare independent features of width ∆f in the inhomogeneously broadened profile of RE ions, with total hyperfine splitting of the excited and ground state levels of ∆e and ∆g, respectively. The preparation of any spectral feature, for example a spectral hole as illustrated, results in a series of holes and antiholes appearing on either side of the original one, with the same width ∆f. The furthest ones will then appear at a spacing of ∆e + ∆g. The heights of the side features depend on the branching ratios of the specific transitions and are omitted in this figure. (b) Maximum number of frequency modes $N_f$ that can be stored in a RE-doped crystal, as a function of the inhomogeneous broadening width $\Gamma_{\rm inhom}$ and the spacing between the different AFCs.

The specific profile of the inhomogeneously broadened absorption and its maximum value will determine a different maximal optical depth (OD) available for the realization of each AFC according to its position. The OD decreases with distance from the centre of the absorption band, therefore limiting the maximum efficiency achievable for the different frequency modes. The maximal efficiency reachable for different ODs follows the formula [55]:

$$\eta = \left(1 - e^{-\tilde{d}}\right)^2 \mathrm{sinc}^2\left(\frac{\pi}{F}\right), \qquad (9)$$

where $\tilde{d} = d/F$, F is the finesse of the comb and d is the OD. Eq. (9) is valid for a comb with square teeth, which provides the highest efficiency, and for backward retrieval from the crystal. This could be implemented by performing spin-wave storage with counter-propagating control pulses [18]. Note that backward retrieval can also be reached using an impedance-matched cavity [57,58], but the calculations for the efficiency in that case would require a more complex treatment.
6.1.1. Discussion for Pr$^{3+}$:Y$_2$SiO$_5$

The OD of a RE-doped crystal can be calculated as d = αL, where α is the absorption coefficient and L the length of the crystal. In our 5 mm long Pr$^{3+}$:Y$_2$SiO$_5$ crystal with a doping concentration of 0.05%, we regularly measure an OD of 10, corresponding to α = 20/cm in the center of the inhomogeneous broadening [43]. The inhomogeneous broadening in this type of crystal is well described by a Gaussian distribution with a FWHM of ∼ 10 GHz in OD [63,67]. From the level scheme of Pr$^{3+}$:Y$_2$SiO$_5$, ∆g + ∆e = 36.9 MHz. However, this is valid only for an independent AFC, and storage in the spin-wave requires additional room for the realization of single-class features and the application of control pulses [68]. Therefore, the width of the spectral feature to be considered is larger than the bandwidth of the AFC: for Pr$^{3+}$:Y$_2$SiO$_5$ we have to consider ∆f = 18 MHz, and therefore 2(∆g + ∆e) + ∆f ≈ 92 MHz. The maximum efficiency achievable at different positions in the inhomogeneous broadening, calculated according to Eq. (9), is reported in Fig. 9(a). Panel (b) instead reports the mean storage efficiency for an increasing number of stored modes, considering also different maximal ODs of the crystal. The mean efficiency naturally decreases, as more modes with lower efficiency are stored in the wings of the inhomogeneous broadening. However, it is important to note that while the mean efficiency decreases, the rate at which photons can be stored in the crystal increases linearly with the number of modes [36]. Moreover, the decrease in efficiency could be counterbalanced by the use of an impedance-matched cavity, where near-unity efficiency can be reached even for very low values of OD and individually adjusted for every frequency mode.

[69], where the inhomogeneous broadening is about 1.6 GHz [29]. The frequency multimode capacity is thus at most 3 modes for $^{151}$Eu in Y$_2$SiO$_5$. However, the inhomogeneous broadening could be increased by introducing compositional disorder through co-doping, as observed in erbium-doped Y$_2$SiO$_5$ crystals [70,71], although open questions remain as to the preservation of optical and spin coherence times under this procedure.
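To make the spectral-multiplexing estimates of this section concrete, the sketch below evaluates the minimum AFC spacing for the Pr$^{3+}$:Y$_2$SiO$_5$ values quoted above and the OD available to a mode at a given detuning. It rests on two assumptions that should be read as such: the mode count is taken as the inhomogeneous width divided by the spacing 2(∆g + ∆e) + ∆f, with the 10 GHz FWHM treated as the width of a square profile, and the OD is modelled as a Gaussian with a peak of 10 and a FWHM of 10 GHz. All names are illustrative.

```python
import math

HYPERFINE_SPAN_MHZ = 36.9    # Delta_g + Delta_e for Pr:YSO (from the text)
FEATURE_WIDTH_MHZ = 18.0     # Delta_f for spin-wave storage in Pr (from the text)
INHOM_FWHM_MHZ = 10_000.0    # ~10 GHz FWHM of the inhomogeneous profile
PEAK_OD = 10.0               # OD at the centre of the 5 mm crystal


def afc_spacing_mhz(hyperfine_span_mhz: float, feature_width_mhz: float) -> float:
    """Minimum separation between independent AFCs: 2(Dg + De) + Df."""
    return 2.0 * hyperfine_span_mhz + feature_width_mhz


def n_modes_square_profile(width_mhz: float, spacing_mhz: float) -> int:
    """Assumed mode count: whole AFC footprints fitting in a square profile."""
    return math.floor(width_mhz / spacing_mhz)


def od_gaussian(offset_mhz: float, peak_od: float, fwhm_mhz: float) -> float:
    """OD at a detuning from line centre for a Gaussian profile (FWHM in OD)."""
    return peak_od * math.exp(-4.0 * math.log(2.0) * (offset_mhz / fwhm_mhz) ** 2)


spacing = afc_spacing_mhz(HYPERFINE_SPAN_MHZ, FEATURE_WIDTH_MHZ)
n_f = n_modes_square_profile(INHOM_FWHM_MHZ, spacing)
print(f"spacing = {spacing:.1f} MHz, modes within the FWHM = {n_f}")

# OD seen by the k-th mode on one side of the line centre.
for k in (0, 10, 25, 50):
    offset = k * spacing
    print(f"mode {k:2d}: offset {offset:7.1f} MHz, "
          f"OD {od_gaussian(offset, PEAK_OD, INHOM_FWHM_MHZ):.2f}")
```

The falling OD in the wings is what lowers the mean efficiency as more modes are added, in line with the trend described for Fig. 9(b).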

Spatial multiplexing
Spatial multimodality is another valuable asset of RE-doped crystals that can be employed to significantly increase the number of modes available. It is particularly appealing because storing an additional spatial mode does not decrease the efficiency, in contrast to temporal and frequency multimodality (without an impedance-matched cavity). In the temporal case, either the $T_2$ of the system or the separation between the modes decreases the total efficiency, while in the frequency case the variation of OD across the spectral modes results in a different maximal efficiency for each mode. In contrast, the OD is flat for parallel beam paths across the surface of the RE-doped crystal, resulting in a constant storage efficiency. This property has already been demonstrated by using two portions of the same Pr$^{3+}$:Y$_2$SiO$_5$ crystal to store a polarization qubit [72]. Atomic clouds behave differently, in that each spatial mode experiences a different OD, depending on its position relative to the center of the cloud. At the same time, atomic clouds do not show a static inhomogeneous broadening, meaning that they cannot take advantage of spectral multiplexing, and have only limited access to temporal multimode storage due to its poor scaling with optical depth [19]. For this reason the spatial degree of freedom has been extensively exploited in such systems [14][15][16][73][74][75][76][77].
In RE-doped crystals, different spatial modes could be realized using, for example, electro-optical deflectors addressing different positions in the crystal, following a similar approach as in atomic clouds [14,78,79]. Further exploiting the solid-state nature of the crystal, a matrix of waveguides could be fabricated along the length of the sample, with the different memories addressed using commercial fiber arrays, in which the fibers are spaced by 127 µm. This would result in 62 modes per mm$^2$ of crystal surface, allowing an extremely high level of multiplexing. Note that this is not a fundamental limitation: a waveguide-based in- and out-coupling optical chip could collect and tightly pack modes from optical fibres with low-bending-loss waveguides. The chip could then be fabricated to terminate in a much denser matrix, increasing the number of modes in the crystal by more than one order of magnitude, limited only by evanescent coupling among the waveguides.
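The quoted density follows directly from the fiber-array pitch. The short sketch below reproduces it for a square grid; the 5 mm × 5 mm crystal face used in the second example is a hypothetical size chosen only for illustration, and the function names are not from the original work.

```python
import math


def modes_per_mm2(pitch_um: float) -> float:
    """Areal mode density for a square grid with the given pitch."""
    modes_per_mm = 1000.0 / pitch_um
    return modes_per_mm ** 2


def modes_on_face(width_mm: float, height_mm: float, pitch_um: float) -> int:
    """Whole grid sites that fit on a rectangular face (edge effects ignored)."""
    columns = math.floor(width_mm * 1000.0 / pitch_um)
    rows = math.floor(height_mm * 1000.0 / pitch_um)
    return columns * rows


density = modes_per_mm2(127.0)                 # commercial fiber-array pitch
face_modes = modes_on_face(5.0, 5.0, 127.0)    # hypothetical 5 mm x 5 mm face

print(f"{density:.0f} modes per mm^2")
print(f"{face_modes} modes on a 5 mm x 5 mm face")
```

Because the density scales as the inverse square of the pitch, reducing the pitch by a factor of about 3.2 is enough to gain one order of magnitude in the number of modes.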
For quantum repeater operations [9], each of the spatial modes could be addressed by an independent photon source, but this solution would make the scaling challenging. This limitation only affects the case of an external source, and would not be a problem for emissive memories [14,33,34]. However, even for absorptive memories the number of independent sources could be drastically reduced by using switches that direct photons from a single source to different spatial modes at different times, translating spatial into temporal multiplexing. Another open challenge with RE-doped crystals is to combine the addressing of various spatial modes with cavity-enhanced quantum memories to reach high storage efficiencies. Cavities could in principle be incorporated into waveguide arrays, but more work is needed to reduce the coupling losses to such a device.
In the discussion above we addressed the spatial multimodality obtained by realizing copies of the same quantum memory across one crystal. There are also other ways to exploit spatial multiplexing that rely on using orthogonal sets of spatial distributions of the phases of the photons along the same optical path. These include the storage of twisted light, such as orbital angular momentum states [13,66,80,81] or vector vortex beams [82], or of Hermite-Gaussian or Laguerre-Gaussian modes [83]. So far, besides the usual linear polarization states, only orbital angular momentum states of light have been stored in RE-doped crystals [66,81].

Conclusions
In this article we have presented a detailed model for quantifying the temporal mode capacity of AFC fixed-delay and spin-wave memories. The resulting formulas can be used to compare different materials and experimental configurations, based on measurable experimental quantities. Comparisons with state-of-the-art experiments in both Eu- and Pr-doped Y$_2$SiO$_5$ crystals demonstrate the validity of the theoretical models and further highlight both current limitations and future strategies for increasing the temporal multimode capacity. We finally considered the possibility of frequency and spatial multiplexing to further increase the multimode capacity, a key performance factor for future quantum repeaters.
Our analysis shows that extreme levels of multiplexing can be reached in rare-earth-doped crystals, one of the few systems where temporal, spatial and spectral degrees of freedom could be exploited simultaneously thanks to their solid-state nature. Rare-earth-doped crystals are therefore a natural and powerful candidate for quantum communication applications, where the entanglement distribution rate is directly proportional to the number of stored modes. Moreover, tens of temporal modes and tens of frequency modes have already been stored in various rare-earth ensembles, and the storage of several spatial modes could be easily envisioned. Temporal multimodality could then be used in synergy with spatial and frequency multiplexing, resulting in the storage of tens of thousands of modes in a single millimeter-sized solid-state crystal.

Appendix A. Gaussian temporal mode profile
The intensity profile of a Gaussian pulse with a temporal FWHM of T can be written as

$$I(t) = \exp\left(-4\ln 2\,\frac{t^2}{T^2}\right). \qquad \text{(A.1)}$$

Note that the field amplitude FWHM duration is $\sqrt{2}\,T$. The corresponding power spectrum is

$$\tilde{I}(f) \propto \exp\left(-4\ln 2\,\frac{f^2}{\gamma^2}\right), \qquad \text{(A.2)}$$

where the spectral FWHM (in Hz) is

$$\gamma = \frac{2\ln 2}{\pi T}. \qquad \text{(A.3)}$$

Note that we work with power (or intensity), and not field amplitude. The goal here is to find an optimal relation between the pulse FWHM in time T and the mode size $T_m$. One natural choice is to set $T_m = 2T$, for which 98.1% of the pulse energy is contained in the mode. Another natural choice is to set $T_m$ to twice the FWHM of the field amplitude, i.e. $T_m = 2\sqrt{2}\,T$, for which 99.9% of the energy is contained in the temporal mode bin. In practice any choice between these limits would yield low mode overlap in the time domain.
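The energy fractions quoted in this appendix can be verified with the error function. For the Gaussian intensity profile of Eq. (A.1), the fraction of energy within a window of width $T_m = \kappa T$ is erf(κ√ln 2), and the same form applies in the frequency domain with κ replaced by Γ/γ. The sketch below is a check under the assumption that the AFC bandwidth and mode size satisfy Γ$T_m$ = 2.5, as implied by the ratios given in the following paragraph; the helper names are illustrative.

```python
import math

SQRT_LN2 = math.sqrt(math.log(2.0))
GAMMA_TIMES_TM = 2.5  # assumption: AFC bandwidth times mode size, Gamma * T_m


def time_fraction(kappa: float) -> float:
    """Fraction of pulse energy inside the temporal mode T_m = kappa * T."""
    return math.erf(kappa * SQRT_LN2)


def bandwidth_ratio(kappa: float) -> float:
    """Gamma / gamma for a Gaussian pulse, using gamma = 2 ln2 / (pi T)."""
    return GAMMA_TIMES_TM * math.pi / (2.0 * math.log(2.0) * kappa)


def spectral_fraction(kappa: float) -> float:
    """Fraction of the power spectrum inside the AFC bandwidth Gamma."""
    return math.erf(bandwidth_ratio(kappa) * SQRT_LN2)


# The two domains hold equal energy fractions when kappa = Gamma / gamma.
kappa_balanced = math.sqrt(GAMMA_TIMES_TM * math.pi / (2.0 * math.log(2.0)))

for kappa in (2.0, kappa_balanced, 2.0 * math.sqrt(2.0)):
    print(f"kappa = {kappa:.2f}: time {100 * time_fraction(kappa):.1f} %, "
          f"spectrum {100 * spectral_fraction(kappa):.1f} %")
```

The balanced value comes out near κ = 2.38, where both fractions are about 99.5%, while κ = 2 and κ = 2√2 trade roughly 98.1% in one domain for 99.9% in the other.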
To find an optimal choice one can also calculate the ratio of the AFC bandwidth to the power spectrum FWHM of the input mode,

$$\frac{\Gamma}{\gamma} = \frac{\pi}{2\ln 2}\,\Gamma T = \frac{2.5\pi}{2\ln 2}\,\frac{1}{\kappa}, \qquad \text{(A.4)}$$

where the second equality uses $\Gamma T_m = 2.5$ and $\kappa = T_m/T$. The choice of κ = 2, or κ = 2√2, results in a ratio of Γ/γ ≈ 2.83, or 2, respectively. These in turn result in 99.9% and 98.1% of the power spectrum being contained in the AFC bandwidth Γ. Comparing the two choices, κ = 2 captures the larger fraction within the power spectrum cut-off Γ, whereas κ = 2√2 captures the larger fraction within the time cut-off $T_m$. Interestingly, one can force the same energy content within the cut-off in both domains by requiring

$$\kappa = \frac{\Gamma}{\gamma} \quad\Rightarrow\quad \kappa = \sqrt{\frac{2.5\pi}{2\ln 2}} \approx 2.38. \qquad \text{(A.5)}$$

With this choice, 99.5% of the power is contained in the cut-offs of both domains. The calculations above serve to demonstrate that the precise choice of κ is not very critical, as long as it lies between κ = 2 and κ = 2√2. It should also be noted that all formulas were derived assuming that the Fourier spectrum follows a Gaussian profile given by an ideal Gaussian pulse without truncation. This is idealized, and in reality the cut-off in the time domain will slightly modify the Fourier spectrum. If κ is chosen in the proposed region any correction factor is still low (i.e. less than 1%). For the optimal choice of κ = 2.38 this effect is particularly negligible, and the truncated Fourier spectrum still contains 99.4% of the pulse energy within the AFC bandwidth Γ.

preparation beam (700 µm beam waist) optically pumped the energy levels and prepared the AFC, while a separate input beam (50 µm beam waist) was used for probing the tailored absorption profile and to create the input pulses to be stored. The intensity and temporal shape of the pulses were controlled by two independent acousto-optic modulators, driven by programmable arbitrary waveform generators.
A linear silicon detector was employed for the characterization of the memory with bright pulses, while a single photon avalanche detector was used for storage experiments with single photon-level input pulses.

Appendix B.2. Memory initialization and AFC optimization
The AFC experiment is initialized with a class-cleaning procedure [54], in order to select a unique energy level structure within the inhomogeneous broadening. In a second step, the atomic population is polarized into one spin level. In a third step, the AFC is prepared using the parallel AFC preparation technique described by Jobez et al. [60]. For each AFC delay 1/∆, the AFC preparation pulse parameters were optimized, namely the pulse amplitude, the programmed AFC finesse and the number of pulse repetitions. Initially, the finesse is set close to the optimal theoretical finesse for square AFC peaks given the initial optical depth (see Ref. [55]), with a low pulse amplitude to avoid power broadening. The number of pulse repetitions is then optimized to maximize the intensity of the emitted AFC echo. Then, the finesse is varied to further maximize the AFC echo intensity. Finally, the amplitude of the AFC preparation pulse is increased and the pulse repetition number decreased, in order to reduce the overall preparation time while keeping the same AFC echo intensity.