Retrieving past quantum features with deep hybrid classical-quantum reservoir computing

Machine learning techniques have achieved impressive results in recent years and the possibility of harnessing the power of quantum physics opens new promising avenues to speed up classical learning methods. Rather than viewing classical and quantum approaches as exclusive alternatives, their integration into hybrid designs has gathered increasing interest, as seen in variational quantum algorithms, quantum circuit learning, and kernel methods. Here we introduce deep hybrid classical-quantum reservoir computing for temporal processing of quantum states where information about, for instance, the entanglement or the purity of past input states can be extracted via a single-step measurement. We find that the hybrid setup cascading two reservoirs not only inherits the strengths of both of its constituents but is even more than just the sum of its parts, outperforming comparable non-hybrid alternatives. The quantum layer is within reach of state-of-the-art multimode quantum optical platforms while the classical layer can be implemented in silico.


INTRODUCTION
Classical machine learning techniques have shown their outstanding impact in handling vast datasets, while quantum machine learning offers groundbreaking potential by harnessing the power of quantum mechanics to revolutionize problem-solving with enhanced computational capabilities.This has spurred an interest in hybrid classical and quantum machine learning techniques for quantum machine learning [1][2][3][4][5][6][7][8][9][10][11].This innovative approach harnesses the specific capabilities of both classical and quantum systems, leveraging classical algorithms and quantum computational power to address complex problems that surpass the limits of classical methods alone.Popular implementations are variational quantum algorithms [9] used in optimization and learning problems that combine parameterized quantum circuits implementable in noisy intermediate scale quantum (NISQ) devices [10] with classical optimization techniques.Examples are quantum variational eigensolvers [4,12], quantum approximate optimization algorithms [3], generative models [6], and classifiers [13].The possibility of combining classical neural networks to optimize control parameters of parametrized quantum circuits has also been explored in the context of meta-learning [7] and quantum circuit learning [5].Recently, multiple Kernel methods have also been proposed [11] in (hybrid) pairwise combinations of several classical-classical, quantumquantum, and quantum-classical kernels with supportvector machines to improve their expressivity.An architecture that naturally falls within the definition of hybrid classical-quantum algorithms is represented by quantum reservoir computing (QRC) [14][15][16], together with its static version known as quantum extreme learning machine [17][18][19] (see Ref. [8] for an inclusive review).* jsinok@utu.fiQRC combines the richness and expressivity of the quantum dynamics with the easy and fast trainability typical of classical reservoir computing, an approach for timeseries processing inspired by recurrent neural networks and where it suffices to optimize the output layer leaving the hidden layer(s) untouched [20][21][22].
Meanwhile, deep neural networks have shown their outstanding capabilities [1,2].The idea behind such deep learning is that complex problems can be better solved by taking advantage of the ability to process data hierarchically, addressing the different characteristics of the inputs in each of the different layers within the model.Deep learning has been applied very successfully to, for instance, computer vision or natural language processing [23,24] as well as detection of entanglement from incomplete measurement data [25].One of the main issues with deep learning is the large number of parameters that need to be optimized in the training phase, which makes it extremely costly.This problem becomes much less significant when deep architectures meet reservoir computing algorithms where only the output layer needs to be trained.Examples of deep reservoir computing can be found in Refs.[26][27][28][29][30][31] where it was shown that the sequential structure of the reservoir hidden layers enriches the dynamics and, as a consequence, improves the performance of the system.
In this work, we combine the pursuit of hybrid approaches and deep learning by introducing deep classicalquantum reservoir computing.In our proposal, the deep architecture is hybrid itself, as the hidden layers consist of a cascade of a QRC and a classical echo state network (ESN).QRC allows the coherent processing of quantum inputs for temporal tasks [32], has the potential for a rapidly increasing information processing capacity due to its rich dynamics [8], and also inherits many of the benefits of its classical counterpart [29]: engineering simplicity, multitasking, and fast training.Indeed, the internal connections of the reservoir can be random and a fixed reservoir can solve many tasks by just using a different output layer trained by simple linear regression.
Reservoir computing has been successfully implemented in different platforms for time series processing [33].When moving to QRC implementations, the output extraction presents new challenges due to the invasive nature of quantum measurements [8].Interestingly, decoherence does not always represent a limiting factor [34][35][36][37].Sequential processing can be achieved with different strategies based on indirect [38][39][40] quantum non-demolition measurements [41].In particular, Refs.[39,40] combine input injection, output extraction, and a physical ensemble in a scalable multimode quantum optics platform.Real-time QRC is performed continuously on the injected quantum states, with sequential processing not relying on external memories.Furthermore, this implementation is successful with experimentally convenient resources, e.g.Gaussian states.
Building on the advances in both classical and quantum approaches, our goal is to show the advantage of their integration into hybrid designs.Indeed, while quantum settings allow for a natural quantum input embedding, the output observables are limited to linear temporal processing [42].Additionally, for any finite ensemble, they suffer from rapidly decaying memory [38,39] and expressivity [43], being the signal-to-noise ratio of the observables a limiting factor.On the other hand, considering only a classical reservoir computer injected with a (measured) quantum input would require exponential resources or not reveal full information of the data without suitable quantum preprocessing (e.g. when the measurements are not tomographically complete).
We show that our hybrid setup combining the real-time QRC [39] with a powerful and versatile classical reservoir computer, namely an echo state network [20], has the same strengths as both of its constituents.As opposed to the case of standard QRC [44], nonlinear functionals of the temporal quantum data become possible.Additionally, new strengths appear as the memory can overcome the decay of the quantum layer alone, and significant advantages over purely quantum and purely classical alternatives may be observed in both linear and nonlinear tasks.In the proposed hybrid architecture, the quantum preprocessing allows the reconstruction of the full input covariance matrix from incomplete (homodyne) measurement.The one-time output reveals many past input states or their quantum features (for instance entanglement).With the QRC, high-fidelity reconstruction would require large ensembles with a size that needs to increase exponentially with the delay to sustain the signal-to-noise ratio (SNR) of all the state components.The ESN can assist in an accurate reconstruction by post-processing the measurement data.Indeed, this can reduce the required ensemble size by over an order of magnitude.We show how the nonlinear memory allows the detection of, for example, entanglement and purity, provided both the real-time QRC and ESN are of moderate size.HD     2    copies in fiber FIG. 1.The hybrid architecture.The input, a time series of quantum states, is processed by the QRC to produce a time series of classical information as a linear combination of its estimated observables according to the trained matrix W (2) .This is injected to the ESN through a random coercion matrix C. The final output is a trained linear combination of both the estimated QRC observables and ESN variables according to matrix W out .Specifically, at each time step a product state of N copies of the input interacts with the quantum reservoir-a physical ensemble of size M -via a beam splitter of reflectivity R. The two nonlinear χ (2) crystals with sparsity p couple the optical modes both in the optical fiber loop and before the homodyne measurements.The used observables are the elements of the maximum likelihood estimate of the covariance matrix of the x-quadratures, σx.

Deep hybrid architecture
Reservoir computing is an established supervised approach for temporal tasks, where each output at a given time can be trained to approximate any functional of the recent input history [22,33,45].Quantum reservoirs, beyond potential performance boosts, offer the unique advantage of coherent processing of quantum states as inputs [8,16,32,46], as will be done here.While quantum states can be processed by classical computers, this is generally exponentially costly in terms of resources.In this sense, using a quantum reservoir offers an intrinsic advantage.Our proposal, shown in Fig. 1, is based on combining a quantum and a classical reservoir computer in a deep or cascaded configuration.The goal is to benefit both from the quantum preprocessing and the nonlinear classical post-processing to solve quantum tasks efficiently, with high accuracy and with experimentally convenient resources.The classical RC consists of an ESN (described below).The output of the hybrid classicalquantum reservoir computer is trained to the desired tasks, optimizing the matrices W (2) and W out , which form the QRC output and the ESN output, respectively (see Fig. 1).
The quantum layer is accounted for by the real-time QRC introduced in Ref. [39], based on a multimode quantum optics platform shown in the bottom half of Fig. 1.The reservoir consists of N optical modes in a feedback loop and coupled to each other by a nonlinear χ (2) crystal.A physical ensemble of M identical copies of reservoirs can be implemented as a sequence circulating inside an ideal lossless optical fiber.The input modes are in zero mean Gaussian states and interact with the reservoir modes through a beam splitter of reflectivity R such that at R = 1 they are reflected directly to the readout-a second χ (2) crystal followed by homodyne measurements of the x-quadratures of all modes-and at R = 0 will be transmitted, circulate once and then transmitted again to the readout.Nontrivial memory effects appear at intermediate values of R where each reflected set of modes depends on (recent) input history, with memory provided by the feedback loop.The nonlinear crystals distribute the information across the available degrees of freedom, leading to linearly independent observables that play the role of the computational nodes.
The Hamiltonian of a χ (2) crystal in the QRC is of the form where i = √ −1, a j and a † j are the annihilation and creation operators, respectively, indexed by mode and satisfying the commutator relation [a j , a † l ] = δ jl .The reservoir is a complex network [47,48] with (real) couplings g jl and h jl .Following Ref. [39] they are random numbers uniformly distributed as g jl ∈ [0.1, 0.3] and h jl ∈ [0.2, 0.4], scaled with the modes frequencies (ω j = 1); these parameters are consistent with squeezing levels of state-of-the-art experiments.The time ∆t inside each crystal is set to one.Here we additionally introduce a sparsity parameter p which controls whether a coupling is on or off in the network, that is to say once all weights have been randomized each is individually kept with probability p and otherwise set to 0 such that still g jl = g lj and h jl = h lj .For p = 1 the network is generally fully connected, whereas setting p = 0 removes all couplings and effectively the crystals.As shown in Methods, the fully connected configuration is not always the best option as one could expect, and intermediate values can optimize performance.
Because all states will be zero mean Gaussian states and the Hamiltonians are quadratic in the creation and annihilation operators, the relevant state remains a multimode squeezed vacuum state at all times and as such is fully described by its covariance matrix (σ(x)) jl = ⟨x j x l +x l x j ⟩/2−⟨x j ⟩⟨x l ⟩ where x = {x 1 , p 1 , x 2 , p 2 , . ..} ⊤ is the vector of the quadrature operators of the field indexed by mode.In general, x = a+a † and p = (a−a † )/i, making the covariance matrix of the vacuum state |0⟩ normalized such that σ(x, p) = 1 0 0 1 .For each time step k, the input is a product state [39,40], with the covariance matrix σ(x in k ) ⊕N where ⊕ is the direct sum (for the sake of reducing clutter we drop the exponent from now on).With each injection, the input quantum state is set to a new value and coupled to the cavity modes through the beam splitter (bottom of Fig. 1).Therefore, at each time step k + 1 there are not initial correlations between the reservoir and input modes, ).Here x R k accounts for the reservoir modes at the end of the previous time step, x in k+1 are the new input modes.The covariance matrix of the reservoir and output updates as where the 4N × 4N symplectic matrix S(∆t) accounts for the action of the beam splitter and the two nonlinear χ (2) crystals (see Eq. ( 12) in Methods).Then, due to the complex dynamics of the reservoir, σ(x R k+1 ⊕ x out k+1 ) in general displays correlations; we remind that the argument x R k+1 ⊕ x out k+1 is just a vector of operators.The QRC outputs are real numbers obtained from the homodyne measurements of the x-quadratures of the output modes, and beyond ideal conditions, we consider finite ensembles (of size M ).Specifically, the output o QRC k+1 is a linear combination of the elements of σx (x out k+1 ) and a bias term.Here σ x is the N × N covariance matrix of only the x-quadratures and σx its maximum likelihood estimate calculated from the homodyne measurement data over the ensemble of size M ; because σx is a symmetric matrix it provides N (N + 1)/2 computational nodes.
In the case at hand the measurement back-action averages out as explained in the Appendices of Ref. [39].In QRC, the output layer is usually easily trained through a Ridge regression actually implemented on a classical computer [8], so that a somewhat hybrid nature of the architecture is already present in the original setting [49].
Here instead we consider a deep hybrid design where both the quantum and the classical components are reservoir computers, in order to boost the overall performance and enquire about their respective capabilities.
The classical layer is an ESN, a powerful and versatile reservoir computer [20].In practice, it would be implemented in silico on a (classical) computer.The state of the ESN at some timestep k is a vector of N ESN real numbers x ESN k where N ESN is the number of neurons in the network.With each new input, the state updates as where f is the activation function, ρ, ι ∈ R are the feedback gain and input gain parameters controlling the importance of the previous state and latest input, respectively.The classical ESN vector x ESN k+1 and the (new) injected classical inputs s k+1 (vector of dimension d coming form the covariance outputs of the QRC) are linearly combined in this classical reservoir through two random real matrices W and C. The weight N ESN × N ESN matrix W accounts for the internal connections between the ESN neurons; the elements from a uniform random distribution in the interval [−1, 1], are scaled setting a unit spectral radius of the matrix.The N ESN ×d coercion matrix C is generated in the same way and is responsible for converting the input to match the ESN size.The activation function is f (x) = ln(1 + exp(x)) and the function acts elementwise.The hybrid architecture output o k+1 is a linear combination of all computational nodes, namely the N (N + 1)/2 independent elements of σx , the N ESN variables in x k+1 , and a bias term.
As common in reservoir computing, the sufficiently long input time series is divided into three parts, namely preparation (or wash-out), training, and test.During preparation, the transient dependency on the initial state is removed (echo state property).Training aims to optimizing W (2) and then W out (see top of Fig. 1), in a two-step process.The elements of σx are recorded during the training phase and linear regression is used to minimize the squared error between a given target output and the actual output, fixing W (2) .This output is used as input by the ESN and the process is repeated to optimize the weights in W out .Finally, the test phase is used to assess how well the trained reservoir computer generalizes to cases not used in training; all the shown results in the following represent tasks performance for the unseen data, i.e. in the test phase (see also Methods).We note that in this two-step process, the target for training W (2) needs not be the same as the final target.We will describe all targets as the considered tasks are introduced.
The performance of our hybrid architecture will be tested in four different tasks, displaying the ability of the system to store and process information of random sequences of quantum states: short-term memory, as well as trace, determinant, and entanglement detection with delay.The first two require linear memory to be efficiently solved, while the last two are nonlinear.
For convenience, the selected parameters for the presented results, when not differently indicated, are reported in Tab.I. How their values have been chosen is explained in Methods, under Hyperparameters.In particular, the default values of N and N ESN are chosen to ensure an equal distribution of computational nodes between the quantum and classical layers.

Memory of quantum states
Let us begin our analysis by assessing the reservoir computer's ability to recall past (quantum) inputs through the short-term memory task, a standard benchmark in classical RC here generalized to quantum state sequences.The target for the deep hybrid setting is to output at time t the full information of a quantum state injected at time t − τ .Let σ t be the short-hand for the input covariance matrix at time step t; then the final target is simply σ t−τ where τ ≥ 0 is a delay.In the proposed hybrid architecture, the QRC is trained to output the unique elements of the input covariance matrix of Gaussian states with a delay τ ′ ≤ τ .The figure of merit is fidelity F (see Eq. ( 14) in Methods) between the input state (injected in the past) and the approximation achieved by the output after a delay time delay τ .As the output usually has some errors it is not guaranteed that F ∈ [0, 1] as it should; such unphysical cases are very rare especially when the performance is good, and are therefore simply excluded.We start considering, as inputs, random zero mean squeezed thermal states with a covariance matrix distributed as σ(n th , r, φ) ∼ σ(U(0, 5), U(0, 0.75), U(0, 2π)) (4) where U(min, max) is a continuous uniform distribution in the closed interval [min, max] and with a slight abuse of notation we indicate how the thermal excitations n th , squeezing parameter r and phase of squeezing φ are distributed.
One of the features of the real-time QRC, beyond ideal assumptions, is that for any finite M the observables have limited SNR [38,39].The exact ideal covariance and averages of observables can be determined only at the limit M = ∞.The achievable performance for finite ensembles decreases exponentially with delay [39], limiting the short-term memory and affecting also the hybrid reservoir computer.This is investigated in the upper panel of Fig. 2. As expected, there is quite a large memory gap between the ideal case M = ∞ and a large ensemble of M = 10 6 especially at longer delays.Remarkably, in the proposed hybrid RC, the classical and quantum reservoir delays can be adjusted independently to sum up the total delay τ .This opens the possibility of improving the performance in memory tasks.Indeed in the lower panel of Fig. 2, the delay is fixed to τ = 5 and the default parameters are compared to the case where short-term memory is boosted by increasing the size and feedback gain of the ESN while setting the QRC delay τ ′ = 0.
▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ This greatly reduces the physical overhead required to reach a given fidelity: for instance, with bigger ESN the fidelity is F ≈ 0.95 at M = 1000 but in the default case this requires an ensemble of size M = 10 5 .Importantly, measurements of only the x-quadratures would not normally suffice for high fidelity for any delay as they would not determine the entire single-mode covariance matrix.However, this limitation is lifted by the QRC.Indeed, the χ (2) crystals enable interactions in the reservoir that effectively distribute the information across different optical modes.A more complete state reconstruction is then possible through the observables of the optical network.
Linear and nonlinear memory for energy, purity and entanglement retrieval As the QRC dynamics is governed by a linear inputoutput map for the state [44], Equation ( 2) is linear in the input covariance matrix.The hybrid setup, instead, has nonlinear memory thanks to the ESN.Here we investigate both the linear and nonlinear memory and check how performance depends on both N and N ESN .To this end, we introduce another simple linear task and two nonlinear tasks to provide a broad assessment.Specifically, as a linear task in the quantum state, we consider the memory trace task where the target is the trace of the covariance matrix with delay, Tr(σ t−τ ), proportional to the input energy -and not to be confused with the trace of the density operator.The target for the real-time QRC in the trace task is Tr(σ t−τ ′ ).
More complex tasks are the retrieval of entanglement and purity of past quantum inputs whose information is stored in the reservoir, based on a (delayed) single-time measurement.Both entanglement and purity are nonlinear tasks, obtained respectively from the lower symplectic eigenvalue and from the determinant of the covariance matrix.Their retrieval, in presence of delay, requires nonlinear memory.The respective targets for the QRC are set as in the short-term memory task and the final targets of the deep reservoir are Det(σ t−τ ) and the logarithmic negativity E(σ t−τ ), whose definition is reminded in the Methods Section.All of the new tasks reveal partial information about the input: the trace of a covariance matrix is directly proportional to the energy of the system (since the mean is zero), the determinant completely determines the purity, and the logarithmic negativity is an entanglement measure.Unlike for others, for the entanglement detection task, the input states are two-mode squeezed thermal states distributed as σ(n th , s) ∼ σ(U(0, 1.25), U(0, 0.75)) (5) where n th is the amount of initial thermal excitations in either mode and s is the two-mode squeezing parameter.We address the performances in all these tasks, assessing the improvement provided by increasing the sizes of the quantum and classical reservoir layers and results are summarized in Fig. 3.The figure of merit for delayed energy, purity, and entanglement is the NMSE (see Eq. (7) in Methods), which vanishes for perfect correlation between the target and actual output and becomes 1 at the limit of a long training phase when there is no correlation [50].For the state estimation task infidelity 1 − F is shown.Results in Fig. 3 display all these tasks performance when N varies from 1 to 9 in steps of 2 (and the number of trained weights label the axis).Notice that in the delayed entanglement detection, if N is even, N/2 copies of the input state 5 are introduced at each time step, and if it is odd the extra mode will be in the vacuum state; consequently the task is not well defined for N = 1 (as indicated by the hatching in the panel of the last panel of Fig. 3).For ESN the number of trained weights coincides with network size N ESN .In all cases the quantum layer is necessary and cannot be compensated by the ESN.While the performance of the classical reservoir is always improved by the quantum layer, the latter can pose a bottleneck, as is particularly evident in the trace task.However already with N = 3 (6 trained wigths in the QRC) a decent performance can generally be reached if the ESN size is sufficiently large.Also, we find that N ESN does not limit the performance in linear tasks as the final readout layer can use the linear memory of the real-time QRC.In nonlinear tasks such as the determinant task increasing N cannot compensate for a small N ESN ; both layers become critical for a high performance.

Performance advantage of deep hybrid design
Instead of using the hybrid reservoir computer, one could discard the ESN and use the output of the real-time QRC only.Alternatively, one could measure directly the x-quadratures of the available copies of σ t at each time step to form the estimate and use it with the ESN alone.Building on the results above, dropping either of the reservoirs is expected to hinder the performance because the real-time QRC is linear in σ t the measurements are not informationally complete requiring the quantum preprocessing.Furthermore, as we have seen hybrid reservoir computing is not limited by the same SNR scaling of the QRC and can achieve comparable performance with a significantly smaller experimental overhead (less measurements).In the next, we evaluate the performance of the hybrid architecture in comparison to fully quantum and classical reservoirs, for fixed output layer sizes.
In Fig. 4 we show the hybrid performance together with those of the real-time QRC and the ESN, in all the tasks considered before.While the Hilbert space offers a larger size than a classical reservoir with the same components [14,51], here we do not make the assumption of an equal number of units.For a fair comparison, it is the size of W out to be kept fixed in all three cases, even if this requires a larger number of classical neurons.For the QRC, this is achieved by using two independent random instances, and for the ESN by simply doubling the number of neurons.The insets show the hybrid performance in detail using a smaller scale.Remarkably, the NMSE of the hybrid setup can be even three orders of magnitude better than the classical and quantum reservoirs, in both linear and nonlinear tasks.In all cases the hybrid performance in retrieving past states or their information (energy, purity, or entanglement) tends to be better or is at least comparable.This is a robust result and suggests that the hybrid advantage can be expected in general.
In most cases the homodyne measurements alone do not reveal sufficient information to solve the tasks, causing the ESN to fail.The sole exception is the entanglement detection task; indeed, the covariance matrix of the used states has only two linearly independent elements which are both revealed by measuring only the xquadrature as shown by Eq. ( 9).Still, the ESN struggles to simultaneously handle the delay and the nonlinearity; presumably, this could be solved by increasing its size further.That being said, for generic bipartite states determining the logarithmic negativity requires the determinant of the blocks and the full covariance matrix and can therefore be expected to be impossible from measuring just the x-quadratures.
As expected, the real-time QRC can't cope with the nonlinear tasks.Furthemore, even in linear tasks, it is not competitive at higher delays due to the limited accuracy.The most striking difference is in the trace task where in fact the full delay is moved to the final readout layer by setting τ ′ = 0.This results in an error at delay τ = 5 that is approximately two orders of magnitude smaller for the hybrid model.As a note, in the state estimation task setting τ ′ = 0 would lead to worse performance for the hybrid and therefore the shown results use the default value (τ ′ = ⌈τ /2⌉).This could be inter-    preted as an indication that the task is more difficult and hence it is better to share the delay.

DISCUSSION
We have introduced a deep architecture for temporal quantum data processing by concatenating a scalable real-time quantum reservoir computer with a classical reservoir computer.The quantum layer is based on experimentally convenient Gaussian states and homodyne measurements, evades the measurement back-action, and is amenable to continuous extraction of data from a physical ensemble of identical random reservoirs.Our proposal combines its potential for a high information processing capacity and quantum preprocessing of the data with the nonlinear and versatile memory of the classical layer, leading to higher performance than either layer can achieve alone.Importantly, this performance advantage cannot be compensated for by just making the shallow counterparts bigger as it arises from the combination of their respective unique properties whereas training of the hybrid setup can be done by applying standard methods and is therefore of comparable ease.While the relatively small classical layers we have used are enough to support our main conclusions, they could be easily scaled up significantly; this would provide a further boost to the short-term memory without increasing the physical overhead, i.e. the ensemble size of the quantum layer.
There is much room for further work.QRC has been proposed for temporal quantum data processing only re-cently [8,16,32,46], and indeed also the tasks considered here are benchmark tasks meant to be used as diagnostic tools to assess the capabilities of the reservoir computer rather than solve a practical problem.An interesting possibility to further explore comes from the memory capabilities of this design.Indeed, from single-time measurements, it is possible to gather, for instance, the entanglement evolution at several previous instants, offering a convenient alternative to continuous monitoring, where a final measurement can be used to reconstruct the dynamics.Identifying further use cases should however happen alongside proof-of-principle experiments.Our proposal is well suited for this purpose as it is within reach of state-of-the-art multimode quantum optics platforms.This would expand earlier experimental work on classical machine learning applied to tomographic tasks [52], demonstrating for example improvements in speed and the number of measurements [53] or the ability to also learn quantum trajectories [54].Simulations have suggested, e.g., the possibility of significantly increased robustness to noisy or incomplete data [55].Another avenue of further research would be to consider the possibility of having temporal correlations or entanglement, i.e. between inputs at different time steps.This could increase the importance of the quantum memory to redistribute and store the relevant information such that it leaves its fingerprint on the extracted data processed by the classical layer.Generalization beyond bipartite inputs is another possibility.Finally, the possibility of augmenting the quantum layer with suitable non-Gaussian operations can be envisioned, combining the hybrid setup with recent advances in multimode single-photon-added and -subtracted states of light [56,57] or considering single-photon instead of homodyne detection [58,59] to unlock even more power.
Note added : After the completion of this work, we became aware of Ref. [60] which also proposes combining a quantum and a classical reservoir computer, but to process classical temporal data.

Reservoir computing
Generally speaking, a reservoir computer is a dynamical input-output system such that under the drive of the input, its dynamical variables become diverse functions of input history [22,45,61].The output is trained to approximate the reservoir variables; typically linear combinations are used to offload the bulk of the computations to the system, i.e. the reservoir.This leads to a supervised machine learning approach well suited for temporal tasks, as recurrent neural networks.More formally, a reservoir computer stands on the echo state [20] and fading memory [62] properties, which essentially guarantee that eventually the effect of the initial conditions becomes negligible and the dynamical variables become continuous functions of only recent input history-so-called fading memory functions-respectively [45].The realtime QRC in particular is known to have these properties [39].Reservoir computers are ideal for solving tasks that can be well approximated by fading memory functions and have the advantage of robust single-step training and amenability to physical implementations [33] when compared to recurrent neural networks.The dynamical variables used to form the output are the computational nodes.In our case, they are the elements of the estimated covariance matrix of the x-quadratures σx and the neurons for the real-time QRC and the ESN, respectively.
During the training and testing phases, the states of the computational nodes are recorded into two matrices X train and X test such that the rows correspond to time steps and the columns to each computational node.The bias term is included by adding a unit column.Let ōtrain be the target output in the training phase.Then we set for example where X + train is the Moore-Penrose pseudoinverse of X train , minimizing the squared error i (o train,i − ōtrain,i ) 2 between the actual output o train = W out⊤ X ⊤ train and the target ōtrain .Test phase uses the trained weights to generate the output o test = X test W out .The standard figure of merit is the normalized mean squared error (NMSE), namely In the short-term memory task of quantum states, however, fidelity is used.Importantly, NMSE in the test phase depends on the ability of the reservoir computer to generalize beyond the training phase.Further remarks are in order.To account for the hybrid architecture shown in Fig. 1, the real time QRC is trained first.Its trained output is then treated as input for the ESN and training is repeated for the full hybrid setup.When the target is a vector, the framework above applies as is and in particular the trained matrix will have its own column for each element while the squared error is minimized element-wise.In all tasks where the final figure of merit is NMSE, the final target is also a scalar, making Eq.( 7) sufficient.

Gaussian quantum states and operations
Gaussian states and operations are covered in detail for example in Refs.[63][64][65].Zero mean Gaussian states are completely determined by the covariance matrix which accounts for the shape of the multinormal distribution in quantum optical phase space, or the Wigner function of a Gaussian state.The covariance matrix of the vacuum state |0⟩ is normalized to the identity matrix.For a single mode squeezed thermal state with n th thermal excitations, a squeezing parameter r and phase of squeezing φ, the covariance matrix reads σ(n th , r, φ) = (1 + 2n th ) y cosh + z cos z sin z sin y cosh − z cos (8) where y cosh = cosh (2r), z cos = cos (φ) sinh (2r) and z sin = sin (φ) sinh (2r).Two-mode squeezed thermal states are considered in the entanglement detection task.
The matrix for such a state with two-mode squeezing parameter s reads where now y cosh = cosh (2s) and y sinh = sinh (2s).The covariance matrix of a product state, such as the ones injected into the reservoir at each time step, is simply the direct sum of relevant covariance matrices.The initial state of the reservoir is also a product state, namely that of single mode vacuum states for each reservoir mode.
The corresponding covariance matrix is just an identity matrix of appropriate size.Unitary evolution is determined by a Hamiltonian H. Let M be the real symmetric matrix defined by where x = {x 1 , p 1 , x 2 , p 2 , . ..} ⊤ is the vector of the quadrature operators.Then the symplectic matrix S(t) that determines the evolution of the covariance matrix σ via σ(t) = S(t)σ(S(t)) ⊤ is given by S(t) = exp(ΩMt) (11) where Ω jl = −i[x j , x l ] is the symplectic form accounting for the commutation relations between the quadrature operators.
In the case at hand it is convenient to consider separately the two nonlinear crystals with Hamiltonians H 1 and H 2 and the beam splitter with reflectivity R. The symplectic matrix S(∆t) that accounts for the evolution during a single timestep reads where S i (∆t) = exp(ΩM i ∆t) and ∆t is the time spent inside the first or the second crystal.Importantly, the first (second) column of this block matrix acts on the reservoir (ancillary) modes and in general, redistributes the information such that both the reservoir state and the output will depend on input history as is necessary to achieve reservoir computing.After each action of S(∆t) the x-quadratures of the ancillary modes will be measured and then replaced by fresh modes accounting for the new input; in practice, the correlation blocks between the reservoir and ancillary modes are set to the zero matrix and the ancillary block to the new covariance matrix.
In general, the beam splitter will create correlations between the reservoir and ancillary modes which will lead to measurement back-action conditioned by the measurement outcomes.However, as shown in the appendices of Ref. [39], here the back-action averages out and one gets equivalent results by propagating the back-action free reservoir covariance matrix determined only by the Hamiltonian evolution and input history.
It remains to model the measurement process given a finite sample size M and to construct the maximum likelihood estimate σx of the covariance matrix σ x of the x-quadratures of the ancillary modes.It is convenient to combine these two steps into one by considering the sampling distribution of such a process, i.e. the Wishart distribution W (σ x , M ) [66] it suffices to draw a random sample from W (σ x , M ) to account for stochasticity caused by finite ensemble effects.For this, it is enough to sample suitable chi-squared distributions and the normal distribution of zero mean and unit variance, as well as find the Cholesky decomposition of σ x [67].In the limit M → ∞ one naturally recovers σx → σ x , and in this limit the sampling can be skipped.For the state estimation task, the target is a covariance matrix and both the fidelity F and the infidelity 1 − F between the actual and estimated state are used as the figure of merit.In terms of some density operators ϱ and ς the definition reads and the value can be found directly from the covariance matrices [64].It should be noted that since the training minimizes the squared errors between their elements it does not necessarily optimize fidelity.However, low errors tend to result in high fidelities, as might be expected.
Logarithmic negativity E(σ) is estimated in the entanglement detection task.It is based on the positive partial transpose criterion and quantifies how much the positivity of the partial transpose is violated, or how much the purity of the state would need to be decreased to make it separable.In the case at hand it is determined by d− , the smaller of the the two symplectic eigenvalues of the two-mode covariance matrix σ [68], via In the training phase, the target is − log 2 d− and the maximization is done separately before calculating the NMSE.It should be pointed out that in many sources the relevant quantity is reported to be − log 2 2 d− but this constant factor difference is due to a different normalization of the vacuum covariance matrix; the definition introduced above is equivalent.

Generation of random instances
The relevant parameters take the values given in Tab.I unless stated otherwise.Each random instance involves the generation of an input time series where elements are sampled according to either Eq. ( 4) or ( 5), the generation of the random Hamiltonians H χ (2) for the two nonlinear crystals, the weight matrix W for the ESN and the coercion matrix C. It should be noted that the Hamiltonians H χ (2) are not by construction stable; in the rare event that they are not, they are replaced by new random Hamiltonians.The relevance of stability for the real-time QRC is discussed in App.F of [39].The target time series is completely determined by the input in all tasks.As previously explained, the only effect of the finite ensemble size M is to introduce some random noise in the observables of the real-time QRC; this is accounted for by using the sampling distribution (see Eq. ( 13)) once for each time step.

Hyperparameter optimization
By hyperparameters we mean any parameters of the hybrid setup other than the trained weights of matrices W out and W (2) .The number of optical modes N , neurons N ESN and the ensemble size M are special as performance can be expected to never decrease as they are increased due to some very general properties of reservoir shifted Det(σ) • τ'=τ computers [69] for the first two and for M due to its connection to the SNR of the estimated covariance matrix σx .
Besides N the real-time QRC dynamics depends on the reflectivity of the beam splitter R and the sparsity parameter p.We will analyze how they affect performance in a linear task of estimating the off-diagonal element of the input covariance matrix, averaged over small delays up to τ = 3 in Fig. 5.Besides short-term memory, this also tests the ability of the quantum layer to reveal new information about the input state as normally measuring the x-quadrature can reveal only one of the diagonal elements.The performance behaves non-monotonically both in R and p and in particular, the tested short-term memory is optimized at intermediate values.The smallest error with the considered values is found at R = 0.4 and p = 7/9.Although considering longer delays would shift the optimal reflectivity to higher values we have lim-ited to small delays as the rest can be taken care of by the ESN.
Adding the ESN introduces new parameters: besides N ESN they are the separate delay for the real-time QRC τ ′ , feedback gain ρ, and input gain ι.The latter two are optimized with a simple lattice search separately for linear and nonlinear tasks.Unlike the others, τ ′ is a parameter unique to the hybrid setup.Its choice is investigated in Fig. 6 considering the trace and determinant tasks.Differences become noticeable starting from τ = 2.In the linear trace task letting the classical layer handle the full delay leads to the smallest error until eventually the ESN runs out of memory and the NMSE collapses near 1, leaving the even split to work best although still at a relatively high NMSE of 0.1.But in the nonlinear determinant task, the roles are almost reversed, except none of the approaches produce useful results at larger delays.All in all, it seems that the default value τ ′ = ⌈τ /2⌉ (with ⌈•⌉ for the integer part) tends to work well whereas τ ′ = τ tends to lead to inferior performance, possibly because then the SNR of the memory provided by the real-time QRC is the lowest.For linear tasks trying also τ ′ = 0 may be worthwhile.

2 ■ 10 FIG. 2 .
FIG.2.Effect of the ensemble size M on mean performance over 100 random realizations in input state estimation task and enhancement of the performance with a larger ESN.On the top panel, both the QRC and the ESN contribute with 45 trained weights and the delay τ is split evenly such that the real-time QRC is trained for delay τ ′ = ⌈τ /2⌉.For smaller ensembles, the performance drops quickly as τ increases.On the bottom panel, the delay is fixed to τ = 5 and the default settings are compared to the hybrid setup with a larger ESN with NESN = 295 with increased feedback gain ρ = 1.8 and τ ′ = 0.All other parameters are as in Tab.I.

2 FIG. 3 .
FIG.3.Effect of the reservoir size on mean performance over 100 random realizations in linear (shifted trace and state estimation) and nonlinear (determinant and entanglement estimation) tasks.The last one is not well defined for the smallest real-time QRC size as commented in the main text (hatched line).The delay is fixed to τ = 2 and split evenly between QRC and ESN in the deep architecture.Performance is quantified by NMSE except in state estimation task where it is quantified by infidelity.All parameters are as in Tab.I.

FIG. 4 .
FIG. 4.Mean performance of the hybrid architecture benchmarked against comparable purely quantum and purely classical alternatives in linear and nonlinear tasks.Insets show the hybrid performance in more detail.Shown results are averages over 100 random realizations.Unlike in others in the trace task τ ′ = 0.All other parameters are as in Tab.I.

9 ■ 1 / 3 ◆ 7 / 9 ▲ 1 FIG. 5 .
FIG.5.Effect of the value of reflectivity R and the sparsity parameter p on the real-time QRC performance.The task is to recall the off-diagonal elements of past input covariance matrices, shifted to have zero mean.100 random realizations have been considered for each value of R, p and τ .Other parameters are as in Tab.I.

FIG. 6 .
FIG.6.Effect of different values of the delay τ ′ for the real-time QRC on mean performance over 100 random realizations in both linear (shifted trace) and nonlinear (shifted determinant) tasks.At τ ′ = 0 (τ ′ = τ ) all delay is handled by the ESN (real-time QRC).Other parameters are as in Tab.I.

TABLE I .
Relevant hyperparameters in QRC and ESN, and default values used in benchmark tasks.Feedback gain ρ and input gain ι have different values for linear (short-term memory, trace) and nonlinear (determinant, entanglement detection) tasks.