Model-Free Control of Dynamical Systems with Deep Reservoir Computing

We propose and demonstrate a nonlinear control method that can be applied to unknown, complex systems where the controller is based on a type of artificial neural network known as a reservoir computer. In contrast to many modern neural-network-based control techniques, which are robust to system uncertainties but require a model nonetheless, our technique requires no prior knowledge of the system and is thus model-free. Further, our approach does not require an initial system identification step, resulting in a relatively simple and efficient learning process. Reservoir computers are well-suited to the control problem because they require small training data sets and remarkably low training times. By iteratively training and adding layers of reservoir computers to the controller, a precise and efficient control law is identified quickly. With examples on both numerical and high-speed experimental systems, we demonstrate that our approach is capable of controlling highly complex dynamical systems that display deterministic chaos to nontrivial target trajectories.


I. INTRODUCTION
Controlling dynamical systems is a ubiquitous problem in disciplines ranging from engineering to medicine. The fundamental problem in control engineering is to design control signals that are applied to a system with accessible inputs, referred to as a plant, so that it follows a desired behavior. Solutions to this problem have far-reaching applications, such as in autonomous automobiles [1]- [3] and aircraft [4], heating and cooling systems [5], robotic arms [6], and chemical industrial processes [7], to name a few.
One challenge in controlling dynamical systems is that, beyond textbook examples, they are almost always nonlinear, i.e., their behavior is a nonlinear function of the state variables or the accessible inputs. Control methods for nonlinear systems include linearizing the dynamics about a typical operating point and then applying linear control methods [8], evaluating a model of the system in real time for state estimation [9], or using artificial neural networks (ANNs) to perform the state estimation [10].
Typical ANN models are feed-forward networks consisting of layers of nodes that pass the weighted outputs of nodes from previous layers through a nonlinear function [11], i.e. the neural network forms a directed acyclic graph. These static functions can universally represent other nonlinear functions, but they do not naturally represent time-dependent nonlinear signals. Recurrent neural networks (RNNs), in which the ANN remains directed but is no longer acyclic, naturally have temporal dynamics that can be used to learn time series, but training the recurrent weights is notoriously difficult and typically fails to converge. This can be addressed by considering restricted RNN topologies. A popular one is the long shortterm memory architecture which has recently been applied to control [12].
Alternatively, we adopt an approach based on a new paradigm for machine learning -known as reservoir computing (RC) -that is well suited for dynamical systems [13], [14]. Here, the nodes of a RNN have their own dynamics and are hypothesized to be a universal dynamical system [15]. Reservoir computing can learn the 'climate' or phase-space attractor of complex systems using only a segment of the temporal evolution of a system observable [16]. Compared to deep learning, RC requires substantially less data to achieve good performance, requires much less training time due to the simplicity of the training algorithm, and achieves state-ofthe-art performance in time-series prediction [17]- [20], system identification [21], and spoken-word recognition [22]. Because most of the network is unchanged during the training process, RC is particularly well-suited to computing with dedicatedpurpose and low-power hardware, including novel implementations with delayed optical feedback [22] and electronic Boolean circuits [23].
As noted above, RC is well-known for its ability to form data-driven models for complex time-series. In the control context, this task can be thought of as creating a model for a plant in the absence of inputs. Unsurprisingly, this notion can be extended to include plants with accessible inputs [24], a task commonly referred to as system identification.
Once a plant is identified, a control law can be devised. Most often, closed-loop control is desired, where the plant input is a function of the desired plant observable and the actual plant observable. Techniques for obtaining such a function (or control law) are wide-ranging and include using a piece-wise linear approximation of the plant to direct construction with a feed-forward ANN; see, e.g., Ref. [25] for a recent review.
System identification is a common first step for controlling an unknown system, particularly when applying machinelearning based techniques such as ANNs. Recently, it has been shown [26] that this two-step process is not necessary with RC. In fact, reservoir computers are capable of directly learning an appropriate control law. This is accomplished by learning the system's 'inverse,' which we explain further in the following sections.
The first contribution of this paper is to expand the study of the RC-control method introduced in Ref. [26]. We provide motivation for the algorithm, explicitly demonstrate and quantify the ability of a class of RC known as an echo state network (ESN) [27] to 'invert' a system, and study optimizing the RC parameters that are specific to the control problem. The second contribution is to develop an iterative technique for adding parallel layers to the ESN controller, forming a deep ESN (dESN) [28] to achieve more precise control. We demonstrate the efficacy of the proposed algorithm with numerical and experimental results.
The rest of this article is organized as follows. First, we define notation and formulate the control problem. We follow by explaining the concept of direct inverse control and how it can be accomplished with RC. Next, we examine the the effects of varying hyperparameters using numerical studies on controlling the Mackey-Glass chaotic system. We then develop our multi-layered control algorithm for precise control and apply this algorithm to a number of numerical and experimental examples. Finally, we conclude and discuss future research directions.

II. PROBLEM FORMULATION
We assume that the plant is described bẏ where x are the plant internal states, y are the m plant observables, and v are the l accessible inputs. Generally, f and g are unknown, and the only information available is the simultaneous response of the plant to a user-defined input signal v train during a training period. In the following analysis, we assume that f is Lipschitz continuous with respect to x, and that g is 'typical' in the sense defined by Takens' embedding theorem [29], so that an observer with memory of a history of y(t) can construct a map to x(t). There may also be noise terms included in Eq. 1 and 2, but we do not explicitly model noise in the analysis below. Designing a controller requires an operation that reads a reference signal r and outputs a control signal v such that y → r. It is a closed-loop controller if v is also a function of the current value of y as measured at the the plant.
If v is constant over an interval from t to t + δ, where δ is a non-infinitesimal interval, then f (·, v) = f v (·) may be viewed as a differential equation parameterized by v. The Lipshitz condition implies that the value of x (t + δ) is determined by the initial conditions at t, i.e., where F v is a nonlinear evolution operator mapping x(t) to x(t + δ). It can be constructed approximately by repeated application of f over infinitesimal time steps.
If v is instead slowly varying from t to t+δ, then we expect this equality to instead be an approximation given by for some function F. In general, this function is not invertible since there may be multiple possible input trajectories v (t) that drive the system to a given future state and not all states may be reachable from the current state. However, if we restrict the domain to future states reachable from the current state, we can effectively invert this function as where F −1 is the function of interest for devising a controller for Eqs. 1 and 2 that chooses a principle value among possible inputs. In Sec. IV, the future value x(t + δ) is replaced with a desired future value calculated at the current time t in order to obtain a causal control law.

III. RESERVOIR COMPUTING
In RC, a recurrent network of time-dependent nodes and fixed connection weights (known as the reservoir) is driven with an input signal of interest. The response of the reservoir is then recorded, and a time-independent transformation is obtained to map the reservoir state onto a target signal. This prescription requires that the reservoir be sufficiently highdimensional and complex so that distinct input signals are effectively separated in the state space of the reservoir. It also requires that the system exhibit the echo-state property [13] (or, more generally, generalized synchronization [30]) with respect to the input signal. This ensures that the reservoir exhibits short-term memory, while eventually forgetting past values of the input signal.
Although many types of physical systems with different topologies have been proposed as the reservoir substrate (see Ref. [14] for a thorough review), it is common to use a recurrent neural network (RNN). Within this paradigm, a reservoir computer has three distinct components: a feed-forward input layer, a recurrent layer, and a feed-forward output layer. The input-to-reservoir connection weights W in and the reservoirto-reservoir weights W are randomly assigned to initial values and kept fixed. The network is then driven by an input signal y(t) during a training period during which the response of the reservoir u(t) is observed. Finally, given a desired output signal v d (t), the reservoir-to-output weights W out are chosen so that v(t) v d (t) during the training period after some initial transient is discarded. Because training only involves linear optimization of the output layer, there is no vanishing gradient problem that makes training deep learning networks difficult.

A. Echo State Networks
The reservoir of an ESN is an N -node recurrent ANN whose behavior is described by the differential equation where u is an N -dimensional vector and tanh is the vectorized function, W (dimension N × N ) is the adjacency matrix for the internal reservoir links, W in (N × m) is the input-weight matrix, and b (N ) is a bias vector. They are composed of random matrix elements that are fixed at the initialization of the reservoir and describe the reservoir dynamics with respect to an input signal y. The only trained parameter is W out (l × N ). After driving the ESN and observing the reservoir response, we use Tikhonov regularization to determine W out by mini- from t = T init to t = T train , where T init is long enough to get beyond the transient response of the reservoir, T train is the end of the training period, and β is a small regularization parameter chosen to prevent overfitting to data and can be chosen with standard cross-validation techniques. The fixed parameters W, W in , b are instantiated according to a number of hyperparameters that are selected prior to training. These are: the magnitude of the largest eigenvalue (also known as the spectral radius) of W, ρ; the proportion k of nonzero elements of W, which is also the mean in-degree of the network; the scale σ of W in ; the mean b mean and maximum b max values of b; and the time constant c. They are often selected by hand based on some heuristics [31], but may also be optimized by various algorithms [32]. In all cases, we find that typical choices for most of these parameters provide good performance and do not explore other values in detail.

IV. SINGLE-LAYER RESERVOIR CONTROLLER
Direct inverse control [33] involves modeling the relationship in Eq. 5 with some physical assumptions about the plant and observation measurements {y(t), v(t); 0 ≤ t ≤ T }. The function F −1 is used to devise a closed-loop controller by replacing x(t + δ) with the desired plant state. However, this function depends on the internal state x, whereas only the observation y is available to the controller. In a general scenario, this signal may be missing important information about x, such as when g projects x onto a lower-dimensional space.
The operating principle behind ESNs is their ability to synchronize, in a generalized sense, with their inputs [30]. This means that a reservoir coupled to y(t + δ) and y(t) will tend towards some function of x(t + δ) and x(t). If we denote the reservoir state by u(t), then for some (unknown) function Equivalently, u(t) is approximately a function of x(t + δ) and x(t) after some appropriate waiting time T init . Using this synchronization property for a reservoir with high enough dimension, then we can find a W out such that where the left hand side is the trained reservoir output that approximately 'inverts' the plant dynamics. The training data is generated by perturbing the plant with a small random signal applied to the inputs v train from t = 0 to t = T train + δ, which ensures the plant is stimulated with many frequencies to explore the complete phase space. During this time, triplets y(t+δ), y(t), and v train (t) are collected and used to train an ESN with v d = v train . The configuration of the plant and ESN in this training phase is depicted in Fig. 1a.
Through this training, the reservoir learns to approximately invert the internal plant dynamics using y without the intermediate step of learning F −1 . To control the plant, y(t + δ) is replaced with r(t+δ), where r(t) is the desired behavior of the plant. If the ESN has learned F −1 , then the resulting v(t) is precisely the control signal that drives y(t+δ) → r(t+δ). The complete dynamics of the controlled plant are then described by Eqs. 1 and 2 with Eq. 6 replaced by For simplicity and clarity, we write r(t + δ) ≡ r δ and split the input weights as W y in and W r in , the latter of which couples to y(t + δ) in the training phase and r(t + δ) in the control phase. The configuration of the plant and ESN in this control phase is in Fig. 1b. In physical implementations, driving the reservoir with y(t) and y(t + δ) can be accomplished with a delay line as shown in Fig. 1a. This couples W y in to y(t − δ) and W r in to y(t), which is the desired configuration under a shift t → t + δ done after the training phase is complete.
As we discuss below, the control algorithm is capable of controlling a wide range of systems. However, |y(t) − r(t)| does not converge to 0. This is to be expected because the reservoir computer only approximately learns F −1 . In the RC literature, it is known that learning improves with increasing N and hence this is a strategy to reduce the control error. However, as we show in Sec. IV-B, increasing N generally decreases |v(t) − v train | but not |y(t) − r(t)|. For situations where precise control is critical, an algorithm for improving the control error is desired. One of our contributions is to iteratively execute the control algorithm in a parallel configuration as described in Sec. V.

A. Choosing v train
During the training phase, we need to specify the training signal v train (dimension l) when the plant and reservoir computer are in the training phase shown in Fig. 1a. Identifying an optimal perturbation signal is an important problem in system identification and a number of methods have been developed [34]. In keeping with the spirit of the RC framework, we randomly generate v train according to a number of controlspecific hyperparameters.
Recall that Eq. 4 holds if v varies slowly with respect to δ. This suggests that v train be bandwidth limited with frequency cutoff 1/λ with λ > δ. Another consideration is the magnitude of perturbations p. Generally, large perturbations will be easier to learn because they have a greater effect on the plant. However, this may not be the best way to learn to control the plant and real-world control applications often require bounded inputs. Our approach is to generate a training signal from a uniform random distribution, which is Fourier-transformed and frequencies above 1/λ are dropped. The signal is then inverse-Fourier-transformed, and scaled to the range [−p, p] yielding v train with the required properties.
In addition to the usual hyperparameters discussed in the previous section, we also must optimize the control-specific hyperparameters. In the next subsection, we explain how we select these hyperparameters based on the physical properties of the plant in the context of controlling the Mackey-Glass chaotic system. Additionally, we study the effect of N and come to the surprising conclusion that increasing N does not result in increased control performance.

B. Hyperparameter Considerations: The Mackey-Glass System
The Mackey-Glass system is described by a delaydifferential equation, which is augmented by an additive drive signal v(t) and an observer aṡ We take β M G = 0.2, γ = 0.1, q = 10, and τ = 17, which places Eq. 12 in the chaotic regime without input. We simulate Eq. 11-13 with a 4 th -order Runge-Kutta method and fixed step size h = 0.1.
We investigate the effect of hyperparameter selection by attempting to stabilize the unstable steady state (USS) x(t) = 1 of Eq. 12. That is, after W out is identified, the control configuration is obtained by replacing and r δ in Eq. 11 with the constant 1.0 and v(t) in Eq. 12 with the trained reservoir output. Motivated by the training algorithm to find W out , the first quantity of interest is the normalized difference between the training and actual input signals after training has been completed and during the interval [T train , T test ] given by Because v is approximately equal to F −1 (recall Eq. 10), this metric is approximately equal to the error in finding the inverse plant dynamics. For brevity, we refer to this simply as the plant inversion error, although it is understood that this is only an approximation.  We also measure the control error once control-loop feedback is enabled during the control configuration shown in Fig. 1b. The asymptotic control error is given by There is a possibility that the plant inversion error and the control error are different because the former is in an openloop configuration (Fig. 1a) whereas the later is in a closedloop configuration (Fig. 1b). Unless otherwise specified, the RC and control hyperparameters are given in Table 1. As discussed in the previous subsection, the range of p is often restricted by case-specific constraints. The parameters δ and λ are particularly interesting in that they introduce two additional temporal scales, where the typical RC problem only contains c. Above, we argue that λ > δ is expected for good plant inversion error. Similarly, we expect that λ ≈ c because the reservoir nodes themselves are frequency filters with cut-off 1/c. We test these ideas by simultaneously varying the temporal parameters as shown in Fig. 2. We adjust the parameters while choosing W out to minimize the inversion error. As seen in Fig. 2a and b, the plant inversion error is a relatively smooth function in this parameter space, with minima in the (λ, δ) plane below the λ = δ line and minima in the (λ, c) plane along the λ = c line. The falsecolor plots are on a linear scale.
On the other hand, Fig. 2c and d reveal that the control error has a more complex dependence on the parameters and that different parameter combinations result in low error. Also, the observed variation in the control error is larger than the inversion error -the false-color scale is on a logarithmic scale -making it difficult to arrive at a simple functional dependence of the error landscape on the parameters δ, λ, and c.
Finally, we investigate the effect of reservoir size N . Based on previous RC computing research, we expect that the approximate plant inversion will decrease with N because the training algorithm (Eq. 8) is designed to choose W out to minimize this error. This is confirmed in Fig. 3. We find a very different dependence on the control error. There is a sudden drop in the error with increasing N corresponding to a bifurcation of the closed-loop feedback + plant system. For N too small, the attractor of the controlled Mackey-Glass system is identical to the uncontrolled Mackey-Glass system. However, there is a sudden change at a critical N , beyond which the attractor of the controlled system becomes a fixed point very close to the requested USS. We see similar behaviors when controlling the other systems described later. We also observe a minimum control error near N = 30 immediately after the bifurcation. Thus, there is little benefit to controlling N beyond this value and there is even a slight increase in the error for larger N .
Our findings also point to the challenge in controlling highcomplexity systems that display chaos. A single reservoircomputer controller has been shown previously to be effective in controlling the behavior of heated and stirred tank, pitch control of an aircraft, a double pendulum [35], and a robot arm [35]. While it might be possible to improve the controller performance studied here by fine-tuning the reservoir and control hyperparameters, we introduce an alternative method in the next section.

V. THE DEEP ESN CONTROLLER
In this section, we introduce a layered approach to obtain lower control error. As motivation, consider Eq. 11 for the controlled plant. It can be thought of as another (partially) unknown dynamical system with internal state given by {x, u} and output y. An accessible control input v can be created in a number of ways, such as with the replacement v → v + v . Because this new plant is partially controlled, the trajectory of y is now much closer to r than in the uncontrolled system given by Eqs. 1 and 2. This means that the partially controlled system is generally easier to control with the same strategy described above.
The layered controller is described by Eqs. 1 and 2, and where we simply add the outputs from each reservoir and W out,i is trained by controlling the (i − 1) th controlled plant as described below. The deep controller is illustrated in Fig. 4. Fig. 4. The configuration of the n th reservoir controller. All layers of the controller take as input y and r δ , which couple to the i th reservoir through W y in,i and W r in,i , respectfully. The trained weights W out,i depend only on the measured dynamics of the (i − 1) th controller, so the deep controller is trained sequentially.
Adding additional control layers adds additional hyperparameters to consider. In general, the hyperparameters should be selected for each layer, but we find that low control error is obtained using the same parameters for all layers, thereby greatly simplifying the design process. We follow this approach in the additional examples given below.
An added benefit of the layered approach is its computational efficiency. The training algorithm given in Eq. 8 scales approximately as N 3 in the size of the reservoir, but only linearly in the number of reservoirs. Thus, it is more efficient to train several smaller independent reservoir computers than a single large reservoir computer with the same number of total nodes.

VI. CONTROLLING THE LORENZ SYSTEM
In this section, we use our algorithm to control the multiinput multi-output Lorenz '63 system, described bẏ where x = (x 1 , x 2 , x 3 ) and v = (v 1 , , v 2 , v 3 ). We consider the typical parameters σ L = 10, ρ L = 28, and β L = 8/3, for which Eqs. 18-21 display chaotic behavior when v = 0. We solve numerically the Lorentz and RC-based controller equations using a 4 th -order Runge-Kutta method with fixed step size of 10 −3 . For u = 0, unstable steady states exist at (0, 0, 0) and ρ L , ± √ ρ L β L , ± √ ρ L β L , the latter of which exist at the center of the symmetric leaves of the attractor. We find that the dESN controller can stabilize or induce a wide variety of behaviors and give only two examples here for brevity.

A. Controlling an USS of the Lorenz System Using a Single-Layer Controller
We first use the dESN controller to stabilize the Lorenz system to the positive USS using a single control layer with the parameters in Table 2. Because the USS is an unstable set of the attractor, stabilizing the attractor to the USS should require control perturbations whose size are dictated by the noise level, which is set only by the numerical integration error in our simulations and much smaller than the errors observed below.  The plant and reservoir dynamics for a single-layer controller are shown in Fig. 5. During the training phase, the reservoir computer does a reasonable job of inverting the plant dynamics as seen in Fig. 5a. Here, we show how well the reservoir computer inverts the plant when we replay the training data (t < 200) through the system after the weights have been trained and its ability to generalize to a noisy drive signal it has not seen previously (t > 200). The normalized root-mean-square error during the training time is 0.50.
When control is turned on at t = 250 in Fig. 5b, it rapidly approaches the desired USS, which can be seen more easily in the two-dimensional projection of the attractor in phase space shown in Fig. 5d. The control perturbations are quite large initially (Fig. 5c) -comparable to the size of the other terms on the right-hand side of Eqs. 18-20 in the absence of controlbut then decrease rapidly. We make no attempt to minimize the initial size of the control perturbations; in the case shown, the controller essentially stops the chaotic dynamics in its tracks and then guides the trajectory to the desired USS.
Standard chaos control methods that require only small perturbations [36] cannot be applied to control this USS because they require that the chaotic trajectory passes within a neighborhood of the desired unstable state at which point control is turned on. This demonstrates the ability of our control approach to stabilize behaviors that are not on the attractor, albeit at the cost of larger control perturbations. A close inspection of Fig. 5c reveals that the control perturbations do not go to zero as expected. This is due to the small error in the reservoir computer estimating the plant inverse. As with controlling the Mackey-Glass system (Sec. IV), we find that increasing N does not improve the control performance even though it decreases the plant inversion error (data not shown). We thus add layers to the controller to improve its performance.

B. Applying the dESN to the Lorenz System
We control the Lorenz attractor to the same USS discussed in the previous section using progressively more layers with the same hyperparameters as in Table II except that each layer has N = 50. The state of the Lorenz system and the real-time control error from a typical example are presented in Fig. 6. Here, we include both the training and control phases as 4 layers are added to the controller.
As seen in the figure, each additional layer provides more precise control over the Lorenz system. After four layers (200 total nodes), the final control error is improved by two orders-of-magnitude relative to the error from the first layer. Other simulations (data not shown) predict that a single-layer reservoir computer controller with N = 200 does no better than a single-layer with N = 50. Figure 6 also demonstrates that the controlled system is highly stable to the training perturbations as each layer is added.

C. Controlling Ellipses Near the Lorenz Attractor
To highlight the fully nonlinear control characteristics of the dESN, we stabilize an elliptical periodic attractor that is in the Fig. 6. A typical trajectory of a controlled Lorenz system. Vertical dashed lines separate successive training and control phases, with the error from the requested USS y 0 shown in the bottom panel. That is, the system is perturbed with the first training signal from t = 0 to t = 25, and the single-layer controller is applied from t = 25 to t = 30. The controlled system is then perturbed with a second training signal from t = 30 to t = 55, and so on. vicinity of one of the leaves of the Lorenz attractor. It is located near the positive lobe of the attractor and centered around the positive USS, as illustrated in Fig. 7a. The ellipse coefficients are chosen by observing a segment of the Lorenz trajectory that spends several cycles orbiting about the positive USS and fitting to an ellipse in the least-squares sense. No elliptical periodic behavior is a solution to the autonomous Lorenz system (Eqs. [18][19][20], which implies that a non-vanishing controller effort is required to stabilize this behavior. A typical control example is presented in Figure 7b and c, which shows the temporal evolution of the control error and perturbations, respectively, needed to stabilize the elliptical periodic behavior when control is turned on at t = 115. The parameters are the same as in the previous section on stabilizing the USS (including N = 50 for each layer). The asymptotic control error is only 0.13. Compare this to the mean square-root of the variance of the Lorenz attractor of approximately 25.08. As in the previous section, the control perturbations are large initially because we make no attempt to wait until the trajectory of the uncontrolled system is close to the desired ellipse.

VII. APPLYING THE DESN CONTROLLER TO A CHAOTIC ELECTRONIC CIRCUIT
In this section, we apply the dESN controller to a chaotic electronic circuit, which consists of passive linear components, nonlinear signal diodes, and an active negative resistor [37] shown schematically in Fig. 8. Its dynamics are governed by where V 1 (V 2 ) is the voltage drop across capacitor C 1 (C 2 ), I is the current through the inductor, v 1 (v 2 ) is an accessible current into the V 1 (V 2 )-node, and v 3 is an accessible voltage across the inductor. In the absence of control perturbations, the circuit displays double-scroll behavior for a range of negative resistances. Similar to the Lorenz attractor discussed in Sec. VI, the circuit has a USS at the origin and two symmetric USSs at (±V ss 1 , ±V ss 2 , I ss ), with approximate values V ss 1 =0.59 V, V ss 2 =0.09 V, I ss =0.20 mA. The error level (both electronic noise and discretization error in the analog-to-digital converter) is determined by adjusting R n so that one of the non-zero USSs is stable and measuring the root-mean-square error (RMSE) of the signal. This noise level is used in this section to contextualize the control errors. Fig. 8. The chaotic electronic circuit displaying double-scroll behavior. Circuit parameters: Rn = 3 kΩ, C 1 = C 2 = 10 nF, L = 55 mH, R L = 355 Ω, Rs = 100 Ω, Rm = R L +Rs = 455 Ω, R d = 7.86 kΩ, signal diode are type 1N914 with V d = 0.58 V with Ir =5.63 nA and α = 11.6. circuit to be controlled. The characteristic resistance of the circuit is R = R/L = 2,245 Ω.
The system described by Eq. 22 exhibits chaotic oscillations with a characteristic time scale of L/C = 23.5 µs, which is the approximate orbit time of a trajectory around the USS in the center of a scroll, and has substantial spectral components beyond 20 kHz. Such fast time scales require a control loop with a total latency between measurement and application of the control perturbation much less than the characteristic time, which is a challenging task with a reservoir computer in the loop. To connect to the previous sections, x = (V 1 , V 2 , I) To perform the measurement of the accessible system variable, evaluate the reservoir computer, and apply the control perturbations, we use a field-programmable gate array (FPGA). Specifically, we use a Max 10 10M50DAF484C6G Device on a Terasic Max 10 Plus development board. The device includes integrated dual 12-bit analog-to-digital converters for measurement that operate up to 1 MHz and a 16-bit digital-to-analog converter that operates at 1 MHz for applying the control perturbations.
We use the ADCs to make simultaneous measurements of V 1 and V 2 as the accessible plant variables y = (V 1 , V 2 , 0). The DAC generates a voltage (v train during the training period or v during the control period) that we send through a voltageto-current converter with variable gain and inject into the V 1 node so that u = (v 1 , 0, 0).
To accelerate the dESN controller on the FPGA, we use 32bit, fixed-point calculations and an Euler integration method for Eq. 11. To greatly reduce hardware space, the matrices W, W in , and b and the time constant c are hard-coded at the time of design compilation. Conversely, W out is stored in onboard RAM and updated mid-operation by a host computer. The output is then calculated by evaluating W out x with dedicated multipliers and adders. The tanh function is implemented with a 10-bit lookup table. The ADC, DAC, and ESNs are synchronized to a 1 MHz global clock.

A. Nonlinear Control of the Chaotic Electronic Circuit
As with the simulation results in the previous sections, we find that the dESN controller can stabilize a variety of behaviors, but focus here on the nonlinear control problem of rapidly and repeatedly guiding the system between each nonzero USS (±V ss 1 , ±V ss 2 , I ss   Figure 9a and b show the real-space trajectories of the controlled circuit dynamics and control error, respectively, and Fig. 9b shows the circuit dynamics in phase space over the same interval. As seen in Fig. 9b, substantial control perturbations are required to initially control (−V ss 1 , −V ss 2 , −I ss ) as well as when making the transition to (V ss 1 , V ss 2 , I ss ) and back again. We also observe DC offset of the system from its USSs and a ringing effect after the transition. We also observe that requested path straight across the attractor is difficult to control precisely as can be seen most apparently in Fig. 9b.
Curiously, it appears that the DC error is largely addressed by the second reservoir. The control effort during the transition is reduced substantially by the second reservoir, but a persistent high-frequency instability is somewhat larger.
To quantify these results, the control task is repeated a total of 30 times each with 5 different realizations of ESNs. The mean performance is characterized by the RMSE of the control error over one period and is 131.1 ± 6.9 mV for layer 1 and 26.9 ± 0.8 mV for layer 2. Compare these numbers to the estimated RMSE of the noise level at 13.1 mV.

VIII. CONCLUSIONS
In this paper, we introduce a method for controlling an arbitrary dynamical systems to arbitrary trajectories. It requires no knowledge of the plant, and is therefore model-free. Unlike other model-free techniques, the control law is learned directly rather than through an initial system identification step. The algorithm is capable of controlling complex chaotic systems and is robust to the noise and non-ideal properties of physical systems. It can be implemented with a compact FPGA and used to control fast experimental systems. This work paves the way for research into control engineering with reservoir computing and application to real-world problems, as we have demonstrated. The fast training time suggests that our approach can be applied to systems in real time in response to changing conditions, such as a damage event, or in manufacturing variation of system. This research suggests several future directions in control engineering and RC more generally. First, a rigorous stability analysis is desired. While this is notoriously difficult when recurrent neural networks are involved, many safety standards require such a proof before deploying a control system when humans are involved. Second, the application of optimization methods is not well understood in this domain of RC. The issue is particularly salient here, given the increased number Fig. 9. Control of the experimental circuit between USSs. a) In real space, the first controller leads to substantial ringing after the transition betweem USSs. The second reservoir substantially reduces this. b) The uncontrolled attractor in phase space. The black box indicates the region containing the target trajectory in the next panel. c) In phase space, it appears that dragging straight across the attractor is an unnatural trajectory for the circuit. The circuit subject to the first controller, the circuit subject to the second controller, and the target trajectory are depicted with +, x, and square symbols, respectively. of hyperpameters. Particularly interesting is whether optimizations can be made by relaxing the constraint that all ESNs have the same set of hyperparameters. It may instead by the case that, say, deeper ESNs require different time constants because the local Lyapunov spectrum is different from the controlled and uncontrolled plants. Third, on the other hand, a controller with an exceptionally small footprint may be devised by considering individual reservoirs that have identical weights. Although multiple reservoirs are required during training (because the trained reservoirs couple to r(t+δ), while the to-be-trained reservoir couples to y(t + δ)), they could all be replaced with a single reservoir during the control phase, where they all have the same dynamics. Initial study of this idea to the Lorenz system suggests this is possible, but more investigation is required.