Energetic cost of feedback control

Successful feedback control of small systems allows for the rectification of thermal fluctuations, converting them into useful energy; however, control itself requires work. This paper emphasizes that the controller is a physical entity interacting with the feedback-controlled system. For a specifically designed class of controllers, reciprocal interactions become nonreciprocal due to large timescale separation, which considerably simplifies the analysis. We introduce a minimally dissipative controller model and illustrate the findings using a simple example. We find that the work required to run the controller must at least compensate for the decrease in entropy due to the control operation.

We focus here on information engines that apply repeated feedback at discrete points in time by making a measurement of the system state, modifying a feedback potential, and letting the system relax in the potential until the next measurement. A plethora of experimentally realized information engines and theoretical models fit this paradigm. One key distinction is whether the relaxation time is so long that the system equilibrates between measurements [6,14,20,31,33], or sufficiently short that subsequent measurements are correlated [12, 17-19, 21, 22, 48-51].
Whenever the state of the controller depends on the measurement of the system, which is a random variable, the control itself becomes a cofluctuating random variable, i.e., the system and controller form a bivariate stochastic process in which each subsystem is influenced by the other. Energy flows between strongly coupled, cofluctuating systems contain, in general, both heat-like and work-like contributions [40]; however, the situation may be simplified by applying specific design assumptions to the controller, which we pursue here.
A system controlled by a parameter that is adjusted using knowledge of the system's state has been studied extensively [52,53], and it was found that the entropy production of the controlled system is bounded by a difference in mutual information. This bound, however, does not immediately give insight into the minimum thermodynamic costs of the feedback-control operation. In particular, an open question is whether the difference in mutual information is actually the minimum realizable work required to run the controller. To illuminate this issue, we use a bottom-up approach, starting from a physical controller model to explicitly calculate the work required to achieve feedback control. Our method differs from a previous approach [29] that started with an abstract inequality and then added an interpretation. We confirm that the minimum work required for implementing feedback control by updating the controller according to a particular feedback rule is given by an information-theoretic quantity that can be related to the difference in mutual information between controller and system, before and after the controller update. We illustrate how a physically realizable controller can reach the minimum work. This approach allows us to derive an information-theoretic quantification of the cost of feedback control using a familiar expression for the work done on small fluctuating systems.
In this paper, we account for both the conversion of information to work and the work required to record and react to the information. In contrast to previous approaches that regard the feedback controller as external to the system and thus require a separate specification of a measurement process distinct from the system dynamics, we assume here that the feedback-controlled system and controller together form an information engine. We emphasize that the controller is a physical entity that can only interact with the feedback-controlled system via interaction potentials. The interaction potentials are designed such that an external experimenter only supplies predetermined modifications of the potential to realize the desired feedback control, and hence does not need to know the actual system state. The resulting reciprocal interactions between feedback-controlled system and controller can then become effectively nonreciprocal [54] in the limit of large timescale separation between the two components' dynamics.
Section II specifies our physical controller model and gives the lower bound on the work required to run this type of controller, together with a protocol that achieves it. The example in Sec. III illustrates the situation.

II. MODEL OF FEEDBACK CONTROLLER
We consider the joint time evolution of a feedback-controlled system X and a controller Z. The system X is assumed to be small and in contact with a thermal bath at temperature T, which results in stochastic dynamics. Let x(t) and z(t) be the respective states of system and controller at time t. As an example, consider overdamped dynamics for both the feedback-controlled system X and the controller Z, which interact via a (conservative) potential-energy landscape. For such a setup, heat and work are readily calculated [55-57].
We model information engines that employ repeated feedback, where measurement and feedback happen cyclically with sampling period t_s. The time τ to measure system X and update the controller based on this measurement is assumed to be much smaller than the sampling period: τ ≪ t_s. In these engines the measurement is assumed to happen without back-action. The system state is measured at times k t_s, resulting in x_k := x(k t_s), where k ∈ {0, ..., K}. Using this information, the controller updates to a new value z_k := z(k t_s + τ).
In the idealized case, the update time vanishes, τ → 0. In the following, we assume that during the time when the controller is updated, X does not change.
Between controller updates, the system changes dynamically, and the controller state is stable (in the sense that fluctuations are negligible), so we can write z[(k + 1)t s ] = z(kt s + τ ) = z k .
During this time interval the energy of subsystem X depends on the controller state. We consider the overdamped case, where only the potential energy affects dynamics. The relaxation potential V r (x, z k ) controls the dynamics of subsystem X during the kth relaxation step, giving time-dependent potential energy V r [x(t), z k ].
We assume alternating stability: the system does not change during the controller update, and the controller is stable during system relaxation. This can be achieved by fast controller updates and making the controller mobility sufficiently small during the relaxation steps, e.g., by making the controller sufficiently large.
The externally applied driving protocol thus modifies not only the interaction energy between X and Z, but also switches the mobility ν_z(t) of the controller between a low mobility ν_low, to keep it as unchanged as possible during the system relaxation step, and a high mobility ν_high, to rapidly update the controller during the control step.

FIG. 1. A process with repeated feedback. A nearly stable controller Z with a very low mobility ν_low provides a trapping potential for a colloidal particle X. At times k t_s, k = {0, 1, ...}, the controller's mobility is increased to ν_high and the controller is quickly recentered on the particle through a fast update taking time τ ≪ t_s. This process approximates an idealization in which the controller is stable during system relaxation and instantaneously updates its position at periodic intervals.

Changing the controller mobility in this way and assuming a short controller-update time τ ensures that the subsystems are alternatingly stable: each subsystem's relaxation time is too long to react during the other subsystem's update, thus introducing an effective nonreciprocity even though the forces on both subsystems are derived from a joint potential. Figure 1 illustrates the repeated-feedback process for a system that consists of a colloidal particle in a trap and a controller that periodically moves the trap center to the particle position (examined in detail in Sec. III). During the relaxation step the trap position changes very little, but during the control step it quickly recenters on the particle, which has almost no time to respond. A discrete-time notation updates the timestep counter only once per cycle; Figure 2 illustrates the temporal ordering of the discrete-time dynamics.

A. Heat absorbed by feedback-controlled system
The total potential-energy change over a cycle is V_r(x_{k+1}, z_k) − V_r(x_k, z_{k−1}), which can be split into two contributions: V_r(x_{k+1}, z_k) − V_r(x_k, z_k) while the controller is fixed, and V_r(x_k, z_k) − V_r(x_k, z_{k−1}) while the feedback-controlled subsystem X is fixed.
We assume that during the time period in which X evolves dynamically under V_r(x, z_k) and changes its potential energy from V_r(x_k, z_k) to V_r(x_{k+1}, z_k), the experimenter does no work on the joint X-Z system, so only heat is exchanged with the environment; the heat absorbed by X during the kth relaxation step is therefore q^X_k := V_r(x_{k+1}, z_k) − V_r(x_k, z_k). (Our convention is that heat absorbed by and work done on the system are positive.)
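This bookkeeping can be checked numerically. The sketch below is illustrative: it assumes, for concreteness, the harmonic relaxation potential of Sec. III and arbitrary sample states, and verifies that the heat-like and work-like contributions sum exactly to the total potential-energy change over a cycle.

```python
# Sketch: cyclic potential-energy bookkeeping. The harmonic potential and
# the sample states below are illustrative assumptions, not from the text.
def V_r(x, z):
    return 0.5 * (x - z) ** 2  # relaxation potential of Sec. III

x_k, x_k1 = 0.3, -0.1    # system state before/after relaxation
z_km1, z_k = 0.7, 0.25   # controller state before/after update

total = V_r(x_k1, z_k) - V_r(x_k, z_km1)    # change over the full cycle
q_X = V_r(x_k1, z_k) - V_r(x_k, z_k)        # controller fixed: heat into X
w_app = V_r(x_k, z_k) - V_r(x_k, z_km1)     # system fixed: apparent work

assert abs(total - (q_X + w_app)) < 1e-12   # the split is exact
```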

B. Apparent work
If the controller subsystem Z were an external control parameter (i.e., not considered part of the system) rather than a cofluctuating subsystem, then q^X_k would be the only heat exchanged with the environment, and the work done on X by Z would be

w^app_k := V_r(x_k, z_k) − V_r(x_k, z_{k−1}).   (2)

Thus, if the experimenter supposed in their accounting that there was no feedback control, then they would assume that this ought to be the work done on subsystem X. But in reality there is feedback control, and this is not the actual work. For that reason, we refer to it as "apparent" work. Note that if there were no feedback control, then subsystem X, driven by control parameter Z, would have the nonequilibrium ("generalized") free energy [58-62] associated with the conditional distribution p(x|z), F^neq = ⟨V_r(x, z)⟩ − k_B T H[X|z]. The average work dissipated during a controller update z_{k−1} → z_k at fixed x_k would then be the average apparent work in excess of the corresponding change in this free energy.

C. Total work done on the joint system

Crucially, in the case of the cofluctuating controller subsystem Z, w^app_k is just one part of the work done on the joint system. The work to update the controller from z_{k−1} to z_k, which can be considerable, also needs to be accounted for.
The task of the controller is to rectify thermal fluctuations of the system, thereby converting input heat into output work. To achieve this, the controller itself is externally controlled by an experimenter; however, this external control is limited to modulating the coupling potential in a predetermined way such that the controller implements the desired feedback. In particular, the external experimenter need not know the system state x k or a measurement thereof. (Section II E illustrates how this can be achieved.) Nonetheless, the experimenter needs to supply the work required to change the controller's state.
We model manipulations of the controller with a time-dependent control potential V_c(x, z; t), thereby filling in the detailed temporal development of the controller between z_{k−1} and z_k. Our treatment is thus a more specialized version of the treatment in [40]. The control potential steers the controller from state z_{k−1} to state z_k at constant subsystem state x_k. To achieve a controller update sufficiently fast to hold the subsystem fixed, we switch the controller mobility from ν_low to ν_high before the controller update and back to ν_low at its end. In cycle k, the controller-update steps are: (i) At time k t_s: instantaneous switch from the relaxation potential to the start of the control potential, V_r(x, z) → V_c[x, z; k t_s], and from low to high controller mobility, ν_low → ν_high.
(ii) Between t = kt s and t = kt s +τ : continuous manipulation of the control potential V c (x, z; t), thereby bringing the controller to the new value, z k .
(iii) At time kt s + τ : instantaneous switch to new relaxation potential, V c [x, z; kt s + τ ] → V r (x, z), and from high to low controller mobility, ν high → ν low .
To account for all steps, the work w_k done on the joint X-Z system in cycle k comprises the potential-energy jumps at the instantaneous switches (i) and (iii) and the work of manipulating the control potential during step (ii) [55-57, 64, 65]. We define work in excess of apparent work as the "additional work" necessary to achieve the controller update,

w^add_k := w_k − w^app_k.   (5)

The average work done on the joint X-Z system over the entire protocol is W = Σ_{k=0}^{K} ⟨w_k⟩. Including the heat flow q^Z_k while X is fixed and only Z changes [step (ii) above], the first law for the total work and heat over a cycle reads

ΔE_k = w_k + q^X_k + q^Z_k,   (6)

where ΔE_k := V_r(x_{k+1}, z_k) − V_r(x_k, z_{k−1}) is the joint potential-energy change over the cycle. The definitions of q^X_k and w^app_k, however, yield

ΔE_k = w^app_k + q^X_k,   (7)

illustrating the purpose of an information engine: converting input heat into output work. Combining Eqs. (6) and (7) reveals that all additional work to effect the controller change must be dissipated as heat: w^add_k = −q^Z_k. In the following we address how this additional work can be minimized.

D. Minimum additional work
The nonequilibrium free energy of the joint system is F := ⟨E⟩ − k_B T H[X, Z]. The free-energy change over a cycle then follows from the first law over a cycle (6). We split the entropy change into contributions from changing X and changing Z,

ΔH^{X,Z}_k = ΔH^Z_k + ΔH^X_k,

with ΔH^Z_k := H[X_k, Z_k] − H[X_k, Z_{k−1}] during the controller update and ΔH^X_k := H[X_{k+1}, Z_k] − H[X_k, Z_k] during the relaxation. We analogously split the free energy into ΔF_k = ΔF^Z_k + ΔF^X_k, summing the contribution during adjustment of the controller Z and the contribution while the subsystem X evolves in the relaxation potential V_r. The second law implies that ⟨w_k⟩ − ΔF^Z_k ≥ 0, and thus

⟨w^add_k⟩ ≥ k_B T ( H[Z_{k−1}|X_k] − H[Z_k|X_k] ).   (12)

In summary, the additional work needed to adjust the controller has to at least compensate for the decrease in conditional entropy due to the controller update. The remaining uncertainty, quantified by H[Z_k|X_k], measures the precision with which the controller adjusts its state z_k to the feedback-controlled subsystem state x_k. On the other hand, H[Z_{k−1}|X_k] corresponds to the precision of the controller's anticipation of the next subsystem state. Higher-precision controller updates impose larger minimum thermodynamic costs; conversely, higher-precision controller anticipation reduces minimum thermodynamic costs.
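For Gaussian conditional distributions (as in the example of Sec. III), both conditional entropies in the bound have closed forms, so the minimum additional work reduces to a log-ratio of conditional standard deviations. A minimal sketch, with illustrative (hypothetical) values for the spread of the controller about the system before and after the update:

```python
import math

def gaussian_entropy(sigma):
    # differential entropy (in nats) of a Gaussian with standard deviation sigma
    return math.log(sigma) + 0.5 * math.log(2 * math.pi) + 0.5

sigma_before = 0.8  # hypothetical spread of Z about X before the update
sigma_after = 0.3   # hypothetical spread after the update (higher precision)

# minimum additional work in units of k_B T: decrease in conditional entropy
w_add_min = gaussian_entropy(sigma_before) - gaussian_entropy(sigma_after)

assert abs(w_add_min - math.log(sigma_before / sigma_after)) < 1e-12
assert w_add_min > 0  # a higher-precision update costs work
```

The sharper the post-update distribution is relative to the controller's anticipation, the larger the minimum cost, consistent with the bound above.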
The entropy production of the joint X-Z system over one time step can thus be bounded by the sum of the entropy change, the heat produced when X changes, and the instantaneous nonpredictive information that the controller Z retains about the feedback-controlled system; this last quantity equals the negative information flow [43]. Less dissipation is required for a controller-update rule that (on average) performs better at predictively inferring the next state of the system, and thus captures less instantaneous nonpredictive information for the same amount of memory. This is also reflected in the fact that the average additional work required to run the controller, summed over an entire experiment, W^add := Σ_{k=0}^{K−1} ⟨w^add_k⟩, cannot be less than the total instantaneous nonpredictive information that the controller state keeps about the signal causing it to change (the feedback-controlled subsystem X). Bounds on the apparent work w^app_k extractable using feedback control, similar to the RHS of Eqs. (13) and (14), have been found [52,53] without an explicit controller model. We find here that the RHS of Eqs. (13) and (14) provides the minimum additional work required for control, as carried out by a real-world, physically implemented controller. Importantly, our analysis does not require an external observer's measurement. Instead, control is carried out mechanistically by a time-dependent modification of the coupling potential, implementing the effect of a measurement and subsequent feedback by an external observer. In the next subsection we illustrate how our explicit controller architecture achieves the minimum additional work.
The controller-update rule could be optimized to be predictive and thus minimize dissipation, as proposed in [32]; however, in most practical applications, a controller-update rule p_c(z_k|x_k) is chosen (often heuristically) by the experimenter. Let us thus assume for the remainder of this paper that the update rule is given. This probabilistic rule could describe, e.g., a noisy measurement and the subsequent controller reaction to it. For the simple case of z drawn from a Gaussian p_c(z|x) with mean x and standard deviation σ, the intuition for H[Z_k|X_k] = ln σ + ln √(2π) + 1/2 is straightforward: sloppier adjustments result in a wider distribution (larger σ) and hence larger H[Z_k|X_k].

E. Protocols minimizing additional work
Together with the given X-dynamics, characterized by p(x_{k+1}|x_k, z_k), and an initial condition p(x_0), the controller-update rule p_c(z_k|x_k) determines the joint distribution of system and controller before and after each update. With both initial and final distributions fixed by the given controller-update rule and system dynamics, we seek to minimize the average controller work. The lower bound of Eq. (12) is reached when ⟨w^add_k⟩ = ΔF^Z_k. This can be achieved by choosing [66] the control potential such that its conditional Boltzmann distribution interpolates between the initial and final conditional distributions of the controller,

exp[−V_c(x, z; k t_s)/(k_B T)] ∝ p(z_{k−1} = z | x_k = x),   exp[−V_c(x, z; k t_s + τ)/(k_B T)] ∝ p_c(z_k = z | x_k = x),   (15)

and quasistatically changing V_c between t = k t_s and t = k t_s + τ, which is ensured by a sufficiently large controller mobility ν_high ≫ τ^{−1}. Similar protocols have been used to minimize thermodynamic costs when copying polymers [67].
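The quasistatic ramp can be illustrated with an overdamped simulation of the controller alone, holding the system fixed at x = 0 without loss of generality. All parameter values below are illustrative assumptions; the check is that, for ν_high τ ≫ 1, the controller's final variance matches the instantaneous equilibrium value 1/κ at the end of a linear stiffness ramp.

```python
import math, random

# Sketch: quasistatic controller update via a linear stiffness ramp.
# All parameter values are illustrative; energies in units of k_B T.
random.seed(1)
nu_high, tau, dt = 200.0, 1.0, 2e-4   # nu_high * tau >> 1: quasistatic regime
kappa0, kappa1 = 1.0, 4.0             # variance should go from 1 to 0.25
steps = int(tau / dt)
n_traj = 500
sq = 0.0
for _ in range(n_traj):
    z = random.gauss(0.0, 1.0 / math.sqrt(kappa0))  # equilibrated start
    for i in range(steps):
        kappa = kappa0 + (kappa1 - kappa0) * i * dt / tau
        # overdamped Langevin step with mobility nu_high (x fixed at 0)
        z += -nu_high * kappa * z * dt + math.sqrt(2 * nu_high * dt) * random.gauss(0, 1)
    sq += z * z
var_final = sq / n_traj

assert abs(var_final - 1.0 / kappa1) < 0.08  # tracks the equilibrium variance
```

Lowering ν_high below τ^{−1} would break the quasistatic condition and leave the final distribution lagging behind its target, mirroring the dissipation discussed in Sec. III E.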
The limits of interest, which lead to an idealized feedback process, are: ν_low → 0 makes the controller stable during the relaxation step; τ → 0 makes the control step sufficiently short that the system is effectively immobile; and ν_high → ∞ with ν_high τ ≫ 1 ensures that the controller-update protocol realizes the quasistatic step, and thus approaches the minimum-work implementation.
This idealization in terms of a concurrent double timescale separation achieves an effectively quasistatic update of the controller that is simultaneously instantaneous from the point of view of the system.
The bound on the (average) minimum additional work required for feedback control, Eqs. (12) and (14), together with the description of a protocol for a physical controller that reaches the bound, are our main results. These results highlight that feedback control needs to be understood and analyzed as being carried out by a physical system.
To summarize: Instead of assuming that an ethereal external observer makes a measurement and executes feedback on a feedback-controlled system, we highlight the fact that the controller is a physical system that is coupled to the feedback-controlled system via carefully designed interaction potentials. Reproducing the alternating stability found in many example measurement-feedback processes requires a clear separation of timescales between controller and feedback-controlled system. In our model, control is achieved by reversibly changing the controller's state, which is realized through a time-dependent control potential and a high controller mobility. Because the system state is stable during this (fast) update, the system state selects the distribution for the next controller state as the conditional equilibrium distribution of the control potential. Standard energetic considerations permit calculation of the minimum work needed for this update, which is bounded by the free-energy change during the update, leading to an information-theoretic lower bound.

III. EXAMPLE FEEDBACK PROCESS
In the previous section we found a bound on the minimum work required for feedback control. Here, we illustrate this finding by studying in detail a model of a simple measurement-feedback process in the idealized limit. We describe the model (Sec. III A) and solve for its dynamics (Sec. III B). In Sec. III C we calculate the apparent work without recourse to any model of how the control is achieved. Next, in Sec. III D we use the arguments of Sec. II D to give a lower bound on the additional work needed to achieve the desired control. In Sec. III E, we relax the idealization by allowing for a finite but short feedback time τ. We then give an explicit model for the controller's dynamics realizing the minimum-work protocol described in Sec. II E. We numerically simulate the entire process of feedback and relaxation and calculate the work done on the joint system to verify that the lower bound on the additional work is reached in the idealized limit.

A. Model description
Consider the position x of a Brownian particle obeying overdamped dynamics with Stokes friction coefficient γ. The particle diffuses in a harmonic trapping potential with stiffness κ. The controller's state z is the position of the trap center. The particle's dynamics, given a controller state z, evolve according to the Langevin equation

ẋ(t) = −[x(t) − z] + √2 ξ(t),   (16)

where ξ(t) denotes Gaussian white noise with zero mean and covariance ⟨ξ(t) ξ(t′)⟩ = δ(t − t′), and we rescaled lengths by the standard deviation √(k_B T/κ) of the equilibrium distribution and times by the particle's relaxation time γ/κ in the trap. The relaxation potential is therefore V_r(x, z) = ½ (x − z)², and energy is measured in units of k_B T.
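In these scaled units, the relative coordinate x − z equilibrates to unit variance. A quick Euler-Maruyama sketch of Eq. (16) with the trap held fixed, using an illustrative step size and duration:

```python
import math, random

random.seed(2)
z = 0.0                               # trap center held fixed
dt, n_steps, burn = 1e-3, 500_000, 20_000
x, acc, cnt = 0.0, 0.0, 0
for i in range(n_steps):
    # Euler-Maruyama step of the scaled Langevin equation (16)
    x += -(x - z) * dt + math.sqrt(2 * dt) * random.gauss(0, 1)
    if i >= burn:
        acc += (x - z) ** 2
        cnt += 1
var_rel = acc / cnt

assert abs(var_rel - 1.0) < 0.15  # equilibrium variance is 1 in scaled units
```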
Feedback consists of the controller periodically measuring the particle at times t = k t_s (k = 1, ..., K) and then updating its own state to the measurement outcome, thereby recentering the trap at the measurement. For simplicity, we assume that the measurement has a Gaussian error with mean zero and standard deviation σ, such that z_k = x_k + σ η, where η is a zero-mean Gaussian random variable with unit variance. When σ is sufficiently small, the controller "tracks" the particle and extracts energy from the particle's fluctuations. The resulting information engine resembles the model studied in [20], but does not reset the trap position to zero after each time step.

B. Model dynamics
Integrating Eq. (16) over one time step gives the conditional distribution [68, Sec. 4.5.4]

p(x_{k+1} | x_k, z_k) = N(x_{k+1}; z_k + (x_k − z_k) e^{−t_s}, 1 − e^{−2 t_s}),   (17)

where N(x; µ, c) denotes a Gaussian distribution of x with mean µ and variance c. The update rule for the controller is

p_c(z_k | x_k) = N(z_k; x_k, σ²).   (18)

Assuming that initially the particle is at x_0 = 0 and the controller is distributed around it according to Eq. (18), the time evolution of x is

x_{k+1} = x_k + σ (1 − e^{−t_s}) η_k + √(1 − e^{−2 t_s}) ξ_k,   (19)

where

Δc_xx = σ² (1 − e^{−t_s})² + 1 − e^{−2 t_s}   (20)

is the increment in particle variance from one time step to the next. Therefore, the marginal particle distribution is

p(x_k) = N(x_k; 0, k Δc_xx).   (21)

Because the controller periodically recenters the trap around the fluctuating particle, on timescales longer than the feedback time t_s the dynamics of the particle correspond to free diffusion with effective diffusion coefficient Δc_xx/(2 t_s). The joint distribution of particle and ensuing controller state is

p(x_k, z_k) = N[(x_k, z_k); (0, 0), C_k],   (22)

where N[z; µ, C] is a multivariate Gaussian distribution of z with mean vector µ and covariance matrix C. Here,

C_k = ( k Δc_xx , k Δc_xx ; k Δc_xx , k Δc_xx + σ² ).   (23)

Due to the operation of the feedback, if σ < 1 the measurement is sufficiently precise that the particle distribution with respect to the trap center,

p(x_k − z_k) = N(x_k − z_k; 0, σ²),   (24)

is narrower than the corresponding equilibrium distribution (which has unit variance in scaled units), which can be interpreted as a lower effective temperature of the particle. The model therefore realizes an overdamped version of feedback cooling, which usually refers to feedback forces leading to velocity distributions narrower than the equilibrium Maxwell-Boltzmann distribution in underdamped systems [69,70].
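The free-diffusion-like growth of the particle variance can be checked by iterating the exact Gaussian one-step map implied by Eqs. (17) and (18). The per-step increment used below is derived from that map: σ²(1 − e^{−t_s})² from the measurement error plus 1 − e^{−2t_s} from thermal noise. Parameter values are illustrative.

```python
import math, random

random.seed(3)
sigma, ts, K, n_traj = 0.3, 0.5, 20, 50_000  # illustrative parameters
a = math.exp(-ts)
final_sq = 0.0
for _ in range(n_traj):
    x = 0.0
    for _ in range(K):
        z = x + sigma * random.gauss(0, 1)  # controller update, Eq. (18)
        # exact relaxation over one sampling period, Eq. (17)
        x = z + (x - z) * a + math.sqrt(1 - a * a) * random.gauss(0, 1)
    final_sq += x * x
var_mc = final_sq / n_traj

dc = sigma**2 * (1 - a) ** 2 + (1 - a**2)  # per-step variance increment
assert abs(var_mc - K * dc) < 0.4          # variance grows linearly in k
```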

C. Apparent work
The feedback operation changes the internal energy of the joint system, which is interpreted as apparent work done on the system. Following Eq. (2), the average apparent work per control step (in scaled units) is

⟨w^app_k⟩ = ½ ⟨(x_k − z_k)²⟩ − ½ ⟨(x_k − z_{k−1})²⟩.   (25b)

To calculate this work explicitly, we require the joint distribution of the particle and the previous controller state, which, using Eqs. (17) and (22), yields for the relative coordinate after the relaxation (X update) and before the control step (Z update) the variance ⟨(x_k − z_{k−1})²⟩ = σ² e^{−2 t_s} + 1 − e^{−2 t_s}. Then, the apparent work in Eq. (25b) reads

⟨w^app_k⟩ = ½ (σ² − 1)(1 − e^{−2 t_s}).   (28)

When σ < 1, this is negative, indicating work extraction from thermal fluctuations. We also calculate the heat flowing into subsystem X during its relaxation, ⟨q^X_k⟩ = ½ (1 − σ²)(1 − e^{−2 t_s}) = −⟨w^app_k⟩, illustrating that the engine extracts apparent work from heat that flows into the system during the relaxation step.
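The steady-state apparent work can be checked by simulating only the relative coordinate x − z over many feedback cycles; the sketch below compares the per-step energy change of the control step against the closed form (σ² − 1)(1 − e^{−2t_s})/2, with illustrative parameters.

```python
import math, random

random.seed(4)
sigma, ts, n = 0.3, 0.5, 200_000  # illustrative parameters
a = math.exp(-ts)
r = 0.0  # relative coordinate x - z in the trap frame
burn, acc, cnt = 1000, 0.0, 0
for k in range(n):
    before = r                               # x_k - z_{k-1}
    after = -sigma * random.gauss(0, 1)      # x_k - z_k, since z_k = x_k + sigma*eta
    if k >= burn:
        acc += 0.5 * after**2 - 0.5 * before**2  # apparent work of this control step
        cnt += 1
    r = after * a + math.sqrt(1 - a * a) * random.gauss(0, 1)  # relaxation
w_app_mc = acc / cnt
w_app_th = 0.5 * (sigma**2 - 1) * (1 - a * a)

assert w_app_th < 0                          # sigma < 1: work is extracted
assert abs(w_app_mc - w_app_th) < 0.01
```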

D. Additional work and efficiency
Following Sec. II D, we calculate the lower bound (12) on the additional work from the joint probabilities of subsystem and controller [Eqs. (22) and (26b)]. The lengthy general expression simplifies in the limit k → ∞ to a ratio of covariance-matrix determinants, where | · | denotes the determinant. Figure 3(b) shows the minimum rate of average additional work. Comparing with Fig. 3(a), negative apparent work is accompanied by positive additional work, and vice versa. Figure 3(c) verifies that the average total work is nonnegative. The negative apparent work is thus more than compensated by costs incurred in running the controller. The total work only vanishes at σ = 1, because before and after the feedback the particle distribution in the trap is the same equilibrium distribution, and hence feedback does not change the free energy. For σ > 1 the roles of the feedback-controlled system and controller reverse in some respects, and positive apparent work is converted into negative additional work. Figure 3(b) illustrates that the costs of running the controller increase with increasing measurement accuracy (σ → 0). Moreover, the costs increase with feedback frequency (t_s → 0): frequent, accurate measurements are costly. This finding is illustrated by the information efficiency [28,48], defined as benefit (extracted work) relative to costs (additional work), shown in Fig. 3(d) for the k → ∞ limit. This measure of information efficiency is maximized at vanishing output power. Faster feedback and more accurate measurements reduce the achievable efficiency.

E. Explicit physical controller model
Here, we present an explicit model of a controller that realizes the minimum average additional work. We assume that the controller state z is described by a Langevin equation such that the dynamics of feedback-controlled system X and controller Z evolve according to the coupled Langevin equations

ẋ(t) = −∂_x V(x, z; t) + √2 ξ_x(t),   (32a)
ż(t) = −ν_z(t) ∂_z V(x, z; t) + √(2 ν_z(t)) ξ_z(t),   (32b)

where ξ_x(t) and ξ_z(t) are uncorrelated Gaussian white noises and ν_z(t) is the piecewise-constant controller mobility, switching between a large value ν_high during the control step and a small value ν_low during the relaxation step to achieve the joint system's alternating stability as described in Sec. II.
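A minimal sketch of these coupled dynamics during a relaxation step, with illustrative parameters, shows the alternating stability at work: with ν_low small, the controller barely moves while the particle relaxes in the trap.

```python
import math, random

random.seed(5)
nu_low, dt = 1e-2, 1e-3   # illustrative low mobility and step size
x, z = 1.0, 0.0           # particle initially displaced from the trap center
z0 = z
for _ in range(300):      # relaxation step of duration 0.3 (scaled units)
    f = -(x - z)          # force on x; force on z is -f (joint harmonic potential)
    x += f * dt + math.sqrt(2 * dt) * random.gauss(0, 1)
    z += -nu_low * f * dt + math.sqrt(2 * nu_low * dt) * random.gauss(0, 1)

# low mobility: the controller is nearly frozen during the relaxation step
assert abs(z - z0) < 0.3
```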
Although challenging, at least conceptually the dynamics of this joint system could be realized experimentally by a Brownian particle with tunable anisotropy in a two-dimensional potential-energy landscape that could be generated, e.g., by virtual potentials using a feedback trap [71,72].
The relaxation potential is the quadratic V_r(x, z) = ½ (x − z)². The control potential V_c(x, z; t) quasistatically and reversibly carries the controller from the previous conditional controller distribution p(z_{k−1} = z | x_k = x) ∝ exp[−V_c(x, z; k t_s)] at time k t_s to the next conditional controller distribution p_c(z_k = z | x_k = x) ∝ exp[−V_c(x, z; k t_s + τ)] at time k t_s + τ. (Recall that the control step is sufficiently short, τ ≪ 1, that the particle does not move, x(t) ≡ x_k for t ∈ [k t_s, k t_s + τ].) To this end, we dynamically change the control potential according to

V_c(x, z; t) = ½ κ(t) (z − x)²,

for a time-dependent trap stiffness κ(t) interpolating between κ(k t_s) = 1/Var[z_{k−1}|x_k], calculated from Eq. (26b), and κ(k t_s + τ) = 1/σ². If all timescales are sufficiently separated (ν_high^{−1} ≪ τ ≪ t_s ≪ ν_low^{−1}), then the controller update is effectively instantaneous compared to the system dynamics and the feedback-loop time t_s, but also effectively infinitely slow compared to the controller dynamics during the controller-update step.
In each timestep, the additional work is computed from Eq. (5). Figure 4(a) shows the resulting average work contributions as a function of the controller mobility ν_high during the control step. For large mobility (ν_high ≳ 2 × 10²), the controller achieves its task of tracking the particle: Figure 4(b) shows that the variance ⟨(x_k − z_k)²⟩ of the controller around the particle after the control step matches the measurement variance σ². Consequently, the apparent work w^app_k matches Eq. (28). For low ν_high ≲ 10², the apparent work increases to zero and the additional work decreases to zero, because the controller state does not change appreciably during the control step; hence the joint system remains close to equilibrium and little work is done or extracted.
A controller mobility ν_high ≳ 10⁴ is required for the apparent work to converge to the predicted value for ν_high → ∞, which is negative (work is extracted), and for the additional work w^add_k to achieve the lower bound given by (30d). As ν_high is decreased, changes in the control potential become too fast for the controller to track. Consequently, the control step is no longer quasistatic and the controller distribution lags behind its instantaneous equilibrium distribution, causing dissipation that results in greater additional work.
To bound the resulting peak in additional work [Fig. 4(a)], let us consider a "worst-case" estimate of the additional work needed to nonreversibly adjust the controller. Consider instantaneously setting the control to the desired final control potential V_c(x, z; k t_s + τ) with stiffness 1/σ² and then letting the controller relax to its new equilibrium distribution. The work equals the heat released during the controller's relaxation, which can be bounded using the relative variance Var[z_{k−1}|x_k] before the control step (35c), yielding a worst-case estimate for w^add_k. The maximum of the additional work in Fig. 4(a) does not reach this value, because the mobility ν_high is sufficiently high for the optimized process to harness some of the controller's relaxation dynamics, thus making the control step less costly than instantaneous switching. The lower additional work at low mobility does not translate into higher efficiency in Fig. 4(c), because it is more than compensated by lower extracted work.

FIG. 4. (a) Average apparent work (2) and additional work (36) as functions of controller mobility ν_high during feedback, compared to respective model predictions (28) and (30d) for ν_high → ∞. Symbols are averages from K = 10⁴ timesteps, and standard errors of the mean are smaller than the symbol size. (b) Variance ⟨(x_k − z_k)²⟩ of the controller around the particle after the controller update. (c) Efficiency computed by dividing average apparent work (benefit) by average additional work (costs). Symbols are simulations and the solid line is information efficiency (31b). Error bars show standard errors of the mean. Simulation results are obtained by numerically integrating Eqs. (32a) and (32b) with time step dt = 10⁻⁷ for feedback time t_s = 0.3, measurement variance σ² = 0.1 (the parameters marked in Fig. 3), controller-update time τ = 10⁻³, and controller mobility ν_low = 10⁻² during the relaxation step.

IV. DISCUSSION
In this paper, we gave an expression for the minimum additional work as a function of a given feedback rule p c (z k |x k ). In some scenarios one might be interested in optimizing a control rule to maximize the total average work the engine produces [32], or other criteria. Together with the minimization of additional work we have pursued here, there could thus be a second optimization varying the feedback rule. Alternatively, both optimizations could be carried out together.
The study of the example system in Sec. III E shows that surprisingly large timescale separations are needed to achieve effectively instantaneous yet reversible control. If the controller mobility cannot exceed some maximal value, then control requires some minimum duration. The minimum-work protocol is then an optimal-transport process, which has been studied in the context of finite-time thermodynamics [73-81]. Finding the additional cost due to fast control would be an extension of the approach presented here.
Achieving the necessary timescale separation for alternating stability requires varying the mobility of the controller. Whether this requires additional thermodynamic costs is a matter of practical concern as it depends on the controller implementation. For example, if an electronic memory is used as the controller, then one may be able to raise and lower energy barriers between the controller's discrete states, thereby drastically changing mobility at vanishing extra costs.
We consider a static relaxation potential V_r(x, z). Although illustrative and simple to treat mathematically, this setup is not optimal: to harness all information gathered by the controller, a specifically designed, time-dependent feedback potential is required. Such a process can be made feedback-reversible [82]. In our setup, the nonequilibrium relaxation dynamics of feedback-controlled subsystem X in a static potential always cause entropy production, even if the distinct control step is perfectly reversible, as assumed here. This is not a severe limitation because a simple modification could make the relaxation potential time-dependent, V_r(x, z; t), allowing for feedback protocols that dissipate less heat and hence extract more work.
In our controller-update step, the control potential is specified as an evolving function of time. This may not be practical; a simpler but worse-performing alternative would set the desired final control potential V_c(x, z; kt_s + τ) at the beginning of the control step and rely on the large mobility of the controller to achieve relaxation to the correct final distribution. Such a protocol is easier to implement but does not take advantage of the controller's relaxation during the update, and therefore requires much more additional work, as explained in Sec. III E.
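The gap between the two strategies can be illustrated with a hypothetical harmonic control potential whose minimum must be shifted by a distance d (all values assumed, k_B = 1): imposing the final potential at once (a quench) costs kd²/2 on average, whereas shifting the minimum in many small steps, each followed by controller re-equilibration, costs arbitrarily little.

```python
k, d, T = 1.0, 1.5, 1.0   # assumed trap stiffness, shift distance, temperature

# Quench: impose the final control potential at once. The average work is the
# mean jump in potential energy over the initial equilibrium state (<x> = 0):
W_quench = 0.5 * k * d**2

# Evolving protocol: shift the minimum in n small steps, letting the fast
# controller re-equilibrate after each; per-step work is k * (d/n)^2 / 2.
n = 1000
W_steps = n * 0.5 * k * (d / n)**2
print(W_quench, W_steps)  # 1.125 vs 0.001125
```

The total work of the stepwise protocol scales as 1/n, vanishing in the quasi-static limit, while the quench cost is fixed.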
In contrast to other work on repeated-feedback processes [35,37,38], our approach does not lead to transfer entropy [83] or conditional mutual information as lower bounds on the control cost. This is because our controller-update rule does not depend on the last controller state. A recursive update rule p_c(z_k|x_k, z_{k−1}) could be used instead, for which the next controller state z_k would depend on the current controller state z_{k−1}, whose degrees of freedom store a memory of past measurements that is carried over to the next controller state. Recursive update rules that lead to lower dissipation are linked to learning and data-compression algorithms of the generalized information-bottleneck class [84-86].
Modeling the controller as a physical system allowed us to identify the minimum work required to achieve the desired control through information processing and feedback, in contrast to other approaches [34, 36-38, 52, 53, 87] that only bound the work extractable through feedback control, without a direct relation to the energetic cost of information processing. Having an explicit model of the controller's dynamics and its coupling to the feedback-controlled system alleviates interpretational ambiguities about the controller's operation, such as those found in, e.g., Ref. [29]. The lower bound on the work necessary to update the controller can also be used to analyze the operational costs of more complex information engines.
The utility of our setup is also reflected in the fact that there is no ambiguity about the time-reverse of a thermodynamic process with feedback. At first glance, the time-reverse of a feedback process might seem acausal, with effect (a specific control action) preceding cause (a measurement of the system state). Consequently, when deriving fluctuation relations and second-law-like inequalities with information, it has been common practice to consider a reverse process that randomly picks a specific control protocol from the ensemble of forward control protocols and executes it in reverse without feedback [34, 36-38]; approaches have differed in whether measurements are made in the reverse process and whether some post-selection of the resulting trajectories is needed [87]. Specifying exactly, as we did, the potential and the controller mobility as functions of time makes the time-reversed process transparently determined: it simply consists of executing the control on the joint system in reverse. With the reverse process initialized from the final distribution of the forward process, entropy production and fluctuation theorems then follow straightforwardly [40].

V. CONCLUSION
In this paper we investigated information engines that employ repeated feedback, paying particular attention to the fact that the controller realizing the desired feedback rule is a physical entity coupled to the feedback-controlled system via physical interaction potentials. We explicitly accounted for the work needed to realize the prescribed controller dynamics. The average additional work needed to carry out the desired control cannot be less than the reduction in entropy it achieves.
Our work highlights the fact that feedback control, including measurement, computation, and erasure of information needed to run an information engine, can be achieved mechanistically. In our model these processes are completely internal to the joint system formed by a controller and feedback-controlled subsystem. The experimenter only supplies the scheduled modifications of the control potential and controller mobility and is not involved in any measurement or decision making.
The code for simulation and generating the plots in this paper can be found in Ref. [88].