Second-law-like inequalities with information and their interpretations

In a thermodynamic process with measurement and feedback, the second law of thermodynamics is no longer valid. In its place, various second-law-like inequalities have been advanced that each incorporate a distinct additional term accounting for the information gathered through measurement. We quantitatively compare a number of these information measures using an analytically tractable model for the feedback cooling of a Brownian particle. We find that the information measures form a hierarchy that reveals a web of interconnections. To untangle their relationships, we address the origins of the information, arguing that each information measure represents the minimum thermodynamic cost to acquire that information through a separate, distinct measurement protocol.


Introduction
The Kelvin-Planck statement of the second law of thermodynamics forbids the existence of a cyclically operating device whose sole effect is to convert heat from a single thermal reservoir into an equal amount of work [1]. However, we can circumvent this restriction if our device operates via measurement and feedback, a possibility first envisioned by Szilard in his famous thought experiment [2]. Recently, there has been renewed interest in this old idea, spurred by the development of a collection of distinct second-law-like inequalities that quantify the interplay between the information gathered through measurement and the work that can be extracted in response through feedback. For continuously operating devices at temperature T, all these predictions bound the extracted work rate Ẇ_ext as

Ẇ_ext ≤ k_B T İ, (1)

where the information acquisition rate, generically denoted here as İ, differs in each second-law-like inequality, and k_B is Boltzmann's constant. The first inequality of this form was derived by Sagawa and Ueda for a single feedback loop [3], but it has subsequently been extended to include the repeated use of feedback, allowing for the application to continuously operating information engines [4,5,6,7,8,9,10,11]. In this case, the information rate is identified as the rate of growth of the transfer entropy [12] from the system to the measurement device (or feedback controller) [8,10,13,14]. An alternative inequality identifies the information rate with the flow of mutual information between the system and a continuously-interacting auxiliary measurement device. This information-flow approach has been developed for small systems modeled as continuous diffusion processes [15], for discrete Markov jump processes [14,16], and for stochastic processes interacting discretely [17,18].
Yet another version has been suggested by Kim and Qian specifically for the feedback cooling of a harmonically-trapped Brownian particle, where the extracted work is bounded by a term they call entropy pumping [19].
To date there is no clear information-theoretic interpretation of this term. Nevertheless, this result conforms to the second-law-like structure in (1). Further developments in this direction are the inclusion of measurement errors and delay [20,21,22]. At first glance, this plethora of seemingly similar predictions is confusing and raises questions about the interpretation as well as the utility of these information bounds. To help clarify the situation, a number of studies have compared some of these measures from different points of view [14,15,23,24]. Our goal in this paper is to build on these works by providing a comprehensive, pedagogical comparison of all these information measures within a single framework, in order to show clearly their relationships and limitations.
There are essentially two ways to view (1). The first is to treat (1) simply as a numerical bound on the extracted work rate Ẇ_ext, without reference to the physical underpinnings of İ. This is the point of view we typically take when investigating feedback (or information) engines [25,26,27,28,23,29], where our goal is to extract the maximum amount of work; the bound may then be any of the possible information measures. In this respect, having so many bounds is problematic, since we are unsure which is the most appropriate. Nevertheless, this is the approach we take in the first half of our paper in section 3. There we investigate the quantitative relationship between the various information measures by analytically calculating them in a Brownian-particle model of feedback cooling, which we introduce in section 2. We use this particular model since it has been studied theoretically [19,20,21] and could be implemented experimentally in the setups of [30,31]. The analytical tractability of this model further lets us examine these information measures from the point of view of optimal control theory, which reveals intimate connections among them. The second way to interpret (1) is to take seriously its resemblance to the second law and ask how far we can push this analogy. In particular, the traditional statement of the second law dictates that the entropy production of the universe, system and surroundings, during a thermodynamic process must be positive [1]. In feedback-driven systems, the surroundings not only include the traditional thermodynamic reservoirs, such as heat baths or chemical baths, but also an auxiliary system that records the measurement and feeds back that information. In this case, does (1) still represent the entropy production of the system and its surroundings, except that now the surroundings contain the feedback device? This is the question we address in the second half of our paper in section 4.
There we observe that the transfer entropy rate and the information flow have clear interpretations as the minimum entropy production required to acquire that information. However, each one is associated with a different physical measurement scenario, that is, with distinct surroundings, in much the same way that a particle reservoir differs from a thermal reservoir.

Feedback cooling model
Throughout, we will illustrate the different information concepts with a model for the feedback cooling of an underdamped Brownian particle [19,20,21]. This will allow us to discuss each measure using the same language. We therefore in this section introduce the dynamics of the model, both on the individual trajectory level and the ensemble level, as well as collect germane results regarding its energetics and thermodynamics.

Dynamics, energetics, and thermodynamics without feedback
Our quantity of interest is the time-dependent velocity v_t of a trapped, underdamped Brownian particle of mass m, coupled to a thermal reservoir at temperature T with viscous damping coefficient γ, evolving according to the Langevin equation [32]

m v̇_t = −γ v_t + f_t + ξ_t, (2)

where f_t is an externally controlled force, and ξ_t is zero-mean Gaussian white noise with covariance ⟨ξ_t ξ_s⟩ = 2γT δ(t − s). Starting here we set Boltzmann's constant to unity, k_B = 1.
In the absence of control, f_t = 0, the velocity v_t relaxes to an equilibrium Boltzmann distribution p_eq(v) ∝ exp[−mv²/(2T)]. In the following, we will vary f_t using feedback in order to cool the particle, that is, damp its thermal fluctuations, thereby reducing its kinetic temperature T_kin = m⟨v²⟩ < T. Before we get to that, it is helpful to first review the energetics and thermodynamics of a driven, underdamped Brownian particle without feedback, so that we can appreciate the differences that arise in the presence of feedback.
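The equilibrium statement above is easy to check numerically. The following is a minimal sketch (not from the paper; parameter values are illustrative): an Euler-Maruyama integration of the free Langevin equation (2) with f_t = 0, verifying that the kinetic temperature m⟨v²⟩ relaxes to the bath temperature T, as equipartition demands.

```python
# Minimal sketch: Euler-Maruyama integration of m dv = -gamma v dt + xi,
# with <xi xi> = 2 gamma T dt per step, checking equipartition m<v^2> = T.
import numpy as np

rng = np.random.default_rng(0)
m, gamma, T = 1.0, 1.0, 1.0          # illustrative units
dt, steps = 1e-3, 200_000

v = 0.0
vs = np.empty(steps)
for i in range(steps):
    xi = np.sqrt(2 * gamma * T * dt) * rng.standard_normal()
    v += (-gamma * v * dt + xi) / m
    vs[i] = v

T_kin = m * np.mean(vs[steps // 10:] ** 2)   # discard initial transient
print(f"T_kin = {T_kin:.3f} (bath T = {T})")
```

With the velocity relaxation time τ_v = m/γ = 1 here, the run covers a few hundred correlation times, so T_kin should agree with T to within roughly ten percent.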
To this end, we require the Fokker-Planck equation associated with (2) for the time-dependent probability density p_t(v) [33],

∂_t p_t(v) = −∂_v J_t^v(v), (3)

where we have introduced the (probability) current

J_t^v(v) = (1/m)(−γv + f_t) p_t(v) − (γT/m²) ∂_v p_t(v). (4)

Anticipating our discussion of the thermodynamics, we divide the current into its irreversible half, which is antisymmetric under time-reversal, and its reversible half, which is time-reversal symmetric, as [33,34,35]

J_t^v = J_t^{irr} + J_t^{rev},  J_t^{irr}(v) = −(γ/m)[v + (T/m)∂_v] p_t(v),  J_t^{rev}(v) = (f_t/m) p_t(v). (5)

Key to this splitting is treating the force f_t as even under time reversal, as is typically assumed for a force arising from an external potential. With this identification, the irreversible portion of the current J_t^{irr} arises solely from the forces imparted on the particle by its surroundings: the friction, −γv_t, and the fluctuating force, ξ_t.
Moving on to the thermodynamics, stochastic energetics provides an unambiguous identification of the heat flow into the system as the work done by the thermal reservoir on the particle [20,34,36,37], which on average reads

Q̇ = ∫ m v J_t^{irr}(v) dv. (6)

It notably depends only on the irreversible current arising from the forces due to the thermal reservoir. The particle's (internal) energy is its average kinetic energy,

E = ∫ (m v²/2) p_t(v) dv. (7)

By differentiating E with respect to time and substituting in the Fokker-Planck equation (3), we are able to identify the extracted work rate via the first law of thermodynamics, Ė = −Ẇ_ext + Q̇, as the average power delivered against the external force f_t,

Ẇ_ext = −f_t ⟨v_t⟩. (9)

From stochastic thermodynamics, we also have the (irreversible) entropy production rate [34,35,37]

Ṡ_i = (m²/γT) ∫ [J_t^{irr}(v)]² / p_t(v) dv ≥ 0, (10)

where we have the traditional splitting Ṡ_i = Ṡ − Ṡ_e into the time variation of the system's Shannon entropy,

Ṡ = −(d/dt) ∫ p_t(v) ln p_t(v) dv, (11)

and the reversible entropy exchange with the environment,

Ṡ_e = Q̇/T. (12)

Notably, the entropy production depends only on the irreversible current, since it is a measure of the time-reversal symmetry breaking of the dynamics [34]. This property is what allowed us to pull out the contribution due to the heat, which is also a function of the irreversible current only.

Dynamics and energetics with feedback
Our main focus in this paper is feedback cooling, where we vary f_t in response to measurements of the velocity. Following [21], we consider a feedback protocol where we measure the velocity v_t, obtaining outcomes y_t with some error, and then feed back those measurements by applying a force f_t = −a y_t that acts as an additional friction, extracting work. A simple way to incorporate measurement error is to add to our readout of v_t Gaussian white noise η_t of zero mean and covariance ⟨η_t η_s⟩ = σ² δ(t − s), with σ² quantifying the measurement uncertainty: for example, as y_t = v_t + η_t. However, white-noise fluctuations are very violent. To make the problem more tractable, we smooth over the noise by applying a low-pass filter with time constant τ to the measurements [38]. We are therefore led to the following modified dynamics including measurement and feedback [21]:

m v̇_t = −γ v_t − a y_t + ξ_t,
τ ẏ_t = −(y_t − v_t) + η_t, (13)

where a is the feedback gain. It is important to note at this point that y_t is merely a model of measurement outcomes. We are not making any assumption about the physical system that records the measurements, nor the one that implements the feedback in response. In general, the joint system relaxes to a time-independent, nonequilibrium steady state, where heat is continuously being extracted as work to maintain the particle at the cooled kinetic temperature. This is the scenario we focus on in the following.
To discuss the energetics, we need the equivalent description of the dynamics in (13) in terms of the Fokker-Planck equation for the time-dependent probability density p_t(v, y),

∂_t p_t(v, y) = −∂_v J_t^v(v, y) − ∂_y J_t^y(v, y), (14)

with (probability) currents

J_t^v = −(1/m)(γv + ay) p_t − (γT/m²) ∂_v p_t,  J_t^y = −(1/τ)(y − v) p_t − (σ²/(2τ²)) ∂_y p_t. (15)

Again we can split the velocity current J_t^v into irreversible and reversible pieces, as in (5),

J_t^{v,irr} = −(γ/m)[v + (T/m)∂_v] p_t,  J_t^{v,rev} = −(ay/m) p_t. (16)

This splitting singles out the irreversible current as solely due to the thermal reservoir as before [cf. (5)], which is required to correctly link the heat and entropy production in the following. Again, this division relies on choosing f_t = −a y_t as time-reversal symmetric, just as in the preceding section. Our focus is the steady-state solution, which due to the linear, Gaussian dynamics is the Gaussian probability density [32]

p_s(v, y) ∝ exp[−(1/2)(v, y) Σ^{−1} (v, y)^T], (18)

where Σ is the steady-state covariance matrix, with entries Σ_vv = ⟨v²⟩, Σ_vy = ⟨vy⟩, and Σ_yy = ⟨y²⟩, and the associated steady-state currents are J_s^v and J_s^y. The entries of Σ can be determined by plugging (18) into (14), as detailed for a more general model in [21]; however, their precise expressions are unilluminating and therefore relegated to Appendix A. We do observe that the reduced distribution of the velocity p_s(v) = ∫ p_s(v, y) dy is also Gaussian. It therefore has the same structure as an equilibrium distribution, but with a smaller variance, or a cooler effective kinetic temperature [21]

T_kin = T [1 + (a/γ)(aσ²/(2T)) + (1 + a/γ)(γτ/m)] / [(1 + a/γ)(1 + γτ/m)] ≤ T,

where the inequality is satisfied only in the regime of good cooling, aσ² ≤ 2T. Otherwise, too much measurement noise is fed back into the velocity, effectively heating it. Again from stochastic energetics, the heat current is identified as the energy lost due to the irreversible current arising from the thermal noise [20,36,37],

Q̇ = ∫ m v J_t^{v,irr}(v, y) dv dy, (21)

which importantly depends only on the velocity, as in (5), since the measurement and feedback do not affect the interaction with the thermal environment. In a similar way as before (9), the extracted work rate is

Ẇ_ext = a ⟨v_t y_t⟩, (23)

due to the correlations between the feedback force and the particle.
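The steady-state covariance can be obtained numerically instead of from the appendix. The sketch below (illustrative parameters, not from the paper) solves the Lyapunov equation AΣ + ΣA^T + D = 0 for the linear dynamics (13) and checks two consequences: the particle is cooled, T_kin < T, in the good-cooling regime, and the stationarity of ⟨v²⟩ forces the identity a⟨vy⟩ = (γ/m)(T − T_kin), i.e. the extracted work equals the heat current.

```python
# Sketch: steady-state covariance of the joint (v, y) process from the
# Lyapunov equation A Sigma + Sigma A^T + D = 0 (row-major vec convention).
import numpy as np

m, gamma, T = 1.0, 1.0, 1.0
a, tau, sigma2 = 1.0, 0.1, 0.1           # good cooling: a * sigma2 < 2T

A = np.array([[-gamma / m, -a / m],
              [1 / tau,    -1 / tau]])
D = np.diag([2 * gamma * T / m**2, sigma2 / tau**2])   # noise intensities

M = np.kron(A, np.eye(2)) + np.kron(np.eye(2), A)
Sigma = np.linalg.solve(M, -D.flatten()).reshape(2, 2)

T_kin = m * Sigma[0, 0]                  # cooled kinetic temperature
W_ext = a * Sigma[0, 1]                  # extracted work rate a <v y>
print(T_kin, W_ext)
```

For these parameters the solve gives T_kin ≈ 0.568 < T, and W_ext matches (γ/m)(T − T_kin) to machine precision, as stationarity requires.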
In the steady state, Ẇ_ext can be simplified using the defining equations for the elements of the covariance matrix Σ in Appendix A, in terms of the velocity's relaxation rate 1/τ_v = γ/m, as

Ẇ_ext = (T − T_kin)/τ_v.

When the feedback is successful and we have reduced the kinetic temperature, T_kin < T, we must be extracting work, Ẇ_ext > 0, recovering the results of [21]. We will finally require the fluctuating-trajectory solutions of (13) up to time t, v_0^t = {v_s}, s ∈ [0, t], and y_0^t = {y_s}, s ∈ [0, t]. We can obtain the probability densities for these trajectories by discretizing time and then using the usual procedure for obtaining path-integral densities, which we sketch in Appendix B. The joint density P[v_0^t, y_0^t] can be conveniently expressed in terms of two probability densities as

P[v_0^t, y_0^t] = P̂[y_0^t | v_0^t, y_0] P̂[v_0^t | y_0^t, v_0] p(v_0, y_0), (26)

with each P̂ suitably normalized and with initial probability density p(v_0, y_0). We emphasize that each P̂ is not a conditional probability of the feedback process, i.e., P̂[y_0^t | v_0^t, y_0] ≠ P[y_0^t | v_0^t], since v_t and y_t influence each other when there is feedback [8]. Instead, we can understand P̂[y_0^t | v_0^t, y_0] by first imagining that we fix the entire velocity trajectory v_0^t, and then evolve y_t alone according to (13). This procedure has no feedback, and the probability to observe a particular measurement trajectory is exactly P̂[y_0^t | v_0^t, y_0]. A similar interpretation holds for P̂[v_0^t | y_0^t, v_0] as well. This distinction between P̂ and P will become important in section 3.1 when we introduce the transfer entropy rate.

Information
In this section, we present the definitions of the various measures of information that can be used to bound the extracted work during a feedback process. In the next section, section 4, we will discuss the physics behind them.

Transfer entropy rate
The first information measure we discuss is the transfer entropy rate from v_t to y_t. The transfer entropy is a directional measure of information, which quantifies in an information-theoretic manner how much the dynamics (or more specifically the transition probabilities) of y_t are influenced by v_t [12]. For our continuous stochastic process, it reads

İ_{v→y} = lim_{t→∞} (1/t) ⟨ln( P̂[y_0^t | v_0^t, y_0] / P[y_0^t | y_0] )⟩. (27)

In Appendix B, we justify this expression by discretizing the evolution and then utilizing the well-developed theory for repeated, discrete feedback [3,5,7,14,25,39]. When no measurements are taking place, the dynamics of y_t are independent of v_t, P̂[y_0^t | v_0^t, y_0] = P[y_0^t | y_0], and the transfer entropy rate is zero. On the other hand, the more influence the velocity has on the measurement outcomes, the larger the transfer entropy rate. Furthermore, when there is only one measurement, the transfer entropy simplifies to the mutual information [8]. An alternative, equivalent expression for the transfer entropy rate in the context of continuous feedback has been introduced by Sandberg et al [11]. A similar analysis was performed by Fujitani and Suzuki for discrete Markov processes [6,25]. The transfer entropy rate in feedback systems described by continuous-time, discrete Markov processes has been extensively studied in [10,14,39,40].
To compare İ_{v→y} with the other information measures, we calculate its value in our model of feedback cooling. The calculation is facilitated by noting that for stationary Gaussian processes, as we have here, integrals of the form (27) can be conveniently expressed in terms of the power spectra, the Fourier transforms of the correlation functions. For (27), we demonstrate in Appendix C that it can be formulated as

İ_{v→y} = (1/2) ∫ (dω/2π) ln[ C_yy(ω) / Ĉ_{yy|v}(ω) ], (28)

where C_yy(ω) is the power spectrum of y_t, and Ĉ_{yy|v}(ω) is the Fourier transform of the covariance of y_t given a fixed trajectory v_0^t. We have carried out the integral in Appendix D with the result

İ_{v→y} = (1/(2τ_v)) (√(1 + SNR) − 1). (29)

New information is acquired at the relaxation rate of v_t, γ/m = 1/τ_v; that is, we learn new information about v_t only as fast as v_t changes enough to detect. In addition, the transfer entropy rate does not depend on the feedback parameters a and τ, but only on the measurement accuracy σ², through the dimensionless signal-to-noise ratio SNR = (2T/γ)/σ², which quantifies the size of the velocity's thermal fluctuations relative to the measurement noise. As a result, for perfect measurements without error, σ = 0, the SNR diverges and with it the transfer entropy rate. Thus, error-free measurement corresponds to infinite information, consistent with the notion that infinite information is required to localize a continuous variable with perfect precision.
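The closed form quoted above has a useful cross-check from optimal filtering: taking İ_{v→y} = (1/(2τ_v))(√(1+SNR) − 1) as given, it coincides with P/(2σ²), where P is the steady-state estimation error of the Kalman-Bucy filter for this model, i.e. the positive root of the Riccati equation P²/σ² + 2P/τ_v − 2γT/m² = 0. The sketch below (illustrative parameters; the equivalence is an assumption stated here, not a quote from the paper) compares the two expressions and illustrates the divergence as σ² → 0.

```python
# Sketch: transfer entropy rate (closed form above) vs the Riccati-equation
# value P/(2 sigma^2) from Kalman-Bucy filtering of the same Langevin model.
import numpy as np

m, gamma, T = 1.0, 1.0, 1.0
tau_v = m / gamma

def info_rate(sigma2):
    snr = 2 * T / (gamma * sigma2)
    return (np.sqrt(1 + snr) - 1) / (2 * tau_v)

def riccati_rate(sigma2):
    # positive root of P^2/sigma^2 + 2 P/tau_v - 2 gamma T/m^2 = 0
    P = sigma2 / tau_v * (np.sqrt(1 + 2 * gamma * T * tau_v**2 / (m**2 * sigma2)) - 1)
    return P / (2 * sigma2)

for s2 in (0.1, 1.0, 10.0):
    print(s2, info_rate(s2), riccati_rate(s2))
```

The two expressions agree identically, and the rate grows without bound as the measurement noise σ² shrinks, as discussed above.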

Information flow
We next consider the information flow, whose origin is in the exchange of information between the velocity and the auxiliary measurement device implementing the control. It was first considered in the context of interacting diffusion processes [15], but has subsequently been introduced in the analysis of the thermodynamics of continuously-coupled, discrete stochastic systems [14,16,41]. When the coupling is not continuous, but each system takes turns evolving, the information flow simplifies to the mutual information [16,17,18]. In order to facilitate connections to the other information measures, we sketch in this section the basic arguments leading to the information flow, following the program outlined in [16], and then calculate its value in our feedback cooling model. First, we must note that, strictly speaking, this approach requires that y_t be the degree of freedom of a physical system, not simply an abstract measurement outcome. Still, in this section we will not comment on the precise thermodynamics of y_t, taking it only as a generic thermodynamic system. We will come back to its precise interpretation in section 4 when we compare the physics underlying the different information measures.
The key insight in this approach is that the (irreversible) entropy production of the joint system of v_t and y_t can be divided as

Ṡ_i = Ṡ_i^v + Ṡ_i^y ≥ 0, (30)

with positive contributions arising from the irreversible current in the v-direction (16),

Ṡ_i^v = (m²/γT) ∫ [J_t^{v,irr}(v, y)]² / p_t(v, y) dv dy ≥ 0, (31)

and separately from y_t, Ṡ_i^y. The next step is to perform the traditional splitting of Ṡ_i^v into the variation of the Shannon entropy due to v_t [cf. (11)],

Ṡ^v = −(d/dt) ∫ p_t(v) ln p_t(v) dv, (32)

and the heat Q̇ (21), as

Ṡ_i^v = Ṡ^v − Q̇/T + İ_flow ≥ 0. (33)

The additional contribution due to the influence of y_t is an information-theoretic piece,

İ_flow = −∫ dv dy J_t^v(v, y) ∂_v ln[ p_t(v, y) / (p_t(v) p_t(y)) ], (34)

which is (minus) the variation of the mutual information between v_t and y_t due to the fluctuations of v_t [42]. The mutual information I(v_t; y_t) is a measure of correlations, quantifying how knowledge of the measurement outcomes reduces uncertainty in the velocity. While İ_flow may be positive or negative, in the regime of good cooling where we are extracting work we will always have İ_flow ≥ 0. In the steady state, Ṡ^v = 0 and Q̇ = Ẇ_ext, so that (33) reduces to [14,16]

Ẇ_ext ≤ T İ_flow, (35)

in the form of (1). Employing the steady-state solution in (18), the steady-state information flow is

İ_flow = −∫ dv dy J_s^v(v, y) ∂_v ln[ p_s(v, y) / (p_s(v) p_s(y)) ], (37)

which evaluates to an expression in terms of the entries of the covariance matrix Σ and its determinant |Σ|; unfortunately, we have been unable to formulate a more transparent expression in general. Even still, the information rate again only grows as fast as the relaxation rate of the velocity, γ/m = 1/τ_v.

Entropy pumping
For the feedback cooling of a Brownian particle without errors, an entropy pumping bound has been introduced by Kim and Qian [19]. This approach has subsequently been developed by Ge [43] and extended to the setup in (13) by Munakata and Rosinberg [20,21,22], which we discuss in this section. The entropy pumping approach is based on a coarse-graining of the Fokker-Planck equation (14). Following [21], we formally integrate out y_t from (14) to obtain the reduced Fokker-Planck equation

∂_t p_t(v) = −∂_v J̃_t(v), (38)

where we have identified an effective feedback force, the average feedback force conditioned on the velocity,

f̃_t^fb(v) = −a ∫ y p_t(y|v) dy. (39)

Furthermore, we treat f̃_t^fb as time-reversal symmetric, as we would expect for an external force [21]. In that case, we single out from the coarse-grained current the irreversible current exactly as for the no-feedback case (5),

J̃_t^{irr}(v) = −(γ/m)[v + (T/m)∂_v] p_t(v). (40)

This will allow us to connect the entropy production in the environment with the heat. Equation (38) is not a closed equation for p_t(v); the measurement dynamics are required to solve it. Nevertheless, the entropy-pumping approach is to treat (38) as a thermodynamically consistent equation for p_t(v) with an effective external force f̃_t^fb. In this case, the entropy balance is developed in analogy to the no-feedback setup, as in (10),

Ṡ_i^cg = (m²/γT) ∫ [J̃_t^{irr}(v)]² / p_t(v) dv = Ṡ^v − Q̇/T + Ṡ_pu ≥ 0, (41)

where the second equality follows by substituting in the definition of the coarse-grained irreversible current J̃_t^{irr}(v) in (40). Here, Ṡ^v is equivalent to the expression for the rate of change of the system's Shannon entropy including feedback in (32), and the additional entropy-pumping term arises from the coarse-grained feedback force,

Ṡ_pu = −(1/m) ∫ dv p_t(v) ∂_v f̃_t^fb(v). (42)

As pointed out in [21], the feedback force is proportional to the minimum mean-square-error estimate of y_t given v_t. Beyond that, though, there does not appear to be a crisp interpretation of the entropy pumping as a form of information, unlike the transfer entropy rate and the information flow.
Using the steady-state distribution in (18), we have for the steady-state entropy pumping [21]

Ṡ_pu = (1/τ_v)(T/T_kin − 1), (43)

with positivity guaranteed when there is cooling, T ≥ T_kin.
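Taking the steady-state expressions quoted above as given, i.e. Ṡ_pu = (T/T_kin − 1)/τ_v and Ẇ_ext = (T − T_kin)/τ_v, the entropy-pumping bound Ẇ_ext ≤ T Ṡ_pu can be verified directly. The sketch below (illustrative parameters, and the closed-form T_kin from section 2.2 assumed) sweeps over the feedback parameters and confirms the bound holds throughout.

```python
# Sketch: check W_ext <= T * S_pump across feedback parameters, assuming the
# steady-state formulas S_pump = (T/T_kin - 1)/tau_v, W_ext = (T - T_kin)/tau_v.
import numpy as np

m, gamma, T = 1.0, 1.0, 1.0
tau_v = m / gamma

def T_kin(a, tau, sigma2):
    num = 1 + (a / gamma) * (a * sigma2 / (2 * T)) + (1 + a / gamma) * (gamma * tau / m)
    den = (1 + a / gamma) * (1 + gamma * tau / m)
    return T * num / den

ok = True
for a in np.linspace(0.1, 5.0, 20):
    for tau in (0.01, 0.1, 1.0):
        Tk = T_kin(a, tau, 0.1)
        W_ext = (T - Tk) / tau_v
        S_pump = (T / Tk - 1) / tau_v
        ok = ok and (W_ext <= T * S_pump + 1e-12)
print(ok)
```

Indeed, W_ext = (T − T_kin)/τ_v and T·Ṡ_pu = (T/T_kin)(T − T_kin)/τ_v differ by the factor T/T_kin ≥ 1 whenever there is cooling, so the bound is algebraically guaranteed.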

Trajectory mutual information
Another information measure that has attracted some attention is the mutual information rate between the entire trajectories v_0^t and y_0^t [13,44]. For continuous stochastic processes, the trajectory mutual information rate is [42]

İ_traj = lim_{t→∞} (1/t) ⟨ln( P[v_0^t, y_0^t] / (P[v_0^t] P[y_0^t]) )⟩. (44)

It quantifies how much the uncertainty about the entire velocity trajectory v_0^t is reduced given knowledge of the entire measurement trajectory y_0^t, and vice versa, as it is symmetric.
The İ_traj bound on the extracted work follows readily once we observe a close connection between the trajectory mutual information and the transfer entropy, pointed out in [14]: by substituting P with P̂ (26) in İ_traj, it follows that

İ_traj = İ_{v→y} + İ_{y→v}, (45)

after identifying the transfer entropy rate from y_t to v_t, İ_{y→v} ≥ 0, defined analogously to İ_{v→y} (27). The positivity of the transfer entropy implies that

Ẇ_ext ≤ T İ_traj, (46)

giving (1) for the trajectory information, which is always weaker than the transfer entropy bound. The trajectory information rate has been studied in numerous contexts and has a well-known expression in terms of power spectra [45,46,47,48] that we recall in Appendix C,

İ_traj = −(1/2) ∫ (dω/2π) ln[ 1 − |C_vy(ω)|² / (C_vv(ω) C_yy(ω)) ], (47)

where C_vv(ω) is the power spectrum of v_t and C_vy(ω) the cross-spectrum of v_t and y_t. In Appendix D, we perform this integral; comparing the result with (45) yields, as a byproduct, the transfer entropy rate from y_t to v_t, İ_{y→v} = İ_traj − İ_{v→y}.
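The spectral formula can be evaluated numerically for the feedback-cooling model. The following sketch (illustrative parameters; the power spectra are built from the linear dynamics (13), an assumption stated here rather than a quote of Appendix C) computes the spectral density matrix S(ω) = H(ω) D H(ω)* with H(ω) = (iωI − A)^{-1} for the joint (v, y) process, forms the coherence, and integrates on a truncated frequency grid.

```python
# Sketch: numerical evaluation of the trajectory mutual information rate from
# the spectral coherence of the stationary Gaussian feedback-cooling model.
import numpy as np

m, gamma, T = 1.0, 1.0, 1.0
a, tau, sigma2 = 1.0, 0.1, 0.1

A = np.array([[-gamma / m, -a / m], [1 / tau, -1 / tau]])
D = np.diag([2 * gamma * T / m**2, sigma2 / tau**2])
I2 = np.eye(2)

w = np.linspace(-500.0, 500.0, 40_001)   # truncated frequency grid
itgd = np.empty_like(w)
for k, wk in enumerate(w):
    H = np.linalg.inv(1j * wk * I2 - A)
    S = H @ D @ H.conj().T               # spectral density matrix of (v, y)
    coh = abs(S[0, 1])**2 / (S[0, 0].real * S[1, 1].real)
    itgd[k] = -0.5 * np.log(1 - coh)     # coherence < 1 for positive spectra

I_traj = itgd.sum() * (w[1] - w[0]) / (2 * np.pi)
print(I_traj)
```

The integrand is nonnegative (the coherence of a positive-definite spectral matrix is strictly below one), so the truncation only underestimates İ_traj slightly; the tail decays like 1/ω².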

Maximum work
A final bound on the extracted work comes from simply maximizing Ẇ_ext in (23) with respect to the feedback parameters a and τ. While the result is not general, remarkably for linear Gaussian processes it has a close connection with the transfer entropy rate, as first noticed by Sandberg et al [11]. Using standard calculus methods, the extracted work is bounded above by its maximal value

Ẇ_ext ≤ Ẇ_ext^max = T_kin İ_{v→y}, (50)

akin to (1), with T_kin evaluated at the optimal parameter values

a* = γ(√(1 + SNR) − 1),  τ* = 0. (51)

The optimal measurement has no low-pass filtering: it is immediately fed back into the particle to control it. Remarkably, the extracted work is again bounded by the transfer entropy rate, except multiplied by the cooled kinetic temperature of the particle instead of T.

3.6.1. Hierarchy. The most striking feature of figure 1 is the hierarchy of information measures, İ_traj ≥ İ_{v→y} ≥ İ_flow ≥ Ṡ_pu, apart from Ẇ_ext^max, which does not actually have a generic information interpretation. In fact, this ranking holds quite generally. We have already seen that İ_traj ≥ İ_{v→y} in section 3.4 when discussing the second-law-like inequality for the trajectory information. The middle inequality, İ_{v→y} ≥ İ_flow, has been demonstrated by Hartich et al [14] for continuous-time, discrete Markov jump processes. For diffusion processes, a similar conclusion was reached by Allahverdyan et al [15], except for a slightly different transfer entropy rate that uses only the most recent measurement, which upper bounds the transfer entropy rate considered here, as pointed out in [14]. Nevertheless, the proof for jump processes in [14] can be carried over to diffusion processes once their evolution is discretized. The last inequality, between the information flow and the entropy pumping, is also generic. It follows by bounding the steady-state entropy production of v_t in the information-flow description (31) from below using a coarse-graining inequality [21] that connects it to the coarse-grained, entropy-pumping approach:

T İ_flow − Ẇ_ext = T Ṡ_i^v ≥ T Ṡ_i^cg = T Ṡ_pu − Ẇ_ext, (52)

where we have employed the entropy balance of entropy pumping in (41).
As a lower bound on all the other information measures, the entropy pumping can thus be given an information-theoretic interpretation, which until now has been lacking: it is a minimal information requirement for successful feedback cooling. An alternative perspective on this hierarchy of information measures comes from considering the efficiency of work extraction,

ε = Ẇ_ext / (T İ) ≤ 1. (53)

By utilizing the smaller information measures, we will estimate higher efficiencies, even without changing the measurement or feedback procedure. This conclusion is surprising, since it makes the notion of efficiency somewhat arbitrary. We will come back to this observation later, after discussing the physical origins of the different information measures.
We also see in figure 1 that the transfer entropy rate and the trajectory mutual information diverge as the measurement error tends to zero, σ² → 0, whereas the other measures remain finite. Munakata and Rosinberg have also observed that the entropy pumping displays a nontrivial structure, attaining a maximum at the maximum cooling rate [21]. Figure 1 demonstrates that İ_flow displays a similar structure, but its maximum does not quite correspond to the maximum cooling. Most likely, this discrepancy arises due to the effect of coarse-graining.
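The maximum-work bound of section 3.5 can also be recovered numerically. In the τ → 0 limit the feedback force is f_t = −a(v_t + η_t), for which the kinetic temperature reduces to T_kin(a) = (2γT + a²σ²)/(2(γ + a)); the sketch below (illustrative parameters; the τ → 0 expression is derived here from (13), not quoted) maximizes Ẇ_ext = (T − T_kin)/τ_v over the gain a and compares the optimizer with a* = γ(√(1+SNR) − 1).

```python
# Sketch: brute-force maximization of the extracted work over the feedback
# gain a in the tau -> 0 limit, recovering the optimal gain a* numerically.
import numpy as np

m, gamma, T, sigma2 = 1.0, 1.0, 1.0, 0.5
tau_v = m / gamma
snr = 2 * T / (gamma * sigma2)

a = np.linspace(1e-3, 10.0, 1_000_000)
T_kin = (2 * gamma * T + a**2 * sigma2) / (2 * (gamma + a))
W_ext = (T - T_kin) / tau_v

a_num = a[np.argmax(W_ext)]
a_star = gamma * (np.sqrt(1 + snr) - 1)
print(a_num, a_star)
```

At the optimum the maximal work also matches T_kin(a*) times (√(1+SNR) − 1)/(2τ_v), the product form of the bound quoted in section 3.5.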
3.6.2. Optimal control and the Kalman-Bucy filter. Interestingly, closer connections exist between the information flow, transfer entropy rate, and maximum extractable work that are revealed by re-examining our feedback problem from the perspective of optimal control theory.
The feedback cooling we have been addressing is a special case of a classic problem in optimal control theory: the characterization of feedback controllers that minimize quadratic performance objectives of the form

J = lim_{t→∞} (1/t) ∫_0^t ( ⟨v_s²⟩ + ρ ⟨f_s²⟩ ) ds, (55)

where ρ > 0 is a constant parameter used to tune the trade-off between keeping the fluctuations of v_t small and applying a strong control force f_t; see for example [49]. For the special case of cooling, we have been focused on minimizing ⟨v_t²⟩ alone, which corresponds to ρ → 0.
Assuming linear dynamics and Gaussian noise, the optimal feedback controller with access to noisy measurements v_t + η_t can be written in the form

f_t = −G v̂_t,  dv̂_t/dt = −[(γ + G)/m] v̂_t + K [(v_t + η_t) − v̂_t], (57)

where v̂_t is the abstract dynamical state of the controller, and G and K are carefully chosen constants. According to the separation principle [49,50], these parameters G and K can be determined as the solutions to two independent optimization problems: the optimal gain G is obtained by minimizing J while temporarily assuming there is no measurement noise, σ = 0; the optimal K is obtained by minimizing the estimation error, see below, and is independent of the tuning parameter ρ. While the exact expression for the optimal gain G is of little interest to us here, we do note that it tends monotonically to infinity as ρ → 0. This makes intuitive sense, since ρ → 0 means we only care about minimizing the variance ⟨v_t²⟩ and assess no cost for large control forces ⟨f_t²⟩. Optimal filtering theory, on the other hand, selects K by minimizing the estimation error E = ⟨(v_t − v̂_t)²⟩, given all the past measurements (v + η)_0^t. The steady-state optimum, achieved for

K = (1/τ_v)(√(1 + SNR) − 1), (59)

is

E = σ² K. (60)

Thus, v̂_t represents the best estimate of v_t given all past measurements. In fact, no other filter, nonlinear or otherwise, can produce a better estimate than the one described here, which is known as the Kalman-Bucy filter [49,51]. Remarkably, the optimal controller (57) with Kalman-Bucy filter can always be realized using the feedback cooling dynamics in (13) by a simple rescaling,

v̂_t = K τ_KB y_t, (61)

and choosing the parameters a and τ as

a_KB = G K τ_KB,  τ_KB = 1/[K + (γ + G)/m]. (62)

This mapping allows us to investigate our information measures from a new point of view by replacing y_t with the optimal v̂_t. For starters, maximal cooling, which coincides with the maximum extracted work Ẇ_ext^max (50), is obtained when G → ∞, in which case the optimal controller (62) becomes

a_KB → m K = γ(√(1 + SNR) − 1),  τ_KB → 0, (63)

recovering a* and τ* in (51), as expected.
The optimal controller also extracts the maximum amount of information. To see this, first note that optimality of the estimate v̂_t implies that the estimation error is stochastically orthogonal to the estimate, ⟨v̂_t (v_t − v̂_t)⟩ = 0 for all t [49]. This property greatly simplifies the steady-state covariance matrix of (v_t, v̂_t),

Σ = ( σ_v²  σ_v̂² ; σ_v̂²  σ_v̂² ), (64)

where the variance of the estimate is simply

σ_v̂² = σ_v² − E. (65)

Note that optimal cooling is achieved by G → ∞, forcing σ_v̂² → 0, so that the fluctuations in the velocity, σ_v² = E, are caused only by estimation error. Furthermore, by exploiting the structure of Σ in (64), the expression for the steady-state information flow (37) greatly simplifies:

İ_flow = İ_{v→y}, (66)

for all G. This is a very interesting observation, supporting the claimed optimality of the Kalman-Bucy filter. We already know that İ_flow ≤ İ_{v→y}. What we see here is that the class of controllers given by (57), i.e., with K fixed at its optimal value (59) and G free, saturates the bound, maximizing the information flow. Hence, a controller with a small gain G (even zero) only uses its information to create an optimal estimate of the process, whereas a high-gain controller cools as well. To gain further insight into the equality (66), we have to look at the transfer entropy rate and information flow from a different perspective. Namely, the transfer entropy rate can also be defined as the rate of growth of the mutual information between v_t and the entire trajectory of measurement outcomes y_0^t, that is, the change in I(v_t; y_0^t). The information flow, on the other hand, is the rate of growth of the mutual information between v_t and just the most recent measurement y_t, that is, the change in I(v_t; y_t). The inequality İ_flow ≤ İ_{v→y} is then related to the simple idea that the entire trajectory of measurements contains more information than just the last one. Now, it is known that the Kalman-Bucy filter v̂_t is a sufficient statistic for the conditional distribution of v_t given the measurements y_0^t [52].
In other words, everything useful in a collection of measurements for predicting v_t is contained in just v̂_t, or, in terms of the mutual information, I(v_t; y_0^t) = I(v_t; v̂_t). This equality, translated into rates, implies (66). In figure 2, we illustrate how the extracted work depends on G, and how the maximum is asymptotically achieved. In addition, we see that İ_flow = İ_{v→y} holds for all G. We can also conclude that with certain choices of a and τ (namely a_KB and τ_KB in (62)) our original setup (13) can always saturate İ_flow ≤ İ_{v→y}, which is indeed observed in figure 1 for a ≈ 2.
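The Kalman-Bucy construction above can be illustrated by simulation. The sketch below (an illustrative Euler discretization, not the paper's; for simplicity the uncontrolled particle is tracked, since the estimation error is unaffected by a known control input) runs the filter with the optimal gain K = P/σ² and checks that the empirical squared error approaches the Riccati value P = (σ²/τ_v)(√(1+SNR) − 1), and that the error is orthogonal to the estimate.

```python
# Simulation sketch: scalar Kalman-Bucy filter tracking the Langevin velocity.
import numpy as np

rng = np.random.default_rng(1)
m, gamma, T, sigma2 = 1.0, 1.0, 1.0, 0.5
tau_v = m / gamma
snr = 2 * T / (gamma * sigma2)
P = sigma2 / tau_v * (np.sqrt(1 + snr) - 1)   # steady-state error variance
K = P / sigma2                                # optimal Kalman gain

dt, steps = 1e-3, 400_000
v = vhat = 0.0
err2 = orth = 0.0
for i in range(steps):
    z = v + np.sqrt(sigma2 / dt) * rng.standard_normal()     # noisy readout
    xi = np.sqrt(2 * gamma * T * dt) * rng.standard_normal()
    v += (-gamma * v * dt + xi) / m                          # plant
    vhat += -vhat / tau_v * dt + K * (z - vhat) * dt         # filter
    err2 += (v - vhat) ** 2
    orth += vhat * (v - vhat)

mse = err2 / steps
print(mse, P, orth / steps)
```

The empirical mean-square error settles near P, and the time average of v̂(v − v̂) fluctuates around zero, in line with the orthogonality property used in (64).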

Energetics of Information and Measurement
We have seen that there are various distinct measures of information that each offer a nontrivial bound on the extracted work. However, there does not seem to be an obvious reason to prefer any one of these measures. To this end, we investigate their origins in this section. We will find that the transfer entropy rate and the information flow both correspond to information that is recorded in an auxiliary system, or memory, and is therefore subject to the limits of thermodynamics, as originally suggested by Landauer [2]. In particular, we show that these two information measures both bound the minimum energy required to gather that information through distinct thermodynamic processes, implying that the energy that we are able to extract as work originates in the (free) energy supplied by the memory.

Information flow
Let us start with the simpler measurement scenario, corresponding to the information flow İ_flow. Actually, we have already touched on its physical interpretation when we introduced it in section 3.2. Recall that there we considered the measurement outcomes y_t to correspond to a physical degree of freedom of an auxiliary system. We now sharpen that interpretation by taking y_t to be the position of a secondary, harmonically trapped, overdamped Brownian particle. To be thermodynamically consistent, the origin of the measurement noise must be a thermal reservoir, which requires imposing the fluctuation-dissipation theorem [37]. We have chosen the temperature of the measurement device to be the same as that of the controlled system, which is the customary choice. From this point of view, (13) is the equation of motion for an overdamped Brownian particle with viscous damping coefficient τ, trapped in a harmonic potential V(y, v) = (y − v)²/2 of unit spring constant centered about the velocity, as illustrated in figure 3. Alternatively, such a coupling can be implemented in an electric circuit, as presented in [11]. The result is that the position y_t of the measurement oscillator feels a fluctuating force that makes it track the velocity v_t, thereby establishing and maintaining correlations. Roughly speaking, the measurement oscillator constantly learns new information at a rate İ_flow, which keeps getting rewritten into the value of its position. When introducing the information flow, we divided the entropy production into two positive contributions (30), one due to the velocity, Ṡ^v_i, and another due to the measurements, Ṡ^y_i. When studying the extracted work Ẇ_ext, we focused on Ṡ^v_i. However, a similar analysis also holds for Ṡ^y_i, and it verifies that the y-system must consume free energy at a rate of at least T İ_flow to sustain the correlations that promote feedback.
Observing that, as a position, y_t is even under time-reversal (consistent with our previous analysis in section 2.1), we develop its thermodynamics by splitting its current J^y_t (15) into irreversible and reversible portions in (68).
Notice that here the irreversible current is the time-symmetric contribution, since y_t is even under time-reversal [34]. Then, repeating the analysis of section 3.2, we obtain a second-law-like inequality in the steady state [14,15,16], where Q̇_y = −∫ y J^{irr,y}_s(v, y) dv dy = (σ²_y − T)/τ is the rate of heat flow into y's reservoir. Thus, in the steady state, Q̇_y ≥ T İ_flow: in order to track the velocity, y's environment continually absorbs heat at a rate Q̇_y. We verify this bound in figure 4, where Q̇_y is plotted alongside İ_flow. The minimum, Q̇_y = T İ_flow, is reached when the measurement device operates reversibly in the limit τ ≪ τ_v, so that y_t rapidly relaxes to its instantaneous equilibrium centered about v_t. In addition, we have already argued that the entropy pumping lower bounds the information flow, İ_flow ≥ İ_pump (54). As a result, İ_pump offers a weaker lower bound on the energy required for an auxiliary system to provide the entropy-pumping feedback, Q̇_y ≥ T İ_pump, which is verified in figure 4 as well.
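For a rough numerical check of this picture, one can integrate an assumed feedback-free version of the dynamics: an equilibrium Ornstein-Uhlenbeck velocity (units with γ = m = 1) tracked by the overdamped measurement coordinate y with damping τ and FDT-consistent noise strength 2T/τ. All parameter values below are placeholders; the point is only that σ²_y > T, so the heat rate Q̇_y = (σ²_y − T)/τ comes out positive, as the bound requires:

```python
import numpy as np

rng = np.random.default_rng(0)
T, tau = 1.0, 0.5                 # bath temperature and measurement damping (assumed)
dt, n_steps = 1e-3, 400_000

xi = rng.normal(0.0, np.sqrt(dt), n_steps)    # velocity thermal-noise increments
eta = rng.normal(0.0, np.sqrt(dt), n_steps)   # measurement-noise increments (FDT)

v, y = 0.0, 0.0
y2_sum, n_kept = 0.0, 0
for i in range(n_steps):
    v += -v * dt + np.sqrt(2 * T) * xi[i]                  # OU velocity, no feedback
    y += (v - y) / tau * dt + np.sqrt(2 * T / tau) * eta[i]  # y tracks v in V = (y-v)^2/2
    if i > n_steps // 10:                                  # discard the transient
        y2_sum += y * y
        n_kept += 1

sigma_y2 = y2_sum / n_kept
Q_y = (sigma_y2 - T) / tau        # steady-state heat rate into y's reservoir
print(sigma_y2, Q_y)              # sigma_y2 exceeds T, so Q_y is positive
```

Tracking inflates the position variance above its equilibrium value T, and that excess is dissipated as heat; with feedback the same structure holds, only σ²_y changes.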

Transfer entropy rate
The transfer entropy rate can also be understood as the minimum free energy required to measure, but with an alternative measurement scenario. In the previous section, the information flow was shown to bound the energy required to constantly rewrite a single memory with each new measurement y_t. By contrast, the setup for the transfer entropy rate is much closer to that envisioned by Landauer and Bennett in their thermodynamics of computation [2,53]: each measurement is recorded separately, in one of a collection of memories, through a specific driven thermodynamic process, one example of which was recently described in [23].
The central idea is that each measurement outcome is recorded in a distinct memory. Therefore, to track the system over any finite time interval, say from time s = 0 to t, we require an infinite number of memories in which to record the infinity of measurement outcomes. This is difficult to analyze, so to proceed we discretize time as s_k = kΔs, with k = 0, ..., N and Δs = t/N, where the measurement outcome at time s_k is denoted simply as y_k ≡ y_{s_k}, and similarly v_k ≡ v_{s_k}. To store these measurement outcomes, we imagine a collection of N auxiliary memories with phase-space positions m_k, prepared initially in positions m_{k,0} distributed according to ρ_0(m_{k,0}). The measurement is a thermodynamic process of duration θ in which the k-th memory is manipulated, with the velocity held fixed, in such a way as to reproduce the correlations with v_{k−1} embodied in the measurement statistics of y_k. In other words, we demand that the statistics of the k-th memory after the measurement satisfy m_{k,θ} ∼ y_k (equality in distribution).
To see how these ideas play out in our model system, consider the discretized version of the Langevin equation (13), where the Δη_k are independent Gaussian random variables of zero mean and covariance ⟨Δη_k Δη_l⟩ = σ²Δs δ_kl. Equation (71) is a rule that tells us how the measurement outcome y_k at time s_k depends on the velocity v_{k−1}, as well as on the past measurement outcome y_{k−1} stored in a previous memory. Measurements that depend on past outcomes are sometimes called non-Markovian measurements [40]. Specifically, y_k is characterized by a Gaussian probability density roughly centered about the velocity, with a variance set by the measurement error. Now, in view of our previous discussion, we desire a physical system to act as a memory, together with a measurement process that prepares that system in a statistical state with the probability density in (72). A natural choice is an overdamped harmonic oscillator coupled to a thermal reservoir at temperature T. Initially, each memory oscillator is prepared in equilibrium with an arbitrary initial spring constant k_0, centered about zero, as illustrated in figure 5. Since each measurement is performed in sequence, it is attractive to visualize the phase spaces of the N measurement oscillators aligned in a row, or tape.

Figure 5. Schematic illustration of the transfer entropy rate measurement scenario: At time s_k, the velocity v_{k−1} is recorded in the k-th memory, a harmonic oscillator (red dot) with initial state m_{k,0}, through a nonautonomous interaction that slowly shifts and expands its potential V(m_k, v_{k−1}, m_{k−1}) before quickly turning off. Concurrently, the probability density (pink shaded region) expands from ρ_0(m_{k,0}) to a width Ω² and shifts by μ_k(v_{k−1}, y_{k−1}), terminating the process in the measurement probability density ρ_θ(m_{k,θ}) equivalent to (72), correlated with v_{k−1} and with the past measurement outcome y_{k−1} stored in the previous memory state m_{k−1,θ}. The process is then repeated, with each new measurement recorded in the next memory of the tape.

Then, one by one, we couple each measurement oscillator to the system, as well as to past memories, so as to establish correlations. The density in (72) suggests that the measurement protocol for the k-th oscillator should be the quasistatic turn-on of an interaction that shifts the center of the harmonic oscillator to μ_k (which includes interactions with the past memories) while simultaneously adjusting the spring constant to k_1 = T/Ω², resulting in the interaction potential depicted in figure 5. As a result, upon completion of the k-th measurement, the memory's position m_{k,θ} has settled into an equilibrium distribution ρ_θ(m_{k,θ}|m_{k−1,θ}, v_{k−1}) ∝ exp[−V(m_{k,θ}, m_{k−1,θ}, v_{k−1})/T] equivalent to (72). To complete the measurement, we must freeze the state of the memory to lock in the correlations, and remove the interaction by turning off V. One possible, though admittedly idealized, option is to instantaneously set V = 0 and then immediately turn off the dynamics of the measurement oscillator (perhaps by quenching the temperature to zero), so that the oscillator can no longer move. By repeating this sequence of actions on each new memory, we store a collection of measurement outcomes, each in a different physical memory. To be precise, each measurement must be performed instantaneously from the point of view of the velocity. This merely means that the time scale of the evolution of the individual memories must be much faster than that of the velocity, θ ≪ τ_v, so that the measurement is completed before the velocity changes appreciably [23]. This assumption is not unreasonable, since measurements are usually assumed to read out the instantaneous state of the system.
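A minimal sketch of a single measurement step makes this concrete. Assuming that the target statistics (72) are Gaussian with mean μ_k = y_{k−1} + (v_{k−1} − y_{k−1})Δs/τ and variance Ω² = σ²Δs (both reconstructed here from the discretization (71), so treat them as assumptions), an oscillator equilibrated at temperature T in a potential with spring constant k_1 = T/Ω² centered at μ_k reproduces them exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
T, tau, sigma2, ds = 1.0, 0.5, 2.0, 0.01    # placeholder values
v_prev, y_prev = 0.8, 0.3                   # conditioning values for one measurement

# Target measurement statistics (assumed Gaussian, per the discretization)
mu = y_prev + (v_prev - y_prev) / tau * ds  # shift mu_k(v_{k-1}, y_{k-1})
Omega2 = sigma2 * ds                        # measurement variance

# Memory: overdamped oscillator equilibrated in V(m) = k1*(m - mu)^2/2 at temperature T,
# whose Boltzmann density exp(-V/T) is Gaussian with mean mu and variance T/k1 = Omega2.
k1 = T / Omega2                             # final spring constant
m = rng.normal(mu, np.sqrt(T / k1), 200_000)

print(abs(m.mean() - mu), abs(m.var() - Omega2))  # both small: statistics reproduced
```

The quasistatic shift-and-soften protocol in the text is simply one reversible way of driving the oscillator from its initial equilibrium into this final Boltzmann state.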
Having described how we can mimic the measurement statistics in a physical setup, we now address the thermodynamics from a general point of view, applying the methods of [17,18,23]. Our analysis is based on the following second-law-like inequality, which relates the work performed in an isothermal process to the increment of the nonequilibrium free energy [54,55]: for a thermodynamic system with microscopic states z, the work W performed along an isothermal process during which the system's probability density transitions from p(z) to p′(z′) is bounded as in (74), where ΔF(z′) = F(z′) − F(z) is the change in the nonequilibrium free energy F(z) = U(z) − T S(z), defined in terms of the average internal energy U(z) and the entropy S(z) = −∫ p(z) ln p(z) dz. The nonequilibrium free energy is a natural extension of the equilibrium free energy to systems characterized by an arbitrary probability density, since it reduces to the equilibrium free energy for systems in equilibrium. We begin by focusing on the work W_k done during the k-th measurement, during which the k-th memory becomes correlated not only with the velocity v_{k−1} but with all the past memories m^{k−1}_0 = {m_{l,θ}}^{k−1}_{l=0} ∼ {y_l}^{k−1}_{l=0}, through the velocity, which depends on the entire past. Applying (74), we have (75), where ΔF(m_{k,θ}|m^{k−1}_0, v_{k−1}) is the change in the nonequilibrium free energy of the k-th memory, corresponding to the change in the conditional density from ρ_0(m_{k,0}|m^{k−1}_0, v_{k−1}) = ρ_0(m_{k,0}) (due to the initial independence of each memory) to ρ_θ(m_{k,θ}|m^{k−1}_0, v_{k−1}) = ρ_θ(m_{k,θ}|m_{k−1,θ}, v_{k−1}). We single out the new correlations by introducing the mutual information between m_{k,θ} and v_{k−1} conditioned on all the past measurements, I(m_{k,θ}; v_{k−1}|m^{k−1}_0) = S(m_{k,θ}|m^{k−1}_0) − S(m_{k,θ}|m^{k−1}_0, v_{k−1}) [42]. Substituting this definition into (75), we obtain a bound in which the new correlations appear explicitly, with ΔF(m_{k,θ}|m^{k−1}_0) the change in free energy conditioned on just the past memories: ρ_0(m_{k,0}) → ρ_θ(m_{k,θ}|m^{k−1}_0).
Summing over all measurements, we find (77), where W = Σ^N_{k=1} W_k is the work to perform all N measurements, ΔF(m^N_1|m_0) = Σ^N_{k=1} ΔF(m_{k,θ}|m^{k−1}_0) is the change in the entire tape's free energy, and we have identified the discrete version of the transfer entropy [10], which is reviewed in Appendix B. Importantly, by construction, the statistics of each memory reproduce the statistics of the measurement outcomes, so the bound can equivalently be written in terms of the y_k. Taking the limit as the number of measurements goes to infinity while the time between them goes to zero, we obtain the corresponding rate inequality: the transfer entropy rate is the minimum rate at which free energy is consumed to write to the memories. The slow protocol described previously saturates this bound, since it is quasistatic and therefore thermodynamically reversible. At this point, it is worthwhile to make a connection to a class of Maxwell-demon models that exploit a tape of low-entropy auxiliary systems, or cells, similar to what we have just described [24,56,57,58,59,60]. Apart from the study in [60], these models use an ideal tape that has no internal energy, and therefore cannot exchange energy with the system, only entropy; a setup sometimes referred to as an information reservoir [24,58]. Under these conditions, a second-law-like inequality has been derived showing that the extracted work is bounded by the increase in entropy of each individual auxiliary system, ignoring the correlations between the different cells. Our memories, on the other hand, have internal energy, and therefore the natural thermodynamic quantity to consider is the free energy instead of the entropy. To fit our measurement model into this tape-model framework, we must relate our information bound on the work to measure to a bound that ignores the correlations. To this end, we start with the bound on the energy to measure, W − ΔF(m^N_1|m_0) ≥ T I^N_{v→y} in (77), which includes through ΔF all the correlations between the different memories.
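To make the discrete transfer entropy in this bound tangible, the sketch below estimates it for the linear model by Monte Carlo over an ensemble of trajectories. It uses two simplifications that are ours, not the paper's: the velocity is a feedback-free Ornstein-Uhlenbeck process, and the conditioning on the entire measurement past y^{k−1}_0 is truncated to one step, which can only overestimate the transfer entropy (less conditioning increases the first entropy term, while the fully conditioned variance is exactly the noise variance σ²Δs):

```python
import numpy as np

rng = np.random.default_rng(2)
# Assumed illustrative parameters
tau_v, T, tau, sigma2 = 1.0, 1.0, 0.5, 2.0
ds, n_traj, n_steps = 0.05, 50_000, 40

v = rng.normal(0.0, np.sqrt(T), n_traj)   # velocities start in equilibrium
y = np.zeros(n_traj)
te = 0.0
for k in range(n_steps):
    y_prev, v_prev = y.copy(), v.copy()
    y = y_prev + (v_prev - y_prev) / tau * ds \
        + rng.normal(0.0, np.sqrt(sigma2 * ds), n_traj)
    v = v_prev - v_prev / tau_v * ds + rng.normal(0.0, np.sqrt(2 * T * ds), n_traj)
    # Residual variance of y_k given y_{k-1} only (one-step history approximation)
    A = np.column_stack([np.ones(n_traj), y_prev])
    res = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    var_without_v = res.var()
    var_with_v = sigma2 * ds          # exact: only the Delta eta noise remains
    te += 0.5 * np.log(var_without_v / var_with_v)   # Gaussian conditional MI

print(te / (n_steps * ds))  # crude transfer-entropy rate estimate (nats per unit time)
```

Each summand is the Gaussian conditional mutual information ½ ln of a variance ratio, which is the discrete analogue of the terms identified in (77).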
By noting that removing correlations and conditioning increases the entropy, H(m^N_1|m_0) ≤ Σ_k H(m_{k,θ}) [42], we can conclude that ignoring the correlations decreases the free energy, F(m^N_1|m_0) ≥ Σ_k F(m_{k,θ}). As a result, from (77) and the initial independence of each memory, we obtain the series of inequalities (81). For an ideal tape with no internal energy, this reduces to T Σ_k ΔH(m_{k,θ}) ≥ W_ext, recovering the ideal-tape bound [24,56,57,58,59,60] in our setup. Equation (81) may lead us to conclude that the bound on the extracted work from the tape-model framework, W − Σ^N_{k=1} ΔF(m_{k,θ}), is weaker than that provided by the transfer entropy. However, this would be too hasty, because the tape models allow a more general interaction between the tape cells and the system. Whereas in our setup the memory evolution is assumed to occur separately, with the velocity fixed, the tape models consider dynamics in which the memory (or cell) is allowed to evolve simultaneously with the velocity. From this point of view, the measurement model we have presented is a special case of these more general tape models, and it is exactly our assumption that the velocity is frozen during measurement that allows us to tighten the tape-model bound using the transfer entropy. Further comparisons of such tape models with other information measures and more traditional statements of the second law can be found in [23,24].
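The entropy inequality invoked here, H(m^N_1|m_0) ≤ Σ_k H(m_{k,θ}), has a simple Gaussian counterpart: for jointly Gaussian memories it reduces to Hadamard's inequality, ln|C| ≤ Σ_k ln C_kk, for the covariance matrix C. A minimal numerical check with an arbitrary illustrative covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random positive-definite covariance for, say, 5 correlated Gaussian memories
A = rng.normal(size=(5, 5))
C = A @ A.T + 5 * np.eye(5)

# Joint entropy vs sum of marginal entropies; the common 0.5*ln(2*pi*e) terms
# per degree of freedom cancel, leaving only the log-determinant comparison.
joint = 0.5 * np.linalg.slogdet(C)[1]
marginals = 0.5 * np.sum(np.log(np.diag(C)))
print(joint <= marginals)  # correlations can only lower the joint entropy
```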
Finally, it should be noted that the preceding second-law analysis can be viewed as a specific implementation of the information flow framework (outlined in sections 3.2 and 4.1), applied to a nonautonomously driven auxiliary memory composed of a sequence of many subsystems; see [16].

Discussion
The transfer entropy rate and the information flow both bound the energy consumed during measurement. However, each measurement scenario is distinct, and in general each of these information measures will not bound the energy consumption of the other's measurement scenario. An example where İ_{v→y} > Q̇_y/T is possible is presented in [14]; thus, the transfer entropy rate does not in general lower bound the heat dissipated by a single memory that is constantly rewritten. Our model corroborates this observation, as verified in figure 4 by the crossing of Q̇_y/T and İ_{v→y}. The one exception is when the controller implements the Kalman-Bucy filter (57), in which case the equality of the information measures, İ_flow = İ_{v→y}, implies that there is a unique lower bound on the energy required for measurement.
To conclude this section, we take a broader perspective. Our observation that the transfer entropy rate and the information flow both represent the minimum (free) energy consumed (or, alternatively, the entropy produced) in the auxiliary memory to create that information suggests that it is reasonable to interpret some second-law-like inequalities as statements about the thermodynamics of the system and its surroundings, where the surroundings include the measurement device. This allows us to incorporate information into the standard statement of the second law of thermodynamics through a kind of information reservoir, on equal footing with the traditional thermodynamic reservoirs, similar to what was suggested for tapes in [24,58]; the resulting inequality is equivalent to (1) in the steady state. Here, İ represents the minimum entropy produced in the environment that allows for feedback, with the minimum attained for reversible measurement. The appropriate choice of İ (transfer entropy rate or information flow) depends on which type of information reservoir we wish to use. From this point of view, the efficiency ε introduced in (55) is a true measure of energetic efficiency, quantifying how faithfully the energy supplied by a reversible memory is extracted back out as work.

Summary
We have explored a collection of information measures that appear in second-law-like inequalities for measurement and feedback, using the tools of stochastic thermodynamics and optimal control theory. We have seen that these measures form a hierarchy of bounds on the extracted work, and that the Kalman-Bucy filter optimally extracts both information and energy. Even though each measure offers a different numerical bound on the extracted work, each also corresponds to a different way of gathering information. With this distinction in mind, these second-law-like inequalities can be seen as manifestations of the second law of thermodynamics, since they include the entropy production of the system and its surroundings, including the controller.

Appendix B. Path probabilities and the transfer entropy rate
In this appendix, we demonstrate how we arrive at (26) for the trajectory probability density P, and how this structure allows the compact expression for the transfer entropy rate in (27). The analysis proceeds by discretizing the evolution over the time interval s = 0 to t into steps of width Δs = t/N, with s_k = kΔs for k = 0, ..., N, and v_k ≡ v_{s_k}, y_k ≡ y_{s_k}. We are interested in determining the probability density P[v^N_0, y^N_0] to observe the pair of discrete trajectories v^N_0 = {v_k}^N_{k=0} and y^N_0 = {y_k}^N_{k=0}. To this end, we discretize the Langevin equation (13), where Δξ_k (Δη_k) are independent, zero-mean Gaussian random variables with covariances ⟨Δξ_k Δξ_l⟩ = 2γT Δs δ_kl (⟨Δη_k Δη_l⟩ = σ²Δs δ_kl). From this we deduce that, to lowest order in Δs, the transition probability splits into separate v and y evolutions as [15] P(v_{k+1}, y_{k+1}|v_k, y_k) = P(v_{k+1}|v_k, y_k) P(y_{k+1}|v_k, y_k). Thus, the joint trajectory probability takes the form P[v^N_0, y^N_0] = P(v_N|v_{N−1}, y_{N−1}) P(y_N|v_{N−1}, y_{N−1}) ··· P(v_1|v_0, y_0) P(y_1|v_0, y_0) p(v_0, y_0), (B.3) with arbitrary initial density p(v_0, y_0). Since the evolution naturally divides, it suggests introducing the trajectory conditional probabilities, in terms of which the joint trajectory probability factorizes. Equations (24), (25), and (26) are the continuous-time versions of the preceding equations, obtained in the limit Δs → 0. In this discretized setup, we can directly apply the theory of discrete feedback [3,5,7,14,25,39]; the transfer entropy after N measurements is given by (B.7). We see that the transfer entropy is the relative entropy between the transition probability of y given v, P(y_{k+1}|v_k, y_k), and the unconditioned transition probability, P(y_{k+1}|y^k_0), averaged over (v_k, y^k_0). Recall that the relative entropy between two probability densities f(x) and g(x) is D(f||g) = ∫ f(x) ln[f(x)/g(x)] dx [42].
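Since all transition densities in our linear model are Gaussian, the relative entropies entering the transfer entropy have a closed form. The sketch below implements the standard Gaussian KL-divergence formula and checks it against a Monte Carlo average of ln[f/g]; the two densities are arbitrary stand-ins for a conditioned and an unconditioned transition probability:

```python
import numpy as np

def kl_gauss(mu1, var1, mu2, var2):
    """Closed-form D(N(mu1,var1) || N(mu2,var2)) in nats."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

rng = np.random.default_rng(3)
# Stand-in parameters: conditioned (1) vs unconditioned (2) transition density
mu1, var1, mu2, var2 = 0.3, 0.04, 0.0, 0.09

# Monte Carlo estimate of E_f[ln(f/g)] using samples from f
x = rng.normal(mu1, np.sqrt(var1), 1_000_000)
log_f = -(x - mu1) ** 2 / (2 * var1) - 0.5 * np.log(var1)
log_g = -(x - mu2) ** 2 / (2 * var2) - 0.5 * np.log(var2)

print(kl_gauss(mu1, var1, mu2, var2), (log_f - log_g).mean())  # agree
```

Averaging such closed-form divergences over the conditioning variables is exactly how the Gaussian transfer entropy in (B.7) is evaluated in practice.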
In this way, the transfer entropy measures the effect the velocity has on the measurement dynamics, that is, how distinguishable the measurement dynamics given the velocity are from the measurement dynamics without it. Expanding the sum, we can rewrite (B.7); the continuous-time version appears in (27).

Appendix C. Power spectra formulae for information rates
In this appendix, we sketch how entropy rates for stationary Gaussian processes can be expressed in terms of the processes' correlation functions, following the developments in [47,48]. Consider a discretization with spacing Δs = t/N of a Gaussian stochastic process x = {x_k}^N_{k=0}. It is completely characterized by its mean μ = {μ_k} = {⟨x_k⟩} and covariance matrix C with elements C_mn = ⟨(x_m − μ_m)(x_n − μ_n)⟩, which we assume depend only on the time difference, C_mn = c(|m − n|), as for a stationary process. The power-spectra formulae for the information rates follow from the observation that the entropy of such a Gaussian distribution is completely determined by the covariance matrix:

H(x) = (N/2) ln(2πe) + (1/2) ln|C|. (C.2)
Since the process is stationary, the covariance matrix has a Toeplitz structure, C_mn = c(|m − n|), which allows us to diagonalize it in the limit N → ∞ using its Fourier transform C(ω) = Σ^N_{s=0} e^{−iωs} c(s), with ω = 2π/t. In this case, the entropy rate can be expressed as an integral over the power spectrum [48]. Taking the continuous-time limit Δs → 0, we recover the expression in (27). Similarly, the trajectory mutual information rate is given by the frequency integral in (C.5), where C(ω) is the Fourier transform of the covariance matrix of the joint measurement and velocity process. One can show, as in [47], that |C(ω)| = C_vv(ω) C_yy(ω) − |C_vy(ω)|², (C.6) which, when substituted into (C.5), recovers (47) after taking Δs → 0.
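The Toeplitz-diagonalization step is the Szegő limit theorem, and it can be checked directly. The sketch below uses an AR(1) process (a discrete Ornstein-Uhlenbeck stand-in with toy parameters) and compares (1/N) ln|C| against the spectral average (1/2π)∫ ln C(ω) dω; the two converge to the same limit as N grows:

```python
import numpy as np

# AR(1) sketch: x_k = r*x_{k-1} + eps with Var(eps) = q  (assumed toy parameters)
r, q, N = 0.7, 0.5, 400
c = (q / (1 - r**2)) * r ** np.arange(N)          # autocovariance c(|m - n|)
idx = np.arange(N)
C = c[np.abs(np.subtract.outer(idx, idx))]        # Toeplitz covariance matrix

lhs = np.linalg.slogdet(C)[1] / N                 # (1/N) ln |C|

# Spectral average of ln S(omega) over a uniform grid on [-pi, pi]
w = np.linspace(-np.pi, np.pi, 20001)
S = q / np.abs(1 - r * np.exp(-1j * w)) ** 2      # power spectral density
rhs = np.log(S).mean()                            # Szego limit, equal to ln q here

print(lhs, rhs)  # both close to ln(q)
```

For AR(1) the limit is ln q, since (1/2π)∫ ln|1 − r e^{−iω}|² dω = 0 for |r| < 1; the finite-N discrepancy decays like 1/N, mirroring the boundary terms dropped in the entropy-rate limit.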