Reconsideration of the generalized second law based on information geometry

The maximum work formulation of the second law of thermodynamics has been generalized to transitions between nonequilibrium states. The generalization involves the relative entropy between nonequilibrium states and canonical states. The relative entropy scaled by the temperature of the canonical state quantifies the work available for extraction from the nonequilibrium state. This scaled relative entropy can be interpreted as an energy-dimensional divergence in information geometry. The generalized Pythagorean theorem relating three energy-scaled divergences, which we interpret as thermodynamic distances, gives a geometrical interpretation of the generalized maximum work formulation. Under this interpretation the optimal cyclic operation to extract work from a nonequilibrium state is discussed in a simple two-level quantum system.


Introduction
The role of information processing in understanding the second law of thermodynamics has been evident since Maxwell [1] introduced his demon [2,3]. Landauer [4], Bennett [5] and others [6][7][8] also pointed out that information processing itself may be accompanied by thermal effects, sometimes in subtle ways. These considerations have led researchers to recognize information as a resource on par with heat and work in formulating a thermodynamics of information [9][10][11][12][13].
We reconsider, based on information theory, the maximum work formulation of the second law generalized to transitions between nonequilibrium states [14][15][16][17][18][19]. The generalized second law is stated in terms of the initial and final relative entropies between each nonequilibrium state and corresponding canonical states, as well as the Helmholtz free energy of the canonical states. The relative entropy is known as the Kullback-Leibler (KL) divergence [20,21] in information theory. The KL divergence is always non-negative and measures the 'information distance' between two probability distributions; it also serves as a bridge connecting information theory and geometry.
Information geometry [22,23] is geometry on a Riemannian manifold. Probability distributions are the points of the Riemannian manifold. The Riemannian metric is provided by a divergence. The KL divergence furnishes the Fisher information matrix as the Riemannian metric tensor. Since the (KL) divergence is not symmetric in the two probability distributions it is not really a proper mathematical metric. Even so, Amari and Nagaoka [22,23] demonstrated a generalized Pythagorean theorem (GPT) for a 'right-angled triangle' of three probability distributions and introduced projection as an important concept in information geometry.
In this paper we introduce an interpretation of the generalized second law based on information geometry. We need an energy-dimensional 'distance' [24] to measure extractable work from a nonequilibrium state, since the 'information distance' measured by the KL divergence is dimensionless. This energy-dimensional 'distance,' hereafter referred to as the 'thermodynamic distance,' is the KL divergence scaled by a temperature. Our temperature has an energy dimension as we set Boltzmann's constant k=1 throughout this paper.
The Riemannian metric tensor corresponding to the energy-scaled KL divergence is an energy-scaled Fisher information matrix. The geometric structure based on the energy-scaled Fisher information matrix is completely different from the geometric structure based on the original Fisher information matrix. The validity condition of the GPT based on the scaled KL divergence is an isentropic condition as will be shown in section 3. On the other hand, the validity condition of the GPT based on the bare KL divergence is an isoenergetic condition as will be shown in appendix B. This important difference has not been recognized, since many researchers mainly studied isothermal processes in which the temperature is just a constant [24]. In our approach the temperature is a variable parameter.
In the next section we start by recalling the maximum work formulation of the generalized second law. The generalized second law is universal for both classical and quantum systems. It is valid even for a finite integrable system since it can be derived using only two fundamental properties: the conservation of Gibbs-Shannon (GS) entropy and the non-negativity of the KL divergence. To clarify the role of information theory, we focus on a thermally isolated Hamiltonian system. We exclude any phenomenological assumptions such as detailed balance, coarse graining, or the existence of a heat reservoir that would produce a canonical ensemble.
We do not have an a priori temperature in a thermally isolated system; nevertheless, an effective temperature characterizing the nonequilibrium state can be introduced. Also, we show that the canonical distribution plays the role of a reference state for an initial nonequilibrium state, as well as a target for the final state that achieves maximum work extraction. This important role of the canonical distribution in a thermally isolated system is unconventional.
In sections 3 and 4, we reconsider the generalized second law based on information geometry. We consider the case of a cyclic operation to make clear the role of the scaled KL divergence in section 3. For a cyclic operation, the work is written only in terms of the scaled KL divergences because there is no change in the Helmholtz free energy. The work is bounded from below by the scaled KL divergence between a nonequilibrium initial state and a canonical state. The canonical state is parameterized by its temperature. The maximum work is determined by minimizing the 'thermodynamic distance.' The GPT for the scaled KL divergences gives orthogonality between the one-parameter line of canonical states and the isentropic surface of state space. The maximum work is given by the canonical state that intersects the surface, with the temperature determined by the isentropic condition. We extend our arguments to the case of a general non-cyclic operation in section 4. We also illustrate a simple concrete example of the GPT for the scaled KL divergences using the adiabatic expansion of the ideal gas at the end of section 4.
In section 5, we apply the geometrical interpretation of the generalized maximum work formulation to a simple two-level quantum system. The optimal cyclic operation to extract work from a nonequilibrium state is determined by minimizing the scaled KL divergence between the final state and the final canonical state. The geometrical interpretation of the generalized maximum work formulation gives us a systematic method to figure out a protocol to realize the optimal operation.
In appendix B we show that the geometric structure based on the bare KL divergences is different from that based on the scaled KL divergences. The GPT for the bare KL divergences gives orthogonality between the one-parameter line of canonical states and the isoenergetic surface. It also gives as an immediate corollary the principle of maximum entropy.

The generalized second law
We start with a review of the maximum work formulation of the second law generalized to transitions between nonequilibrium states [14][15][16][17]. The generalized second law is valid for both classical and quantum systems. We consider a thermally isolated Hamiltonian system and focus on the period of the operation starting at t=0 and finishing at t=T. The state is described using the probability density, ρ(x, t) (or ρ_t in abbreviated form), at time t and phase-space point x, which is either specified or is obtained by dynamical evolution from a previously specified state. The dynamics of the system is governed by its Hamiltonian, H(x, κ(t)) (abbreviated as H_t), that has an explicit time dependence due to the external parameters, κ(t), under our control. Hereafter we abbreviate a function of time A(t) as A_t.
The time evolution of the state is written using the time evolution operator U as
$$\rho_T = U_T\,\rho_0,$$
where $U_T$ is defined as
$$U_T = \mathcal{T}\exp\left(-i\int_0^T L_t\,dt\right),$$
where $\mathcal{T}$ is the time-ordered product and L is the Liouvillian for the Hamiltonian. We employ the bracket notation to describe the expectation value of an observable A(x) as
$$\langle A|\rho_t\rangle = \int_\Gamma A^*(x)\,\rho(x,t)\,dx,$$
where Γ is the phase space of the Hamiltonian system and A* is the (transposed) complex conjugate of A (footnote 5). Thus the expectation value of the Hamiltonian of the system at time t is given by $\langle H_t|\rho_t\rangle$. The work done on this thermally isolated system is given just by the change in the internal energy of the system, i.e.,
$$W = E_T - E_0, \tag{4}$$
where the internal energy of the system at time t is
$$E_t = \langle H_t|\rho_t\rangle.$$
The canonical distribution of the system with respect to the Hamiltonian and with a parameter α will play an important role:
$$\rho_{\mathrm{can},t}(\alpha) = \frac{\exp(-\alpha H_t)}{Z_t(\alpha)}.$$
Here, the partition function
$$Z_t(\alpha) = \int_\Gamma \exp(-\alpha H_t)\,dx$$
we are going to rewrite by using its relation to the Helmholtz free energy,
$$F_t(\alpha) = -\alpha^{-1}\ln Z_t(\alpha),$$
so that the canonical distribution may be expressed simply as the exponential function
$$\rho_{\mathrm{can},t}(\alpha) = \exp\{-\alpha[H_t - F_t(\alpha)]\}.$$
We also will employ the Gibbs-Shannon (GS) entropy of the system as
$$S[\rho] = -\langle \ln\rho|\rho\rangle = -\int_\Gamma \rho\ln\rho\,dx,$$
where we set the Boltzmann constant k=1 to make the thermodynamic entropy compatible with the dimensionless entropy in information theory.
The cross entropy appearing in information theory is defined as
$$-\langle \ln\rho_B|\rho_A\rangle = -\int_\Gamma \rho_A\ln\rho_B\,dx.$$
Specifically, the cross entropy between a probability density and a canonical distribution is given as
$$-\langle \ln\rho_{\mathrm{can},t}(\alpha)|\rho\rangle = \alpha\left[\langle H_t|\rho\rangle - F_t(\alpha)\right], \tag{12}$$
where we used the property of a normalized distribution, $\langle 1|\rho\rangle = 1$. Of chief importance for the generalized second law will be the KL divergence between two states, ρ_A and ρ_B, given by
$$D[\rho_A\|\rho_B] = \int_\Gamma \rho_A\ln\frac{\rho_A}{\rho_B}\,dx = -S[\rho_A] - \langle \ln\rho_B|\rho_A\rangle. \tag{14}$$
The KL divergence measures the 'distance' between the two states. Its most important property in the derivation of the generalized second law will be its non-negativity:
$$D[\rho_A\|\rho_B] \ge 0.$$
The KL divergence is zero only when ρ_A=ρ_B. See [20,21,25] for proofs of these facts. The basic quantitative relationship we need to obtain the generalized second law emerges when we take the KL divergence of an arbitrary state, ρ, with respect to a canonical state, ρ_can(α). This gives
$$D[\rho\|\rho_{\mathrm{can},t}(\alpha)] = -S[\rho] + \alpha\left[\langle H_t|\rho\rangle - F_t(\alpha)\right], \tag{16}$$
where we substituted the right-hand-side in equation (12) into the cross entropy in equation (14). The KL divergence of an arbitrary state with a canonical state gives us a sum of terms involving the entropy of the arbitrary state, the free energy of the canonical state and the internal energy of the arbitrary state.
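The relation between the KL divergence, the entropy, the internal energy and the free energy can be checked numerically on a small discrete system. The following is a minimal sketch, assuming a hypothetical three-level system in which sums replace the phase-space integrals; the level energies, the state `rho` and the value of `alpha` are illustrative choices, not values from the paper.

```python
import math

def canonical(energies, alpha):
    """Return the canonical distribution exp(-alpha*E)/Z and Z itself."""
    weights = [math.exp(-alpha * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z

def gs_entropy(p):
    """Gibbs-Shannon entropy with k = 1."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def kl_divergence(p, q):
    """KL divergence D[p || q] for discrete distributions."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0.0)

# Hypothetical level energies and nonequilibrium state (assumptions).
energies = [0.0, 1.0, 2.5]
rho = [0.2, 0.5, 0.3]
alpha = 0.7

rho_can, z = canonical(energies, alpha)
free_energy = -math.log(z) / alpha                    # F = -ln(Z)/alpha
internal_energy = sum(p * e for p, e in zip(rho, energies))

lhs = kl_divergence(rho, rho_can)
rhs = -gs_entropy(rho) + alpha * (internal_energy - free_energy)
print(lhs, rhs)
```

Both printed numbers should agree up to rounding, and both are non-negative, as required of a divergence.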
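Using the above relation, the internal energy of the state at time t is written as
$$E_t = \alpha^{-1}D[\rho_t\|\rho_{\mathrm{can},t}(\alpha)] + \alpha^{-1}S[\rho_t] + F_t(\alpha).$$
Since the work is given by the change in the internal energy as in equation (4), we obtain
$$W = \alpha^{-1}D[\rho_T\|\rho_{\mathrm{can},T}(\alpha)] - \alpha^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\alpha)] + \Delta F(\alpha), \tag{18}$$
where we used the conservation of the GS entropy in a thermally isolated Hamiltonian system, i.e., S[ρ_T]=S[ρ_0]; and ΔF(α) ≡ F_T(α)−F_0(α) is the change in the Helmholtz free energy of the system. Since there is no heat bath and we do not have an a priori temperature, we denote the parameter corresponding to an inverse temperature by α.
Footnote 5: We employ the bracket notation for a quantum system as $\langle A|\rho\rangle = \mathrm{Tr}[A^\dagger\rho]$.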
The work equality, equation (18), is just a statement of energy accounting when we know the initial and final state of the system as well as the initial and final Hamiltonians. Let us suppose that we are just given a nonequilibrium initial state and have a Hamiltonian with external parameters under our control. Until we choose a schedule for the parameter control of the Hamiltonian the final state is not determined. Due to the non-negativity of the KL divergence term involving the unknown final state, equation (18) provides us, under the condition of a known initial state, the following work inequality:
$$W \ge \Delta F(\alpha) - \alpha^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\alpha)] \equiv \mathcal{W}_{\mathrm{LB}}(\alpha), \tag{19}$$
where we denote the lower bound for the work as $\mathcal{W}_{\mathrm{LB}}(\alpha)$. First, we note that the inequality (19) is true for any value of α.
Second, $\mathcal{W}_{\mathrm{LB}}(\alpha)$ is a concave function of the temperature α^{-1}, which means that it has a global maximum as a function of α as discussed in appendix A. Thus, there exists a best value for α; namely, the value for which $\mathcal{W}_{\mathrm{LB}}(\alpha)$ is a maximum. This is so because the maximum of $\mathcal{W}_{\mathrm{LB}}(\alpha)$ is a value for the work that may be achieved. A lesser value for $\mathcal{W}_{\mathrm{LB}}(\alpha)$ (i.e., a more negative value when considering work extraction), while giving a true inequality, would not be an achievable value for the work since this value would not satisfy the inequality $W \ge \max_\alpha[\mathcal{W}_{\mathrm{LB}}(\alpha)]$, which must be true. To find the maximum of $\mathcal{W}_{\mathrm{LB}}(\alpha)$ we make its dependence on α explicit by writing the KL divergence, as in equation (16), in terms of the entropy, free energy and internal energy as
$$\mathcal{W}_{\mathrm{LB}}(\alpha) = F_T(\alpha) - \langle H_0|\rho_0\rangle + \alpha^{-1}S[\rho_0].$$
Now, taking the derivative with respect to α, and using the fact that the derivative of the free energy with respect to temperature is minus the entropy, gives us
$$\frac{\partial \mathcal{W}_{\mathrm{LB}}(\alpha)}{\partial\alpha} = \alpha^{-2}\left\{S[\rho_{\mathrm{can},T}(\alpha)] - S[\rho_0]\right\}.$$
The value $\alpha = \tilde\beta$ where this equals zero we call the effective (inverse) temperature. This gives us the isentropic condition of
$$S[\rho_{\mathrm{can},T}(\tilde\beta)] = S[\rho_0]. \tag{22}$$
So, the effective temperature is determined as the temperature of the canonical distribution with respect to the final Hamiltonian that has the same entropy as the initial nonequilibrium state. This condition is entirely reasonable as the dynamics preserves the entropy and the final distribution being canonical means that the maximum energy permissible has been extracted. The isentropic condition is also naturally derived from the GPT in information geometry, which will be discussed in the next section. The generalized second law for a nonequilibrium initial state in a thermally isolated system is thus
$$W \ge \Delta F(\tilde\beta) - \tilde\beta^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\tilde\beta)],$$
where the effective temperature, $\tilde\beta^{-1}$, is determined by the isentropic condition, equation (22).
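The isentropic condition lends itself to a simple numerical procedure: the entropy of a canonical state decreases monotonically with its inverse temperature, so the effective inverse temperature can be located by bisection. The sketch below assumes a hypothetical discrete three-level system with distinct initial and final level energies; the function names (`effective_beta`, `work_lower_bound`) and all numerical values are illustrative assumptions.

```python
import math

def canonical(energies, beta):
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z

def gs_entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)

def kl_divergence(p, q):
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0.0)

def free_energy(energies, beta):
    return -math.log(canonical(energies, beta)[1]) / beta

def effective_beta(rho0, final_levels, lo=1e-6, hi=50.0, iters=200):
    """Bisection on S[rho_can,T(beta)] = S[rho0]; entropy decreases with beta."""
    target = gs_entropy(rho0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gs_entropy(canonical(final_levels, mid)[0]) > target:
            lo = mid   # canonical state still too disordered: raise beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

def work_lower_bound(rho0, initial_levels, final_levels, alpha):
    """Lower bound Delta F(alpha) - D[rho0 || rho_can,0(alpha)] / alpha."""
    delta_f = free_energy(final_levels, alpha) - free_energy(initial_levels, alpha)
    d0 = kl_divergence(rho0, canonical(initial_levels, alpha)[0])
    return delta_f - d0 / alpha

# Hypothetical data (assumptions): initial/final spectra and initial state.
initial_levels = [0.0, 1.0, 2.5]
final_levels = [0.0, 0.6, 1.5]
rho0 = [0.2, 0.5, 0.3]

beta_eff = effective_beta(rho0, final_levels)
w_star = work_lower_bound(rho0, initial_levels, final_levels, beta_eff)
print(beta_eff, w_star)
```

Evaluating `work_lower_bound` at values of `alpha` on either side of `beta_eff` gives smaller bounds, in line with the statement that the effective temperature maximizes the lower bound.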
If the Hamiltonian is changed back to its original form, as in a cyclic process, then ΔF=0 and the work on the system from the initial nonequilibrium state is just
$$W \ge -\tilde\beta^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\tilde\beta)]. \tag{24}$$
In both cases the work extracted from the system is −W.
The KL divergence in the right-hand-side of equation (24) measures the 'information distance' corresponding to the available informational resource. The effective temperature converts the available informational resource to the extractable work. Considering the Kelvin principle in the context of a thermally isolated system leads us to regard a canonical state as an equilibrium state for such a system as well (footnote 6).
Footnote 6: When the initial state is a canonical state, the effective temperature is the temperature of the canonical state from the isentropic condition. The divergence in the right-hand-side of equation (24) vanishes so that no work is extractable for any cyclic operation. This argument can be applied to an isothermal system obtained by dividing a thermally isolated total system into the system of interest and a reservoir with a canonical distribution [15]. We note that the microcanonical state is usually regarded as the equilibrium state in a thermally isolated system. As is well known, though, the expectation value of any physical observable for a canonical state can be approximated by that of the corresponding microcanonical state in a many-body thermodynamic system in the sense of the law of large numbers.

Reconsideration based on information geometry
As shown in equation (18), the extractable work for a thermally isolated system is given as the difference between two KL divergences scaled by a temperature α −1 (and the change in the Helmholtz free energy). This work equality suggests that there is an information-geometric foundation of the generalized second law. In this section we consider a cyclic operation because this case has a clearer geometric structure than the case of a non-cyclic operation, which we will consider in the next section.
Information geometry considers the geometric structure in parameter space of families of probability distributions [22,23]. A parameter (vector) θ specifies the distribution ρ(θ) and the notion of 'distance' between two distributions is provided by a divergence function $D[\rho(\theta_A)\|\rho(\theta_B)]$, which satisfies the following three conditions:
(1) Non-negativity: $D[\rho(\theta_A)\|\rho(\theta_B)] \ge 0$.
(2) Uniqueness: $D[\rho(\theta_A)\|\rho(\theta_B)] = 0$ if and only if $\theta_A = \theta_B$.
(3) Positive-definiteness of the metric tensor $g_{ij}$: $D[\rho(\theta)\|\rho(\theta+d\theta)] = \frac{1}{2}\sum_{i,j} g_{ij}(\theta)\,d\theta^i d\theta^j + O(|d\theta|^3)$.
The geometric structure is determined by the metric tensor appearing in the above local divergence. The KL divergence satisfies the above conditions and its metric tensor is the well-known Fisher information matrix,
$$g_{ij}(\theta) = \left\langle \frac{\partial\ln\rho(\theta)}{\partial\theta^i}\frac{\partial\ln\rho(\theta)}{\partial\theta^j}\Big|\rho(\theta)\right\rangle.$$
Now we define the key concept, 'thermodynamic distance.' The 'thermodynamic distance' is the KL divergence between an arbitrary probability distribution and a canonical distribution scaled by the temperature of the canonical distribution,
$$\alpha^{-1}D[\rho\|\rho_{\mathrm{can}}(\alpha)].$$
A divergence scaled by a positive parameter satisfies the above three conditions so that the scaled divergence is also a divergence. This scaled KL divergence has energy dimension and measures the extractable work in the generalized second law as a 'thermodynamic distance.' The metric tensor of the scaled KL divergence is the Fisher information matrix scaled by the temperature. The different metric tensor gives us a completely different geometric structure (footnote 7), as will be shown in this section.
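As a small illustration of the local form of the divergence, the sketch below compares the bare KL divergence between two neighbouring canonical states with the quadratic form built from the Fisher information. It assumes a hypothetical discrete three-level system parametrized by the inverse temperature alone, for which the Fisher information reduces to the energy variance (a standard result for exponential families). Only the bare metric is checked here; the level energies and step size are illustrative.

```python
import math

def canonical(energies, beta):
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def kl_divergence(p, q):
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0.0)

# Hypothetical level energies, base point and small displacement (assumptions).
energies = [0.0, 1.0, 2.5]
beta = 0.7
d_beta = 1e-3

p = canonical(energies, beta)
q = canonical(energies, beta + d_beta)

# Fisher information of the one-parameter canonical family: Var(E).
mean_e = sum(x * e for x, e in zip(p, energies))
fisher = sum(x * (e - mean_e) ** 2 for x, e in zip(p, energies))

kl_local = kl_divergence(p, q)
quadratic = 0.5 * fisher * d_beta ** 2
print(kl_local, quadratic)
```

The two numbers agree to leading order in `d_beta`; the residual difference comes from the cubic term of the expansion.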
The maximum extractable work for a cyclic process is determined by the following inequality obtained from equation (18),
$$W \ge -\alpha^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\alpha)], \tag{26}$$
where we used ΔF(α)=0 and the non-negativity of the KL divergence involving the final distribution. In order to find the lower bound of W in equation (26) (i.e., the upper bound for the extractable work), we have to find the value of α that makes the right-hand-side of equation (26) maximum. In the previous section we derived the condition of the effective temperature by differentiating the right-hand-side of equation (26). Now we use the information-geometric structure to derive the same condition. The minimum (or shortest) distance from a point to a plane is obtained by the Pythagorean theorem in elementary geometry. Similarly the minimum divergence from a probability distribution to a curved surface is obtained by the GPT in information geometry. The GPT is based on three divergences. Suppose that a point P is a probability distribution and $\mathcal{M}$ is a curved surface which does not include point P in the space of probability distributions. When there exists a point Q within $\mathcal{M}$ such that the geodesic line from P to Q is orthogonal to $\mathcal{M}$ as illustrated in figure 1, the GPT holds for any point R within $\mathcal{M}$ as
$$D[P\|R] = D[P\|Q] + D[Q\|R].$$
From the non-negativity of the divergence, we obtain
$$D[P\|R] \ge D[P\|Q],$$
which means that the divergence between P and Q is minimum [22,23].
Footnote 7: Although the divergence is not always symmetric, it is dual symmetric in information geometry [22,23] by the scale transformation. The validity conditions of the GPT are based on these dual orthogonalities as will be shown later. The dual variable −E (S) is related to the original β (β^{-1}) through the Legendre transformation based on the well-known convex function −βF (−F), by which the dual variable is given as $-E = d(-\beta F)/d\beta$ ($S = d(-F)/d\beta^{-1}$).
The GPT based on three bare KL divergences has been well studied in information geometry [22,23]. The validity condition of the GPT based on three bare KL divergences is the isoenergetic condition. 
It determines the minimum KL divergence from a probability distribution to a set of canonical distributions (one-parameter geodesic line of canonical distribution). The principle of maximum entropy is derived from the GPT based on three bare KL divergences. We show the details in appendix B.
Since the (information) entropy is dimensionless and the principle of maximum entropy is based on dimensionless KL divergences, we expect that the maximum work formulation is based on energy-dimensional divergences. Our energy-dimensional divergence is equal to (the minus of) the right-hand-side of equation (26) for the pair of the initial probability distribution and the initial canonical distribution. The change of scaled KL divergences is the (dissipative) work in equation (18) for a cyclic process. The minimization of this scaled KL divergence means the maximization of the work in equation (26).
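The GPT based on scaled KL divergences holds as the following theorem: when the inverse temperature $\tilde\beta$ satisfies the isentropic condition
$$S[\rho_{\mathrm{can},0}(\tilde\beta)] = S[\rho_0], \tag{29}$$
the scaled KL divergences satisfy, for any α,
$$\alpha^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\alpha)] = \tilde\beta^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\tilde\beta)] + \alpha^{-1}D[\rho_{\mathrm{can},0}(\tilde\beta)\|\rho_{\mathrm{can},0}(\alpha)].$$
The theorem follows by expanding each divergence with equation (16) and using the relation $F_0(\tilde\beta) = \langle H_0|\rho_{\mathrm{can},0}(\tilde\beta)\rangle - \tilde\beta^{-1}S[\rho_{\mathrm{can},0}(\tilde\beta)]$ together with the isentropic condition (29). Since the last term of the GPT is non-negative, the scaled divergence from ρ_0 is smallest at $\alpha=\tilde\beta$, and equation (26) gives
$$W \ge -\tilde\beta^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\tilde\beta)]. \tag{34}$$
Since equation (26) is valid for any α, the right-hand-side of equation (34) provides the greatest lower bound of W consistent with the isentropic condition (29); hence, its negative gives the maximum extractable work from the system. The geometric image of the GPT based on scaled KL divergences is illustrated in figure 2. The set of canonical distributions is drawn as a line parametrized by the temperature. The (geodesic) line connecting ρ_0 and $\rho_{\mathrm{can},0}(\tilde\beta)$ on the isentropic surface is orthogonal to the parametric line of canonical distributions in terms of the scaled KL divergence.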
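The Pythagorean relation for the scaled divergences in the cyclic case can be verified numerically. The sketch below assumes a hypothetical discrete three-level system: it determines the effective inverse temperature from the isentropic condition by bisection and then compares the two sides of the relation for several values of α. All names and numbers are illustrative assumptions.

```python
import math

def canonical(energies, beta):
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def gs_entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)

def kl_divergence(p, q):
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0.0)

def effective_beta(rho0, levels, lo=1e-6, hi=50.0, iters=200):
    """Solve S[rho_can(beta)] = S[rho0] by bisection."""
    target = gs_entropy(rho0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gs_entropy(canonical(levels, mid)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def scaled_kl(p, levels, alpha):
    """'Thermodynamic distance' from p to the canonical state at alpha."""
    return kl_divergence(p, canonical(levels, alpha)) / alpha

# Hypothetical spectrum and nonequilibrium initial state (assumptions).
levels = [0.0, 1.0, 2.5]
rho0 = [0.2, 0.5, 0.3]

beta_eff = effective_beta(rho0, levels)
rho_eff = canonical(levels, beta_eff)

residuals = []
for alpha in (0.2, 0.5, 1.0, 3.0):
    lhs = scaled_kl(rho0, levels, alpha)
    rhs = scaled_kl(rho0, levels, beta_eff) + scaled_kl(rho_eff, levels, alpha)
    residuals.append(abs(lhs - rhs))
print(beta_eff, residuals)
```

The residuals vanish up to rounding only when `beta_eff` satisfies the isentropic condition; replacing it by another inverse temperature breaks the equality.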
Comparing figures 2 and B1 in appendix B, we can see that the bare KL divergences and the energy-scaled KL divergences give us different geometrical structures. The validity condition of the GPT based on the bare KL divergence is an isoenergetic condition as illustrated in figure B1 in appendix B. The Riemannian metric tensor corresponding to the bare KL divergence is the Fisher information matrix. The orthogonality is based on the dual relation between the inverse temperature and the internal energy in the Fisher information matrix. On the other hand, the validity condition of the GPT based on the energy-scaled KL divergence is an isentropic condition as illustrated in figure 2. The Riemannian metric tensor corresponding to the energy-scaled KL divergence is an energy-scaled Fisher information matrix. The orthogonality is based on the dual relation between the temperature and the GS entropy in the energy-scaled Fisher information matrix. This important difference has not been recognized, since many researchers mainly studied isothermal processes in which the temperature is just a constant. In our approach the temperature is a variable parameter.

Reconsideration for a non-cyclic process
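The maximum work formulation of the generalized second law for a non-cyclic operation (ΔF(α) ≠ 0) may also be obtained by the GPT based on the scaled KL divergences. The work inequality is written as equation (19),
$$W \ge \Delta F(\alpha) - \alpha^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\alpha)].$$
When $\tilde\beta$ is the inverse effective temperature determined by the isentropic condition (22), i.e.,
$$S[\rho_{\mathrm{can},T}(\tilde\beta)] = S[\rho_0],$$
the following inequality holds for any α,
$$\Delta F(\tilde\beta) - \tilde\beta^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\tilde\beta)] \ge \Delta F(\alpha) - \alpha^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\alpha)], \tag{35}$$
which means that the inverse effective temperature maximizes the right-hand-side of equation (19).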
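We can prove equation (35) as follows. First, we rewrite the difference between the two sides of equation (35) in terms of scaled KL divergences as
$$\left\{\Delta F(\tilde\beta) - \tilde\beta^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\tilde\beta)]\right\} - \left\{\Delta F(\alpha) - \alpha^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\alpha)]\right\} = \alpha^{-1}D[\rho_{\mathrm{can},T}(\tilde\beta)\|\rho_{\mathrm{can},T}(\alpha)],$$
where we used the relation
$$F_T(\tilde\beta) = \langle H_T|\rho_{\mathrm{can},T}(\tilde\beta)\rangle - \tilde\beta^{-1}S[\rho_{\mathrm{can},T}(\tilde\beta)]$$
and the isentropic condition, equation (22). Since the right-hand side is non-negative, equation (35) follows. The same relation can be read as the GPT for the scaled KL divergences in the non-cyclic case,
$$\alpha^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\alpha)] - \Delta F(\alpha) = \tilde\beta^{-1}D[\rho_0\|\rho_{\mathrm{can},0}(\tilde\beta)] - \Delta F(\tilde\beta) + \alpha^{-1}D[\rho_{\mathrm{can},T}(\tilde\beta)\|\rho_{\mathrm{can},T}(\alpha)], \tag{37}$$
and the GPT (37) can be rearranged to yield the right-hand side of the relation above.

A simple example that illustrates the GPT is a thermally isolated ideal gas which is confined to a container with a movable wall. Suppose that initially the volume of gas is V_i and the temperature is $\beta_i^{-1}$. The wall is then adiabatically moved, expanding the volume to V_f as work is extracted from the gas. This is a non-cyclic process. The GPT for scaled KL divergences in this case is given by equation (37). We can describe the initial state and the final canonical state using a Maxwellian velocity distribution ρ_Max as ρ_0=ρ_Max(β_i, V_i) and $\rho_{\mathrm{can},T}(\tilde\beta)=\rho_{\mathrm{Max}}(\tilde\beta, V_f)$, with
$$\rho_{\mathrm{Max}}(\beta, V) = \frac{1}{Z(\beta,V)}\exp\left(-\beta\sum_{\nu=1}^{N}\frac{p_\nu^2}{2m}\right)\prod_{\nu=1}^{N}\chi_V(q_\nu), \tag{41}$$
where N is the number of particles, p_ν and m are momentum and mass of a particle, respectively, and q_ν is the particle position. The indicator function χ_V serves to indicate the configuration space dependence of the distribution and Z is the partition function,
$$Z(\beta, V) = V^N\left(\frac{2\pi m}{\beta}\right)^{3N/2}. \tag{42}$$
The GPT is then written as
$$\alpha^{-1}D[\rho_{\mathrm{Max}}(\beta_i,V_i)\|\rho_{\mathrm{Max}}(\alpha,V_i)] - \Delta F(\alpha) = \tilde\beta^{-1}D[\rho_{\mathrm{Max}}(\beta_i,V_i)\|\rho_{\mathrm{Max}}(\tilde\beta,V_i)] - \Delta F(\tilde\beta) + \alpha^{-1}D[\rho_{\mathrm{Max}}(\tilde\beta,V_f)\|\rho_{\mathrm{Max}}(\alpha,V_f)],$$
which holds for the effective temperature $\tilde\beta^{-1}$ determined by the following condition,
$$\tilde\beta^{-1}V_f^{2/3} = \beta_i^{-1}V_i^{2/3}. \tag{44}$$
The proof is straightforward using equations (41) and (42). The effective temperature gives the maximum work as discussed in this section. The condition of effective temperature (44) is the well-known polytropic process equation. Of course it can also be derived from the isentropic condition and the Sackur-Tetrode equation for the entropy of an ideal gas.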
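For the ideal-gas example, the effective temperature and the resulting work can be evaluated in a few lines. The sketch below assumes a monatomic ideal gas with k=1, for which the isentropic condition reads T V^(2/3) = const and the internal energy is (3/2)NT; the function names and the numerical values are illustrative assumptions.

```python
def effective_temperature(t_initial, v_initial, v_final):
    """Final temperature from the isentropic condition T * V**(2/3) = const."""
    return t_initial * (v_initial / v_final) ** (2.0 / 3.0)

def extracted_work(n_particles, t_initial, t_final):
    """Work extracted (-W) equals the drop in internal energy (3/2) N T, with k = 1."""
    return 1.5 * n_particles * (t_initial - t_final)

# Hypothetical values (assumptions): one particle, eightfold expansion.
n_particles = 1
t_i, v_i, v_f = 2.0, 1.0, 8.0

t_f = effective_temperature(t_i, v_i, v_f)
work_out = extracted_work(n_particles, t_i, t_f)

# The Sackur-Tetrode entropy depends on V and T only through V * T**1.5,
# so this combination is conserved along the isentropic expansion.
invariant_initial = v_i * t_i ** 1.5
invariant_final = v_f * t_f ** 1.5
print(t_f, work_out, invariant_initial, invariant_final)
```

An eightfold expansion lowers the temperature by a factor of four in this sketch, and the extracted work is the corresponding loss of kinetic energy.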

Application to a two-level quantum system
We apply the GPT to an optimization problem in nonequilibrium statistical mechanics. From the generalized maximum work formulation, the maximum work is extracted by an operation under which the final state is the canonical state with the final Hamiltonian and the effective temperature. The final canonical state with the effective temperature is the ideally optimal state to extract the maximum work. The ideally optimal state may be difficult to realize experimentally. It is important to figure out what is the 'closest' (minimum divergence) state from the ideally optimal state in a set of realizable final states. The 'closest' state is the experimentally optimal state to extract work that satisfies the orthogonality condition.

A spin-1/2 particle operated by a magnetic field
We consider a thermally isolated two-level quantum system as a simple example [16]. The application of nonequilibrium statistical mechanics to simple, few-degrees-of-freedom quantum systems is a current topic of much interest. Time-dependent two-level quantum systems often appear in the context of matter-field interactions or nuclear magnetic resonance. Such systems have been used to demonstrate the validity of the fluctuation relations of modern nonequilibrium statistical mechanics [26,27]. We consider a spin-1/2 particle embedded in a magnetic field that can be controlled [28]. The Hamiltonian of the system is
$$H_t = \frac{\hbar\omega}{2}\left(\sigma_z\cos\phi_t + \sigma_x\sin\phi_t\right),$$
where σ_i (i=x, y, z) are the standard Pauli matrices and we choose the direction of the magnetic field restricted to the x−z plane with φ_t the rotation-angle of the magnetic field around the y-axis at time t. The Hamiltonian can be rewritten by using rotation operators in spin space,
$$H_t = \frac{\hbar\omega}{2}\,e^{-i\phi_t\sigma_y/2}\,\sigma_z\,e^{i\phi_t\sigma_y/2}.$$
The eigenvalues of the Hamiltonian are E_0=−ħω/2 and E_1=ħω/2, which we identify as the ground and excited state energies, respectively.
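The two forms of the spin Hamiltonian and its spectrum can be checked with plain 2×2 matrices. The sketch below assumes the Hamiltonian (ħω/2)(σ_z cos φ + σ_x sin φ), which is a reading consistent with the description in the text (field in the x−z plane, proportional to σ_z at φ=0 and to −σ_z at φ=π), and sets ħω=1; the helper names are illustrative.

```python
import math

HBAR_OMEGA = 1.0  # assumption: energies measured in units of hbar*omega

def hamiltonian(phi):
    """(hbar*omega/2) * (sigma_z cos(phi) + sigma_x sin(phi)) as a real 2x2 matrix."""
    h = 0.5 * HBAR_OMEGA
    return [[h * math.cos(phi), h * math.sin(phi)],
            [h * math.sin(phi), -h * math.cos(phi)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotated_form(phi):
    """(hbar*omega/2) * R sigma_z R^T with R = exp(-i*phi*sigma_y/2), which is real."""
    c, s = math.cos(0.5 * phi), math.sin(0.5 * phi)
    rot = [[c, -s], [s, c]]
    rot_t = [[c, s], [-s, c]]
    sigma_z = [[1.0, 0.0], [0.0, -1.0]]
    m = matmul(matmul(rot, sigma_z), rot_t)
    return [[0.5 * HBAR_OMEGA * x for x in row] for row in m]

def eigenvalues(m):
    """Eigenvalues of a real symmetric 2x2 matrix from its trace and determinant."""
    half_trace = 0.5 * (m[0][0] + m[1][1])
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(half_trace ** 2 - det, 0.0))
    return half_trace - root, half_trace + root

angles = [0.0, 0.3, math.pi / 2, math.pi, 1.7 * math.pi]
max_diff = max(abs(hamiltonian(p)[i][j] - rotated_form(p)[i][j])
               for p in angles for i in range(2) for j in range(2))
spectra = [eigenvalues(hamiltonian(p)) for p in angles]
print(max_diff, spectra)
```

The spectrum stays at ∓ħω/2 for every angle, and at φ=π the matrix is the negative of its value at φ=0.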
The spin-1/2 particle is externally operated from t=0 to t=T. The operation is represented by the time-dependent angle in the Hamiltonian. We choose a cyclic operation, φ_0=0 at t=0 and φ_T=2π at t=T. The angle increases from 0 to 2π monotonically and φ_τ=π at t=τ. The Hamiltonian is proportional to the Pauli matrix σ_z at t=0. It is proportional to −σ_z at t=τ. When t=T the Hamiltonian returns to its original form at t=0.
We briefly discuss how to extract the maximum work from a simple nonequilibrium initial state such as a pure excited state [14][15][16]. The maximum work can be obtained by the cyclic operation including two processes: (1) The short stabilization process for t∈(0, τ) in which the pure excited state becomes the ground state by the change of the Hamiltonian $H_0 \to H_\tau = -H_0$. (2) The restoration process for t∈(τ, T) to the original Hamiltonian, H_T=H_0, without any transition to the excited state. These processes are analogous to those that have been discussed on how to extract the maximum work from a nonequilibrium initial state in a thermodynamic system [14][15][16]. The first corresponds to the stabilization process to prevent spontaneous relaxation in a thermodynamic system. For simplicity we consider here the sudden process, τ→0. The second is generally a quasi-static (isentropic) process without dissipation. After the sudden process, the initial state does not change and the Hamiltonian changes as $H_0 \to H_\tau = -H_0$. The control parameter of our operation is the rate at which the field is rotated. We will find the optimal angular frequency to extract work. The solutions for finite frequencies correspond to dissipationless non-quasi-static processes.

A vector representation of a quantum state
In this subsection we introduce a vector representation of a quantum state. This subsection prepares for the optimization of the rotation-angle frequency discussed in the next subsection. A quantum state is represented as a point vector in parameter space in information geometry. The orthogonality condition from the GPT is written in terms of these vectors [22,23].
A quantum state is written as a 2×2 density matrix that can be expanded as a linear combination of four Hermitian matrices: the unit matrix I and the three Pauli matrices σ_i (i=x, y, z). Any state except pure states is written as a positive definite density matrix,
$$\rho = \frac{1}{2}\left(I + \tanh|\eta|\;\hat\eta\cdot\sigma\right),$$
where η is a three dimensional real vector, $\hat\eta = \eta/|\eta|$ is its unit vector, and σ = (σ_x, σ_y, σ_z). Hereafter, we call the above representation based on a linear combination of four Hermitian matrices the M-representation.
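The parameter vector η naturally appears in an exponential-type representation (E-representation),
$$\rho = \frac{\exp(\eta\cdot\sigma)}{\mathrm{Tr}[\exp(\eta\cdot\sigma)]}.$$
The equivalence between the two representations is confirmed by the Euler-like formula,
$$\exp(\eta\cdot\sigma) = I\cosh|\eta| + \hat\eta\cdot\sigma\,\sinh|\eta|.$$
We write the nonequilibrium initial state at time t=0 as
$$\rho_0 = \frac{1}{2}\left(I + \tanh|\eta_0|\;\hat\eta_0\cdot\sigma\right).$$
We choose $\hat\eta_0 = (0, \sin\varphi, \cos\varphi)$ for the nonequilibrium initial state. From the condition $|\varphi| < \pi/2$, the initial energy E_0=Tr[ρ_0 H_0] is positive so that work is extractable. We chose the x component to be 0 for simplicity. We may be able to adjust $\hat\eta_{0,x}$ by an additional operation around the y-axis.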
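The equivalence of the two representations can be confirmed numerically by exponentiating η·σ with a truncated power series and comparing the normalized result with the linear combination of I and the Pauli matrices. The sketch below assumes the parametrization ρ = exp(η·σ)/Tr[exp(η·σ)] = (1/2)(I + tanh|η| η̂·σ), with an illustrative vector η of the form |η|(0, sin φ, cos φ); the helper names and the number of series terms are assumptions.

```python
import math

def eta_dot_sigma(eta):
    """eta . sigma as a complex 2x2 matrix built from the Pauli matrices."""
    ex, ey, ez = eta
    return [[complex(ez, 0.0), complex(ex, -ey)],
            [complex(ex, ey), complex(-ez, 0.0)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matrix_exp(m, terms=40):
    """Truncated power series for exp(m); adequate for small matrix norms."""
    result = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    term = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def e_representation(eta):
    e = matrix_exp(eta_dot_sigma(eta))
    trace = e[0][0] + e[1][1]
    return [[x / trace for x in row] for row in e]

def m_representation(eta):
    norm = math.sqrt(sum(x * x for x in eta))
    unit = [x / norm for x in eta]
    s = eta_dot_sigma(unit)
    t = math.tanh(norm)
    return [[0.5 * ((1.0 if i == j else 0.0) + t * s[i][j]) for j in range(2)]
            for i in range(2)]

# Hypothetical initial state (assumption): |eta| = 0.8, angle 0.4 rad < pi/2.
norm0, angle = 0.8, 0.4
eta0 = (0.0, norm0 * math.sin(angle), norm0 * math.cos(angle))

rho_e = e_representation(eta0)
rho_m = m_representation(eta0)
max_diff = max(abs(rho_e[i][j] - rho_m[i][j]) for i in range(2) for j in range(2))

# Initial energy Tr[rho_0 H_0] with H_0 = sigma_z / 2 (hbar*omega = 1).
energy0 = 0.5 * (rho_m[0][0] - rho_m[1][1]).real
print(max_diff, energy0)
```

For an angle below π/2 the computed initial energy is positive, matching the condition under which work is extractable.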
From the solution of the Schrödinger equation, the time evolution of the density matrix is given by equation (58), where we used the orthogonality of Pauli matrices, Tr[σ_i σ_j]=2δ_{i,j}. The norm |η_t| is constant in time under the unitary time evolution, as expressed by equation (59). The final state at time t=T is written in the same form. We note that the time period T may also be considered as the control parameter in our operation, since the angular frequency Ω=π/T. We will discuss the optimal final state to extract work in the next subsection.
The canonical state at time t (t∈(0, T]) with the inverse of the effective temperature $\tilde\beta$ is written in E-representation as
$$\rho_{\mathrm{can},t}(\tilde\beta) = \frac{\exp(-\tilde\beta H_t)}{\mathrm{Tr}[\exp(-\tilde\beta H_t)]}.$$
The KL divergence may be divided into the sum of the negative entropy and the cross entropy,
$$D[\rho_A\|\rho_B] = -S[\rho_A] - \mathrm{Tr}[\rho_A\ln\rho_B],$$
where the cross entropy is $-\mathrm{Tr}[\rho_A\ln\rho_B]$. After substituting the above sum into Δ, the entropy terms combine through an equality between two entropies. Similar to the entropy, the cross entropy between state A and state B is calculated by using both E- and M-representations.

From the nonequilibrium initial state, the canonical state with the effective temperature is the closest 'point' in 'thermodynamic distance.' Since our new concept of 'thermodynamic distance' has an important role in nonequilibrium statistical mechanics, it can also be called the thermodynamic divergence. One important recognition in this paper is that the geometric structure based on the KL divergence scaled by a variable temperature is different from that based on the bare KL divergence. This geometric difference does not play any role for a constant temperature in an isothermal process. For a variable temperature, the validity condition of the GPT based on the energy-scaled KL divergences is an isentropic condition. On the other hand, the validity condition of the GPT based on the bare KL divergences is an isoenergetic condition.
The geometrical interpretation of the generalized maximum work formulation gives us a systematic method to figure out a protocol to realize the optimal operation. In this paper we applied the geometrical interpretation of the generalized maximum work formulation to a simple two-level quantum system. The optimal cyclic operation to extract work from a nonequilibrium state was determined by minimizing the scaled KL divergence between the final state and the final canonical state.
In this paper we discussed only adiabatic processes in thermally isolated Hamiltonian systems in order to highlight the information-geometrical foundation of the thermodynamics. Although we excluded any phenomenological assumptions such as the existence of a heat reservoir that would produce a canonical ensemble, we would need high and low temperature heat reservoirs to consider a heat engine [15]. Since information geometry is flexibly applicable to any parametrized probability distributions, the scaled KL divergence would play important roles in the heat engine. One may consider scaling of the KL divergence by a quantity with power dimension, such as temperature divided by time. Then a GPT based on this new scaled KL divergence would give us important knowledge for optimizing the power. We will discuss this problem in the near future.

Acknowledgments
TN deeply appreciates A and R Nakamura for their great help. HHH would like to thank N Hasegawa, L Reichl and T Washio for their support. TN and HHH are grateful to S Amari for his suggestions and encouragement. DJD appreciates the suggestions of E Karpov, V Basios and J Lutsko. This work was performed under the Cooperative Research Program of 'Network Joint Research Center for Materials and Devices.' This work was also supported by Hitachi Automotive Systems, Ltd.
Appendix A. Concavity of $\mathcal{W}_{\mathrm{LB}}$ with respect to the temperature
To determine the effective inverse temperature we use the fact that $\mathcal{W}_{\mathrm{LB}}(\alpha)$ has one global maximum as a function of α. This is most easily seen from the concavity of $\mathcal{W}_{\mathrm{LB}}$ as a function of the temperature, α^{-1}. To show this we use the facts (known from thermodynamics and also holding for a canonical distribution) that the derivative of the Helmholtz free energy with respect to temperature is minus the entropy and the derivative of the entropy with respect to temperature is positive.
We have,