Improvement of Error Correction in Nonequilibrium Information Dynamics

Errors are inevitable in information processing and transfer. While error correction is widely studied in engineering, the underlying physics is not fully understood. Due to the complexity and energy exchange involved, information transmission should be considered as a nonequilibrium process. In this study, we investigate the effects of nonequilibrium dynamics on error correction using a memoryless channel model. Our findings suggest that error correction improves as nonequilibrium increases, and the thermodynamic cost can be utilized to improve the correction quality. Our results inspire new approaches to error correction that incorporate nonequilibrium dynamics and thermodynamics, and highlight the importance of the nonequilibrium effects in error correction design, particularly in biological systems.


Introduction
In information processing and transfer, the transmission of messages through communication channels is often impaired by unwanted noise. As a result, it is inevitable that the useful information carried by these messages will experience some loss during transmission. The channel coding principle, developed by R. W. Hamming [1], C. E. Shannon [2], R. G. Gallager [3], and A. J. Viterbi [4] states that no matter how we encode the messages, it is impossible to transmit them at a rate above the information capacity with zero error probability through noisy channels. Therefore, to reduce errors, we must either decrease the transmission rate or improve the quality of the channels. Error correction coding methods have been developed based on these theories to minimize the loss of useful information at the information receiver end.
In biological systems, error correction is a crucial process that plays an essential role in maintaining the fidelity and accuracy of biological information transfer [5][6][7]. For example, in DNA replication, error correction mechanisms help to identify and correct errors that occur during the copying of DNA molecules, while in protein synthesis, error correction mechanisms ensure that the correct amino acids are incorporated into the growing polypeptide chain, minimizing the occurrence of misfolded or non-functional proteins. Therefore, investigating error correction from the perspective of statistical physics and biophysics is important in understanding the underlying principles of these processes.
In information theory, error correction methodologies are widely used to ensure the quality of information transmission in noisy communication channels [8][9][10][11][12][13][14][15]. While these techniques are often studied in an engineering context [16][17][18][19][20][21], the underlying physics is not always clear. Information processing and communication are not isolated events but should be considered as open systems that exchange energy and information with their environments, which can have a significant impact. The complexity and stochasticity of these processes, along with environmental influences, make information transmission a nonequilibrium process. In this context, different symbols in messages can be distorted by channels with varying error probabilities during transmission, resulting in nonequilibrium behavior of the received messages in channel models. Understanding the relationship between the error correction in decoding and the nonequilibrium in transmission is crucial, but there is still a lack of studies exploring this connection. The memoryless channel model is a critical component in the fields of information and communication theories [22,23]. This study focuses on investigating the impact of nonequilibrium dynamics and thermodynamics on error corrections in the memoryless channel model. We establish the Markovian information dynamics [24][25][26] of this channel with block inputs and outputs and quantify the nonequilibrium of the memoryless channel models by calculating the difference between their transmission probabilities. This nonequilibrium measures the strength of the information flux or nonequilibrium information driving force in information transmission. We prove that the performance of error correction at the decoder can be analytically expressed as a function of the nonequilibrium of these models. Our analysis reveals that error correction performance, characterized by the upper bound of the error probability, is a convex function of the nonequilibrium in information transmission, with the unique maximum error probability being 1 at the equilibrium point of the information dynamics. Additionally, the upper bound monotonically decreases as the nonequilibrium increases. Moreover, we found that the dissipation cost in information transmission, characterized by the entropy production rate (EPR) [27][28][29][30][31][32][33][34][35][36][37][38][39][40], increases as the nonequilibrium increases. We discovered that the EPR can be used to measure the reliability of the information transmission, indicating that the thermodynamic cost can be utilized to enhance the quality of information transmission and reduce the decoding error probability. Our findings inspire us to develop novel approaches for error correction based on nonequilibrium dynamics and thermodynamics. We tested our conclusions using a simple binary memoryless channel model, and the numerical results supported our findings. Finally, we discuss the potential applications of our conclusions in biological systems.

Memoryless Channel Model and Block Encoding
We start with the memoryless channel model to explore error correction. The basic idea behind any error correction strategy is to encode information redundantly within a larger system in order to mitigate the impact of errors and prevent information loss.
An information sender transmits random symbols s which form a set S. To reduce the error during the transmission in the noisy channel, each symbol s can be encoded by an encoder at the sender end into a block code of length N, X s = {x 1 , x 2 , . . . , x N } consisting of the letters x = 0, 1, .., n − 1 (n is the total number of x). This is known as the fixed block encoding strategy [2]. The set of all the input block codes X s is denoted by X . However, due to the noisy channels, the receiver may receive a noisy output block Y = {y 1 , y 2 , . . . , y N } from the channel with output letters y = 0, 1, .., n − 1. The set of all the output block codes Y is denoted by Y. In this process, the rate of transmission is then determined by the length of each block code, i.e., R = 1/N.
For general memoryless channels, we assume that the channel corrupts the symbols s independently according to different noise distributions, respectively. Determined by the noises, the random mapping between the input letters x in the input block and the output letters y in the output block can be described by the transmission probabilities, q(y|x), which quantify the conditional probabilities that one receives the letter y when the letter x is sent. Since the channel is memoryless, the total transmission probability of the output block Y when the input block X s (s ∈ S) with length of N is sent can be written as In fact, the input and output letters in most general channels are continuously valued. However, the essence of the information transfer problem is that it is essentially discrete. Therefore, we will concentrate on discrete memoryless channels in this work, without the loss of generality.

Error Correction Decoding
In general, if we let the length of the block codes X s be some number N, then there exist n N possible block codes (n is the total number of the letters x), which can be assigned to the symbols s at the encoder. To decode the received noisy blocks correctly, the assignment of X s to each s should be consistent with the a priori knowledge of the noise or transmission probability distributions q(y|x). Then, we have to set up a strategy, the so-called "error correction", to map the noisy output blocks Y back into the space of symbols of the information source S.
A simple decoding rule is to choose the decoding result s for which This means to choose a symbol s from the set S that maximizes the transmission probability or likelihood q N (Y|X s ) (given in Equation (1)) of the received block corresponding to the original sent block rather than other ones s = s. According to the decoding rule given in Equation (2), the block codes X s should be carefully chosen such that the space of the output blocks Y can be divided into M (M is the total number of the symbols) mutually disjointing decoding subsets Y s corresponding to each transmitted symbol s, i.e., Due to the decoding rule in Equation (2), the transmission probability of each Y ∈ Y s corresponding to X s , given by Equation (1), should be greater than that of Y conditioning on another input block X s for s = s. The reason behind this approach is straightforward: by selecting each input block code X s in a manner that is consistent with the a priori knowledge of the transmission probabilities, we can decode any output block Y easily. Specifically, decoding Y as belonging to Y s yields symbol s, while decoding Y as belonging to Y s yields symbol s . Any other decoding scheme, such as decoding Y ∈ Y s as s, is more prone to errors.
As an example, we consider the binary memoryless channel. The transmitted symbols are given as s = a, b and the encoding letters given as x = 0, 1. Assume that the transmission probabilities satisfy the following conditions By using the block codes of length 3, we can encode the symbol a into the block X a = 000 and b into the block X b = 111. We then have the set of the input block codes X = {000, 111} (N = 3) which can be sent into the channel. This encoding method guarantees that the output blocks is separated into two disjoint subsets.
Thus, according to the decoding rule of maximizing the transmission probability, we simply decode the output blocks within the decoding set Y a = {000, 001, 010, 100} into a and decode the blocks in the decoding set Y b = {011, 101, 110, 111} into b. If we make the block length N larger (the rate of transmission R = 1/N then becomes smaller), then the law of large number works and we can make the transmission more accurately with less error probability [22].

Random Encoding and Decoding Strategy
Although the fixed block encoding strategies can be effective in numerous scenarios, it is important to acknowledge that designing the fixed block codes can be challenging. Alternatively, the random encoding strategies may be more convenient in many cases, and they can achieve similar performance compared to the fixed block encoding [1].
For the random encoding strategies, we can encode the original symbols s with the random input blocks X s according to an input distribution Q N (X), and still decode the output block Y according to Equations (2) and (3). Here, the random choice of X s can be implemented by randomly choosing the letters x in a block independently according to an identical input distribution Q(x) such that the probability of a block X s = {x 1 , x 2 , . . . , x N } can be given by We now explain how to encode and decode the information by using the random encoding strategy for the binary memoryless channel in the last subsection. If we use the random encoding method instead in this example with the same settings in the above, and if the input letters x = 0, 1 are chosen randomly with equal probability, i.e., Q(x = 0) = Q(x = 1) = 1/2, then all the possible input blocks can be chosen from the input set X = {000, 001, 010, 011, 100, 101, 110, 111} with the same probability (1/2) 3 = 1/8. A possible assignment of the blocks for each symbols can be X a = 000 and X b = 111, with the same output decoding sets shown in the above. Alternatively, another possible assignment can be X a = 011 and X b = 001. Consequentially, the output decoding set for the symbol a turns out to be Y a = {010, 011, 110, 111}; and the decoding set for the symbol b becomes Y b = {000, 001, 100, 101}.

Performance of Error Correction
Although we can reduce data transmission errors by elaborating the encoder or increasing the block length N, there is still a risk of decoding errors, if the input block X s is corrupted and transformed into an output block Y that does not belong to the intended subset Y s , but instead belongs to another subset Y s (s = s). In this case, the block Y, which was originally associated with symbol s, may be mistakenly decoded into s . Therefore, it is necessary to estimate the error probabilities of decoding in order to quantify the performance of error correction.
In general, the error probability is a non-trivial function that requires sophisticated methods to calculate accurately. However, it may be more practical to estimate a simple upper bound of the error probability, which is often sufficiently close to the true value. Reducing this upper bound on the error probability is likely to improve error correction performance.
By randomly selecting code blocks X s of length N in accordance with the input distribution outlined in Equation (4) and applying the decoding rules specified in Equations (2) and (3), we can determine the upper bound of the average error probability associated with decoding any transmitted symbol s as a symbol s that differs from s. An upper bound is given by [19] with , for memoryless channels (6) where M ≥ 2 is the total number of the possible transmitted symbols s; Q N (X) = ∏ i Q(x i ) is the arbitrary input distribution of all the possible input block codes X as described in Equation (4); q N (y|x) is the transmission probabilities of the channel with respect to the block codes. If the channel is memoryless, then q N (y|x) can be reduced into the products of the transmission probabilities with respect to the letters in a block as shown in Equation (1). This gives rise to the second equality in Equation (6) for the memoryless channels on average. Here, "average" means to estimate the error probability over the ensemble of the block codes X independently of any concrete assignment of the blocks for the symbols.
When the parameter ρ is chosen carefully, the bound in Equation (5) can be surprisingly close to the true value. The importance of this inequality is that it can be used as an estimate of the highest error probability or the worst performance of error correction under a given condition. A smaller upper bound always means better performance.
It can be seen that the upper bound in Equation (5) depends on both the input distribution Q(x) and the transmission probabilities q N (Y|X s ) or q(y|x) of the channel. In practice, this observation provides two alternative methods to reduce the upper bound of error probability and to improve the performance of error correction. Note that q(y|x) is the characterization of the memoryless channel and it mainly depends on the environments of the channel. For the information channels which are too large to control, such as the classical communication systems, the related studies mainly focus on the optimization of the input distribution Q(x) to reduce the upper bound of the averaged error probabilities for a better performance of the error correction. However, for the small systems where the channels are small enough to control, the input distributions may almost totally depend on the systems themselves, such as the DNA in a cell and the neural network in the brain. It is then possible to optimize the transmission probabilities q(y|x) through the control on the channel via either genetic and epigenetic modulations for better information transmission, or, say, for improving the reliability of the error correction in the systems.

Concavity of Upper Bound of Error Correction
For the purpose of the transmission probability or channel control, we will prove in the following that the worst performance of the error correction characterized by the upper bound in Equation (5) is a concave function of the transmission probability distribution q N (Y|X), given a fixed block length N, input distribution Q N (X), and parameter ρ. This finding underscores the worst-case performance of error correction, highlighting the need to optimize the transmission probabilities to enhance error correction.
We first show that all the possible transmission probability distributions q N form a convex set Q N = {q N : q N (Y|X) ≥ 0, ∑ Y q N (Y|X) = 1}. Here, the constraints on Q N are consistent with the nonnegativity and normalization of the conditional probability distribution. To see the convexity of Q N , we choose arbitrarily two transmission probability N ∈ Q N . Consider the convex combination at each output Y in the following form One can easily check that both the constraints, nonnegativity and normalization, are satis- N is also a transmission probability distribution, which is contained in Q N . Then, the convexity of Q N can be verified directly by using the definition of the convex set.
Then, we introduce the function which can be recognized as the 1 ρ norm of the function q at each Y (the norm sums up the X in q , and means 'error'). One can show that the function U in the upper bound in Equation (5) can be rewritten as It should be noted that, with fixed N, Q N (X), and ρ, the Minkowski inequality of the norms for ρ ∈ [1, 2] guarantees the concavity of the function U on the convex set Q N . By using the definition of the concave functions [41], one can show where the Minkowski inequality of the norms for ρ ∈ [1, 2] is used in the second line.
Since the prefactor (M − 1) ρ−1 in Equation (5) is a positive constant (if ρ is fixed), then the upper bound of the error probability is always a concave function of the transmission probability distribution q N . Note that in the present work we use the conventional rules that the concave function is the concave down function with a negative second derivative and the convex function is the concave up function with a positive second derivative. Furthermore, the upper bound achieves the unique maximum (M − 1) ρ−1 when q N (Y|X s ) = q N (Y|X s ) for all s = s. Consequentially, we have that P N (Y) = q N (Y|X) for all s. This result has an intuitive explanation: the output Y is totally independent of the input X s and hence is independent of the transmitted symbol s, so that arbitrary s cannot be distinguished from another s at the decoder. This leads to the worst performance of the error correction where the error occurs almost with probability 1 no matter which input distribution Q(x) is chosen.
For the memoryless channels, the upper bound of the error probability can be given by the second equality in Equation (6), 1 ρ ρ can be regarded as the function U for the blocks of length N = 1, and q(y|x) is the transmission probability of the channel for the letters in the blocks. Thus, u is concave on the set of q(y|x), denoted by Q, and Q is a convex set, according to the proof in Equation (7). Then, the worst performance of a memoryless channel can happen with error probability 1 when q(y|x) = q(y|x ) for all x = x.
In summary, we have demonstrated the concavity of the upper bound of error correction. Note that the environments of the channel are inherently complex. The nonequilibrium that arises from this complexity can influence both the information transmission and the error correction significantly. With the perspective of the nonequilibrium information dynamics and thermodynamics, we will investigate the relationship between the performance of the error correction and the channel-induced nonequilibrium in the following sections. Specifically, we will show that the worst performance of the error correction, where the error probability is almost 1 and occurs in the equilibrium state of the memoryless channel, where the output is independent of the input completely.

Information Dynamics of Memoryless Channel
To introduce a dynamical description to the memoryless channel model, it is reasonable to assume that the channel needs time to deal with the codes, i.e., the sender transfers the blocks X t−1 at time t − 1 and the receiver gets the outputs Y t at time t, where the processing time is assumed to be unit. On the other hand, if the random encoding method is employed, the correspondence between the block code X and the transmitted symbol s is not necessarily one to one. To simplify the discussion without losing the generality, we will not delve into a detailed discussion on assignments of the block codes for the transmitted symbols. Instead, we assume that the input blocks X are independently generated according to an identical input distribution Q N (X), where N is the length of the block codes. In addition, each letters x within an input block can be independently chosen based on an identical input distribution Q(x). The relationship between Q N (X) and Q(x) is provided in Equation (4).
In the context of information dynamics, we are interested in understanding how the input and output of the channel evolve in time together. We use the notation Z t = (X t , Y t ), which is the composition of the input block X t and output block Y t , to represent the information state of the channel model. Due to the dynamical description in the above, the input X t , generated following the distribution Q N (X t ), is independent of both X t−1 and Y t , and the output Y t merely depends on the previous input X t−1 according to the transmission probability q N (Y t |X t−1 ). In addition, both X t and Y t are independent of Y t−1 . Therefore, the correlation between the information state Z t−1 = (X t−1 , Y t−1 ) and the successive state Z t = (X t , Y t ) in time follows a Markovian connection. The transition probability from Z t−1 to Z t can be given by With this transition probability, the evolution of the distribution P(Z t ) follows the Markovian information dynamics as: It is recognized that P( as the stationary distribution of the information state, because P(Z) remains unchanged in the time evolution, i.e., P(Z ) = ∑ Z P(Z |Z)P(Z). It is noteworthy that, due to the memoryless nature of the channel, the information dynamics can be governed in terms of how the input and output letters, x t and y t , evolve in time. Both x t and y t can be regarded as the block codes of length N = 1. The transition probability in Equation (8) for the block can be reduced into the transition probability for the letter. By introducing the information state for the letter z t = (x t , y t ), the transition probability for the letter z t is given by P(z t |z t−1 ) = Q(x t )q(y t |x t−1 ). (10) Consequentially, the Markovian information dynamics for the letter can be derived from the information dynamics for the block in Equation (9): Then, P(z) = P(z t→∞ ) = Q(x)P(y) with P(y) = ∑ x Q(x)q(y|x) is the stationary distribution of the information state z.
The transition probability P(Z t |Z t−1 ) in Equation (8) is recognized as the information driving force [24][25][26] behind the information dynamics for the block because P(Z t |Z t−1 ) transforms an information state Z t−1 to another state Z t in time and determines the stationary distribution P(Z). For the same reason, the transition probability P(z t |z t−1 ) in Equation (11) can be treated as the information driving force for the letter or block length N = 1.

Characterization of Nonequilibrium
Due to the exchange of energy and information between a system and its environment, the classical information channels should be considered as a nonequilibrium system. In particular, in ref. [42], a classical measurement model was developed to explore nonequilibrium phenomena of the information dynamics. It is shown that the Markov dynamics governing sequential measurements is governed by an information driving force that can be decomposed into two components: an equilibrium component that maintains time reversibility and a nonequilibrium component that violates time reversibility. In this work, we will examine the information dynamics of noise channels using this nonequilibrium framework.
The nonequilibrium nature is reflected in the time-irreversible behavior of the information dynamics. The time irreversibility means that the probability of a time sequence of the information state Ω = {Z 1 , Z 2 , . . . , Z t } = {(X 1 , Y 1 ), (X 2 , Y 2 ), . . . , (X t , Y t )} is different from the probability of the corresponding time-reversal sequence Ω = {Z t , Z t−1 , . . . , In the Markovian information dynamics given by Equation (9), the output Y t only depends on the previous input X t−1 in the forward in time transition. On the other hand Y t is only determined by the "future" input X t+1 in the backward in time transition. Furthermore, X t together with Y t−1 and Y t+1 has no influence on Y t in both time directions, and can be neglected in both transitions. Therefore, the time-irreversibility can be characterized by the time-irreversible information flux, which is defined as the difference between the probability of the transition from X t−1 to Y t forward in time and the probability from X t+1 to Y t backward in time.
By using Equation (9), we can define the information flux as with Here, the blocks X t , Y t−1 , and Y t+1 are summed away from the probabilities because they are not important to the quantification of the time-irreversibility. We see that when the input distribution Q N (X) or Q(x) is given, the information flux J in Equation (12) merely depends on the difference between the transmission probabilities of the channel which is quantified by d in Equation (13). Then, d describes the strength of the information flux in this situation. In particular, for the letter case where block length N = 1, the information flux with the strength d can be given by the transmission probability q(y|x) according to Equations (12) and (13):

Nonequilibrium Decomposition for Transmission Probability
The information flux J can be used to characterize the time-irreversibility because it is originated from the nonequilibrium of the transition probability or information driving force P(Z t |Z t−1 ) given by Equation (8). To show this point more explicitly, we firstly decompose P(Z t |Z t−1 ) into two parts [23,28], with where d is given by Equation (13). Then, from the Markovian nature of the information dynamics, the probability of a time sequence Ω = {Z 1 , Z 2 , . . . , Z t } and that of the corresponding time-reversal sequence Ω = {Z t , Z t−1 , . . . , Z 1 } can be given by Equation (9) as follows, Now, we can discuss the relation between the time reversibility and the driving force of the information dynamics. When P(Ω) is equal to P( Ω) for arbitrary possible time sequence Z, the information dynamics is under equilibrium state so that the timereversal symmetry of each time sequence is preserved and the dynamics is time-reversible. From Equations (12) and (13), we can observe that P(Ω) = P( Ω) for arbitrary possible time sequence Z is satisfied if and only if all the possible P d ( ) holds for arbitrary Y t , X t−1 and X t+1 . In this situation, P m (Z i−1 , Z i , Z i+1 ) does not depend on any input block X. One can denote P m (Z i−1 , Z i , Z i+1 ) = P N (Y i ) and P(Ω) = P( Ω) = ∏ t i=1 Q N (X i )P N (Y i ). Therefore, P m preserves the time-reversal symmetry and the "equilibrium" of the information dynamics. Then, P m is recognized as the equilibrium driving force in the information dynamics.
If P d (Z i−1 , Z i , Z i+1 ) = 0, the information dynamics is time-irreversible and the channel is under nonequilibrium state. This is to say that the driving force P d breaks the time-reversal symmetry of the dynamics, and drives the dynamics away from the equilibrium state. Then, P d is recognized as the nonequilibrium driving force of the information dynamics.
Combining Equation (12) with Equation (16), the information flux can be rewritten as This equation shows the relation between the information flux J and the nonequilibrium driving force P d . Therefore, the information flux can also be used to characterize the time-irreversibility. On the other hand, the information flux J can be treated as the nonequilibrium driving force since it shares the same factor d with the nonequilibrium driving force P d as shown in Equation (12). For this reason, the strength of the information flux J also works as the strength of the nonequilibrium driving force (or simply as the nonequilibrium strength). According to the discussion in the above, it can be seen that the information dynamics in Equation (9) is under equilibrium state if and only if d = 0 for arbitrary input and output blocks X and Y. This leads to the detailed balance or equilibrium condition of the information dynamics, If the detailed balance condition is violated, the dynamics is under nonequilibrium state. Then, the absolute value |d| characterizes the degree of the detailed balance breaking and the nonequilibrium of the dynamics. On the other hand, the entity m Y t (X t−1 , X t+1 ) given by Equation (16) works as the strength of the equilibrium driving force P m (or simply as the equilibrium strength). Due to the time-reversal symmetry of m, it preserves the detailed balance condition in Equation (18). Furthermore, both m and d merely depend on the transmission probability of the channel. Thus whether the information dynamics in equilibrium or not is totally determined by the channel itself.
There is a deep connection between the nonequilibrium in information transmission and the performance of the error correction. As a glimpse of this connection, we can see that the error correction given achieves the worst performance, which means that the error occurs with probability 1, if and only if the information dynamics achieves the equilibrium condition in Equation (18). This is because the input X and output Y are totally independent of each other (q N (Y|X) = q N (Y|X )), and there is no useful information transferred during the information dynamics. Then, arbitrary two transmitted symbols s and s cannot be distinguished by the decoder anymore. Otherwise, the nonequilibrium characterized by the non-vanishing nonequilibrium strength d can improve the performance of the error correction, which will be discussed in detail in the following.

Nonequilibrium Decomposition for Transmission Probability
There is a relation between the information driving force P(Z t |Z t−1 ) and the transmission probability q N (Y|X), which is given by Equation (8). The decomposition of information driving force P(Z t |Z t−1 ) in Equations (15) and (16) suggests a decomposition of transmission probability q N (Y|X) in the form of Here, the two parts m and d given in Equation (20) correspond to the equilibrium and nonequilibrium strengths in the nonequilibrium information dynamics, respectively.
In particular, the nonequilibrium decomposition for q N (Y|X) can be also applied to the transmission probability q(y|x) for the letters. The explicit form is then given by Due to the decomposition in Equations (19) and (20), both m and d can change independently within the properly given ranges of the constraints. Consequentially, q N (Y|X) can be changed by altering m and d. This can improve the performance of the error correction as discussed in the above. Due to the nonnegativity and normalization of the transmission probabilities q(x|s), the constraints on m and d can be given as follows, The second constraint is originated from nonnegativity of the transmission probability It is important to observe that when the equilibrium strength m is fixed at each block code X and Y, the nonequilibrium strength d can form a convex set.
The convexity of this set can be easily observed as follows. As shown in the proof of the concavity of the upper bound in Equation (7), the transmission probabilities q N (Y|X) = m Y (X, X ) + d Y (X, X ) form a convex set Q N . If m is fixed in q N (Y|X), then d becomes the affine transform of q N (Y|X) in the form of d Y (X, X ) = q N (Y|X) − m Y (X, X ) (here m Y (X, X ) becomes a constant), which transforms the elements in Q N into D N . Since Q N is a convex set and the affine transform always preserves the convexity [41], then D N is also a convex set.
Although there exist large numbers of the decompositions of the transmission probabilities q N (Y|X) in the form of Equation (19) corresponding to different block lengths N, the decomposition in Equation (21), which corresponds to the letters or the block length of N = 1, gives a fundamental description of this nonequilibrium information dynamics. This is because not only every input block code is composed of the letters x which are distorted independently into the output letters y, but also the detailed balance condition (Equation (18)) for the information dynamics at block lengths of N (Equation (9)) is determined by the detailed balance condition for the information dynamics at block lengths of N = 1 (Equation (11)). This can be seen from Equation (1) that q N (Y|X) = q N (Y|X ) for arbitrary input block X = X if and only if every q(y|x) = q(y|x ) or d y (x, x ) = 0 for all x = x, which is exactly the detailed balance condition for the block length of N = 1. In other words, if the information dynamics for letters in Equation (11) achieves the equilibrium, then every information dynamics for the block codes achieves the equilibrium. Furthermore, the absolute value of the nonequilibrium strength |d| for the letters given in Equation (22) characterizes the degree of the nonequilibrium of the memoryless channel model.

Nonequilibrium Information Thermodynamics and Performance of Error Correction Entropy Production Rate Guarantees Reliability of Information Transmission
It is well known that the nonequilibrium in a system can give rise to thermodynamic dissipation cost from the net input energy, matter, or information. The dissipation cost can be properly quantified by the entropy production rate (EPR). It is defined as the log ratio between the probability of the time sequence Ω = {Z 1 , Z 2 , . . . , Z t } and that of its time-reversal sequence Ω = {Z t , Z t−1 , . . . , Z 1 }, by taking the average over all the possible sequences and over time [28], Here, the process is supposed to be stationary and ergodic. Mathematically, the EPR is in the form of the relative entropy D KL (P(Ω)||P( Ω)) between the distributions P(Ω) and P( Ω). Thus, EPR is always nonnegative due to the nonnegativity of the relative entropy. In particular, EPR = 0 if and only if the equality P(Ω) = P( Ω) holds for all the possible Ω and Ω. This indicates that the system dynamics is time-reversible. Otherwise, the system dynamics is time-irreversible for EPR > 0 [27,36].
Physically, the term r = 1 t log quantifies the rate of the stochastic entropy change in both the system and the environments [28,37]. The stochastic entropy change r fluctuates along with the time sequences and can be either positive or negative. However, the average of r over the ensemble of the system, which is the EPR exactly, follows the fluctuation theorem and becomes nonnegative. This can give rise to the second thermodynamic law that the (averaged) entropy change in both the system and the environments can never decrease. The vanishing EPR indicates that the system is at the equilibrium state and there is no net exchange between the system and the environments on average. Otherwise, the system is at nonequilibrium state with an inevitable thermodynamic dissipation from the system to the environments quantified by nonzero EPR. We are interested in the EPR because it is not only the quantification of the thermodynamic dissipation cost but is also a proper description of the reliability of the information transmission. To see this, we evaluate the EPR of the information dynamics in Equation (9) by substituting the probabilities of the time sequence Ω and its time-reversal sequence Ω in Equation (17) into the expression of the EPR in Equation (24). This yields [25,26], The first equality in Equation (25) expresses the EPR by using the micro states with the perspective of the nonequilibrium information dynamics. Note that the information flux J is provided by Equation (12). One can define l that quantifies the detailed information loss during the transition in the channel as follows This term l is in the first equality in Equation (25). It can also be given by the information difference as l = I(X, Y) − I(X , Y). Here, the information I refers to the reduction in the uncertainty of the output block Y caused by the input block X, quantified by the difference between the prior uncertainty − log P N (Y) and the posterior uncertainty − log q N (Y|X) as Intuitively, if the input X reduces more uncertainty in the output Y compared to another input X , then Y carries more information of X than X , and consequentially the transmission probability q N (Y|X) > q N (Y|X ). For no loss of the generality, let us consider the following case. It is possible that the input blocks at time i − 1 and at time i + 1 can be X and X = X, respectively, but these two inputs can appear at i − 1 and at i + 1 in a time-reversed order. Thus {X i −1 = X , X i +1 = X} can be regarded as the time-reversal of the sequence {X i−1 = X, X i+1 = X }. However, due to the randomness of the channel, the same output Y can be originated both from X and X , and can be received at time i and i , respectively. Assume that Y carries more information of X than X . When Y appears to be from X but is actually from X , then the information of X is lost with the quantification l = I(X, Y) − I(X , Y) > 0. Here, the nonnegativity of information loss can be guaranteed by the symmetry in the EPR, because l > 0 while the information flux J > 0; otherwise, l < 0 at J < 0. Thus, either l < 0 or l > 0 can be regarded as the positive information loss.
Driven by the non-vanishing information flux J = 0 under nonequilibrium, the information loss or dissipation can be positive. Otherwise, the vanishing information flux J leads to zero information dissipation under equilibrium. Since the term l is the difference in logarithm of the probability, one can regard the information loss as the analog of voltage (potential related to the population) difference while the information flux as the current in electronic circuit. The entropy production rate is then the analog of the power generated or cost in the electric circuit. This gives the physical interpretation of the first equality in Equation (25).
The second equality in Equation (25) expresses the EPR as a quantification of the reliability of the information transmission, with the nonequilibrium strength of the information flux and the information driving force d provided in Equations (19) and (20). Note that the time in the subscript of every block in the first equality in Equation (25) is no longer important in the second equality any more.
To see the EPR as the quantification of reliability, an easy-to-understand explanation is that if the peaks of two transmission probability distributions q N , conditioning on the inputs X and X , respectively, do not coincide with the same outputs. Then, the corresponding symbols are transmitted with less random interference in the channel, and the decoder can more easily distinguish between symbols. Conversely, if the peak positions of two transmission probability distributions overlap, it is difficult to decode the corresponding symbols correctly. The distance between two transmission probability distributions can be used to measure the difference between their peak positions, and the EPR is an appropriate tool for this measurement.
The third equality in Equation (25) expresses the EPR or the reliability of the information transmission in terms of the nonequilibrium decomposition of the transmission probability q(y|x) for the letters given by Equations (21) and (22). This expression arises from the memoryless nature of the channel, and then the nonequilibrium of the transmission probability q(y|x) becomes significantly important to the information dynamics. Consequentially, the nonequilibrium part d y (x, x ) in q(y|x) determines the nonequilibrium of the dynamics. As a function of the nonequilibrium strength d, the EPR is convex on the convex set D of d, with the equilibrium strength m and the input distribution Q(x) being fixed. The convexity of D has been proved in Section 3.3 (D = D N for N = 1). Now, let us prove the convexity of the EPR in the following way.
The convexity shown in Equation (6) indicates that larger EPR values correspond to larger nonequilibrium strengths or distances between transmission probability distributions. Thus, improving the condition of transmission can be achieved by increasing the energy cost (EPR) of information transmission, or by enlarging the nonequilibrium strength of the channel. It is noteworthy that, when considering a fixed equilibrium strength m and input distribution Q(x), the EPR achieves the unique minimum 0 if and only if the information dynamics in Equation (10) reaches equilibrium, i.e., d = 0. This corresponds to the worst condition of transmission that all the symbols cannot be distinguishable at the decoder.

Nonequilibrium Dissipation Cost Improves Performance of Error Correction
The nonequilibrium energy cost quantified by EPR (Equation (25)) can improve information transmission by reducing the interference between symbols, and then can improve the error correction performance.
To see this, we note that the upper bound of the decoding error probability in Equation (5) quantifies the worst performance of the error correction. Since the nonequilibrium of the transmission probability q(y|x) is fundamental to the information dynamics, we use the nonequilibrium decomposition for q(y|x) Equations (21) and (22) and the upper bound for the memoryless channel in the second equality of Equation (6) We have proven in Equation (7) that u is a concave function of q(y|x) with fixed equilibrium strength m, input distribution Q(x), and parameter ρ. Since the correspondence between q(y|x) and d y (x, x ) is an affine transformation given by Equation (21), then the concavity of u is preserved by this affine transformation, i.e., u is a concave function of d with the unique maximum 1 at the equilibrium point of the information dynamics in Equation (10) where d = 0. Otherwise, the function u decreases as each absolute value |d y (x, x )| increases.
Consequentially, the upper bound of the error probability, given as (M − 1) ρ [u(d)] N , also achieves the unique maximum (M − 1) ρ where d = 0, and decreases monotonically as each absolute value |d y (x, x )| or the nonequilibrium in the information transmission increases. Thus, higher nonequilibrium decreases the error probability and thus improves the performance. In particular, when d = 0, the equilibrium is reached, then the performance of the error correction is the worst. This also indicates that one needs to go to the nonequilibrium regime to improve the error corrections.
On the other hand, the EPR in Equation (25) can describe the reliability of the information transmission in the perspective of the nonequilibrium information dynamics. In addition, the EPR is directly related to the measure of the nonequilibrium characterized by d. Then, the performance of the error correction can also be a function of the EPR. We find that the performance of the error corrections becomes better with the decrease in the upper bound of the error correction when the EPR increases. Then, we can see that the dissipative cost can enhance the performance of the error corrections. This shows how nonequlibrium in transmission can help to improve the error correction in decoding from the thermodynamic perspective.

An Illustrative Case of Binary Memoryless Channel
In this section, we illustrate the above idea through analysing how the performance of error correction is influenced by the nonequilibrium in the binary memoryless channel model. In this example, all the entities that we have studied in the previous sections can be given in explicit forms and be interpreted clearly.
The transmitted symbols are simply assumed to be s = a, b. The encoder assigns binary block codes to the symbols such as a = 000, b = 111. For the simplicity of the encoder, let us use the random encoding method, with the input distribution for the encoding letters The input block codes are interfered by the two random and independent noise sources which form the channel. The settings of this case are as follows. If the encoder generates the letters x independently and identically distributed with the input distribution Q(x) at time t, then the receiver receives the output symbols y = 0, 1 at t + 1, which satisfy the following equation where η and ξ are the noises generated in the two noise sources, respectively, at that moment. The probabilities of the noises are given by P(η = 0) = e 1 , P(η = 1) = 1 − e 1 , P(ξ = 0) = e 2 , P(ξ = 1) = 1 − e 2 .
By using Equation (28), we obtain the correspondence between x t and y t+1 under the noises (see Table 1). The noise m influences the input letter x = 0, since y t+1 = ξ t when x t = 0. Meanwhile, the noise n distorts the letter x = 1, since y t+1 = η t when x t = 1. Table 1. Correspondence between the output y t+1 and the input x t under noises.
According to the description in the above, the transmission probabilities q(y|x) can be given by the probabilities of the noises as follows The difference between the transmission probabilities is recognized as the strength of the nonequilibrium part of the driving force or the time-irreversible information flux according to Equations (14) and (15), which is given by and the strength of the equilibrium part of the driving force can be given by The decomposition of the transmission probabilities in this case can be given by Equation (21) as The constraints on m and d can be given by Equation (23) as  Table 2. We show that more nonequilibrium or information flux leads to more effective error correction at the decoder. As both the quantifications of the thermodynamic dissipation cost and the reliability of the error correction, the EPR can be rewritten as the function of d, as |d| being the strength of the nonequilibrium driving force or the information flux characterizing the degree of the nonequilibrium. From Equation (25), the EPR is given by where N is the block length. In Figure 1, the EPR as the function of d is potted for different m and p. It is shown that, for the fixed m and p, the EPR has a convex shape with a global minimum 0 at d = 0. The minimum indicates that the reliability of the information transmission is zero at equilibrium. Otherwise, the EPR monotonically increases when the nonequilibrium or absolute value |d| increases (see Figure 1).  Table 2. The minima of all the EPRs are zero at the equilibrium point where d = 0, and the EPRs increase monotonously as the absolute value |d| increases.
The performance of the error correction, which is described by the upper bound in Equation (5), has an intrinsic connection with the nonequilibrium in the information transmission characterized by d in Equation (21). By using Equation (27), we see that the upper bound of the error probability G is a simple function of d, which is explicitly given by G = (M − 1) ρ−1 U(Q, q N , ρ) where the parameters are set as ρ = 2 and M = 2.
In Figure 2, we have plotted the upper bound G as the function of d for different m and p. It can be easily verified that G monotonically decreases as the nonequilibrium |d| increases (see Figure 2). In addition, we also plot the upper bound G as the function of EPR in Figure 3, which shows that G monotonically decreases as the thermodynamic dissipation cost EPR increases. Therefore, we conclude that the larger absolute value of the asymmetry of the noise probability |d|, or the strength of nonequilibrium driving force (or the information flux), characterizing the larger nonequilibrium, results in the smaller error probability and the better performance of the error correction.  Table 2. The maxima of all the upper bounds go to 1 at the equilibrium point where d = 0. Correspondingly, the performance decreases monotonously as the absolute value |d| increases.  Table 2. The upper bound of the error probability decreases monotonously as the EPR increases.

Conclusions
In this study, we aim to investigate the performance of error correction in information transmission. Our focus is on the memoryless channel model, and our findings suggest that nonequilibrium dynamics can enhance error correction performance. From a thermodynamic perspective, we demonstrate that increasing the nonequilibrium dissipation cost in the information transmission can lead to improved error correction performance of the decoder. These results may have broader implications beyond memoryless channels, and we plan to explore this in future studies.
The transfer of information is a critical process in biological systems, occurring in a variety of scenarios such as neural networks, DNA replication, and tRNA selection during translation. In molecular biology, the central dogma states that genetic information flows from DNA to RNA and from RNA to protein, with information referring to the precise determination of sequence for nucleic acid bases or amino acid residues in proteins [5]. Despite the stochastic nature of the underlying chemical processes, these information transfer processes require high accuracy. While kinetic proofreading is a well-known mechanism for error correction in biochemical reactions [6], the underlying mechanism for improving the performance of error corrections is still not fully understood, despite the existence of several classes of error-correction codes at the cellular level [7]. Our study reveals that the nonequilibrium effects resulting from dissipation cost to the environment are essential for improving the performance of error corrections, a finding with potential implications for investigating the universal mechanism by which information is transferred with high accuracy in biological systems. We plan to explore this further in future research.