Memory effects can make the transmission capability of a communication channel uncomputable

Most communication channels are subjected to noise. One of the goals of information theory is to add redundancy in the transmission of information so that the information is transmitted reliably and the amount of information transmitted through the channel is as large as possible. The maximum rate at which reliable transmission is possible is called the capacity. If the channel does not keep memory of its past, the capacity is given by a simple optimization problem and can be efficiently computed. The situation of channels with memory is less clear. Here we show that for channels with memory the capacity cannot be computed to within precision 1/5. Our result holds even if we consider one of the simplest families of such channels—information-stable finite state machine channels—restrict the input and output of the channel to 4 and 1 bit respectively and allow 6 bits of memory.


I. INTRODUCTION
The foundations of the theory of communications were single-handedly established by Shannon in his seminal work [1]. The natural question that Shannon posed is: what is the maximum rate that can be achieved with an arbitrarily small error? In a tour de force of ingenuity, he proved that for memoryless channels this quantity, the capacity of the channel, although defined in such an operational way, has a very simple entropic expression: it coincides with the maximization, over the distributions on the input alphabet, of the mutual information between input and output in one use of the channel. This coding result was complemented by the Blahut-Arimoto (BA) algorithm [2], [3], which efficiently approximates the capacity of any memoryless channel to within any desired precision.
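For contrast with the memory case, the BA iteration for a memoryless channel can be sketched in a few lines. This is a minimal sketch of the standard textbook iteration, not code from [2], [3]; the function name `blahut_arimoto`, the stopping rule and the row-wise layout `channel[x][y] = p(y|x)` are assumptions made for illustration.

```python
import math

def blahut_arimoto(channel, tol=1e-9, max_iter=10_000):
    """channel[x][y] = p(y|x). Returns (capacity estimate in bits, input distribution)."""
    n_in, n_out = len(channel), len(channel[0])
    p = [1.0 / n_in] * n_in                      # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # output distribution induced by the current input distribution
        q = [sum(p[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # d[x] = relative entropy D(p(.|x) || q) in bits
        d = [sum(channel[x][y] * math.log2(channel[x][y] / q[y])
                 for y in range(n_out) if channel[x][y] > 0)
             for x in range(n_in)]
        lower = sum(p[x] * d[x] for x in range(n_in))  # mutual information at p
        upper = max(d)                                 # upper bound on the capacity
        if upper - lower < tol:
            break
        # multiplicative update of the input distribution
        weights = [p[x] * 2.0 ** d[x] for x in range(n_in)]
        total = sum(weights)
        p = [w / total for w in weights]
    return lower, p

# Binary symmetric channel with crossover probability 0.1 (illustrative input)
capacity, best_input = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
```

The gap between `upper` and `lower` gives a certified precision for memoryless channels; it is precisely this kind of guarantee that the main result rules out for FSMCs.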
What is the situation for channels with memory? Regarding coding theorems, more and more general classes of channels were successfully dealt with [4], [5], [6], [7], culminating in the generalized capacity formula [8]. In this last work, Verdú and Han derived a generalization of Shannon's coding theorem which essentially makes no assumption regarding the structure of the channel. When it comes to algorithms that approximate the capacity, despite considerable effort, the situation is less successful. Even if we restrict to finite state machine channels (FSMCs) the problem remains open. However, there is a rich literature dealing with particular cases such as Gaussian channels with inter-symbol interference (ISI) [9] or Gilbert-Elliott channels [10]. The results in [10] were extended in [11] to a larger class of channels, although both papers deal with i.i.d. sources and channels without ISI. The restriction on ISI channels was removed in [12], [13], [14] using Monte Carlo methods and in [15] using an iterative algorithm similar to BA. However, these results do not address FSMCs in full generality and cannot guarantee the precision of the result. The methods in [16], [17] allow computing analytically the capacity of some channels with ISI but with memory effects that vanish. An alternative approach is based on assuming some concavity properties of the mutual information rate [18], [19]. However, these assumptions do not hold for arbitrary FSMCs [20].
It is the main aim of this work to show that an algorithm that computes the capacity of an arbitrary FSMC cannot exist. To be precise, an FSMC with n input symbols and m states is determined by [9]:
(a) An initial state $s_0$. We will only consider FSMCs in which the initial state is fixed and known to the sender and receiver.
(b) A set of conditional probability assignments $p(y, s|x, s')$ which describe the probability of output symbol y and transition to state s if the FSMC is in state s' and gets x as input.
In order to avoid problems of approximating $p(y, s|x, s')$, we will only consider in this paper FSMCs for which the probability assignments $p(y, s|x, s')$ are rational numbers. Moreover, we will only consider FSMCs which are information stable (see below for the definition) and for which $p(y, s|x, s')$ is in product form $p(y|x, s')\,p(s|x, s')$. Our main result can then be stated as:
Main Result 1. There does not exist any algorithm that, whenever it gets as input the set of probability assignments $\{p(y|x, s'), p(s|x, s')\}_{s,y,x,s'}$ of an information stable FSMC N with 10 input symbols and 62 states, outputs a rational number c such that the capacity of N verifies $|C(N) - c| \le 1/5$.
It is obvious that the same result then holds for n input symbols and m states as long as n ≥ 10 and m ≥ 62.

arXiv:1601.06101v2 [cs.IT] 19 May 2016
Indeed, we will prove something slightly stronger. We will obtain the main result about approximation algorithms from another result about decision algorithms.
Fix a rational number λ ∈ (0, 1]. We will give explicitly a subfamily $S_\lambda$ of FSMCs (information stable and with rational conditional probability assignments in product form) with 10 input symbols and 62 states, with the additional property that all channels $N \in S_\lambda$ have capacity ≥ λ or ≤ λ/2.
Main Result 2. There does not exist any algorithm that, whenever it receives as input the set of probability assignments $\{p(y|x, s'), p(s|x, s')\}_{s,y,x,s'}$ of $N \in S_\lambda$, outputs 1 if the capacity of N is ≥ λ and outputs 0 if the capacity of N is ≤ λ/2.
It is clear that if we consider Main Result 2 for $S_1$ we get Main Result 1. That is, if we could approximate the capacity within error 1/5 and we had a channel whose capacity is known to be either ≤ 1/2 or ≥ 1, we could decide which is the case: an approximation within 1/5 of a capacity ≥ 1 is at least 4/5, whereas an approximation within 1/5 of a capacity ≤ 1/2 is at most 7/10.
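The reduction from approximation to decision can be written out explicitly. The following is a minimal sketch; the function name `decide_from_approximation` and the threshold 3/4 are illustrative choices (any threshold strictly between 7/10 and 4/5 would serve).

```python
from fractions import Fraction

def decide_from_approximation(c):
    """c: a rational within 1/5 of the capacity, under the promise that the
    capacity is either >= 1 or <= 1/2. Returns 1 in the first case, 0 in the second."""
    # capacity >= 1 forces c >= 4/5; capacity <= 1/2 forces c <= 7/10
    return 1 if Fraction(c) >= Fraction(3, 4) else 0
```

Hence any algorithm achieving precision 1/5 on all inputs would yield a decision algorithm for $S_1$, which Main Result 2 excludes.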
The plan for the rest of the text is as follows. We introduce the notation in Section II. We review the definition of a Turing Machine in Section III in order to rigorously state our results. Section IV is dedicated to probabilistic finite automata (PFA) and decision problems for PFA. In Section V we define finite state machine channels and their capacity. We present our channel construction and prove the undecidability of its capacity in Section VI. We finish the paper in Section VII with a brief discussion of the implications of our result and some open questions.

II. NOTATION
We denote random variables by capital letters X, Y, ..., sets and PFAs by calligraphic capital letters X, Y, ..., channels by capital bold face letters X, Y, ..., and instances of random variables by lower case letters x, y, .... We denote vectors with the same convention; whenever confusion might arise, a superscript indicates the number of components of the vector and a subscript the concrete component: $x^n = (x_1, x_2, \ldots, x_n)$. We indicate a consecutive subset of n components of the vector with the subscript notation [a, a+n−1]: $x_{[a,a+n-1]} = (x_a, x_{a+1}, \ldots, x_{a+n-2}, x_{a+n-1})$.
A vector is called a probability vector if all its entries are non-negative and add up to one. A matrix is called a stochastic matrix if all its columns are probability vectors. A stochastic matrix takes probability vectors to probability vectors.

III. FORMAL STATEMENTS OF THE MAIN THEOREMS. TURING MACHINES
In order to make the Main Theorems mathematically rigorous, we have to recall the definition of a Turing Machine (TM) as the formal definition of what an algorithm is. For more details one can consult for instance [21], [22].
A TM represents a machine with a finite set of states that can read from and write to an infinitely long memory in the form of a tape. The tape is divided into cells, each of which can hold a single symbol from a finite alphabet. Initially, the tape contains some arbitrary but finite string that we call the input, followed by an infinite sequence of blank symbols. The operation of the machine is controlled by a head that sits on top of a cell of the tape. The head operates as follows: it reads the symbol below it; then, depending on the symbol and the current state, it writes a symbol, moves left or right and transitions to a new state. The set of states includes the halting state. The TM halts after it transitions to the halting state. The output of the TM consists of the, possibly empty, string of symbols starting from the leftmost non-blank symbol to the rightmost non-blank symbol.
Formally, a TM is defined by a triple M = (Q, Σ, δ) where Q represents the finite set of states including an initial and a halting state, Σ is the finite set of symbols that a cell may contain, including the blank symbol, and $\delta : Q \times \Sigma \to Q \times \Sigma \times \{L, R\}$ is the transition function. A configuration is a complete description of the status of a TM. It consists of the current state, the contents of the tape and the position of the head. In the initial configuration, the tape contains the input string and the head of the TM is in the initial state and situated on top of the leftmost cell of the input. Once the initial configuration is fixed, a TM evolves deterministically and may or may not eventually halt.
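The operational description above can be made concrete with a small simulator. This is a minimal sketch under stated assumptions: the transition function is taken to be a dictionary mapping (state, symbol) to (new state, written symbol, move), and the names `run_tm`, `"q0"`, `"halt"` and the blank symbol `"_"` are hypothetical. The step budget is only a practical safeguard; a genuine TM may run forever.

```python
def run_tm(delta, tape_input, start="q0", halt="halt", blank="_", max_steps=10_000):
    """Simulate a single-tape TM; returns the output string, or None if the
    step budget is exhausted before the halting state is reached."""
    tape = {i: s for i, s in enumerate(tape_input)}   # sparse tape, blank elsewhere
    state, head = start, 0                            # head on the leftmost input cell
    for _ in range(max_steps):
        if state == halt:
            cells = [i for i, s in tape.items() if s != blank]
            if not cells:
                return ""
            # output: leftmost non-blank symbol to rightmost non-blank symbol
            return "".join(tape.get(i, blank) for i in range(min(cells), max(cells) + 1))
        symbol = tape.get(head, blank)
        state, written, move = delta[(state, symbol)]
        tape[head] = written
        head += 1 if move == "R" else -1
    return None

# Illustrative machine: flip every bit of the input, halt at the first blank
flip = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
    ("q0", "_"): ("halt", "_", "R"),
}
result = run_tm(flip, "0110")
```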
Let us fix n = 10, m = 62. In order to specify an FSMC with n input symbols and m states, it is enough to give N = nm(n + m) = 44640 rational numbers corresponding to the conditional probability assignments. Therefore, the set of FSMCs can be seen as a subset of the positive elements in $\mathbb{Q}^N$. Fix an explicit injective map σ from the set of positive elements in $\mathbb{Q}^N$ into the natural numbers $\mathbb{N}$. For instance, consider the first 2N prime numbers $p_1, \ldots, p_{2N}$ and define $\sigma(a_1/b_1, \ldots, a_N/b_N) = \prod_{i=1}^{N} p_i^{a_i}\, p_{N+i}^{b_i}$, with each fraction $a_i/b_i$ written in lowest terms. Any other explicit σ will do the job. Indeed, σ is just a way to encode the given FSMC as a natural number, which then can be transformed into a valid input of a TM. For instance, it would be transformed into a string of zeroes and ones if Σ = {0, 1, #}. Main Results 1 and 2 can then be respectively restated as:
Main Result. There does not exist any TM that halts on all inputs in σ(FSMC) and, on input σ(N), outputs a rational number c such that the capacity of N verifies $|C(N) - c| \le 1/5$.
Main Result. There does not exist any TM that halts on all inputs of the form σ(N) for $N \in S_\lambda$ and outputs 1 if the capacity of N is ≥ λ and 0 if the capacity of N is ≤ λ/2.
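One explicit choice of injective encoding based on prime exponents can be sketched as follows. The exact form of σ here (numerators as exponents of the first half of the primes, denominators as exponents of the second half, fractions in lowest terms) is an assumption made for illustration, and the names `first_primes` and `encode` are hypothetical. For N = 44640 the resulting number is astronomically large; only injectivity and explicitness matter.

```python
from fractions import Fraction

def first_primes(k):
    """Return the first k prime numbers by trial division."""
    primes, cand = [], 2
    while len(primes) < k:
        if all(cand % p for p in primes if p * p <= cand):
            primes.append(cand)
        cand += 1
    return primes

def encode(rationals):
    """Map a tuple of positive rationals to a natural number. Injective by
    unique factorization, since Fraction stores values in lowest terms."""
    fracs = [Fraction(r) for r in rationals]
    n = len(fracs)
    primes = first_primes(2 * n)
    code = 1
    for i, f in enumerate(fracs):
        code *= primes[i] ** f.numerator * primes[n + i] ** f.denominator
    return code
```

For example, the pair (1/2, 2/3) uses the primes 2, 3, 5, 7 and is mapped to 2^1 · 3^2 · 5^2 · 7^3.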

IV. PROBABILISTIC FINITE AUTOMATA
A PFA A is given by a tuple A = (Q, W, X, v, F). Q denotes a finite set of states, W denotes a finite input alphabet, X denotes a finite set of stochastic matrices with cardinality equal to the cardinality of the input alphabet, v denotes an initial probability distribution over Q and F ⊆ Q denotes a set of accepting states. We say that the PFA is rational if the coefficients of X and v are rational numbers. We will only consider rational PFA in the sequel.
The action of a PFA is defined by the transition probabilities from one state to another as a function of the input symbols. If the automaton is in the state $q_a$ and reads the letter w, it transitions to the state $q_b$ with probability
$$p\left[q_a \xrightarrow{w}_A q_b\right] = \langle \pi_{q_b}, X_w \pi_{q_a} \rangle,$$
where we denote by $\langle a, b \rangle$ the scalar product between vectors a and b and by $\pi_X$ a vector with ones in the positions indicated by X and zeroes in the remaining positions. We use the same notation for the probability that the automaton transitions from the state $q_a$ to the state $q_b$ after reading the word $w = (w_1, \ldots, w_{|w|}) \in W^{|w|}$:
$$p\left[q_a \xrightarrow{w}_A q_b\right] = \langle \pi_{q_b}, X_{w_{|w|}} \cdots X_{w_1} \pi_{q_a} \rangle.$$
More generally, if we have a probability distribution over the states given by the column vector x and the PFA reads the letter w, then the new distribution over the states is given by $X_w x$. A particularly relevant probability is the probability that the automaton ends in an accepting state after reading some word w. We call this probability the probability of accepting w or the value of w. It can be computed as
$$\mathrm{val}(A, w) = \langle \pi_F, X_{w_{|w|}} \cdots X_{w_1} v \rangle.$$
We call the value of A, which we denote by $\mathrm{val}_A$, the supremum of the acceptance probabilities over all input words:
$$\mathrm{val}_A = \sup_{w \in W^*} \mathrm{val}(A, w).$$
Whenever possible, we will represent the different automata constructions graphically, with the following conventions. A state is denoted by a circle. An accepting state is denoted by a circle with a double line around it. In all automata in this paper, the initial distribution will have one coefficient with weight one. We indicate the corresponding state with an arrow that does not come from any state.
We indicate with an arrow labeled $w, p$, written $\xrightarrow{w,p}_A$, that if the automaton reads the letter w it transitions from the origin of the arrow to the state pointed to by the arrow with probability p. In order to avoid clutter, we simplify the notation in several cases. If we do not show transitions corresponding to all input symbols, the missing transitions correspond to self-loops with probability one. We drop the probability and just write $\xrightarrow{w}_A$ if a transition occurs with probability one. We drop the input symbol and just write $\xrightarrow{p}_A$ if all input symbols transition with the same probability.
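The value of a word is a product of column-stochastic matrices applied to the initial distribution, which the following minimal sketch makes explicit. The two-state automaton used here is a hypothetical example (it is not the automaton of any figure in this paper), and the names `apply` and `word_value` are illustrative.

```python
from fractions import Fraction

def apply(matrix, vec):
    """Matrix-vector product; matrix[i][j] is the probability of moving from state j to state i."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]

def word_value(matrices, v, accepting, word):
    """Acceptance probability of `word`, starting from the initial distribution v."""
    dist = list(v)
    for letter in word:                      # the matrix of the first letter acts first
        dist = apply(matrices[letter], dist)
    return sum(dist[i] for i in accepting)

# Hypothetical PFA: state 0 is initial, state 1 is accepting.
half = Fraction(1, 2)
matrices = {
    "a": [[half, 0], [half, 1]],   # from state 0 move to state 1 with probability 1/2
    "b": [[1, 1], [0, 0]],         # send everything back to state 0
}
v = [Fraction(1), Fraction(0)]
value_aa = word_value(matrices, v, {1}, "aa")
```

Computing the value of a single word is thus straightforward; the difficulty addressed in this paper lies in the supremum over all words.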
Example 1. Consider the PFA given in Figure 1. The automaton in the figure has three states Q = {$q_1$, $q_2$, $q_3$}, two input symbols W = {a, b}, the initial state is $q_1$ and there is a single accepting state $q_3$. By looking at the figure we can construct the stochastic matrices $X_a$ and $X_b$ and, from them, compute the value of any word, for instance w = baa.
Fig. 1: Automaton with three states Q = {$q_1$, $q_2$, $q_3$}, two input symbols W = {a, b}, the initial state is $q_1$ and there is a single accepting state $q_3$.
We need to consider special types of PFAs that we name freezable and resettable.
We call a PFA a freezable PFA if one of the transition matrices is equal to the identity matrix $X_{id}$. The reason is that for such a PFA reading the symbol corresponding to the identity leaves the state probabilities unchanged. Let u be any probability vector, then
$$X_{id}\, u = u. \quad (8)$$
We call a PFA a resettable PFA if one of the transition matrices, $X_{rt}$, takes the state back to the initial state. Let u be any probability vector, then
$$X_{rt}\, u = v. \quad (9)$$
Definition 1. Let γ be a map from PFAs to PFAs such that given a PFA A, γ(A) is a PFA with an extended input alphabet composed of the original alphabet together with the additional symbols id and rt and the corresponding matrices $X_{id}$ and $X_{rt}$ as given by (8) and (9). That is, γ transforms any PFA into a freezable and resettable PFA.
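The map γ only adds two matrices, which can be sketched as follows. This is a minimal sketch assuming column-stochastic matrices stored as nested lists and keyed by letter; the function names are illustrative.

```python
from fractions import Fraction

def apply(matrix, vec):
    """Matrix-vector product for a column-stochastic matrix."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]

def gamma(matrices, v):
    """Extend a PFA's transition matrices with the freeze symbol 'id' and the reset symbol 'rt'."""
    n = len(v)
    extended = dict(matrices)
    extended["id"] = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    # every column of X_rt equals v, so X_rt u = v for any probability vector u
    extended["rt"] = [[v[i]] * n for i in range(n)]
    return extended

# Illustrative two-state automaton with initial distribution v
X = {"a": [[Fraction(1, 2), Fraction(0)], [Fraction(1, 2), Fraction(1)]]}
v = [Fraction(1), Fraction(0)]
ext = gamma(X, v)
u = [Fraction(1, 3), Fraction(2, 3)]
```

Note that the reset matrix works for an arbitrary initial distribution v, although all automata in this paper start in a single state.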
The key lemma we will need about PFAs is the fact that their value cannot be approximated within a constant error. Let us give the precise statement. Fix a rational number λ ∈ (0, 1].
Lemma 1. One can give explicitly a subfamily $T_\lambda$ of rational freezable and resettable PFA with alphabet size 5 and 62 states with the following properties:
(i) $\mathrm{val}_A$ is either ≥ λ or ≤ λ/2 for all $A \in T_\lambda$.
(ii) There cannot exist an algorithm that on input $A \in T_\lambda$ decides which is the case.
The definition of $T_\lambda$ will be given in Appendix C, Eq. (90). The formal meaning of (ii) is clear from Section III.

V. FINITE STATE MACHINE CHANNELS
A channel can depend on past inputs and outputs in very complicated ways. We focus our interest on FSMCs, the set of discrete channels whose behavior is dictated by a finite state machine [9]. Let X, Y and S be finite sets that represent the input alphabet, the output alphabet and the set of states. An FSMC is characterized by the time-invariant conditional probabilities $p(y, s|x, s')$ for all states $s, s' \in S$, input symbols x ∈ X and output symbols y ∈ Y. These conditional probabilities denote the probability that the channel outputs the symbol y and transitions to the state s given that the channel is in state s' and receives the input symbol x. In this paper, we will restrict our attention to those FSMCs for which $p(y, s|x, s')$ has the product form $p(y|x, s')\,p(s|x, s')$.
We assume that the initial state $s_0$ is known by both the transmitter and the receiver. We denote by $W^n_{s_0}$ the sequence of probability distributions induced by $s_0$ that give the probability of a sequence of outputs given a sequence of inputs into the channel:
$$W^n_{s_0}(y^n|x^n) = \sum_{s_n} W^n_{s_0}(y^n, s_n|x^n),$$
where
$$W^n_{s_0}(y^n, s_n|x^n) = \sum_{s_{n-1}} p(y_n, s_n|x_n, s_{n-1})\, W^{n-1}_{s_0}(y^{n-1}, s_{n-1}|x^{n-1}).$$
Analogously we can define a sequence of probability distributions to characterize the state of the channel:
$$W^n_{s_0}(s_n|x^n) = \sum_{y^n} W^n_{s_0}(y^n, s_n|x^n).$$
Note that, abusing the notation, we have used $W^n_{s_0}$ to define both the conditional probability of the output and that of the state. Consider two random variables X, Y with joint distribution p(x, y). The information spectrum is the distribution of the random variable $i_{X,Y}$ given by
$$i_{X,Y}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}.$$
The mutual information is the expected value of the information spectrum:
$$I(X; Y) = \mathbb{E}\left[i_{X,Y}(X, Y)\right].$$
A channel is said to be information stable [4] if for all γ > 0 there exists a sequence of random variables $\{X_i\}_{i=1}^{\infty}$ such that
$$\lim_{n \to \infty} \Pr\left[\left|\frac{i_{X^n,Y^n}(X^n, Y^n)}{n\,C_n} - 1\right| > \gamma\right] = 0,$$
where
$$C_n = \frac{1}{n} \sup_{X^n} I(X^n; Y^n).$$
In full generality one may need to resort to the capacity formula of Verdú and Han [8] in order to compute the capacity of an FSMC. However, if the channel is information stable then the capacity is given by the mutual information rate [4]:
$$C = \lim_{n \to \infty} \frac{1}{n} \sup_{X^n} I(X^n; Y^n).$$

VI. THE FAMILY $S_\lambda$ AND THE PROOF OF MAIN THEOREM 2
Given a freezable and resettable PFA A we define the channel $V_A$ as follows. The input alphabet of the channel takes values in {0, 1} × W, which we identify with two different inputs: a data input $x_d \in \{0, 1\}$ and a control input $x_c \in W$. The data input is transmitted to the output noiselessly if A is in an accepting state; if A is in any other state, the channel outputs an element of the output alphabet uniformly at random. More concretely, the output of the channel, which is independent of the control input, is defined by the following conditional probability:
$$p(y|(x_d, x_c), s') = \begin{cases} \delta_{y, x_d} & \text{if } s' \in F, \\ 1/2 & \text{otherwise.} \end{cases}$$
The control input is fed to A, which begins in the initial state, and the state transition probabilities are dictated by the PFA:
$$p(s|(x_d, x_c), s') = p\left[s' \xrightarrow{x_c}_A s\right].$$
The next key lemma will be proven in Appendix A.
Lemma 2. The capacity of $V_A$ is given by $C(V_A) = \mathrm{val}_A$.
The family $S_\lambda$ in Main Theorem 2 is defined simply as $S_\lambda = \{V_A : A \in T_\lambda\}$, with $T_\lambda$ the family introduced in Lemma 1 and defined in Appendix C, Eq. (90) below. Main Theorem 2 is then a trivial consequence of Lemma 1 and Lemma 2.
Furthermore, it is also a corollary of Lemma 2 that all the channels in $S_\lambda$ are information stable:
Corollary 1. All the channels in $S_\lambda$ are information stable.
The proof will be given in Appendix B.

VII. DISCUSSION
We have proven that no generalization of the Blahut-Arimoto algorithm can exist that approximates the capacity for all information stable FSMCs to any desired precision. This result, as most in information theory, relies on asymptotics. But asymptotics are just a convenient tool to analyze and compare resources that, in the physical world, can only be used a finite number of times. More prosaically, if one asks what the maximum communication rate is over n uses of some channel N with error at most ε, then these computability problems seem to fade. A sharp rigorous proof of this statement is worth further investigation.
Future work could also investigate the minimum dimensions for which uncomputability holds. Our construction builds directly on top of several strong undecidability results for PFAs. However, recent developments underlying these results suggest that it should be possible to reduce the dimensions of our construction [23].
Finally, what other problems could be attacked with similar techniques? The proof technique can be extended to the capacities of quantum channels with memory. We will make the explicit analysis in a forthcoming paper [24]. Furthermore, our result connects with recent work regarding the different capacities of memoryless quantum channels [25], [26], showing some evidence that these capacities might be uncomputable. Also, memoryless zero-error capacities, both classical and quantum, are known to have highly non-trivial behaviour [27], [28], [29], [30], [31]. Are any of these quantities uncomputable? The techniques used here exploit directly the memory of the channel. Hence, it is unlikely that a similar proof would hold for a memoryless capacity. However, it might be possible to extend our results to the zero-error capacities of channels with memory.

APPENDIX A PROOF OF LEMMA 2
Proof: First we will prove that $\mathrm{val}_A$ is an achievable rate, that is, $C(V_A) \ge \mathrm{val}_A$. Then we will prove that $\mathrm{val}_A$ is an upper bound on the mutual information rate and in consequence $C(V_A) \le \mathrm{val}_A$.
Let δ > 0, then there exists some word w > 0 such that val(A, w) = val A − δ, furthermore let |w| = m.Consider the following protocol, the input into the control register is the deterministic sequence (c i ) ∞ i=1 with This choice induces a memoryless channel when regarded in blocks of m + n uses of the channel.That is, every block of m + n inputs into the data input encounters exactly the same noisy channel once the control input is fixed by (22).In consequence, given this particular control input, any mutual information between the input and the output over m + n uses is an achievable rate (once normalized over the number of uses).For the data input, we choose the uniform distribution.
The following chain of inequalities holds for the conditional entropy of the output given the input: where (23) follows by the chain rule, the inequality (24) by bounding the entropy of the first m uses by m, the inequality ( 25) by removing the conditioning on Y [1,m] and ( 26) holds from bounding the conditional entropy by (30) that we prove below.
After the first m uses, the automaton behaves like a noiseless channel with probability val A − δ and like a completely random channel with the complementary probability.In consequence, we can bound the conditional entropy of the output of the uses m + 1 to m + n as follows: where φ = (1, 0, . . ., 0) is a completely deterministic probability vector of length the maximally entropic vector of length 2 n − 1 and h( ) := − log − (1 − ) log(1 − ) is the binary entropy function.Now we can use (26) to bound the mutual information of the first m + n uses: Finally, by choosing n larger than That is, for all δ > 0 the rate val A − 2δ is achievable.Now we will prove that C(V A ) is upper bounded by val A .Let S i denote the state of the PFA at use i, since the output only depends on the control input through the PFA state we have that . In consequence, we can bound from below the conditional entropy of the output given the input as follows: Finally, we can plug the bound on the conditional entropy to obtain the desired result:
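The achievability argument can be checked numerically. The sketch below assumes the situation described above: after the first m uses the automaton is frozen in an accepting state with probability a (noiseless data path) and in a non-accepting state otherwise (uniformly random output), and the data input is uniform. It computes exactly the mutual information carried by the last n uses, which lower-bounds the mutual information of the whole block, normalized by m + n. The function name `block_rate` is hypothetical and n ≥ 1 is assumed.

```python
import math

def block_rate(accept_prob, m, n):
    """Mutual information (bits) of uses m+1..m+n with uniform data input,
    divided by the block length m + n."""
    a = accept_prob
    # P(Y^n = x^n | X^n = x^n): noiseless branch plus a lucky random output
    p_same = a + (1 - a) * 2.0 ** (-n)
    cond_entropy = 0.0
    if p_same > 0:
        cond_entropy -= p_same * math.log2(p_same)
    if a < 1:
        # each of the other 2^n - 1 strings has probability q = (1 - a) / 2^n
        log_q = math.log2(1 - a) - n
        cond_entropy -= (1 - p_same) * log_q
    mutual_info = n - cond_entropy      # H(Y^n) = n because the output is uniform
    return mutual_info / (m + n)
```

For fixed m the returned rate approaches `accept_prob` as n grows, in line with the claim that every rate below $\mathrm{val}_A$ is achievable.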

APPENDIX B PROOF OF COROLLARY 1
Proof: From the proof of Lemma 2 we know that ∀δ > 0, ∀t ∈ N there exists n t (wlog n t+1 ≥ n t ) and X t = {X 1 , . . ., X nt } such that: We define the following source to input into the channel: and the sequence of random variables that is input over the first n uses is The sequence {m i } ∞ i=1 is chosen such that: where W n is the random variable induced by V n at the output of the channel and n is related to t by (46 Note that the input V n is composed of independent random variables, hence: and also note that for all t and β ∈ [0, n t ]: In order to achieve (48), we see how m t can be chosen: In order to verify (48) it suffices to choose m t larger than (54) such that the following holds However, for technical reasons in the concentration bounds that follow we choose: In the following we prove that ∀η > 0: Let us expand the probability expression in (57): Now we will exploit that i V n ,W n can be expressed as a sum of l = t i=1 m i + α + 1 independent random variables (see (46)).For these sums we can bound the two-tailed probability via Hoeffding's inequality [32].More concretely, let {X i } l i=1 be a sequence of l independent random variables, let t ≥ 0 and let We can make clearly the identifications with (60) and (50).However, before applying Hoeffding's inequality let us bound the denominator in the exponential term: The last inequality follows because from (56) we have that (n t+1 ) 2 ≤ m t ≤ n.Now if we apply Hoeffding's inequality to (60) we can bound it from above by: and in consequence the limit when n goes to infinity is zero for all η > 0.

APPENDIX C PROOF OF LEMMA 1
Lemma 1 is essentially proven by Gimbert and Oualhadj in [33] with a very elegant construction (a succinct sketch can be found in [34]). We include a full proof here for completeness, to cover the case of an arbitrary λ (in [33] they only consider the case λ = 1) and to include in the construction an undecidability result of Hirvensalo [35]. This allows us to give the concrete estimates of alphabet size 10 and 62 states that appear in Lemma 1.

A. The construction of Gimbert and Oualhadj
Lemma 3 (Proposition 5 [34]).Let D x,y be the automaton in Figure 2 and x Proof: First, we need to make some observations regarding D x,y .If the input letter b is fed two or more consecutive times the automaton is forced it into the states sink, q 3 and q 6 from which the automaton cannot exit.For any such a word, the acceptance value is y.Hence, we concentrate our attention to words of the form a n1 ba n2 b . . .ba nt b.For any word w of this form the acceptance value is: Furthermore, the upper bound is reachable.To verify this, consider the word wa n , p q 1 wa n − −− A q 3 does not change and we can make p q 4 wa n − −− A q 5 approach 1 − p q 4 wa n − −− A q 6 by choosing n large enough.Both p q 1 w − A q 3 and init start p q 4 w − A q 6 admit a very compact form: Let us consider first x ≤ 1/2.This implies that x ≤ 1 − x and in consequence Let > 0, for any word w such that p[q 1 → q 3 ] = 1 − we have p[q 4 → q 6 ] ≥ 1 − and val(D x,y , w) ≤ y.
Let us assume now that x > 1/2.We are going to prove that for any ∈ (0, x) there exists a word w such that: Consider the sequence of words {w k } ∞ k=2 where w k = a n2 ba n3 n . . .ba n k and the lengths n 2 . . .n k are given by Let b > 1 be a number such that x b = 1 − x.The following sequence of inequalities holds: x bni (77) Note that the sum in the right hand side of (80) when k goes to infinity is very similar to the Riemann zeta function evaluated at a real argument strictly larger than one.For this arguments it is well known [36] that it can be bounded by If we apply this bound to (80) we obtain Furthermore, (84) remains an upper bound for finite k since we are only dropping positive contributions.Hence, (71) is verified for all k.Let us now verify that there exists k such that the requirement (72) also holds.Consider the following sum and this sum diverges for any non-zero x and finite C .This implies that lim k→∞ k i=2 (1−x ni ) = 0 and that there exists a finite k such that Now, we are going to modify D x,y .The main idea is that x will be replaced by the probability that an automaton A init q 1 q 3 q 2 q 4 q 5 accepts a word w A .This is achieved very easily, see Figure 3, once the state of D A,y reaches A it continues inside the automaton until it sees c which is a symbol outside the input alphabet of A. Then, it will transition to one of two different states depending on whether or not A is in an accepting state.We indicate the transitions from an accepting state by and the transitions from a non-accepting state by .Let w A be an arbitrary input word into A then: In the following we reduce the problem of finding the value of D A,y to the emptyness of the set L A>λ .This is the set of words with acceptance probability strictly higher than λ.That is: L A>λ = {w ∈ W * : val(A, w) > λ}.Proof: Assume first that L A>1/2 is not empty.Then there exists some w A such that val(A, w A ) > 1/2.Hence, we can construct the sequence w k = (aw A c) n2 . . 
.(aw A c) n k with the lengths n 2 . . .n k given by (73).Following the proof of Lemma 3 we have that for > 0 there exists k such that w k verifies conditions (71) and (72).Assume now that L A>1/2 is empty.We can restrict our attention to words of the form (aw Furthermore for any word w we have that val(A, w) ≤ 1 − val(A, w) and in consequence > 0 (89) implies that for any word such that p q 1 w − A q 4 = 1 − we have that p q 4 w − A q 6 ≥ 1 − and val(D A,y , w) ≤ y.
Let us close this section by defining the family T λ as

B. The undecidability result of Hirvensalo
To use the above construction in order to prove Lemma 1, up to the issue of restricting to freezable and resettable channels (which we will take care of below), it only remains to show that there cannot exist an algorithm that, on input a PFA, decides whether $L_{A>1/2}$ is empty or not. This problem, known as the emptiness problem, was proved undecidable in [37], [38], [39]. Recently, new proofs with explicit bounds on the number of states and the cardinality of the alphabet have been derived in [40], [35], [34], together with undecidability proofs for several related sets. Here, we will rely on the following result.
Theorem 1 ([35]). Let k be an integer greater than or equal to 7 and (n, m) be a pair of integers that is equal to or pointwise larger than (2, 5k − 10). There cannot exist an algorithm that, on input a PFA with alphabet size n and m states, decides the emptiness of $L_{A>\delta}$ for δ = 1/(5k − 10).
Taking Theorem 1 as a starting point, we can amplify the result and obtain undecidability for any rational δ ∈ (0, 1) (in particular for δ = 1/2).
Corollary 2. Fix any rational number δ ∈ (0, 1). There cannot exist an algorithm that, on input a PFA with alphabet size 2 and 27 states, decides the emptiness of $L_{A>\delta}$.
Proof: Given an arbitrary PFA A = (Q, W, X, v, F) and p ∈ (0, 1) we are going to construct two PFAs $B_p$ and $C_p$ such that: $L_{A>\delta}$ is empty ⇔ $L_{B_p>p\delta}$ is empty ⇔ $L_{C_p>p\delta+1-p}$ is empty.
Let us first construct $B_p$ = (T, W, Y, u, F). The set of states is T = Q ∪ {init, sink}. The input alphabet is equal to the original one. For any input symbol x ∈ W we define the stochastic matrices of $B_p$, with rows and columns ordered as (Q, init, sink), as follows:
$$Y_x = \begin{pmatrix} X_x & p\,X_x v & 0 \\ 0 & 0 & 0 \\ 0 & 1-p & 1 \end{pmatrix}.$$
Note that we have added two rows and columns to track the two new states. Let us parse the action of the automaton as defined by the stochastic matrices. If it is in any of the original states, its behavior remains unchanged. If the automaton is in the sink state, it remains in the sink state no matter what input symbol it reads. Finally, if the automaton is in the init state, upon reading the input symbol x it transitions with probability 1 − p to the sink state, and with probability p to wherever the original automaton would have transitioned from the initial distribution. In other words, the new distribution on the states will be given by $(p\,X_x v, 0, 1 - p)$.
The initial distribution of $B_p$ has weight one on the init state, that is: u = (0, ..., 0, 1, 0). The construction of $C_p$ is identical except that we add the sink state to the set of accepting states. We have depicted both constructions in Figure 4.
For any nonempty input word $w \in W^*$ we have that $\mathrm{val}(B_p, w) = p\,\mathrm{val}(A, w)$ and $\mathrm{val}(C_p, w) = p\,\mathrm{val}(A, w) + 1 - p$. Hence, $L_{A>\delta}$ is empty ⇔ $L_{B_p>p\delta}$ is empty ⇔ $L_{C_p>p\delta+1-p}$ is empty.
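The amplification construction is easy to verify on small instances. The following is a minimal sketch with exact rational arithmetic; the two-state automaton A is a hypothetical example, the states are ordered as (Q, init, sink), and the names `amplify` and `value` are illustrative. The checks performed are val(B_p, w) = p·val(A, w) and val(C_p, w) = p·val(A, w) + 1 − p for a nonempty word w.

```python
from fractions import Fraction

def apply(matrix, vec):
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]

def value(matrices, v, accepting, word):
    dist = list(v)
    for letter in word:
        dist = apply(matrices[letter], dist)
    return sum(dist[i] for i in accepting)

def amplify(matrices, v, accepting, p, accept_sink):
    """Build B_p (accept_sink=False) or C_p (accept_sink=True) from a PFA."""
    n = len(v)
    init, sink = n, n + 1
    new = {}
    for letter, X in matrices.items():
        Xv = apply(X, v)
        Y = [[Fraction(0)] * (n + 2) for _ in range(n + 2)]
        for i in range(n):
            for j in range(n):
                Y[i][j] = X[i][j]          # original states behave as before
            Y[i][init] = p * Xv[i]         # from init: act as A would from v, with weight p
        Y[sink][init] = 1 - p              # from init: fall into the sink with weight 1 - p
        Y[sink][sink] = Fraction(1)        # the sink is absorbing
        new[letter] = Y
    u = [Fraction(0)] * n + [Fraction(1), Fraction(0)]
    acc = set(accepting) | ({sink} if accept_sink else set())
    return new, u, acc

# Hypothetical PFA A: state 0 initial, state 1 accepting
half = Fraction(1, 2)
A = {"a": [[half, 0], [half, 1]], "b": [[1, 1], [0, 0]]}
v, F, p = [Fraction(1), Fraction(0)], {1}, Fraction(1, 3)
B = amplify(A, v, F, p, accept_sink=False)
C = amplify(A, v, F, p, accept_sink=True)
```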

C. Resettable and freezable channels
By the definition of the family $T_\lambda$ given in (90), Lemma 1 is just a consequence of Lemma 4, Corollary 2 and the following lemma, whose proof finishes the paper.
(⊇) This direction is trivial since any input word w of Ã is also an input word of γ(Ã) and val(γ(Ã), w) = val(Ã, w).
(⊆) Let us divide the input words into two sets: $W_1$, the words that either end with the symbol rt or consist of a string of id symbols, and $W_2$, the complementary set, that is, words that have at least one symbol different from id and do not end with the rt symbol. The acceptance probability of any $w \in W_1$ is simply the acceptance probability of a distribution with unit probability on the initial state. Since for $D_{\tilde{A},y}$ the accepting states and the initial state are disjoint, the value of w is zero. That means that no word from $W_1$ can be in the set {w : val(γ(Ã), w) ≥ λ} for any value of λ ∈ (0, 1].
First, consider any word $w \in W_2$ that contains at least one identity symbol. It can be written as $w_1\,\mathrm{id}\,w_2$, where $w_1$ and $w_2$ are two sequences of input symbols and at least one of them is non-empty. We have that val(γ(Ã), w) = val(γ(Ã), $w_1 w_2$), and by applying this argument to all the identity symbols in the word we find a new word w' with no identity symbols such that val(γ(Ã), w) = val(γ(Ã), w'). Hence we can restrict our attention to words with no identity symbol.
Second, consider any word $w \in W_2$ that contains at least one reset symbol. It can be written as $w_1\,\mathrm{rt}\,w_2$, where at least $w_2$ is non-empty. We have that val(γ(Ã), w) = val(γ(Ã), $w_2$). Again we can apply this argument to all the reset symbols in the word and find a word w' with no reset or identity symbols such that val(γ(Ã), w) = val(γ(Ã), w') = val(Ã, w').
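The two reduction steps of the proof amount to a simple rewriting of the input word, sketched below. Words are represented as lists of symbols, with the strings "id" and "rt" standing for the freeze and reset symbols; the function name `reduce_word` is hypothetical.

```python
def reduce_word(word):
    """Return a word over the original alphabet with the same value:
    drop every 'id' and keep only what follows the last 'rt'."""
    without_id = [s for s in word if s != "id"]
    if "rt" in without_id:
        last = len(without_id) - 1 - without_id[::-1].index("rt")
        without_id = without_id[last + 1:]
    return without_id

reduced = reduce_word(["a", "id", "rt", "b", "id", "a"])
```

For words ending in rt, or made only of id symbols, the result is the empty word, whose value is the acceptance probability of the initial distribution.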

Fig. 3: The automaton $D_{A,y}$ has value ≤ y if $L_{A>1/2}$ is empty and value ≥ 2y if not.

Lemma 4. Given a PFA A and y ∈ [0, 1/2], the automaton $D_{A,y}$ has value ≤ y if $L_{A>1/2}$ is empty and value ≥ 2y if not.

Fig. 4: The automata $B_p$ (left) and $C_p$ (right) can be used to amplify the undecidability of the emptiness problem to arbitrary δ ∈ (0, 1).
$$T_\lambda = \{\gamma(D_{A,\lambda/2}) : A \text{ has a binary alphabet and 27 states}\} \quad (90)$$
with γ as in Definition 1.

Lemma 5. $\mathrm{val}_{\tilde{A}} = \mathrm{val}_{\gamma(\tilde{A})}$ for all PFA Ã of the form $D_{A,y}$.
Proof: Given a PFA A, we define the set values(A) = {val(A, w) | w ∈ W*}. This is the set of achievable values or, alternatively, it can be regarded as the range of the function val(A, w) once the PFA A is fixed. It is then enough to show that values(Ã) = values(γ(Ã)) for any PFA Ã of the form $D_{A,y}$.