Information recovery from randomly mixed-up message text

This paper is concerned with finding a fingerprint of a sequence. As input data one uses the sequence after it has been randomly mixed up by observing it along a random walk path. A sequence containing order exp(n) bits receives a fingerprint carrying roughly n bits of information. The fingerprint is characteristic of the original sequence: with high probability the fingerprint depends only on the initial sequence, and not on the random walk path.


The information recovery problem
Let ξ : Z → {0, 1} designate a doubly infinite message text with two letters. Such a coloring of the integers is also called a (2-color) scenery. Let S = {S(t)}_{t∈N} be a recurrent random walk on Z starting at the origin. In this paper we allow the random walk S to jump, i.e. P(|S(t + 1) − S(t)| > 1) > 0. We use S to mix up the message text ξ. For this we assume that ξ is observed along the path of S: at each point in time t, one observes χ(t) := ξ(S(t)). Thus χ designates the mixed-up message text, which is also the scenery ξ seen along the path of S. The information recovery problem can be described as follows: observing only one path realization of the process χ, can one retrieve a certain amount of the information contained in ξ? A special case of the information recovery problem is when one tries to reconstruct the whole of ξ. This problem is called the scenery reconstruction problem. In many cases, being able to reconstruct a finite quantity of the information contained in ξ already implies that one can reconstruct all of ξ. This paper is concerned with the information recovery problem in the context of a 2-color scenery seen along a random walk with jumps. The methods which exist so far seem useless for this case: Matzinger's reconstruction methods [Mat99a; Mat05] do not work when the random walk may jump. Furthermore, it seems impossible to recycle the method of Matzinger, Merkl and Löwe [LMM04] for the 2-color case with jumps, since their method requires more than two colors. Hence a fundamentally new approach is needed; it is presented in this paper.
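To fix ideas, here is a minimal simulation sketch of the observation process χ(t) = ξ(S(t)). The step distribution and all parameter values below are illustrative assumptions of ours, not taken from the paper; the only relevant feature is that the walk may jump, i.e. P(|S(t + 1) − S(t)| > 1) > 0.

```python
import random

def simulate_observations(T, L=2, seed=0):
    """Observe an i.i.d. Bernoulli(1/2) scenery xi along a jumping
    random walk S for T steps; return the color record chi."""
    rng = random.Random(seed)
    # Scenery on a window of Z large enough to contain the walk up to time T.
    xi = {z: rng.randint(0, 1) for z in range(-L * T, L * T + 1)}
    steps = [-L, -1, 0, 1, L]          # symmetric step law with jumps
    pos, chi = 0, []
    for _ in range(T):
        chi.append(xi[pos])            # chi(t) = xi(S(t))
        pos += rng.choice(steps)
    return chi

print(simulate_observations(20))       # e.g. [1, 0, 0, 1, ...]
```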

Main assumptions
Let us explain the assumptions which remain valid throughout this paper:
• ξ = {ξ(z)}_{z∈Z} is a collection of i.i.d. Bernoulli variables with parameter 1/2. The realization ξ : z ↦ ξ(z) is the scenery from which we want to recover some information.
Often the realization of the process {ξ(z)}_{z∈Z} is also denoted by ψ.
• S = {S(t)}_{t∈N} is a recurrent random walk with i.i.d. increments whose maximal step length is bounded by a constant L, i.e. P(|S(t + 1) − S(t)| ≤ L) = 1. We also assume that S has positive probability to visit any point in Z, i.e. for any z ∈ Z there exists t ∈ N such that P(S(t) = z) > 0.
• ξ and S are independent.
• Let f : D → I be a map. For a subset E ⊂ D we shall write f|E for the restriction of f to E.
• For finite sequences a and b we write a ⊑ b if a can be obtained from b by removing the first or the last element. For example, (0, 1, 1) ⊑ (0, 1, 1, 0) and (1, 1, 0) ⊑ (0, 1, 1, 0).

Main result
The 2-color scenery reconstruction problem for a random walk with jumps is solved in two phases:
1. Given only a finite portion of the observations χ, one proves that it is possible to reconstruct a certain amount of the information contained in the underlying scenery ξ.
2. One proves that if a certain amount of information can be reconstructed, then the whole scenery ξ can a.s. be reconstructed.
This paper solves the first of the two problems above. The second problem is essentially solved in the follow-up paper [LM02a]. In order to understand the meaning of the present paper, imagine that we want to transmit the word ξ_0^m. During transmission the reading head goes crazy and starts moving around on ξ following the path of a random walk. At time m², the reading head has reached the point m. Can we now, given only the mixed-up information χ_0^{m²}, retrieve any information about the underlying code ξ_0^m? The main result of this paper, Theorem 1.1, shows that with high probability a certain amount of the information contained in ξ_0^m can be retrieved from the mixed-up information χ_0^{m²}. This is the fingerprint of ξ_0^m referred to in the abstract. Here is the main result of this paper.

Theorem 1.1. There exist a map g, with g(ξ_0^m) depending on ξ_0^m only, a map ĝ, with ĝ(χ_0^{m²}) depending on χ_0^{m²} only, and an event E^n_{cell OK} ∈ σ(ξ(z) | z ∈ [−cm, cm]) with c > 0 not depending on n, such that all of the following hold:
1) P(E^n_{cell OK}) → 1 when n → ∞;
2) for every ξ ∈ E^n_{cell OK} and n big enough, the conditional probability P(ĝ(χ_0^{m²}) ⊑ g(ξ_0^m) | S(m²) = m, ξ) is close to one;
3) conditional on E^n_{cell OK}, the components of g(ξ_0^m) are i.i.d. Bernoulli variables with parameter 1/2.
The mapping g can be interpreted as a coding that compresses the information contained in ξ_0^m; the mapping ĝ can be interpreted as a decoder that reads the information g(ξ_0^m) from the mixed-up observations χ_0^{m²}. The vector g(ξ_0^m) is the desired fingerprint of ξ_0^m. We call it the g-information. The function ĝ will be referred to as the g-information reconstruction algorithm. Let us explain the content of the above theorem in more detail. The event {ĝ(χ_0^{m²}) ⊑ g(ξ_0^m)} is the event that ĝ reconstructs the information g(ξ_0^m) correctly (up to the first or last bit), based on the observations χ_0^{m²}. The probability that ĝ reconstructs g(ξ_0^m) correctly is large given that the event {S(m²) = m} holds. The event {S(m²) = m} is needed to make sure the random walk S visits the whole of ξ_0^m up to time m². Obviously, if S does not visit ξ_0^m, we cannot reconstruct g(ξ_0^m). The reconstruction of the g-information works with high probability, but conditional on the event that the scenery is nicely behaved. The scenery ξ behaves nicely if ξ ∈ E^n_{cell OK}. In a sense, E^n_{cell OK} contains "typical" (pieces of) sceneries; these are sceneries for which the g-information reconstruction algorithm works with high probability. Condition 3) ensures that the content of the reconstructed information is large enough. Indeed, if the piece of observations χ_0^{m²} were generated far from ξ_0^m, i.e. if the random walk S started far from 0, then g(ξ_0^m) would be independent of χ_0^{m²}, and P(ĝ(χ_0^{m²}) ⊑ g(ξ_0^m)) would be about 2^{−n²}. On the other hand, if S starts from 0 and n is big enough, then from 1) and 2) it follows that

P(ĝ(χ_0^{m²}) ⊑ g(ξ_0^m) | S(m²) = m) > 3/4   (1.1)

and

P(ĝ(χ_0^{m²}) ⊑ g(ξ_0^m)) ≥ (3/4) P(S(m²) = m).
Since, by the local central limit theorem, P(S(m²) = m) is of order 1/m ≥ e^{−2n}, we get that P(ĝ(χ_0^{m²}) ⊑ g(ξ_0^m)) is at least of order e^{−2n}. Although for big n both 2^{−n²} and e^{−2n} are negligible, the difference between their orders can still be used to make the scenery reconstruction possible.

History and related problems
A coloring of the integers ξ : Z → {0, 1, . . . , C − 1} is called a C-color scenery. In a sense, scenery reconstruction started with the so-called scenery distinguishing problem, which can be described as follows: Let ψ_a and ψ_b be two non-equivalent sceneries which are known to us. Assume that we are only given one realization of the observations χ := ψ ∘ S, where ψ ∈ {ψ_a, ψ_b}. Can we a.s. find out whether ψ is equal to ψ_a or ψ_b? If yes, we say that the sceneries ψ_a and ψ_b are distinguishable. Benjamini and Kesten [BK96] considered the case where the sceneries ψ_a and ψ_b are drawn randomly; they take ψ_a to be an i.i.d. scenery which is independent of ψ_b. In this setting, they prove that almost every couple of sceneries is distinguishable, even in the two-dimensional case and with only 2 colors. Before that, Howard [How97] had shown that any two periodic non-equivalent sceneries are distinguishable. The problem of distinguishing two sceneries which differ only in one element is called the single defect detection problem. In [How96], Howard showed that single defects can always be detected in periodic sceneries observed along a simple random walk. Kesten [Kes96] showed that one can a.s. detect single defects in the case of 5-color i.i.d. sceneries. Kesten's question whether one can detect a single defect in 2-color sceneries led Matzinger to investigate the scenery reconstruction problem: Given only one path realization of {χ(t)}_{t∈N}, can we a.s. reconstruct ξ? In other words, does one path realization of χ a.s. uniquely determine ξ? In general, it does not: in many cases it is not possible to distinguish a scenery from a shifted one. Furthermore, Lindenstrauss [Lin99] proved the existence of sceneries which cannot be reconstructed. However, one can reconstruct "typical" sceneries: Matzinger takes ξ random, independent of S, and shows that one can a.s. reconstruct the scenery up to shift and reflection. In [Mat05] and [Mat99a], he proves this for 2-color sceneries observed along the path of a simple random walk or a simple random walk with holding. In [Mat99b], he reconstructs 3-color i.i.d. sceneries observed along a simple random walk path. The two cases require very different methods (for an overview of different techniques, see [ML06]). Later Kesten [Kes98] asked whether one can also reconstruct two-dimensional random sceneries. Löwe and Matzinger [LM02b] give a positive answer provided the scenery contains many colors. Another question was formulated first by den Hollander: to what extent can sceneries be reconstructed when they are not i.i.d.? Löwe and Matzinger [LM03] characterize those distributions for which Matzinger's 3-color reconstruction works. Yet another problem comes from Benjamini: is it possible to reconstruct a finite piece of a scenery close to the origin in polynomial time? Here one is given a number of observations that is polynomial in the length of the piece one tries to reconstruct. Matzinger and Rolles [MR03a; MR06] provide a positive answer. The scenery reconstruction problem varies greatly in difficulty depending on the number of colors and the properties of the random walk. In general, when there are fewer colors and the random walk is allowed to jump, the problem gets more difficult. Kesten [Kes98] noticed that Matzinger's reconstruction methods [Mat99a] and [Mat05] do not work when the random walk is allowed to jump. Matzinger, Merkl and Löwe [LMM04] showed that it is possible to a.s. reconstruct a scenery seen along the path of a random walk with jumps, provided the scenery contains enough colors. However, with only two colors the system behaves completely differently, which implies that the method of Matzinger, Merkl and Löwe is not useful for the 2-color case with jumps. The present paper is the first step towards reconstructing the 2-color scenery. Let us mention some more recent developments and related works. A generalization of the scenery reconstruction problem is the scenery reconstruction problem for error-corrupted observations. In that problem, there exists an error process ν_t, t ∈ N, and error-corrupted observations χ̃ such that χ̃_t equals the usual observation χ_t if and only if ν_t = 0. The error process is i.i.d. Bernoulli and independent of everything else. The problem now is: is it possible to reconstruct the scenery based on one realization of the process χ̃ only? Matzinger and Rolles [MR03b] showed that almost every random scenery seen with random errors can be reconstructed a.s. when it contains a lot of colors. However, their method cannot be used for the case of error-corrupted 2-color sceneries. Error-corrupted observations were also studied by Hart and Matzinger in [HM06]. A closely related problem is the so-called Harris-Keane coin tossing problem, introduced and studied by Harris and Keane in [HK97] and further investigated by Levin, Pemantle and Peres in [LPP01]. In the scenery reconstruction results above, the reconstructable scenery is a "typical" realization of a random (i.i.d. Bernoulli) scenery. A periodic scenery is not such a "typical" realization, so the abovementioned results do not apply to the case of periodic sceneries. Howard [How97] proved that all periodic sceneries observed along a simple random walk path can be reconstructed. This led Kesten to ask what happens when the random walk is not simple. In [LM06], Matzinger and Lember give sufficient conditions for a periodic scenery to be reconstructable when observed along a random walk with jumps. A problem closely related to the reconstruction of periodic sceneries is the reconstruction of sceneries with a finite number of ones. This problem was solved by Levin and Peres in [LP04], where they prove that every scenery which has only finitely many ones can a.s. be reconstructed, up to shift or reflection, when seen along the path of a symmetric random walk. They used the more general framework of stochastic sceneries. A stochastic scenery is a map η : Z → I, where I denotes a set of distributions. The observations are generated as follows: if at time t the random walk is at z, then a random variable with distribution η(z) is observed. Hence, at time t, we observe χ(t), where L(χ(t) | S(t) = z) = η(z); given S and η, the observations χ(t) for different t are independent of each other. Recently, Matzinger and Popov have studied continuous sceneries [MP07]. They define a continuous scenery as a configuration of countably many bells placed on R. In the continuous case, instead of a random walk, a Brownian motion is considered. Whenever the Brownian motion hits a bell, it rings. So, unlike in discrete scenery reconstruction, there are no colors: all the bells ring in the same way. The observations consist of the time lengths between successive rings. For a well-written overview of the scenery distinguishing and scenery reconstruction areas, we recommend Kesten's review paper [Kes98].
An overview of different techniques as well as of recent developments in scenery reconstruction can be found in [ML06]. Scenery reconstruction belongs to the field which investigates the properties of a color record obtained by observing a random medium along the path of a stochastic process. The T, T⁻¹-problem as studied by Kalikow [Kal82] is one motivation. The ergodic properties of the observations have been investigated by Keane and den Hollander [KdH86], den Hollander [dH88], den Hollander and Steif [dHS97] and Heicklen, Hoffman and Rudolph [HHR00]. An overview of the mentioned results as well as many others can be found in [dHS06].

Organization of the paper
In order to explain the main ideas behind the g-information reconstruction algorithm, we first consider a simplified example in Subsection 1.6. In this example, ξ is a 3-color i.i.d. scenery instead of a 2-color one. The 2's are quite rare in the scenery ξ: P(ξ(z) = 2) is of negative exponential order in n. The ones and zeros have equal probability: P(ξ(z) = 0) = P(ξ(z) = 1). The (random) locations z̄_i of the 2's in ξ are called signal carriers. For each signal carrier z̄_i, we define the frequency of ones at z̄_i; this is a weighted average of ξ in the neighborhood of z̄_i. The g-information g(ξ_0^m) is a function of the frequencies of ones of the different signal carriers located in the interval [0, m]. The vector of frequencies works as a fingerprint for ξ_0^m. The reading of this fingerprint works as follows. Typically, the signal carriers are a distance of order e^n apart from each other. Suppose that S visits a signal carrier. Before moving to the next signal carrier, it returns to the same signal carrier many times with high probability. By doing so, S generates many 2's in the observations at short distance from each other. This implies: when we see a cluster of 2's in the observations, there is good reason to believe that they all correspond to the same 2 in the underlying scenery. In this manner we can determine many return times of S to the same signal carrier. This enables us to make inference about ξ in the neighborhood of that signal carrier. In particular, we can precisely estimate the frequencies of ones of the different signal carriers visited. This allows us to estimate g(ξ_0^m). The estimator ĝ is the desired decoder. The details are explained in Subsection 1.6. However, it is important to note that between this simplified example and our general case there is only one difference: the signal carriers. In the general case we can no longer rely on the 2's, and the signal carriers need to be constructed in a different manner. Everything else, from the definition of g and ĝ up to the proof that the g-information reconstruction algorithm works with high probability, is exactly the same. (Note that the solution of our information recovery problem in the simplified 3-color case requires only five pages!) For the general case with a 2-color scenery and a jumping random walk, the main difficulty consists in the elaboration of the signal carriers. In Section 2, we define many concepts which are subsequently used for the definition of the signal carriers; some technical results connected to the signal carriers are also proved there. The signal carriers are defined in Section 3. The main goal of the paper is to prove that the g-information reconstruction algorithm works with high probability (i.e. that the estimator ĝ is precise). For this, we define two sets of events: the random walk dependent events and the scenery dependent events. All these events describe typical behavior of S or ξ. In Section 3, we define the scenery dependent events and prove that they have high probability. In Section 4 the same is done for the events that depend on S. In Section 5, we prove that if all these events hold, then the g-information reconstruction algorithm works, i.e. the event E^n_{g works} := {ĝ(χ_0^{m²}) ⊑ g(ξ_0^m)} holds. The results of Sections 3 and 4 then guarantee that the g-information reconstruction algorithm works with high probability. This finishes the proof of Theorem 1.1.

3-color example
In this subsection, we solve the scenery reconstruction problem in a simplified 3-color case. We do not change the assumptions on S.

Setup
Recall that ξ_0^m and χ_0^{m²} denote the piece of scenery ξ|[0, m] and the piece of observations χ|[0, m²], respectively. We aim to construct two functions g and ĝ such that (1.1) holds, implying that with high probability we can reconstruct g(ξ_0^m) from the observations, provided that the random walk S goes in m² steps from 0 to m.
Since this is not yet the real case, during the present subsection we will not be very formal. For this subsection only, let us assume that the scenery ξ has three colors instead of two. Moreover, we assume that {ξ(z)} satisfies all of the following three conditions:
a) the variables ξ(z), z ∈ Z, are i.i.d.;
b) exp(−n) ≤ P(ξ(0) = 2) ≤ exp(−n/ln n);
c) P(ξ(0) = 0) = P(ξ(0) = 1).
We define m := n^{2.5}/P(ξ(0) = 2). Because of b), this means n^{2.5} exp(n/ln n) ≤ m(n) ≤ n^{2.5} exp(n).
The scenery distribution so defined is very similar to our usual scenery, except that sometimes (quite rarely) 2's also appear. We now introduce some necessary definitions.
Let z̄_i denote the i-th place in [0, ∞) where we have a 2 in ξ. We make the convention that z̄_0 is the last location before zero where we have a 2 in ξ, and, for a negative integer i < 0, z̄_i designates the (i + 1)-th point before 0 where we have a 2 in ξ. The random variables z̄_i are called signal carriers. For each signal carrier z̄_i, we define the frequency of ones at z̄_i. By this we mean the conditional probability, given ξ, to observe a 1 exactly e^{n^{0.1}} time steps after the random walk was at z̄_i. We denote this conditional probability by h(z̄_i) and will also write h(i) for it. Formally:

h(i) := h(z̄_i) := P(ξ(S(e^{n^{0.1}}) + z̄_i) = 1 | ξ).
It is easy to see that the frequency of ones is equal to a weighted average of the scenery in a neighborhood of radius Le^{n^{0.1}} of the point z̄_i. That is, h(i) is equal to

h(i) = Σ_{z ∈ [−Le^{n^{0.1}}, Le^{n^{0.1}}]} μ_i(z) I{ξ(z̄_i + z) = 1}, where μ_i(z) := P(S(e^{n^{0.1}}) = z).   (1.2)

(Of course, for this formula to hold we assume that there are no 2's in [z̄_i − Le^{n^{0.1}}, z̄_i + Le^{n^{0.1}}] other than the one at z̄_i. This is very likely to hold; see event E^n_6 below.) We now define some events that describe the typical behavior of ξ.
• Let E^n_6 denote the event that in [0, m] all the signal carriers are further apart than exp(n/(2 ln n)) from each other, as well as from the points 0 and m. By the definition of P(ξ(i) = 2), we have P(E^n_6) → 1 as n → ∞.
• Let E^n_1 be the event that in [0, m] there are more than n² + 1 signal carrier points. Because of the definition of m, P(E^n_1) → 1 as n → ∞.
When E^n_1 and E^n_6 both hold, we define g(ξ_0^m) in the following way:

g(ξ_0^m) := (g_1(ξ_0^m), …, g_{n²+1}(ξ_0^m)), where g_i(ξ_0^m) := I_{[0,1/2)}(h(i)),

i.e. the i-th bit of the fingerprint records whether the frequency of ones at the i-th signal carrier in [0, m] is below 1/2. Conditional on E^n_1 ∩ E^n_6, we get that g(ξ_0^m) is an i.i.d. random vector with components being Bernoulli variables with parameter 1/2. Here the parameter 1/2 follows simply by the symmetry of our definition (to be precise, P(g_i(ξ_0^m) = 1) = (1 − P(h(i) = 1/2))/2, but we disregard this small error term in this example), and the independence follows from the fact that the scenery is i.i.d. and g_i(ξ_0^m) depends only on the scenery within radius Le^{n^{0.1}} of the point z̄_i; due to E^n_6, the points z̄_i are further apart than exp(n/(2 ln n)) > Le^{n^{0.1}}. Hence, with almost no effort we get that when E^n_1 and E^n_6 both hold, condition 3) is satisfied. To be complete, we have to define the function g so that 3) holds also outside E^n_1 ∩ E^n_6. We are actually not interested in g outside E^n_1 ∩ E^n_6; it would be enough to reconstruct g on E^n_1 ∩ E^n_6. Therefore, extend g in any possible way such that g(ξ_0^m) depends only on ξ_0^m and its components are i.i.d.
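The following sketch makes the coding map g concrete for this 3-color example. It is a toy version under stated assumptions: the small horizon T stands in for e^{n^{0.1}}, the step kernel μ is computed by exact convolution for an illustrative jumping walk, and the function names are ours, not the paper's.

```python
from collections import Counter

def step_kernel(T, steps=(-2, -1, 0, 1, 2)):
    """mu(z) = P(S(T) = z) for a walk with uniform steps, by convolution."""
    mu = {0: 1.0}
    for _ in range(T):
        nxt = Counter()
        for z, p in mu.items():
            for s in steps:
                nxt[z + s] += p / len(steps)
        mu = dict(nxt)
    return mu

def g_fingerprint(xi, T=30):
    """g(xi_0^m) in the 3-color example: for each signal carrier (a 2 in
    the scenery dict xi), one bit recording whether the weighted
    frequency of ones h(i) around it lies below 1/2."""
    mu = step_kernel(T)
    carriers = sorted(z for z, color in xi.items() if color == 2)
    bits = []
    for zbar in carriers:
        # h(i): weighted average of 1{xi = 1} around the carrier, cf. (1.2)
        h = sum(p * (xi.get(zbar + z) == 1) for z, p in mu.items())
        bits.append(1 if h < 0.5 else 0)
    return bits
```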

The ĝ-algorithm
We show how to construct a map ĝ, with values in {0, 1}^{n²}, depending only on the observations χ_0^{m²}, and an event E^n_{cell OK} ∈ σ(ξ) such that P(E^n_{cell OK}) is close to 1 and, for each scenery belonging to E^n_{cell OK}, the probability

P(ĝ(χ_0^{m²}) ⊑ g(ξ_0^m) | S(m²) = m, ξ)   (1.3)

is also high. Note that when the scenery ξ is fixed, the probability (1.3) depends on S only.
The construction of ĝ consists of several steps. The first step is the estimation of the frequency of ones h(i). Note that, due to E^n_6, in the region of our interest we may assume that all the signal carriers are further apart from each other than exp(n/(2 ln n)). In this case, all the 2's observed in a time interval of length e^{n^{0.3}} must come from the same signal carrier. We will thus take time intervals T of length e^{n^{0.3}} to estimate the frequency of ones.
Let T = [t_1, t_2] be a (non-random) time interval such that t_2 − t_1 = e^{n^{0.3}}. Assume that during time T the random walk is close to the signal carrier z̄_i. Then every time we see a 2 during T, this gives us a stopping time which stops the random walk at z̄_i. We can now use these stopping times to get a very precise estimate of h(i). In order to obtain independence (which makes the proofs easier), we do not take all the 2's which we observe during T; instead we take only 2's that are at least e^{n^{0.1}} apart from each other.
To be more formal, let us now give a few definitions. Let ν_{t_1}(1) denote the first time t > t_1 at which we observe a 2 in the observations χ. Let ν_{t_1}(k + 1) be the first time after ν_{t_1}(k) + e^{n^{0.1}} at which we observe a 2 in the observations χ; thus ν_{t_1}(k + 1) = min{t | χ(t) = 2, t ≥ ν_{t_1}(k) + e^{n^{0.1}}}. We say that we can significantly estimate the frequency of ones for T if there are more than e^{n^{0.2}} stopping times ν_{t_1}(k) during T; in other words, if and only if ν_{t_1}(e^{n^{0.2}}) ≤ t_2 − e^{n^{0.1}}. Let X̄_{t_1}(k) designate the Bernoulli variable which is equal to one if and only if χ(ν_{t_1}(k) + e^{n^{0.1}}) = 1.
When ν_{t_1}(e^{n^{0.2}}) ≤ t_2 − e^{n^{0.1}}, we define ĥ_T, the estimated frequency of ones during T, in the following obvious way:

ĥ_T := e^{−n^{0.2}} Σ_{k=1}^{e^{n^{0.2}}} X̄_{t_1}(k).

Suppose we can significantly estimate the frequency of ones for T, and assume E^n_6 ∩ E^n_1 holds. Then all the stopping times ν_{t_1}(1), …, ν_{t_1}(e^{n^{0.2}}) stop the random walk S at one and the same signal carrier, say z̄_i. Because of the strong Markov property of S, we then get that, conditional on ξ, the variables X̄_{t_1}(k) are i.i.d. with expectation h(i). Now, by Hoeffding's inequality,

P(|ĥ_T − h(i)| > e^{−n^{0.2}/4} | ξ) ≤ 2 exp(−2e^{n^{0.2}/2}),

so that, with high probability, ĥ_T is a precise estimate of h(i):

|ĥ_T − h(i)| ≤ e^{−n^{0.2}/4}.   (1.4)

The obtained precision of ĥ_T is of great importance: it is of smaller order than the typical variation of h(i). In other words, with high probability |h(i) − h(j)|, i ≠ j, is of much bigger order than exp(−n^{0.2}/4). To see this, consider (1.2). Note that, for each z, μ_i(z) is constant and, conditional on the event that within radius Le^{n^{0.1}} of z̄_i there are no 2's in the scenery other than the one at z̄_i, the variables ξ(z̄_i + z), z ≠ 0, are i.i.d. Bernoulli with parameter 1/2; hence, conditionally, the variance of h(i) equals Σ_{z∈[−Le^{n^{0.1}}, Le^{n^{0.1}}]} (1/4)(μ_i(z))².
Since our random walk is symmetric, we get that Σ_{z∈[−Le^{n^{0.1}}, Le^{n^{0.1}}]} (1/4)(μ_i(z))² is equal to 1/4 times the probability that the random walk is back at the origin after 2e^{n^{0.1}} steps.
By the local central limit theorem, that probability is of order e^{−n^{0.1}/2}. This is of much bigger order than the precision of the estimation of the frequencies of ones, e^{−n^{0.2}/4}. Since h(i) is approximately normal, it is possible to show that with high probability all the frequencies h(0), h(1), …, h(n² + 1) are more than exp(−n^{0.11}) apart from 1/2. A similar argument shows: if {z̄_i}_{i∈I} is the set of signal carriers that S encounters during the time [0, m²], then for each pair i, j ∈ I with i ≠ j, the frequencies of ones satisfy |h(i) − h(j)| > exp(−n^{0.11}). Let E^n_3 be the event on which both statements (the h(i)'s are more than exp(−n^{0.11}) apart from 1/2 and from each other) hold.
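A sketch of the estimator ĥ_T on observed data may help. Here `gap` plays the role of e^{n^{0.1}} (both the spacing of the stopping times ν_{t_1}(k) and the look-ahead) and `min_samples` plays the role of e^{n^{0.2}}; the names and the list-based bookkeeping are our assumptions.

```python
def estimate_frequency(chi, t1, t2, gap, min_samples):
    """hat-h_T: scan [t1, t2], collect 2's spaced at least `gap` apart,
    and record whether the observation `gap` steps later is a 1.
    Returns None when the frequency cannot be significantly estimated.
    Assumes t2 < len(chi) so that chi[t + gap] is always defined."""
    samples, t = [], t1
    while t <= t2 - gap:
        if chi[t] == 2:                       # stopping time nu_{t1}(k)
            samples.append(1 if chi[t + gap] == 1 else 0)
            t += gap                          # enforce spacing >= gap
        else:
            t += 1
    if len(samples) < min_samples:
        return None                           # cannot estimate significantly
    return sum(samples) / len(samples)
```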

Define E^n_{cell OK} := E^n_1 ∩ E^n_3 ∩ E^n_6. Since E^n_1, E^n_3 and E^n_6 all depend on ξ only, so does E^n_{cell OK}. From now on we assume that E^n_{cell OK} holds, and we describe the ĝ-construction algorithm in this case.

Phase I. Determine the intervals T ⊆ [0, m²] of length e^{n^{0.3}} containing more than e^{n^{0.2}} 2's (in the observations). Let T_j designate the j-th such interval. Recall that these are the intervals for which we can significantly estimate the frequency of ones. Let K designate the total number of such time intervals in [0, m²].
Let π(j) designate the index of the signal carrier z̄_i that the random walk visits during time T_j. (Due to E^n_6, the signal carriers are further apart than exp(n/(2 ln n)) > Le^{n^{0.3}} from each other and, hence, there is only one signal carrier that can be visited during time T_j; thus the definition of π(j) is correct.)

Phase II. Estimate the frequency of ones for each interval T_j, j = 1, …, K. Based on the observations χ_0^{m²} only, obtain the vector

(ĥ_{T_1}, …, ĥ_{T_K}) = (ĥ(π(1)), ĥ(π(2)), …, ĥ(π(K))).
Here ĥ(i) denotes the estimate of h(i) obtained from the time interval T_j with π(j) = i.
The further construction of the ĝ-reconstruction algorithm is based on an important property of the mapping π : {1, …, K} → Z.
Namely, with high probability, π is a skip-free walk, i.e. |π(j) − π(j + 1)| ≤ 1. Clearly, after being near the point z̄_i, S moves to the neighborhood of z̄_{i+1} or z̄_{i−1} (recall that on E^n_{cell OK}, the 2's are rather far from each other). Say it goes to the neighborhood of z̄_{i+1}. The important property of S is that, with high probability, before moving to the vicinity of the next 2 (located at z̄_{i+2} or z̄_i) it visits z̄_{i+1} sufficiently many times. This means that there exists a time interval [t_1, t_2] of length e^{n^{0.3}} such that ν_{t_1}(e^{n^{0.2}}) ≤ t_2 − e^{n^{0.1}}. For big n, this clearly holds if, after visiting z̄_{i+1} once, S visits z̄_{i+1} at least e^{n^{0.21}} more times within the next e^{n^{0.3}} − e^{n^{0.1}} steps; this can be proven to hold with high probability. Hence, during time [0, m²] the random walk is unlikely to go from one signal carrier to another without signaling all those in between. By signaling those in between, we mean producing in the observations, for each signal carrier z̄_i passed, a time interval for which one can significantly estimate the frequency of ones h(i). In other words, with high probability, the mapping π is a skip-free random walk. In particular, π(1) ∈ {0, 1}, i.e. π_* ≤ 1, where π_* := min{π(j) : j = 1, …, K} and π^* := max{π(j) : j = 1, …, K}.
If S(m²) = m, then by the event E^n_1 it holds that π^* > n².
Phase III. Apply clustering to the vector (ĥ_{T_1}, ĥ_{T_2}, …, ĥ_{T_K}), i.e. define

C_j := {ĥ_{T_i} : |ĥ_{T_i} − ĥ_{T_j}| ≤ exp(−n^{0.12})} and f̄_j := the average of the elements of C_j.

Formally there are K clusters. However, if E^n_3 holds and every estimate ĥ_T satisfies (1.4), then for any i, j either C_i = C_j or C_i ∩ C_j = ∅. To see this, note that |ĥ_{T_j} − ĥ_{T_i}| ≤ exp(−n^{0.12}) if and only if they estimate the frequency of the same signal carrier. Indeed, if ĥ_{T_j} and ĥ_{T_i} estimate the same signal carrier, then by (1.4) their difference is at most 2 exp(−n^{0.2}/4) < exp(−n^{0.12}). On the other hand, if ĥ_{T_i} and ĥ_{T_j} estimate h(i) ≠ h(j), respectively, then by E^n_3 and (1.4) their difference exceeds exp(−n^{0.11}) − 2 exp(−n^{0.2}/4) > exp(−n^{0.12}). Hence the clusters C_i and C_j coincide if and only if π(i) = π(j); otherwise they are disjoint. Thus f̄_j is the average of all estimates of h(π(j)) and, therefore, f̄_j is a good estimate of h(π(j)). Since C_i and C_j coincide if and only if π(i) = π(j), it obviously holds that

f̄_i = f̄_j if and only if π(i) = π(j).   (1.5)

After Phase III we therefore (with high probability) end up with a sequence of estimates f̄(z̄_{π(1)}), …, f̄(z̄_{π(K)}) that corresponds to the sequence of frequencies h(π(1)), …, h(π(K)). Equivalently, j ↦ f̄_j is a path of the skip-free random walk π on the set of distinct reals {f̄(z̄_{π_*}), …, f̄(z̄_{π^*})}.
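A sketch of the Phase III clustering under the separation guaranteed by E^n_3 and (1.4): estimates of the same carrier differ by much less than the threshold, estimates of different carriers by much more, so naive pairwise grouping suffices. The function name and signature are ours.

```python
def cluster_estimates(h_hats, tol):
    """For each raw estimate, average all estimates within `tol` of it.
    Under the separation assumption this returns bar-f_j, the averaged
    estimate of h(pi(j)), with f_i == f_j iff pi(i) == pi(j)."""
    f_bar = []
    for h in h_hats:
        cluster = [x for x in h_hats if abs(x - h) <= tol]
        f_bar.append(sum(cluster) / len(cluster))
    return f_bar
```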

Real scenery reconstruction algorithm
We now present the so-called real scenery reconstruction algorithm, A^R_n. This algorithm recovers the sequence of frequencies up to a (shift by) one element.
The algorithm works due to particular properties of π and {f̄(z̄_{π_*}), …, f̄(z̄_{π^*})}. These properties are:
A1) π(1) ∈ {0, 1}, i.e. the first estimated frequency of ones, f̄_1, must be an estimate of either h(1) or h(0). Unfortunately, there is no way to find out which one of the two signal carriers z̄_0 or z̄_1 was visited first. This is why our algorithm can reconstruct the g-information up to the first or last bit only.
A2) π(K) > n². This is true because we condition on S(m²) = m and we assume that there are at least n² + 1 2's in [0, m] (event E^n_1).
A3) π is skip-free (it does not jump).
Let f̄ = (f̄_1, …, f̄_K) be the vector of reals obtained in Phase III; by the above, the number of distinct reals in f̄ is at least n² + 1. The vector f̄ is the input for A^R_n.
• Define R_1 := f̄_1.
• From here on we proceed by induction on j: once R_j is defined, we define R_{j+1} := f̄_s, where s := 1 + max{i : f̄_i = R_j}.
• Proceed until j = n² + 1 and put A^R_n(f̄) := (R_2, …, R_{n²+1}).
The idea of the algorithm is very simple: take the first element f̄_1 of f̄, consider all elements of the input vector f̄ that are equal to f̄_1, and find the one with the biggest index (the last f̄_1). Let j_1 be this index. Then take f̄_{j_1+1} as the first output. By A1), f̄_1 is either f̄(z̄_0) or f̄(z̄_1); by A2) and A3), f̄_{j_1+1} is either f̄(z̄_1) or f̄(z̄_2). Now look for the last occurrence of the value f̄_{j_1+1}; let the corresponding index be j_2 and take f̄_{j_2+1} as the second output. By A2) and A3), f̄_{j_2+1} is either f̄(z̄_2) or f̄(z̄_3) (depending on whether the first output was f̄(z̄_1) or f̄(z̄_2)). Proceed n² times in this way. As a result, one obtains one of the following vectors: (f̄(z̄_1), …, f̄(z̄_{n²})) or (f̄(z̄_2), …, f̄(z̄_{n²+1})).
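A direct transcription of A^R_n into code (a sketch; we assume, as in A1)-A3), that the input is long enough and skip-free, so the index `last + 1` never runs off the end):

```python
def real_scenery_algorithm(f_bar, n_out):
    """A_R^n: given the values bar-f_1, ..., bar-f_K read along a
    skip-free walk, output n_out consecutive level values. By A1) the
    output starts at the level of carrier 1 or of carrier 2."""
    out, current = [], f_bar[0]
    for _ in range(n_out):
        # index of the last occurrence of the current level's value
        last = max(j for j, x in enumerate(f_bar) if x == current)
        current = f_bar[last + 1]   # after its last visit, the walk moves on
        out.append(current)
    return out

# With distinct reals a, b, c, d, the skip-free record [a,b,a,b,c,b,c,d]
# yields real_scenery_algorithm([a,b,a,b,c,b,c,d], 3) == [b, c, d].
```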

This means

A^R_n(f̄) ∈ {(f̄(z̄_1), …, f̄(z̄_{n²})), (f̄(z̄_2), …, f̄(z̄_{n²+1}))}.   (1.6)

Phase IV. Apply A^R_n to f̄ and denote the output A^R_n(f̄) by (f_1, …, f_{n²}). By (1.6), f_i equals f̄(z̄_i) or f̄(z̄_{i+1}) for every i = 1, …, n². Now recall that we are interested in reconstructing g_i(ξ_0^m) := I_{[0,1/2)}(h(i)) rather than h(i) itself. Thus, having the estimates f̄(z̄_i) for the h(z̄_i), we use the obvious estimator for g_i:

ĝ_i := I_{[0,1/2)}(f_i).   (1.7)

Recall that, because of E^n_3, with high probability all the random variables h(1), …, h(n² + 1) are more than exp(−n^{0.11}) apart from 1/2. Since exp(−n^{0.11}) is much bigger than the precision of our estimates, with high probability we have f̄(z̄_i) < 1/2 if and only if h(z̄_i) < 1/2. By (1.7), this means ĝ(χ_0^{m²}) := (ĝ_1, …, ĝ_{n²}) ⊑ g(ξ_0^m). Hence, when E^n_{cell OK} holds, ĝ is properly defined and the probability (1.3) is high; in particular, by choosing n big enough, it can be proven to be greater than 3/4. Since we are not interested in ĝ beyond E^n_{cell OK}, we extend the definition of ĝ arbitrarily.
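Composing the phases gives the whole decoder; here is a sketch using the helper functions sketched above (the names and the default tolerance are ours):

```python
def g_hat(h_hats, n_sq, tol=1e-3):
    """Phases III-IV: cluster the raw frequency estimates, extract the
    skip-free level sequence, and threshold at 1/2 to obtain the bits."""
    f_bar = cluster_estimates(h_hats, tol)      # Phase III
    freqs = real_scenery_algorithm(f_bar, n_sq) # Phase IV, step 1
    return [1 if x < 0.5 else 0 for x in freqs] # estimator (1.7)
```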

Whole truth about signal probabilities
In the previous section we considered the case where the scenery has three colors: {0, 1, 2}. The locations of the 2's were called signal carriers; the i-th such place was denoted by z̄_i. In reality we have only two colors, 0 and 1. Thus, we need to show that with 2 colors we can still define signal carriers z̄_i in such a way that all of the following holds:
a) Whenever the random walk passes by a signal carrier, we can recognize from the observations that the random walk is close to a signal carrier (with high probability).
b) The probability of being misled by the observations, i.e. of inferring that at a certain time one is close to a signal carrier when one is not, is so small that with high probability this type of mistake never happens up to time m².
c) When we pass a signal carrier, we are able to estimate its frequency of ones with high precision (with high probability).
In the present section, we define and investigate an important concept that leads to the signal carriers: the Markov signal probability.

Definitions
In this subsection, we define the main notions of the section: the delayed signal probability, the strong signal probability and the Markov signal probability. We also give a few equivalent characterizations of these concepts and try to explain their meaning. At the end of the subsection we give a formal definition of the frequency of ones.
• Let D ⊂ Z and let ζ : D → {0, 1}. For example, ζ can be the scenery ξ or the observations χ. Let T = [t_1, t_2] ⊂ D be an integer interval of length at least 3. Then we say that T is a block of ζ if and only if

ζ(t_1) ≠ ζ(t_1 + 1) = ζ(t_1 + 2) = ⋯ = ζ(t_2 − 1) ≠ ζ(t_2).

We call t_2 − t_1 the length of the block T. The point t_1 is called the beginning of the block. For example, T is a block of ζ with length 4 if ζ|T = 01110.
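A block finder in code may help; it returns the blocks of a 0/1 sequence as (t_1, t_2) pairs in the sense of the definition above (the big blocks of the paper are then those with t_2 − t_1 > n/ln n). The function name is ours.

```python
def blocks(zeta):
    """Blocks of a 0/1 sequence zeta: maximal constant runs flanked on
    both sides by the other color, returned as (t1, t2) index pairs with
    zeta[t1] != zeta[t1+1] = ... = zeta[t2-1] != zeta[t2]."""
    found, run_start = [], 0
    for t in range(1, len(zeta)):
        if zeta[t] != zeta[t - 1]:
            if run_start > 0:          # the run is flanked on the left, too
                found.append((run_start - 1, t))
            run_start = t
    return found

print(blocks([0, 1, 1, 1, 0]))         # [(0, 4)]: one block of length 4
```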
• Let T = T(χ) ⊂ N be a time interval, possibly depending on the observations. For example, T can be a block of χ, or T = [t, t + n²] can be a time interval of length n² + 1 such that χ(t) = χ(t + 1) = ⋯ = χ(t + n²). Let I ⊂ Z be an integer interval (a location set). We say that T was generated (by S) on I if and only if S(t) ∈ I for all t ∈ T.
• We now define the delayed signal probability. To simplify the notation afterwards, define I_z := [z − Ln^{1000}, z + Ln^{1000}]. Fix z ∈ Z and let S_z denote the random walk translated by z, i.e. S_z(t) := S(t) + z for all t ∈ N. We define the random variable δ^d_z in the following way:

δ^d_z := P(ξ(S_z(n^{1000} − n²)) = ξ(S_z(n^{1000} − n² + 1)) = ⋯ = ξ(S_z(n^{1000})) | ξ).

In other words, δ^d_z is the conditional probability (conditional on ξ) to observe only one color in the time interval [n^{1000} − n², n^{1000}] if the random walk starts at z. During time n^{1000} the random walk cannot move farther than Ln^{1000}. Thus, δ^d_z depends only on the scenery ξ in the interval [z − Ln^{1000}, z + Ln^{1000}].
We have that δ^d_z is a random variable which is measurable with respect to σ(ξ(s) | s ∈ I_z). Since the distribution of ξ is translation invariant, the distribution of δ^d_z does not depend on z.
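A Monte Carlo sketch of δ^d_z: conditionally on a fixed scenery, we start independent copies of the walk at z and count how often the observations show a single color in the window [M − run_len, M]. The true scales (M = n^{1000}, run_len = n²) are far beyond simulation, so the parameters below are purely illustrative, as is the step law.

```python
import random

def delayed_signal_prob(xi, z, M, run_len, trials=10_000, seed=0):
    """Estimate delta^d_z = P(chi constant on [M - run_len, M] | xi)
    for a walk started at z. The scenery dict xi must cover the window
    [z - 2*(M + 1), z + 2*(M + 1)] for the steps used here."""
    rng = random.Random(seed)
    steps = [-2, -1, 0, 1, 2]                 # illustrative jumping walk
    hits = 0
    for _ in range(trials):
        pos, window = z, []
        for t in range(M + 1):
            if t >= M - run_len:
                window.append(xi[pos])        # record chi(t) = xi(S_z(t))
            pos += rng.choice(steps)
        hits += (len(set(window)) == 1)       # only one color observed
    return hits / trials
```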
Next we define the Markov signal probability at z. Roughly speaking, the Markov signal probability at z, denoted by δ^M_z, is the conditional (on ξ) probability to have (at least) n² + 1 times the same color generated on I_z exactly M := n^{1000} − n² time steps after we observe n² + 1 times the same color generated on I_z. In this formulation, the part "after we observe a string of n² + 1 times the same color generated on I_z" needs to be clarified. The explanation is the following: every time there appear in the observations n² + 1 times the same color generated on I_z, we introduce a stopping time ν_z(i). The positions of the random walk at these stopping times define a Markov chain with state space I_z. As we will prove later, this Markov chain {S(ν_z(k))}_{k≥1} converges very quickly to a stationary measure, say μ_z. So, by "M time after we observe n² + 1 times the same color generated on I_z" we actually mean "M time after starting the random walk from an initial position distributed according to μ_z". Since the distribution of S(ν_z(i)) converges quickly to μ_z, δ^M_z is close to the probability of observing n² + 1 times the same color generated on I_z exactly M time steps after time ν_z(i). In other words, δ^M_z is close to the conditional (on ξ) probability of the event that we observe only one color in the time interval [ν_z(i) + n^{1000} − n², ν_z(i) + n^{1000}] and that during that time interval the random walk S is in I_z. Thus (for i big enough) δ^M_z is close to

P(χ|[ν_z(i) + n^{1000} − n², ν_z(i) + n^{1000}] is constant and generated on I_z | ξ).   (2.4)

The ergodic theorem then implies that, in the long run, the proportion of stopping times ν_z(i) which are followed after M time steps by n² + 1 observations of the same color generated on I_z converges a.s. to δ^M_z. Actually, to make some subsequent proofs easier, we do not take a stopping time ν_z(i) after every occurrence of n² + 1 observations of the same color generated on I_z; rather, we require that the stopping times be at least e^{n^{0.1}} apart.
In order to prove how quickly we converge to the stationary measure, we also view the explained notions in terms of a regenerative process. The renewal times will be the stopping times, denoted by ϑ_z(k), which stop the random walk at the point z − 2Le^{n^{0.1}}. To simplify some proofs, we also require that there is at least one stopping time ν_z(i) between ϑ_z(k) and ϑ_z(k + 1). Thus ϑ_z(0) denotes the first visit by the random walk S to the point z − 2Le^{n^{0.1}}. We define ν_z(1) to be the first time after ϑ_z(0) at which there happen to be n² + 1 times the same color generated on I_z. Then ϑ_z(1) is the first return of S to z − 2Le^{n^{0.1}} after ν_z(1), and so on. Let us give the formal definitions of all the introduced notions.
• Let ϑ_z(0) denote the first visit of S to the point z − 2Le^{n^{0.1}}: ϑ_z(0) := min{t ≥ 0 | S(t) = z − 2Le^{n^{0.1}}}.
• Let ν_z(1) designate the first time after ϑ_z(0) at which we observe n² + 1 zeros or ones in a row, generated on I_z. More precisely,

ν_z(1) := min{t ≥ ϑ_z(0) | χ(t) = χ(t − 1) = ⋯ = χ(t − n²) and S(t − n²), S(t − n² + 1), …, S(t) ∈ I_z}.

Once ν_z(i) is well defined, define ν_z(i + 1) in the following manner:

ν_z(i + 1) := min{t ≥ ν_z(i) + e^{n^{0.1}} | χ(t) = χ(t − 1) = ⋯ = χ(t − n²) and S(t − n²), S(t − n² + 1), …, S(t) ∈ I_z}.
• Let ϑ_z(k), k ≥ 1, denote the consecutive visits of S to the point z − 2Le^{n^{0.1}} such that between two consecutive visits the random walk S generates (at least once) n² + 1 consecutive 0's or 1's on I_z. More precisely,

ϑ_z(k + 1) := min{t > ϑ_z(k) | S(t) = z − 2Le^{n^{0.1}} and ϑ_z(k) < ν_z(i) ≤ t for some i}.

Basically, the definition above says: if ϑ_z(k) is defined, we wait until we observe n² + 1 same colors generated on I_z. Since S(ϑ_z(k)) = z − 2Le^{n^{0.1}}, the first occurrence of n² + 1 same colors generated on I_z cannot happen earlier than e^{n^{0.1}} time steps after ϑ_z(k). This means that this first occurrence also cannot happen earlier than e^{n^{0.1}} time steps after the last stopping time ν_z, say ν_z(i), before ϑ_z(k). Thus the first occurrence of n² + 1 same colors generated on I_z after ϑ_z(k) is actually ν_z(i + 1). After observing ν_z(i + 1), we just wait for the next visit of S to z − 2Le^{n^{0.1}}; this defines ϑ_z(k + 1).
• Let X_{z,i}, i = 1, 2, …, designate the Bernoulli variable which is equal to one if and only if exactly after time M the stopping time ν_z(i) is followed by a sequence of n² + 1 ones or zeros generated on I_z. More precisely, X_{z,i} = 1 if and only if χ|[ν_z(i) + M, ν_z(i) + M + n²] is constant and generated on I_z.
• Define κ_z(0) := 0. Let κ_z(k) designate the number of stopping times ν_z(i) occurring during the time from ϑ_z(0) to ϑ_z(k). Thus κ_z(k) is defined by the inequalities

ν_z(κ_z(k)) < ϑ_z(k) < ν_z(κ_z(k) + 1).

For all k, S(ϑ_z(k)) = z − 2Le^{n^{0.1}}. Hence, for all i, ϑ_z(k) ≠ ν_z(i), and the inequalities above are indeed strict.
• Define the following variables:

Z_z(k) := κ_z(k) − κ_z(k − 1), X_z(k) := Σ_{i=κ_z(k−1)+1}^{κ_z(k)} X_{z,i}, k = 1, 2, ….

Thus Z_z(k) is the number of stopping times occurring during the time interval from ϑ_z(k − 1) to ϑ_z(k). Note that Z_z(k) ≥ 1 for all k. The random variable X_z(k) designates the number of those among these stopping times which, during the same time interval, were followed exactly after time M by a sequence of n² + 1 0's or 1's generated on I_z. Note that, conditional on ξ, the variables X_z(1), X_z(2), … are i.i.d., and the same holds for Z_z(1), Z_z(2), ….
• Fix ξ and z. Let Y_i := S(ν_z(i)), i = 1, 2, …, denote the Markov chain obtained by stopping the random walk S at the times ν_z(i). The state space of Y_i is I_z. Because of the nature of S, the chain Y_i has a finite state space and is irreducible and aperiodic; therefore, it is an ergodic Markov chain.
• Let μ_z denote the stationary distribution of {Y_k}. In the present section z is fixed, so we write μ. The measure μ is a discrete probability measure on I_z, so μ = (μ(j))_{j∈I_z}. For each state j ∈ I_z, define the hitting times τ_j(l), l = 1, 2, 3, …; formally, τ_j(1) := min{k ≥ 1 : Y_k = j} and τ_j(l + 1) := min{k > τ_j(l) : Y_k = j}.
• We define

δ^M_z := Σ_{j∈I_z} μ(j) P(χ|[M, M + n²] is constant and generated on I_z | S(0) = j, ξ).   (2.5)

We call δ^M_z the Markov signal probability at z.
In the following we give some equivalent forms of (2.5). Note that, conditional on ξ, X_{z,i} is a regenerative process with respect to the renewal times κ_z(k). Hence, conditioning on ξ, we have

δ^M_z = E(X_z(1) | ξ) / E(Z_z(1) | ξ).   (2.6)

We count (up to time r) all sequences of n² + 1 ones or zeros generated on the interval I_z according to the stopping times ν_z(i), i = 1, 2, …. Among such sequences, the proportion of those which are followed after exactly time M by another sequence of n² + 1 zeros or ones generated on the interval I_z converges a.s. to δ^M_z as r goes to infinity.
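The ergodic characterization above suggests the following empirical sanity check (a sketch under our own naming; it uses the simulated walk path to decide which runs are generated on I_z, so it is not something an observer of χ alone could compute):

```python
def markov_signal_estimate(chi, walk, Iz, run_len, M, spacing):
    """Among spaced stopping times at which run_len+1 equal colors were
    generated on Iz, return the fraction followed exactly M steps later
    by another such run generated on Iz (empirical delta^M_z)."""
    def run_ends_at(t):
        if t < run_len:
            return False
        window = range(t - run_len, t + 1)
        return (len({chi[u] for u in window}) == 1
                and all(walk[u] in Iz for u in window))

    nus, last = [], -spacing
    for t in range(len(chi) - M - run_len):
        if t >= last + spacing and run_ends_at(t):   # stopping time nu_z(i)
            nus.append(t)
            last = t
    if not nus:
        return None
    # X_{z,i}: another run generated on Iz occupying [nu + M, nu + M + run_len]
    return sum(run_ends_at(t + M + run_len) for t in nus) / len(nus)
```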
We now define the frequency of ones. To be consistent with the Markov signal probability, we formally define the frequency of ones in terms of regenerative processes. However, we also derive an analogue of (2.11), which explains the meaning of the notion.
• Let U_{z,i} := ξ(S(ν_z(i) + e^{n^{0.1}})), i = 1, 2, …, and define U_z(k) := Σ_{i=κ_z(k−1)+1}^{κ_z(k)} U_{z,i}. The quantity

h_z := E(U_z(1) | ξ) / E(Z_z(1) | ξ)

is called the frequency of ones at z. As in (2.6), conditioning on ξ, we have, with the same argument as above, that the proportion of stopping times ν_z(i) which are followed exactly e^{n^{0.1}} time steps later by the observation of a 1 converges a.s. to h_z (2.12). Now it is easy to see that, in terms of U and S as in (2.11), i.e. with U and S independent and U having law μ_z, we have

h_z = Σ_{j∈I_z} μ_z(j) P(ξ(S_j(e^{n^{0.1}})) = 1 | ξ).   (2.13)

Auxiliary results
In the present section we investigate the relations between δ^M_z and δ^d_z. Note that both depend only on the scenery ξ in the interval [z − Ln^{1000}, z + Ln^{1000}]; in other words, δ^M_z and δ^d_z are measurable with respect to σ(ξ(s) | s ∈ I_z). The distributions of δ^M_z and δ^d_z do not depend on the particular choice of z. Hence, without loss of generality, in the following we consider the point z = 0 only.
We call a block big if its length is bigger than n/ln n.
Proposition 2.1. For any c_δ ∈ [p_M, 2p_M], the following statements hold:
a) P(δ^d_z ≥ c_δ) < exp(−ln(1.5) n/ln n) for n big enough;
b) P(δ^d_z ≥ 2p_M) > exp(−n);
c) if all blocks of ξ|[z − Ln^{1000}, z + Ln^{1000}] are shorter than n/ln n + 1, then δ^d_z < c_δ;
d) with high probability, the interval [z − Ln^{1000}, z + Ln^{1000}] contains at most 0.5 ln n big blocks of ξ. More precisely, P(E^c_{δ,z}) ≤ (2Ln^{1000})^{0.5 ln n} (0.5)^{0.5n}, where

E_{δ,z} := {[z − Ln^{1000}, z + Ln^{1000}] has less than 0.5 ln n big blocks of ξ}.

In order to prove Proposition 2.1, we use the following lemma, whose proof can be found in [LMM04].
Lemma 2.1. There exists a constant a > 0 such that for each t, r ∈ N, for each subset I ⊂ Z, for each j ∈ I and for every mapping ζ : Z → {0, 1}, the following implication holds: if all blocks of ζ in I are shorter than or equal to r, then

P(ζ(S_j(t)) = ζ(S_j(t + 1)) = ⋯ = ζ(S_j(t + n²)) and S_j(t), …, S_j(t + n²) ∈ I) ≤ exp(−an²/r²).

Proof that c) holds. Without loss of generality assume z = 0. Suppose that the length of all blocks of ξ|[−Ln^{1000}, Ln^{1000}] is at most n/ln n. Let I := [−Ln^{1000}, Ln^{1000}]. Denote δ(j) := δ_0(j), where δ_0(j) is as in (2.7). If all the blocks in I are not longer than n/ln n, we get by Lemma 2.1 that for all j ∈ I,

δ(j) ≤ exp(−an²/(n/ln n)²) = n^{−a ln n}.
By (2.10) we get that

δ^d_0 ≤ max_{j∈I} δ(j) ≤ n^{−a ln n}.   (2.14)

The expression on the right side of the last inequality is of smaller order than any negative polynomial order in n. By the local central limit theorem, p_M is of order M^{−1/2}, i.e. of polynomial order in n. Thus, for n big enough, δ^d_0 < p_M ≤ c_δ.

Proof that a) holds. Without loss of generality assume z = 0. By c), the event {δ^d_0 ≥ c_δ} implies that ξ|[−Ln^{1000}, Ln^{1000}] has at least one block of length at least n/ln n + 1. The probability that such a block begins at a fixed location is at most 2^{−n/ln n}. So,

P(δ^d_0 ≥ c_δ) ≤ (4Ln^{1000} + 1) exp(−ln(2) n/ln n).   (2.15)

The dominating term in the product on the right side of (2.15) is exp(−ln(2) n/ln n). Hence, for n big enough, the expression on the right side of (2.15) is smaller than exp(−ln(1.5) n/ln n).
Proof that b) holds. It suffices to prove that P(δ^d_0 ≥ 2p_M) ≥ 0.5^n. Define E := {ξ(j) = 1 for all j ∈ [0, n − 1]}. Now, because of the central limit theorem, there is a constant b > 0 not depending on n such that for all j ∈ [n/3, 2n/3] we have, on E,

δ(j) ≥ P(S_j(t) ∈ [0, n − 1] for all t ∈ [0, n²]) ≥ b.   (2.16)

By the local central limit theorem, again, for all j ∈ [n/3, 2n/3] we have, for n big enough, that P(S(M − n²) = j) ≥ d/√M with d > 0 not depending on n. Using (2.10) and (2.16) we find that when E holds, then

δ^d_0 ≥ Σ_{j∈[n/3, 2n/3]} P(S(M − n²) = j) δ(j) ≥ (n/3)(d/√M) b.   (2.17)

For n big enough, the right side of (2.17) is obviously bigger than 2p_M. This proves E ⊆ {δ^d_0 ≥ 2p_M}. Furthermore, we have that P(E) = 0.5^n. The inequality 0.5^n > exp(−n) finishes the proof.
Proof that d) holds. Without loss of generality assume z = 0. For a block T, the point inf T is called the beginning of the block. Let t_1, t_2, … denote the beginnings of the consecutive big blocks in [−Ln^{1000}, ∞). Define t_0 := −Ln^{1000} and g_i := t_i − t_{i−1}, i = 1, 2, …. So g_i measures the distance between consecutive big blocks. Clearly, the g_i are i.i.d. Note that E^c_{δ,0} implies g_i ≤ 2Ln^{1000} for all i = 1, …, 0.5 ln n. Hence,

P(E^c_{δ,0}) ≤ [P(g_1 ≤ 2Ln^{1000})]^{0.5 ln n} = (2Ln^{1000})^{0.5 ln n} (0.5)^{0.5n}.

Combining this with b), we obtain the statement of d).

Proof of Lemma 2.2
In the present subsection we prove Lemma 2.2. Until the end of the section we assume z = 0. First we define fences.

Fences
Applying b) of Proposition 2.1, we get a lower bound for P(δ^d_0 ≥ c_δ). Define C to be the set of all pieces of scenery η with the following properties: δ^d_0(η) is bigger than c_δ, the number of big blocks of η is less than 0.5 ln n, and the gaps between the breakpoints of the consecutive fences in I are at most Cn.
Let η ∈ C and let z_0, z_1, …, z_N be the breakpoints of the consecutive fences (restricted to I) of η. Since η ∈ E_b, we have N ≥ 2Ln^{999}. Now partition the interval I into the consecutive intervals I_1, …, I_{N+1} determined by the breakpoints z_0, …, z_N. (2.19) We shall call the partition (2.19) the fence-partition corresponding to η. The fences guarantee that any block of η that is longer than L is a proper subset of one interval I_k. Since η ∈ C, there is at least one and at most 0.5 ln n big blocks. Let I*_k, k = 1, …, N*, N* ≤ 0.5 ln n, denote the k-th interval containing at least one big block. Similarly, let I^o_k, k = 1, …, N + 1 − N*, denote the k-th interval with no big blocks. Clearly, most of the intervals I_k are without big blocks; in particular, Σ_k l(I^o_k) > Ln^{1000}. To summarize: to each η ∈ C there corresponds a unique fence-partition, a unique labelling of the intervals according to the blocks, and, therefore, a unique j^o. We now define a mapping B : C → O which rearranges the intervals of the fence-partition, moving the star-intervals closer to the origin; we also define the corresponding permutation Π_η. Since all big blocks of η are contained in the intervals I*_k, the mapping B keeps all big blocks unchanged and just moves them closer to the origin. The mapping B is clearly not injective. However, B(η_1) = B(η_2) implies that the fence-partitions corresponding to η_1 and η_2 consist of the same intervals, in a possibly different order. Also the intervals with big blocks (marked with a star) are the same, but possibly differently located. Moreover, the orderings of the similarly marked intervals corresponding to η_1 and η_2 are the same (i.e. if the 8th interval, I_8, of the partition corresponding to η_1 is the 20th interval, I_20, of the partition corresponding to η_2, then their marks are the same; if I_8 is, in its partition, the seventh interval marked o (I_8 = I^o_7 in the partition corresponding to η_1), then the same interval in the second partition must also be the seventh interval marked o (I_20 = I^o_7 in the partition corresponding to η_2)). Therefore, the partitions corresponding to η_1 and η_2 differ in the locations of the star-intervals only. Since the number of intervals is smaller than 2Ln^{1000} and the number of star-intervals is at most 0.5 ln n, the number of different partitions with the properties described above is less than (2Ln^{1000})^{0.5 ln n}. This means

|B(C)| (2Ln^{1000})^{0.5 ln n} > |C|.
By (2.18) and d) of Proposition 2.1, we get the required estimates, provided n is big enough. These relations yield the statement of Lemma 2.2. The lemma is proved.
Let J* := Π_η(I*), i.e. J* is the union of all intervals with big blocks in their new location. The length of I* (and, therefore, that of J*) is at most 0.5Cn ln n. Thus, J* is at most Cn + 0.5Cn ln n from the origin. Let n be so big that Cn + 0.5Cn ln n ≤ n². Then j ≤ n² for each j ∈ J*. Now from (2.22) and (2.23) we get Proposition 2.3.

Proposition 2.3. For any ς ∈ B(C) we have:

Proof. We use the notation and the results of the previous proof. By the representation (2.8) we have

δ^M_0 = Σ_{i∈I} μ(i) δ_0(i),

where μ = {μ(i)}_{i∈I} is the stationary measure of Y_k = S(ν_0(k)), k = 1, 2, ….
Use the local central limit theorem (CLT in the sequel) to estimate (2.25), where d and c are constants not depending on n.
Hence, because of (2.24), (2.22) and (2.25), we obtain (2.26). We now estimate μ(J*).

Estimation of μ(J*). Fix j ∈ I and define ν as the first time after e^{n^{0.1}} at which n² + 1 consecutive 0's or 1's are generated on I. Thus, it suffices to estimate P(S_j(ν) ∈ J*).
At first note that, by (2.22) and (2.23), we get Σ_{j∈J*} δ^η_0(j) → 1. Since |J*| ≤ n² (and n is big enough), we deduce the existence of a point j* ∈ J* at which δ^η_0 is correspondingly large. Then, because of the fences, we have the corresponding bound. Now, let τ_k be the k-th visit after time e^{n^{0.1}} − n² to the interval I, and let τ*_k be the k-th visit after time e^{n^{0.1}} − n² to the point j*. Define the events F′_k := {S_j(τ_k + i) = j*}, k = 1, 2, ….
We consider the events E_1, E_2, E_3. The event E_1 ensures that within the first n^{2020} visits of S_j to I no n² + 1 consecutive 0's or 1's were generated on I\J*. The event E_2 ensures that before time τ_{n^{2020}} − n² the random walk visits the point j* at least n^{10} times. Finally, the event E_3 ensures that among these n^{10} visits to j*, at least one is the beginning of n² consecutive 0's or 1's. If these events hold, then ν ≤ τ_{n^{2020}} and S_j(ν) ∈ J*. Thus

P(S_j(ν) ∈ J*) ≥ P(E_1 ∩ E_2 ∩ E_3).

Next, we give upper bounds for the probabilities P(E^c_1), P(E^c_2), P(E^c_3).
1) Note that E^c_1 ⊆ ∪_{k=1}^{n^{2020}} F_k, which implies P(E^c_1) ≤ Σ_{k=1}^{n^{2020}} P(F_k). There are no big blocks in I\J*; hence, by the argument of c), for each k,

P(F_k) ≤ n^{−a ln n},

implying that P(E^c_1) ≤ n^{2020 − a ln n}.
2) To estimate P(E_2) we use Hoeffding's inequality. By the central limit theorem there exists a constant p > 0 not depending on n such that P(F′_k) ≥ p. Also note that F′_k and F′_l are independent if |k − l| ≥ n^{2000}. Hence the family {F′_k}, k = 1, …, n^{2020}, contains a subfamily {F′_{k_i}}, i = 1, …, n^{20}, consisting of independent events. Let X_i := I_{F′_{k_i}}. Now, τ_{n^{2018}} + n^{2000} ≤ τ_{n^{2019}} ≤ τ_{n^{2020}} − n², if n is big enough, and when n is big enough the Hoeffding bound applies.
3) Let {F*_{k_i}}, i = 1, …, n^7, be a subset of {F*_k} consisting of independent events only. By (2.27), P(F*_k) > 1/n³ for all k. Now

P(E^c_3) ≤ (1 − 1/n³)^{n^7}.   (2.28)

The right side of (2.28) is smaller than (0.5)^{n^4} if n is big enough.

Corollaries
We determine the critical value c_r. Since we choose it within the interval [p_M, 2p_M], it has all the properties stated in Proposition 2.1 and Lemma 2.2. However, we also have to ensure that, with high probability, the signal probabilities δ^d_z and δ^M_z are significantly away from c_r. By "significantly" we mean that the difference between these probabilities and c_r is bigger than a polynomially small quantity in n. This polynomially small quantity will be denoted by ∆. Thus, c_r must be chosen properly, and that will be done with the help of Corollary 2.2.
At first, some preliminary observations.
Proof. We argue by contradiction. Assume, on the contrary, that no such interval exists. Then, for an interval of length l as above, the assumed upper bound holds. Now, by b) of Proposition 2.1, we obtain (2.30). Since (1 − 1/n^j)^{n^j} < e^{−1}, we have (1 − 1/n^j)^{n^{j+2}} < e^{−n²}. Thus (2.30) implies e^{−n} < e^{−n²}, a contradiction.
Proof. Suppose that such a (sub)interval does not exist. Then, following the argument of the previous proof, we get (2.34). For n big enough, the right side of (2.34) is bigger than e^{−2n}. This contradicts (2.33).
The following corollary specifies c_r and ∆: it shows that they can be chosen so that, for n big enough, (2.35), (2.36) and (2.37) hold simultaneously. We now consider the interval [a, (a + b)/2]. It follows that (2.36) holds. Since u = c_r − ∆, we also have that (2.37) holds. It only remains to show that the chosen c_r also satisfies (2.35).

Scenery-dependent events
In the present section we define and investigate the signal points and Markov signal points. We show that, with high probability, the locations of the signal points follow a certain clustering structure. This structure gives us the desired signal carriers in the 2-color case.

Signal points
We are now going to define the Markov signal points, strong signal points and signal points; these are the location points where the corresponding signal probabilities are above the critical value c_r. The Markov signal points form the core of the signal carriers; the (strong) signal points will be used in our proofs. In an oversimplified way, we could say that the Markov signal points are places in the scenery ξ where the conditional probability to see some rare, unusual pattern in the observations is above c_r. The unusual pattern is basically a string of n² + 1 zeros or ones.
In the present subsection, with the help of the signal points, we define many other important notions, and we also investigate their properties.
In the following, ∆ and c_r are as in Corollary 2.2.
• We call a Markov signal point z regular if δ^M_z > c_r + ∆.
• The interval N_z is called the neighborhood of z. We say that the neighborhood of z is empty if N_z does not contain any block of ξ longer than n^{0.35}. Thus {N_z is empty} ∈ σ(ξ_i, i ∈ N_z).
• Let z_{j,k} denote the midpoints of the intervals I_{k,j}. Hence z_{j,1} = j, z_{j,2} = j + 2Ln^{1000} + 1, …, z_{j,k} = j + 2(k − 1)Ln^{1000} + (k − 1).
• Since the intervals I_{z_{j,k}} are disjoint, the events {z_{j,k} is a Markov signal point}, k = 1, 2, …, are independent with the same probability p := P(δ^M_0 > c_r − ∆).
• Let k′ denote the integer-valued random variable giving the index of the first interval I_{k,0} whose midpoint is a Markov signal point. By such a counting we disregard the first interval; thus k′ > 1 and, formally, k′ is defined by the relations δ^M_{z_{0,k′}} > c_r − ∆ and δ^M_{z_{0,k}} ≤ c_r − ∆ for 1 < k < k′. Clearly, k′ − 1 is a geometric random variable with parameter p and, hence, Ek′ = 1/p + 1.
• Let Z be the location of the first Markov signal point after 2Ln^{1000}. Recall that z̄_1 is the location of the first Markov signal point after 0. Note that, for each i ≥ 0, the distances between consecutive Markov signal points can be compared with Z.
• Take m(n) := ⌈n^{2.5} EZ⌉. Then, provided n is big enough, the interval [0, m] contains, with high probability, more than n² + 1 Markov signal points.
• Next, we define the random variables which we will use later:

X_z := I_{δ^M_z > c_r − ∆}, z = 0, 1, 2, ….

Thus X_z indicates whether z is a Markov signal point or not. The random variables X_z are identically distributed with mean p.
• We estimate the number of Markov signal points in [0, cm], where c > 1 is a fixed integer not depending on n. For this, define

E_0 := {Σ_{z=0}^{cm} X_z ≤ n^{10000}}.

Thus, when E_0 holds, the interval [0, cm] contains at most n^{10000} Markov signal points.

Scenery-dependent events
Next, we describe the typical behavior of the signal points in the interval [0, cm]; here c > 1 is a fixed integer not depending on n. Among other things we show that, with high probability, for any two distinct signal carrier points z̄_i, z̄_j in [0, cm], the corresponding frequencies of ones h(z̄_i), h(z̄_j) differ by more than e^{−n^{0.11}} (events Ē^n_3 and Ē^n_4 below). We also show that, with high probability, all signal points in [0, cm] have empty neighborhoods.
All the properties listed below depend on the scenery ξ only. Therefore we refer to them as the scenery dependent events.
We now define all the scenery dependent events Ē^n_1, …, Ē^n_9 and prove the convergence of their probabilities. All the events will be defined on the interval [0, cm], where c > 1 is a fixed integer. Thus, if a point z is such that N_z ⊄ [0, cm], then by the neighborhood of z we mean N_z ∩ [0, cm]. This means Ē^n_i ∈ σ(ξ_z : z ∈ [0, cm]). The exact value of c will be fixed in the next chapter (in connection with the event E^n_{2,S}); throughout this chapter, c is an arbitrary fixed integer bigger than 1. Since E_s ∩ E^n_{1a} ⊆ Ē^n_1, it suffices to show that P(E^n_{1a}) → 1; to see this, we use the Markov inequality. For each z, the events {N_z is empty} and {δ^M_z > c_r − ∆} are independent, and we obviously have P(N_z is empty) = P(N_0 is empty). From (2.36) of Corollary 2.2 and (3.7) we then get

P((Ē^n_5)^c) ≤ cmp·n^{−10^5} ≤ c(n^{2.5}EZ + 1)p·n^{−10^5} = 3cLn^{1002.5−10^5} + cp·n^{−10^5} → 0 as n → ∞.
In the next subsection we present some technical preliminaries related to the proof.

Some preliminaries
Let S be a symmetric random walk with span 1, so that its lattice is Z, and let σ² denote the variance of its step distribution. Define p_N(k) := P(S(N) = k).
Let μ be a probability distribution on {−t_N, −t_N + 1, …, 0, …, t_N − 1, t_N} and consider the convolutions of μ with p_N. If p_N(k) ≥ p_N(k + 1) for all k ≥ 0, then for each k > t_N we have two-sided bounds on the convolution in terms of p_N; from (3.16), taking j = t_N, corresponding bounds can be deduced. More generally, choose an atom λ := μ_j > 0; then (3.21) holds and, in particular, (3.22) follows from (3.21). Suppose that the arrays u_k := u_N(k) and v_k := v_N(k), t_N < k ≤ LN + t_N, both satisfy (3.22); then they are comparable. Let us make one more observation: for each N big enough there exists a constant c > 0 such that the corresponding bound holds. Taking λ as previously, there exists C > 0 such that u(l) ≥ C/√N for all l with |l + j| ≤ 3t_N. Define the random variables z_1, z_2, … as follows: z_1 is the first Markov signal point in [0, ∞), and z_k is the first Markov signal point in [z_{k−1} + e^{n^{0.3}}, ∞). Note that a.s. there are infinitely many such points.
From the signal carrier part we know that if each Markov signal point in [0, cm] has an empty neighborhood, i.e. Ē^n_2 holds, then the Markov signal points form clusters of radius at most 2Ln^{1000} lying at least e^{n^{0.3}} apart from each other. In this case all signal carrier points in [0, cm] coincide with the z_i's defined above. We define the corresponding event and bound its probability. Let z_i, z_j, i ≠ j; for simplicity denote them by z and z′, and let ε_n := exp(−n^{0.11}).
f_n := … z + L(n^{1000} + e^{n^{0.1}}) … Note that, conditioning on ξ_n, the coefficients u_n(k) become constants. (More precisely, f_n has the same distribution as …, where (u_n(k)) are the fixed coefficients of type (3.17), with N = e^{n^{0.1}}, b = 10000.) Hence …

Now the Berry–Esseen inequality for independent random variables (see [Pet95], Thm. 3, p. 111) states: … Here Φ stands for the standard normal distribution function.
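For orientation, the classical Berry–Esseen bound for independent, not necessarily identically distributed, summands Y_1, …, Y_N with finite third moments and B_N := Σ_k Var(Y_k) reads as follows (a standard statement of the cited result; the paper presumably applies it to weighted sums with the fixed coefficients u_n(k)):

\[
\sup_{x \in \mathbb{R}} \Big| P\Big( \frac{1}{\sqrt{B_N}} \sum_{k=1}^{N} (Y_k - EY_k) \le x \Big) - \Phi(x) \Big| \;\le\; \frac{A}{B_N^{3/2}} \sum_{k=1}^{N} E|Y_k - EY_k|^{3},
\]

where A is an absolute constant.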
• In the following we consider the scenery-dependent events defined on [−cm, cm]. For this, we define the events Ẽ^n_i, i = 1, …, 9, where Ẽ^n_i is defined exactly as Ē^n_i, with [−cm, 0] instead of [0, cm].
• Finally, we define the events E^n_i := Ẽ^n_i ∩ Ē^n_i. The results of the present section show that, for all i = 1, …, 9, P(E^n_i) → 1 as n → ∞.

What is a signal carrier?
Let us briefly summarize the main ideas of the previous sections.
A signal carrier is a place in the scenery where the probability to generate a block of n² + 1 times the same color is high. However, it is clear that such a place cannot be too small. In the 3-color example the signal carrier depends on only one bit of the scenery. In the 2-color case, it takes many more bits to make the scenery (locally) atypical. We saw in Proposition 2.1 that for z to be a signal point, it is necessary that the interval I_z contains at least one big (longer than n/ln n) block of ξ. Thus, whether a point z is a (Markov, strong) signal point or not depends on ξ|I_z.
If z is a signal point, then the scenery ξ is atypical in the interval I_z: δ^d_z is high. Thus, signal points would be our candidates for the signal carriers if, for each z, we could estimate δ^d_z. The latter would be easy if we knew when the random walk visits z: we would just take all such visits and consider the proportion of those visits that are followed by n² + 1 same colors after M steps. Unfortunately, we do not know when the random walk S visits z. But we do know (we observe) when S generates blocks of length at least n². Thus we can take these observation times as the visits of (the neighborhood of) z and estimate the probability of generating n² + 1 times the same color M steps after previously observing n² + 1 times the same color. This idea yields the Markov signal probability.

The problem now is to localize the area where the random walk (during a given time period) can generate n² + 1 times the same color in the observations. If this area were too big, we could neither estimate the Markov signal probability nor understand where we are. To localize the described area, we showed (event E^n_2) that signal points have empty neighborhood. In the next section we shall see that the probability to generate a block of n² + 1 times the same color on the empty neighborhood is very small. This means that if S is close to a signal point z, then, with high probability (and during a certain time period), all strings of n² + 1 times the same color in the observations are generated on I_z. The fact that all signal points also have empty borders (events E^n_8 and E^n_9) makes the latter statement precise.

Thus, a Markov signal point seems to be a reasonable signal carrier. But which one? Note that if z is a Markov signal point, i.e. I_z contains at least one big block, then very likely the point z + 1 is a Markov signal point, too. In other words, Markov signal points come in clusters. However, when E^n_2 holds, then each point in such a cluster has empty neighborhood. On the other hand, for z to be a Markov signal point, it is necessary to have at least one big block of ξ in I_z. This means that the diameter of every cluster of Markov signal points is at most 2Ln^{1000}. The distances between the clusters are at least L(e^{n^{0.3}} − n^{1000}). Hence, in the 2-color case one can think of signal carriers as clusters of Markov signal points (provided E^n_2 holds, but this holds with high probability).

However, to make some statements more formal, for each cluster we have one representative, namely the signal carrier point. Since the diameters of the clusters are at most 2Ln^{1000}, our definition of signal carrier points ensures that different signal carrier points belong to different clusters. If the cluster is located in [0, ∞), then the signal carrier point is the leftmost Markov signal point in the cluster; if the cluster is located in (−∞, 0), then the signal carrier point is the rightmost Markov signal point in the cluster. The event E^n_7 ensures that there are no Markov signal points in the 2Ln^{1000}-neighborhood of 0, so z̄_1 and z̄_0 belong to different clusters, too. The exact choice of a signal carrier point is irrelevant. However, it is important to note that, given a cluster, everything that makes this cluster a signal carrier cluster (namely, the big blocks of scenery) is inside the interval I_z̄, where z̄ is the signal carrier point corresponding to the cluster. In particular, all blocks in the observations that are longer than n² will be generated on I_z̄.
This means that the signal carrier points z̄_i (or the corresponding intervals I_{z̄_i}) serve as signal carriers as well, at least if we are able to estimate δ^M_{z̄_i} with great precision. This is the subject of the next section.

Events depending on random walk
In the previous section we saw that if all scenery-dependent events hold, then the signal carrier points are good candidates for the signal carriers. In this case the signal is an atypically high Markov signal probability. Hence, to observe this signal, we must be able to estimate the Markov signal probability. In the present section we define these estimators, and in the next section we will see that they perform well if the random walk S behaves typically. We describe the typical behavior of S in terms of several events depending on S. The main objective of the present section is to show that the (conditional) probability of such events tends to 1 as n tends to infinity.

Some preliminaries
As argued in Subsection 3.4, the main idea of the estimation of the Markov signal probability is very simple: given a time interval T, consider all blocks in the observations χ|T that are bigger than n². Among these, calculate the proportion of blocks that, after exactly M steps, are followed by another such block. The time interval used for this estimation must be big enough to yield a precise estimate but, on the other hand, it must be in correspondence with the size of an (empty) neighborhood. Recall that the neighborhood N_z consists of two intervals of length Le^{n^{0.3}}. Hence, the optimal size of the interval T is e^{n^{0.3}}.
We now define the necessary concepts related to the described estimate: stopping times (that stop when a string of at least n² + 1 times the same color is observed) and the Bernoulli variables that show whether the stopping times are followed (after M steps) by another such string or not. For technical reasons, after stopping the process we wait at least e^{n^{0.1}} steps until we look for the next block. A sketch of the resulting estimator is given after the definitions below.
• Let X_{t,i} be the Bernoulli random variable that is one if and only if: …

• We define some analogues of ν̄_t and X_{t,i}. Let z ∈ Z and t ∈ N. Let ν_{z,t}(1) designate the first time after t where we observe n² zeros or ones in a row, generated on the interval I_z. More precisely: ν_{z,t}(1) := min{s > t : χ(s) = χ(s − 1) = ⋯ = χ(s − n²), S(j) ∈ I_z for all j = s − n², …, s}.
• Let X_{z,t,i}, i = 1, 2, …, designate the Bernoulli variable which is equal to one if exactly after time M the stopping time ν_{z,t}(i) is followed by a sequence of n² + 1 ones or zeros generated on I_z. More precisely, X_{z,t,i} = 1 if and only if χ(ν_{z,t}(i) + M) = χ(ν_{z,t}(i) + M + 1) = ⋯ = χ(ν_{z,t}(i) + M + n²) and … The empirical mean δ̂^M_{z,t} of the variables X_{z,t,i} then gives a very precise estimate of δ^M_z. The problem is that the random variables X_{z,t,i} and, hence, the estimate δ̂^M_{z,t} are a priori not observable. This is because we cannot observe whether a string of n² + 1 times the same color in the observations is generated on I_z or not. Thus, we can observe neither ν_{z,t}(i) nor X_{z,t,i}. However, the event E^n_{3,S}, stated below, ensures that with high probability δ̂^M_{z,t} is the same as δ̂^M_T, provided that during the time interval T the random walk S is close to z (the sense of closeness will be specified later).
• We define the estimates for the frequency of ones. Again, we define a general, observable estimate ĥ_t and its theoretical, a priori not observable counterpart ĥ_{z,t}.

• Finally, we define the stopping times that stop the walk when a new signal carrier is visited.
Let …, z̄_{−1}, z̄_0, z̄_1, … denote the signal carrier points in Z. Denote I_i := I_{z̄_i} and let ρ(k) denote the time of the k-th visit of S to one of the intervals I_i, counted in the following manner: when an interval I_i is visited, then the next stop is on a different interval.
More precisely, let ρ(0) be the first time t ≥ 0 such that S(t) ∈ ∪_i I_i. Denote by I(ρ(k)) the interval I_i visited at time ρ(k). Then define ρ(k) inductively: ρ(k + 1) := min{t > ρ(k) : S(t) ∈ ∪_i I_i \ I(ρ(k))}.
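To make the construction concrete, here is a minimal Python sketch of the observable estimator δ̂^M_T described above. It simplifies the paper's exact definition via the stopping times ν̄_t(i): block_len plays the role of n² + 1, wait plays the role of the e^{n^{0.1}}-step waiting rule, and all names are ours, for illustration only.

    def run_end_times(chi, block_len):
        # Times s at which chi[s-block_len+1 .. s] is constant, i.e. a string
        # of block_len equal colors ends at s.
        ends, run = set(), 1
        for s in range(1, len(chi)):
            run = run + 1 if chi[s] == chi[s - 1] else 1
            if run >= block_len:
                ends.add(s)
        return ends

    def estimate_delta(chi, t0, t1, block_len, M, wait):
        # Sketch of the estimator on T = [t0, t1]: among the stopping times in T
        # where a constant block ends (consecutive stops at least `wait` apart),
        # return the proportion followed, exactly M steps later, by another
        # constant block of the same length.
        ends = run_end_times(chi, block_len)
        stops, last = [], None
        for s in sorted(ends):
            if t0 <= s <= t1 and (last is None or s - last >= wait):
                stops.append(s)
                last = s
        if not stops:
            return 0.0
        hits = sum(1 for s in stops if s + M + block_len - 1 in ends)
        return hits / len(stops)

The event E^n_{3,S} is what guarantees that this observable quantity agrees with its theoretical counterpart δ̂^M_{z,t} when S stays close to z.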

Random walk-dependent events
In this section, we define the events that characterize the typical behavior of the random walk S on a typical scenery on the interval [−cm, cm]. The (piece of) scenery ξ|[−cm, cm] is typical if it satisfies all the scenery-dependent events E^n_i, i = 1, …, 9. Recall that the events E^n_i are the same as the events Ē^n_i defined in Section 4.2 with [0, cm] replaced by [−cm, cm]. Also recall that c > 1 is an arbitrary fixed constant not depending on n, and m = n^{2.5}EZ. Hence, throughout the section we consider the sceneries belonging to the set E^n_{cell OK} := ∩_{i=1}^{9} E^n_i. Clearly, E_{cell OK} depends on n. We know that P(E_{cell OK}) → 1 as n → ∞. Let P(·|ψ) denote P(·|ξ = ψ). We now list the events that describe the typical behavior of S. The objective of the section is to show: if n is big and ψ_n := ψ ∈ E_{cell OK}, then all listed events have big conditional probabilities P_ψ. The events depending on the random walk are: …

We now estimate the conditional probabilities of all listed events. In most cases we prove statements like P_ψ(E^n_{j,S}) → 1. This means: for an arbitrary sequence ψ_n ∈ E^n_{cell OK}, we have lim_{n→∞} P(E^n_{j,S}|S(m²) = m, ξ = ψ_n) = 1.
Proof. First note that, for each n, the event E^n_{2,S} is independent of the scenery ψ. Thus, … We now find c, not depending on n, such that P_ψ(E^{n,c}_a(c)), P_ψ(E^{n,c}_b(c)) ≤ ε/2. Let us define the stopping time ϑ: … Let, for all j ∈ {1, …, L}: … Our random walk S is symmetric. By the reflection principle, for all j ∈ {1, …, L}, we have: p_j = P(S(m²) = cm + j + (cm + j − m) = 2cm + 2j − m, ϑ ≤ m² and S(ϑ) = cm + j).
Thus p_j ≤ P(S(m²) = 2cm − m + 2j) and … By the LCLT, for big m, the right side of (4.7) can be made arbitrarily small in comparison with P(S(m²) = m) by taking c big enough. In other words, there exists c, not depending on n, such that: …
Note that the choice of c does not depend on n. From now on, we fix c such that Proposition 4.1 holds with ε = 1/8. This particular c is used in the definition of all scenery-dependent events and, therefore, in the definition of E_{cell OK}, as well as in the definitions of E^n_{4,S} and E^n_{5,S}.
In what follows, we often use the following versions of the Hoeffding inequality. Let X_1, …, X_N be independent random variables with range in [a, b], and let S_N denote their sum. Then: … (4.8) For our random walk, this gives: … for some d′, d > 0.
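For completeness, recall the classical two-sided form of Hoeffding's inequality for such X_1, …, X_N (a standard statement, of which (4.8) is a version):

\[
P\big( |S_N - E S_N| \ge t \big) \;\le\; 2 \exp\Big( -\frac{2t^2}{N(b-a)^2} \Big), \qquad t > 0.
\]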
We also use the following results. Let X_1, …, X_N be i.i.d. random variables with mean 0 and finite variance σ²; then, by the invariance principle, max_{1≤k≤N} (X_1 + ⋯ + X_k)/(σ√N) converges in distribution to sup_{0≤t≤1} W_t, where W_t is standard Brownian motion. It is well known that for all x > 0, P(sup_{0≤t≤1} W_t ≤ x) = 2Φ(x) − 1.
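The last identity follows from the reflection principle for Brownian motion:

\[
P\Big( \sup_{0 \le t \le 1} W_t > x \Big) = 2\, P(W_1 > x) = 2\big(1 - \Phi(x)\big), \qquad\text{so}\qquad P\Big( \sup_{0 \le t \le 1} W_t \le x \Big) = 2\Phi(x) - 1.
\]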
The right side of the last inequality tends to 0 as n → ∞. Relation (4.13) now finishes the proof.
In Section 2.1 the stopping times ϑ_z(k), ν_z(i), as well as the random variables X_{z,i}, were used to define the random variables κ_z(k), X_z(k) and Z_z(k). The latter were used to define δ^M_z. We now fix an arbitrary time t and define the counterparts of all the above-mentioned stopping times and random variables starting from t. In Section 4.1 we already defined the t-counterparts of ν_z(i) and X_{z,i}, namely ν_{z,t}(i) and X_{z,t,i}, i = 1, 2, …. Recall that in the definition of ν_{z,t}(1) the starting point ϑ_z(0) was replaced by t; the induction for ν_{z,t}(i), i = 2, 3, …, is the same as the one for ν_z(i). The Bernoulli random variables X_{z,t,i} were defined exactly as X_{z,i}, with the stopping times ν_{z,t}(i) instead of the ν_z(i)'s.
• Let ϑ_{z,t}(0) = t and let … We use ϑ_{z,t}(k) to define the t-analogues of κ_z, Z_z and X_z.
We are now going to show that for each ξ, t, z, the first e^{n^{0.2}} observations of X_{z,t,i} are enough to estimate δ^M_z(ξ) very precisely, i.e. δ̂^M_{z,t} is close to δ^M_z.
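To see the order of magnitude one should expect, here is a sketch of the bound under the simplifying assumption that the X_{z,t,i} were i.i.d. Bernoulli variables with mean δ^M_z (the actual proof has to deal with their dependence). With N = e^{n^{0.2}} observations and precision ε_n = e^{−n^{0.11}}, Hoeffding's inequality would give

\[
P\big( |\hat{\delta}^M_{z,t} - \delta^M_z| \ge e^{-n^{0.11}} \big) \;\le\; 2 \exp\big( -2 e^{-2n^{0.11}} e^{n^{0.2}} \big) = 2 \exp\big( -2 e^{\,n^{0.2} - 2n^{0.11}} \big) \longrightarrow 0,
\]

since n^{0.2} grows faster than 2n^{0.11}.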

Second step
We now show that P(E^c_{7,a}), P(E^c_{7,b}) and P(E^c_{7,c}) are of order o(exp(−n^{1000})).
Taking t = e^{n^{0.1}}, (4.25) yields: … To estimate P(E^c_{7,b}) and P(E^c_{7,c}) we use the Hoeffding inequality. Fix l ∈ [e^{n^{0.2}}/a, e^{n^{0.2}}]. By (4.8) we have: … On the other hand, since the X^a_k, k ≥ 2, are i.i.d., we have: … Thus, … where K = 2^{36}. Now, … and … (4.28) The same bound holds for P(E^c_{7,c}).
Fix ψ_n ∈ E^n_{cell OK}. Define the following events:

E_a(k) := {S visits I(ρ(k)) more than e^{n^{0.22}} times}, k = 0, 1, …,

and E_a := ∩_{k=1}^{25000} E_a(k). Also define … Now, clearly, on E_a(k) we have τ_k(e^{n^{0.21}}) ≤ ρ(k) + e^{n^{0.3}} − 2e^{n^{0.1}}. Thus E^n_{5,S} holds if … We now prove that P_ψ(E^c_a) → 0 and P_ψ(E^c_b) → 0.
By the Hoeffding inequality we obtain that, for a constant K > 0: …

Proof that P_ψ(E^c_a) → 0. This proof is a little tricky because, unlike in the other proofs, P(E_a|ψ_n) is much bigger than P(S(m²) = m).
Let L = n^{100000} and consider the event: … Here and in the rest of the proof we assume (without loss of generality) that all ratios and exponents are integers. Also define … The event E_c means that no stopping time ρ(k) occurs in the time … S visits I(ρ(k)) more than e^{n^{0.22}} times … We show that the probability P(E_a|E^n_{1,S}, ψ_n) can be very well approximated by the probability P(E^*_a|C, ψ_n), and that the latter goes to 0 when n → ∞. We proceed in three steps. … 2) Second, use the inequalities: P(E^{*c}_a ∩ E^n_{1,S} ∩ C|ψ_n) ≤ P(E^c_a ∩ E^n_{1,S} ∩ C|ψ_n) ≤ P(E^{*c}_a ∩ E^n_{1,S} ∩ C|ψ_n) + P(E^c_c ∩ E^n_{1,S}|ψ_n).

Combinatorics of g and ĝ
In this section we show: if all scenery-dependent events and random walk-dependent events hold, then our estimates δ̂^M_T and ĥ_t are precise. This means that we can observe our signals and, just like in our 3-color example, we can estimate the g-function.
Let us first give the definition of the g-function in the 2-color case.

Definition of g
In this subsection we give a formal definition of the function g. The function g depends on n. When n is fixed, we choose m = n^{2.5}EZ, where the random variable Z is the location of the first Markov signal point after 2Ln^{1000} in ξ. We consider the signal carrier points z̄_1, z̄_2, …, in [0, m]. Define the following subset of {0, 1}^{m+1}:

E^* := {ψ ∈ {0, 1}^{m+1} : z̄_1(ψ) ≥ L(e^{n^{0.1}} + n^{1000}), z̄_{n²+1} ≤ m − L(e^{n^{0.1}} + n^{1000})}.
Here z̄_i(ψ) = ∞ if the piece of scenery ψ has fewer than i signal carrier points.
The algorithm for computing ĝ has 5 phases; it differs from the ĝ-reconstruction algorithm for the 3-color case (Subsection 1.6) in the first step only. The rest of the construction is the same.

Main proof
Next, we prove the main result: when all previously stated events hold, then the ĝ-algorithm works, i.e. ĝ(χ_0^{m²}) ⊑ g(ξ_0^m). Recall that E^n_{cell OK} = ∩_{i=1}^{9} E^n_i. Similarly define the intersection of the random walk-dependent events: E^n_S := ∩_{i=1}^{8} E^n_{i,S}. Finally, let E_{g-works} be the event that ĝ works, i.e.:

E_{g-works} := {ĝ(χ_0^{m²}) ⊑ g(ξ_0^m)}.    (5.4)
At first we show that step 1 in the definition of ĝ works properly, i.e. a time interval T is selected (i.e. δ̂^M_T > c_r) only if during the time T the random walk is close to a unique signal carrier point z̄. The closeness is defined in the following sense: we say that during the time period T the random walk S is close to z if there exists s ∈ T such that S(s) ∈ I_z. … is equal to ĝ(ψ) up to the first or last bit.
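Since "equal up to the first or last bit" is exactly the relation ⊑, it may help to state it operationally. A minimal Python check (the function name is ours, for illustration only):

    def is_sub(a, b):
        # a is "sub" b: a equals b with its first or its last element removed.
        a, b = list(a), list(b)
        return a == b[1:] or a == b[:-1]

For instance, is_sub([1, 1, 0], [0, 1, 1, 0]) and is_sub([0, 1, 1], [0, 1, 1, 0]) both return True.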
Let χ_0^{m²} be the observations. Apply the ĝ-construction algorithm. 1) At the first step we pick the intervals T_1 = [t_1, t_1 + e^{n^{0.1}}], …, T_K = [t_K, t_K + e^{n^{0.1}}] such that δ̂^M_{T_j} > c_r for each j = 1, …, K. By Lemma 5.1 we know that each interval T_j corresponds to exactly one signal carrier point, say z̄_{π(j)}.
All these properties hold because of E^n_{4,S} ∩ E^n_{5,S}. Indeed, during the time interval [0, m²] the random walk starts at 0 and, according to the event E^n_{1,S}, ends at m. Let z̄_1, …, z̄_u denote all signal carrier points of ψ in [0, m]. By E^n_1, u > n². The maximal length of a jump of S is L and, therefore, on its way S visits all intervals I_{z̄_1}, …, I_{z̄_u}. Recall that the stopping times ρ(k) denote the first visits of a new interval (the first visit of the next interval, not necessarily new for the past). By E^n_{4,S} ∩ E^n_{5,S}, for each k such that ρ(k) < m² we have: there are at least e^{n^{0.2}} stopping times ν̄_{ρ(k)}(i) in T := [ρ(k), ρ(k) + e^{n^{0.3}} − e^{n^{0.1}}]. Let z̄ be the signal carrier point such that S(ρ(k)) ∈ I_z̄. Thus the assumptions of Proposition 5.1 hold and δ̂^M_T = δ̂^M_{z̄,t}. Moreover, by (5.6) we have that δ̂^M_T > c_r, i.e. the interval T will be selected in the first step of the ĝ-reconstruction. To summarize: the random walk starts at 0; by convention the first signal carrier point in [0, ∞) is z̄_1, and the biggest signal carrier point in (−∞, 0] is z̄_0. From Lemma 5.1 we know that during T_1, S must be close to a signal carrier point. On the other hand, [ρ(0), ρ(0) + e^{n^{0.3}}] is the first time interval during which S is close to a signal carrier point. We know that this interval will be selected. Hence π(1) ∈ {0, 1}.
On its way S visits all signal carrier intervals I_{z̄_1}, …, I_{z̄_u}. Right after the first visit of a new signal carrier, at time ρ(k), the random walk produces an interval T = [ρ(k), ρ(k) + e^{n^{0.3}}] that will be selected. Together with Lemma 5.1, the latter yields that π is skip-free.
Recall that z̄_u is the last signal carrier point in [0, m]. Thus, the last signal carrier interval S visits during [0, m²] is I_{z̄_u} or I_{z̄_{u+1}}. By E^n_7 we know that z̄_u lies in [0, m − Le^{n^{0.3}}]. Hence, if S(ρ(k)) ∈ I_{z̄_u}, then [ρ(k), ρ(k) + e^{n^{0.3}}] will be selected. We get that the last selected interval corresponds to a signal carrier that is at least z̄_{n²+1}. Thus π(K) ≥ n² + 1.
The rest of the algorithm was already explained in Subsection 1.6.3. However, in the following we give a slightly more formal explanation.
Proof of Theorem 1.1. Fix c > 0 such that Proposition 4.1 holds for ε = 1/8. Use this particular c to define all scenery-dependent events as well as all random walk-dependent events.
The intersection of all scenery-dependent events is E^n_{cell OK}. In Section 3.2 we proved that P(E^n_{cell OK}) → 1. Hence 1) holds. Now consider the event E^n_S. Use Theorem 5.2 to find an integer N_1 < ∞ such that (5.4) holds for each n > N_1. Then, for each n > N_1 and ψ_n ∈ E^n_{cell OK}, we have P(ĝ(χ_0^{m²}) ⊑ g(ξ_0^m)|S(m²) = m, ξ = ψ_n) ≥ P(E^n_S|S(m²) = m, ξ = ψ_n) = P_ψ(E^n_S).
In Section 4.3 we proved that lim inf_n P_ψ(E^n_S) ≥ 1 − 1/8. Let N_2 be so big that P_ψ(E^n_S) > 3/4 for all n > N_2. Take N := N_1 ∨ N_2. With such an N, 2) holds. Finally, statement 3) follows from the definition of g in Section 5.1.