Free Choice in Quantum Theory: A p-adic View

In this paper, it is rigorously proven that, since observational data (i.e., numerical values of physical quantities) are rational numbers only, due to inevitably nonzero measurement errors, the conclusion about whether Nature at the smallest scales is discrete or continuous, random and chaotic, or strictly deterministic depends solely on the experimentalist’s free choice of the metric (real or p-adic) used to process the observational data. The main mathematical tools are p-adic 1-Lipschitz maps (which are therefore continuous with respect to the p-adic metric). These maps are exactly the ones defined by sequential Mealy machines (rather than by cellular automata) and are therefore causal functions over discrete time. A wide class of the maps can naturally be extended to continuous real functions, so the maps may serve as mathematical models of open physical systems over both discrete and continuous time. For these models, wave functions are constructed, an entropic uncertainty relation is proven, and no hidden parameters are assumed. The paper is motivated by the ideas of I. Volovich on p-adic mathematical physics, by G. ‘t Hooft’s cellular automaton interpretation of quantum mechanics, and, to some extent, by recent papers on superdeterminism by J. Hance, S. Hossenfelder, and T. Palmer.


Introduction
The main goal of the current paper is to prove some of the results which were announced without proofs in [1]; namely, to prove rigorously the mathematical statements which show that an experimentalist's conclusions about whether Nature on the smallest of scales is discrete or continuous [2], random and chaotic, or strictly deterministic [3] depend solely on the experimentalist's free choice of the metric used to process the measurement data, which are basically rational numbers due to inevitably nonzero measurement errors. It should be stressed that these statements are not free-will theorems of quantum mechanics, since they concern how the data obtained during experiments are postprocessed rather than how an experimentalist chooses the measurement settings during experiments. This is a crucial difference between the results of the current paper and, for example, the Conway-Kochen strong free will theorem [4]. In order to distinguish between these two faces of the experimentalist's freedom, the two terms "free choice" and "free will" are both used in this paper, and they are not interchangeable.
There is some resemblance between the meanings of terms used in the invariant set theory [3,5] (within which a p-adic metric is briefly mentioned) and in the current paper; however, the current paper discusses a mathematical model for postprocessing of measurement data rather than broader physical theories.
The paper is inspired by the ideas of I. Volovich who, in collaboration with V. Vladimirov in the 1980s laid the cornerstone of contemporary p-adic mathematical physics [6]. The paper is motivated also by the ideas of G. 't Hooft who initiated the development of the cellular automaton interpretation of quantum mechanics [7] which is based on a suggestion that on some basic level there is no intrinsic randomness in nature.
More formally, the paper introduces a wide class of functions, each of which can be regarded as a continuous (and sometimes chaotic, i.e., having positive entropy) real function over continuous real time with respect to the real metric and which simultaneously is a strictly deterministic (and nonchaotic, i.e., having zero entropy) causal function over discrete time with respect to the p-adic metric for every p > 1. By the common definition, causal functions are the mappings which can be performed by automata, but only by those automata which are the so-called letter-to-letter transducers (or sequential Mealy machines whose sets of states are not necessarily finite) over a p-letter alphabet, rather than by cellular automata, on which G. 't Hooft's interpretation is based. These classes of automata differ from both the algorithmic and the physical points of view. From the algorithmic point of view, letter-to-letter transducers can be judged the least powerful computers, whereas cellular automata are the most powerful ones. Any algorithm (i.e., any general recursive function) can be implemented on a suitable cellular automaton since the class of all cellular automata is Turing-complete [8,9], whereas algorithms which can be implemented by the transducers are necessarily primitive recursive functions and, moreover, constitute a small class of primitive recursive functions; see the end of Section 3.3. From the physical point of view, sequential machines are models of open systems whereas cellular automata are models of isolated systems. A cellular automaton updates its states according only to a fixed local rule which does not depend on input. In contrast, the next state of a sequential machine depends both on the input information and on the current state, and the output information which the sequential machine produces also depends both on the input information and on the current state.
Throughout this paper, the term automaton refers to a sequential Mealy machine with a potentially infinite number of states; for a formal definition of the latter machine see Definition 2 . In what follows, types of automata different from the said Mealy machines are mentioned with respective adjectives, e.g., "cellular automaton" and "push-down automaton".
The paper is organised as follows:
• In Section 2, we recall a formal definition of a causal function over discrete time (cf. Definition 1). The very term "causality" is based on the notion of time; this is why, in the paper, "time" as a measurable physical entity is a central theme: time may be either discrete (e.g., Planck time) or continuous (e.g., real time) at respective "ends of scale". In this paper, we generally advocate that these cases are indistinguishable by measurements and are actually subject to an experimentalist's free choice of the metric with respect to which the numerical values of the experimental physical data are processed. After the formal definition of causality over discrete time, we introduce as postulates the statements of I. Volovich on the indistinguishability, by measurements of physical quantities, of rational and irrational values, and of G. 't Hooft on the nonexistence of randomness in Nature; then, we formalise the notion of a "physical law" as a function which is consistent with these postulates, cf. Conditions 1.
• In Section 3, we review some notions and facts from p-adic analysis and from automata theory which will be needed further in the paper.
• In Section 4, we introduce one of the main notions of the paper; that is, the real causal functions, which are the functions that are continuous both with respect to a real metric and to the p-adic metric; i.e., causal functions which reside simultaneously in two worlds, Archimedean and non-Archimedean. The main results described in these sections are as follows:
- Theorem 5 completely describes the class of functions that satisfy Conditions 1; i.e., those which are completely consistent both with the Volovich postulates and with the 't Hooft causality postulate.
We interpret this theorem to be a manifestation of the observer's freedom to conclude whether Nature on the smallest of scales is discrete or continuous since the conclusion depends solely on the observer's free choice of metric with respect to which the observer processes the measured numerical data.

- In Section 4.4, we argue that the observer's conclusion as to whether Nature is basically random and chaotic or totally predictable and deterministic also depends solely on the observer's free choice of metric with respect to which the observer processes the measured numerical data; namely, we show that maps which are chaotic with respect to the real metric are strictly deterministic and predictable with respect to the p-adic metric, irrespective of which common definition of chaos is used.
- In Section 4.5, we argue that Conditions 1 may be too restrictive from the physical point of view and relax the conditions, letting them hold only for some prime p rather than for all primes. This way, we introduce the notion of a p-consistent function, show that the class of p-consistent functions is much wider than the class of completely consistent ones (Theorem 7), prove a hologram-like property (Theorem 8) which shows that the global behaviour of p-consistent functions is completely defined by their local behaviour, and then prove that wide classes of physically important functions (such as continuous real functions, real functions that vanish at infinity, n-th power integrable functions, wave functions) can be uniformly approximated by infinitely differentiable p-consistent functions; see Theorem 9. This theorem is one more piece of evidence supporting the notion that the observer's conclusion on discreteness, continuity, and reversibility of time depends solely on the observer's free choice of metric. Finally, in this subsection, we prove Theorem 10, which yields that smooth p-consistent functions related to systems having a finite number of states are necessarily affine; this theorem may demonstrate where the linearity of operators used in quantum theory is rooted.
• In Section 5, we argue that "continuous" and "discrete" models of the physical world "meet each other in the middle of the scales", and the wave function is the "meeting point". The specifics of this section are as follows:
- In Section 5.1, we formalise what is meant by "measurements at each end of the scale" by introducing two observers, Big-endian and Little-endian, that perform measurements at respective ends, macro and micro.
- In Section 5.2, we introduce a p-adic model of the instrument which measures and indicates time, a p-adic clock, and a respective notion of p-adic time, which is time by the Little-endian's clock. Then, we state Theorem 11, which shows that there exists a unique clock which is the same for Little-endian and for Big-endian, the universal clock. We argue that the known effect in quantum theory, namely the impossibility of determining which of two events happens earlier than the other, may be rooted in the fact that p-adic time cannot be ordered; i.e., in contrast to the ring of integers Z, the ring Z_p of p-adic integers cannot be ordered. Therefore, the existence or nonexistence of the "time arrow" is again subject to the free choice of the metric by the experimentalist.
- Section 5.3 describes the base on which the construction of the wave function is founded. The section describes, in formal terms, the process of finding cluster points for experimental points in Euclidean space and constructing a smooth line (or surface) on which these cluster points fall. In the subsection, we mostly refer to results which were published earlier in [10,11] and interpret these as models of physical systems having either discrete or continuous spectra. Based on these results, we argue that chaos is either immanent to continuous time models or emerges as a result of a sufficiently long evolution of a physical system in discrete time models.
- In Section 5.4, we construct two types of wave functions: the sharp one for Little-endian, with respect to discrete time, and the fuzzy one for Big-endian, with respect to continuous time. The fuzzy wave function can be approximated by sharp wave functions with any desired accuracy, so the type of wave function is again subject to the free choice of the experimentalist, since it depends on the experimentalist's free choice of metric. We then show in Theorem 18 that, under a reasonable finiteness assumption, the fuzzy 1-dimensional wave function is actually a sharp N-dimensional wave function over discrete 2-adic time. Here, as an extra mathematical tool, we use β-expansions of numbers; the β-expansions were originally introduced in [12,13].
- In Section 5.5, we formally derive a time-energy uncertainty relation in entropic form. Here, we use one more extra mathematical tool, the theory of prefix codes. All necessary notions, results, and proper references of this theory are given in the subsection. We stress that no hidden variables are assumed, and the uncertainty relation holds both for the Big-endian and for the Little-endian.
• We conclude in Section 6. Here, we state that basically the results of the paper may be treated as information-theoretic and remark that the paper highlights J. Wheeler's "it from bit" doctrine [14] since the final results on wave functions, especially Theorem 18, show that "it" is "from bit" indeed: both sharp and fuzzy wave functions actually turn out to be 2-adic 1-Lipschitz functions, i.e., automata functions over the alphabet {0, 1}.

Formalisation
I. V. Volovich has stated many times, in his numerous papers, books, and talks, the following postulates (further referred to as the Volovich postulates) on which p-adic mathematical physics is founded:
(i) Only rational numbers can be observed; irrational numbers cannot.
(ii) Distances smaller than the Planck length cannot be measured.
(iii) Fundamental physical laws should be invariant with respect to a change of number field.
According to Ostrowski's theorem, every nontrivial absolute value on the rational numbers Q is equivalent to either the usual real absolute value or a p-adic absolute value (cf., e.g., [15] [Theorem 10.1]). Then, to ensure that the limits of convergent sequences over a field belong to the field, the number fields mentioned in the third postulate must be the fields Q_p of p-adic numbers or the field R of real numbers, since these fields are the only completions of the field Q with respect to absolute values on Q. Of course, the fields can be complete extensions of the fields Q_p and R such as, e.g., the fields C_p of complex p-adic numbers or the field C of "ordinary" complex numbers, but Q_p and R are the only "smallest" fields which satisfy the third Volovich postulate.
G. 't Hooft in his book The Cellular Automaton Interpretation of Quantum Mechanics [7] makes the following claim (further referred to as the 't Hooft causality postulate) which is fundamental for the cellular automaton interpretation of quantum mechanics: It may well be that, at its most basic level, there is no randomness in Nature, no fundamentally statistical aspect to the laws of evolution. Everything, up to the most minute detail, is controlled by invariable laws. Every significant event in our universe takes place for a reason, it was caused by the action of physical law, not just by chance. This is the general picture conveyed by this book.
To be consistent with this postulate, a physical system must be causal; that is, the "effect", which is the reaction of the system to a "cause", i.e., to an impact to which the system has been exposed, must be a function of the "cause" and of the "state" of the system. However, the very notion of causality is based on the notion of "time", which must be a totally ordered set since the "effect" cannot happen earlier than the "cause" of which it is a function. It is impossible to experimentally distinguish rational numbers from real numbers (cf. the first Volovich postulate); therefore, it is reasonable to assume that "time" is a totally ordered countable set. It is well known that any totally ordered countable set $T$ is order-isomorphic to a subset of $\mathbb{Q}$ (cf., e.g., [16]) with respect to the natural order $\le$ on $\mathbb{Q}$. Time $T$ is called continuous if the ordering of elements in $T$ is dense; i.e., given $t_1, t_2 \in T$ with $t_1 < t_2$, there exists $t_3 \in T$ such that $t_1 < t_3 < t_2$. Time $T$ is called discrete if, given any $t_1, t_2 \in T$, there are at most finitely many $t_3 \in T$ such that $t_1 < t_3 < t_2$.
"Continuous" physical models are based on the assumption that any temporal/spatial interval can be divided into smaller intervals ad infinitum. The "discrete" models assume that spacetime should somehow be "quantized" at the smallest of scales; i.e., there exist smallest spatial/temporal intervals which cannot be divided into smaller ones [2]. In the latter case, it would be reasonable to try to construct a mathematical theory assuming that the total number of these "indivisible" values can be increased ad infinitum. In both cases, as well as in the respective physical theories, "infinity" simply stands for a value which is extremely small (or extremely large) compared to a given value, so that calculations involving the notion of infinity result in values which agree with the respective measured values up to a small real number, the error. Therefore, if theories of either type adequately describe physical reality at respective "ends of scale", the theories must "meet one another somewhere in the middle of the scale".
The discreteness implies that the indivisible intervals are the respective units, i.e., take the value 1; moreover, both "cause" and "effect" are sequences of "elementary causes" and "elementary effects" which happen at discrete time instants 0, 1, 2, . . . . Actually, the "time unit" is the longest temporal interval within which it is impossible for an observer to determine whether any two events are simultaneous or not; i.e., which of the two events happens earlier or later than the other. In other words, an "event" is like a film consisting of frames, where each frame is a static picture but the sequence of the pictures produces a movie on a screen which the cinema audience sees as a dynamical process. Thus, the "elementary event" ("elementary cause", "elementary effect") is an event that lasts exactly one time unit, similar to a momentary splash for which the moment when it begins is indistinguishable from the moment when it finishes.
We recall the notion of a causal function over discrete time in terms of general systems theory; cf., e.g., [17,18].
Definition 1 (Causality over discrete time). Causal functions over discrete time $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$ are exactly the functions $f$ which satisfy the following conditions:
(i) The domain (the "causes") and range (the "effects") of $f$ are, accordingly, all sequences $a = (a_i)_{i \in \mathbb{N}_0}$ and $b = (b_i)_{i \in \mathbb{N}_0}$ over respective sets $A$, the "elementary causes", and $B$, the "elementary effects";
(ii) If $f(a) = (b_i)_{i \in \mathbb{N}_0}$, then $b_i$ does not depend on $a_{i+1}, a_{i+2}, \ldots$, for all $i \in \mathbb{N}_0$.
In other words, the function $f$ is causal if and only if there exists a sequence $(\varphi_i)_{i \in \mathbb{N}_0}$ of maps $\varphi_i : A^{i+1} \to B$ such that $f(a) = (\varphi_i(a_0, a_1, \ldots, a_i))_{i \in \mathbb{N}_0}$ for every sequence $a = (a_i)_{i \in \mathbb{N}_0}$.
It is reasonable to assume that both the set $A$ of "elementary causes" and the set $B$ of "elementary effects" contain at least two elements and, moreover, that the sets are finite, since no physical objects are known which have been proven to be infinite in some natural meaning: infinity is a mathematical rather than a physical notion, which is used in mathematical calculations in order to find good estimates of physical values since the values can be measured with a nonzero error only. From this finiteness assumption, it follows that the causal functions are exactly the mappings which are produced by a special class of automata, the letter-to-letter transducers (or sequential machines), which transform input sequences $a = (a_i)_{i \in \mathbb{N}_0}$ of elementary causes into output sequences $b = (b_i)_{i \in \mathbb{N}_0}$ of elementary effects so that (ii) is satisfied (cf., e.g., [19], a classical monograph on automata theory). Note that condition (ii) is just a Lipschitz condition with constant 1 with respect to the natural non-Archimedean metric $d$ on sequences. The metric $d$ can be defined as follows: given two sequences $c = (c_i)_{i \in \mathbb{N}_0}$ and $c' = (c'_i)_{i \in \mathbb{N}_0}$ over the same finite set, $d(c, c') = p^{-n}$, where $n = \min\{i \in \mathbb{N}_0 : c_i \neq c'_i\}$ if such $n$ exists, and $d(c, c') = 0$ if $c_i = c'_i$ for all $i \in \mathbb{N}_0$ (here $p > 1$ is an arbitrary real number). In this paper, we mostly consider the case when both $A$ and $B$ are the finite $p$-element set $\mathbb{F}_p$, where $p$ is a prime number (the latter restriction is a technical one, imposed in order not to overload the statements). This way, we may assume that $\mathbb{F}_p$ is a finite $p$-element field and that the infinite sequences $(c_i)_{i \in \mathbb{N}_0}$ (where $c_i \in \mathbb{F}_p$) constitute the space $\mathbb{Z}_p$ of p-adic integers under the natural one-to-one correspondence between the infinite sequences and the canonical representations $\sum_{i=0}^{\infty} c_i p^i$ of p-adic integers.
In the case when p > 1 is not a prime number, the sequences also may be put in one-to-one correspondence with the space Z p of p-adic integers since the latter spaces are defined for all p = 2, 3, 4, . . ., and not necessarily only for prime p; see, e.g., [20].
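As an illustration of the reasoning above, the following minimal Python sketch computes the metric d on finite prefixes of sequences and runs a letter-to-letter transducer on base-p digit sequences written least significant letter first. The helper names (`seq_distance`, `digits`, `add_one_transducer`) and the choice of the map z → z + 1 as the example are assumptions made for illustration only; they do not come from the paper.

```python
def seq_distance(c, c_prime, p=2):
    """d(c, c') = p**(-n), where n is the first index at which the two
    (finite prefixes of) sequences differ; 0 if no difference is found."""
    for n, (x, y) in enumerate(zip(c, c_prime)):
        if x != y:
            return p ** (-n)
    return 0


def digits(n, p, length):
    """First `length` base-p digits of the integer n >= 0, least significant first."""
    out = []
    for _ in range(length):
        out.append(n % p)
        n //= p
    return out


def add_one_transducer(a, p=2):
    """Hypothetical example of a causal map: z -> z + 1 on digit sequences.
    The state is the carry; the i-th output letter depends only on the
    input letters a_0, ..., a_i, as required by condition (ii)."""
    carry = 1
    out = []
    for letter in a:
        total = letter + carry
        out.append(total % p)
        carry = total // p
    return out


a = digits(5, 2, 8)
b = digits(13, 2, 8)
# The outputs are never farther apart than the inputs (Lipschitz constant 1).
assert seq_distance(add_one_transducer(a), add_one_transducer(b)) <= seq_distance(a, b)
```

The two inputs 5 and 13 share their three lowest binary digits, so their images under the transducer share at least three lowest digits as well, which is exactly what the 1-Lipschitz condition expresses.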
Physical models, loosely speaking, describe functions f which are "physical laws" that express dependencies of physical quantities on other physical quantities; therefore, if time is one of these quantities, it is reasonable to assume causality, i.e., that the functions f are causal. Let us express more formally the conditions which the functions f must meet in order to be consistent both with the Volovich postulates and with the 't Hooft causality postulate.
In order to be consistent with Volovich postulates, the following conditions should be satisfied.

• As only rational numbers can be measured, the functions f, i.e., the closed forms of physical laws which can be experimentally verified, must be mappings of rational numbers to rational numbers; i.e., the functions f must take rational values when the values of the variables are rational.

• In order to study functions f when values of variables are "very large" or "very small" with respect to some reasonable metric, one has to extend the laws from the field of rational numbers Q to a bigger field which is complete with respect to that metric; therefore, this bigger field can only be the field of real numbers R and/or the p-adic fields Q_p for primes p = 2, 3, 5, 7, 11, . . .; however, in order to be invariant with respect to the change of the number field, the restriction to Q of any such extension of f to a bigger field F ⊃ Q must be the same irrespective of which field F was used for the extension, whether F = R or F = Q_p.

Further, to be consistent also with the 't Hooft causality postulate, the functions f should be causal; however, as has been argued before, the "time" with respect to which the functions are causal must be order-isomorphic to a subset of Q. Since, according to the Volovich postulates, no temporal interval smaller than Planck's time can be measured, the temporal intervals can only be multiples of Planck's time; therefore, the "time" over which the functions f are causal must be order-isomorphic to a subset of Z. Thus, up to order isomorphism, the time scale is either N_0 = {0, 1, 2, . . .} or Z = {0, ±1, ±2, . . .}, depending on whether the "beginning of time" exists or does not exist. According to the contemporary physical picture of the universe, it is reasonable to assume that the "beginning of time" exists; thus, the time scale must be N_0, up to order isomorphism.
However, causal functions over the discrete time $\mathbb{N}_0$ can be treated as p-adic 1-Lipschitz functions whose domain and range are the p-adic integers $\mathbb{Z}_p$ rather than the whole field $\mathbb{Q}_p$; cf. the reasoning which follows Definition 1. The "common part" of $\mathbb{Z}_p$ and $\mathbb{R}$ (which we further denote via $\mathbb{Z}_p \cap \mathbb{Q}$) consists of the rational p-adic integers, i.e., the irreducible fractions whose denominators are coprime to $p$. Thus, to be consistent with the Volovich postulates, the causal functions must take values from $\mathbb{Z}_p \cap \mathbb{Q}$ rather than from the whole $\mathbb{Q}$; moreover, the functions must be extendable to the whole field $\mathbb{R}$ since $\mathbb{Z}_p \cap \mathbb{Q}$ is a dense subset of $\mathbb{R}$. Finally, we can specify the formal properties which the functions $f$ must share in order to be consistent both with the Volovich postulates and with the 't Hooft causality postulate, as follows:
Condition 1 (Complete consistency). A (univariate) continuous real function $f : \mathbb{R} \to \mathbb{R}$ which is consistent with both the Volovich postulates and the 't Hooft causality postulate must share the properties listed below.

(i) For every prime $p$, the restriction $f|_{\mathbb{N}_0}$ must be a causal function over discrete time $\mathbb{N}_0$; i.e., the restriction $f|_{\mathbb{N}_0}$ must satisfy a p-adic Lipschitz condition with constant 1. That is, for all $m, n \in \mathbb{N}_0$, the inequality $d_p(f(m), f(n)) \le d_p(m, n)$ must hold, where $d_p$ is the p-adic metric.
(ii) Since $\mathbb{N}_0$ is a dense subset of $\mathbb{Z}_p$, by (i), for every prime $p$, there exists a unique extension of $f|_{\mathbb{N}_0}$ to a function $f_p : \mathbb{Z}_p \to \mathbb{Z}_p$ which satisfies a Lipschitz condition with constant 1 with respect to the p-adic metric $d_p$. Therefore, to be invariant with respect to the change of the field, the function $f : \mathbb{R} \to \mathbb{R}$ must act on the set $\mathbb{Z}_p \cap \mathbb{Q}$ of all rational p-adic integers exactly as the function $f_p$ does; that is, for every prime $p$, the restriction of $f$ to $\mathbb{Z}_p \cap \mathbb{Q}$ must coincide with the restriction of $f_p$ to $\mathbb{Z}_p \cap \mathbb{Q}$: $f|_{\mathbb{Z}_p \cap \mathbb{Q}} = f_p|_{\mathbb{Z}_p \cap \mathbb{Q}}$.
The questions which immediately arise are whether there exist functions f which satisfy the conditions and, if such functions do exist, what these functions are. In Section 4, we show that functions which meet Conditions 1 do exist and constitute a class of polynomials over Q of a special type (the class contains, e.g., all polynomials over Z); see Theorem 5. Moreover, the functions turn out to be causal with respect to all finite alphabets and not only with respect to p-symbol alphabets for prime p. This implies, in particular, that the answer to the commonly asked question about p-adic mathematical physics, namely which p should be chosen by an experimenter in order to make the theory consistent with the observations, is as follows: the choice of p is absolutely free if causality, discreteness at Planck's scale, and invariance with respect to the change of the number field are assumed.
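To make the p-adic Lipschitz condition on the restriction of f to N_0 concrete, the following minimal Python sketch tests it numerically on an initial segment of N_0 for several primes. It uses the fact that the condition d_p(f(m), f(n)) ≤ d_p(m, n) is equivalent to ord_p(f(m) − f(n)) ≥ ord_p(m − n) whenever f(m) ≠ f(n). The function names and the two sample functions are assumptions chosen for illustration; a finite check of this kind is only a sanity test and proves nothing about all of N_0.

```python
def ord_p(n, p):
    """p-adic valuation of a nonzero integer n."""
    if n == 0:
        raise ValueError("ord_p(0) is infinite")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def is_one_lipschitz_on(f, p, bound):
    """Check d_p(f(m), f(n)) <= d_p(m, n) for all 0 <= m < n < bound."""
    for m in range(bound):
        for n in range(m + 1, bound):
            diff = f(m) - f(n)
            if diff != 0 and ord_p(diff, p) < ord_p(m - n, p):
                return False
    return True


def cubic(x):
    """A polynomial with integer coefficients (illustrative choice)."""
    return x**3 + 2 * x + 5


def triangular(x):
    """x(x-1)/2: integer-valued on N_0, but with a non-integer coefficient."""
    return x * (x - 1) // 2


for prime in (2, 3, 5, 7):
    print(prime, is_one_lipschitz_on(cubic, prime, 40),
          is_one_lipschitz_on(triangular, prime, 40))
```

In this sketch the integer polynomial passes the test for every prime tried, whereas x(x − 1)/2 fails for p = 2 (it maps 0 and 2 to 0 and 1) while passing for odd primes. This illustrates why requiring the condition for every prime is much more restrictive than requiring it for a single one.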
We stress that the functions f which satisfy Conditions 1 are causal for all p-symbol alphabets and for all prime p, and hence, for all finite alphabets. In our view, the latter property appears to be too restrictive (and somewhat nonphysical, cf., the reasoning concerning the finiteness assumption above) since Planck's scale includes a finite number of physical quantities (time, length, etc.) rather than an infinite number. Thus, it is reasonable to assume that Conditions 1 hold only for a finite set of primes; this implies that the functions f are causal with respect to finite alphabets, the prime power decompositions of the number of elements of which include only powers of primes from that set. The study of this class of functions can be reduced to cases containing only one prime, p. We show that if in the statement of Conditions 1, a prime p is fixed and that "for every prime p" is replaced by "for the prime p" then there exist functions f that satisfy the Conditions, which are continuous real functions on R but which are not rational functions over Z; i.e., are not of the form u(x)/v(x) where u(x), v(x) ∈ Z[x] are polynomials with integer coefficients; see Theorem 7. Note also that under such a restatement of the Conditions, f (Z) is not necessarily a subset of Z but only a subset of Z p ∩ Q, cf., Note 1.

Preliminaries
We review some notions and facts from p-adic analysis and from automata theory which will be needed further in the paper.

A Few Words about Words
An alphabet is just a finite nonempty set $A$; further in the paper, typically $A = \{0, 1, \ldots, p-1\}$, where $p > 1$ is an integer (mostly, but not always, $p$ is a prime). Elements of $A$ are called symbols, or letters. By definition, a word of length $n$ over the alphabet $A$ is a finite sequence (stretching from right to left) $\alpha_{n-1} \cdots \alpha_1 \alpha_0$, where $\alpha_{n-1}, \ldots, \alpha_1, \alpha_0 \in A$. The number $n$ is called the length of the word $w = \alpha_{n-1} \cdots \alpha_1 \alpha_0$ and is denoted via $\Lambda(w)$. The empty word $\phi$ is a sequence of length 0; that is, the one that contains no symbols. Given a word $w = \alpha_{n-1} \cdots \alpha_1 \alpha_0$, any word $v = \alpha_{k-1} \cdots \alpha_1 \alpha_0$, $k \le n$, is called a prefix of the word $w$, whereas any word $u = \alpha_{n-1} \cdots \alpha_{i+1} \alpha_i$, $0 \le i \le n-1$, is called a suffix of the word $w$. Every word $\alpha_j \cdots \alpha_{i+1} \alpha_i$, where $n-1 \ge j \ge i \ge 0$, is called a subword of the word $w = \alpha_{n-1} \cdots \alpha_1 \alpha_0$. Given words $a = \alpha_{n-1} \cdots \alpha_1 \alpha_0$ and $b = \beta_{k-1} \cdots \beta_1 \beta_0$, the concatenation $ab$ is the following word (of length $n + k$): $ab = \alpha_{n-1} \cdots \alpha_1 \alpha_0 \beta_{k-1} \cdots \beta_1 \beta_0$. Given a word $w$, its $k$-times concatenation is denoted via $(w)^k$. We denote by $W = W(A)$ the set of all nonempty words over $A = \{0, 1, \ldots, p-1\}$ and by $W_\phi$ the set of all words including the empty word $\phi$. In the sequel, we denote the set of all $n$-letter words over the alphabet $A$ as $W_n$; thus, $W = \bigcup_{n=1}^{\infty} W_n$. To every word $w = \alpha_{n-1} \cdots \alpha_1 \alpha_0$, we put into correspondence the non-negative integer $\mathrm{num}(w) = \alpha_0 + \alpha_1 \cdot p + \cdots + \alpha_{n-1} \cdot p^{n-1}$. Thus, num maps the set $W$ of all nonempty finite words over the alphabet $A$ onto the set $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$ of all non-negative integers. We will also consider a map $\rho$ of the set $W$ into the real unit half-open interval $[0, 1)$; the map $\rho$ is defined as follows: given $w = \beta_{r-1} \cdots \beta_0 \in W$, put $\rho(w) = \mathrm{num}(w) \cdot p^{-\Lambda(w)} = 0.\beta_{r-1} \ldots \beta_0$, where the right-hand side is read as a base-$p$ fraction. We also use the notation $0.w$ for $0.\beta_{r-1} \ldots \beta_0$. Along with finite words, we also consider one-side infinite words over the alphabet $A$; these are the infinite sequences of the form $\ldots \alpha_2 \alpha_1 \alpha_0$, where $\alpha_i \in A$, $i \in \mathbb{N}_0$.
In this paper, we may write one-side infinite words stretching either from left to right or from right to left when convenient; i.e., both $\alpha_0 \alpha_1 \alpha_2 \ldots$ and $\ldots \alpha_2 \alpha_1 \alpha_0$ denote the same word. For finite words, we may also use both notations, left and right, and the order of the indices of the letters in the word shows which of the two notations is used. For infinite words, the notions of prefix, suffix, and subword are defined in the same way as they are for finite words; note that suffixes are always infinite words whilst prefixes and subwords are always finite words. Let an infinite word $w$ be eventually periodic; that is, let $w = \ldots \beta_{t-1} \beta_{t-2} \cdots \beta_0 \, \beta_{t-1} \beta_{t-2} \cdots \beta_0 \, \alpha_{r-1} \alpha_{r-2} \cdots \alpha_0$ for $\alpha_i, \beta_j \in A$; then, the subword $\beta_{t-1} \beta_{t-2} \ldots \beta_0$ is called a period of the word $w$, and the prefix $\alpha_{r-1} \alpha_{r-2} \ldots \alpha_0$ is called the preperiod of the word $w$. Note that a preperiod may be an empty word, while a period cannot. We write the eventually periodic word $w$ as $w = (\beta_{t-1} \beta_{t-2} \ldots \beta_0)^{\infty} \alpha_{r-1} \alpha_{r-2} \ldots \alpha_0$.
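The maps on finite words introduced in this subsection can be sketched in a few lines of Python. Words are represented here as lists of letters written as in the text, i.e., with the leftmost list element being the letter of highest index. The sketch assumes the reading ρ(w) = num(w) · p^(−Λ(w)), which is what the notation 0.w suggests; the function names mirror the text, but the representation of words as lists is an assumption made for illustration.

```python
from fractions import Fraction


def num(word, p):
    """num(w) = alpha_0 + alpha_1 * p + ... for w = alpha_{n-1} ... alpha_0.
    `word` lists the letters left to right, so word[0] is alpha_{n-1}."""
    value = 0
    for letter in word:
        value = value * p + letter
    return value


def rho(word, p):
    """rho(w) = num(w) / p**len(w), a point of [0, 1); assumed reading of 0.w."""
    return Fraction(num(word, p), p ** len(word))


def concat(a, b):
    """Concatenation ab: the letters of a followed by the letters of b."""
    return a + b


w = [1, 0, 1]            # the word 101 over the alphabet {0, 1}
padded = [0] + w         # the word 0101
print(num(w, 2), num(padded, 2))   # equal values: num is not one-to-one on words
print(rho(w, 2), rho(padded, 2))   # different points of [0, 1)
```

The example shows that num does not distinguish 101 from 0101, whereas ρ does, since the length of the word enters the definition of ρ.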

p-adic Integers
We briefly recall some very basic facts about p-adic integers, referring the reader to any monograph on p-adic analysis (e.g., to [20]) for a deeper introduction to the subject. Let $p > 1$ be an integer. A p-adic integer $z \in \mathbb{Z}_p$ can be uniquely represented by a canonical form $z = \sum_{i=0}^{\infty} \zeta_i p^i$, where $\zeta_i \in \{0, 1, \ldots, p-1\}$ for all $i \in \mathbb{N}_0$. Thus, to every infinite sequence $z = (\zeta_i)_{i=0}^{\infty}$, we put into correspondence a p-adic integer represented by the respective canonical form. The sequences $z$ may also be treated as (one-side) infinite words over the alphabet $\{0, 1, \ldots, p-1\}$; thus, we can now extend the mapping num to the set $W_p$ of all infinite sequences over $\{0, 1, \ldots, p-1\}$ so that $\mathrm{num}(z) = \sum_{i=0}^{\infty} \zeta_i p^i \in \mathbb{Z}_p$. The so-defined mapping $\mathrm{num} : W_p \to \mathbb{Z}_p$ is one-to-one; thus, in what follows, we will not distinguish when necessary between p-adic integers, (one-side) infinite sequences over $\{0, 1, \ldots, p-1\}$, and infinite words over the alphabet $\{0, 1, \ldots, p-1\}$.
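The sequences $z$ which contain only finitely many nonzero terms correspond to non-negative integers from $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$ represented by their base-$p$ expansions; the sequences $z$ which contain only finitely many terms not equal to $p - 1$ correspond to negative integers $-\mathbb{N} = \{-1, -2, -3, \ldots\}$. The sequences $z$ which are eventually periodic correspond to rational p-adic integers $z \in \mathbb{Z}_p \cap \mathbb{Q}$; i.e., to rational numbers which can be represented by irreducible fractions $u/v$ whose denominators $v$ are coprime to $p$. Any $z \in \mathbb{Z}_p \cap \mathbb{Q}$ with canonical expansion $(\beta_{t-1} \cdots \beta_0)^{\infty} \alpha_{r-1} \cdots \alpha_0$ can be represented as $z = \sum_{i=0}^{r-1} \alpha_i p^i + p^r \cdot \frac{\sum_{j=0}^{t-1} \beta_j p^j}{1 - p^t}$, which follows by summing the geometric series $1 + p^t + p^{2t} + \cdots = 1/(1 - p^t)$ in $\mathbb{Z}_p$. The rational p-adic integers constitute a subring $\mathbb{Z}_p \cap \mathbb{Q}$ of $\mathbb{Z}_p$ which is a dense subset of $\mathbb{Z}_p$ with respect to the p-adic metric. The metric is induced by the p-adic absolute value $|z|_p$, which is equal to $p^{-\mathrm{ord}_p z}$, where $\mathrm{ord}_p z$ is the length of the longest zero-prefix (the prefix which consists of zeros only) of $z$ if $z \neq 0$, and $|0|_p = 0$ by definition.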
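The correspondence between rational p-adic integers and eventually periodic digit sequences can be explored with the following minimal Python sketch. It extracts the first canonical digits of a rational number u/v with v coprime to p, and conversely evaluates an eventually periodic expansion by summing the geometric series of its period. Digit lists are written least significant digit first; the function names and this representation are assumptions made for illustration, and the modular inverse via `pow(x, -1, p)` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import gcd


def padic_digits(q, p, count):
    """First `count` canonical p-adic digits of a rational q = u/v, gcd(v, p) = 1."""
    q = Fraction(q)
    if gcd(q.denominator, p) != 1:
        raise ValueError("denominator must be coprime to p")
    out = []
    for _ in range(count):
        # The lowest digit is the residue of u * v^{-1} modulo p.
        d = (q.numerator * pow(q.denominator, -1, p)) % p
        out.append(d)
        q = (q - d) / p      # shift: drop the lowest digit
    return out


def from_periodic(preperiod, period, p):
    """Value of the expansion with the given preperiod and period digits
    (least significant first), using 1 + p^t + p^(2t) + ... = 1 / (1 - p^t)."""
    r, t = len(preperiod), len(period)
    head = sum(d * p**i for i, d in enumerate(preperiod))
    block = sum(d * p**i for i, d in enumerate(period))
    return head + Fraction(p**r * block, 1 - p**t)


print(padic_digits(-1, 2, 6))               # all digits equal p - 1
print(padic_digits(Fraction(1, 3), 2, 6))   # eventually periodic digits
print(from_periodic([1], [1, 0], 2))        # recovers the rational number
```

For instance, −1 has all digits equal to p − 1, in agreement with the description of negative integers above, and the 2-adic expansion of 1/3 has a one-letter preperiod followed by a period of length two.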
Given $n \in \mathbb{N} = \{1, 2, 3, \ldots\}$ and a canonical expansion $z = \sum_{i=0}^{\infty} \alpha_i p^i$ for $z \in \mathbb{Z}_p$, we further denote $z \bmod p^n = \sum_{i=0}^{n-1} \alpha_i p^i \in \mathbb{N}_0$. The mapping $\bmod\, p^n : z \mapsto z \bmod p^n$ can be treated as a ring epimorphism of $\mathbb{Z}_p$ onto the residue ring $\mathbb{Z}/p^n\mathbb{Z}$, under the natural representation of elements of the residue ring by the least non-negative residues $\{0, 1, \ldots, p^n - 1\}$. Given $n \in \mathbb{N}$, the base-$p$ expansion of $n$ is a finite word over $\mathbb{F}_p$ whose length is $\lfloor \log_p n \rfloor + 1$. As the base-$p$ expansion of 0 is a one-letter word (namely, 0), in what follows, we assume that $\lfloor \log_p 0 \rfloor = 0$. We stress that when considering words corresponding to numbers, for the numbers 0, 1, 2, . . ., we distinguish their base-$p$ expansions from their canonical p-adic representations: the latter are treated as infinite words rather than as finite words. We also stress that the mapping $\mathrm{num} : W \to \mathbb{N}_0$ is a surjection but not one-to-one, whereas the mapping $\mathrm{num} : W_p \to \mathbb{Z}_p$ is one-to-one. In what follows, it will always be clear from the context which domain of num is considered.
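The two notions distinguished in this paragraph, the finite base-p expansion of a non-negative integer and the reduction modulo p^n of a canonical p-adic representation, can be sketched as follows. The helper names are assumptions made for illustration; canonical digits are passed as a finite list, least significant first, which is enough because only the first n digits matter.

```python
def base_p_word(n, p):
    """Base-p expansion of an integer n >= 0 as a list of letters,
    most significant letter first; the expansion of 0 is the one-letter word 0."""
    if n == 0:
        return [0]
    word = []
    while n > 0:
        word.append(n % p)
        n //= p
    return word[::-1]


def reduce_mod(canonical_digits, p, n):
    """z mod p**n computed from the first n canonical digits of z."""
    return sum(d * p**i for i, d in enumerate(canonical_digits[:n]))


# The word for 8 in base 2 has floor(log_2 8) + 1 letters.
print(base_p_word(8, 2), len(base_p_word(8, 2)))
# The 2-adic integer -1 has all digits equal to 1; its residue modulo 8
# agrees with the ordinary residue of -1 modulo 8.
print(reduce_mod([1] * 10, 2, 3), (-1) % 8)
```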
A probability measure $\mu$ on $\mathbb{Z}_p$ can be defined as follows: elementary $\mu$-measurable sets are balls $B_{p^{-r}}(a) = a + p^r\mathbb{Z}_p$ of radii $p^{-r}$, and $\mu(B_{p^{-r}}(a)) = p^{-r}$. As the balls are simultaneously open and closed in the topology induced by the p-adic absolute value $|\cdot|_p$, and as every two balls are either disjoint or one of them contains the other, the balls constitute a base of a sigma-algebra which defines a sigma-additive measure $\mu$ on $\mathbb{Z}_p$. Actually, this measure $\mu$ is a Haar measure normalised so that $\mu(\mathbb{Z}_p) = 1$. The measure $\mu$ is a Borel measure; that is, every open subset is $\mu$-measurable (hence, every closed subset is $\mu$-measurable as well). The measure $\mu$ is regular; that is, for any $\mu$-measurable subset $A \subset \mathbb{Z}_p$,

$$\mu(A) = \sup\{\mu(S) : S \subset A,\ S \text{ is closed in } \mathbb{Z}_p\} = \inf\{\mu(S) : S \supset A,\ S \text{ is open in } \mathbb{Z}_p\}.$$

Thus, $\mathbb{Z}_p$ is a totally disconnected compact metric space whose metric is induced by the p-adic absolute value $|\cdot|_p$, and a probability space with respect to the measure $\mu$. Note that the probability measure agrees with the metric; i.e., any function $f : \mathbb{Z}_p \to \mathbb{Z}_p$ that is continuous with respect to the metric is measurable: $f^{-1}(S)$ is $\mu$-measurable once $S \subset \mathbb{Z}_p$ is $\mu$-measurable. Also note that the p-adic metric $d_p(a, b) = |a - b|_p$ (where $a, b \in \mathbb{Z}_p$) is non-Archimedean; that is, the triangle inequality holds for that metric in a stronger form:

$$|a - b|_p \le \max\{|a - c|_p, |c - b|_p\} \quad \text{for all } a, b, c \in \mathbb{Z}_p.$$

In a similar way, the metric and the probability measure can be defined for the spaces $\mathbb{Z}_p^n = \mathbb{Z}_p \times \cdots \times \mathbb{Z}_p$ ($n$ factors), but in this paper, this n-dimensional space is mentioned only briefly in appropriate places; in order not to overload the exposition, we limit our "working space" to $\mathbb{Z}_p$.

Systems, Transducers, Automata, Sequential Machines
Terminology in automata theory is somewhat diverse; in order to avoid a misunderstanding of the basic notions, we state them below. By a system we mean a 5-tuple $\mathfrak{A} = \langle \mathcal{I}, \mathcal{S}, \mathcal{O}, S, O \rangle$, where
• $\mathcal{I}$ is a nonempty finite set, the input alphabet;
• $\mathcal{O}$ is a nonempty finite set, the output alphabet;
• $\mathcal{S}$ is a nonempty (possibly, infinite) set of (epistemic) states;
• $S : \mathcal{I} \times \mathcal{S} \to \mathcal{S}$ is a state transition function;
• $O : \mathcal{I} \times \mathcal{S} \to \mathcal{O}$ is an output function.

The system is called autonomous if neither $S$ nor $O$ depends on input letters (that is, if $S : \mathcal{S} \to \mathcal{S}$, $O : \mathcal{S} \to \mathcal{O}$); otherwise, the system is called nonautonomous. A subsystem $\mathfrak{A}'$ of $\mathfrak{A}$ is a system $\langle \mathcal{I}, \mathcal{S}', \mathcal{O}, S, O \rangle$ such that $\emptyset \neq \mathcal{S}' \subset \mathcal{S}$ and $S(\chi, s') \in \mathcal{S}'$ for all $\chi \in \mathcal{I}$, $s' \in \mathcal{S}'$. A subsystem is called minimal if it has no subsystems other than itself. An initial automaton (or, in other terminology, a letter-to-letter transducer [21], a Mealy sequential machine [19], an initial synchronous automaton [22]) $\mathfrak{A}(s_0)$ is a system where one of the states, $s_0 \in \mathcal{S}$, is fixed; $s_0$ is called the initial state.
In what follows, the term automaton stands for an initial automaton; the subsystems of the automata are also called subautomata. A noninitial state s ∈ S is called reachable (or, accessible) if there exists a finite sequence χ 0 , χ 1 , . . . , χ N−1 ∈ I such that S(χ N−1 , s N−1 ) = s, where s i = S(χ i−1 , s i−1 ), i = 1, 2, . . . , N − 1; i.e., if there exists a path from the initial state s 0 to s of finite length N.
An automaton A determines a unique map f A : . . . χ 2 χ 1 χ 0 → . . . ξ 2 ξ 1 ξ 0 from the set W(I) of all (one-sided) infinite words over the alphabet I to the set W(O) of all (one-sided) infinite words over the alphabet O, as follows: at time instant i = 0, the automaton, being in the state s 0 , accepts the first input letter χ 0 , updates its state to a newer state s 1 = S(χ 0 , s 0 ), and produces an output letter ξ 0 = O(χ 0 , s 0 ); at the next time instant i = 1, the automaton accepts χ 1 , updates its state to s 2 = S(χ 1 , s 1 ), and produces an output letter ξ 1 = O(χ 1 , s 1 ), and so on. The output word is thus uniquely determined by the input word. The mapping f A is called the automaton function of the automaton A; clearly, the mapping is causal. It is well known that the converse is also true: every causal mapping f : W(I) → W(O) is an automaton function of a suitable automaton A f (see, e.g., [19] [Chapter IV, Theorem 8.2]). This is why for the rest of this paper we use the terms causal function, automaton map, automaton function, automatic function, and 1-Lipschitz function as synonyms.
For instance, take a prime number p and consider an automaton whose input (respectively, output) alphabet is m-tuple (α 1 , . . . , α m ) ∈ F m p = I (respectively, n-tuple from F n p = O); then, the automaton function is a map Z m p → Z n p which satisfies a Lipschitz condition with a constant 1 (further, 1-Lipschitz for brevity) with respect to the p-adic metric which is defined by the p-adic absolute value |(z 1 , . . . , z k )| p = max{|z 1 | p , . . . , |z k | p } on Z k p (here z j = ∑ ∞ i=0 α ji p i ∈ Z p , α ji ∈ F p , j = 1, 2, . . . , k). Moreover, every 1-Lipschitz map f : Z m p → Z n p is an automaton function of a suitable automaton A f . Note that it is convenient sometimes to consider automata whose input/output alphabets' cardinalities #I, #O are multiplicatively dependent (i.e., such that #I, #O are powers of some integer p > 1) as automata having multiple inputs/outputs; i.e., to consider the 1-Lipschitz map f : Z m p → Z n p as an automaton function of an automaton having m input channels and n output channels, each channel over a p-symbol alphabet. That is, the automaton function in this case is a multivariate map over infinite words over a p-symbol alphabet. In what follows, we will refer to such a case as to multivariate.
It is clear that a composition of automaton functions is an automaton function of an automaton which is a sequential composition of respective automata. For automata (and for their functions), the Cartesian product and Kronecker product can also be defined, but we do not need these constructions within the scope of the current paper.
Given f , the automaton A f is not unique in the meaning of Definition 2. There are infinitely many different automata (i.e., ones whose sets of epistemic states are different, whose state transition functions are different, or whose output functions are different) whose automaton function is f . Therefore, an observer can only make guesses about the "internal structure" of the system by observing pairs of "causes and effects", i.e., pairs (z, f (z)), z ∈ Z m p ; moreover, equivalent states are indistinguishable for the observer. However, given f , there exists a unique automaton whose automaton function is f and whose set of states S is the "smallest". Call two states s i , s j ∈ S of the automaton A equivalent if, whenever s i and s j are taken as initial states, the two resulting initial automata perform the same word mapping; i.e., equal input words always yield equal output words. Factorising the state set of the automaton A by this equivalence relation, we obtain an automaton having no equivalent nonequal states whose automaton function is f A . An automaton function f A is called finite if it can be produced by an automaton whose set of states is finite; that is, if the factor set by the equivalence relation is finite.
It is convenient to represent automata by their state transition diagrams (or Moore diagrams), which are directed graphs (the digraphs) whose vertices are states and whose arrows are state transitions, with the arrows labelled by input letter|output letter. Given an automaton function f : Z m p → Z n p , there exists an automaton whose automaton function is f and whose state transition diagram is an infinite tree such that each vertex (i.e., a state) has exactly p m outgoing arrows which go to p m different vertices, cf., Figure 1 which depicts a state transition diagram of an automaton whose automaton function is f : Z 2 → Z 2 .
The automaton function of the automaton whose state transition diagram is depicted by Figure 1 is f (z) = z + 1 (z ∈ Z 2 ), the 2-adic odometer. The reduced state transition diagram (which is obtained by factorisation with the equivalence relation defined earlier) is a digraph having only two vertices, cf., Figure 2. The automaton whose state transition diagram is depicted as in Figure 2 has the same automaton function f (z) = z + 1 on Z 2 ; thus, the function f is a finite automaton function since it is produced by an automaton having only two states. Note that a finite automaton is minimal if and only if its state transition diagram is a strongly connected digraph; i.e., given any two vertices, there is a path connecting the vertices.
The 2-adic odometer therefore has a unique minimal subautomaton, namely the one whose set of states consists of the single state s 1 .
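The two-state automaton of the odometer can be simulated directly. The following Python sketch is an illustration under stated assumptions: the state names `s0` (a carry is pending) and `s1` (no carry), the dictionaries `TRANSITION` and `OUTPUT`, and the helpers `run_mealy` and `to_word` are hypothetical names chosen here, and infinite words are truncated to their first n letters.

```python
# Keys are (input letter, current state), mirroring S(chi, s) and O(chi, s).
TRANSITION = {
    (0, "s0"): "s1",  # the carry is absorbed
    (1, "s0"): "s0",  # the carry propagates
    (0, "s1"): "s1",  # s1 is never left, so {s1} is a subautomaton
    (1, "s1"): "s1",
}
OUTPUT = {
    (0, "s0"): 1,
    (1, "s0"): 0,
    (0, "s1"): 0,  # in state s1 the input letter is copied to the output
    (1, "s1"): 1,
}


def run_mealy(transition, output, initial_state, word):
    """Feed a finite prefix of an input word, letter by letter."""
    state, result = initial_state, []
    for letter in word:
        result.append(output[(letter, state)])
        state = transition[(letter, state)]
    return result


def to_word(z: int, n: int) -> list:
    """First n letters of the canonical 2-adic representation of an integer z."""
    return [(z >> i) & 1 for i in range(n)]


image_of_minus_one = run_mealy(TRANSITION, OUTPUT, "s0", to_word(-1, 8))
```

Because the k-th output letter is produced before the (k+1)-th input letter is read, the first k output letters depend only on the first k input letters; this is the causality (1-Lipschitz) property in its word form.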
Recall that a path in a digraph is a (finite or infinite) sequence of arrows − → a 0 , − → a 1 , . . . such that for every pair − → a j , − → a j+1 of the arrows there is a state s such that the arrow − → a j goes to s and − → a j+1 goes from s. In a state transition diagram of an automaton having input alphabet A, to every path there corresponds a word χ 0 χ 1 . . . over A where χ j are input letters, the ones which occupy the first positions in the label α|β of the arrow: if A = {0, 1, . . . , p − 1} then to every path − → a 0 − → a 1 . . . that starts from the initial state s 0 , there corresponds the p-adic integer χ 0 + χ 1 p + · · · χ k−1 p k−1 + · · · where χ j |· is a label which marks the arrow − → a j , j = 0, 1, 2, . . .. Simply speaking, the word χ 0 χ 1 . . . is an input word such that when an automaton is fed by that word, the automaton updates it states s 0 → s 1 → s 2 → · · · where s j is a state from which the arrow − → a j starts and s j+1 is a state to which the arrow − → a j goes; thus, the states s j , s j+1 are connected by the arrow − → a j which goes from s j to s j+1 and which is labelled as χ j |·.

The statement of the following proposition is well known; see, e.g., [10]: Proposition 1 (Finite and nonfinite automata functions). Both addition + : Z 2 p → Z p and multiplication · : Z 2 p → Z p are automata functions; addition is a finite automaton function, whereas multiplication is not.
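Automaton functions of automata whose input/output alphabets are $\mathbb{F}_p$ can be explicitly represented via Mahler series. Recall that if $p > 1$ is an integer (which is not necessarily a prime), then every function $f : \mathbb{N}_0 \to \mathbb{Z}_p$ (or, respectively, $f : \mathbb{N}_0 \to \mathbb{Z}$) has a unique Mahler expansion; that is, a unique representation via the so-called Mahler (interpolation) series [20]:

$$f(x) = \sum_{i=0}^{\infty} a_i \binom{x}{i}, \qquad (3)$$

where $a_i \in \mathbb{Z}_p$ (respectively, $a_i \in \mathbb{Z}$), $i = 0, 1, 2, \ldots$, and

$$\binom{x}{i} = \frac{x(x-1)\cdots(x-i+1)}{i!} \ \text{ for } i = 1, 2, \ldots; \qquad \binom{x}{0} = 1$$

by definition. The following reciprocity relations hold:

$$a_i = \sum_{j=0}^{i} (-1)^{i+j} \binom{i}{j} f(j), \qquad f(i) = \sum_{j=0}^{i} a_j \binom{i}{j}. \qquad (4)$$

The function $f : \mathbb{Z}_p \to \mathbb{Z}_p$ represented by series (3) is continuous with respect to the p-adic metric if and only if $a_i$ tends p-adically to 0 as $i$ tends to infinity.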
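To represent functions of several variables, one may use interpolation series of the following form:

$$f(x_1, \ldots, x_n) = \sum_{i_1, \ldots, i_n = 0}^{\infty} a_{i_1, \ldots, i_n} \binom{x_1}{i_1} \cdots \binom{x_n}{i_n}.$$

Here, $a_{i_1, \ldots, i_n} \in \mathbb{Z}_p$. As every 1-Lipschitz map $f : \mathbb{Z}_p^n \to \mathbb{Z}_p$ is an automaton function (of an automaton having $n$ inputs and one output over a p-symbol alphabet $\mathbb{F}_p$), the following Theorem 1 completely describes the automaton functions. Note that $\lfloor \log_p i \rfloor$ is the greatest integer which does not exceed $\log_p i$; thus, for $i \ge 1$, $\lfloor \log_p i \rfloor$ is the number of digits in the base-p expansion of $i$ reduced by 1, and we put $\lfloor \log_p 0 \rfloor = 0$.

Theorem 1. A function $f : \mathbb{Z}_p^n \to \mathbb{Z}_p$ is 1-Lipschitz if and only if it can be represented as

$$f(x_1, \ldots, x_n) = \sum_{i_1, \ldots, i_n = 0}^{\infty} c_{i_1, \ldots, i_n}\, p^{\nu(i_1, \ldots, i_n)} \binom{x_1}{i_1} \cdots \binom{x_n}{i_n}$$

for suitable $c_{i_1, \ldots, i_n} \in \mathbb{Z}_p$, where $\nu(i_1, \ldots, i_n) = \max\{\lfloor \log_p i_k \rfloor : k = 1, 2, \ldots, n\}$.

In particular, a univariate function $f : \mathbb{Z}_p \to \mathbb{Z}_p$ represented by the Mahler expansion (3) is 1-Lipschitz if and only if

$$f(x) = \sum_{i=0}^{\infty} c_i\, p^{\lfloor \log_p i \rfloor} \binom{x}{i} \qquad (6)$$

for suitable $c_i \in \mathbb{Z}_p$; $i = 0, 1, 2, \ldots$.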

Note 2. The series (6) converges uniformly on Z p . Given a 1-Lipschitz function f : Z p → Z p , the representation (2) is unique.
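The description of 1-Lipschitz functions via Mahler coefficients can be probed numerically. The Python sketch below assumes a function given on non-negative integers with integer values, computes its first Mahler coefficients from the standard finite-difference formula, and checks the divisibility of the i-th coefficient by p raised to ⌊log p i⌋; the names `mahler_coeffs`, `floor_log`, and `passes_lipschitz_test` are hypothetical. Only finitely many coefficients are inspected, so a positive answer is a necessary-condition check on a finite range rather than a proof.

```python
from math import comb


def mahler_coeffs(f, count: int) -> list:
    """First `count` Mahler coefficients a_i = sum_j (-1)^(i+j) C(i, j) f(j)."""
    return [
        sum((-1) ** (i + j) * comb(i, j) * f(j) for j in range(i + 1))
        for i in range(count)
    ]


def floor_log(i: int, p: int) -> int:
    """Floor of log_p(i) for i >= 1, with the convention that the value at 0 is 0."""
    k = 0
    while i >= p:
        i //= p
        k += 1
    return k


def passes_lipschitz_test(f, p: int, count: int) -> bool:
    """Check that p^floor(log_p i) divides a_i for all i < count."""
    return all(
        a % p ** floor_log(i, p) == 0
        for i, a in enumerate(mahler_coeffs(f, count))
    )


square_coeffs = mahler_coeffs(lambda x: x * x, 5)  # x^2 = C(x,1) + 2 C(x,2)
```

For instance, in this sketch the function x(x − 1)/2 fails the test for p = 2 (its values at 0 and 2 differ by 1 although the arguments are congruent modulo 2) but passes it for p = 3, where 1/2 is a p-adic unit.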
There are explicit representations of automaton functions in other terms (e.g., via van der Put series, digital derivatives) which are not needed within the scope of the paper; an interested reader is referred to an expository paper [24]. Additionally, it is worth noting that Moore sequential machines are initial automata whose output function depends only on states, cf. Definition 2, but it is well known that the latter machines are equivalent to Mealy machines in the following meaning: under the assumption that an output of a Moore machine at initial state is an empty symbol (i.e., no output), then the classes of causal functions represented by Mealy machines and by Moore machines coincide; however, to represent a causal function via a state transition diagram of a Moore machine, one needs more states compared to the diagram of the respective Mealy machine. This is why in the rest of the paper, the example state transition diagrams are given for Mealy machines although, from a physical point of view, it might be more natural to deal with Moore machines since they appear to be defined on Markov chains whilst Mealy machines are not, as the output of Moore machines formally depends only on states rather than on arrows reaching the states; however, this view is misleading since Mealy machines do exactly what Moore machines do.
Finally, the concept of automaton functions illuminates the sharp difference between the two approaches, 't Hooft's one based on cellular automata and ours based on letter-to-letter transducers: the class of functions computed by the transducers is much smaller than the class of functions computed by cellular automata. To exemplify this, consider one more type of transducer, the letter-to-word transducer (or asynchronous initial automaton [22]) whose output function is I × S → W φ rather than I × S → O, where W φ is the set of all finite words (including the empty word φ) over the output alphabet O, cf. Definition 2. In the case when I = {0, 1, . . . , p − 1}, an asynchronous initial automaton produces a map Z p → Z p that can be constructed by analogy with the synchronous case; then, the maps Z p → Z p which are automaton functions of nondegenerate asynchronous initial automata constitute the class of all functions that are continuous with respect to the p-adic metric, cf. [22] [Theorem 2.4]. Therefore, these functions are determined by their restrictions to N 0 , as N 0 is dense in Z p with respect to the p-adic metric. The automaton functions of initial synchronous automata are all of the form (6), so if f is an automaton function of a synchronous automaton such that f : N 0 → N 0 , then necessarily c i p log p i ∈ Z for all i ∈ N 0 , as the value of a i for every i can be calculated by using (4). Therefore, from the algorithmic point of view, f is a primitive recursive function. In a similar way, it can be shown that the functions N 0 → N 0 which are automaton functions of nondegenerate asynchronous automata are also primitive recursive functions since they can be uniquely extended to continuous p-adic functions Z p → Z p and thus are of the form (3).
However, the class of cellular automata is Turing-complete; therefore, the automaton functions of cellular automata (which can be defined for these automata as well) constitute the class of all general recursive functions; hence, they are not even everywhere defined on N 0 , let alone p-adically continuous or 1-Lipschitz. In other words, one may say that the class of automaton functions of initial synchronous automata is the smallest class of automaton functions, whereas the class of automaton functions of cellular automata is the largest one.

On the Dynamics of Causal Functions
Here, we briefly recall some facts about the dynamics of automaton functions following [23]; i.e., on the dynamics of the p-adic 1-Lipschitz functions. The dynamics arises quite naturally since the automaton function of a sequential composition of automata is a composition of automaton functions. In addition, we recall from [25] a few general notions and facts from dynamical system theory which will be needed in subsequent steps.
A map $F : \mathbb{S} \to \mathbb{Y}$ from a measure space $\mathbb{S}$ into a measure space $\mathbb{Y}$ endowed with probability measures $\mu$ and $\nu$, respectively, is said to be measure-preserving if $\mu(F^{-1}(S)) = \nu(S)$ for each measurable subset $S \subset \mathbb{Y}$. In the case when $\mathbb{S} = \mathbb{Y}$ and $\mu = \nu$, a measure-preserving map $F$ is said to be ergodic if, given a measurable subset $S$ such that $F^{-1}(S) = S$, either $\mu(S) = 1$ or $\mu(S) = 0$. The map $F$ is called weak mixing if for any two measurable sets $A, B$ there exists a sequence $n_k \to \infty$ over $\mathbb{N}_0$ such that $\lim_{k \to \infty} \mu(F^{-n_k}(A) \cap B) = \mu(A)\mu(B)$. If $n_k = k$, the weak mixing is called strong mixing. Weak mixing implies ergodicity but is a stronger condition than ergodicity: the map $F$ is weak mixing if and only if the map $F \times F : (x, y) \mapsto (F(x), F(y))$ is ergodic on $\mathbb{S} \times \mathbb{S}$.

Definition 3 (Topological transitivity). Given a topological space $X$ and a continuous mapping $f : X \to X$, the mapping $f$ (as well as the respective dynamical system) is called topologically transitive if there exists a dense orbit of $f$; that is, if there exists $x \in X$ such that the set of iterations $\{x, f(x), f^2(x), \ldots\}$ is dense in $X$.

There is another (generally, nonequivalent to the above) definition of topological transitivity: the map $f$ is called topologically transitive if for every pair of nonempty open sets $U, V \subset X$, there exists a non-negative integer $\ell$ such that $f^{\ell}(U) \cap V \neq \emptyset$. However, as in the sequel we deal with the spaces $X = \mathbb{Z}_p^n$, $n \in \mathbb{N}$, the two definitions are equivalent since these spaces have no isolated points and are separable and of second category.

Definition 4 (Unique ergodicity). A mapping f : S → S is called uniquely ergodic if there exists a unique f -invariant probability measure µ on S; i.e., such that f is ergodic with respect to µ.

Proposition 2 ([25] [Corollary 4.3.6]). A minimal isometry of a compact metric space is uniquely ergodic.
Given a 1-Lipschitz function f : Z p → Z p , a map f mod p k : z → f (z) mod p k is a well-defined map of the residue ring Z/p k Z into itself, cf., Section 3.2. This map is called an induced function modulo p k . The function induced modulo p k by a 1-Lipschitz function F : Z n p → Z n p can be defined by analogy.
Definition 5 (Bijectivity and transitivity modulo p k ). A 1-Lipschitz function F : Z n p → Z n p is said to be bijective modulo p k (respectively, transitive modulo p k ) whenever the induced function F mod p k : (Z/p k Z) n → (Z/p k Z) n is bijective (respectively, transitive).
In what follows, if the measure is not specified explicitly, measure preservation and ergodicity are defined with respect to the Haar probability measure on Z n p , cf., Section 3.2. The following Theorem and Proposition are proven in [23] [Chapter 4].

Theorem 2 (Main ergodic theorem for 1-Lipschitz p-adic dynamics). A 1-Lipschitz function F : Z n p → Z n p is measure-preserving (or, accordingly, ergodic) if and only if it is bijective (or, accordingly, transitive) modulo p k for all k = 1, 2, 3, . . ..
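For n = 1, the criterion of Theorem 2 can be tested for small k by brute force. The Python sketch below is only an illustration: the helper names `induced`, `is_bijective_mod`, and `is_transitive_mod` are hypothetical, the function is assumed to be 1-Lipschitz and given on integers, and checking finitely many k gives evidence for, not a proof of, measure preservation or ergodicity.

```python
def induced(f, p: int, k: int) -> list:
    """Table of the function induced by f modulo p^k on {0, ..., p^k - 1}."""
    m = p ** k
    return [f(z) % m for z in range(m)]


def is_bijective_mod(f, p: int, k: int) -> bool:
    """True if f mod p^k is a permutation of the residue ring Z/p^k Z."""
    return len(set(induced(f, p, k))) == p ** k


def is_transitive_mod(f, p: int, k: int) -> bool:
    """True if f mod p^k permutes the residues in a single cycle."""
    table = induced(f, p, k)
    seen, z = set(), 0
    while z not in seen:
        seen.add(z)
        z = table[z]
    # A single cycle visits every residue and then returns to the start.
    return len(seen) == p ** k and z == 0


odometer_ok = all(is_transitive_mod(lambda z: z + 1, 2, k) for k in range(1, 7))
```

In this sketch the odometer z → z + 1 is transitive modulo every tested power of 2, whereas z → 3z + 1 is bijective modulo every tested power of 2 but already fails to be transitive modulo 4, so by Theorem 2 it is measure-preserving but not ergodic on Z 2 .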

Proposition 3. A function F : Z n p → Z n p is measure-preserving and 1-Lipschitz if and only if it is an isometry of Z n p onto itself. A measure-preserving 1-Lipschitz function F is ergodic if and only if it has a dense orbit; moreover, all orbits of an ergodic 1-Lipschitz function F : Z n p → Z n p are dense.
The space Z n p is a probability space and a metric (and thus topological) space. Therefore, for a continuous function Z n p → Z n p , one can define a metric entropy (related to the probability) and a topological entropy (related to the topology). In general, given F, these entropies may differ. However, for measure-preserving 1-Lipschitz functions F, both entropies coincide and are 0. Indeed, it is known that if G : X → X is an isometry of a compact metric space X onto itself, then the topological entropy of G is 0, cf., e.g., [26] [Exercise 6.3]. Yet, the variational principle for the topological entropy states that the topological entropy of a continuous transformation G of a compact metric space X is the supremum of all metric entropies of G with respect to G-invariant measures on X, cf., [26] [Theorem 6.8.1]; this proves the claim. Moreover, from Proposition 3 it follows that given a 1-Lipschitz ergodic map F : Z p → Z p , the map F × F : (x, y) → (F(x), F(y)) from Z 2 p to Z 2 p is never ergodic since an orbit which starts from (z, z) ∈ Z 2 p is never dense in Z 2 p , so F is never weak mixing. We summarize as follows:
• A 1-Lipschitz function F : Z n p → Z n p is measure-preserving if and only if it is an isometry of Z n p onto itself (Proposition 3), if and only if the respective automaton is time-reversible: an automaton A whose automaton function is F is called time-reversible if there exists an automaton B whose automaton function is G and such that G = F −1 , i.e., the composition G(F) is an identity map Z n p → Z n p . The time-reversibility is also called automaton weak invertibility, [27].
• All 1-Lipschitz functions Z n p → Z n p have zero topological entropy (thus, zero metric entropy).
• All 1-Lipschitz ergodic maps F : Z n p → Z n p are uniquely ergodic.
• None of the 1-Lipschitz ergodic maps F : Z n p → Z n p is weak mixing.
• Every orbit of every 1-Lipschitz ergodic map F : Z n p → Z n p is dense.

When n = 1 the following is true [28] [Theorem 6]:

Theorem 3. Let f : Z p → Z p be surjective and 1-Lipschitz.
The following propositions are equivalent: In subsequent steps, we will need the following sufficient conditions of measure-preservation/ergodicity for 1-Lipschitz functions Z p → Z p , [23] [Lemma 4.41]: Lemma 1. Given a 1-Lipschitz function f : Z p → Z p and p-adic integers c, d, c ≢ 0 (mod p), the function g(  , v), or, which is the same if and only if words f (u) and f (v) have a common prefix of length at least k whenever respective words u and v have a common prefix of length k.

Completely Consistent Functions
As a composition of automaton functions is an automaton function, the following example introduces an important class of functions which are automaton functions for every p (by Proposition 1) and, moreover, which at the same time can be considered as continuous real functions.
That is, as the set Z p ∩ Q of all rational p-adic integers is dense both in Z p with respect to the p-adic metric on Z p for every p and with respect to usual real metric on R, the map induced by a polynomial f ∈ Z[x] is well-defined both on Z p for all p and on R; i.e., the map f : z → f (z), (z ∈ Z p ∩ Q), can be uniquely extended both to continuous maps f : u → f (u), (u ∈ Z p ), for all p, and to a continuous map f : y → f (y), (y ∈ R). This is because any polynomial map is a composition of additions and multiplications, and these operations are well-defined and continuous both on all Z p and on R with respect to corresponding metrics and agree on Z p ∩ Q.

Universally Causal Functions
The maps f : N 0 → Z defined by polynomials over Z are examples of functions which we call universally causal; these are the functions which, loosely speaking , are causal with respect to all finite alphabets A and B such that #A = #B = r for whatever r ∈ {2, 3, 4, . . .} is taken. Here is a formal definition.

Definition 6 (Universally causal functions). A causal function
whose domain is all sequences a = (a i ) ∞ i=0 over A and whose codomain is all sequences (1), satisfies the following conditions: (i)f (N 0 ) ⊂ Z, where N 0 , the rational non-negative integers, are all r-adic integers whose canonical r-adic representations contain only a finite number of nonzero terms; and Z, the rational integers, are either non-negative rational integers or negative rational integers. The latter are all r-adic integers whose canonical r-adic representations contain only a finite number of terms other than (r − 1)r i .
The class of universally causal functions is much wider than that of functions defined by polynomials over Z. Actually, up to the bijections α, β, the universally causal functions constitute the class of the so-called pseudo-polynomials [29], or universal functions [30]; these are maps g : N 0 → Z which satisfy (ii) from Definition 6.
Theorem 4 (On pseudo-polynomials). A map $g : \mathbb{N}_0 \to \mathbb{Z}$ is a pseudo-polynomial if and only if $g$ can be represented as

$$g(x) = c_0 + \sum_{i=1}^{\infty} c_i \cdot \mathrm{lcm}\{1, 2, \ldots, i\} \binom{x}{i} = c_0 + \sum_{i=1}^{\infty} c_i\, e^{\psi(i)} \binom{x}{i}, \qquad (7)$$

where $c_i \in \mathbb{Z}$, $\mathrm{lcm}\{1, 2, \ldots, i\}$ is the least common multiple of the numbers $1, 2, \ldots, i$, and $\psi(i) = \sum_{q \le i,\ q \text{ prime}} \lfloor \log_q i \rfloor \ln q$ is the second Chebyshev function, $i = 1, 2, \ldots$ (recall that $\mathrm{lcm}\{1, 2, \ldots, i\} = e^{\psi(i)}$).

In the literature, often only the functions of the form (7) which are not polynomials are called pseudo-polynomials, but in the current paper, we call all functions of that form pseudo-polynomials. The class of pseudo-polynomials is wide and is a subject of study for number theorists, who focus mostly on Ruzsa's conjecture, which concerns sufficient conditions for a pseudo-polynomial to be a polynomial; see, e.g., [31]. Classical examples of pseudo-polynomials which are not polynomials are The following is noteworthy.

• Even if all but a finite number of c i in (7) are 0, i.e., if g is a polynomial, then g is not necessarily a polynomial with integer coefficients, although g is a polynomial over Q. For instance, put c 4 = 1 and put c i = 0 for i ≠ 4; then g(x) = 12 · x(x − 1)(x − 2)(x − 3)/24 = x(x − 1)(x − 2)(x − 3)/2.
• If all but a finite number of c i are 0, the function g is well-defined on R; that is, g can be uniquely extended to a map R → R which is continuous with respect to the real metric.
• For every p > 1, the map g can be uniquely extended to a 1-Lipschitz (thus, automatic) map Z p → Z p , cf., (6) from Theorem 1.
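The example with c 4 = 1 can be checked numerically. The Python sketch below builds a function of the form (7) from a finite list of integer coefficients and tests, on a finite range of non-negative integers only, that differences of values are at least as divisible by p as differences of arguments; the names `pseudo_poly`, `ord_p`, and `is_1_lipschitz_on` are hypothetical, and `math.lcm` with several arguments assumes Python 3.9 or later.

```python
from math import comb, inf, lcm


def pseudo_poly(c):
    """Function x -> sum_i c[i] * lcm{1..i} * C(x, i) for integer coefficients c."""
    # lcm() of an empty argument list is 1, which covers the term i = 0.
    scaled = [ci * lcm(*range(1, i + 1)) for i, ci in enumerate(c)]
    return lambda x: sum(s * comb(x, i) for i, s in enumerate(scaled))


def ord_p(n: int, p: int):
    """p-adic valuation of an integer n; infinite for n = 0."""
    if n == 0:
        return inf
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def is_1_lipschitz_on(f, p: int, bound: int) -> bool:
    """Test |f(u) - f(v)|_p <= |u - v|_p for all 0 <= v < u < bound."""
    return all(
        ord_p(f(u) - f(v), p) >= ord_p(u - v, p)
        for u in range(bound)
        for v in range(u)
    )


g = pseudo_poly([0, 0, 0, 0, 1])  # g(x) = 12 * C(x, 4)
```

In this sketch g passes the test for p = 2, 3, 5, 7, whereas the binomial coefficient C(x, 4) alone, without the factor lcm{1, 2, 3, 4} = 12, already fails it for p = 2 at the pair u = 4, v = 0.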

The Main Theorem on Complete Consistency
Therefore, polynomials of the form (7) satisfy Conditions 1 for every prime p. It turns out that the converse statement is also true. Note that a function which satisfies the conditions for all primes p must be universally causal, i.e., it must be a pseudo-polynomial; however, the only pseudo-polynomials for which the series (7) is well-defined on R are polynomials, since if an infinite number of c i in (7) are nonzero, then the series diverges, for example, at z = −1, as the common term at z = −1 is (−1) i c i · lcm{1, 2, . . . , i} and thus does not go to 0 as i → ∞. However, this argument does not prove the converse claim since, for instance, if g is a pseudo-polynomial which is not a polynomial, then the composition g(z 2 ) is also a pseudo-polynomial, but the map z → g(z 2 ) is well-defined on Z. Nonetheless, the following theorem holds true.  (7); i.e., when all but a finite number of c i in (7) are zero.
Proof. According to Theorem 4, every polynomial g over Q of the form (7) satisfies (i) from Conditions 1, cf., (ii) of Definition 6. Therefore, g also satisfies (ii) from Conditions 1 since g(Q) ⊂ Q as g is a polynomial over Q.
To prove the converse claim, note that the map u : Z p → Z p is 1-Lipschitz if and only if ∆ i u(z)/i ∈ Z p for all z ∈ Z p and all i ∈ N, cf., [23] [Proposition 3.38] or [32] . Therefore, we have the following: Further, from (ii) of Conditions 1, it follows (by Note 1) where the series converges p-adically for all but not more than a finite number of primes p as ∆ i f (z) tends p-adically to 0 according to Theorem 1; cf., (i) of Conditions 1. Thus, the series converges to some q (z, h) ∈ Q by (ii) of Conditions 1, and, therefore, the series converges in R to that rational number q (z, h). We where the series in the right hand part converges in R to the rational number q(z, h) ∈ Q; therefore, the absolute value |( h−1 i−1 ) from the convergence of the series in the right hand part of (9), and it follows necessarily that lim i→∞ and, in view of Conditions 1 (i), the series in the right hand part converges p-adically in Z p , then, according to Note 2, we finally conclude that f is a polynomial over Q; hence, a polynomial of the form (7).

Definition 7 (Totally consistent functions). Further in the paper, functions described by Theorem 5 are called totally consistent; C(R) denotes the class of all totally consistent functions.

Note 4. In view of Theorem 1, the statement of Theorem 5 holds true for continuous real functions R m → R n as well. The proof is a minor modification of the proof of the said theorem and thus is omitted.

Note 5. From the proof of Theorem 5, it follows that relaxation of Conditions 1 to functions f whose domain contains a real interval rather than coincides with the whole R does not widen the class of functions.

The Free Choice of Discreteness/Continuity
We stress once again that in the measurement of values of physical quantities, the rational p-adic integers $\mathbb{Z}_p\cap\mathbb{Q}$ are indistinguishable from rational numbers $\mathbb{Q}$, since every real number can be approximated by a rational p-adic integer with any desired accuracy. Note also that polynomials over $\mathbb{Z}$ are totally consistent; cf. Example 2. The theorem by M. I. Chlodovsky states that a continuous real-valued function on a real interval which does not contain integers can be uniformly approximated by polynomials over $\mathbb{Z}$ [33,34]. Therefore, according to Theorem 5, any continuous real function on the real interval $[\alpha,\beta]$, where $0<\alpha<\beta<1$, can be uniformly approximated (with respect to the real metric) by totally consistent functions, i.e., by functions from $C(\mathbb{R})$. On the other hand, Theorems 1 and 5 imply that any p-adic 1-Lipschitz function $f:\mathbb{Z}_p\to\mathbb{Z}_p$ can be uniformly approximated (with respect to the p-adic metric) by totally consistent functions, regardless of which prime $p$ is taken.
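Indeed, according to (6), the function $f$ can be represented by the Mahler expansion $f(z)=b_0+\sum_{i=1}^{\infty} b_i\,p^{\lfloor\log_p i\rfloor}\binom{z}{i}$ with $b_i\in\mathbb{Z}_p$. According to Theorem 5, given $n\in\mathbb{N}$, we must find a polynomial $g(z)=\sum_{i} c_i\cdot\operatorname{lcm}\{1,2,\ldots,i\}\binom{z}{i}$ of the form (7) such that $f(z)\equiv g(z)\pmod{p^n}$ for all $z\in\mathbb{Z}_p$. As $\operatorname{lcm}\{1,2,\ldots,i\}=\prod_{q} q^{m_{i,q}}$, where $q^{m_{i,q}}$ is the largest power of a prime $q$ that does not exceed $i$, we have $m_{i,q}=\lfloor\log_q i\rfloor$, and therefore $c_i\cdot\operatorname{lcm}\{1,2,\ldots,i\}=c_i\,p^{\lfloor\log_p i\rfloor}a_i$, where $a_i=\operatorname{lcm}\{1,2,\ldots,i\}/p^{\lfloor\log_p i\rfloor}$ is in $\mathbb{Z}$ and is coprime to $p$. Hence, given $b_i\in\mathbb{Z}_p$, the congruence $b_i\equiv c_i a_i\pmod{p^n}$ has an integer solution $c_i\in\mathbb{Z}$. Put $c_i=0$ for all $i$ such that $\lfloor\log_p i\rfloor\ge n$, and let $c_0\in\mathbb{Z}$ be the least non-negative residue of $b_0\in\mathbb{Z}_p$ modulo $p^n$. Then, the so-defined polynomial $g$ is the one we need.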
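The construction in this proof can be checked numerically. The following is only a minimal sketch under stated assumptions: the p-adic integers $b_i$ are replaced by ordinary integers, the expansion is truncated to finitely many terms, the convention $\lfloor\log_p 0\rfloor=0$ is assumed, and the helper names (`floor_log`, `approximating_coeffs`, `f_value`, `g_value`) are hypothetical, not taken from the paper.

```python
from math import comb, lcm  # math.lcm with several arguments needs Python 3.9+


def floor_log(p, i):
    """floor(log_p i) for i >= 1; returns 0 for i = 0 (assumed convention)."""
    k = 0
    while i >= p:
        i //= p
        k += 1
    return k


def approximating_coeffs(b, p, n):
    """Integers c[i] with c[i] * lcm{1..i} congruent to b[i] * p**floor_log(p, i) mod p**n."""
    mod = p ** n
    c = []
    for i, b_i in enumerate(b):
        k = floor_log(p, i)
        if k >= n:
            c.append(0)  # such terms vanish modulo p**n anyway
            continue
        a_i = lcm(*range(1, i + 1)) // p ** k  # coprime to p
        c.append(b_i * pow(a_i, -1, mod) % mod)  # solve b_i = c_i * a_i (mod p**n)
    return c


def f_value(z, b, p):
    """Truncated Mahler expansion of a 1-Lipschitz function at a non-negative integer z."""
    return sum(b_i * p ** floor_log(p, i) * comb(z, i) for i, b_i in enumerate(b))


def g_value(z, c):
    """Polynomial of the form (7) with integer coefficients c[i]."""
    return sum(c_i * lcm(*range(1, i + 1)) * comb(z, i) for i, c_i in enumerate(c))
```

For instance, with $p=3$, $n=2$ and an arbitrary integer list `b`, one can compare `f_value(z, b, 3)` and `g_value(z, approximating_coeffs(b, 3, 2))` modulo 9 on non-negative integers $z$, which are dense in $\mathbb{Z}_3$.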
All the considerations already outlined in this paper may be taken as evidence in favour of the following plausible statement which answers the question to which the whole book [2] is devoted: Interpretation 1 (Observer's free choice of discreteness/continuity). Due to the inevitable nonzero error in the measurements of values of physical quantities, an observer's conclusion whether Nature on the smallest of the scales is discrete or continuous completely depends on the observer's free choice of metric with respect to which the observer processes the measured numerical data. Moreover, the very "degree of the discreteness", the number p, is subject to observer's free choice.

The Free Choice of Chaoticity/Predictability
The next important question which should be addressed is related to the 't Hooft causality postulate and can be posed as follows: Can an observer determine from numerical observational data whether Nature on the smallest of scales is random or absolutely predictable? In what follows, predictability is understood as causality: if an observer probes a system by exposing it to some impacts, the reactions of the system coincide whenever the impacts coincide up to the precision of the measurement equipment. That is, the same causes imply the same effects, so the behaviour of the system is completely predictable, since a cause results in a unique effect within the measurement precision. Randomness means that the "same" causes may result in different effects. Specifically, causes whose numerical values are indistinguishable in measurement (since the values coincide up to the precision of the measurement equipment) may result in effects which are distinguishable by measurement, i.e., the differences of the numerical values of the respective effects exceed the measurement error. This is why, in what follows, we treat randomness as chaos in a broad sense: the defining feature of chaos is its extreme sensitivity to negligible distortions/perturbations.
Recall that there are many nonequivalent mathematical notions of chaos; see, e.g., the expository paper [35]. One of the most common of these definitions is that of R. L. Devaney [36] [Definition 8.5], which reads as follows:

Definition 8 (Devaney's chaos on metric spaces). Let $F:X\to X$ be a continuous function on a metric space $X$ equipped with a metric $d$. The function $F$ is said to be chaotic if it satisfies the following three conditions:
(i) Sensitive dependence on initial conditions: There is $\delta>0$ such that, for any $x\in X$ and any neighbourhood $A\subset X$ of $x$, there exist $y\in A$ and $n\in\mathbb{N}_0$ such that $d(F^n(x),F^n(y))>\delta$.
(ii) Density of periodic points: The set of all periodic points of $F$ is dense in $X$.
(iii) Topological transitivity: For any pair of nonempty open sets $U,V\subset X$, there exists $n\in\mathbb{N}$ such that $F^n(U)\cap V\neq\emptyset$.

It is known that conditions (i)-(iii) are not independent. In [37], it is proven that sensitive dependence on the initial conditions is a redundant element in Devaney's definition because it follows from topological transitivity and denseness of the periodic points; in [38], it is shown by construction of counterexamples that neither topological transitivity nor denseness of the periodic points follows from the remaining two properties. In [39], it is proven that chaos, according to Devaney's definition, may exist in bounded but noncompact spaces without any nonperiodic orbits. For bounded metric spaces, however, the following theorem is true:

Theorem 6 (C. Knudsen, [39]). Let $F$, $X$, $d$ be as in Definition 8; let $X$ be bounded; let $f=F|_Y$ be the restriction of $F$ to a dense subset $Y$ of $X$. Then, we obtain the following: The following definition of chaos on a bounded metric space is from Knudsen.
Definition 9 (Knudsen's chaos on bounded metric spaces [39]). Let F be a continuous transformation of a bounded metric space X. If F has a dense orbit in X and if F exhibits sensitive dependence on the initial conditions, then F is said to be chaotic.
We stress that to the best of our knowledge, all definitions of chaos on metric spaces contain sensitive dependence on initial conditions as an inherent property; other conditions vary, but the sensitive dependence condition is always present, [40]. For other various types of chaos on compact metric spaces X, see [35]. We only mention that a continuous map F : X → X is called topologically chaotic if topological entropy of F is positive. The topological chaos implies Li-Yorke chaos, which is yet one more widely known type of chaos, for whose definition the reader is referred to [35]. In addition, positive topological entropy implies distributional chaos of type DC2, [41]. Chaos can also be defined in terms of measure-preserving transformations of measure spaces rather than of metric spaces; see [41].
The "chaos-like" behaviour may also be expressed in terms of "blending capability", which we first illustrate by an example taken from [42]. If in a cocktail shaker of volume 1 there are 10 shares of gin and 90 shares of vermouth then, after ergodic shaking, in every volume $V$ of the shaker there will be 10 shares of gin and 90 shares of vermouth on average, whereas after strong-mixing shaking, in every $V$ there will be approximately 10 shares of gin and 90 shares of vermouth; after weak-mixing shaking, in every $V$, with the exception of some rare instants, there will be 10 shares of gin and 90 shares of vermouth. Formally, a measure-preserving transformation $F$ is by definition strong mixing if $\lim_{n\to\infty}\mu(F^{-n}(A)\cap B)=\mu(A)\mu(B)$ for all $\mu$-measurable subsets $A$, $B$. Thus, if $\mu$ is a probability measure, the strong-mixing transformation, after being applied a sufficiently large number of times, makes any two "events" $A$, $B$ "independent" in the probabilistic sense. As mentioned in Section 3.4, a 1-Lipschitz measure-preserving map can be neither strong nor weak mixing; only ergodicity is possible.
Finalising the considerations of chaos, we claim that 1-Lipschitz functions $F:\mathbb{Z}_p^n\to\mathbb{Z}_p^n$ are deterministic and nonchaotic with respect to chaos of any type. Indeed, due to the 1-Lipschitzness, these functions exhibit no sensitive dependence on initial conditions, and their topological entropy is zero; hence, any metric entropy is zero; cf. Section 3.4. Moreover, as measure-theoretic chaos is defined only for measure-preserving maps, and as a 1-Lipschitz map $F:\mathbb{Z}_p^n\to\mathbb{Z}_p^n$ preserves the Haar probability measure if and only if it is an isometry, it can easily be shown that $F$ is chaotic with respect to no type of measure-theoretic chaos defined in [41]. One may say, therefore, that totally consistent functions (see Definition 7) are the best candidates to be called superdeterministic. The latter term must not be understood in the sense which is common for physical theories [5] but rather as a mathematical notion that stresses the "extremely nonchaotic" behaviour of the functions.
On the other hand, one may also say that totally consistent functions are similar to Ianus Bifrons: being deterministic with respect to a p-adic metric for every p, the totally consistent functions can nevertheless be chaotic if considered as real functions on a real interval. Let us consider an illustrative example.
A well-known "canonical" example of real chaotic maps, the logistic map L(x) = 2x(1 − x), maps a real closed interval [0, 1] to [0, 1]. The map L has positive entropy log 2. On the other hand, L is a polynomial with integer coefficients; hence, it is a totally consistent function, thus its entropy as a p-adic 1-Lipschitz map z → 2z(1 − z) (both topological and metric with respect to Haar probability measure) is 0, and L : Z p → Z p is not sensitive to initial conditions. The map L on Z 2 is not measure-preserving with respect to the Haar probability measure on Z 2 ; it has the only point of attraction (namely, 0) to which all orbits converge; thus, L is not topologically transitive on Z 2 .
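The p-adic side of this example can be probed with integer arithmetic. The sketch below is illustrative only: it checks on residues that $L(z)=2z(1-z)$ preserves congruences modulo $p^j$ (the 1-Lipschitz property for the p-adic metric) and that 2-adic orbits of integers are absorbed by 0 modulo $2^k$. The function names `is_nonexpanding` and `orbit_mod` are hypothetical helpers, and a finite check is of course not a proof.

```python
def L(z):
    """The map z -> 2z(1 - z) from the example, evaluated on integers."""
    return 2 * z * (1 - z)


def is_nonexpanding(p, k):
    """Check on residues that x = y (mod p**j) implies L(x) = L(y) (mod p**j), j <= k."""
    for j in range(1, k + 1):
        for x in range(p ** k):
            for t in (1, 2, 5):  # a few representatives y = x + t * p**j
                y = x + t * p ** j
                if (L(x) - L(y)) % p ** j != 0:
                    return False
    return True


def orbit_mod(z, p, k, steps):
    """Orbit of z under L, reduced modulo p**k at every step."""
    m = p ** k
    orbit = [z % m]
    for _ in range(steps):
        orbit.append(L(orbit[-1]) % m)
    return orbit
```

Since every application of $L$ contributes at least one factor 2, `orbit_mod(z, 2, k, k)` ends at the residue 0 for any integer `z`, which mirrors the statement that 0 attracts all orbits in $\mathbb{Z}_2$.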
However, the map $L$ is ergodic on the 3-adic sphere $S_{1/27}(0)$ of radius $1/27$ centred at $0$, since $0$ is a fixed point of $L$ and $L'(0)=2$ is a generator of the group of units modulo 9; see [23] [Theorem 4.79] or [43] [Theorem 5.7]. Specifically, the sphere $S_{1/27}(0)$ is a disjoint union of the two 3-adic balls $B_{1/81}(27)$ and $B_{1/81}(54)$, the sphere is invariant under the action of $L$ on $\mathbb{Z}_3$, and the sphere is measurable with respect to the Haar probability measure on $\mathbb{Z}_3$. Thus, the probability measure on $\mathbb{Z}_3$ induces a probability measure on $S_{1/27}(0)$ with respect to which the action of $L$ on the sphere is measure-preserving and ergodic. The set of all rational 3-adic numbers from $S_{1/27}(0)$ which lie in the real closed interval $[0,1]$ is dense in $[0,1]$ with respect to the real metric. Therefore, as 3-adic rational integers are indistinguishable from real numbers by measurement due to the inevitable nonzero error, the map $L$ can be judged as measure-preserving and ergodic.

Now consider the map $L$ on the 3-adic sphere $S_{1/27}(1)$. The sphere is a disjoint union of the balls $B_{1/81}(28)$ and $B_{1/81}(55)$. The sphere is invariant under the action of $L$ on $\mathbb{Z}_3$ and is measurable with respect to the probability measure on $\mathbb{Z}_3$. The map $L$ on $S_{1/27}(1)$ is measure-preserving with respect to the induced probability measure but is not ergodic by the criterion of ergodicity on p-adic spheres (see [23] [Theorem 4.79] or [43] [Theorem 5.7]), since $L'(1)=-2$ is not a generator of the group of units modulo 9. The set $S_{1/27}(1)\cap[0,1]$ is dense in $[0,1]$ with respect to the real metric; therefore, by reasoning similar to the above, the map $L$ can be judged as measure-preserving but not ergodic.
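For the sphere centred at the fixed point 0, the ergodicity claim can be illustrated on residues. A minimal sketch, assuming that ergodicity on the sphere shows up modulo $3^k$ as a single cycle through all sphere residues (those divisible by 27 but not by 81); the names `sphere_residues` and `cycle_length` are hypothetical, and only finitely many $k$ can be inspected this way.

```python
def L(z):
    """The map z -> 2z(1 - z) from the example, evaluated on integers."""
    return 2 * z * (1 - z)


def sphere_residues(k):
    """Residues modulo 3**k of the 3-adic sphere of radius 1/27 centred at 0 (k >= 4)."""
    return [z for z in range(3 ** k) if z % 27 == 0 and z % 81 != 0]


def cycle_length(start, k):
    """Number of steps after which the orbit of start under L returns to start modulo 3**k.

    Returns None if the orbit does not return within 3**k steps.
    """
    m = 3 ** k
    z = start % m
    for n in range(1, m + 1):
        z = L(z) % m
        if z == start % m:
            return n
    return None
```

Comparing `cycle_length(27, k)` with `len(sphere_residues(k))` for a few small `k` shows whether the orbit of 27 visits every residue of the sphere modulo $3^k$.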
Finally, the map $L:\mathbb{Z}_p\to\mathbb{Z}_p$ is measure-preserving for no $p$, as $L$ is not bijective modulo $p$; cf. Theorem 2. However, the set $\mathbb{Z}_p\cap\mathbb{Q}\cap[0,1]$ is dense both in $\mathbb{Z}_p$ and in $[0,1]$ with respect to the p-adic and the real metric, respectively, so the values that the map $L$ takes on $\mathbb{Z}_p\cap\mathbb{Q}\cap[0,1]$ define a unique map both on $[0,1]$ and on $\mathbb{Z}_p$ for every $p$. Moreover, an observer's measurement data may only be rational numbers due to the inevitable nonzero measurement error, and any rational number from $[0,1]$ can be approximated with arbitrarily high accuracy (with respect to the real metric) by numbers from $\mathbb{Z}_p\cap\mathbb{Q}\cap[0,1]$, whichever $p$ is taken. In other words, numbers from $\mathbb{Z}_p\cap\mathbb{Q}\cap[0,1]$ (as well as from $S_{1/27}(1)\cap[0,1]$ or from $S_{1/27}(0)\cap[0,1]$) are indistinguishable from numbers in $\mathbb{Q}\cap[0,1]$ and from numbers in $[0,1]$ by measurements due to the nonzero measurement error, but the choice of metric (and of the dense subset) with respect to which the measured numbers are processed is crucial for the observer's conclusion as to whether the obtained data are completely random or satisfy a strictly deterministic law.
All these facts can be judged as evidence in favour of the following assertion.
Interpretation 2 (Observer's free choice of determinism/randomness). Due to the inevitable nonzero error in measurements of values of physical quantities, an observer's conclusion as to whether Nature on the smallest of the scales is superdeterministic or random completely depends on the observer's free choice of metric with respect to which the observer processes the measured numerical data.

p-Consistent Functions
In view of the finiteness assumption (cf. the text which follows Definition 1), Conditions 1 may appear to be too restrictive since, for physical reasons, the number of "elementary causes" and "elementary effects" cannot be arbitrarily large; therefore, it does not exceed some $p$. This motivates the following class of causal functions, the (univariate) p-consistent functions: given a prime $p$, we denote by $C_p(\mathbb{R})$ the class of all continuous (with respect to the usual metric on $\mathbb{R}$) functions $\tilde f:\mathbb{R}\to\mathbb{R}$ such that the following conditions are satisfied: There exists a p-adic 1-Lipschitz function $f:\mathbb{Z}_p\to\mathbb{Z}_p$ such that the following are obtained: The multivariate p-consistent functions $\mathbb{R}^m\to\mathbb{R}^n$ can be defined similarly. Loosely speaking, the functions from $C_p(\mathbb{R})$ "are living simultaneously in two worlds", the Archimedean one and the non-Archimedean one: any $\tilde f\in C_p(\mathbb{R})$ defines a unique 1-Lipschitz (i.e., automaton) function $f:\mathbb{Z}_p\to\mathbb{Z}_p$ since $\mathbb{Z}_p\cap\mathbb{Q}$ is dense in $\mathbb{Z}_p$ with respect to the p-adic metric, and vice versa, any $f$ defines a unique continuous real function $\tilde f:\mathbb{R}\to\mathbb{R}$ since $\mathbb{Z}_p\cap\mathbb{Q}$ is dense in $\mathbb{R}$ with respect to the real metric (this is why, in what follows, we use the same symbol $f$ for $\tilde f$ as well).
The functions from $C_p(\mathbb{R})$ may be best suited for physical modelling of causal dependencies both at the macro- and micro-scales, since the values of $f\in C_p(\mathbb{R})$ on, e.g., $\mathbb{N}_0$ completely define the function $f$ on $\mathbb{R}$.
From this definition, it immediately follows that any function from C p (R) can be represented via a Mahler series (6) where all c i are in Z p ∩ Q; the series converges both on R and on Z p with respect to the real and, accordingly, to the p-adic metric. It would be interesting to find necessary and sufficient conditions on the coefficients c i when the series (6) defines a C p (R)-function. The general conditions are not yet known, but nevertheless it is clear that the class C p (R) is rich; for instance, it contains not only polynomials over Z p ∩ Q but also some rational functions.
The rational functions $f$ are differentiable with respect to both the p-adic metric and the real metric; moreover, $\tilde f=f$ everywhere on $\mathbb{Z}_p\cap\mathbb{Q}$ and $f\in C_p(\mathbb{R})$; cf. [44] or [23] [Section 3.10.2].
For $k\in\mathbb{N}$, we denote by $C^k_p(\mathbb{R})$ (respectively, by $C^\infty_p(\mathbb{R})$) the subclass of all functions which are $k$ times (respectively, infinitely many times) differentiable with respect to both the p-adic and the real metric, and whose derivatives are also in $C_p(\mathbb{R})$. Put $C^0_p(\mathbb{R})=C_p(\mathbb{R})$. It is natural to ask, therefore, whether there exist functions in $C_p(\mathbb{R})$ which are not rational functions. The answer is affirmative.

Proof. The theorem can be proven by employing ideas from [45,46]. The set $\mathbb{Z}_p\cap\mathbb{Q}$ is countable; let us enumerate its elements as $z_1,z_2,\ldots$. Define by simultaneous induction a sequence of functions $g_0,g_1,g_2,\ldots$ and integers $m_0<m_1<m_2<\cdots$ as follows: put $g_0(x)=0$, $m_0=1$. For $n\ge 1$, consider the following polynomial over the ring $\mathbb{Z}_p\cap\mathbb{Q}$:
$$h_n(x)=\prod_{i=1}^{n}(z_i-x)=a_{n,0}+a_{n,1}x+\cdots+a_{n,n-1}x^{n-1}+(-1)^n x^n.$$
Put
$$g_n(x)=\frac{p^n h_n(x)}{\left\lceil (p^{2n}+1)\left(|a_{n,0}|+|a_{n,1}|m_{n-1}+\cdots+|a_{n,n-1}|m_{n-1}^{n-1}+m_{n-1}^{n}\right)\right\rceil_p},$$
where $|\cdot|$ is the real absolute value and $\lceil r\rceil_p$ for $r\in\mathbb{R}$ is defined via the smallest $\ell\in\mathbb{N}$ such that $r\le\ell$: $\lceil r\rceil_p=\ell$ if $\ell\not\equiv 0\pmod p$, or $\lceil r\rceil_p=\ell+1$ if $\ell\equiv 0\pmod p$. Then, $g_n(x)$ is a polynomial over $\mathbb{Z}_p\cap\mathbb{Q}$ and $|g_n(c)|<p^{-n}$ for every $c\in\mathbb{C}$ whose complex absolute value satisfies $|c|\le m_{n-1}$. Now, if $n$ is even, let $m_n$ be the first integer larger than $m_{n-1}$ such that $\sum_{i=0}^{n}g_i(m_n)\ge 2$. If $n$ is odd, let $m_n$ be the first integer larger than $m_{n-1}$ such that $\sum_{i=0}^{n}g_i(m_n)\le -2$. Since the leading coefficient of $h_n(x)$ is $(-1)^n$, these conditions are always true if $m_n$ is large enough. After defining all $g_n(x)$, put $g(x)=\sum_{i=1}^{\infty}g_i(x)$. Then, the following is true:
• The sum $g(x)=\sum_{i=1}^{\infty}g_i(x)$ converges uniformly in the open complex disk $D_{m_n}(0)$ of radius $m_n$ centred at $0$ for all $n\in\mathbb{N}$ because, except for the first $n$ terms, every term $g_i(c)$ is bounded in absolute value by $p^{-i}$, and the sum of these bounds converges.
• Because uniform convergence in an open subset of $\mathbb{C}$ preserves analyticity, the function $g$ is analytic on every $D_{m_n}(0)$ and so also on the whole of $\mathbb{C}$.
• In the sequence $(g(m_k))_{k=1}^{\infty}$, the terms having odd indices $k$ are less than $-1$, whereas the terms having even indices $k$ are greater than $1$, since $\sum_{i=0}^{n}g_i(m_n)\ge 2$ for even $n$ and $\sum_{i=0}^{n}g_i(m_n)\le -2$ for odd $n$, and the remaining terms in $g(m_n)$ cannot change the whole sum by more than 1.
Thus, the function g is well-defined on the whole R; g is a continuous function with respect to the real metric, and according to the intermediate value theorem, the function g has a zero between m n and m n+1 for all sufficiently large n ∈ N. Therefore, g has infinitely many zeroes in R and thus cannot be of the form . The function g according to this construction is a complex analytic function which is analytic on the whole C; thus, the restriction of g on R is a function R → R which is infinitely many times differentiable everywhere in R, and each derivative is continuous with respect to real metric and thus is uniquely defined by its values on Z p ∩ Q, as Z p ∩ Q is dense in R with respect to the real metric.
On the other hand, given $a,b\in\mathbb{Z}_p\cap\mathbb{Q}$, $a\neq b$, there are unique $k,n\in\mathbb{N}$ such that $a=z_k$, $b=z_n$ with respect to the enumeration of the numbers in $\mathbb{Z}_p\cap\mathbb{Q}$. If $n>k$, is a polynomial over $\mathbb{Z}_p\cap\mathbb{Q}$; thus, a unique continuation of $g$ to the whole of $\mathbb{Z}_p$ is a p-adic 1-Lipschitz function. Let $\bar g_k=g \bmod p^k$ be a polynomial over $\mathbb{N}_0$ obtained by reduction modulo $p^k$ of the function $g$ (it is clear that then $\deg\bar g_k\le k$ by the construction of $g$). Then, the function $g:\mathbb{Z}_p\to\mathbb{Z}_p$ can be uniformly approximated by the polynomials $\bar g_k$ with respect to the p-adic sup-norm, which is defined as follows: given a p-adic 1-Lipschitz function $u:\mathbb{Z}_p\to\mathbb{Z}_p$, the p-adic sup-norm of $u$ is $\max\{|u(z)|_p : z\in\mathbb{Z}_p\}$. Therefore, the function $g:\mathbb{Z}_p\to\mathbb{Z}_p$ is a B-function, i.e., it lies in the Stone-Weierstrass completion of the polynomials over $\mathbb{N}_0$ with respect to the said p-adic sup-norm; thus, $g$ is infinitely many times differentiable with respect to the p-adic metric, all derivatives are B-functions, and thus the derivatives are uniquely defined by their values on $\mathbb{Z}_p\cap\mathbb{Q}$, as $\mathbb{Z}_p\cap\mathbb{Q}$ is dense in $\mathbb{Z}_p$ with respect to the p-adic metric; see [44] [Proposition 4.4] or [23] [Section 3.10.2, Proposition 3.59].
Therefore, g is infinitely many times differentiable both on R and on Z p , and the values of the derivatives both with respect to the real and to the p-adic metric coincide on Z p ∩ Q. This finally proves that g is a C ∞ p (R)-function.
The $C_p(\mathbb{R})$-functions exhibit a sort of "hologram-likeness": the values a $C_p(\mathbb{R})$-function takes on an arbitrarily small real interval completely define the function on $\mathbb{R}$ and on $\mathbb{Z}_p$. Recall that a complete hologram can be restored from a small piece of a holographic plate.
Let $\alpha,\beta\in\mathbb{Z}_p\cap\mathbb{Q}$, $\alpha<\beta$. Put $\gamma=-1$ if $\alpha-\beta\le -1$; if $\alpha-\beta>-1$, let $\gamma=\frac{1}{1-p^t}$ be such that $0>\gamma>\alpha-\beta$ for a suitable $t\in\mathbb{N}$. It is clear from what we have already proven that $f(x)=g(x)$ for all $x\in\mathbb{R}$ if and only if $f(x)=g(x)$ for all $x\in(\gamma^{-1}(\beta-\alpha),0)$, as $(\gamma^{-1}(\beta-\alpha),0)\supset(-1,0)$.

Interpretation 3 ("Causality" vs. "locality"). The proof of Theorem 8 shows that the "local" behaviour of $C_p(\mathbb{R})$-functions completely defines their "global" behaviour. Given the values a $C_p(\mathbb{R})$-function takes on an arbitrarily small neighbourhood of an arbitrary point, the values the function takes at all other points can be "restored uniquely". If the points of $\mathbb{R}$ are treated as "positions" and the values of the function as "measurement data" of a physical system to which the function is ascribed, then the data an observer obtains by probing the system in a given position let the observer completely predict the values of physical quantities obtained by measurements at all other positions.
This property of $C_p(\mathbb{R})$-functions is especially important since various classes of real functions can be approximated by $C_p(\mathbb{R})$-functions.

(iv) Any continuous function $g:\mathbb{R}\to\mathbb{R}$ that vanishes at infinity can be uniformly approximated on $\mathbb{R}$ by $C^\infty_p(\mathbb{R})$-functions (recall that a continuous function $g:\mathbb{R}\to\mathbb{R}$ vanishes at infinity if, for every $\varepsilon>0$, there exists a compact set $K\subset\mathbb{R}$ such that $|g(x)|<\varepsilon$ for all $x\in\mathbb{R}\setminus K$).

(v) Any continuous function $g:\mathbb{R}\to\mathbb{R}$ that vanishes at infinity can be uniformly approximated on $\mathbb{R}$ by $C^\infty_p(\mathbb{R})$-functions which are automaton functions of time-reversible automata.

(vi) Any square-integrable function $g:\mathbb{R}\to\mathbb{R}$ (and moreover, any function $g:\mathbb{R}\to\mathbb{R}$ that is integrable with its $n$-th power for some $n\in\mathbb{N}$) can be uniformly approximated by $C^\infty_p(\mathbb{R})$-functions which are automaton functions of time-reversible automata.
Proof. The class $C^\infty_p(\mathbb{R})$ contains all polynomial functions over $\mathbb{Z}$. Chlodovsky's theorem yields that a continuous real-valued function which is defined on a real interval that does not contain an integer can be uniformly approximated by polynomials over $\mathbb{Z}$ [33,34]. If the interval $[a,b]$ contains integers, take $\alpha,\beta\in\mathbb{Z}_p\cap\mathbb{Q}$ such that the interval $[a',b']=[\alpha a+\beta,\alpha b+\beta]$ contains no integers (e.g., take $m\in\mathbb{N}$ such that $|b-a|<p^m-1$ and put $\alpha=1/(p^m-1)$). Given a continuous function $g:[a,b]\to\mathbb{R}$, the function $g(\alpha^{-1}(x-\beta))$ can be uniformly approximated by polynomials $u_i(x)\in\mathbb{Z}[x]$ on $[a',b']$ by Chlodovsky's theorem; thus, the function $g$ can be uniformly approximated on $[a,b]$ by the polynomials $u_i(\alpha x+\beta)$ over $\mathbb{Z}_p\cap\mathbb{Q}$. This proves claim (i). To prove claim (ii), consider the function $\tilde g(x)=\frac{g(x)-x}{p}$. In view of (i), since $\tilde g(x)$ is continuous on $[a,b]$, $\tilde g$ can be uniformly approximated by $C^\infty_p(\mathbb{R})$-functions $u_i$; thus, $g$ can be uniformly approximated by the functions $x+pu_i(x)$, which are also in $C^\infty_p(\mathbb{R})$. However, given any 1-Lipschitz function $u:\mathbb{Z}_p\to\mathbb{Z}_p$, the function $z+pu(z)$ is 1-Lipschitz and measure-preserving according to Lemma 1. Thus, the $C^\infty_p(\mathbb{R})$-functions $x+pu_i(x)$ are automaton functions of time-reversible automata.
To prove claim (iii), note that the function $g$ can be uniformly approximated by polynomials $w_j(x)$ over $\mathbb{Z}_p\cap\mathbb{Q}$; cf. the proof of (i). Then, the difference equation is. These $\tilde w_j(x)$ can be uniformly approximated by polynomials $u_{ji}(x)$ over $\mathbb{Z}_p\cap\mathbb{Q}$; cf. the proof of (i). Therefore, $g$ can be uniformly approximated by polynomials over $\mathbb{Z}_p\cap\mathbb{Q}$ of the form $1+x+p\cdot\Delta u(x)$, which are all ergodic according to Lemma 1.
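The two normal forms used in claims (ii) and (iii), $x+pu(x)$ and $1+x+p\cdot\Delta u(x)$ with $\Delta u(x)=u(x+1)-u(x)$, can be explored on residues. This is a sketch only, assuming that measure preservation corresponds to being a permutation modulo $p^k$ and ergodicity to being a single cycle modulo $p^k$ for every $k$; the polynomial `u` and all function names below are hypothetical illustrations, not objects from the paper.

```python
def u(x):
    """A hypothetical polynomial over Z, hence 1-Lipschitz for every p."""
    return x ** 3


def measure_preserving_form(x, p):
    """x + p*u(x), the form used in claim (ii)."""
    return x + p * u(x)


def ergodic_form(x, p):
    """1 + x + p*(u(x+1) - u(x)), the form used in claim (iii)."""
    return 1 + x + p * (u(x + 1) - u(x))


def is_permutation_mod(f, p, k):
    """True if x -> f(x, p) permutes the residues modulo p**k."""
    m = p ** k
    return len({f(x, p) % m for x in range(m)}) == m


def is_single_cycle_mod(f, p, k):
    """True if the orbit of 0 under f returns to 0 after exactly p**k steps."""
    m = p ** k
    x = 0
    for step in range(1, m + 1):
        x = f(x, p) % m
        if x == 0:
            return step == m
    return False
```

A single cycle of full length modulo $p^k$ means that one orbit visits every residue, which is the finite-precision shadow of ergodicity; a permutation modulo $p^k$ means that every "effect" has exactly one "cause" at that precision.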
To prove claim (iv), consider functions of the form , and deg u(x) ≤ deg v(x). All these functions vanish at infinity, are C ∞ p (R)-functions (c.f., Example 3) and separate points. Therefore, the R-algebra A generated by the set A of all these functions satisfies conditions of the Stone-Weierstrass theorem for locally compact spaces, i.e., the algebra is dense with respect to the topology of the uniform convergence in the Banach algebra of all real-valued continuous functions on R which vanish at infinity. However, the set A is dense in A.
In order to prove claim (v), note that in view of the proof of claim (iv), it suffices to approximate uniformly on R the functions of the form h( 1+pv(x) 2 vanishes at infinity; moreover, this function is a C ∞ p (R)-function, and it is a measurepreserving 1-Lipschitz function Z p → Z p since it is bijective modulo p, and its derivative modulo p vanishes nowhere; c.f., [44] i=0 over Z p ∩ Q that converges to 1/p in R, we conclude that the function h(x) can be uniformly approximated on R by C ∞ p (R)-functions which are measure-preserving 1-Lipschitz functions Z p → Z p ; that is, automaton functions of time-reversible automata.
It is well known that functions which are integrable with their $n$-th powers, for some $n\in\mathbb{N}$, can be uniformly approximated by Schwartz functions; but the latter are smooth and vanish at infinity. With (v), this proves claim (vi) and the theorem.

Interpretation 4 (Observer's free choice of arrow of time). Due to the inevitable nonzero error in measurements of values of physical quantities, an observer's conclusion on the direction of the "arrow of time" completely depends on the observer's free choice of metric with respect to which the observer processes the measured numerical data: according to claims (ii) and (v)-(vi) of Theorem 9, "causes" can be recovered from "effects" with any desired accuracy. The "entropic arrow of time" also depends on the choice of metric since the value of entropy does as well; cf. Section 4.4.
Theorem 10 (On finite automata $C^1_p(\mathbb{R})$-functions). Let a finite automaton function $f\in C^1_p(\mathbb{R})$; i.e., let $f$ be differentiable both over $\mathbb{R}$ and over $\mathbb{Z}_p$; let $f'\in C_p(\mathbb{R})$. Then, $f$ is an affine function over $\mathbb{Z}_p\cap\mathbb{Q}$; i.e., $f(x)=ax+b$ for suitable $a,b\in\mathbb{Z}_p\cap\mathbb{Q}$. Vice versa, all these affine functions are finite automaton functions from $C^\infty_p(\mathbb{R})$.
Proof. Given a 1-Lipschitz function $f$, for $n\in\mathbb{N}_0$, $k\ge\lfloor\log_p n\rfloor+1$, consider functions $f_{n,k}:\mathbb{Z}_p\to\mathbb{Z}_p$ which are defined as follows: for all $z\in\mathbb{Z}_p$. The function $f$ is an automaton function of a finite automaton if and only if the collection $F$ of functions $f_{n,k}$ (where $n\in\mathbb{N}_0$, $k\in\mathbb{N}=\{1,2,3,\ldots\}$, $k\ge\lfloor\log_p n\rfloor+1$) contains only a finite number of pairwise distinct functions. Note that $f_{n,k}$ is the automaton function that corresponds to the automaton $A(s(n_k))=\langle\mathbb{F}_p,S,\mathbb{F}_p,S,O,s(n_k)\rangle$, where $s(n_k)\in S$ is the state the automaton $A=A_f=\langle\mathbb{F}_p,S,\mathbb{F}_p,S,O,s_0\rangle$ reaches after it has been fed the input word $n_k$ (of length $k$) that corresponds to the base-$p$ expansion of $n$ (so the word $n_k$ may contain some leading zeros that correspond to higher order digits of the expansion). That is, there are $N,K\in\mathbb{N}$ such that for every $n\in\mathbb{N}_0$, $k\in\mathbb{N}$, one finds $\check n\le N$, $\check k\le K$ such that $f_{n,k}(z)=f_{\check n,\check k}(z)$ for all $z\in\mathbb{Z}_p$. Take $z_k=\frac{h}{p^{2k}-1}$, where $h\in\mathbb{N}$. Note that $z_k\in\mathbb{Z}_p\cap\mathbb{Q}$ and that $\lim_{k\to\infty}p^k z_k=0$ both with respect to the real metric and to the p-adic metric. Then, where $\lim_{k\to\infty}\frac{f(n+p^k z_k)-f(n)}{p^k z_k}=f'(n)$ both with respect to the real metric and to the p-adic metric. Thus, from (11) it follows that $\lim^{p}_{k\to\infty}(f_{n,k}(u)-f_{n,k}(0))=uf'(n)$ for every $u\in\mathbb{Z}_p$, $n\in\mathbb{N}_0$. Now, $f'(n)$ is the derivative at $n\in\mathbb{N}_0$ both with respect to the real metric and to the p-adic metric; moreover, $f'(n)$ may take only a finite number of values due to the finiteness of the number of pairs $n,k$ which enumerate pairwise distinct $f_{n,k}$. Therefore, $f'(z)$ may take not more than a finite number of values on $\mathbb{Z}_p$, since any $z\in\mathbb{Z}_p$ is a p-adic limit of some sequence over $\mathbb{N}_0$, and $f'$ is a continuous function $\mathbb{Z}_p\to\mathbb{Z}_p$ according to the conditions of the theorem. Hence, $f'$ may take not more than a finite number of values on $\mathbb{Z}_p\cap\mathbb{Q}$ and thus on $\mathbb{R}$, since $\mathbb{Z}_p\cap\mathbb{Q}$ is dense in $\mathbb{R}$ and since $f'$ is a continuous real function according to the conditions of the theorem.
Therefore, the derivative $f'$ is a constant function over $\mathbb{R}$ and thus over $\mathbb{Z}_p\cap\mathbb{Q}$ and over $\mathbb{Z}_p$; that is, $f(x)=ax+b$ for some $a,b\in\mathbb{Z}_p\cap\mathbb{Q}$. Proposition 1 proves the converse claim of the theorem.

Note 6. The theorem remains true for multivariate $C_p(\mathbb{R})$-maps $F:\mathbb{Z}_p^m\to\mathbb{Z}_p^n$ as well: affine maps over $\mathbb{Z}_p\cap\mathbb{Q}$ are the only maps which satisfy the multivariate version of Theorem 10. This can be proven by a similar argument, the details of which are omitted.

Interpretation 5 (Finiteness implies linearity). This result may serve as a sort of hint as to why the mathematical formalism of quantum mechanics is the theory of linear operators over Hilbert space. As all "real-world" systems have a finite number of states, when the duration of the temporal interval measured in the smallest (say, Planck) time units becomes comparable to the number of states, the finiteness reveals itself as linearity.

In the Middle of the Scales
In Section 1, we conjectured that if both "continuous" and "discrete" theories adequately describe physical reality at respective "ends of the scale", the theories must "meet one another somewhere in the middle of the scale". In this Section, we argue that the "meeting point in the middle of the scale" is the wave function. To do this, we first need to formalise the notion of an observer; actually, we will consider observers of two kinds, each for the respective ends of the scale.

Observation and Measurement at the Ends of the Scale
To begin with, let us introduce two types of observers, the Big-endian and the Little-endian. The names of the two observers refer to the big-endian and little-endian orders in which the bytes of the representation of a number are read in computer science rather than to Gulliver's Travels by Jonathan Swift. Given a large non-negative number having a very long base-$p$ expansion, the Big-endian is capable of observing only the highest order digits of the expansion, i.e., he knows the order of magnitude of the number and (up to a nonzero error) a mantissa, since the Big-endian is not able to see the rightmost digits of the number. Conversely, the Little-endian sees the rightmost digits of the number, starting with the lowest order digit, but has no idea what the leftmost digits and the order of magnitude of the number are (although he assumes that the order is finite but very large). One may call the Big-endian a macro-observer and the Little-endian a micro-observer. However, both observers measure observable values which are rational p-adic integers. As already mentioned, real numbers are indistinguishable during measurements from rational p-adic integers $\mathbb{Z}_p\cap\mathbb{Q}$ due to the inevitable nonzero measurement error with respect to the real metric. This is why we assume that numerical values of observables are in $\mathbb{Z}_p\cap\mathbb{Q}$, and the Little-endian sees the first terms of the canonical p-adic expansion of the observable value, whereas the Big-endian sees the highest order digits of the base-$p$ expansion of the same value regarded as a real number. We explain this more formally.
To illustrate what Big-endian observations and Little-endian observations are, let $r=1$, $t\gg 1$, $\alpha_0=1$, $\beta_{t-2}=\beta_{t-1}=0$; thus, both "-endians" measure a physical quantity $z$ that takes values in $[0,1]$. Then, as neither of the observers is able to measure the value $z$ with zero error, the Big-endian will obtain only the digits $\beta_{t-2},\beta_{t-3},\ldots,\beta_{t-n}$ for some $n<t$; meanwhile, the Little-endian will obtain $\beta_0,\beta_1,\ldots,\beta_m$ for some $m<t-1$. Thus, the only information about $z$ which is possibly common for both observers is the values $\beta_{t-\ell},\beta_{t-\ell-1},\ldots,\beta_k$ for some $k>0$, $\ell>1$. The two observers may communicate with each other and thus make only common guesses about what $z$ is. Moreover, neither knows what $t$ is; therefore, as $t\gg 1$, the only thing that both observers may know for sure is that $1\ge z\ge 1-1/p$. Note that there are no "hidden variables" in this scenario since both observers may unboundedly increase the precision of their measurements despite neither being able to measure quantities with zero error.
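The two kinds of observation can be mimicked on an ordinary non-negative integer standing in for a long base-$p$ expansion. This is a hedged illustration only: the function names `base_p_digits`, `big_endian_view` and `little_endian_view` are hypothetical, and the number of visible digits is an arbitrary parameter rather than anything fixed by the paper.

```python
def base_p_digits(n, p):
    """Digits of the non-negative integer n in base p, lowest order digit first."""
    digits = []
    while n:
        n, d = divmod(n, p)
        digits.append(d)
    return digits or [0]


def little_endian_view(n, p, m):
    """What a Little-endian observer sees: the m lowest order digits."""
    return base_p_digits(n, p)[:m]


def big_endian_view(n, p, m):
    """What a Big-endian observer sees: the m highest order digits, highest first."""
    return base_p_digits(n, p)[::-1][:m]
```

For a number with many digits the two views share no digits at all unless the observers increase their precision until the visible ranges overlap.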

p-adic Clocks
In this section, we introduce a p-adic model of the instrument which measures and indicates time, a p-adic clock; then, we prove that there exists only one clock, the universal clock, which is the same for all Little-endians and Big-endians.
A timekeeping element of the contemporary physical clock is a harmonic oscillator of a particular frequency, which is assumed to be a positive integer showing the number of periods per unit interval; therefore, the shortest time interval which can be measured is the reciprocal of the frequency. In order to measure the time elapsed, one merely counts the number of periods from one moment of time to another and represents this non-negative integer in some base, say, p, where p is the frequency of the oscillator. In what follows, we assume that p is a prime so as not to overload the exposition with unimportant technical details. Thus, a model of such a clock can be represented using the p-adic odometer, a dynamical system f = τ_p : z → z + 1 on the space of p-adic integers Z_p. If the initial point is x_0 = 0 and x_i = f^i(x_0) = i, then the base-p expansion of x_i represents the time elapsed: i = ∑_{j=0}^∞ δ_j(i) p^j, where δ_j(i) ∈ {0, 1, …, p − 1} is the j-th digit of the base-p expansion of i. In loose terms, the p-adic clock is simply a counter whose face consists of windows; at each time moment i, the j-th window shows δ_j(i). It is convenient to assume that the number of windows is infinite so that the time elapsed is unrestricted; thus, we obtain the dynamical system τ_p on Z_p. Note that the initial state x_0 may be taken arbitrarily and not necessarily as x_0 = 0; then, to get the base-p representation of the time elapsed since the initial moment, one has to perform the subtraction x_i − x_0 in Z_p. The p-adic clock is depicted in Figure 3. To the right, the content of the registry is similar to a standard representation of time in decimal (rather than p-ary) fractions of a second (millisecond, microsecond, nanosecond, ...) with Planck time at the rightmost position; meanwhile, to the left are decimal multiples of a second (petasecond, exasecond, ...).
Figure 3. The p-adic clock.
Speaking loosely, the registry in Figure 3 is like the face of a mechanical counter consisting of cogwheels. The period of the sequence of states of the rightmost cell of the registry (which can be regarded as the rightmost cogwheel) is p, the period of the sequence of states of the second rightmost cell is p^2 since the figure in that cell changes once per period of the rightmost cell, etc. The latter property is a defining property of an ergodic transformation on Z_p; cf. Theorem 2. Therefore, all ergodic 1-Lipschitz transformations on Z_p should be considered to be clocks, cf. (ii) of Theorem 3, as they can be "adjusted" to one another since they are all conjugate to the p-adic odometer.
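The cogwheel picture can be checked numerically on a clock with finitely many windows. The sketch below is an illustration under the assumption that only the lowest `windows` digits are tracked; the helper names and the sample parameters are hypothetical.

```python
def odometer_orbit(x0, p, windows, steps):
    """Iterate z -> z + 1 and record the lowest `windows` base-p digits at each step."""
    modulus = p ** windows
    x = x0 % modulus
    states = []
    for _ in range(steps):
        states.append([(x // p**j) % p for j in range(windows)])
        x = (x + 1) % modulus
    return states


def period(seq):
    """Smallest shift T > 0 under which seq agrees with itself.

    Reliable when seq covers at least two full periods.
    """
    n = len(seq)
    for T in range(1, n):
        if all(seq[i] == seq[i + T] for i in range(n - T)):
            return T
    return n


p, windows = 3, 3                      # illustrative choice
states = odometer_orbit(5, p, windows, 2 * p**windows)
periods = [period([s[j] for s in states]) for j in range(windows)]
print(periods)                         # the j-th window has period p**(j + 1)
```

The initial state 5 is arbitrary; any other starting value shifts the digit sequences without changing their periods.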
If the initial state of the odometer is taken to be 0 (i.e., each cell of the registry depicted in Figure 3 is 0), then after n ∈ N time units elapse, the registry will contain the base-p expansion of the number n since τ_p^n(0) = n. Let us now take any ergodic 1-Lipschitz map f : Z_p → Z_p, any t ∈ Z_p, and any sequence (n_i)_{i=0}^∞ over N_0 which converges p-adically to t (such a sequence exists as N_0 is dense in Z_p). It turns out that for any z ∈ Z_p, the p-adic limit of f^{n_i}(z) as i → ∞ exists; denote this limit via f^t(z). Then (z; t) → f^t(z) is a 1-Lipschitz map Z_p^2 → Z_p which is measure-preserving with respect to t; see [23] [Propositions 4.87-4.88, 4.90]. Therefore, p-adic time t is well-defined. For instance (see [23] [Example 4.89]), given an ergodic affine map f(z) = az + b on Z_p, the two-variate function f^t(z) is of the form f^t(z) = bt + z if a = 1, and f^t(z) = a^t z + b(a^t − 1)/(a − 1) if a ≠ 1. Note that if the affine map z → az + b is ergodic, then b ≢ 0 (mod p) and a ≡ 1 (mod p) (see [23] [Theorem 4.36]); thus, both a^t and (a^t − 1)/(a − 1) are well-defined p-adic integers for every t ∈ Z_p.
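For an affine map with integer coefficients, the formula for the iterates and the p-adic continuity in t can be checked modulo p^k. The following is a minimal sketch under assumed sample values (p = 3, a = 4, b = 1); the function names are hypothetical.

```python
def iterate_affine(a, b, z, n, modulus):
    """n-fold iterate of z -> a*z + b, reduced modulo `modulus`."""
    z %= modulus
    for _ in range(n):
        z = (a * z + b) % modulus
    return z


def closed_form(a, b, z, n, modulus):
    """a**n * z + b * (a**n - 1)/(a - 1), reduced modulo `modulus` (integer a > 1)."""
    geometric = (a**n - 1) // (a - 1)  # exact: 1 + a + ... + a**(n - 1)
    return (a**n * z + b * geometric) % modulus


p, k = 3, 4                 # illustrative choice, not from the paper
a, b, z = 4, 1, 7           # a ≡ 1 (mod 3), b ≢ 0 (mod 3)
M = p**k

# The closed form agrees with direct iteration.
agree = all(iterate_affine(a, b, z, n, M) == closed_form(a, b, z, n, M)
            for n in range(50))

# The times n_i = p**i - 1 converge 3-adically to t = -1, so f^{n_i}(z)
# should approach the preimage of z: one more step must return z mod p**k.
w = iterate_affine(a, b, z, M - 1, M)
back = (a * w + b) % M
```

Here `back` equals `z`, which is the modulo p^k shadow of the statement that f^t with t = −1 is the inverse map.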
The problem which immediately arises is that p-adic time t is well-defined for every t ∈ Z_p, but if q is a prime number distinct from p, the p-adic time t may be meaningless for a q-adic observer, the q-adic Little-endian, not to mention the Big-endian. Fortunately, however, there is a clock (and therefore time) which is common to all Little-endians and the Big-endian. This clock/time is unique up to the direction of the time arrow. It is clear that the clock which is common to all p-adic Little-endians and the Big-endian must be a totally consistent function. The following theorem holds:
Theorem 11. Totally consistent functions which are measure-preserving for all prime p are exactly the functions x → ±x + c, where c ∈ Z; only the functions τ_±(x) = x ± 1 are ergodic for all prime p.
Proof of Theorem 11. According to Theorem 5, any totally consistent function g is a polynomial; therefore, to be measure-preserving on Z_p for all prime p, g must (1) be bijective modulo p and (2) have a derivative g′(x) which vanishes nowhere modulo p; see, e.g., [23] [Theorem 4.45]. As g is 1-Lipschitz on Z_p for all prime p and g′(x) is a polynomial, the derivative exists and takes values in Z_p for all prime p; hence, g′(Z) ⊂ Z. Therefore, condition (2) implies that g′(Z) ⊂ {1, −1}, since an integer divisible by no prime is ±1; as g′(x) is a polynomial taking only finitely many values on Z, g′ is a constant, ±1. This means that g is an affine function, namely, either g(x) = −x + c or g(x) = x + c for some c ∈ Z (since g(0) must be an integer as g(Z) ⊂ Z due to total consistency). This proves the claim concerning measure-preservation.
The ergodicity claim follows from the ergodicity criterion for affine maps z → az + b, which implies that if the map is ergodic on Z_p, then a ≡ 1 (mod p) and b ≢ 0 (mod p); see [23] [Theorem 4.36]. As these conditions must hold for all prime p, we conclude that a = 1 and b ∈ {1, −1}.
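The conclusion of the proof can be illustrated by a finite search. The sketch below is an illustration rather than a proof: it assumes that transitivity of an affine map modulo p^2 is used as a stand-in for ergodicity on Z_p, and it scans only a small hypothetical range of integer coefficients and a few small primes.

```python
def is_transitive(a, b, modulus):
    """True if z -> a*z + b runs through all residues modulo `modulus` in one cycle."""
    z, steps = 0, 0
    while steps < modulus:
        z = (a * z + b) % modulus
        steps += 1
        if z == 0:
            # First return to 0: a single cycle iff it took `modulus` steps.
            return steps == modulus
    return False


primes = (2, 3, 5, 7)                  # illustrative choice
survivors = [
    (a, b)
    for a in range(-3, 4)
    for b in range(-3, 4)
    if all(is_transitive(a, b, p * p) for p in primes)
]
print(survivors)
```

Within this range, only the maps x → x − 1 and x → x + 1 pass the test for all four primes, in agreement with Theorem 11.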
Interpretation 6 (Free choice of temporal ordering at the smallest of scales). The only clock that is common to "both ends of the scale" is the standard odometer τ(t) = t_0 + t, which shows the time t − t_0 ∈ R elapsed since the moment t_0 ∈ R. All observers acquire the value of the time elapsed up to a nonzero error with respect to the corresponding metrics. Therefore, in contrast to a real observer (the Big-endian), the p-adic observers (the Little-endians) generally cannot determine from the "time stamps" of two events which of them happened earlier and which later, since there is no order on the field of p-adic numbers which agrees with the field operations.

Note 7.
It is known that generally there is no ordering of events in quantum mechanics; see, e.g., [48].

Digitalization
An initial automaton is a model of a (generally open) physical system prepared in some fixed state; the system is exposed by an experimenter to a time series of "elementary impacts" and thus produces a time series of "elementary reactions". The impacts/reactions occur at discrete instants of time since time is assumed to be discrete; for example, at Planck's scale, the smallest time interval is the Planck time, 5.391247(60) × 10^−44 s. Concrete values of the smallest time interval depend on the process being modelled (e.g., in smart contracts of the digital economy, the smallest interval is usually assumed to be 24 h) and are not specified; the defining feature of the model is that the "time flow" consists of "indivisible time intervals".
The experimenter prepares a number of identical systems in the same state and probes them by exposing them to different impacts and observing the reactions, thus obtaining a number of experimental points (impact; reaction), where impact and reaction are the measured values of the components of the impact-reaction pair. The experimenter then treats any measured value as a real number up to a nonzero real error.
In order not to overload the exposition, in what follows we mostly consider the one-dimensional case, when the values impact and reaction are numbers rather than vectors. Up to normalisation, we may assume that the measured numerical values are all in the unit real interval [0, 1]; thus, the experimenter obtains a number of experimental points in the real unit square [0, 1] × [0, 1] = I^2 ⊂ R^2. Namely, given an automaton A, let f = f_A : Z_p → Z_p be its automaton function (i.e., a 1-Lipschitz map). Consider the subset E(f) of all the following points of the Euclidean unit square I^2 = [0, 1] × [0, 1] ⊂ R^2: e_k^f(x) = ((x mod p^k)/p^k; (f(x) mod p^k)/p^k), where x ∈ Z_p and k = 1, 2, …. Note that f(x) mod p^k corresponds to the k-letter output word ξ_{k−1} ⋯ ξ_1 ξ_0 of the automaton fed by the k-letter input word χ_{k−1} ⋯ χ_1 χ_0 which corresponds to x mod p^k; cf. Figure 4.
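The experimental points can be generated directly from residues. The sketch below assumes that the points of E(f) for a fixed word length k have coordinates (x mod p^k)/p^k and (f(x) mod p^k)/p^k; the function name and the sample map z → 3z on Z_2 are illustrative choices.

```python
from fractions import Fraction


def plot_points(f, p, k):
    """Points ((x mod p^k)/p^k, (f(x) mod p^k)/p^k) for all k-letter input words.

    For a 1-Lipschitz f, f(x) mod p^k depends only on x mod p^k,
    so it suffices to run x over 0, ..., p^k - 1.
    """
    m = p ** k
    return [(Fraction(x, m), Fraction(f(x) % m, m)) for x in range(m)]


points = plot_points(lambda z: 3 * z, p=2, k=3)
on_line = all(y == (3 * x) % 1 for x, y in points)
```

For this sample map, every point lies on the line y = 3x taken modulo 1; longer words (larger k) fill the same line more densely.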
Further, although all the word lengths k are finite, the clustering is equivalent to sending k → ∞. Therefore, the clustering is equivalent to taking limit points of the closure P(f) of the set E(f) with respect to the standard topology of R^2. We call P(f) a plot of f. Speaking very loosely, the plot is the picture the experimenter obtains as the output of an experiment which consists of a number of individual probes of a physical system prepared in the same state before each probe. Note that the sets of cluster points of the pictures that the two experimenters, the Little-endian and the Big-endian, obtain as a result of the experiment look very similar: the Little-endian makes the word lengths as long as possible to construct the cluster points, while the Big-endian is only capable of obtaining the points which correspond to sufficiently long words, i.e., the points which are close to the cluster points. This fact is crucial for the future construction of the wave function by both experimenters as well as for the uncertainty relation on which both agree.
Let us describe this procedure more formally. For s = ∑_{j=−k}^∞ ζ_j p^j ∈ Q_p (ζ_j ∈ {0, 1, …, p − 1}, j ∈ Z), let [s]_p = ζ_0 + ζ_1 p + ζ_2 p^2 + ⋯ ∈ Z_p and {s}_p = ζ_{−k} p^{−k} + ⋯ + ζ_{−1} p^{−1} be the integral and fractional parts of s, respectively. Recall that any complex character of the additive group Q_p^+ of the field Q_p of p-adic numbers is of the form χ_r(s) = e^{2πi{sr}_p}, where r ∈ Q_p; χ_r is a continuous group epimorphism into the group of complex roots of unity (which is isomorphic to the group Q^+/Z^+). Take r = 1 and denote χ_1 via χ; given a 1-Lipschitz map f : Z_p → Z_p, consider the mappings
f̌_k : e^{2πi{p^{−k}z}_p} → e^{2πi{p^{−k}f(z)}_p}, (z ∈ Z_p),
for all k ∈ N_0. As every f̌_k maps points of the unit circle S into points of S, the pairs (e^{2πi{p^{−k}z}_p}; e^{2πi{p^{−k}f(z)}_p}) constitute a set of points on the unit torus T^2 = S × S. The unit square I^2 is a universal cover of the torus T^2; this way, the points e_k^f(z) ∈ I^2 are identified with the points (e^{2πi{p^{−k}z}_p}; e^{2πi{p^{−k}f(z)}_p}) ∈ T^2, and in what follows, we do not distinguish between the two point sets and speak of the points either on the surface of the torus T^2 or in the square I^2, whichever is more convenient.
The closure P(f) = P(A) of the set of all the points e_k^f(z) in the square I^2 (or of all the points (e^{2πi{p^{−k}z}_p}; e^{2πi{p^{−k}f(z)}_p}) in the torus T^2), where k ∈ N, z ∈ Z_p, is called a (one-dimensional) plot of the automaton A or, equivalently, of the automaton function f = f_A.
The set P′(f) = P′(A) of all the limit points of the plot, i.e., the derived set of the set P(f) = P(A), is called the limit plot of the automaton A (of the automaton function f_A).
Recall that limit point, accumulation point, and cluster point are synonymous notions: each denotes a point every neighbourhood of which contains points of the set other than that point itself. Recall also that the derived set of a closed set is closed; thus, P′(f) = P′(A) is closed. Being closed, the set P(A) is measurable with respect to the Lebesgue measure on R^2.
Example 5. Automata may be infinite and measure-0; constants may be measure-1:

• The automaton whose automaton function is f(z) = z + (z^2 OR (−1/3)), (z ∈ Z_2), is infinite and measure-0. Here, OR is bit-by-bit logical ∨ with no carries to higher order bits; that is, if z = ∑_{j=0}^∞ ζ_j 2^j, then z OR (−1/3) = ∑_{j=0}^∞ (2^{2j} + ζ_{2j+1} 2^{2j+1}), since −1/3 = 1 + 4 + 16 + ⋯ in Z_2.
• The automaton whose automaton function is f(z) = C, where C is a p-adic integer whose canonical representation corresponds to a Champernowne word, is a measure-1 automaton.
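The bitwise operation in the first example can be made concrete on the k lowest bits. The following is a minimal sketch assuming Python 3.8 or later (for the modular inverse via `pow`); the function names are hypothetical.

```python
def minus_one_third(k):
    """-1/3 reduced modulo 2**k, i.e., the k lowest bits of its 2-adic expansion."""
    m = 1 << k
    return (-pow(3, -1, m)) % m


def f(z, k):
    """f(z) = z + (z**2 OR (-1/3)), reduced modulo 2**k."""
    m = 1 << k
    return (z + ((z * z) % m | minus_one_third(k))) % m


print(bin(minus_one_third(8)))   # alternating bits: ones at even positions
print(f(1, 8), f(2, 8))
```

Since addition, multiplication, and bitwise OR all act compatibly on the lowest k bits, `f(z, k)` depends only on z modulo 2^k, as required of an automaton function.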
In short, Theorems 12 and 14 imply that plots of finite automata cannot contain "figures" but may contain "lines". These lines are of the utmost importance in further considerations since they may naturally be treated as "experimental curves" obtained by probing a physical system by both Little-endian and Big-endian observers. It turns out that smooth lines from limit plots of finite automata are windings of the torus; therefore, they may be treated as sine waves, so the smooth lines in the limit plot of a finite automaton constitute a collection of sine waves. Moreover, the waves are limit plots of finite affine automata. Now, we express these facts rigorously.
Recall that a knot is a smooth embedding of a circle S into R^3 and a link is a smooth embedding of several disjoint circles in R^3; cf. [51]. We will consider only special types of knots and links, namely, torus knots and torus links. Informally, a torus knot is a smooth closed curve without intersections which lies completely in the surface of a torus T^2 ⊂ R^3, and a link (of torus knots) is a collection of (possibly knotted) torus knots; see, e.g., [52] [Section 26] for formal definitions.
We also need the notion of a winding of a torus. Formally, a winding of a torus is any geodesic on the torus. Recall that geodesics on the torus T^2 are images of straight lines in R^2 under the mapping (x; y) → (x mod 1; y mod 1) of R^2 onto T^2 = R^2/(Z × Z); cf., e.g., [53] [Section 5.4].

Definition 11 (Winding of the torus).
A winding of the torus is an image of a straight line in R^2 under the map mod 1 : (x; y) → (x mod 1; y mod 1) of the Euclidean plane R^2 onto the 2-dimensional real torus T^2 = R^2/(Z × Z) = S × S ⊂ R^3. If the line is defined by the equation y = ax + b, we say that a is the slope of the winding C(a, b). We denote via C(∞, b) the winding which corresponds to the line x = b, a meridian, and say that the slope is ∞ in this case. Windings C(0, b) of slope 0 (i.e., the ones that correspond to straight lines y = b) are called parallels.
In dynamics, windings of the torus T^2 are viewed as orbits of linear flows on the torus, that is, of dynamical systems on T^2 defined by a pair of differential equations of the form dx/dt = β; dy/dt = α on T^2, and thus by a pair of parametric equations x = (βt + τ) mod 1; y = (αt + σ) mod 1 in Cartesian coordinates; cf., e.g., [54] [Section 4.2.3].
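The parametric equations of the linear flow can be evaluated exactly for rational data. The sketch below assumes integer α and β (so that the slope α/β is rational) and uses hypothetical function names; it shows that such an orbit closes up after a finite time.

```python
from fractions import Fraction
from math import gcd


def winding_point(alpha, beta, t, tau=0, sigma=0):
    """Point ((beta*t + tau) mod 1, (alpha*t + sigma) mod 1) of the linear flow."""
    return ((beta * t + tau) % 1, (alpha * t + sigma) % 1)


def first_return_time(alpha, beta):
    """Smallest t > 0 with alpha*t and beta*t both integers (integers, not both 0)."""
    return Fraction(1, gcd(alpha, beta))


alpha, beta = 3, 5                       # slope 3/5, illustrative choice
T = first_return_time(alpha, beta)
closed = winding_point(alpha, beta, T) == (0, 0)
# On a grid of earlier times, the orbit never returns to its starting point.
no_early_return = all(
    winding_point(alpha, beta, Fraction(j, 60)) != (0, 0) for j in range(1, 60)
)
```

Over one return time, the x-coordinate wraps around β times and the y-coordinate α times; for an irrational slope no such return time exists.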

Note 9.
It is well known that a winding defined by the straight line y = ax + b is dense in T^2 if and only if −∞ < a < +∞ and the slope a = α/β is irrational; see, e.g., [54].
Theorem 15, which follows, states that C^2-smooth lines (i.e., those which are twice differentiable and have continuous second derivatives) in P′(f_A) are windings of the torus T^2 provided the automaton A is finite; cf. Figures 5 and 6.
Theorem 15 ([10]). Let f : Z_p → Z_p be an automaton function of a finite automaton; let g be a C^2-function with domain [c, d] ⊂ [0, 1) ⊂ R and range [0, 1) ⊂ R. Let the graph G(g) = {(x; g(x)) : x ∈ [c, d]} of the function g lie completely in P(f). Then, there exist a, b ∈ Q ∩ Z_p such that g(x) = (ax + b) mod 1 for all x ∈ [c, d]; moreover, there is a winding of the torus T^2 which lies completely in P(f) and which contains the graph G(g) of the function g. There are not more than a finite number of pairwise distinct windings of the unit torus T^2 in P(f); all of these are images of real affine functions x → ax + b for a, b ∈ Z_p ∩ Q under the mapping mod 1 : R^2 → T^2.

Note 10. The C^2-smoothness condition can be relaxed: C^1-smoothness is sufficient to ensure the affinity; see [55].
Although Theorem 15, after proper restatement, holds for m-variate 1-Lipschitz maps f : Z_p^m → Z_p^m as well, see [10], we mostly restrict our considerations in the rest of the paper to the univariate case for simplicity.
The torus link which is the limit plot of a finite automaton affine function f : z → az + b on Z_p is completely described by the following theorem:
Theorem 16 ([10]). Given a finite automaton affine function f : z → az + b on Z_p (i.e., such that a, b ∈ Z_p ∩ Q), represent a, b as irreducible fractions: a = α/β; b = α′/β′, where α, β, α′, β′ ∈ Z, β, β′ ≢ 0 (mod p). Then, the limit plot P′(f) on the torus T^2 is a torus link which consists of N torus windings whose slope is a, where N = mult_{β′/d} p is the multiplicative order of p modulo β′/d, d = gcd(β, β′) is the greatest common divisor of β, β′, and N = 1 if β′/d = 1. Every torus winding is a graph of the complex-valued function ψ(ρ, k) = e^{i(aρ − 2πp^k b)}, (ρ ∈ R), on the torus T^2 for a suitable k ∈ N_0.
In cylindrical coordinates (r_0; θ; z), every torus winding x → ax + b of a torus obtained by revolving around the Z-axis a circle that is coplanar with the axis, has radius r, and has its centre at the distance R from the origin can be represented by the following parametric equations:
(r_0; θ; z) = (R + r cos(aθ − 2πp^k b); θ; r sin(aθ − 2πp^k b)), θ ∈ R. (15)
If a ∈ Z_p ∩ Q, then a is an irreducible fraction α/β, where α, β ∈ Z and p ∤ β; then, the corresponding winding winds β times around the Z-axis and |α| times around a circle in the interior of the torus, whereas the sign of α determines whether the rotation is clockwise or counter-clockwise. Hence, the "physical meaning" that can be ascribed to the coefficient a = α/β of the affine map z → az + b, (z ∈ Z_p), which is a finite automaton function of an affine automaton if and only if a, b ∈ Z_p ∩ Q, is frequency (or wavenumber, under a proper choice of units). The choice of sign + or − depends only on which direction of rotation is assumed to be "positive" or "negative"; thus, polarization and spin can be ascribed to the sign of a in relevant models.
Theorem 16, in view of representation (15), implies that the limit plot of a finite automaton whose function is z → az + b (where a, b ∈ Z_p ∩ Q, z runs over Z_p) is in one-to-one correspondence with a complex-valued function ψ : R × N_0 → C, ψ(x, k) = e^{i(ax − 2πp^k b)}. It is worth noting that the function ψ(x, k) is well-defined for all k ∈ Z since p is invertible modulo β′/d, and thus e^{−2πip^k b} is well-defined for every k ∈ Z; cf. Theorem 16.
Note 11. According to Theorem 16, different affine functions z → az + b may have identical limit plots. For instance, all the functions f(z) = z + c, where c ∈ Z_p ∩ Q, have identical limit plots which correspond to the function ψ(x) = e^{ix}. Note also that whenever the limit plot of a finite automaton A is the same as that of a finite automaton whose automaton function f is affine, f(z) = az + b, there exists a minimal subautomaton of A (i.e., one having no subautomata other than itself) which has exactly the same limit plot; see Figures 7 and 8. A finite automaton is minimal if and only if its reduced state transition diagram is totally connected: given two states s, t ∈ S, there is a finite word w such that when the automaton in state s accepts the word w, the automaton changes its state to t. If an automaton reaches a state which belongs to its (minimal) subautomaton, the automaton will never reach a state which does not belong to the subautomaton.

Figures 9 and 10 show the limit plot of a constant function which is the automaton function of a finite autonomous automaton; autonomous automata may be regarded as models of either isolated or closed physical systems. The parallel lines shown in Figure 9 may be ascribed to energy levels.
The remaining examples are nonautonomous automata; these can serve as models of open physical systems. Figures 9 and 10 depict limit plots produced by an autonomous automaton whose state transition diagram is depicted in Figure 11. Figures 12 and 13 show the limit plot of an automaton having two minimal subautomata; the state transition diagram of the automaton is shown in Figure 14. Figure 15 represents a plot of a finite automaton which approximates a measure-1 (and thus infinite) automaton whose automaton function is z → 1 + 3z + 2z^2, (z ∈ Z_2). Note the pronounced straight lines in the plot; these lines constitute the limit plot of a minimal subautomaton. Figure 16 depicts a plot of a measure-0 (but infinite) automaton which has a unique minimal finite affine subautomaton; the automaton function of the latter subautomaton is z → 5z, (z ∈ Z_2). The limit plot of the latter automaton consists of the red lines, cf. Figure 12; the state transition diagram is the lower part of the diagram shown in Figure 14.
Basically, the limit plot of a finite automaton whose minimal subautomata are affine consists of families of parallel straight lines in the unit square or, respectively, of links of torus windings whose slopes are in Z_p ∩ Q; cf. Figures 5, 6, 12, and 13. The minimal subautomata from the first example "exhibit nonzero phase shifts", while for the ones from the second example, the "phase shifts" are 0. Both examples are automata having two minimal affine subautomata. The minimal subautomata from the first example (Figures 5 and 6) have limit plots defined by the functions f_1(z) = −2z + 1/3 (red and green windings) and f_2(z) = (3/5)z + 2/7 (yellow, brown, and blue windings), respectively, z ∈ Z_2. The minimal subautomata from the second example (Figures 12 and 13) have limit plots defined by the respective functions z → 3z (blue lines) and z → 5z (red lines), z ∈ Z_2.
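The number N of windings given by Theorem 16, the multiplicative order of p modulo β′/d with d = gcd(β, β′), can be computed for the affine maps just mentioned. The sketch below is an illustration with hypothetical function names; it assumes that a and b are supplied as exact fractions.

```python
from fractions import Fraction
from math import gcd


def multiplicative_order(p, m):
    """Smallest N >= 1 with p**N ≡ 1 (mod m); taken to be 1 when m == 1."""
    if m == 1:
        return 1
    if gcd(p, m) != 1:
        raise ValueError("p must be invertible modulo m")
    N, power = 1, p % m
    while power != 1:
        power = (power * p) % m
        N += 1
    return N


def number_of_windings(a, b, p):
    """Order of p modulo beta'/d for a = alpha/beta, b = alpha'/beta', d = gcd(beta, beta')."""
    a, b = Fraction(a), Fraction(b)
    beta, beta_prime = a.denominator, b.denominator
    d = gcd(beta, beta_prime)
    return multiplicative_order(p, beta_prime // d)


n1 = number_of_windings(-2, Fraction(1, 3), p=2)              # f_1(z) = -2z + 1/3
n2 = number_of_windings(Fraction(3, 5), Fraction(2, 7), p=2)  # f_2(z) = (3/5)z + 2/7
n3 = number_of_windings(3, 0, p=2)                            # z -> 3z
```

The computed values are 2, 3, and 1, respectively, which agrees with the two colours named for f_1 and the three colours named for f_2 in the text.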
The limit plot of a finite affine automaton whose automaton function is $z \mapsto az + b$ in the unit square $I^2$ consists of parallel straight lines with slope $a = \alpha/\beta \in \mathbb{Z}_p \cap \mathbb{Q}$; thus, the plot may be considered not only on the torus obtained by "gluing together" opposite sides of the square but also on a cylinder obtained by "gluing together" only one pair of opposite sides of the square. This way, one obtains a solenoid rather than a torus link. This representation of a limit plot is also convenient in some cases. For instance, Figures 17 and 18 depict the limit plot of the automaton whose automaton function is $f(z) = ((z \mathbin{\mathrm{AND}} 1) - (\mathrm{NOT}(z) \mathbin{\mathrm{AND}} 1)) \cdot z$, where AND and NOT are, respectively, the bitwise logical "and" and the bitwise logical "not" operations on base-2 expansions of numbers (with no carries), while "$\cdot$" and "$-$" are the usual multiplication and subtraction of numbers (with carries). Figure 19 represents the state transition diagram of a general automaton all of whose minimal subautomata are finite and affine.
Version May 9, 2023 submitted to Entropy

Figure 11. State transition diagram of the autonomous automaton whose automaton function $f\colon \mathbb{Z}_2 \to \mathbb{Z}_2$ is a constant: $f(z) = 2/7$ ($z \in \mathbb{Z}_2$). State 1 is initial.
Figures 12 and 13. Limit plot of the automaton having two subautomata whose functions are $z \mapsto 3z$ and $z \mapsto 5z$ ($z \in \mathbb{Z}_2$).
Figure 14. State transition diagram of the automaton having two minimal subautomata whose automata functions are $z \mapsto 3z$ and $z \mapsto 5z$, $z \in \mathbb{Z}_2$. The initial state is 0.
Figure 15. Plot of a finite automaton which is an approximation of a measure-1 automaton whose automaton function is $z \mapsto 1 + 3z + 2z^2$ ($z \in \mathbb{Z}_2$).
Figure 19. General automaton whose minimal subautomata are all finite and affine.

Wave Functions Emerging from Automata
This section discusses the main notion of quantum theory, the wave function. Our goal is to derive wave functions from causal functions; that is, from automata. Functions (16) are the building blocks of the construction of the wave function on the basis of causal maps. To begin, we briefly outline the general idea of the construction.
Recall that the reduced state transition diagram of a finite automaton is a digraph in which each path ultimately reaches a minimal subautomaton. There are no outgoing paths from minimal subautomata. By feeding the automaton with random long words, we assign to each minimal subautomaton the probability that the automaton reaches a state belonging to that subautomaton; cf., Figure 20. Let the automaton A be such that, being fed by random long words, it reaches at some finite step, with probability 1, a state which belongs to a minimal subautomaton which is finite and affine. The limit plot of every such subautomaton is described by a complex-valued function of the form (16).
To every minimal subautomaton that is finite and affine we ascribe its limit plot. There are only countably many such limit plots since there are only countably many affine functions $\mathbb{Z}_p \to \mathbb{Z}_p$ that are automata functions of these subautomata: due to the finiteness of the subautomata, the coefficients of these affine functions must belong to the set $\mathbb{Z}_p \cap \mathbb{Q}$, which is countable. As no two distinct minimal subautomata have common states (due to the minimality), and as every minimal subautomaton is assigned a probability of being reached, to every limit plot one assigns a probability to "observe" that limit plot in the experiment, i.e., to obtain accumulation points in the unit square which constitute that limit plot. This probability is equal to the sum of the probabilities of reaching the minimal subautomata having that plot. Therefore, these probabilities constitute a distribution assigned to the automaton; a characteristic function of that distribution is a (generally infinite) series whose terms are the functions $\psi(x, k) = e^{i(ax - 2\pi p^k b)}$ multiplied by the values of the respective probabilities; cf., (16) (there is a vast literature on characteristic functions of probability distributions; see, e.g., [56]). We argue that this characteristic function of the distribution may be treated as a wave function.

Figure 20. Example state transition diagram of a 2-adic automaton having minimal subautomata (output symbols of labels of arrows are omitted). $s_0$ is the initial state. The respective probabilities of reaching subautomata $S_1$, $S_2$, and $S_3$ are 1/2, 1/4, and 11/64 = 1/8 + 1/32 + 1/64.
Proceeding to a formal rigorous construction, let us review a few preliminary conventions:
• We do not distinguish affine automata whose limit plots coincide, so the actual probability distribution related to the automaton is the distribution of classes of finite affine subautomata having coinciding limit plots;
• We use the terms "p-adic integer", "infinite word over a p-symbol alphabet", and "infinite path in a state transition diagram" as synonyms; see Sections 3.1-3.3.
A word of caution: there is a one-to-one correspondence between all paths of length $k$ in the state transition diagram and all numbers from $\{0, 1, \ldots, p^k - 1\}$; however, to every number from $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$, there corresponds an infinite number of paths: every such path has a prefix which is simply a base-$p$ expansion of the number and a suffix which consists of zeros only; cf., Section 3.1.
Given an automaton A, let S be its subautomaton. Let W(S) be the set of all infinite paths starting from the initial state of A in a state transition diagram of A which reach states of S at finite steps. Note that if a path $w$ reaches S at the $k$-th step, then all paths which correspond to infinite words having the same prefix of length $k$ reach S at the $k$-th step; therefore, the p-adic integers which correspond to these paths constitute a p-adic ball of radius $p^{-k}$. Therefore, all p-adic integers that correspond to infinite paths which reach the subautomaton S at finite steps constitute a disjoint union B(S) of balls of nonzero radii; hence, B(S) is a $\mu$-measurable subset of $\mathbb{Z}_p$ with respect to the Haar measure $\mu$ on $\mathbb{Z}_p$, which is normalised so that $\mu(\mathbb{Z}_p) = 1$. This way, S is assigned the probability $\mu(S) = \mu(B(S))$.
Note that the set W(S) does not depend on a concrete state transition diagram of the automaton A, but to be more definite, one may assume that the state transition diagram of the automaton is reduced; thus, given an automaton function, the reduced state transition diagram of the respective automaton is unique; cf., Section 3.3. In this case, some care should be taken when speaking of paths since some arrows in the reduced state transition diagram may actually be loops; see, e.g., Figure 19. The paths (which we write from left to right) that begin at the initial state $t_0$ and have prefixes 0111, 01011, 010011, 0100011, ... all reach the subautomaton $S_3$ on the fourth, fifth, sixth, seventh, ... steps, respectively, so the probability to reach the subautomaton $S_3$ is $1/16 + 1/32 + 1/64 + 1/128 + \cdots = 1/8$, and $B(S_3)$ is a disjoint union of the balls $B_{1/16}(14)$, $B_{1/32}(26)$, $B_{1/64}(50)$, ..., $B_{1/2^k}(2 + 3 \cdot 2^{k-2})$, ..., where $k = 4, 5, 6, \ldots$.
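The arithmetic of this example can be checked mechanically. The following is a minimal sketch, not part of the original construction; the function names are hypothetical. It reads each quoted prefix left to right as a little-endian base-2 expansion to obtain the centre of the corresponding ball, and sums the Haar measures $2^{-k}$ of the balls exactly.

```python
from fractions import Fraction

def prefix_to_center(prefix: str) -> int:
    """Read a binary word left to right as a little-endian base-2 expansion."""
    return sum(int(bit) << i for i, bit in enumerate(prefix))

def s3_prefixes(count: int) -> list:
    """First `count` prefixes 0111, 01011, 010011, ... quoted in the text."""
    return ["01" + "0" * m + "11" for m in range(count)]

def partial_probability(count: int) -> Fraction:
    """Sum of the Haar measures 2^(-k) of the balls fixed by the first prefixes."""
    return sum((Fraction(1, 2 ** len(w)) for w in s3_prefixes(count)), Fraction(0))

for w in s3_prefixes(4):
    k = len(w)
    # The centre agrees with the closed form 2 + 3 * 2^(k-2) from the text.
    print(w, "-> radius 2^-%d, centre %d" % (k, prefix_to_center(w)))

# The gap to 1/8 after 20 terms is exactly 2^-23, so the partial sums tend to 1/8.
print(Fraction(1, 8) - partial_probability(20))
```

Exact rational arithmetic is used so that the partial sums can be compared with 1/8 without rounding error.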
Given two minimal subautomata S and T of the automaton A that are finite and affine, by virtue of the minimality one has $B(S) \cap B(T) = \emptyset$; thus, the probability that a random infinite path starting from the initial state reaches at a finite step some minimal subautomaton of the automaton A is the sum $\sum \mu(B(S))$ taken over all minimal subautomata S which are finite and affine. We call an automaton A ultimately affine if this probability is 1. Note that if an ultimately affine automaton is infinite, then, according to König's lemma (also known as Beth's tree theorem) [57], there are infinite paths that never reach states belonging to these minimal subautomata. These paths constitute a $\mu$-measurable subset of $\mathbb{Z}_p$, but the measure of the subset is 0 since the subset is the complement of a countable union of balls whose measure is 1. For instance, the path 111... in the state transition diagram depicted in Figure 2 never reaches the minimal subautomaton (which has only one state, namely, $s_1$), but all other paths reach the subautomaton at finite steps, so the probability to reach that minimal subautomaton is 1.
Definition 12 (Plot equivalence of automata). Call the finite affine automata S and T plot equivalent, $S \equiv_P T$, if their respective functions $\psi\colon \mathbb{R} \times \mathbb{Z} \to \mathbb{C}$ defined by (16) coincide; that is, if their limit plots coincide, $P(S) = P(T)$, i.e., if the limit plots are links of the same number of torus windings with a common slope.
Given $a, b \in \mathbb{Z}_p \cap \mathbb{Q}$, denote by $S_{a,b}$ an automaton whose automaton function is $z \mapsto az + b$. Let $[S_{a,b}]$ be the set of all minimal subautomata of A that are plot-equivalent to $S_{a,b}$. By virtue of the minimality, given distinct $S, T \in [S_{a,b}]$, the subautomata S and T have no common states; therefore, $B(S) \cap B(T) = \emptyset$; that is, the probability of reaching the class $[S_{a,b}]$ is well-defined. The equivalence relation $\equiv_P$ induces an equivalence relation on the set of all pairs $(a; b) \in (\mathbb{Z}_p \cap \mathbb{Q}) \times (\mathbb{Z}_p \cap \mathbb{Q})$, which we denote by the same symbol, i.e., $(a; b) \equiv_P (c; d)$ if and only if $S_{a,b} \equiv_P S_{c,d}$.
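Let Spec(A) be the set of all equivalence classes defined by minimal subautomata of A which are finite and affine, and let $q_{[S_{a,b}]}$ be the probability of reaching a subautomaton from the class $[S_{a,b}]$. Then, the series
$$\Psi_A(\rho, k) = \sum_{[S_{a,b}] \in \mathrm{Spec}(A)} q_{[S_{a,b}]}\, e^{i(a\rho - 2\pi p^k b)} \qquad (17)$$
converges absolutely for all $\rho \in \mathbb{R}$, $k \in \mathbb{Z}$ and therefore defines a complex-valued function $\Psi_A(\rho, k)$. Call the function $\Psi_A$ a sharp wave function assigned to the automaton A.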
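For a finite spectrum, a sharp wave function can be evaluated numerically. The sketch below assumes the form $\Psi_A(\rho, k) = \sum q\, e^{i(a\rho - 2\pi p^k b)}$ that the text quotes for sharp wave functions; the function name `sharp_wave` and the probabilities 1/2, 1/2 attached to the two example spectra are hypothetical and chosen only for illustration. Since $e^{-2\pi i x}$ has period 1, $p^k b$ is reduced modulo 1 exactly before converting to floating point.

```python
import cmath
from fractions import Fraction

def sharp_wave(rho, k, spectrum, p=2):
    """Evaluate sum of q * exp(i * (a*rho - 2*pi*p^k*b)) over a finite spectrum.

    `spectrum` is a list of triples (q, a, b): q is a probability, and a, b are
    rationals standing for the coefficients of the affine map z -> a*z + b.
    """
    total = 0j
    for q, a, b in spectrum:
        # Exact reduction of p^k * b modulo 1 (k may be negative).
        frac = (Fraction(p) ** k * Fraction(b)) % 1
        total += q * cmath.exp(1j * (float(a) * rho - 2 * cmath.pi * float(frac)))
    return total

# Hypothetical probabilities for the maps z -> 3z and z -> 5z (zero phase shifts).
zero_shift = [(0.5, Fraction(3), Fraction(0)), (0.5, Fraction(5), Fraction(0))]
# Hypothetical probabilities for z -> -2z + 1/3 and z -> (3/5)z + 2/7.
with_shift = [(0.5, Fraction(-2), Fraction(1, 3)), (0.5, Fraction(3, 5), Fraction(2, 7))]

print(sharp_wave(0.0, 3, zero_shift))       # all phases vanish at rho = 0
print(abs(sharp_wave(1.7, 0, with_shift)))  # modulus never exceeds the sum of q
```

Because every term has modulus equal to its probability, the modulus of the sum is bounded by 1, which is the numerical counterpart of the absolute convergence claimed for the series.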
To prove the theorem we require a lemma.
Lemma 2 (All discrete random variables can be modelled on $\mathbb{Z}_p$). Given a convergent series $\sum_{j=0}^{\infty} q_j = 1$ of positive real numbers $q_j \in \mathbb{R}_{\geq 0}$, there exist pairwise disjoint open sets $W_j \subset \mathbb{Z}_p$ such that the normalised Haar measure $\mu$ of $W_j$ is $q_j$, $j = 0, 1, 2, \ldots$.

Proof of Lemma 2.
Most likely, the lemma is known, but as the author is aware of no proper reference, a proof follows. Consider the Monna map $\mathrm{mon}(z) = \sum_{i=0}^{\infty} \alpha_i p^{-i-1} \in [0, 1]$ for $z = \sum_{i=0}^{\infty} \alpha_i p^i \in \mathbb{Z}_p$; that is, the Monna map mon maps p-adic balls $B_{1/p^k}(a) \subset \mathbb{Z}_p$ of radii $1/p^k$ centred at $a \in \mathbb{Z}_p$ onto closed subintervals of length $1/p^k$ of the unit interval $[0, 1]$. Note that $\lambda(\mathrm{mon}(B_{1/p^k}(a))) = \mu(B_{1/p^k}(a))$, where $\mu$ is the Haar measure on $\mathbb{Z}_p$ normalised so that $\mu(\mathbb{Z}_p) = 1$, and $\lambda$ is the Lebesgue measure on the unit real interval $[0, 1]$, i.e., the length of the closed interval.
Split the unit interval $[0, 1]$ into pairwise disjoint open intervals $Q_j$ such that the length of the $j$-th interval $Q_j$ is $q_j$; namely, let $Q_1 = (0, q_1)$, $Q_2 = (q_1, q_1 + q_2)$, $Q_3 = (q_1 + q_2, q_1 + q_2 + q_3)$, ...; then, $Q = \bigcup_{j=1}^{\infty} Q_j$ is $\lambda$-measurable and $\lambda(Q) = 1$. For each $Q_j$, let $B_j$ be the union of all balls $B$ of nonzero radii such that $\mathrm{mon}(B) \subset Q_j$. As any two p-adic balls are either disjoint or one is a subset of the other, the set $B_j$ is a countable disjoint union of balls of nonzero radii. Thus, $B_j$ is open as each p-adic ball of nonzero radius is clopen; hence, $B_j$ is $\mu$-measurable. As every point from $Q_j$ lies in the mon-image of some ball contained in $B_j$, we conclude that $\mu(B_j) = q_j$ and $\mu(\bigcup_{j=1}^{\infty} B_j) = \sum_{j=1}^{\infty} \mu(B_j) = 1$, as $B_j \cap B_k = \emptyset$ when $j \neq k$ by the construction.
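The measure-preserving property of the Monna map used in this proof can be illustrated on finite expansions. The sketch below assumes the usual definition of the Monna map, which sends $\sum \alpha_i p^i$ to $\sum \alpha_i p^{-i-1}$; the function names are hypothetical. It computes the image of a non-negative integer and the closed interval onto which a ball $B_{1/p^k}(a)$ is mapped.

```python
from fractions import Fraction

def digits(n: int, p: int) -> list:
    """Base-p digits of a non-negative integer, least significant first."""
    out = []
    while n:
        n, d = divmod(n, p)
        out.append(d)
    return out

def monna(n: int, p: int) -> Fraction:
    """Monna image of n: digit alpha_i at p^i contributes alpha_i * p^(-i-1)."""
    return sum((Fraction(d, p ** (i + 1)) for i, d in enumerate(digits(n, p))),
               Fraction(0))

def ball_image(a: int, k: int, p: int) -> tuple:
    """Closed interval mon(B_{1/p^k}(a)); only a modulo p^k matters."""
    left = monna(a % p ** k, p)
    return left, left + Fraction(1, p ** k)

left, right = ball_image(14, 4, 2)
print(left, right, right - left)  # interval length equals the Haar measure 2^-4
```

The interval length equals $p^{-k}$, the normalised Haar measure of the ball, which is the identity $\lambda(\mathrm{mon}(B)) = \mu(B)$ on which the proof relies.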
Proof of Theorem 17. This proof follows immediately from the proof of Lemma 2. Every $B_j$, $j = 1, 2, \ldots$, is a countable disjoint union of balls $B_{1/p^{r_{jm}}}(a_{jm})$, $m = 1, 2, 3, \ldots$, centred at $a_{jm} = \sum_{k=0}^{r_{jm}-1} \alpha_{j,m,k}\, p^k \in \mathbb{Z}_p$. Let the branches of a p-adic tree be $\alpha_{j,m,0}\alpha_{j,m,1} \cdots \alpha_{j,m,r_{jm}-1}$, and let the leaves be $B_{1/p^{r_{jm}}}(a_{jm})$, $j, m = 1, 2, \ldots$. In this digraph, replace all leaves $B_{1/p^{r_{jm}}}(a_{jm})$ with state transition diagrams of automata $S_m \in [S_j]$. Thus, the constructed digraph is a state transition diagram of an automaton A which is ultimately affine and such that $\Psi_A(\rho, k) = \sum_{j=1}^{\infty} q_j e^{i(a_j\rho - 2\pi p^k b_j)}$.

Note 12.
From the proof of Theorem 17 it follows that an ultimately affine automaton may be either measure-0 or measure-1. The first case occurs when, for example, the series $\sum_{j=0}^{\infty} q_j$ has only finitely many terms; then the automaton A is finite and thus measure-0. The measure-1 case occurs when, for example, the coefficients $a_j \in \mathbb{Z}_p \cap \mathbb{Q}$ constitute a dense subset of $\mathbb{R}$ and all $b_j = 0$.
In what follows, we will need a slightly generalised version of Lemma 2:
Corollary 1 (Generalized Lemma 2). Given a convergent series $\sum_{j=0}^{\infty} q_j = q \leq 1$ of positive real numbers $q_j \in \mathbb{R}_{\geq 0}$, there exist pairwise disjoint open sets $W_j \subset \mathbb{Z}_p$ such that the normalized Haar measure $\mu$ of $W_j$ is $q_j$, $j = 0, 1, 2, \ldots$.
Sharp wave functions may be considered as wave functions with respect to discrete time since the map $e^{2\pi i b} \mapsto e^{2\pi i p^k b}$ is equivalent to a $k$-digit shift of the base-$p$ representation of $b$ and a reduction modulo 1 of the resulting number. As $k$ is the order of time elapsed (as measured by the p-adic clock; see Section 5.2 and Figure 3) since the moment the automaton reaches a state from its minimal affine subautomaton whose automaton function is $z \mapsto az + b$, a sharp wave function may be judged as the one the Little-endian can construct by observing reactions of a physical system at the smallest of scales.
We argue that a wave function with respect to continuous time can also be constructed by using ultimately affine automata. The core idea of the construction is to use beta representations of numbers rather than base-$p$ expansions. Beta representations of real numbers were first introduced by A. Rényi in 1957 and since then have attracted substantial attention in ergodic theory and symbolic dynamics; see, e.g., monograph [21].
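The digit-shift reading of the map $e^{2\pi i b} \mapsto e^{2\pi i p^k b}$ can be made concrete with exact rational arithmetic. The following is a small sketch under the assumption that $b$ is rational; the function names are hypothetical. Only the fractional part of $p^k b$ affects the phase, so the code tracks $(p^k b) \bmod 1$ for successive values of $k$.

```python
from fractions import Fraction

def shift_phase(b, k: int, p: int = 2) -> Fraction:
    """Return (p^k * b) mod 1, the fractional part that fixes e^{2 pi i p^k b}."""
    return (Fraction(p) ** k * Fraction(b)) % 1

def orbit(b, steps: int, p: int = 2) -> list:
    """Fractional parts for k = 0, 1, ..., steps - 1."""
    return [shift_phase(b, k, p) for k in range(steps)]

# b = 2/7 has the purely periodic binary expansion 0.010010..., so shifting by
# one digit per time step cycles through 2/7, 4/7, 1/7 and returns to 2/7.
print(orbit(Fraction(2, 7), 4))
print(orbit(Fraction(1, 3), 3))
```

For rational $b$ with denominator coprime to $p$, the orbit is periodic, so the phase factor of the corresponding term of a sharp wave function is periodic in the discrete time $k$.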
Recall that given real $\beta > 1$, a $\beta$-representation of real $b \geq 0$ is an infinite word $\chi_0\chi_1\cdots$ over the alphabet $B = \{0, 1, \ldots, \lfloor\beta\rfloor\}$ such that $b = \sum_{j=-k}^{\infty} \chi_{k+j}\,\beta^{-k-j}$. Note that we consider $\beta$-representations of real $b \geq 0$ and not only of real $b \in [0, 1]$ as in [21]. Of course, in (17), we may always assume that $b \in [0, 1]$; however, to assign real numbers to paths in state transition diagrams of automata we need beta representations of numbers from $\mathbb{N}_0$, which are then converted into real numbers in a way similar to the one we used in Section 5.3 when exploiting p-adic representations.
Specifically, we first use $\beta$ instead of $p$. Thus, each arrow in a state transition diagram of the automaton whose input and output alphabets are $B$ is labelled by a pair $\chi|\xi$, where $\chi, \xi \in B$; to an infinite path which starts from the initial state there corresponds an infinite word $w = \chi_0\chi_1\cdots$ over the alphabet $B$; to $w$, we put into correspondence the $(\lfloor\beta\rfloor + 1)$-adic integer $\sum_{j=0}^{\infty} \chi_j(\lfloor\beta\rfloor + 1)^j$. To construct a plot, we convert these $(\lfloor\beta\rfloor + 1)$-adic integers into sequences of real numbers $\chi_0\beta^{-1}$, $\chi_1\beta^{-1} + \chi_0\beta^{-2}$, $\chi_2\beta^{-1} + \chi_1\beta^{-2} + \chi_0\beta^{-3}$, ..., thus obtaining points $(\chi_{k-1}\beta^{-1} + \cdots + \chi_0\beta^{-k};\ \xi_{k-1}\beta^{-1} + \cdots + \xi_0\beta^{-k}) \in \mathbb{R}^2$. In other words, we simply use $\beta$-representations for input/output words of the automaton A when constructing a plot of the automaton, but the automaton function is still a 1-Lipschitz map from $(\lfloor\beta\rfloor + 1)$-adic integers to $(\lfloor\beta\rfloor + 1)$-adic integers. This way, we construct a sharp wave function $\Psi_A(\rho, k) = \sum_{[S_{a,b}] \in \mathrm{Spec}(A)} q_{[S_{a,b}]}\, e^{i(a\rho - 2\pi(\lfloor\beta\rfloor + 1)^k b)}$ (cf., (17)), which is a well-defined complex-valued function of $\rho \in \mathbb{R}$ and $k \in \mathbb{Z}$; then, we replace $(\lfloor\beta\rfloor + 1)$ by $\beta$ in the formula, thus obtaining another complex-valued function of $\rho \in \mathbb{R}$ and $k \in \mathbb{Z}$. The crucial point is that if $1 < \beta \ll 2$, i.e., if $\beta = 1 + \tau$ where $0 < \tau \ll 1$, then $\beta^k = (1 + \tau)^k \approx 1 + k\tau$. When $\tau$ is small (e.g., if $\tau = 5.391247(60) \times 10^{-44}$ s, the Planck time), then for the Big-endian observer who is incapable of performing measurements with that accuracy (which is currently only about $10^{-20}$ s), $k\tau \in \mathbb{R}$ is indistinguishable from continuous time. Thus, we obtain a fuzzy wave function $\tilde{\Psi}_A$ which is ascribed to the automaton A. The function is well-defined for all $\rho, t \in \mathbb{R}$ since the series converges absolutely. From this point, the sharp wave function (which is a discrete time function) can be viewed as an approximation of a fuzzy wave function (which is a continuous time function).
Note that since $\lfloor\beta\rfloor = \lfloor 1 + \tau\rfloor = 1$, i.e., $B$ is a 2-letter alphabet, then necessarily $p = 2$; see the sharp wave function Formula (17). The term "approximation" here is not rigorous (although some hint is already given by Example 4); to prove this statement with full rigour is a separate problem which will be considered in the future. In the current paper, we only find an exact representation for $\beta = 1 + \tau$ under the finiteness assumption of Section 2, but before doing this, we illustrate the usage of that $\beta$-representation using the analogy of film which is discussed in Section 2. Each frame of a film contains a number of details, but to cause an illusion of motion for a viewer, only a small share of the whole number of details is changed from one frame to the next; the smaller the share is, the slower the motion appears to the viewer. For a Little-endian viewer, the share is $p - 1$ since he uses the base-$p$ representation of numbers; in the case when the share is $\tau$, one has the $(1 + \tau)$-representation. If $0 < \tau \ll 1$, we have the case of a Big-endian viewer.
It is important to stress that to represent numbers from $\mathbb{N}_0$ in the base $\beta$, we use only non-negative powers of $\beta$ in order to guarantee the uniqueness of the $\beta$-representation for each number from $\mathbb{N}_0$: if negative powers of $\beta = 1 + \tau$ with $\tau \ll 1$ are allowed in $\beta$-representations, then every number from $(0, \tau^{-1})$ has a continuum of distinct $\beta$-representations provided $\tau < \frac{\sqrt{5} - 1}{2}$ [58], and in such a case, the very problem of assigning a number to a finite path in a state transition diagram becomes ill-posed. Under the said convention, the following theorem is true:
Theorem 18 (Finiteness assumption implies $\beta = \sqrt[N]{2}$). Let $1 < \beta < 2$. If an automaton that performs the addition of $\beta$-representations of numbers from $\mathbb{N}_0$ is finite, then necessarily $\beta = \sqrt[N]{2}$ for some $N \in \mathbb{N}$. For each $N \in \mathbb{N}$, the addition of numbers from $\mathbb{N}_0$ that are represented by $\sqrt[N]{2}$-representations can be performed with a finite automaton.
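The second claim of the theorem rests on the identity $2\beta^j = \beta^{j+N}$ for $\beta = \sqrt[N]{2}$: two units at position $j$ can be traded for one unit at position $j + N$. The sketch below is an illustration only, with hypothetical function names; it adds two finite words of digits 0 and 1 (position $j$ standing for $\beta^j$) with a carry to the $N$-th next digit, and checks the result exactly by collecting the integer coefficients of $1, \beta, \ldots, \beta^{N-1}$.

```python
def add_root2(x: list, y: list, N: int) -> list:
    """Add two 0/1 words in base 2^(1/N); a carry from position j goes to j + N."""
    length = max(len(x), len(y))
    carry = {}  # pending carries, keyed by the position that receives them
    out = []
    j = 0
    while j < length or carry:
        s = (x[j] if j < len(x) else 0) + (y[j] if j < len(y) else 0)
        s += carry.pop(j, 0)
        out.append(s % 2)
        if s >= 2:
            carry[j + N] = 1  # s is at most 3, so the carry is a single unit
        j += 1
    return out

def coeffs(word: list, N: int) -> tuple:
    """Exact value of a word as integer coefficients of 1, b, ..., b^(N-1),
    where b = 2^(1/N): position j = q*N + r contributes 2^q to the r-th slot."""
    c = [0] * N
    for j, digit in enumerate(word):
        c[j % N] += digit << (j // N)
    return tuple(c)

x, y = [1, 1, 0, 1], [1, 0, 1, 1]
z = add_root2(x, y, 2)
print(z, coeffs(x, 2), coeffs(y, 2), coeffs(z, 2))
```

At each position the procedure needs to remember only the carries destined for the next $N$ positions, which is a finite amount of state; this is consistent with the finiteness asserted by the theorem, though the code is not the automaton constructed in the paper.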
The converse statement of the theorem is obvious since the addition of numbers represented by $\sqrt[N]{2}$-expansions is an "addition with carry to the N-th digit"; for example, when $N = 2$ one has . . . It is worth warning the reader that Theorem 18 is not about the calculation of Planck time, whose value depends on the choice of units. In short, Theorem 18 is about how much information one needs to have both worldviews, that of the Little-endian and that of the Big-endian, agree. Specifically, Theorem 18 implies that the fuzzy wave function is the one which corresponds to an automaton over a $2^N$-symbol alphabet; that is, to the automaton whose function is $f\colon \mathbb{Z}_2^N \to \mathbb{Z}_2^N$, i.e., an $N$-variate 2-adic 1-Lipschitz map; see Section 3.3. Actually, $f$ is a 1-Lipschitz map $\mathbb{Z}_2(\sqrt[N]{2}) \to \mathbb{Z}_2(\sqrt[N]{2})$, where $\mathbb{Z}_2(\sqrt[N]{2})$ is the ring of integers of the field $\mathbb{Q}_2(\sqrt[N]{2})$; we leave further discussion of the theory to future papers. We remind the reader that for multivariate p-adic 1-Lipschitz maps, most theorems that have been proven or mentioned in this paper hold true; in particular, Theorem 15 holds true. Given a real function $G\colon H \to \mathbb{R}^n$ whose domain is $H \subset \mathbb{R}^m$, by the graph of the function (on the torus $\mathbb{T}^{m+n}$), we mean the point subset $G_H(g) = \{(\vec{x} \bmod 1;\ G(\vec{x}) \bmod 1) : \vec{x} \in H\} \subset \mathbb{T}^{m+n}$. Note that if $\vec{y} = (y_1; \ldots; y_k) \in \mathbb{R}^k$, then $\vec{y} \bmod 1$ stands for $(y_1 \bmod 1; \ldots; y_k \bmod 1)$. The theorem implies that in the multivariate case, the sharp wave function is of the following form:

Theorem 19 ([10]). Let
Therefore, Theorem 18 implies that a univariate fuzzy wave function is actually a multivariate sharp wave function, although in a very large number of dimensions. For instance, if $\sqrt[N]{2} = 1 + \tau$ where $\tau$ is of the order of Planck time, then $N \approx \frac{\ln 2}{\tau} \approx 10^{43}$; that is, the automaton function of the respective automaton is a 1-Lipschitz map $\mathbb{Z}_2^{10^{43}} \to \mathbb{Z}_2^{10^{43}}$. This means that the matrices A in the above formula for the sharp wave function $\Psi_A(\vec{x}, r)$ are $10^{43} \times 10^{43}$; that is, each of the matrices contains more entries than the number of atoms in the universe. An infinite-dimensional space is an adequate model for a $10^{43}$-dimensional space; this is why both the Big-endian and the Little-endian would agree that wave functions "live" in Hilbert spaces. We postpone to a future paper more rigorous statements and proofs on how pure and fuzzy wave functions are related to one another; here, we only explain why both functions, which may be judged as "physical", are elements of the Hilbert space $\ell^2(\mathrm{Spec}(A))$ of square-summable complex sequences whose terms are indexed by elements of the set Spec(A) (which is countable), since a "physical" wave function must be square-summable and the sum of squares of probability amplitudes must be 1. Recall that any separable Hilbert space is metrically isomorphic to $\ell^2$ and that the Fourier transform on the circle is such an isomorphism between the Hilbert space of square-integrable functions on $[0, 1] = I$ and the space $\ell^2(\mathbb{Z})$ of square-summable complex sequences whose terms are enumerated by integers. It is not difficult to construct sharp wave functions which can be judged as "physical" in this sense. Indeed, take any sequence $q_1, q_2, \ldots$ of positive real numbers such that $\sum_{j=1}^{\infty} q_j = 1$ and the series $\sum_{j=1}^{\infty} \sqrt{q_j}$ of positive square roots converges; by using Theorem 17, construct the automaton A. Then, the function $\sum_{j=1}^{\infty} \sqrt{q_j}\, e^{i(a_j\rho - 2\pi p^k b_j)}$ is the one we are seeking.
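The order-of-magnitude estimate for $N$ is easy to reproduce. The sketch below takes $\tau$ to be the numerical value of the Planck time in seconds quoted earlier in the text and solves $2^{1/N} = 1 + \tau$ for $N$; the function name is hypothetical, and the figure of roughly $10^{80}$ atoms in the observable universe is a commonly quoted rough estimate assumed here for comparison, not a value taken from the paper.

```python
import math

TAU = 5.391247e-44   # Planck time in seconds, as quoted in the text
ATOMS = 1e80         # assumed rough estimate, not taken from the source

def dimension_for(tau: float) -> float:
    """Solve 2**(1/N) = 1 + tau for N, i.e. N = ln 2 / ln(1 + tau)."""
    # log1p keeps precision for tiny tau, where 1 + tau rounds to 1.0 in floats.
    return math.log(2.0) / math.log1p(tau)

N = dimension_for(TAU)
print("N is about %.4e" % N)
print("N * N exceeds the assumed atom count:", N * N > ATOMS)

# Sanity check on a small case: beta = sqrt(2) = 1 + tau gives N = 2.
print(dimension_for(math.sqrt(2.0) - 1.0))
```

For such a small $\tau$ the value $\ln(1 + \tau)$ is indistinguishable from $\tau$ in double precision, which is why the approximation $N \approx \ln 2 / \tau$ is used in the text.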
We finalise the subsection with the following interpretation.
Interpretation 8 (Discrete spectrum; continuous spectrum). The measure-0 ultimately affine automata may be treated as models of physical systems having discrete (energy, frequency, ...) spectra, while measure-1 ultimately affine automata may be treated as models of physical systems having continuous spectra.

Uncertainty
In this subsection, we formally derive an uncertainty relation which holds for wave functions of automata. We stress, once again, that despite the Little-endian being capable of performing observation at the smallest scale and the Big-endian not being able to do so, the