Variances and Covariances in the Central Limit Theorem for the Output of a Transducer

We study the joint distribution of the input sum and the output sum of a deterministic transducer. Here, the input of this finite-state machine is a uniformly distributed random sequence. We give a simple combinatorial characterization of transducers for which the output sum has bounded variance, and we also provide algebraic and combinatorial characterizations of transducers for which the covariance of input and output sum is bounded, so that the two are asymptotically independent. Our results are illustrated by several examples, such as transducers that count specific blocks in the binary expansion, the transducer that computes the Gray code, or the transducer that computes the Hamming weight of the width-$w$ non-adjacent form digit expansion. The latter two turn out to be examples of asymptotic independence.


Introduction
We consider sequences defined as the sum of the output of a deterministic transducer, i.e. a finite-state machine that deterministically transforms an input sequence into an output sequence. Here, we let both the input and the output be sequences of real numbers and assume that the input sequence is randomly generated. Then, while the output depends deterministically on the input, the dependence between the two random variables "sum of the input sequence" and "sum of the output sequence" may become negligible for long input sequences. We investigate for which transducers this is the case. We give two different characterizations of such "independent" transducers, an algebraic and a combinatorial one. In a similar way, we also consider the variance of the sum of the output of a transducer. We prove a combinatorial characterization of transducers with bounded variance of the output sum. These combinatorial characterizations are described in terms of a weighted number of functional digraphs or cycles of the underlying graph.
Our probability model is the equidistribution on all input sequences of a fixed length $n$. We asymptotically investigate the two random variables "sum of the input" and "sum of the output" of a transducer for $n \to \infty$. If these two random variables converge in distribution to independent random variables, then the transducer is called independent.
Under this probability model, the expected values of the input sum and the output sum are $e_1 n$ and $e_2 n + O(1)$, respectively, for some constants $e_1$ and $e_2$. For the sum of the input, the expressions are exact without error term because the input letters are independent and identically distributed. Furthermore, under appropriate connectivity conditions, the variances and the covariance turn out to be $v_1 n$, $v_2 n + O(1)$ and $cn + O(1)$, respectively, for suitable constants $v_1$, $v_2$ and $c$. We investigate for which transducers one of the constants $v_2$ and $c$ is zero.
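Because the input letters are independent and uniformly distributed, the constants $e_1$ and $v_1$ are simply the mean and the variance of a single letter. The following Python sketch illustrates this under the assumption of an integer input alphabet; the function names `input_constants` and `input_sum_moments` are ours and serve only as an illustration. It compares the closed form with a brute-force enumeration of all words of a small length.

```python
from fractions import Fraction
from itertools import product

def input_constants(alphabet):
    """Mean e_1 and variance v_1 of one uniformly chosen input letter."""
    K = len(alphabet)
    e1 = Fraction(sum(alphabet), K)
    v1 = Fraction(sum(a * a for a in alphabet), K) - e1 ** 2
    return e1, v1

def input_sum_moments(alphabet, n):
    """Exact mean and variance of the input sum over all words of length n."""
    sums = [sum(word) for word in product(alphabet, repeat=n)]
    mean = Fraction(sum(sums), len(sums))
    var = Fraction(sum(s * s for s in sums), len(sums)) - mean ** 2
    return mean, var

# Binary alphabet: the input sum is the Hamming weight of the word.
e1, v1 = input_constants([0, 1])
mean, var = input_sum_moments([0, 1], 5)
```

For the binary alphabet, the enumeration reproduces $e_1 n$ and $v_1 n$ without any error term, as claimed for the input sum.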
A special case of the output sum is the Hamming weight, which is the number of nonzero elements of a sequence. To give an example of an independent transducer, we later discuss the Hamming weight of the non-adjacent form as defined by Reitwiesner [24]. The non-adjacent form is the unique digit expansion with digits $\{-1, 0, 1\}$, base 2 and the syntactical rule that at least one of any two adjacent digits has to be 0. It has minimal Hamming weight among all digit expansions with digits $\{-1, 0, 1\}$ in base 2. In [19], Heuberger and Prodinger prove that the Hamming weights of the standard binary expansion and the non-adjacent form are asymptotically independent. The independent transducer computing these Hamming weights is shown in Figure 1.
There are many results on the variance of the sum of the output of explicit transducers under the same probability model we use. See, for example, [2,9,10,13] for the variance of the Hamming weight of different digit expansions which are computed by transducers. In [18], the authors count the occurrences of a digit and give the expected value, the variance and the covariance between two different digits. The occurrence of a specific pattern in a word is investigated in, e.g., [3,6,8,23] (with generalizations to other probability models, too). In [3], the covariance between different patterns is also considered. In [11], Grabner and Thuswaldner consider a transducer whose output is the sum-of-digits function. However, they were only interested in the output and did not consider the joint distribution or the covariance of the input and output sum.
By contrast, we are interested in the joint distribution of the input and output sum for a general transducer. We not only algebraically compute the expected value and the variance-covariance matrix of this distribution, but we also give combinatorial descriptions of these values. In particular, we combinatorially characterize independent transducers and transducers with bounded variance of the output sum. This combinatorial connection is described by a condition on some weighted number of functional digraphs or on each cycle of the underlying graph of the transducer. To obtain these results, we apply a generalization of the Matrix-Tree Theorem by Chaiken [4] and Moon [21].
We formally define our setting in the next section. In Section 3, we state our main results. In Section 4, we present several examples where these main results are applied. In the last section, we give the proofs of the theorems.
In many contexts, an unbounded variance (as in [20]) is necessary to prove a Gaussian limit law. In Theorem 3.1, we combinatorially describe transducers whose output sums have bounded variance. For strongly connected transducers, we prove that this is the case if and only if there exists a constant such that for each cycle, the output sum is proportional to its length with this proportionality constant. This in turn is equivalent to a quasi-deterministic output sum in the sense that the difference of the output sum and its expected value is bounded for all events, independently of the length of the input. In the special case where the transducer is strongly connected and aperiodic and the only possible outputs are 0 and 1, it turns out that the output sum has asymptotically bounded variance if and only if the output is constant for all transitions (Corollary 3.6). The assumption of strong connectivity can be relaxed for most results.
We give an algebraic description of independent transducers in Theorem 3.9. We also state there that the input sum and the output sum are asymptotically jointly normally distributed if the variance-covariance matrix is invertible. In Theorem 3.14, we present a combinatorial characterization of independent transducers.
In Section 4, we give a variety of examples of independent and dependent transducers and transducers with bounded and unbounded variance to illustrate our results. One of those examples is a transducer computing the minimal Hamming weight of $\tau$-adic digit representations on a digit set $D$. Building on the results of [13], we prove that the variance of the minimal Hamming weight is unbounded, which yields a central limit theorem.
In Section 5, we also prove an extension of the 2-dimensional Quasi-Power Theorem [14] to singular Hessian matrices as an auxiliary result.
The results of Theorem 3.9 have been implemented [17] in the computer algebra system Sage [26], based on its package for finite state machines [15]. This code is included in Sage 6.3.

Preliminaries
A transducer is defined to consist of a finite set of states $\{1, 2, \ldots, S\}$, a finite input alphabet $A_I \subseteq \mathbb{R}$, an output alphabet $A_O \subseteq \mathbb{R}$, a set of transitions $E \subseteq \{1, 2, \ldots, S\}^2 \times A_I$ with input labels in $A_I$, output labels $\delta\colon E \to A_O$ and the initial state 1. The transducer is called deterministic if for all states $s$ and input labels $\varepsilon \in A_I$, there exists at most one state $t$ such that $(s, t, \varepsilon) \in E$. Furthermore, the transducer is said to be subsequential (cf. [25]) if it is deterministic, every state is final and it has a final output $a\colon \{1, 2, \ldots, S\} \to A_O$. A transducer is called complete if for every state $s$ and digit $\varepsilon \in A_I$, there is a transition from $s$ to a state $t$ with input label $\varepsilon$, i.e., $(s, t, \varepsilon) \in E$.
Definition 2.1. A transducer is said to be finally connected if there exists a state which can be reached from any other state. The final component of such a transducer is defined to be the transducer induced by the set of states which can be reached from any other state. A finally connected transducer is said to be finally aperiodic if the underlying graph of the final component is aperiodic (i.e., the gcd of the lengths of all walks starting and ending at a given vertex is 1).
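To make these definitions concrete, the following Python sketch stores a deterministic subsequential transducer as a dictionary of transitions together with a final output map. The class `Transducer` and the two-state example `change_detector` are hypothetical illustrations and do not correspond to any figure; the example outputs 1 exactly when the current input letter differs from the previous one.

```python
class Transducer:
    """Deterministic subsequential transducer with initial state 1."""

    def __init__(self, transitions, final_output, initial=1):
        # transitions maps (state, input letter) -> (next state, output letter)
        self.transitions = dict(transitions)
        # final_output maps each state s to its final output letter a(s)
        self.final_output = dict(final_output)
        self.initial = initial

    def is_complete(self, alphabet):
        """True if every state has a transition for every input letter."""
        return all((s, eps) in self.transitions
                   for s in self.final_output for eps in alphabet)

    def run(self, word):
        """Output letters along the unique path, followed by the final output."""
        state, out = self.initial, []
        for eps in word:
            state, delta = self.transitions[(state, eps)]
            out.append(delta)
        out.append(self.final_output[state])
        return out

# Hypothetical example: state 1 means "last letter was 0 (or nothing read yet)",
# state 2 means "last letter was 1"; a transition outputs 1 when the letter changes.
change_detector = Transducer(
    transitions={(1, 0): (1, 0), (1, 1): (2, 1),
                 (2, 0): (1, 1), (2, 1): (2, 0)},
    final_output={1: 0, 2: 0},
)
output = change_detector.run([0, 0, 1, 1])
```

Storing the transitions as a map from (state, input) pairs enforces determinism by construction, since each pair can have only one target.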
Remark 2.2. The final component of a transducer is a strongly connected component of the underlying graph of the transducer. If the underlying graph is strongly connected, then being finally aperiodic is equivalent to being aperiodic. We then call the transducer strongly connected and aperiodic. The final component of a complete transducer is complete itself.
In the following, we consider subsequential, complete, deterministic, finally connected, finally aperiodic transducers. We require that the input alphabet $A_I$ has at least two elements. Throughout the paper, we use $\varepsilon$ for the input of a transition and $\delta$ for the output of a transition. We denote the number of states in the final component by $N$.
The input of the transducer is a sequence in $A_I^*$. It is not important whether we read the input from right to left or in the other direction; we only have to fix the direction for one specific transducer. The output of the transducer is the sequence of output labels of the unique path starting at the initial state 1 with the given input as input label, together with the final output label of the final state of this path.
Definition 2.3. The Hamming weight of a finite sequence $(n_0, \ldots, n_L)$ is the number of nonzero elements of the sequence.
Example 2.4. The transducer in Figure 1 is a subsequential, complete, strongly connected, aperiodic transducer. It computes the Hamming weight of the non-adjacent form when reading the binary expansion from right to left. The transducer is a slight simplification of the one in, e.g., [19], taking into account that we are only interested in the Hamming weight.
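As a small illustration of the digit expansion behind Example 2.4, the following Python sketch computes the non-adjacent form of a non-negative integer directly by the standard remainder-modulo-4 rule. It does not simulate the transducer of Figure 1, and the function names `naf` and `hamming_weight` are our own.

```python
def naf(n):
    """Digits of the non-adjacent form of n >= 0, least significant digit first."""
    digits = []
    while n > 0:
        if n % 2 == 1:
            # Choose the digit in {-1, 1} that makes n - d divisible by 4,
            # which forces the next digit to be 0.
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def hamming_weight(digits):
    """Number of nonzero elements of a finite sequence."""
    return sum(1 for d in digits if d != 0)

digits_12 = naf(12)
weight_12 = hamming_weight(digits_12)
```

The digits are returned with the least significant digit first, matching the right-to-left reading direction used for the transducer.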
For example, the non-adjacent form of 12 is $(10(-1)00)_{\mathrm{NAF}}$ and has Hamming weight 2. When reading the standard binary expansion $(1100)_2$ of 12, the transducer in Figure 1 writes the output 10100. The leftmost 1 in the output is the final output of the last state. The sum of the output is 2, too.
Let $X_n$ be a uniformly distributed random variable on $A_I^n$. Let $\mathrm{Output}(X_n)$ be the sum of the output sequence of the transducer if the input is $X_n$. Furthermore, let $\mathrm{Input}(X_n)$ be the sum of the input sequence. Without loss of generality, we fix the direction of reading from right to left.
We investigate the 2-dimensional random vector
$$\Omega_n = (\mathrm{Input}(X_n), \mathrm{Output}(X_n))^t$$
for $n \to \infty$, where $t$ denotes transposition. We will prove that each component of this random vector either converges in distribution to a normally distributed random variable or to a degenerate random variable. Here, a random variable is said to be degenerate if it is constant with probability 1. By definition, a degenerate random variable is independent of any other random variable. Thus, the variance of a degenerate random variable and the covariance of a degenerate and any other random variable are always 0.
For a finally connected, aperiodic transducer, the expected value and the variance of $\Omega_n$ will turn out to be $(e_1, e_2)^t n + O(1)$ and $(v_1, v_2)^t n + O(1)$, respectively, for suitable constants $e_1$, $e_2$, $v_1$ and $v_2$ (see Theorem 3.9). The covariance between the two coordinates will be $cn + O(1)$ for some constant $c$. We call
$$\Sigma = \begin{pmatrix} v_1 & c \\ c & v_2 \end{pmatrix}$$
the asymptotic variance-covariance matrix of $\Omega_n = (\mathrm{Input}(X_n), \mathrm{Output}(X_n))^t$. Its entries are called the asymptotic variances and the asymptotic covariance.
A special case is a transducer with output alphabet $\{0, 1\}$. If we consider a transducer computing, for example, a new digit expansion and we are only interested in the Hamming weight of this new digit expansion, we map the output of each transition to the alphabet $\{0, 1\}$. In this way, we obtain a new transducer with output alphabet $\{0, 1\}$ computing the Hamming weight of this new digit expansion. In this case, the combinatorial characterization of a bounded variance of the output sum is particularly simple (see Corollary 3.6).
For brevity, we introduce the notion of independent transducers.
Definition 2.5. A transducer is independent if the random vector $\Omega_n$ converges in distribution to a random vector with two independent components, i.e., the sum of the input $\mathrm{Input}(X_n)$ and the sum of the output $\mathrm{Output}(X_n)$ are asymptotically independent random variables.
Example 2.6. In [19], Heuberger and Prodinger prove that the Hamming weight of the standard binary expansion and the Hamming weight of the non-adjacent form are asymptotically independent. Thus, the transducer in Example 2.4 is independent.

Main results
In this section, we state the main theorems and corollaries describing independent transducers and transducers with bounded variance. First, we investigate transducers with bounded variance. Then, we give an algebraic description and a combinatorial characterization of independent transducers.
3.1. Bounded variance and singular asymptotic variance-covariance matrix
We give a combinatorial characterization of transducers whose output sum has asymptotic variance 0. We also give a combinatorial description of transducers with singular asymptotic variance-covariance matrix. These characterizations are given in terms of cycles and closed walks of directed graphs.
As usual, a cycle is a strongly connected digraph such that every vertex has out-degree 1. A closed walk is an alternating sequence of vertices and edges $(s_1, e_1, s_2, \ldots, s_{n+1} = s_1)$ such that $e_j$ is an edge from $s_j$ to $s_{j+1}$.
For a function $g$ and a walk $C$ of the underlying graph of the transducer, we define
$$g(C) = \sum_{e \in C} g(e),$$
taking multiplicities into account. Here, the function $g$ is either the constant function $\mathbb{1}(e) = 1$, the input $\varepsilon(e)$ or the output $\delta(e)$ of the transition $e$. We want to emphasize that only cycles and closed walks of the final component are considered in this theorem (see also Remark 3.10).
In the case of a strongly connected transducer, the equivalent conditions of Theorem 3.1 will be shown to be equivalent to another condition which, at first glance, seems to be even stronger.
holds for all n and all inputs.
We now characterize quasi-deterministic output sums. In weakly connected graphs, it turns out that being "quasi-deterministic" is a stronger notion than the conditions in Theorem 3.1.
Theorem 3.3. Let $T$ be a subsequential, complete transducer whose underlying graph is weakly connected. Then the following two assertions are equivalent:
(d) There exists a constant $k \in \mathbb{R}$ such that the random variable $\mathrm{Output}(X_n)$ is quasi-deterministic with value $kn + O(1)$.
(e) There exists a constant $k \in \mathbb{R}$ such that $\delta(C) = k\,\mathbb{1}(C)$ holds for every directed cycle $C$ of the transducer.
By comparing statements (c) of Theorem 3.1 and (e) of Theorem 3.3, it is obvious that in strongly connected transducers, all these statements are actually equivalent.
Corollary 3.4. Let $T$ be a subsequential, complete, strongly connected, aperiodic transducer. Then the asymptotic variance $v_2$ of the output sum is zero if and only if the output sum is a quasi-deterministic random variable.
Remark 3.5. If the transducer is not strongly connected (so that there are states that do not belong to the final component), the output sum can have bounded variance without being quasi-deterministic. A simple example is a transducer that counts the number of 1s in a binary string before the first 0. In such a case, however, the transducer formed only by the final component still needs to have quasi-deterministic output sum.
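The cycle criterion (the output sum of every directed cycle is a fixed constant $k$ times its length) can be checked mechanically on small transducers. The following Python sketch enumerates the simple directed cycles of a transition list and tests the criterion. The edge format `(s, t, eps, delta)`, the function names and the two two-state examples are our own assumptions; in particular, the examples are simplified stand-ins rather than the transducers of the figures.

```python
from fractions import Fraction

def simple_cycles(edges):
    """Yield each simple directed cycle as a list of edges (s, t, eps, delta)."""
    out = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)

    def walk(start, current, path, seen):
        for e in out.get(current, []):
            target = e[1]
            if target == start:
                yield path + [e]
            elif target > start and target not in seen:
                # Only visit larger states so that each cycle is found once,
                # starting from its smallest state.
                yield from walk(start, target, path + [e], seen | {target})

    for start in sorted(out):
        yield from walk(start, start, [], {start})

def proportionality_constant(edges):
    """Return k if delta(C) = k * length(C) for all cycles C, else None."""
    ratios = {Fraction(sum(e[3] for e in c), len(c)) for c in simple_cycles(edges)}
    return ratios.pop() if len(ratios) == 1 else None

# Hypothetical two-state transducers (state 1: last letter 0, state 2: last letter 1).
# First: +1 when a 0 follows a 1, -1 when a 1 follows a 0.
difference_edges = [(1, 1, 0, 0), (1, 2, 1, -1), (2, 1, 0, 1), (2, 2, 1, 0)]
# Second: +1 when a 1 follows a 1, i.e. it counts 11-blocks.
count11_edges = [(1, 1, 0, 0), (1, 2, 1, 0), (2, 1, 0, 0), (2, 2, 1, 1)]

k_difference = proportionality_constant(difference_edges)
k_count11 = proportionality_constant(count11_edges)
```

Restricting to simple cycles suffices here because every closed walk decomposes into simple cycles, so the proportionality carries over to all closed walks.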
The following corollary of Theorem 3.1 gives a combinatorial characterization of transducers whose asymptotic variance-covariance matrix is singular.
Corollary 3.7. Let $T$ be a complete, subsequential, finally connected, finally aperiodic transducer whose input alphabet has at least size 2. Then the asymptotic variance-covariance matrix $\Sigma$ has rank 1 if and only if there exist $a, b \in \mathbb{R}$ with
$$\delta(C) = a\,\mathbb{1}(C) + b\,\varepsilon(C) \qquad (1)$$
for all cycles $C$ of the final component.
In that case, the constants are $a = -\frac{c}{v_1} e_1 + e_2$ and $b = \frac{c}{v_1}$. Furthermore, the random variables $\mathrm{Input}(X_n)$ and $\mathrm{Output}(X_n)$ are asymptotically perfectly positively or negatively correlated (i.e., they have asymptotic correlation coefficient $\pm 1$) if and only if (1) holds with $b \neq 0$.

3.2. Algebraic description of independent transducers
For giving an algebraic description of independent transducers, we define transition matrices of the transducer.
Definition 3.8. For $\varepsilon \in A_I$, let a transition matrix $M_\varepsilon(y)$ of the final component be the $N \times N$-matrix whose entry $(s, t)$ is $y^\delta$ if there is a transition from state $s$ to state $t$ in the final component with input $\varepsilon$ and output $\delta$, and 0 otherwise.
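As a minimal sketch of Definition 3.8, the following Python code builds the transition matrices for a numeric value of $y$ from a dictionary of transitions. The function name `transition_matrices` and the two-state example (a hypothetical transducer that outputs 1 whenever a 1 follows a 1) are assumptions made for illustration only.

```python
def transition_matrices(transitions, states, alphabet, y):
    """Return {eps: M_eps(y)} as nested lists.

    transitions maps (state, input letter) -> (next state, output letter);
    entry (s, t) of M_eps(y) is y**delta for the transition s -> t with
    input eps and output delta, and 0 otherwise.
    """
    index = {s: i for i, s in enumerate(states)}
    matrices = {}
    for eps in alphabet:
        m = [[0] * len(states) for _ in states]
        for s in states:
            t, delta = transitions[(s, eps)]
            m[index[s]][index[t]] = y ** delta
        matrices[eps] = m
    return matrices

# Hypothetical strongly connected example with states 1 and 2.
count11 = {(1, 0): (1, 0), (1, 1): (2, 0), (2, 0): (1, 0), (2, 1): (2, 1)}

at_two = transition_matrices(count11, [1, 2], [0, 1], 2)
at_one = transition_matrices(count11, [1, 2], [0, 1], 1)
# Row sums of M_0(1) + M_1(1): one outgoing transition per input letter.
row_sums = [sum(at_one[0][i]) + sum(at_one[1][i]) for i in range(2)]
```

For a complete transducer, every row of the sum of the matrices at $y = 1$ adds up to the size of the input alphabet, because each state has exactly one outgoing transition per input letter.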
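Similarly, let $M'_\varepsilon$ be the transition matrix of the whole transducer. The ordering of the states is considered to be fixed in such a way that the initial state 1 is the first state and $M'_\varepsilon$ has the block structure
$$M'_\varepsilon = \begin{pmatrix} * & * \\ 0 & M_\varepsilon \end{pmatrix},$$
where $*$ are matrices with arbitrary entries. If the transducer is strongly connected, the matrices $*$ are not present (they have 0 rows).
Theorem 3.9. Let $T$ be a complete, subsequential, finally connected, finally aperiodic transducer with transition matrices $M_\varepsilon(y)$ for $\varepsilon \in A_I$. Let $K \geq 2$ be the size of the input alphabet $A_I$ and
$$f(x, y, z) = \det\Bigl(I - \frac{z}{K} \sum_{\varepsilon \in A_I} x^{\varepsilon} M_\varepsilon(y)\Bigr).$$
Then the random variables $\mathrm{Input}(X_n)$ and $\mathrm{Output}(X_n)$ have the expected values, variances and covariance
$$\begin{aligned}
E(\mathrm{Input}(X_n)) &= e_1 n, & E(\mathrm{Output}(X_n)) &= e_2 n + O(1),\\
V(\mathrm{Input}(X_n)) &= v_1 n, & V(\mathrm{Output}(X_n)) &= v_2 n + O(1),\\
\mathrm{Cov}(\mathrm{Input}(X_n), \mathrm{Output}(X_n)) &= cn + O(1)
\end{aligned} \qquad (2)$$
with constants $e_1$, $e_2$, $v_1$, $v_2$ and $c$ that are obtained from partial derivatives of $f$ at $(x, y, z)^t = \mathbf{1}$. The constants $e_1$ and $v_1$ can also be expressed as
$$e_1 = \frac{1}{K} \sum_{\varepsilon \in A_I} \varepsilon, \qquad v_1 = \frac{1}{K} \sum_{\varepsilon \in A_I} \varepsilon^2 - e_1^2. \qquad (3)$$
The random vector $\Omega_n$ is asymptotically jointly normally distributed if and only if the asymptotic variance-covariance matrix $\Sigma$ is regular.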
The transducer $T$ is independent if and only if the covariance condition (4), i.e. $c = 0$, holds or, equivalently, (5). This result has been implemented as the method FiniteStateMachine.asymptotic_moments() in the computer algebra system Sage, cf. [17], using the finite state machines package described in [15].
Remark 3.10. Neither the final output nor the non-final components influence the asymptotic result because it only depends on $f(x, y, z)$ and thus on the transitions of the final component.
Now we consider the following "inverse" problem: given the underlying graph and the input digits of the transducer, how can we choose the output labels such that the transducer is independent? Let $(a_1, \ldots, a_{KN})$ be the output labels of the final component of the transducer. We say, as usual, that a linear equation is homogeneous if the zero vector is a solution. Then (4) is a linear, homogeneous equation in $a_1, \ldots, a_{KN}$ with real coefficients. The equation is linear because the variables $a_i$ only occur linearly in the exponents of $y$ and there are only first derivatives with respect to $y$ in the covariance condition (4). Furthermore, (4) is homogeneous because all derivatives with respect to $y$ (and maybe other additional variables) at $(x, y, z)^t = \mathbf{1}$ are homogeneous. A solution of this linear, homogeneous equation corresponds to an independent transducer.
Let us first consider the situation where all outputs are equal to 1. Then, the determinant $f(x, y, z)$ consists of monomials $x^a y^b z^b$ with $a \in \mathbb{R}$ and $b \in \mathbb{Z}$. Therefore, we obtain and it follows that (4) and (5) are satisfied. This means that a constant output $(k, \ldots, k)$ for $k \in A_O$ is always a trivial solution to these equations because (4) is homogeneous.
But for these trivial solutions, the sum of the output is an asymptotically degenerate random variable.Hence, we are not really interested in the independent transducers given by these solutions.
Example 3.11. In Figure 2, we have a transducer with variable output weights $a_1$, $a_2$, $a_3$ and $a_4$. We do not give the final output labels as they do not influence the asymptotic result. In this example, (4) simplifies to $-a_1 + a_2 = 0$.

3.3. Combinatorial characterization of independent transducers
We connect the derivatives of $f(x, y, z)$ with a weighted sum of subgraphs of the underlying graph. Thus, in Theorem 3.14, we can give a combinatorial description of (4).
Definition 3.12.We define the following types of directed graphs as subgraphs of the final component of the transducer.
• A rooted tree is a weakly connected digraph with one vertex which has out-degree 0, while all other vertices have out-degree 1. The vertex with out-degree 0 is called the root of the tree.
For functions $g$ and $h\colon E \to \mathbb{R}$, we define With these definitions, we give a combinatorial characterization of independent transducers.
Theorem 3.14. Let $T$ be a complete, subsequential, finally connected, finally aperiodic transducer.
Then the random variables Input(X n ) and Output(X n ) have the expected values given by (2), where the constants are ) .
The variances and the covariance are given by (2), with the constants The transducer $T$ is independent if and only if
We emphasize that, by Definition 3.13, only edges in the final component of the transducer are considered in Theorem 3.14. The non-final components do not influence the asymptotic main terms (see also Remark 3.10).
In the following corollary, we consider the case of a normalized input and output, i.e., the constants of the expected values satisfy $e_1 = e_2 = 0$. This can be obtained by subtracting the original constants $e_1$ and $e_2$ from every input label and output label, respectively. Then the corollary follows directly from Theorem 3.14.
Corollary 3.15. Suppose that $E(\mathrm{Input}(X_n))$ and $E(\mathrm{Output}(X_n))$ are both bounded. Then the transducer $T$ is independent if and only if $\varepsilon\delta(D_2) = \varepsilon\delta(D_1)$.
Example 3.16. We again consider the transducer of Example 3.11 in Figure 2. The set $D_1$ consists of 3 functional digraphs and $D_2$ consists of only one functional digraph (see Figure 3). By (6), we obtain the same equation as before, namely $-a_1 + a_2 = 0$, as condition for the transducer to be independent.
Also by Theorem 3.14, the expected value of the output sum is and the asymptotic variance is 16 .
The covariance between the input sum and the output sum is

Examples of transducers
In this section we give various examples to illustrate our theorems: these include both dependent and independent transducers and transducers with both bounded and unbounded variance of the output sum. These examples are also shown in the documentation of the method FiniteStateMachine.asymptotic_moments() [17] in Sage. Example 4.6 demonstrates how the combinatorial characterization of transducers with bounded variance can be used in cases where we only have limited information about the transducer.
Example 4.1 (Width-$w$ non-adjacent form). The width-$w$ non-adjacent form (cf. [1,22]) is a digit expansion with base 2, digits $\{0, \pm 1, \pm 3, \ldots, \pm(2^{w-1} - 1)\}$ and the syntactical rule that at most one of any $w$ consecutive digits is nonzero. The transducer in Figure 4 computes the Hamming weight of the width-$w$ non-adjacent form when reading the standard binary expansion (cf. [16]). For $w = 2$, this transducer is the same as that in Figure 1. The variance of the output is not 0 (Corollary 3.6). With Theorem 3.9 or 3.14, we obtain that this transducer is independent for every $w$. Thus, the Hamming weight of the width-$w$ non-adjacent form and the standard binary expansion are asymptotically independent.
Remark 4.2. Example 4.1 not only shows that there are infinitely many independent transducers, but also gives the construction of one such infinite family of independent transducers.
Example 4.3 (Gray code). The Gray code is an encoding of the positive integers such that the Gray code of $n$ and the Gray code of $n + 1$ differ only at one position. The transducer in Figure 5 computes the Gray code of an integer. The output label of the initial state is 0 and, as it does not influence the result, it is not given in the figure. The transducer is finally connected and finally aperiodic. The final component consisting of states 2 and 3 is independent (see Example 3.11). Thus, the Hamming weight of the Gray code and the standard binary expansion are asymptotically independent.
Example 4.4 (Length 2 blocks in the standard binary expansion). We count the number of patterns of length 2 occurring in the standard binary expansion and compare it to the Hamming weight. By symmetry, it is obviously sufficient to consider the two patterns 01 and 11. The transducers in Figure 6 determine the number of 01- and 11-blocks, respectively. The variance of the output weight is not 0 in either case (Corollary 3.6); in fact, the constant $v_2$ is $\frac{1}{16}$ for 01-blocks and $\frac{5}{16}$ for 11-blocks. By Theorem 3.9 or 3.14, we also find that the transducer for 01-blocks is independent, while the transducer for 11-blocks (unsurprisingly) is not: the number of 11-blocks asymptotically depends on the number of 1's in the standard binary expansion, and the asymptotic correlation coefficient is $2/\sqrt{5}$.
Example 4.5. Now, we give an example of a transducer with bounded variance of the output sum. We compute the number of 10-blocks minus the number of 01-blocks in the standard binary digit expansion. In Figure 7, we show the corresponding transducer. The output label of the initial state is 0 and, as it does not influence the result, it is not given in the figure. Each of the three cycles has output sum 0. Thus, the asymptotic variance of this random variable is 0. There is, of course, an intuitive explanation: when we read a 1 after a 0 (reading from right to left), the count increases by 1; when we read a 0 after a 1, the count decreases by 1; otherwise, it remains unchanged. Thus the final output value will only depend on the first and last digit.
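The block-counting examples can be cross-checked by brute force. The following Python sketch enumerates all binary words of a fixed length and computes the exact variance of a statistic and its covariance with the Hamming weight. It counts blocks directly in the word rather than simulating the transducers of Figures 6 and 7, so boundary conventions may differ from the figures by bounded terms; the function names are our own.

```python
from fractions import Fraction
from itertools import product

def count_block(bits, block):
    """Number of (possibly overlapping) occurrences of a length-2 block."""
    return sum(1 for pair in zip(bits, bits[1:]) if pair == block)

def moments(n, stat):
    """Exact variance of stat and its covariance with the Hamming weight
    over all 2**n binary words of length n."""
    words = list(product((0, 1), repeat=n))
    total = len(words)
    xs = [sum(w) for w in words]    # input sums (Hamming weights)
    ys = [stat(w) for w in words]   # values of the statistic
    ex, ey = Fraction(sum(xs), total), Fraction(sum(ys), total)
    var_y = Fraction(sum(y * y for y in ys), total) - ey ** 2
    cov = Fraction(sum(x * y for x, y in zip(xs, ys)), total) - ex * ey
    return var_y, cov

n = 8
var01, cov01 = moments(n, lambda w: count_block(w, (0, 1)))
var11, cov11 = moments(n, lambda w: count_block(w, (1, 1)))
var_diff, cov_diff = moments(
    n, lambda w: count_block(w, (1, 0)) - count_block(w, (0, 1)))
```

In this direct count, the variance grows with slope $\frac{1}{16}$ for 01-blocks and $\frac{5}{16}$ for 11-blocks, the covariance with the Hamming weight vanishes for 01-blocks and grows with slope $\frac{1}{4}$ for 11-blocks, and the difference of 10- and 01-blocks has constant variance because the sum telescopes to the first digit minus the last digit.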
Example 4.6. Finally, we consider the transducer used in [13] to compute the minimal Hamming weight of the $\tau$-adic digit expansion for a given algebraic integer $\tau$ and a given digit set $D$. Note that the output alphabet of the transducer need not be $\{0, 1\}$ even if we are interested in the Hamming weight. The next theorem is an extension of Theorem 4 in [13].
Consider the random variable $W_n = \mathrm{mw}(D_n)$, where $D_n$ is a random $\tau$-adic joint digit representation of length $n$ with digits in $A_I \subset \mathbb{Z}[\tau]^d$. We assume that $(\tau, A_I)$ is an irredundant digit system with $0 \in A_I$. The digits of $D_n$ are independent and identically distributed with uniform distribution on $A_I$. Then there exist constants $E$, $V$, with $V \neq 0$, such that and
Proof. In [13], the authors give a strongly connected and aperiodic transducer computing $\mathrm{mw}(z)$ if the input is the $\tau$-adic representation of $z$ with digit set $A_I$ read from left to right. Everything follows from Theorem 4 in [13] if $V \neq 0$.
To prove $V \neq 0$, we use Theorem 3.1, (b). In [13], the authors state that the transducer has a loop at the initial state 1 with input and output digit 0. Thus, in Theorem 3.1, (b), the value of $k$ is 0.
On the other hand, there exists a $z \in \mathbb{Z}[\tau]^d$ with $\mathrm{mw}(z) \neq 0$. The input $z$ leads to a state $s$. From each state the input $0^l$, for some $l$, leads again to the initial state 1. Thus, the unique path whose input labels are given by the digit representation of $z\tau^l$ is a closed walk visiting 1 at least once. The output sum of this closed walk is $\mathrm{mw}(z\tau^l) = \mathrm{mw}(z) \neq 0$. Thus, there exists a closed walk whose output sum is not 0, which contradicts Theorem 3.1, (b) with $k = 0$. Therefore, we obtain $V \neq 0$.

Proofs of the theorems
In this section, we give the proofs of the theorems and corollaries of Section 3. We first prove the algebraic description and the combinatorial characterization in Sections 3.2 and 3.3. Later we prove the statements in Section 3.1 about the bounded variance.
5.1. Algebraic description of independent transducers
First, we prove a slight extension of the 2-dimensional Quasi-Power Theorem [14] (a generalization of [20]). This extension will also take into account the case of a singular Hessian matrix.
We write boldface letters for a vector $\mathbf{s} = (s_1, s_2)^t$. Furthermore, we use the notation $e^{\mathbf{s}} = (e^{s_1}, e^{s_2})$. We denote by $\mathbf{1}$ a 2- or 3-dimensional vector of ones, depending on the context. By $\|\cdot\|$, we denote the maximum norm $\|\mathbf{s}\| = \max(|s_1|, |s_2|)$.
Theorem 5.1.Let (Ω n ) n≥1 be a sequence of 2-dimensional real random vectors.Suppose that the moment generating function satisfies u(s) and v(s) are analytic for s ≤ τ and independent of n; (2) , where H u (s) is the Hessian matrix of u.Let Σ be the matrix H u (0).
If H u (0) is regular, then the standardized random vector is asymptotically jointly normally distributed with variance-covariance matrix Σ.
If H u (0) has rank 1, then the limit distribution of Ω * n is the direct product of a normal distribution and a degenerate distribution (if one of the variances is O(1)) or a linear transformation thereof.In the first case, the coordinates of Ω * n are asymptotically independent.In the second case, we have an asymptotically linear relationship between the two coordinates.
If $H_u(\mathbf{0})$ has rank 0, then the limit distribution of $\Omega_n^*$ is degenerate.
Proof. The expressions (7) for expectation and variance-covariance matrix follow from the moment generating function by differentiation.
The case of a regular Hessian matrix H u (0) is exactly the statement of the 2-dimensional Quasi-Power Theorem [14].
For the case of a singular Hessian matrix, we follow the proof of the Quasi-Power Theorem [14]. We consider the characteristic function of the standardized random vector $\Omega_n^*$. Thus the characteristic function tends to $f(\mathbf{s}) = \exp\bigl(-\frac{1}{2} \mathbf{s}^t H_u(\mathbf{0}) \mathbf{s}\bigr)$. If the Hessian matrix $H_u(\mathbf{0})$ has rank 0, then $f(\mathbf{s})$ is the constant function 1. Thus, the limit distribution is degenerate.
If the Hessian matrix $H_u(\mathbf{0})$ has rank 1 and the variance of the second coordinate is $O(1)$, then $f(\mathbf{s}) = \exp\bigl(-\frac{1}{2} v_1 s_1^2\bigr) \cdot 1$, which is the characteristic function of the normal distribution with mean 0 and variance $v_1$ times the characteristic function of the point mass at 0.
If the Hessian matrix has rank 1 with $v_1 v_2 \neq 0$, then we consider the random variables $X = \Omega_{n,1}$, the first coordinate of $\Omega_n$, and $Z = \Omega_{n,2} - \frac{c}{v_1} \Omega_{n,1}$. Then, the main term of the variance-covariance matrix of $(X, Z)^t$ is $\begin{pmatrix} v_1 & 0 \\ 0 & 0 \end{pmatrix} \Phi(n)$. Thus, $X$ is asymptotically normally distributed and $Z$ is an asymptotically constant random variable (see previous case).
Using this version of the Quasi-Power Theorem, we prove the algebraic description of independent transducers given in Theorem 3.9.
Proof of Theorem 3.9. Let $a_{kln}$ be the number of sequences of length $n$ with input sum $k$ such that the corresponding output of the transducer $T$ has sum $l$. We define
$$A(x, y, z) = \sum_{k, l, n} a_{kln} K^{-n} x^k y^l z^n.$$
Thus, the variable $x$ marks the input sum, $y$ marks the output sum, and $z$ marks the length of the input. Then $[z^n] A(x, y, z)$ is the probability generating function of $\Omega_n$, where $[z^n] b(z)$ is the coefficient of $z^n$ in the power series $b(z)$.
Due to the block structure of $M'_\varepsilon(y)$, we have (8), with $u^t = (1, 0, \ldots, 0)$ for the initial state, $v_s = y^{a(s)}$ for the final output label at state $s$ and $F_1(x, y, z)$ and $F_2(x, y, z)$ "polynomials" in $x$, $y$ and $z$. We use quotation marks because exponents of $x$ and $y$ might not be integers. However, only finitely many summands occur.
The moment generating function of $\Omega_n$ is $E(e^{\langle \Omega_n, \mathbf{s} \rangle}) = [z^n] A(e^{s_1}, e^{s_2}, z)$.
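The counts $a_{kln}$ can be tabulated for small $n$ by a dynamic program over the states, which is the finite analogue of extracting the coefficient of $z^n$. The following Python sketch does this for a transducer given as a dictionary of transitions; the function names and the two-state example (a hypothetical transducer counting 11-blocks, with all final outputs 0) are our own assumptions.

```python
from collections import defaultdict
from fractions import Fraction

def count_table(transitions, final_output, alphabet, n, initial=1):
    """Return {(k, l): a_kln}: the number of words of length n with
    input sum k and output sum l (including the final output)."""
    layer = {(initial, 0, 0): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for (state, k, l), count in layer.items():
            for eps in alphabet:
                target, delta = transitions[(state, eps)]
                nxt[(target, k + eps, l + delta)] += count
        layer = nxt
    table = defaultdict(int)
    for (state, k, l), count in layer.items():
        table[(k, l + final_output[state])] += count
    return dict(table)

def covariance(table):
    """Exact covariance of input sum and output sum under the uniform model."""
    total = sum(table.values())
    ex = Fraction(sum(k * c for (k, l), c in table.items()), total)
    ey = Fraction(sum(l * c for (k, l), c in table.items()), total)
    exy = Fraction(sum(k * l * c for (k, l), c in table.items()), total)
    return exy - ex * ey

# Hypothetical example: state 2 remembers that the last letter was 1.
count11 = {(1, 0): (1, 0), (1, 1): (2, 0), (2, 0): (1, 0), (2, 1): (2, 1)}
table = count_table(count11, {1: 0, 2: 0}, [0, 1], 6)
cov = covariance(table)
```

Dividing each entry of the table by $K^n$ gives the joint distribution of the input sum and the output sum for words of length $n$.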
For extracting the coefficient, we investigate the dominant singularity of $A(x, y, z)$. Since the final component is strongly connected and aperiodic, we have a unique dominant simple eigenvalue of $\sum_{\varepsilon \in A_I} x^{\varepsilon} M_\varepsilon(y)$ at $(x, y)^t = \mathbf{1}$ by the theorem of Perron-Frobenius (cf. [7]). Because the final component is complete, this dominant eigenvalue is $K$, that is the size of the input alphabet $A_I$. Thus, the unique dominant singularity of $f(1, 1, z)^{-1}$ is 1. For $(x, y)^t$ in a small neighborhood of $\mathbf{1}$, there is a unique dominant singularity $\rho(x, y)$ of $f(x, y, z)^{-1}$ due to the continuity of eigenvalues.
Next, we consider the non-final components of the transducer.The corresponding transducer T 0 is not complete.Let T + 0 be the complete transducer that is obtained from T 0 by adding loops where necessary.The dominant eigenvalue of T + 0 is K.As the corresponding sums of transition matrices of T 0 and T + 0 satisfy element-wise inequalities but are not equal (at (x, y) t = 1), the theorem of Perron-Frobenius (cf.[7,Theorem 8.8.1]) implies that the dominant eigenvalues of T 0 have absolute value less than K. Thus, the dominant singularities of F 2 (1, 1, z) −1 are at |z| > 1.By continuity, this also holds for a small neighborhood of (x, y) t = 1.
The Laurent series of $A(x, y, z)$ at $z = \rho(x, y)$ is
$$A(x, y, z) = (z - \rho(x, y))^{-1} C(x, y) + \text{power series in } (z - \rho(x, y))$$
for a function $C(x, y)$ which is analytic in a neighborhood of $\mathbf{1}$ with $C(\mathbf{1}) \neq 0$. Thus, by singularity analysis [5], we have an asymptotic expansion of $[z^n]A(x, y, z)$ whose relative error is exponentially small, of order $O(\kappa^n)$ with $\kappa < 1$. Theorem 5.1 yields the expected value, the variance-covariance matrix and the asymptotic normality of $\Omega_n$. By implicit differentiation, we obtain the stated expressions. The error terms for the input sum are 0 because the input letters are independent and identically distributed. This also yields the explicit constants in (3).
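As a hedged illustration of the coefficient extraction, consider only the polar part of the Laurent expansion. The sketch assumes that the remaining power series is analytic in a disc of radius larger than $|\rho(x, y)|$, so that it contributes an exponentially smaller term; under this assumption, the geometric series gives the main term.

```latex
\[
  [z^n]\,\frac{C(x,y)}{z-\rho(x,y)}
  = -\frac{C(x,y)}{\rho(x,y)}\,[z^n]\,\frac{1}{1-z/\rho(x,y)}
  = -C(x,y)\,\rho(x,y)^{-n-1}.
\]
```

At $(x, y)^t = \mathbf{1}$, where $\rho = 1$, the main term reduces to $-C(\mathbf{1})$, which is compatible with $[z^n]A(1, 1, z)$ being a total probability only if $C(\mathbf{1}) \neq 0$.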
Since the input alphabet $A_I$ has at least two elements, the input sum has nonzero asymptotic variance. Thus, the asymptotic variance-covariance matrix $\Sigma$ can have rank 1 or 2. Now, we consider these two cases separately and prove the asserted equivalence.
(1) Let $\Sigma$ have rank 1. Then $\Omega_n$ converges to a pair consisting of a degenerate and a normally distributed random variable if the asymptotic variance of the output sum is 0, and to a linear transformation thereof otherwise. Thus, $\Omega_n$ is asymptotically independent if and only if the asymptotic variance of the output sum is 0. As the rank of $\Sigma$ is 1, the asymptotic variance is 0 if and only if the asymptotic covariance is 0.
(2) Let $\Sigma$ be invertible. By Theorem 5.1, we obtain an asymptotic joint normal distribution. Thus, $\Omega_n$ is asymptotically independent if and only if its asymptotic covariance is 0.
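The role of the covariance can be checked by brute force on small examples. The following is a minimal sketch, not the method of the proof: both transition tables are hypothetical encodings (a Gray-code-like transducer whose output is the input letter XOR the previous letter, and a counter of 11-blocks), final output labels are ignored, and the names `GRAY`, `COUNT_11`, `output_sum` and `covariance` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical transducers over the input alphabet {0, 1}; state 0 is initial.
# Each entry maps (state, input letter) to (next state, output letter).
GRAY = {(0, 0): (0, 0), (0, 1): (1, 1), (1, 0): (0, 1), (1, 1): (1, 0)}
COUNT_11 = {(0, 0): (0, 0), (0, 1): (1, 0), (1, 0): (0, 0), (1, 1): (1, 1)}

def output_sum(transducer, word):
    """Run the transducer on `word` and return the sum of the output letters."""
    state, total = 0, 0
    for letter in word:
        state, out = transducer[(state, letter)]
        total += out
    return total

def covariance(transducer, n):
    """Exact covariance of input sum and output sum over all 2**n inputs."""
    words = list(product((0, 1), repeat=n))
    weight = Fraction(1, len(words))
    e_in = sum(sum(w) for w in words) * weight
    e_out = sum(output_sum(transducer, w) for w in words) * weight
    e_prod = sum(sum(w) * output_sum(transducer, w) for w in words) * weight
    return e_prod - e_in * e_out
```

In this sketch the covariance of the Gray-code-like transducer stays bounded as $n$ grows, whereas the covariance of the 11-block counter grows linearly in $n$.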

Combinatorial characterization of independent transducers.
To obtain the combinatorial characterization, we use a version of the Matrix-Tree Theorem as proved by Chaiken [4] and Moon [21].This version does not use trees, but forests, i.e., digraphs whose weak components are trees.
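Definition 5.2. Let $A, B \subseteq \{1, \dots, N\}$. Let $\mathcal{F}_{A,B}$ be the set of all forests which are spanning subgraphs of the final component of the transducer $T$ with $|A|$ trees such that every tree is rooted at some vertex $a \in A$ and contains exactly one vertex $b \in B$.
Let $A = \{i_1, \dots, i_n\}$ and $B = \{j_1, \dots, j_n\}$ with $i_1 < \dots < i_n$ and $j_1 < \dots < j_n$. For $F \in \mathcal{F}_{A,B}$, we define a function $g\colon B \to A$ by $g(j) = i$ if $j$ is in the tree of $F$ which is rooted in vertex $i$. We further define the function $h\colon A \to B$ by $h(i_k) = j_k$ for $k = 1, \dots, n$. The composition $g \circ h\colon A \to A$ is a permutation on $A$. We define $\operatorname{sign} F = \operatorname{sign}(g \circ h)$.
If $|A| = |B| = 1$, then $\mathcal{F}_{A,B}$ consists of all spanning trees rooted in $a \in A$.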
Theorem (All-Minors-Matrix-Tree Theorem [4, 21]). For a directed graph with loops, let $L = (l_{ij})_{1\le i,j\le N}$ be the Laplacian matrix, that is, $\sum_{j=1}^{N} l_{ij} = 0$ for every $i = 1, \dots, N$ and $-l_{ij}$ is the number of edges from $i$ to $j$ for $i \neq j$. Then, for $|A| = |B|$, the minor $\det L_{A,B}$ satisfies
$$\det L_{A,B} = (-1)^{\sum_{i\in A} i + \sum_{j\in B} j} \sum_{F\in\mathcal{F}_{A,B}} \operatorname{sign} F,$$
where $L_{A,B}$ is the matrix $L$ whose rows with index in $A$ and columns with index in $B$ are deleted.
The All-Minors-Matrix-Tree Theorem is still valid for $|A| \neq |B|$ if we assume that the determinant of a non-square matrix is 0. For notational simplicity, we use this convention in the rest of this section.
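The simplest case $A = B = \{a\}$ of the theorem can be verified by brute force. The sketch below uses a hypothetical three-vertex digraph and assumes that trees are oriented towards their root, which matches a Laplacian with zero row sums; `EDGES`, `laplacian`, `det`, `minor` and `count_rooted_trees` are illustrative names, and vertices are numbered from 0.

```python
from fractions import Fraction
from itertools import product

# Hypothetical digraph on vertices 0, 1, 2 (edge list, loops allowed).
EDGES = [(0, 1), (0, 2), (1, 2), (1, 0), (2, 0), (2, 2)]
N = 3

def laplacian(edges, n):
    """Rows sum to zero; -L[i][j] counts the edges i -> j for i != j."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i != j:  # loops cancel out in the Laplacian
            L[i][j] -= 1
            L[i][i] += 1
    return L

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    result = Fraction(1)
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, len(m)):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return result

def minor(L, a, b):
    """Determinant of L with row a and column b deleted."""
    return det([[x for j, x in enumerate(row) if j != b]
                for i, row in enumerate(L) if i != a])

def count_rooted_trees(edges, n, root):
    """Count spanning trees whose edges all point towards `root`."""
    others = [v for v in range(n) if v != root]
    choices = [[e for e in edges if e[0] == v and e[1] != v] for v in others]
    count = 0
    for pick in product(*choices):
        succ = {tail: head for tail, head in pick}
        ok = True
        for v in others:
            seen = set()
            while v != root and v not in seen:
                seen.add(v)
                v = succ[v]
            ok = ok and v == root
        count += ok
    return count
```

For $A = B = \{a\}$ the sign factor is $(-1)^{2a} = 1$ and every forest has sign $+1$, so the minor simply counts the spanning trees rooted in $a$.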
The next lemma connects the derivatives of $f(x, y, z)$ with weighted sums of functional digraphs. Theorem 3.14 follows immediately from this lemma and Theorem 3.9.
Proof. The idea of the proof is as follows: First, we compute the derivatives and write them as sums over all states. Using the All-Minors-Matrix-Tree Theorem, we change the summation to a sum over forests.
In the next step, we again change to a sum over functional digraphs. Let $u_1$, $u_2$ be any of the variables $x$, $y$ or $z$. For a matrix $M = (m_{ij})_{1\le i,j\le N}$, we define the matrix $M_{k:u_1} = (\tilde m_{ij})_{1\le i,j\le N}$ with $\tilde m_{ij} = m_{ij}$ for $i \neq k$ and $\tilde m_{kj} = \frac{\partial}{\partial u_1} m_{kj}$. Thus, $M_{k:u_1}$ is the matrix $M$ where row $k$ is differentiated with respect to $u_1$.
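To see why row-wise differentiated matrices appear, a worked $2 \times 2$ instance may help. It is only an illustration: the entries $a, b, c, d$ are hypothetical functions of $u_1$, and primes denote $\partial/\partial u_1$.

```latex
\[
  M = \begin{pmatrix} a & b \\ c & d \end{pmatrix},\qquad
  \frac{\partial}{\partial u_1}\det M
  = \underbrace{\det\begin{pmatrix} a' & b' \\ c & d \end{pmatrix}}_{\det M_{1:u_1}}
  + \underbrace{\det\begin{pmatrix} a & b \\ c' & d' \end{pmatrix}}_{\det M_{2:u_1}}
  = (a'd - b'c) + (ad' - bc').
\]
```

This is the product rule applied to $ad - bc$, and the same pattern, one summand per differentiated row, holds for matrices of any size.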
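We further define the derivatives at $\mathbf{1}$ as
$$D_{u_1}(F) = \frac{\partial F}{\partial u_1}\bigg|_{\mathbf{1}} \quad\text{and}\quad D_{u_1u_2}(F) = \frac{\partial^2 F}{\partial u_1\,\partial u_2}\bigg|_{\mathbf{1}}.$$
Applying the product rule to the definition of the determinants gives us
$$D_{u_1}(f) = \sum_{j=1}^{N} \det\Bigl(I - \frac{z}{K}\sum_{\varepsilon\in A_I} x^\varepsilon M_\varepsilon(y)\Bigr)_{j:u_1}\bigg|_{\mathbf{1}}, \qquad D_{u_1u_2}(f) = \sum_{i=1}^{N}\sum_{j=1}^{N} \det\Bigl(I - \frac{z}{K}\sum_{\varepsilon\in A_I} x^\varepsilon M_\varepsilon(y)\Bigr)_{i:u_1,\,j:u_2}\bigg|_{\mathbf{1}}.$$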
In these equations, we have a sum over all states. Since our original matrix $I - \frac{z}{K}\sum_{\varepsilon\in A_I} x^\varepsilon M_\varepsilon(y)$ is sparse, and $\bigl(I - \frac{z}{K}\sum_{\varepsilon\in A_I} x^\varepsilon M_\varepsilon(y)\bigr)_{j:u_1}$ is even sparser, we use Laplace expansion along row $j$ to determine these determinants. If $i \neq j$, we use Laplace expansion along rows $i$ and $j$ to determine $\det\bigl(I - \frac{z}{K}\sum_{\varepsilon\in A_I} x^\varepsilon M_\varepsilon(y)\bigr)_{i:u_1,\,j:u_2}$ for the second derivatives. If $i = j$, we only expand along row $j$. Depending on the variable of differentiation, there are at most $K$ nonzero values in row $j$ after differentiation.
For a transition $e$, we denote by $t(e)$, $h(e)$, $\varepsilon(e)$ and $\delta(e)$ the tail, the head, the input and the output of the transition $e$, respectively. Furthermore, let $w_e = \frac{1}{K} x^{\varepsilon(e)} y^{\delta(e)} z$ be the weight of the transition $e$.
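As a worked instance, the derivatives of the weight $w_e$ can be computed directly. The sketch assumes that $D_{u_1}(\cdot)$ and $D_{u_1u_2}(\cdot)$ denote the first and second partial derivatives evaluated at $x = y = z = 1$.

```latex
\begin{align*}
  D_x(w_e) &= \tfrac{1}{K}\,\varepsilon(e), &
  D_y(w_e) &= \tfrac{1}{K}\,\delta(e), &
  D_z(w_e) &= \tfrac{1}{K}, \\
  D_{xx}(w_e) &= \tfrac{1}{K}\,\varepsilon(e)\bigl(\varepsilon(e)-1\bigr), &
  D_{xy}(w_e) &= \tfrac{1}{K}\,\varepsilon(e)\,\delta(e), &
  D_{zz}(w_e) &= 0.
\end{align*}
```

Each line follows from differentiating the monomial $\frac{1}{K} x^{\varepsilon(e)} y^{\delta(e)} z$, which is linear in $z$.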
If we use Laplace expansion along two different rows, we must be careful with the sign. Therefore, we define $\sigma_{de} = (-1)^{[t(e)>t(d)]+[h(e)>h(d)]}$ for two transitions $d$ and $e$. Here, we use Iverson's notation, that is, $[\mathit{expression}]$ is 1 if $\mathit{expression}$ is true and 0 otherwise (cf. [12]).
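The sign $\sigma_{de}$ is easy to evaluate mechanically. A minimal sketch, assuming transitions are represented only by hypothetical `(tail, head)` pairs of state indices; `iverson` and `sigma` are illustrative names.

```python
def iverson(condition):
    """Iverson bracket: 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0

def sigma(d, e):
    """Sign sigma_de for transitions d and e given as (tail, head) pairs."""
    (t_d, h_d), (t_e, h_e) = d, e
    return (-1) ** (iverson(t_e > t_d) + iverson(h_e > h_d))
```

For example, `sigma((1, 3), (2, 2))` evaluates to `-1`, because only the tail comparison holds.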
Let $L$ be the Laplacian matrix of the underlying graph. Recall the notation $L_{A,B}$ for the matrix where the rows corresponding to $A$ and the columns corresponding to $B$ have been removed. Laplace expansion yields sums over the transitions $e$ with summands of the form $(-1)^{t(e)+h(e)} D_{u_1}(w_e) \det(L_{\{t(e)\},\{h(e)\}})$ for the first derivatives. Next, we use the All-Minors-Matrix-Tree Theorem and change the summation over all rows to a summation over forests. For a transition $e$, the first and second derivatives of the weight $w_e$ are known explicitly.

If the expected value of the output sum is $e_2 n + O(1)$ for some constant $e_2$ (see Theorem 3.9), then we subtract $e_2$ from the output of every transition, as for Corollary 3.15. Under this assumption, Theorem 3.14 implies that (b) can only hold with $k = 0$.
As the input sum is inconsequential, we consider $A(1, y, z)$. For brevity, we write $A(y, z)$ instead. We obtain a representation of $A(y, z)$ in terms of the transition matrices $M'_\varepsilon$, $\varepsilon \in \{0, \dots, q - 1\}$, of $T$.
Since $T$ is complete, finally connected and finally aperiodic, $A(1, z)$ has a simple dominant pole at $z = 1$ (see the proof of Theorem 3.9). We know that (9).

Let $s$ be any state of the final component. Each path starting at state 1 either does or does not visit state $s$. In the first case, this path can be decomposed into a path leading to state $s$ and visiting $s$ only once, followed by a sequence of closed walks visiting state $s$ exactly once, and a path starting in $s$ and not returning to $s$. We translate this decomposition into an equation for the corresponding generating functions.
Let $\mathcal{P}^s$ be the set of all walks in $T$ which start at state $s$ but never return to state $s$. All other states can be visited arbitrarily often. We define the corresponding generating function $P^s(y, z) = \sum_{P\in\mathcal{P}^s} y^{\delta(P)} z^{\mathbb{1}(P)} K^{-\mathbb{1}(P)}$. Then $[z^n]P^s(y, z)$ is the probability generating function of the output sum over walks in $\mathcal{P}^s$ of length $n$.
Let $\mathcal{P}^{1s}$ be the set of all walks in $T$ which start at state 1 and lead to state $s$, visiting $s$ exactly once. If $s = 1$, this set consists only of the path of length 0. The corresponding generating function is called $P^{1s}(y, z)$.
Let $\mathcal{P}^1$ be the set of all walks in $T$ which start at state 1 and never visit state $s$. If $s = 1$, this set is empty. The corresponding generating function is called $P^1(y, z)$.
Let $\mathcal{C}^s$ be the set of all closed walks in $T$ which visit state $s$ exactly once. All other states can be visited arbitrarily often. The corresponding generating function is called $C^s(y, z)$.
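The walk decomposition described above can be sanity-checked numerically at $y = 1$. The sketch below is an assumption-laden illustration, not the paper's equation: it assumes the decomposition reads $P^1 + P^{1s}\,(1 - C^s)^{-1}\,P^s$ and that, for a complete transducer, every coefficient of the total must equal 1. The transition structure `NEXT` is hypothetical, states are numbered from 0, and state 0 plays the role of the initial state.

```python
from fractions import Fraction

# Hypothetical complete transducer: K = 2 successors per state, states 0..2.
NEXT = {0: [0, 1], 1: [2, 1], 2: [0, 1]}
K = 2

def first_passage(start, s, n_max):
    """avoid[n]: weight of length-n walks from `start` not entering s at steps 1..n.
    hit[n]: weight of length-n walks from `start` entering s first at step n."""
    avoid = [Fraction(1)] + [Fraction(0)] * n_max
    hit = [Fraction(0)] * (n_max + 1)
    dist = {start: Fraction(1)}
    for n in range(1, n_max + 1):
        new = {}
        for state, p in dist.items():
            for target in NEXT[state]:
                if target == s:
                    hit[n] += p / K
                else:
                    new[target] = new.get(target, Fraction(0)) + p / K
        dist = new
        avoid[n] = sum(dist.values(), Fraction(0))
    return avoid, hit

def multiply(a, b):
    """Product of two truncated power series of equal length."""
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(len(a))]

def total_weight(s, n_max, initial=0):
    """Coefficients of P^1 + P^{1s} (1 - C^s)^{-1} P^s at y = 1."""
    p_s, c_s = first_passage(s, s, n_max)              # P^s and C^s
    if initial == s:
        p_1 = [Fraction(0)] * (n_max + 1)              # empty set of walks
        p_1s = [Fraction(1)] + [Fraction(0)] * n_max   # only the walk of length 0
    else:
        p_1, p_1s = first_passage(initial, s, n_max)
    # Geometric series 1 / (1 - C^s); C^s has no constant term.
    geom = [Fraction(1)] + [Fraction(0)] * n_max
    for n in range(1, n_max + 1):
        geom[n] = sum(c_s[k] * geom[n - k] for k in range(1, n + 1))
    combined = multiply(multiply(p_1s, geom), p_s)
    return [a + b for a, b in zip(p_1, combined)]
```

In this sketch, `total_weight(s, 8)` returns a list of ones for each choice of `s`, reflecting that every walk from the initial state is counted exactly once by the decomposition.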
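Let $\alpha$ be any of the superscripts $1$, $1s$ or $s$. By deleting the transitions leading to $s$, we obtain a matrix representation of $P^\alpha(y, z)$ in terms of $E = \operatorname{diag}(1, \dots, 1, 0, 1, \dots, 1)$ and fixed vectors $u_\alpha$ and $v_\alpha$. The position of the zero on the diagonal of $E$ corresponds to the state $s$. The vectors $u_\alpha$ and $v_\alpha$ depend on $\alpha$ and may include the output of the transitions leading to $s$, but $E$ is independent of $\alpha$. Since we have the element-wise inequalities $\sum_{\varepsilon\in A_I} M'_\varepsilon(1)E \le \sum_{\varepsilon\in A_I} M'_\varepsilon(1)$ and $\sum_{\varepsilon\in A_I} M'_\varepsilon(1)E \neq \sum_{\varepsilon\in A_I} M'_\varepsilon(1)$, we know that the spectral radii satisfy $\rho\bigl(\sum_{\varepsilon\in A_I} M'_\varepsilon(1)E\bigr) < \rho\bigl(\sum_{\varepsilon\in A_I} M'_\varepsilon(1)\bigr) = K$ due to the theorem of Perron-Frobenius (cf. [7, Theorem 8.8.1]).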
Here, it is important that $s$ lies in the final component. Thus, the dominant singularities of $P^\alpha(1, z)$ are at $|z| > 1$. Furthermore, we know that $P^s(1, 1) > 0$ and $P^{1s}(1, 1) > 0$ by the definition as generating functions.
By (9), (10) and singularity analysis [5], we obtain

Now, we consider transducers whose output alphabet is $\{0, 1\}$ and prove that there are only trivial cases with a bounded variance.
Assume that the asymptotic variance is 0. Let $k$ be the constant given in Theorem 3.

This last proof shows the equivalence of the statements in Corollary 3.7, including a transducer with a singular asymptotic variance-covariance matrix.
Proof of Corollary 3.7. WLOG, we assume that both expected values $E(\operatorname{Output}(X_n))$ and $E(\operatorname{Input}(X_n))$ are $O(1)$.
We know that the asymptotic variance $v_1$ of the input is non-zero because $A_I$ consists of at least two elements. As in the last paragraph of the proof of Theorem 5.1, we consider the random variables $Y_n = \operatorname{Input}(X_n)$ and $Z_n = -\frac{c}{v_1}\operatorname{Input}(X_n) + \operatorname{Output}(X_n)$ and their variance-covariance matrix. The matrix $\Sigma$ is singular if and only if the asymptotic variance of $Z_n$ is 0.
Thus, we consider a transducer with the same input as the original transducer $T$ for which the output of a transition $e$ is $-\frac{c}{v_1}\varepsilon(e) + \delta(e)$. By Theorem 3.1, the output sum of this new transducer has asymptotic variance 0 if and only if there exists an $m \in \mathbb{R}$ such that
$$\sum_{e\in C}\Bigl(\delta(e) - \frac{c}{v_1}\varepsilon(e)\Bigr) = m\,\mathbb{1}(C)$$
for every cycle $C$ of the final component. Since the expected value of $Z_n$ is $O(1)$, we have $m = 0$.
The second statement follows from Theorem 3.1.

Figure 1. Transducer to compute the Hamming weight of the non-adjacent form.

Theorem 3.1. For a subsequential, complete, finally connected and finally aperiodic transducer with an arbitrary finite input alphabet $A_I$, the following assertions are equivalent:
(a) The asymptotic variance $v_2$ of the output sum is 0.
(b) There exists a state $s$ of the final component and a constant $k \in \mathbb{R}$ such that $\delta(C) = k\,\mathbb{1}(C)$ holds for every closed walk $C$ of the final component visiting the state $s$ exactly once.
(c) There exists a constant $k \in \mathbb{R}$ such that $\delta(C) = k\,\mathbb{1}(C)$ holds for every directed cycle $C$ of the final component of the transducer $T$.
In that case, $kn + O(1)$ is the expected value of the output sum and Statement (b) holds for all states $s$ of the final component.
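Condition (c) can be tested mechanically on small examples. The following sketch uses two hypothetical strongly connected transducers over the input alphabet $\{0, 1\}$ (a counter of 11-blocks and a transducer with "telescoping" outputs), ignores final output labels, and compares the cycle condition with the exact variance obtained by enumerating all inputs. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical transducers, given as lists of transitions
# (tail, head, input letter, output letter); state 0 is initial.
COUNT_11 = [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 0, 0), (1, 1, 1, 1)]
TELESCOPE = [(0, 0, 0, 1), (0, 1, 1, 2), (1, 0, 0, 0), (1, 1, 1, 1)]

def cycle_ratios(transitions):
    """Set of delta(C) / length(C) over all directed cycles C."""
    ratios = set()

    def extend(start, state, visited, out, length):
        for tail, head, _eps, delta in transitions:
            if tail != state:
                continue
            if head == start:
                ratios.add(Fraction(out + delta, length + 1))
            elif head > start and head not in visited:
                extend(start, head, visited | {head}, out + delta, length + 1)

    # Each cycle is generated once, starting from its smallest vertex.
    for start in {t[0] for t in transitions}:
        extend(start, start, {start}, 0, 0)
    return ratios

def output_variance(transitions, n):
    """Exact variance of the output sum over all 2**n binary inputs."""
    step = {(t, eps): (h, delta) for t, h, eps, delta in transitions}
    sums = []
    for word in product((0, 1), repeat=n):
        state, total = 0, 0
        for letter in word:
            state, delta = step[(state, letter)]
            total += delta
        sums.append(total)
    mean = Fraction(sum(sums), len(sums))
    return Fraction(sum(x * x for x in sums), len(sums)) - mean ** 2
```

In this sketch, `TELESCOPE` has a single cycle ratio ($k = 1$) and a variance that does not grow with $n$, while `COUNT_11` has two different cycle ratios and a variance that grows linearly.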
A functional digraph is a digraph whose vertices have out-degree 1. Each component of a functional digraph consists of a directed cycle and some trees rooted at vertices of the cycle. For a functional digraph $D$, let $\mathcal{C}_D$ be the set of all cycles of $D$.
Definition 3.13. Let $\mathcal{D}_1$ and $\mathcal{D}_2$ be the sets of all spanning subgraphs of the final component of the transducer $T$ which are functional digraphs and have one and two components, respectively.
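For a small graph, the sets $\mathcal{D}_1$ and $\mathcal{D}_2$ can be enumerated directly. A minimal sketch, assuming a hypothetical two-state underlying graph given as `(tail, head)` pairs; it uses the fact that each component of a functional digraph contains exactly one cycle. The names `EDGES`, `count_components` and `split_by_components` are illustrative.

```python
from itertools import product

# Hypothetical underlying graph of a two-state transducer, as (tail, head) pairs.
EDGES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def count_components(succ):
    """Number of components of a functional digraph = number of its cycles."""
    done, cycles = set(), 0
    for v in succ:
        path, on_path = [], set()
        while v not in done and v not in on_path:
            on_path.add(v)
            path.append(v)
            v = succ[v]
        if v in on_path:  # the walk closed a new cycle
            cycles += 1
        done.update(path)
    return cycles

def spanning_functional_digraphs(edges):
    """Yield every choice of exactly one outgoing edge per vertex."""
    vertices = sorted({tail for tail, _ in edges})
    out_edges = [[e for e in edges if e[0] == v] for v in vertices]
    yield from product(*out_edges)

def split_by_components(edges):
    """Return the functional digraphs with one and with two components."""
    d1, d2 = [], []
    for digraph in spanning_functional_digraphs(edges):
        k = count_components(dict(digraph))
        if k == 1:
            d1.append(digraph)
        elif k == 2:
            d2.append(digraph)
    return d1, d2
```

For this graph, three of the four spanning functional digraphs have one component, and only the choice of both loops has two.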

Figure 3. Functional digraphs of the transducer of Example 3.16.

Figure 4. Transducer to compute the Hamming weight of the width-$w$ non-adjacent form.

Figure 5. Transducer to compute the Gray code.

Figure 6. Transducers to count the number of 01- and 11-blocks in the standard binary expansion.
Figure 7.

Theorem 4.7. Assume that $D \subset \mathbb{Z}[\tau]^d$, for $d$ a positive integer, and $D \cap \tau\mathbb{Z}^d = \{0\}$. Let $\operatorname{mw}(z)$ be the minimal Hamming weight of a $\tau$-adic joint digit representation of $z$ with digits in $D$. Assume further that the digit set $D$ satisfies

$$\sum_{d\in E}\sum_{\substack{e\in E\\ e\neq d}} \sigma_{de}\, D_{u_1}(w_d)\, D_{u_2}(w_e) \sum_{F\in\mathcal{F}_{\{t(d),t(e)\},\{h(d),h(e)\}}} \operatorname{sign} F.$$
Let $F \in \mathcal{F}_{\{t(e)\},\{h(e)\}}$ be a forest for a transition $e \in E$. Then $F + e$ is a spanning functional digraph with one component. Let $F \in \mathcal{F}_{\{t(d),t(e)\},\{h(d),h(e)\}}$ be a forest for transitions $d, e \in E$. Then $F + d + e$ is a spanning functional digraph with one or two components, depending on $\sigma_{de} \operatorname{sign} F$. If $\sigma_{de} \operatorname{sign} F = 1$, then it has two components. Otherwise, it has one component. Now we can change the summation into a sum over functional digraphs.

$D_{xx}(w_e) = \frac{1}{K}\varepsilon(e)(\varepsilon(e) - 1)$, $D_{xz}(w_e) = \frac{1}{K}\varepsilon(e)\mathbb{1}(e)$, $D_{yy}(w_e) = \frac{1}{K}\delta(e)(\delta(e) - 1)$, $D_{yz}(w_e) = \frac{1}{K}\delta(e)\mathbb{1}(e)$, $D_{zz}(w_e) = 0$. Thus, we obtain the formulas stated in the lemma.

Bounded variance and singular asymptotic variance-covariance matrix.
We next give the proof of the equivalence of the three statements in Theorem 3.1, including the bounded variance.
Proof of Theorem 3.1. We first prove (a) ⇔ (b) by giving an alternative representation of the generating function $A(x, y, z)$ from the proof of Theorem 3.9. Then we prove the equivalence (b) ⇔ (c).
(a) ⇔ (b): WLOG, we assume that the expected value $E(\operatorname{Output}(X_n))$ is $O(1)$.

$$O(1) = E(\operatorname{Output}(X_n)) = P^{1s}(1, 1)\,P^s(1, 1)\,C^s_y(1, 1)\,g(1)^{-2}\,n + O(1).$$
Therefore, $C^s_y(1, 1) = 0$. Similarly, we have
$$V(\operatorname{Output}(X_n)) = P^{1s}(1, 1)\,P^s(1, 1)\,C^s_{yy}(1, 1)\,g(1)^{-2}\,n + O(1), \qquad (11)$$
taking into account that $C^s_y(1, 1) = 0$. By (11), $V(\operatorname{Output}(X_n)) = O(1)$ is equivalent to $C^s_{yy}(1, 1) = 0$, and thus to $C^s_{yy}(1, 1) + C^s_y(1, 1) = 0$ as $C^s_y(1, 1) = 0$. By the definition of $C^s(y, z)$, this is equivalent to $\sum_{C\in\mathcal{C}^s} \delta(C)^2 K^{-\mathbb{1}(C)} = 0$, and thus to $\delta(C) = 0$ for all $C \in \mathcal{C}^s$.

(b) ⇒ (c): Let $\mathcal{C}^s$ be the set of all closed walks in the final component of $T$ which visit state $s$ exactly once. If $D$ is any cycle of the final component of the transducer, then one of the following occurs.
• No visits of state $s$: Let $i$ be a vertex of $D$. Because the final component is strongly connected, there exists a closed walk $C \in \mathcal{C}^s$ with $s, i \in C$. Let $D'$ be the combined closed walk of $D$ and $C$. Then $D' \in \mathcal{C}^s$, and so we have $\delta(D) = \delta(D') - \delta(C) = k\,\mathbb{1}(D') - k\,\mathbb{1}(C) = k\,\mathbb{1}(D)$.
• One visit of state $s$: Then we have $D \in \mathcal{C}^s$ and $\delta(D) = k\,\mathbb{1}(D)$.

(c) ⇒ (b): As a closed walk visiting $s$ exactly once can be decomposed into cycles, this is obvious.

Next, we prove the equivalence for the quasi-deterministic output sum.
Proof of Theorem 3.3. (d) ⇒ (e): Let $C$ be an arbitrary cycle of the transducer and $P$ be a path from the initial state 1 to any state of the cycle. Let $z_n$ be the input sequence along the combined walk consisting of $P$ and $n$ times $C$. Then, by quasi-determinism and the definition of the output, we have
$$k(\mathbb{1}(P) + n\,\mathbb{1}(C)) + O(1) = \operatorname{Output}(z_n) = \delta(P) + n\,\delta(C) + O(1).$$
Thus, $n(\delta(C) - k\,\mathbb{1}(C))$ is bounded by a constant depending on $P$ and $C$, but independent of $n$. Therefore, we know that $\delta(C) = k\,\mathbb{1}(C)$.
(e) ⇒ (d): WLOG, we assume $k = 0$ (replace $\delta(e)$ by $\delta(e) - k$ for all transitions $e$). For every $z \in A_I^*$, we have
$$|\operatorname{Output}(z)| \le \sum_{e\in E} |\delta(e)| + \max_{s\in\{1,\dots,S\}} |a(s)|$$
because all cycles have output sum 0 so that every transition contributes at most once to $\operatorname{Output}(z)$. Therefore, we have a quasi-deterministic random variable $\operatorname{Output}(X_n) = O(1)$.