On linear balancing sets

Let n be an even positive integer and let F be the field GF(2). A word in F^n is called balanced if its Hamming weight is n/2. A subset C ⊆ F^n is called a balancing set if for every word y ∈ F^n there is a word x ∈ C such that y + x is balanced. It is shown that most linear subspaces of F^n of dimension slightly larger than (3/2) log_2 n are balancing sets. An application of linear balancing sets is presented for designing efficient error-correcting coding schemes in which the codewords are balanced.


Introduction
Let F denote the finite field GF(2) and assume hereafter that n is an even positive integer. For words (vectors) x and y in F^n, denote by w(x) the Hamming weight of x and by d(x, y) the Hamming distance between x and y.
We say that a word z ∈ F^n is balanced if w(z) = n/2. For a word x ∈ F^n, define the set B(x) = {x + z : z is balanced} = {y ∈ F^n : d(y, x) = n/2}.
In particular, if 0 denotes the all-zero word in F^n, then B(0) is the set of all balanced words in F^n. It is known that

2^n/√(2n) ≤ (n choose n/2) = |B(x)| ≤ 2^n/√(πn/2)   (1)

(see, for example, [10, p. 309]). We extend the notation B(·) to subsets C ⊆ F^n by B(C) = ∪_{x∈C} B(x). A subset C ⊆ F^n is called a balancing set if B(C) = F^n; equivalently, C is a balancing set if for every y ∈ F^n there exists x ∈ C such that d(y, x) = w(y + x) = n/2 (which is also the same as saying that for every y ∈ F^n one has B(y) ∩ C ≠ ∅). Using the terminology of Cohen et al. in [6, §13.1], a balancing set can also be referred to as an {n/2}-covering code.
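As a quick sanity check, the two-sided estimate in (1) can be verified numerically for small even n (a brute-force sketch, not part of the paper; the function name is ours):

```python
import math

# |B(x)| = (n choose n/2): the number of words at Hamming distance
# exactly n/2 from any fixed word x in F^n.
def balanced_count(n):
    return math.comb(n, n // 2)

# Check 2^n/sqrt(2n) <= |B(x)| <= 2^n/sqrt(pi*n/2) for small even n.
for n in range(2, 41, 2):
    lower = 2 ** n / math.sqrt(2 * n)
    upper = 2 ** n / math.sqrt(math.pi * n / 2)
    assert lower <= balanced_count(n) <= upper
```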
An example of a balancing set of size n was presented by Knuth in [9]: his set consists of the words x_1, x_2, . . . , x_n, where x_i is the word whose first i entries are 1 and whose remaining n−i entries are 0. It was shown by Alon et al. in [1] that every balancing set must contain at least n words; hence, Knuth's balancing set has the smallest possible size.
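Knuth's set can be checked by brute force for small n; the words x_i below follow the prefix description above (a verification sketch only, exponential in n):

```python
from itertools import product

def knuth_set(n):
    # x_i: 1s in the first i positions, 0s elsewhere (i = 1, ..., n)
    return [(1,) * i + (0,) * (n - i) for i in range(1, n + 1)]

def is_balancing(C, n):
    # every y must be at Hamming distance exactly n/2 from some x in C
    return all(
        any(sum(a ^ b for a, b in zip(y, x)) == n // 2 for x in C)
        for y in product((0, 1), repeat=n)
    )

assert is_balancing(knuth_set(8), 8)
```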
As proposed by Knuth, balancing sets can be used to efficiently encode unconstrained binary words into balanced words as follows: given an information word u ∈ F^n, a word x in a balancing set C is found so that u + x is balanced. The transmitted codeword then consists of u + x, appended by a recursive encoding of the index (of length ⌈log_2 |C|⌉) of x within C. Thus, when |C| = n, the redundancy of the transmission is (log_2 n) + O(log log n). By (1), we can get a smaller redundancy of (1/2)(log_2 n) + O(1) using any one-to-one mapping into B(0). Such a mapping, in turn, can be implemented using enumerative coding, but the overall time complexity will be higher than that of Knuth's encoder.
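With Knuth's set, the encoder amounts to flipping a prefix of u until the result is balanced; the prefix length i is the index to be transmitted. A minimal sketch (function names are ours, and the recursive encoding of the index i is omitted):

```python
def knuth_encode(u):
    # flip the first i bits of u; some i in 1..n must balance the word
    n = len(u)
    for i in range(1, n + 1):
        c = [b ^ 1 if j < i else b for j, b in enumerate(u)]
        if sum(c) == n // 2:
            return c, i
    raise ValueError("no balancing prefix found (n must be even)")

def knuth_decode(c, i):
    # flipping the same prefix again recovers u
    return [b ^ 1 if j < i else b for j, b in enumerate(c)]

u = [1, 1, 1, 0, 1, 1, 0, 1]
c, i = knuth_encode(u)
assert sum(c) == len(u) // 2 and knuth_decode(c, i) == u
```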
In many applications, the transmitted codewords are not only required to be balanced, but also to have some Hamming distance properties so as to provide error-correction capabilities. Placing an error-correcting encoder before applying either of the two balancing encoders mentioned earlier will generally not work, since the balancing encoder may destroy any distance properties of its input. One possible solution would then be to encode the raw information word directly into a codeword of a constant-weight error-correcting code, in which all codewords are in B(0). By a simple averaging argument one gets that for every code C ⊆ F^n there is at least one word x ∈ F^n for which the shifted set C + x = {y ∈ F^n : y − x ∈ C} contains at least ((n choose n/2)/2^n)·|C| ≥ |C|/√(2n) balanced words. Yet, for most known constant-weight codes, the implementation of an encoder for such codes is typically quite complex compared to the encoding of linear codes or to the above-mentioned balancing methods [12].
In this work, we will be interested in linear balancing sets, namely, balancing sets that are linear subspaces of F^n. Our main result, to be presented in Section 3, states that most linear subspaces of F^n of dimension which is at a (small) margin above (3/2) log_2 n are linear balancing sets. A generalization of this result to sets which are "almost balancing" (in a sense to be formally defined) will be presented in Section 4. On the other hand, we will prove (in Appendix B) that the problem of deciding whether a given set of vectors in F^n spans a balancing set is NP-hard.
Our study of balancing sets was motivated by the potential application of these sets in obtaining efficient coding schemes that combine balancing and error correction, as we outline in Section 5. However, we feel that linear balancing sets could be interesting also on their own right, from a purely combinatorial point of view.

Existence result
From the result in [1] we readily get the following lower bound on the dimension of any linear balancing set.
Theorem 2.1. The dimension of every linear balancing set C ⊆ F^n is at least ⌈log_2 n⌉.
As mentioned earlier, we will show that most linear subspaces of F^n of dimension slightly above (3/2) log_2 n are in fact balancing sets. We start with the following simpler existence result, as some components of its proof (in particular, Lemma 2.3 below) will be useful also for our random-coding result.
Theorem 2.2. There exists a linear balancing set in F^n of dimension ⌈(3/2) log_2 n⌉.
Theorem 2.2 can be seen as the balancing-set counterpart of the result of Goblick [8] regarding the existence of good linear covering codes (see also Berger [2], Cohen [5], Cohen et al. [6, §12.3], and Delsarte and Piret [7]); in fact, our proof is strongly based on their technique. In what follows, we will adopt the formulation of [7].
Before proving Theorem 2.2, we introduce some notation. We denote the union C ∪ (C + x) by C + Fx. (When C is a linear subspace of F^n then so is C + Fx, and C + x is a coset of C within F^n.) We also define Q(C) = 1 − 2^{−n}|B(C)|, which is the probability that B(x) ∩ C = ∅ for a randomly and uniformly selected word x ∈ F^n.
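For small n, Q(·) can be computed directly from its definition (a brute-force sketch; `span` and `Q` are our own helper names):

```python
import math
from itertools import product

def span(gens, n):
    # linear span of the generators over GF(2)
    words = {(0,) * n}
    for g in gens:
        words |= {tuple(a ^ b for a, b in zip(w, g)) for w in words}
    return words

def Q(C, n):
    # fraction of words x with B(x) ∩ C empty, i.e. no c in C at distance n/2
    bad = sum(
        all(sum(a ^ b for a, b in zip(x, c)) != n // 2 for c in C)
        for x in product((0, 1), repeat=n)
    )
    return bad / 2 ** n

# Q({0}) = 1 - |B(0)|/2^n, matching the definition above
n = 4
assert Q(span([], n), n) == 1 - math.comb(n, n // 2) / 2 ** n
```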
The proof of Theorem 2.2 makes use of the following lemma.

Lemma 2.3. For every linear subspace C of F^n, the expected value of Q(C + Fx), taken over a uniformly and randomly selected word x ∈ F^n, is at most (Q(C))^2.

Proof. The proof is essentially the first part of the proof of Theorem 3 in [7], except that we replace the Hamming sphere by B(·). For the sake of completeness, we include the proof in Appendix A.
Proof of Theorem 2.2. Again, we follow the steps of the proof of Theorem 3 in [7]. Write ℓ = ⌈(3/2) log_2 n⌉. We construct iteratively linear subspaces C_0 ⊂ C_1 ⊂ · · · ⊂ C_ℓ as follows. The subspace C_0 is simply {0}. Given now the subspace C_{i−1}, we let C_i = C_{i−1} + F x_i, where x_i is a word for which Q(C_{i−1} + F x_i) ≤ (Q(C_{i−1}))^2; by Lemma 2.3, such a word indeed exists. Now,

Q(C_ℓ) ≤ (Q(C_0))^{2^ℓ} = (1 − 2^{−n}|B(0)|)^{2^ℓ} ≤ (1 − 1/√(2n))^{2^ℓ} ,

where the last step follows from the lower bound in (1). Hence, since 2^ℓ ≥ n^{3/2},

2^n · Q(C_ℓ) ≤ 2^n (1 − 1/√(2n))^{n^{3/2}} ≤ 2^n e^{−n/√2} < 1 .

As 2^n Q(C_ℓ) is an integer, we conclude that Q(C_ℓ) is necessarily zero, namely, B(C_ℓ) = F^n.
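The iterative construction in the proof is effective for small n; the greedy sketch below picks, at each step, a word x_i minimizing Q(C_{i−1} + Fx_i). This is brute force, exponential in n, and for illustration only:

```python
from itertools import product

def greedy_balancing_subspace(n):
    words = list(product((0, 1), repeat=n))

    def add(C, x):
        # C + Fx = C ∪ (C + x)
        return C | {tuple(a ^ b for a, b in zip(c, x)) for c in C}

    def bad(C):
        # 2^n * Q(C): number of words with no element of C at distance n/2
        return sum(
            all(sum(a ^ b for a, b in zip(y, c)) != n // 2 for c in C)
            for y in words
        )

    C, gens = {(0,) * n}, []
    while bad(C) > 0:
        x = min((w for w in words if w not in C), key=lambda w: bad(add(C, w)))
        gens.append(x)
        C = add(C, x)
    return gens

# the squaring argument of the proof guarantees termination for n = 6
# within 4 steps, and Theorem 2.1 forces at least 3 generators
gens = greedy_balancing_subspace(6)
assert 3 <= len(gens) <= 4
```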

Most linear subspaces are balancing sets
The next theorem is our main result. Hereafter, N stands for the set of natural numbers, and the notation exp(z) stands for an expression of the form a · 2^{bz}, for some positive constants a and b.
Theorem 3.1. Given a function ρ : (2N) → N, let C be a random linear subspace of F^n which is spanned by ⌈(3/2) log_2 n⌉ + ρ(n) words that are selected independently and uniformly from F^n. Then, Prob {C is a balancing set} ≥ 1 − exp(−ρ(n)).
(Thus, as long as ρ(n) goes to infinity with n, all but a vanishing fraction of the ensemble of linear subspaces of F^n of dimension ⌈(3/2) log_2 n⌉ + ρ(n) are balancing sets.)

Theorem 3.1 is the balancing-set counterpart of a result originally obtained by Blinovskii [3], showing that most linear codes attain the sphere-covering bound. An alternate proof of his result (with slightly different convergence rates as n → ∞) was then presented by Cohen et al. in [6, §12.3]. The proof that we provide for Theorem 3.1 can be seen as an adaptation (and refinement) of the proof of Cohen et al. to the balancing-set setting.
We break the proof of Theorem 3.1 into three lemmas. To maintain the flow of the exposition, we will defer the proofs of the lemmas until after the proof of Theorem 3.1.

Lemma 3.2. Let C_0 be a random linear subspace of F^n which is spanned by ⌈(1/2) log_2 n⌉ random words that are selected independently and uniformly from F^n. There exists an absolute constant β ∈ [0, 1) independent of n (e.g., β = 3/4) such that Prob {Q(C_0) > β} ≤ exp(−n).

Lemma 3.3. Let C_0 be a linear subspace of F^n. Fix a positive integer r, and let C_1 be a random linear subspace of F^n which is spanned by C_0 and r random words from F^n that are selected uniformly and independently. Then Prob {Q(C_1) > (Q(C_0))^{(r/2)+1}} < (Q(C_0))^{r/2}.

Lemma 3.4. Let C_1 be a linear subspace of F^n and let C_2 be a random linear subspace of F^n which is spanned by C_1 and ⌈log_2 n⌉ random words from F^n that are selected uniformly and independently. Then Prob

Proof of Theorem 3.1. It is known (e.g., from [10, p. 444, Theorem 9]) that Hence, we can assume hereafter in the proof that ρ(n) is at most linear in n.
Let U be the list of |U| = ⌈(3/2) log_2 n⌉ + ρ(n) random words from F^n that span C, and write ℓ = ⌈(1/2) log_2 n⌉, t = ⌈log_2 n⌉, and r = |U| − ℓ − t. We partition the words in U into three sub-lists, U_0, U_1, and U_2, of sizes ℓ, r, and t, respectively. We denote by C_0, C_1, and C_2 the linear spans of U_0, U_0 ∪ U_1, and U_0 ∪ U_1 ∪ U_2, respectively.
Next, we turn to the proofs of the lemmas.
Proof of Lemma 3.2. Write ℓ = ⌈(1/2) log_2 n⌉, and let x_1, x_2, . . . , x_ℓ denote the random words that span C_0. The proof is based on the fact that, with high probability, the Hamming weight of each nonzero word in C_0 is close to n/2. Indeed, fix some nonzero vector ( Given some δ ∈ [0, 1/2), let E denote the event that C_0 has dimension (exactly) ℓ and each nonzero word in C_0 has Hamming weight within ((1/2) ± δ)n; namely, By the union bound we readily get that
where the second step follows from the upper bound in (1).
Conditioning on the event E, we get by de Caen's lower bound [4] that where in the last step we have used the lower bound in (1). On the other hand, we also have 2^ℓ ≥ √n and, so, writing we get that, conditioned on the event E, The result follows by recalling that Prob {E} ≥ 1 − exp(−n) and observing that β(δ) < 1 for every δ ∈ [0, 1/2) (in particular, there is some δ for which β(δ) = 3/4 > β(0)).

Remark 3.1. Suppose that C_0(m, ℓ) is an ℓ-dimensional linear subspace of the linear [n = 2^m, m, 2^{m−1}] code over F obtained by appending a fixed zero coordinate to every codeword of the binary [2^m − 1, m, 2^{m−1}] simplex code. In this case, we can substitute δ = 0 in (8) and obtain that Q(C_0(m, ℓ)) ≤ β(0) ≈ 0.748, for every ℓ in the range m/2 ≤ ℓ ≤ m. Thus, C_0(m, ℓ) can replace the random code C_0 in Lemma 3.2. If ℓ grows sufficiently fast with m so that ℓ − (m/2) tends to infinity, then from (7) it follows that

Proof of Lemma 3.3. Let x_1, x_2, . . . , x_r be the random words that, together with C_0, span (the random code) C_1. Obviously, B(C_0 + x_i) ⊆ B(C_1) and Q(C_0 + x_i) = Q(C_0) for every i = 1, 2, . . . , r. Hence, the expected value of Q(C_1) (taken over all the independently and uniformly distributed words x_1, x_2, . . . , x_r ∈ F^n) satisfies E[Q(C_1)] ≤ (Q(C_0))^{r+1}. Therefore,

Prob {Q(C_1) > (Q(C_0))^{(r/2)+1}} ≤ E[Q(C_1)] / (Q(C_0))^{(r/2)+1} ≤ (Q(C_0))^{r/2} ,

where the last step follows from Markov's inequality.
Proof of Lemma 3.4. The result is obvious when Q(C_1) ∉ (0, 1/8); so we assume hereafter in the proof that Q(C_1) is within that interval. Write t = ⌈log_2 n⌉, and let x_1, x_2, . . . , x_t be the random words that, together with C_1, span C_2. For i = 0, 1, 2, . . . , t, define the linear space L_i iteratively by L_0 = C_1 and L_i = L_{i−1} + F x_i. Letting Q_i stand for (the random variable) Q(L_i) and ω_i for 2^i/(8Q(C_1)), by Lemma 2.3 and Markov's inequality we get for every i = 1, 2, . . . , t that, conditioned on an instance of Hence, for every i = 1, 2, . . . , t,
The result follows by recalling that the events "Q(C_2) ≥ 2^{−n}" and "Q(C_2) > 0" are identical.

Figure 1 lists the generator matrices of linear [n, k, d] codes over F that form linear balancing sets, for several values of n that are divisible by 4. These matrices were found using a greedy algorithm and they do not necessarily generate the smallest sets, except for n = 12 and n = 20, where the sets attain the lower bound of Theorem 2.1 (in addition, for the case n = 20, the set attains the Griesmer bound [10, §17.5]).

Remark 3.2.
In view of Remark 3.1, when n = 2^m (or, more generally, when n is "close" to 2^m), Theorem 3.1 holds also for the smaller ensemble where we fix ⌈m/2⌉ basis elements of the random code C to be linearly independent codewords of the code C_0(m, ⌈m/2⌉) defined in Remark 3.1. Furthermore, if these ⌈m/2⌉ rows are replaced by ℓ basis elements of the code C′_0(m, ℓ) (as defined in that remark), then the value β in the proof of Theorem 3.1 can be taken as 1 − (π/4) (≈ 0.215) whenever ℓ − (m/2) goes to infinity (yet more slowly than ρ(n)).
We leave it open to find an explicit construction of linear balancing sets in F n of dimension O(log n). We also mention the following intractability result.
Theorem 3.5. Given as input a basis of a linear subspace C of F^n, the problem of deciding whether C is a balancing set is NP-hard.
The proof of Theorem 3.5 is obtained by some modification of the reduction in [11] from Three-Dimensional Matching. We include the proof in Appendix B.
The proof of this fact is similar to the one showing that the covering radius of the first-order Reed-Muller code is at most (n − √ n)/2 [6, pp. 241-242] (specifically, in the line following Eq. (9.2.4) therein, simply reverse the inequality in "| ·, · | ≥ √ n"; see also (11) below).

Almost-balancing sets
Next, we formalize the notion of almost-balancing sets and present generalizations of Theorems 2.2 and 3.1. In what follows, we fix some function λ : 2N → N such that λ(n) < n/2, and write λ = λ(n) for simplicity. For a word x ∈ F^n define the set B_λ(x) = {y ∈ F^n : |d(y, x) − n/2| ≤ λ}.
As was the case for λ = 0, the notation B_λ(·) can be extended to subsets C ⊆ F^n by B_λ(C) = ∪_{x∈C} B_λ(x). A subset C ⊆ F^n is called a λ-almost-balancing set if B_λ(C) = F^n; equivalently, C is a λ-almost-balancing set if for every y ∈ F^n there exists x ∈ C such that |d(y, x) − n/2| ≤ λ.
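The relaxed condition is easy to test by brute force for small n (a sketch; the helper name and the particular three-word set are our own choices for illustration):

```python
from itertools import product

def is_almost_balancing(C, n, lam):
    # every y must be within Hamming distance n/2 ± lam of some x in C
    return all(
        any(abs(sum(a ^ b for a, b in zip(y, x)) - n // 2) <= lam for x in C)
        for y in product((0, 1), repeat=n)
    )

n = 6
# {0} alone fails even with lam = 1 (take y = 0), but three words suffice here:
C = [(0,) * n, (1,) * n, (1, 1, 1, 0, 0, 0)]
assert not is_almost_balancing([(0,) * n], n, 1)
assert is_almost_balancing(C, n, 1)
```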
The following theorem can be seen as a generalization of Theorem 2.2.

Proof. We follow the steps of the proof of Theorem 2.2, with Q(C_i) replaced by a term Q_λ(C_i) which equals 1 − 2^{−n}|B_λ(C_i)|, and with (2) where the penultimate step follows from a well-known lower bound on binomial coefficients [10, p. 309]. From (9) we have, thereby obtaining the counterpart of (2). Proceeding as in the proof of Theorem 2.2, we see that Finally, using the Taylor series expansion for H(1/2 − z) and recalling that λ = O(√n), we obtain thereby completing the proof.
Observe that for n = 2^m and λ = ⌊√n/2⌋, the code C_0(m, m) realizes the dimension guaranteed in Theorem 4.1.
The following theorem is a generalization of Theorem 3.1.
Proof. The proof is the same as that of Theorem 3.1, except that Q(·) is replaced by Q λ (·) in Lemmas 3.3 and 3.4 (and in their proofs), and Lemma 3.2 is replaced by the following lemma.
, and let C_0 be a random linear subspace of F^n which is spanned by ⌈(1/2) log_2 n − log_2(2λ + 1)⌉ random words that are selected independently and uniformly from F^n. There exists an absolute constant β ∈ [0, 1) such that where the inequality follows from the convexity of z → z^2. Hence, there is at least one index i ∈ {1, 2, . . . , M} for which We conclude that C_0^{(s)} is a linear λ-almost-balancing set with λ = ⌊√(sn)/2⌋, and its dimension is m = log_2(n/s) ≤ 2(log_2 n − log_2(2λ)).
We end this section by comparing our results to the following generalization of Theorem 2.1.

Balanced error-correcting codes
In this section, we consider a potential application of linear balancing sets in designing an efficient coding scheme that maps information words into balanced words that belong to a linear error-correcting code; as such, the scheme combines error-correction capabilities with the balancing property.
The underlying idea is as follows. Let C be a linear [n, k, d] code over F with the length n and minimum distance d chosen so as to satisfy the required correction capabilities. Suppose, in addition, that we can write C as a direct sum of two linear subspaces C′ and C″ of dimensions k′ and k″, respectively, where C″ is a balancing set.¹ Now, if k″ is "small" (which means that k′ is close to k), we can encode by first mapping a k′-bit information word u into a codeword c ∈ C′, and then finding a word x ∈ C″ so that c + x is balanced. The transmitted codeword is then the (balanced) sum c + x. The mapping u → c can be implemented simply as a linear transformation, whereas the balancing word x can be found by exhaustively searching over the 2^{k″} elements of C″. At the receiving end, we apply a decoder for C (correcting up to (d−1)/2 errors) to a (possibly noisy) received word c + x + e, where e is the error word. Clearly, if w(e) ≤ (d−1)/2, we will be able to recover c + x successfully, thereby retrieving u.
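A toy instance of this scheme can be sketched as follows, with C′ the extended Hamming [8, 4, 4] code and C″ spanned by a single word. The generator matrices are our own choice for illustration; this tiny C″ happens to suffice only because every codeword of this C′ other than 0 and the all-one word is already balanced, so it does not reflect the parameters of Theorem 3.1:

```python
from itertools import product

# C': extended Hamming [8,4,4] code (one possible generator matrix)
G_prime = [
    (1, 0, 0, 0, 0, 1, 1, 1),
    (0, 1, 0, 0, 1, 0, 1, 1),
    (0, 0, 1, 0, 1, 1, 0, 1),
    (0, 0, 0, 1, 1, 1, 1, 0),
]
# C'': the 2-element subspace spanned by one balancing word
C2 = [(0,) * 8, (1, 1, 1, 1, 0, 0, 0, 0)]

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

def encode(u):
    c = (0,) * 8
    for bit, row in zip(u, G_prime):  # u -> c: a linear transformation
        if bit:
            c = xor(c, row)
    for x in C2:  # exhaustive search over the 2^{k''} elements of C''
        if sum(xor(c, x)) == 4:
            return xor(c, x)
    raise RuntimeError("C'' does not balance this codeword")

# every 4-bit information word maps to a balanced word of the direct sum
for u in product((0, 1), repeat=4):
    assert sum(encode(u)) == 4
```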
Obviously, such a scheme is useful only when k″ is indeed small: first, k″ affects the effective rate (given by k′/n = (k − k″)/n) and, secondly, the encoding process, as described, is exponential in k″. Yet, not always is there a decomposition of C as in (12) that results in a small dimension k″ of C″ (in fact, for some codes C, such a decomposition does not exist at all).
A possible solution would then be to reverse the design process and start by first selecting the code C′ so that it has the desired rate R = k′/n and a "slightly" higher minimum distance d′ than the desired value d. In addition, we assume that there is an efficient (i.e., polynomial-time) decoding algorithm D′ for C′ that corrects any pattern of up to (d−1)/2 errors.
Next, we select C″ to be a random linear code spanned by k″ = ⌈(3/2) log_2 n⌉ + ρ(n) words that are chosen independently and uniformly from F^n, for some function ρ(n) = o(log n) that grows to infinity. By Theorem 3.1, the code C″ will be a balancing set with probability 1 − exp(−ρ(n)) = 1 − o(1), and the choice of k″ guarantees that an exhaustive search for the balancing word x during encoding will take O(n^{3/2+ε}) iterations, for an arbitrarily small ε > 0 (if the search fails, an event that may occur with probability o(1), we can simply replace the code C″). The receiving end can be informed of the choice of the code C″ by, say, using pseudo-randomness instead of randomness (and flagging a skip when failing to find a balancing word x).
It remains to consider the distance properties of the direct sum C = C′ ⊕ C″; specifically, we need the subset of balanced words in C to have minimum distance at least d; in particular, every balanced word in C should have a unique decomposition of the form c + x where c ∈ C′ and x ∈ C″. When this condition holds, the decoding can proceed as follows. Given a received word y ∈ F^n, we enumerate over all words x ∈ C″ and then apply the decoder D′ to each difference y − x. Decoding will be successful if the number of errors did not exceed (d−1)/2, and the decoding complexity will be O(n^{3/2+ε}) times the complexity of D′.
The next lemma considers the case where the code C′ lies below the Gilbert-Varshamov bound. Hereafter, V(n, t) stands for Σ_{i=0}^{t} (n choose i).

Lemma 5.1. Suppose that C′ is a linear [n, k′, d′] code over F that satisfies 2^{k′} · V(n, d′−1) ≤ 2^n. For every d ≤ d′, the minimum distance d(·) of (the random code) C = C′ ⊕ C″ satisfies Prob {d(C) < d} ≤ (|C| − |C′|) · V(n, d−1)/2^n.

Proof. The code C contains |C| − |C′| random codewords, each being uniformly distributed over F^n and therefore each having probability V(n, d−1)/2^n to be of Hamming weight less than d. The result follows from the union bound.
It is well known (see [10, p. 310]) that for any integer t = θn ≤ n/2, V(n, t) ≤ 2^{nH(θ)}, where H : [0, 1] → [0, 1] is the binary entropy function defined earlier. Hence, taking k″ ≤ ((3/2) + ε) log_2 n, we get from Lemma 5.1 and the concavity of z → H(z) that Thus, to achieve a vanishing probability, Prob {d(C) < d}, of ending up with a "bad" code C as n goes to infinity, it suffices to take d′ = d + O(log n) when d/n is fixed and bounded away from zero, or d′ = d + O(1) when d is fixed.
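The entropy bound on V(n, t) used above can be spot-checked numerically (a sketch; function names are ours):

```python
import math

def V(n, t):
    # volume of the Hamming ball of radius t
    return sum(math.comb(n, i) for i in range(t + 1))

def H(p):
    # binary entropy function
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# V(n, t) <= 2^{n H(t/n)} for all integers t <= n/2
for n in (20, 40, 80):
    for t in range(1, n // 2 + 1):
        assert V(n, t) <= 2 ** (n * H(t / n))
```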
Remark 5.1. Instead of a decoding process whereby we enumerate over the codewords of C″ and then apply the decoder D′, we could use a decoder for the whole direct sum C, if techniques such as iterative decoding are applicable to C: in such circumstances, the advantage of the linearity of C is apparent. Linearity certainly helps if we are interested only in error detection rather than full correction, in which case the decoding amounts to just computing a syndrome with respect to any parity-check matrix of C.

A Proof of Lemma 2.3

Now,

Therefore,

Using the definition of Q(·), the lemma is proved.

B Proof of Theorem 3.5
We prove Theorem 3.5 below, starting by recalling the reduction that is used in [11] to show the intractability of computing the covering radius of a linear code.
Let G = (V_1 : V_2 : V_3, E) be a tripartite hyper-graph with a vertex set which is the union of the disjoint sets V_1, V_2, and V_3 of the same size t, and a hyper-edge set E = {e_1, e_2, . . . , e_m} ⊆ V_1 × V_2 × V_3. The reduction in [11] maps G into a 3t × 8m parity-check matrix H = H_G = (H_e)_{e∈E}, where each block H_e is a 3t × 8 matrix over F whose rows and columns are indexed by u ∈ V_1 ∪ V_2 ∪ V_3 and (a_1 a_2 a_3) ∈ F^3, respectively, and is computed from the hyper-edge e = (v_{e,1}, v_{e,2}, v_{e,3}) as follows: the three nonzero rows in H_e are indexed by the vertices that are incident with the hyper-edge e, and these rows form a 3 × 8 matrix whose columns range over all the elements of F^3. A matching in G is a subset M ⊆ E of size t such that no two hyper-edges in M are incident with the same vertex (thus, every vertex of G is incident with exactly one hyper-edge in M).
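For concreteness, the columns of a block H_e can be generated as follows (the row-indexing convention, with part j occupying rows jt .. jt+t−1 and vertices numbered 0 .. t−1 within each part, is our own assumption for this sketch):

```python
from itertools import product

def block_columns(edge, t):
    # columns of H_e: for each (a1, a2, a3) in F^3, a 0/1 column of length 3t
    # with a_j placed in the row of the j-th vertex of the hyper-edge
    v1, v2, v3 = edge  # vertices numbered 0..t-1 within each part
    cols = []
    for a1, a2, a3 in product((0, 1), repeat=3):
        col = [0] * (3 * t)
        col[0 * t + v1] = a1
        col[1 * t + v2] = a2
        col[2 * t + v3] = a3
        cols.append(tuple(col))
    return cols

t = 2
cols = block_columns((0, 1, 1), t)
assert len(cols) == 8                 # the columns range over all of F^3
assert cols[7] == (1, 0, 0, 1, 0, 1)  # the (1 1 1)-indexed column
```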
For our purposes, we can assume that every vertex in G is incident with at least one hyper-edge (or else no matching exists). Under these conditions, m ≥ t and the matrix H has full rank (since it contains the identity matrix of order 3t).
The proof in [11] is based on the following two facts: (i) There is a matching M in G if and only if the all-one column vector 1 in F^{3t} can be written as a sum of (exactly) t columns of H (note that 1 cannot be the sum of fewer than t columns). Those columns then must be the ones that are indexed by (1 1 1) in all blocks H_e such that e ∈ M.
(ii) If M is a matching in G then every column vector in F^{3t} can be written as a sum Σ_{e∈M} h_e, where each h_e is a column in H_e.
Let C = C_G be the linear [8m, 8m − 3t] code over F with parity-check matrix H. It readily follows from facts (i) and (ii) that G has a matching if and only if every coset of C within F^{8m} has a word of Hamming weight t.
From facts (i)-(ii) we get the following lemma.
Lemma B.1. Suppose that t > 1 and that G contains a matching. Then every column vector in F^{3t} can be obtained as a sum of w distinct columns of H, for every w in the range t ≤ w ≤ 8m − t.
Proof. Let M be a matching which is assumed to exist in G. Given w ∈ {t, t+1, . . . , 8m−t}, write σ = min{8(m−t), w−t}, and let x be a column vector in F^{3t} which is the sum of σ columns in H that do not belong to the t blocks H_e that correspond to e ∈ M. Also, write otherwise, and note that t ≤ τ ≤ 7t.
Given an arbitrary column vector s ∈ F^{3t}, we show that there are w distinct columns in H that sum to s. By fact (ii), for every e ∈ M there is a column h_e in H_e such that Furthermore, it follows from the structure of each block H_e that when h_e ≠ 0, then for every integer r in the range 1 ≤ r ≤ 7 there exist r distinct columns h_{e,1}, h_{e,2}, . . . , h_{e,r} in H_e such that h_e = Σ_{j=1}^{r} h_{e,j}.
In either case we have: In addition, from (9) and (10) we get: We now proceed as in the proof of Lemma 3.2, with (14) replacing (6) and with (15) replacing the lower bound in (1): by de Caen's lower bound [4] we get a bound which is similar to (7), in which we plug ℓ = ⌈(1/2) log_2 n − log_2(2λ + 1)⌉. The result follows.