Information Hiding Using Matroid Theory

Inspired by problems in Private Information Retrieval, we consider the setting where two users need to establish a communication protocol to transmit a secret without revealing it to external observers. This is a question of how large a linear code can be, when it is required to agree with a prescribed code on a collection of coordinate sets. We show how the efficiency of such a protocol is determined by the derived matroid of the underlying linear communication code. Furthermore, we provide several sufficient combinatorial conditions for when no secret transmission is possible.


Introduction
The constructions in this paper were originally motivated by problems in Private Information Retrieval (PIR) [1]. Rather than describing the specific setting that motivated our problem, we will give a generalized communication setup.
Consider a pair (Q, Q ′ ) of linear codes of length n over a field F such that Q ⊆ Q ′ ⊆ F n . Two parties, Alice and Bob, have agreed to use the pair (Q, Q ′ ) for communication. Their communication happens in two different states with high-entropy and low-entropy messages, respectively. The reason for doing so might, for example, be that the channel quality varies over time, or (as in the case of PIR) that different information is requested to avoid surveillance. For simplicity, we may assume that one message between Alice and Bob corresponds to one codeword x ∈ F n , and that each symbol of x is sent over a different link. Depending on the application, these links may be servers, routers, or even wires. Without loss of generality, we label these links by the integers 1, . . . , n. In the high-entropy communication state, the message is a codeword drawn from Q ′ , and in the low-entropy state it is drawn from Q.
Each observer in a collection I controls a subset T α ⊆ [n], α ∈ I, of the links. These subsets need not be disjoint. Depending on the application, the observers can be spies or computational clusters. The pair (Q, Q ′ ) should now be designed so that none of the observers can tell whether a message is sent in the high-entropy or low-entropy communication state. In other words, we insist that the projections Q| Tα and Q ′ | Tα are the same for every α ∈ I.
Definition 1.1. We call the collection T = {T α : α ∈ I} a collusion pattern, and its elements are called colluding sets.
Notice that, if Q| T = Q ′ | T , then it immediately follows that Q| T ′ = Q ′ | T ′ for all T ′ ⊆ T , as Q| T ′ is a projection of Q| T . We can thus assume without loss of generality that collusion patterns are closed under inclusion. In other words, T is an abstract simplicial complex, and we write T = ⟨T 1 , T 2 , . . . , T r ⟩, where the T α are the maximal colluding sets and |I| = r. Notice that we assume that the observers do not share information with each other, so the collusion pattern is not necessarily closed under unions. The union-closed case is just the case where the collusion pattern is a disjoint union of maximal colluding sets.
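As a small illustration of this convention, the following Python sketch reduces an arbitrary family of observed link sets to its maximal colluding sets and tests whether a given set of links is colluding. The function names `maximal_sets` and `is_colluding` are hypothetical and are introduced only for this example.

```python
def maximal_sets(sets):
    """Return the inclusion-maximal members of a family of sets."""
    family = {frozenset(s) for s in sets}
    # Keep a set only if no other member strictly contains it.
    maximal = [s for s in family if not any(s < t for t in family)]
    return sorted(maximal, key=sorted)


def is_colluding(links, generators):
    """A set of links is colluding if it lies inside some generating set."""
    links = frozenset(links)
    return any(links <= frozenset(g) for g in generators)


# Observers controlling the link sets {1,2}, {2}, {2,3,4} and {3,4}.
pattern = maximal_sets([{1, 2}, {2}, {2, 3, 4}, {3, 4}])
print(pattern)
print(is_colluding({3, 4}, pattern), is_colluding({1, 2, 3}, pattern))
```

The set {1, 2, 3} is a union of colluding sets but is not itself colluding, which reflects the assumption that observers do not pool their observations.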
Definition 1.2. Let T be a collusion pattern with ground set [n], and Q ⊆ F n a linear code over a field F. The lift of Q over the collusion pattern T is
$$Q^{\mathcal{T}} = \{x \in \mathbb{F}^n : x|_T \in Q|_T \text{ for all } T \in \mathcal{T}\}.$$
By definition, for fixed Q, the code Q T is the largest code Q ′ for which the condition Q| T = Q ′ | T holds for all T ∈ T . We can think of the choice of the state in which to communicate as a secret.
We will now illustrate this construction with an example. To see this, take any codeword
In the context of PIR, the secret is the identity of the downloaded file and the code Q is used to send queries to the database [2,3]. This work is also related to oblivious transfer [4], collision-resistant hashing [5], locally decodable codes [6,7] and secret sharing [8,9]. Motivated by several of these applications, it is natural to define the secrecy rate as
$$\frac{\dim(Q^{\mathcal{T}}/Q)}{n}.$$
In the words of our general communication setting, this is the fraction of the channel's bandwidth that can be used to send high-entropy information that will not interfere with the low-entropy signal. Our scheme relies on the ability of the transmitter and receiver to prevent the detection of secret transmission. In this respect, it is related to the field of information hiding, which originated in antiquity and has received renewed attention since the 1990s due to the need to protect the copyright on digital content [10]. Some of the known strategies for information hiding include quantize-and-replace [11,12], low-bit modulation [13], additive spread spectrum [14] and quantization index modulation [15]. For a general reference, we refer the reader to [16].
In this paper, we focus on the relationship between the combinatorics of the communication code Q and the collusion pattern T . In particular, we are interested in conditions under which Q = Q T , so that no information hiding is possible.
It has been shown that if the entire ground set is an element of the collusion pattern, then Alice and Bob cannot share secret information [2]. In the current setting, this is just the trivial observation that Q [n] = Q for any code Q ⊆ F n . On the other hand, if the collusion pattern is disconnected, i.e., T = T 1 ⊔ T 2 where T 1 and T 2 are disjoint collusion patterns, then it is always possible to choose Q such that Q ⊊ Q T [1]. One collusion pattern of special importance is the t-collusion [17]. It models the situation where only the maximal size of the colluding sets is known but the pattern itself is unknown.
Definition 1.4. A t-collusion pattern is the collusion pattern given by
$$\mathcal{T} = \{T \subseteq [n] : |T| \le t\}.$$
As we will explain in Section 3, if T is a t-collusion, then Q = Q T whenever dim Q < t.
The starting point of this paper is the following example [18, Section 7.5].
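Example 1.5. Let Q be an [n, t − 1] MDS code, and let T be the collusion pattern generated by the sets {i, i + 1, . . . , i + t − 1} for i ∈ [n], with indices taken modulo n. Then Q = Q T .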
The authors of [18] use the symmetry of this collusion pattern to derive this result. However, the symmetry is not essential for the result to hold, and in Theorem 5.9 we show the stronger result that any n−t+1 of the generators constructed as in Example 1.5 are enough to guarantee that Q = Q T .
The setting in Example 1.5 is not unique and our goal was to understand which combinatorial properties of the code Q determine when a collusion pattern T is equivalent to t-collusion. In Theorem 4.8 we show that the matroid structure of Q T is determined by the so-called derived matroid of Q. We show that all codes with the same matroid and derived matroid will have combinatorially equivalent lifts, in a sense that is made precise in Section 4. This connection to the derived matroid allows us to determine several other t-collusion equivalent patterns, which we present in Section 5.

Combinatorial preliminaries
In this section, we will review the basics of matroid theory, with a special focus on applications in coding theory. For general references on matroid theory we refer the reader to [19,20].
A matroid is a pair M = (C, E), where E is the ground set and C ⊆ 2 E is a collection of finite subsets of E, called circuits, that satisfies the following axioms:
1. ∅ ∉ C;
2. if C 1 , C 2 ∈ C and C 1 ⊆ C 2 , then C 1 = C 2 ;
3. if C 1 , C 2 ∈ C are distinct and e ∈ C 1 ∩ C 2 , then there exists C 3 ∈ C such that C 3 ⊆ (C 1 ∪ C 2 ) − {e}.
The direct sum, or disjoint union, of two matroids M = (C, E) and M ′ = (C ′ , E ′ ) with E ∩ E ′ = ∅ is the matroid M ⊕ M ′ = (C ∪ C ′ , E ⊔ E ′ ). It is straightforward to verify that this is indeed a matroid. In this work, E is a finite set, which we identify without loss of generality with [n] := {1, 2, . . . , n}. A set is dependent if it contains a circuit and independent otherwise. Thus circuits are the minimal dependent sets. A matroid that cannot be written as the disjoint union of two nonempty matroids is called connected.
A maximal independent set is called a basis. Given a basis B of a matroid and an element e in [n]−B, there exists a unique circuit contained in B ∪{e}, known as the fundamental circuit of e with respect to B.
An element e of the ground set is called a loop if it is a circuit, and a coloop if it is contained in every basis.
For any X ⊆ [n] we can define the rank r(X) as the size of the largest independent set contained in X. In particular, the rank r = r([n]) of the matroid of an F-linear code Q is equal to the dimension of the code.
If X is a subset of [n], then the closure cl(X) is the set {e ∈ [n] : r(X ∪ {e}) = r(X)}. If cl(X) = X, then X is a flat. Flats of rank r − 1 are called hyperplanes.
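The rank and closure operators can be computed directly from a generator matrix. The following Python sketch does this over the binary field only, with coordinates labelled 0, . . . , n − 1 instead of 1, . . . , n. The matrix `G` and the helper names `gf2_rank`, `rank` and `closure` are assumptions made for this illustration.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    pivots = {}  # highest set bit -> reduced vector with that leading bit
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]  # eliminate the leading bit
    return len(pivots)


def rank(G, X):
    """Matroid rank of the column set X of the generator matrix G."""
    cols = [sum(row[j] << i for i, row in enumerate(G)) for j in X]
    return gf2_rank(cols)


def closure(G, X):
    """All coordinates e whose addition to X does not increase the rank."""
    X = set(X)
    r = rank(G, X)
    return {e for e in range(len(G[0])) if rank(G, X | {e}) == r}


# A binary code of length 4 and dimension 2; columns 0 and 3 are parallel.
G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
print(rank(G, range(4)))   # dimension of the code
print(closure(G, {0}))     # the flat spanned by coordinate 0
```

In this example the flats of rank 1, that is, the hyperplanes, are {0, 3}, {1} and {2}.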
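The dual matroid M * is the matroid on the same ground set [n] whose circuits are the sets whose complements are hyperplanes in M. The dual rank of a set X ⊆ [n] is given by
$$r^*(X) = |X| + r([n] - X) - r([n]).$$
The deletion of X ⊆ [n] from M is the matroid M\X on the ground set [n] − X whose circuits are the circuits of M that are disjoint from X. In coding theory, deletion corresponds to puncturing of a linear code by columns in X.
The contraction of X in M is the matroid M/X = (M * \X) * . It corresponds to shortening of a linear code by columns in X.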
A minor of a matroid M is the matroid that is obtained from M by a sequence of deletions and contractions. Such sequences are associative and the operations of contraction and deletion commute. This implies that every minor can be written M\Y /X for X, Y ⊆ E and X ∩ Y = ∅.
In this paper we work with F-representable matroids over finite ground sets [n]. These are the matroids M for which there exists a matrix G over F with n columns, such that a set C ⊆ [n] is a circuit in M if and only if the columns indexed by C form a minimal F-linearly dependent set in G. The F-linear code Q generated by such a matrix G is called a representation of M.
Let M = (C, [n]) be an F-representable matroid and Q a representation of M. For every circuit C of M, there exists a vector in the dual code Q ⊥ ⊆ F n that is supported on C ⊆ [n]. Such vectors are called circuit vectors and they are unique up to non-zero scalar multiplication. The set of circuit vectors associated to the code Q forms a representation of the derived matroid δM(Q), which is an F-representable matroid on the ground set C [20]. The circuits of δM(Q) are given by the minimal linear dependencies between the circuit vectors in Q ⊥ . The derived matroid depends on the representation of M, so the notation δM(Q) always refers to the specific representation Q. It is proven in [20] that a matroid M is connected if and only if all of its derived matroids are.
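To make the derived matroid concrete, the following Python sketch lists the circuits of a small binary code by brute force and computes the rank of its circuit vectors. It is restricted to GF(2), where the circuit vector of a circuit C is simply the indicator vector of C, and it uses 0-based coordinate labels. The matrix `G` and the function names are hypothetical.

```python
from itertools import combinations


def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    pivots = {}
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)


def circuits(G):
    """Minimal dependent column sets of G, found by exhaustive search."""
    n = len(G[0])
    cols = [sum(row[j] << i for i, row in enumerate(G)) for j in range(n)]
    found = []
    for size in range(1, n + 1):
        for X in combinations(range(n), size):
            dependent = gf2_rank(cols[j] for j in X) < size
            # A dependent set is a circuit if it contains no smaller circuit.
            if dependent and not any(C <= set(X) for C in found):
                found.append(frozenset(X))
    return found


def derived_rank(G):
    """Rank of the derived matroid: the rank of the circuit vectors."""
    vectors = [sum(1 << j for j in C) for C in circuits(G)]
    return gf2_rank(vectors)


G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
print(circuits(G))
print(derived_rank(G))
```

For this code the three circuit vectors satisfy a single linear relation, so the derived matroid has three elements, rank 2, and exactly one circuit.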

Construction of the lift
It will often be useful to consider equivalent definitions of the lift, which we will present below.
Notation 3.1. Let d(x, y) be the Hamming distance of two strings x and y of equal length, let Q be a linear code, and let T be a subset of the coordinate set of Q. We write:
• d |T (x, y) to denote the Hamming distance of x and y restricted to a subset T of coordinates, i.e., d |T (x, y) = d(x |T , y |T ).
Proof. We denote and note that this is a linear subspace of Q ⊥ . First, assume that there exists We will prove by induction on n − |T | that there exists y ∈ Q such that y |T = x |T . This is
Proposition 3.3. Definition 1.2 of the lift can be rewritten as
Proof. The first equality is just Definition 1.2 restated using Notation 3.1. The second one follows from Lemma 3.2.
Lemma 3.4. Let S and T be two collusion patterns on ground set [n], and let Q ⊆ F n be a linear code. Then
Using Lemma 3.2 twice, we have that
Corollary 3.5. Let T = T 1 ⊔ T 2 be disconnected, with the vertex sets of T 1 and T 2 disjoint, and let Q i denote the restriction of Q to the vertex set of T i .
By definition, if T and T ′ are two collusion patterns such that T ⊆ T ′ , then for any linear code Q, Q T ⊇ Q T ′ . The following lemma, which will be used many times in the paper, shows that we can restrict attention to dual vectors that are supported on the collusion pattern over which we are lifting.
Proof. The linear conditions are generated by the circuit vectors, so
Let Q be a linear code over F and T a collusion pattern. We will now show how to construct the lift Q T by identifying a basis of the dual (Q T ) ⊥ . By Lemma 3.6, we can restrict our attention to the observed circuit vectors.
Fix a generator matrix G Q for Q and take its row-reduced restriction on T , which we denote by ( where v is a column with non-zero entries, then T corresponds to the support of a circuit vector, and the exact coordinates of this circuit vector are given by v.
We are now ready to describe an algorithm for constructing Q T .
Proposition 3.9. Let Q be a code over F and T a collusion pattern. The lift Q T can be constructed using the following algorithm:
1. Identify the set V of observed circuit vectors using restrictions of G Q on each colluding set;
2. Form a matrix H whose rows are the circuit vectors in V;
3. Row reduce H and bring it to systematic form without changing the labels of the columns (although their order might change);
4. Produce the parity check matrix G of H and permute its columns, so that their order corresponds to the index in Step 3.
The matrix G is a generator matrix of Q T .
Proof. The matrix G is constructed to be a generator matrix of some vector space V , so it is unique up to elementary row operations. By construction, the dual of V is generated by the circuit vectors in C ∩ T , so, by Lemma 3.6, we have V = Q T .
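The steps of Proposition 3.9 can be sketched in Python as follows. This is a minimal illustration over GF(2) only, not a general implementation: circuit vectors are found by enumerating all dual codewords and keeping those of minimal support, and Steps 3 and 4 are replaced by a direct null-space computation. Coordinates are labelled 0, . . . , n − 1, and the names `gf2_nullspace`, `circuit_vectors`, `lift` and `secrecy_rate` are hypothetical.

```python
def gf2_nullspace(rows, n):
    """Basis of {x in GF(2)^n : <r, x> = 0 for every r in rows}."""
    rows = [r[:] for r in rows]
    pivots, rank = [], 0
    for col in range(n):  # Gauss-Jordan elimination
        p = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if p is None:
            continue
        rows[rank], rows[p] = rows[p], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        pivots.append(col)
        rank += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [0] * n
        x[free] = 1
        for i, pc in enumerate(pivots):
            x[pc] = rows[i][free]  # solve for each pivot variable
        basis.append(x)
    return basis


def circuit_vectors(G):
    """Non-zero dual codewords of minimal support (brute force)."""
    n = len(G[0])
    dual = gf2_nullspace(G, n)
    words = []
    for mask in range(1, 2 ** len(dual)):
        w = [0] * n
        for i, d in enumerate(dual):
            if mask >> i & 1:
                w = [a ^ b for a, b in zip(w, d)]
        words.append(w)
    supports = [frozenset(i for i, a in enumerate(w) if a) for w in words]
    return [w for w, s in zip(words, supports)
            if not any(t < s for t in supports)]


def lift(G, pattern):
    """Generator matrix of the lift: the null space of the observed circuit vectors."""
    n = len(G[0])
    observed = [v for v in circuit_vectors(G)
                if any(all(i in T for i, a in enumerate(v) if a) for T in pattern)]
    return gf2_nullspace(observed, n)


def secrecy_rate(G, pattern):
    n = len(G[0])
    dim_q = n - len(gf2_nullspace(G, n))
    return (len(lift(G, pattern)) - dim_q) / n


G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
print(len(lift(G, [{0, 3}, {1, 2}])), secrecy_rate(G, [{0, 3}, {1, 2}]))
print(len(lift(G, [{0, 1, 2}, {0, 3}])), secrecy_rate(G, [{0, 1, 2}, {0, 3}]))
```

In the first pattern only the circuit {0, 3} is observed, so the lift is strictly larger than Q. In the second pattern two independent circuit vectors are observed, which already span the dual code, so the lift equals Q.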
We will now illustrate the algorithm with an example.

Invariance with respect to derived matroid
Let Q be a linear code of length n and dimension t − 1 over F and T a collusion pattern. By Corollary 3.5, the lifted code Q T is the direct sum of the lifts over all the connected components of T , together with a copy of F for each coordinate that is not contained in any colluding set. Moreover, by Lemma 3.6, the collusion pattern T can be replaced by T ∩ C, the largest elements of which have cardinality at most one larger than the dimension of the code Q. Thus, in order to understand lifts over arbitrary collusion patterns, we will henceforth restrict attention to patterns such that
1. T is connected, i.e., for every maximal colluding set T i in T , there is another maximal T j in T such that T i ∩ T j ≠ ∅;
2. for every i ∈ [n], there exists a colluding set T such that {i} ⊊ T ;
3. every maximal colluding set T in T has cardinality at most t.
Proposition 4.1. Let T and T ′ be two different collusion patterns. Then there exists a code Q such that Q T ≠ Q T ′ .
Proof. Take T ∈ T − T ′ . Choose Q ⊊ F n q so that T is the only circuit in the matroid M(Q). Then, by Corollary 3.7, Q T ′ = F n q but Q T = Q.
where S ⊆ C ∩ T is a maximal subset of colluding sets that is independent in the derived matroid δM(Q) and F (S) is the flat of S in δM(Q).
Proof. Recall from Proposition 3.9 that Q T is generated by the circuit vectors corresponding to the colluding circuits C ∩ T . Therefore, every circuit vector of M(Q T ) ∩ M(Q) is an element of F (C ∩ T ). But On the other hand, if C ∈ F (S), then any codeword in Q T supported on C should satisfy the linear equation given by its circuit vector. Hence, C ∈ M(Q T ).
We are now ready to show that the operation of taking a lift is invariant with respect to the equivalence class of derived matroids.
Proof. The fact that the derived matroids need to be equal follows from Lemma 4.5.
For the other direction, let S be a largest independent set of δM(Q 1 ) contained in T . Let F 1 (S) be the flat of S in δM(Q 1 ). Then by Lemma 4.7, M(Q T 1 ) ∩ M(Q 1 ) = F 1 (S). By assumption, the set S is also a largest independent set of δM(Q 2 ) contained in T and F 1 (S) = F 2 (S). Therefore, . This means that C * is a dependent set in M(Q 1 ) but every circuit of M(Q 1 ) contained in C * is an independent set in M(Q T 1 ). Therefore, if C ⊊ C * is a circuit of M(Q 1 ), then C ∉ F 1 (S). But then C ∉ F 2 (S), so it is an independent set in M(Q T 2 ). For contradiction, assume that C * is an independent set in M(Q T 2 ). For a fixed code Q, denote by v(S) a dual vector supported on the subset S of the coordinates. By assumption, there exists a dual vector v( By construction, each v(C) is unique up to scalar, whenever C ∈ C ∩ T . Furthermore, v(C * ) is also unique up to scalar since it becomes a circuit vector in (Q T 1 ) ⊥ . Without loss of generality, we can assume that all α c are non-zero, i.e., C ∩ T = S.
Among the circuits C C * in M(Q 1 ), choose a collection Σ such that (C ∩ T ) ∪ Σ is an independent set of maximal rank in δM(Q 1 ). Consider a dual vector of the form If not all β and γ c are zero, then v is a non-zero vector. Assume that v is chosen to have minimal support and let the set X be its support. Since v is a linear combination of circuit vectors, X is a dependent set in M(Q 1 ). Furthermore, it is a circuit because if C ′ X C * is a circuit in M(Q 1 ), then (C ∩T ) ∪Σ∪{C ′ } is an independent set in δM(Q 1 ), but this contradicts the maximality of (C ∩ T ) ∪ Σ. Therefore, (C ∩ T ) ∪ Σ ∪ {X} is a dependent set in δM(Q 1 ), so it is also a dependent set in δM(Q 2 ). This is a contradiction, because C * is an independent set in M(Q T 2 ), so there is no dual vector Question 4.9. The property that two codes have the same matroid and the same derived matroid is an equivalence relation. The realization space of an F-representable matroid M is thus partitioned according to the derived matroids. Is it possible to characterise this partition, and is there a natural way in which one of the parts corresponds to a "generic" derived matroid of M?
We can think of at least two different meanings of genericity in Question 4.9. One interpretation is that one of the parts would have higher dimension than all the others, as varieties over F. Another interpretation is that there would be a natural partial order of the derived matroids with a unique maximal element. A related question with a more probabilistic flavour is the following:
Question 4.10. Fix an F-representable matroid M and choose two representations Q 1 and Q 2 uniformly at random. What is the probability that they will have the same derived matroid?
In [21], a different but related notion of derived code was defined. For this definition, they conjecture a positive answer to Question 4.9, with an order-theoretic interpretation of genericity.

Collusion patterns equivalent to t-collusion
Recall from Corollary 3.7 that Q T = Q whenever T is a t-collusion pattern and dim Q < t. On the other hand, Example 1.5 shows that if Q is an [n, t−1] MDS code, it is sufficient to take the cyclical collusion pattern of that example to ensure that Q T = Q. In this section we will study how much t-collusion can be reduced while maintaining Q T = Q for a given Q, and we will state several sufficient conditions for t-equivalence. The next lemma shows that we can assume that both the matroid of the underlying code and the collusion pattern are connected.
2. If T is disconnected and M is connected, then Q ⊊ Q T .

Proof.
1. This is [20,Lemma 10]. If an element e is a loop, then it is a circuit. Then up to non-zero scalar multiplication, there exists a unique circuit vector whose support is {e}. Hence, {e} does not belong to any circuit in δM, so it is a coloop. Therefore, δM = U 1,1 ⊕ δ(M\e).
If an element e is a coloop, then it is not contained in any circuit.
On the other hand, if C ∩ T i is empty for some i, then Q ⊊ Q T , so T is not t-collusion equivalent. Assume C ∩ T i is non-empty for all i. Then
Remark 5.3. Since the circuit vectors of an [n, t − 1, d] code Q generate the dual code Q ⊥ , the inclusion-maximal independent sets of circuits have cardinality n − t + 1. This is the same as saying that the rank of the derived matroid is n − t + 1.
Then S is independent in the derived matroid of every F-representation of M.
Proof. Consider the set V of the circuit vectors corresponding to the elements of S. Take any v i ∈ V. Since the vector v i will be the only vector in V to have non-zero coordinates on ∆. Hence, v i cannot be a linear combination of V − {v i }.
The following is a useful specialization of [20,Lemma 3].
The standard construction of a parity check matrix of a code that involves the transposition of the parity check elements is an example of the bases described in Proposition 5.5. Theorem 5.9 describes when a t-collusion equivalent pattern corresponds to the support of a triangular matrix. We provide examples of such collusion patterns on a ground set of 6 elements in Figure 1.
∈ D and e is contained in exactly one C i . A collusion pattern T is called hanging if for every collection of generators C 1 , . . . , C k , D with D ⊆ ∪ k i=1 C i , there is a hanging element with respect to the pair (D, {C 1 , . . . , C k }).
Theorem 5.9 (Patterns defining a triangular matrix). Let M be an F-representable matroid of rank t − 1. Let T be a collusion pattern generated by n − t + 1 circuits of M. Then the following characterisations of the generating set are equivalent, and any such T is t-collusion equivalent in every F-representation of M.
1. There is an ordering C 1 , C 2 , · · · , C n−t+1 of the generators of T that is triangular.
2. The collusion pattern T is hanging.
[Figure 1: "Hanging" collusion pattern]
Proof.
• 1 =⇒ t-equivalent : If C 1 , C 2 , · · · , C n−t+1 is a triangular sequence of generators of T , then the corresponding circuit vectors form a triangular matrix of rank n − t + 1, and so they span the dual code of every representation of M. Thus, every linear condition on any representation of M is observed by T , so T is t-collusion equivalent.
• 1 =⇒ 2 : If, for a circuit C i , there exists a collection then i j < i for all j. Then C min i j contains an element e min i j that is not in C i or any other C i j .
• 2 =⇒ 1 : We denote the set of generators of T by S, and want to show that there is a triangular ordering of S. By the hanging property, for any subcollection S ′ ⊆ S, there is at least one set in S ′ that is not covered by the others. Letting S 0 = S, we can thus inductively select C i+1 ∈ S i as a set that is not covered by the other sets in S i , and let S i+1 = S i − {C i+1 }. By construction, for every i = 0, . . . , n − t there is an element e i+1 ∈ C i+1 that is not contained in any of the sets in S i+1 , so {e 1 , e 2 , · · · , e i−1 } ∩ C i = ∅ for all i > 1.
Thus, this is a triangular ordering of the generators in T .
Even though the two conditions in Theorem 5.9 are equivalent, the second one is more useful in practice because it gives a direct way to test an arbitrary collusion pattern in at most n − t + 1 steps as opposed to considering n! possible permutations of the ground set.
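The peeling procedure from the proof can be sketched in Python as follows. The sketch assumes, as in the proof of 2 =⇒ 1, that an ordering C 1 , . . . , C m is triangular when each C i contains an element e i that lies in none of the later sets. It checks only this combinatorial property and does not verify that the generators are circuits of a given matroid. The function name `triangular_ordering` and the 0-based labels are choices made for this illustration.

```python
def triangular_ordering(generators):
    """Greedily peel off a set that is not covered by the remaining ones.

    Returns a list of pairs (C_i, e_i) in triangular order, or None if
    some subcollection has every set covered by the others.
    """
    remaining = [frozenset(c) for c in generators]
    order = []
    while remaining:
        for c in remaining:
            others = set().union(*(d for d in remaining if d is not c))
            private = c - others  # elements of c seen by no other set
            if private:
                order.append((set(c), min(private)))
                remaining.remove(c)
                break
        else:
            return None  # every remaining set is covered by the others
    return order


# Cyclically consecutive 3-subsets of a 6-element ground set.
cyclic = [{i % 6, (i + 1) % 6, (i + 2) % 6} for i in range(6)]
print(triangular_ordering(cyclic[:4]))  # 6 - 3 + 1 = 4 of the generators
print(triangular_ordering(cyclic))      # all six generators
```

The greedy choice is safe under the stated assumption: if a triangular ordering exists, then in any subcollection the set that comes first in that ordering has a private element, so the loop never gets stuck.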
Moreover, Theorem 5.9 shows that in Example 1.5 it is sufficient to consider only n − t + 1 of the n cyclically ordered circuits.
Question 5.10. Is there an F-representable matroid M of rank t − 1 with a set of circuits S = {C 1 , C 2 , · · · , C n−t+1 } that does not satisfy the characterisation from Theorem 5.9 but is a basis in every derived matroid of M?
Question 5.11. What is the minimum number N of colluding circuits that guarantee that Q = Q T ? In general, N = 1 + max{|H| : H is a hyperplane in δM(Q)}.
On the other hand, it has been shown that the generalized Hamming weight can be used to characterize the minimum size of a message that guarantees that the eavesdropper is able to recover the secret [22,Appendix] and [23,Section 3]. Is it possible to characterize the derived matroid of a code using its generalized Hamming weights?
We will close by considering one more situation. Assume that one element e of the ground set is compromised, that is, all circuits containing e are observed. Then T is t-collusion equivalent for all representations of M.
Proof. Denote the set of generators by S and let Q be an arbitrary representation of M over F. We claim that M(Q T ) is connected. On the one hand, since every circuit in M(Q T ) is also a circuit in M, and M is connected, the matroid M(Q T ) does not contain any loops. On the other hand, assume there exists an element f ∈ [n] that is a coloop in M(Q T ). Since M is connected, there is a circuit C e,f containing e and f . By construction, C e,f ∈ S, so C e,f is a circuit in M(Q T ) containing f .
Since M(Q T ) is connected and contains S, then we can apply [19, Theorem 4.3.2] to see that all circuits of M(Q T ) not containing e are given by where C 1 and C 2 are two distinct members of S. However, M is also connected and contains S, so its remaining circuits C − S are also given by (1). Therefore, M(Q T ) = M and, thus, Q = Q T .