Non-disjoint strong external difference families can have any number of sets

Strong external difference families (SEDFs) are much-studied combinatorial objects motivated by an information security application. A well-known conjecture states that only one abelian SEDF with more than 2 sets exists. We show that if the disjointness condition is replaced by non-disjointness, then abelian SEDFs can be constructed with more than 2 sets (indeed any number of sets). We demonstrate that the non-disjoint analogue has striking differences to, and connections with, the classical SEDF and arises naturally via another coding application.


Introduction
Difference families are long-studied combinatorial objects, with many applications. A family of (not necessarily disjoint) k-sets in a group G is a difference family if the multiset of pairwise internal differences (between distinct elements of each set) comprises each non-identity element of G λ times. External difference families (EDFs) were introduced in [10], motivated by the information security problem of constructing optimal secret sharing schemes; the mathematical link between EDFs and algebraic manipulation detection (AMD) codes was formalized in [11]. EDFs are a generalisation of difference families, in which the pairwise external differences (between distinct sets) are considered; the inter-set differences correspond to possible manipulations of an encoded message. The sets are disjoint (to ensure unique decoding) and the multiset of external differences comprises each non-identity element of G λ times (the identity is ignored as it would correspond to "no manipulation").
An EDF is an optimal example of a weak AMD code ([11]). There is a stronger security model (strong AMD code) which motivates the definition of a strong external difference family (SEDF) ([11]). An SEDF is an EDF such that, for any set in the family, the pairwise differences between elements of this set and those of all other sets in the family comprise each non-identity element λ times. Every SEDF is an EDF, but not every EDF is an SEDF.
Many constructions for SEDFs exist in the combinatorial literature (see [1,3,5,13]). Surprisingly, all known infinite families have two sets, and just one example is known with more than two sets (having 11 sets in a group of size 243, obtained independently in [5] and [13]). It is conjectured in [6] that no other abelian SEDF exists with more than two sets; many theoretical results and computational searches support this (e.g. [5,6,7,8]). It is also notable that the only known infinite families with fixed λ have λ = 1.
In this paper we introduce the non-disjoint analogue of the SEDF (the sets are no longer disjoint and the multiset condition requires λ occurrences of every group element), and demonstrate that an abelian infinite family exists whose members can have any number of sets. We also obtain infinite families of two-set non-disjoint SEDFs with any fixed frequency value λ ∈ N. For λ = 1, we show that our two-set construction corresponds to a known two-set construction for classical SEDFs, and indicate why our non-disjoint constructions do not yield classical SEDFs with λ > 1 or more than two sets.
The non-disjoint SEDFs which we construct satisfy a stronger condition on their external differences, namely that the multiset of external differences between any pair of distinct sets in the family comprises each element of G precisely λ times. We call these pairwise strong external difference families (PSEDFs). Each PSEDF is a non-disjoint SEDF but not vice versa. We demonstrate that PSEDFs are useful and indeed optimal for a different application in communications theory, that of optical orthogonal codes and conflict-avoiding codes ([2], [12]). Here, the external differences between sets correspond to cross-correlation of binary sequences; there is no requirement for disjointness and the identity is treated in the same way as any other group element (since collision with the zero-shift of another sequence is just as significant as collision with any other shift).

Background
All groups are written additively. For two sets A, B in a group G, define the multisets ∆(A, B) = {a − b : a ∈ A, b ∈ B} and ∆(A) = {a_1 − a_2 : a_1, a_2 ∈ A, a_1 ≠ a_2}. The notation λA denotes the multiset consisting of λ copies of a set A. A translate of a set A is denoted by t + A = {t + a : a ∈ A}.
The existing definition of an SEDF is as follows.
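Definition 2.1. Let G be a group of order v and let m > 1. A family of disjoint k-sets {A_1, . . . , A_m} in G is a (v, m, k, λ)-SEDF if, for each 1 ≤ i ≤ m, the multiset union ⋃_{j≠i} ∆(A_i, A_j) comprises each non-identity element of G exactly λ times.
The following is a summary of known existence results for abelian SEDFs.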
Proposition 2.2. A (v, m, k, λ)-SEDF exists in G in the following cases: [...]
There are very many non-existence results for SEDFs in abelian groups with more than two sets ([6]); we summarize some key ones: [...] G is cyclic of prime power order [6]; or (iv) v is a product of at most three (not necessarily distinct) primes, except possibly when G = C_p^3 and p is a prime greater than 3 × 10^12 [6]; or [...]
Other non-existence conditions for m > 2 include: [...] and v − 1 is squarefree ([3]). In [7], a result is given for groups of order pq for p sufficiently large.
We adapt the classical definition of SEDF in Definition 2.1 by removing the disjointness condition, so the identity becomes a valid external difference. For consistency we require that the identity occurs at the same frequency as the other group elements. This in fact means the sets must be non-disjoint, so this structure is genuinely distinct from the classical SEDF, i.e. it is the non-disjoint analogue rather than a generalisation.
Definition 2.4. Let G be a group of order v and let m > 1. We say that a family of k-sets {A_1, . . . , A_m} in G is a non-disjoint (v, m, k, λ)-SEDF if, for each 1 ≤ i ≤ m, the multiset union ⋃_{j≠i} ∆(A_i, A_j) comprises each element of G exactly λ times.
This may be viewed as mathematically more "natural" than the existing SEDF definition, as it treats every element of the group equally.
A classical (v, m, k, 1)-SEDF exists in an abelian group if and only if either m = 2 and v = k^2 + 1, or k = 1 and m = v [11]. An analogous result holds for non-disjoint SEDFs (for these, by Lemma 2.7(i), k = 1 cannot occur).
When working in cyclic groups, we will use the following well-known correspondence with binary sequences (for background on sequences, see [9, Section 5.4]). A binary sequence of length v is a sequence X = x_0 . . . x_{v−1} where each x_i ∈ {0, 1}. We also denote this by X = (x_i)_{i=0}^{v−1}. We call a contiguous subsequence x_δ x_{δ+1} . . . x_{δ+r−1} of X a substring of X of length r. We take indices modulo v unless otherwise stated. The weight of a binary sequence is the number of occurrences of the symbol 1 in the sequence. We call X + s = (x_{i+s})_{i=0}^{v−1} a (cyclic) shift of X by s places.
Definition 2.9. (i) For a k-subset A of Z_v = {0, . . . , v − 1}, we associate a binary sequence X = (x_i)_{i=0}^{v−1} of weight k, where x_i = 1 if i ∈ A and x_i = 0 otherwise. (ii) For a binary sequence X = (x_i)_{i=0}^{v−1} of weight k, we associate the k-subset A_X = {i ∈ Z_v : x_i = 1} of Z_v.
In Z_7, the set A = {1, 2, 4} corresponds to the sequence 0110100. Using this correspondence, we have the following useful relationship.
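Proposition 2.10. Let X = (x_t)_{t=0}^{v−1} and Y = (y_t)_{t=0}^{v−1}, with indices taken modulo v, be the sequences corresponding to k-subsets A and B in Z_v. Then, for each δ ∈ {0, . . . , v − 1}, the sum Σ_{t=0}^{v−1} x_t y_{t+δ} equals the number of occurrences of δ in ∆(B, A).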
Proof. For fixed δ ∈ {0, . . . , v − 1}, the sum Σ_{t=0}^{v−1} x_t y_{t+δ} counts the number of positions t such that x_t = 1 = y_{t+δ}. This is the number of t ∈ Z_v such that t ∈ A and t + δ ∈ B, i.e. the number of times δ occurs in ∆(B, A).
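The relationship between cross-correlation sums and difference counts can be checked numerically. The following is a minimal Python sketch, not taken from the paper: the helper names (`to_sequence`, `delta`, `cross_correlation`) and the second set B = {0, 1, 3} are illustrative assumptions, while A = {1, 2, 4} in Z_7 is the example used in the text.

```python
from collections import Counter

def to_sequence(A, v):
    """Binary sequence of length v with a 1 exactly at the positions in A."""
    return [1 if i in A else 0 for i in range(v)]

def delta(B, A, v):
    """Multiset of differences b - a (mod v) with b in B, a in A, as a Counter."""
    return Counter((b - a) % v for b in B for a in A)

def cross_correlation(X, Y, shift):
    """Sum over t of x_t * y_{t+shift}, with indices taken modulo len(X)."""
    v = len(X)
    return sum(X[t] * Y[(t + shift) % v] for t in range(v))

v = 7
A, B = {1, 2, 4}, {0, 1, 3}          # B is a hypothetical second set
X, Y = to_sequence(A, v), to_sequence(B, v)
counts = delta(B, A, v)              # occurrences of each shift in Delta(B, A)
correlations = [cross_correlation(X, Y, d) for d in range(v)]
```

For every shift d, `correlations[d]` agrees with `counts[d]`, which is the statement being proved.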
We end the section with some further sequence terminology. Let X be a sequence. Following [9], we define a run of X to be a substring of X consisting of consecutive 0s or consecutive 1s which is neither preceded nor succeeded by the same symbol. We call a run of 0s a gap and a run of 1s a block. For example, in the length-9 sequence 111100010, 1111 is a substring which is a block of length 4, and 000 is a substring which is a gap of length 3.
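As an illustration of this terminology, the sketch below splits a binary string into its runs. It is a simplified assumption-laden example: the sequence is read linearly, without cyclic wrap-around, which is enough for the example 111100010 because its first and last symbols differ. The function name `runs` is illustrative.

```python
from itertools import groupby

def runs(sequence):
    """Split a binary string into maximal runs, read linearly (no wrap-around)."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]

example_runs = runs("111100010")
blocks = [length for symbol, length in example_runs if symbol == "1"]  # runs of 1s
gaps = [length for symbol, length in example_runs if symbol == "0"]    # runs of 0s
```

Here `blocks` contains the block of length 4 and `gaps` contains the gap of length 3 mentioned in the text, together with the two remaining runs of length 1.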

Constructions for PSEDFs
In this section, we present results for PSEDFs and non-disjoint SEDFs which demonstrate their differences and similarities with classical SEDFs. We use binary sequences. For a sequence X = (x_i)_{i=0}^{v−1}, indices are taken modulo v. We first construct infinite two-set families of non-disjoint SEDFs for any λ-value. For classical SEDFs, all known families with fixed λ have λ = 1.
Theorem 3.1. Let v | k^2 and k | v, and let λ = k^2/v. Let X and Y be the binary sequences of length v in which X is a block of length k then a gap of length v − k, while Y comprises a block of length λ then a gap of length k − λ, repeated v/k times. Then the corresponding sets A_X and A_Y form a (v, 2, k, λ)-PSEDF in Z_v.
Proof. Write X = (x_t), Y = (y_t). Let δ ∈ {0, . . . , v − 1}. Consider Σ_{t=0}^{v−1} x_t y_{t+δ}. We see that Σ_{t=0}^{v−1} x_t y_{t+δ} = Σ_{t=0}^{k−1} x_t y_{t+δ}, since x_t = 0 for t = k, . . . , v − 1, so we need only consider the length-k substring Y_δ = y_δ y_{1+δ} . . . y_{k−1+δ} of Y. The value of Σ_{t=0}^{k−1} x_t y_{t+δ} is exactly the number of 1s in Y_δ. By construction, if any length-k substring W of Y starts with some s ≤ λ 1s, it is followed by a gap of length k − λ, which is then followed by a block of length λ − s. If it starts with some s ≤ k − λ 0s, it is followed by a block of length λ, which is then followed by a gap of length k − λ − s. In either case there are always λ 1s in W. Hence Σ_{t=0}^{v−1} x_t y_{t+δ} = Σ_{t=0}^{k−1} x_t y_{t+δ} = λ. This applies to any δ, and hence by Proposition 2.10, every element of Z_v occurs exactly λ times in ∆(A_Y, A_X) (and so also in ∆(A_X, A_Y), whose elements are the negatives of these differences).
Corollary 3.2. (i) For any a, r ∈ N, there exists a (ra^2, 2, ra, r)-PSEDF in Z_{ra^2}.
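The two-sequence construction described above (X a block of length k then a gap of length v − k; Y a block of length λ = k^2/v then a gap of length k − λ, repeated v/k times) can be tested directly. The following Python sketch is an illustration under those stated assumptions; the function names are hypothetical and the parameters a = 3, r = 2 are chosen only as an example of Corollary 3.2(i).

```python
def theorem_3_1_sequences(v, k):
    """Return (X, Y, lam) as 0/1 lists, assuming k divides v and v divides k*k."""
    if v % k != 0 or (k * k) % v != 0:
        raise ValueError("need k | v and v | k^2")
    lam = k * k // v
    X = [1] * k + [0] * (v - k)                    # block of k, then gap of v - k
    Y = ([1] * lam + [0] * (k - lam)) * (v // k)   # block of lam, gap of k - lam
    return X, Y, lam

def cross_correlations(X, Y):
    """List of sums over t of x_t * y_{t+d}, for every shift d modulo len(X)."""
    v = len(X)
    return [sum(X[t] * Y[(t + d) % v] for t in range(v)) for d in range(v)]

# Corollary 3.2(i) with a = 3, r = 2: v = r*a^2 = 18, k = r*a = 6, lambda = r = 2
a, r = 3, 2
X, Y, lam = theorem_3_1_sequences(r * a * a, r * a)
values = cross_correlations(X, Y)
```

Every entry of `values` equals `lam`, so each shift occurs the same number of times as a difference between the two corresponding sets.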
We have the following generalisation of Theorem 3.1.
Theorem 3.3. Suppose v | k^2 and k | v. Let X = (x_t)_{t=0}^{v−1} be defined by [...] Y = (y_t)_{t=0}^{v−1} is any sequence such that (y_{t+k}) = (y_t) and y_0 . . . [...]
We next show non-disjoint SEDFs exist with any number of sets.
Proof. For 1 ≤ i ≤ N, define the binary sequence X_i = (x_t)_{t=0}^{v−1}, where v = 2^N, as follows: x_t = 1 if (t mod 2^i) < 2^{i−1}, and x_t = 0 otherwise. So each X_i consists of a block of length 2^{i−1}, followed by a gap of length 2^{i−1}, and this length-2^i substring is repeated 2^{N−i} times. By construction, since x_t = x_{t+2^i} for t ≥ 2^i, every substring of length 2^i has an equal number of 1s and 0s, i.e. has weight 2^{i−1}. Therefore any substring of length r where 2^i | r has weight r/2. In particular, a substring of length 2^j, j ≥ i, has weight 2^{j−1}.
We claim that these sequences X_i (1 ≤ i ≤ N) correspond to sets A_i (1 ≤ i ≤ N) which form a PSEDF in Z_{2^N} with the given parameters. We determine ∆(A_i, A_j) for 1 ≤ i ≠ j ≤ N; by symmetry we may assume i < j.
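The claim can be checked computationally for small N. The sketch below is illustrative only: it builds the sequences X_i from the block-and-gap description in the proof (a block of length 2^{i−1} followed by a gap of length 2^{i−1}, repeated), and collects all pairwise cross-correlation values. The function names and the choice N = 4 are assumptions made for the example.

```python
from itertools import combinations

def doubling_sequences(N):
    """X_1..X_N of length 2^N: X_i repeats (2^(i-1) ones, then 2^(i-1) zeros)."""
    v = 2 ** N
    return [[1 if (t % 2 ** i) < 2 ** (i - 1) else 0 for t in range(v)]
            for i in range(1, N + 1)]

def cross_correlations(X, Y):
    """List of sums over t of x_t * y_{t+d}, for every shift d modulo len(X)."""
    v = len(X)
    return [sum(X[t] * Y[(t + d) % v] for t in range(v)) for d in range(v)]

N = 4
family = doubling_sequences(N)
# Set of all cross-correlation values over all ordered pairs of distinct sequences
observed = {value
            for X, Y in combinations(family, 2)
            for value in cross_correlations(X, Y) + cross_correlations(Y, X)}
```

A single value is observed, namely 2^{N−2}. This agrees with the count k^2/v forced by k = 2^{N−1} and v = 2^N, since the k^2 differences between two sets must be spread evenly over the v group elements.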

Relationship to classical SEDFs
We next explain the similarity between the family of non-disjoint SEDFs in Corollary 3.2(iii) and the family of SEDFs in Example 1.
Proposition 4.1. Let v | k^2 and k | v. As subsets of Z_{v+1}, the sets of the non-disjoint (v, 2, k, [...]
Proof. Take the two length-v sequences X, Y which correspond to the sets A_X and A_Y in Theorem 3.1. By appending an additional 0 at the end of each sequence, the new length-(v + 1) sequences X′, Y′ [...]. X′ is a block of length k followed by a gap of length v − k + 1, while Y′ is a block of length λ = k^2/v followed by a gap of length k − λ, which is repeated v/k times, except that the final gap now has length k − λ + 1. [...] Σ_t x′_t y′_{t+δ} is exactly the number of 1s in Y′_δ, i.e. its weight. We determine Y′_0, . . . , Y′_v. For 0 ≤ δ ≤ v − 1, let Y_δ = y_δ y_{1+δ} . . . y_{k−1+δ}, the substring of the original sequence Y. We have [...] Hence by Proposition 2.10, ∆(A_{Y′}, A_{X′}) (and by symmetry ∆(A_{X′}, A_{Y′})) [...]
For (ii), observe that the sets of Theorem 3.1 can be made disjoint by translation in Z_{v+1} only if the sequence Y′ has a gap of at least the size of the block in X′. This is possible only if [...]
Similarly, it is not possible to convert the non-disjoint SEDFs of Theorem 3.4 to classical SEDFs in Z_{2^N+1}, except when N = 2 (giving the (5, 2, 2, 1)-SEDF {0, 1}, {2, 4} in Z_5). For N > 2, appending 0 gives a structure with more than two frequencies and disjointness is impossible.
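For λ = 1 the passage from Z_v to Z_{v+1} can be explored numerically. The following Python sketch is a hedged illustration, not the paper's argument: it takes the λ = 1 case of the Theorem 3.1 sets (so v = k^2), regards them as subsets of Z_{v+1}, and searches for translates of the second set that give a classical SEDF. The checker `is_classical_sedf` and the choice k = 3 are assumptions made for the example.

```python
from collections import Counter

def is_classical_sedf(sets, n, lam):
    """Check that the sets are pairwise disjoint and that, for each set, the
    differences to all other sets hit every non-zero element of Z_n lam times."""
    for i, A in enumerate(sets):
        others = [B for j, B in enumerate(sets) if j != i]
        if any(A & B for B in others):
            return False
        counts = Counter((a - b) % n for B in others for a in A for b in B)
        if any(counts[d] != lam for d in range(1, n)):
            return False
    return True

k = 3
v = k * k                      # lambda = k^2 / v = 1
A_X = set(range(k))            # block of length k at the start of X
A_Y = set(range(0, v, k))      # Y: a single 1 then k - 1 zeros, repeated k times
n = v + 1                      # appending a 0 places both sets in Z_{v+1}
good_shifts = [s for s in range(n)
               if is_classical_sedf([A_X, {(y + s) % n for y in A_Y}], n, 1)]
```

In this example exactly one translate works, the shift by k, which gives the pair {0, 1, 2}, {3, 6, 9} in Z_10. For k = 2 the same procedure recovers the pair {0, 1}, {2, 4} in Z_5 quoted in the text.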

Motivation from communications systems
While classical SEDFs arise from AMD codes, non-disjoint SEDFs and PSEDFs have a different communications motivation. Optical orthogonal codes (OOCs) are sets of binary sequences with good auto- and cross-correlation properties for use in optical multi-access communication. The auto-correlation of a sequence X measures how much it collides with its shifts; its cross-correlation with sequence Y measures how much X collides with the shifts of Y (two sequences collide in position i if both have 1s in the ith position).
Definition 5.1. Let v, w, λ_a, λ_c be non-negative integers with v ≥ 2, w ≥ 1. Let C = {X_0, . . . , X_{N−1}} be a family of N binary sequences of length v and weight w. Then C is a (v, w, λ_a, λ_c)-OOC if (i) Σ_{t=0}^{v−1} x_t x_{t+δ} ≤ λ_a for all X = (x_t) ∈ C and all δ ∈ {1, . . . , v − 1}, and (ii) Σ_{t=0}^{v−1} x_t y_{t+δ} ≤ λ_c for all distinct X = (x_t), Y = (y_t) ∈ C and all δ ∈ {0, . . . , v − 1};
i.e. if auto-correlation values are at most λ_a and cross-correlation values are at most λ_c.
Although called "codes", OOCs are used as sets of periodic sequences, with X_i being repeated. A correlation value gets a contribution of 1 precisely if both sequences have a 1 in the same position. In using OOCs for communication, information can be sent only when there is a 1 in the sequence; if two sequences are used and there is a 1 in both sequences then interference occurs, which can result in errors in both received signals. So a key design principle is to have low cross-correlation values. For more on OOCs see [2].
By Definition 2.9, OOCs can be reformulated as subsets of Z_v. Let {X_0, . . . , X_{N−1}} be a (v, w, λ_a, λ_c)-OOC. For each sequence X_i, let A_i be the set of integers modulo v denoting the positions of the 1s. Then A_i ⊆ Z_v, |A_i| = w for all 0 ≤ i ≤ N − 1, and we have the conditions:
(i) |A_i ∩ (A_i + δ)| ≤ λ_a for all δ ∈ Z_v \ {0}, i.e. any non-zero δ occurs in ∆(A_i) at most λ_a times.
(ii) |A_i ∩ (A_j + δ)| ≤ λ_c for all i ≠ j and all δ ∈ Z_v, i.e. any δ occurs in ∆(A_i, A_j) at most λ_c times.
An OOC with λ_c = 1 and no auto-correlation requirement is a conflict-avoiding code (CAC); see [12]. CACs are equivalently defined by the condition that {∆(A_1), . . . , ∆(A_n)} are pairwise disjoint (distinct x_1, x_2 ∈ A_i and y_1, y_2 ∈ A_j (i ≠ j) with x_1 − x_2 = y_1 − y_2 implies two distinct expressions for x_1 − y_1 in ∆(A_i, A_j), and conversely).
Proposition 5.2. If C is a (v, w, λ_a, λ_c)-OOC with |C| ≥ 2, then λ_c ≥ w^2/v.
Proof. Let C = {A_0, . . . , A_{N−1}} as subsets of Z_v. Let F = {((x, y), δ) : x − y = δ, x ∈ A_i, y ∈ A_j} for some fixed A_i, A_j with A_i ≠ A_j. There are w values of x and w values of y, and for each pair (x, y) there is a unique δ = x − y, so |F| = w^2. On the other hand there are v possible values of δ, and for each δ at most λ_c pairs (x, y) such that x − y = δ. So |F| ≤ vλ_c.
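The counting in this proof compares w^2 with vλ_c, and the comparison can be illustrated on small examples. The sketch below is an assumption-based illustration: `max_cross_correlation` computes the largest value of |A ∩ (B + δ)| over all shifts, and `lower_bound` is the smallest integer λ_c compatible with w^2 ≤ vλ_c. The first pair of sets is the Theorem 3.1 construction with v = 18, k = 6; the second pair in Z_7 is a hypothetical example chosen to show a case where the bound is not attained.

```python
def max_cross_correlation(A, B, v):
    """Largest |A intersect (B + d)| over all shifts d in Z_v."""
    return max(len(A & {(b + d) % v for b in B}) for d in range(v))

def lower_bound(v, w):
    """Smallest integer lambda_c satisfying w^2 <= v * lambda_c."""
    return -(-(w * w) // v)    # integer ceiling of w^2 / v

# Two sets of size 6 in Z_18 from the block-and-gap construction (lambda = 2)
A = set(range(6))
B = {0, 1, 6, 7, 12, 13}
tight = max_cross_correlation(A, B, 18)

# A hypothetical pair in Z_7 whose cross-correlation exceeds the bound
loose = max_cross_correlation({1, 2, 4}, {0, 1, 3}, 7)
```

In the first case the maximum cross-correlation equals `lower_bound(18, 6)`, so the bound is met with equality; in the second it is strictly larger than `lower_bound(7, 3)`.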