Primal and Dual Combinatorial Dimensions

We give tight bounds on the relation between the primal and dual of various combinatorial dimensions, such as the pseudo-dimension and fat-shattering dimension, for multi-valued function classes. These dimensional notions play an important role in the area of learning theory. We first review some (folklore) results that bound the dual dimension of a function class in terms of its primal, and after that give (almost) matching lower bounds. In particular, we give an appropriate generalization to multi-valued function classes of a well-known bound due to Assouad (1983), that relates the primal and dual VC-dimension of a binary function class.


Introduction
The Vapnik-Chervonenkis (VC) dimension [11] is a fundamental combinatorial dimension in learning theory used to characterize the complexity of learning a class X consisting of functions f : Y → {0, 1}, where X and Y are given (possibly infinite) sets. Informally, the VC-dimension captures how rich or complex a class of functions is. Many extensions of the VC-dimension to multi-valued functions f : Y → Z, for some given Z ⊆ ℝ, have been proposed in the literature, such as the Vapnik-dimension (also known as the uniform pseudo-dimension) [10], the Pollard-dimension (also known as the pseudo-dimension) [9,5], and the fat-shattering dimension [6]. All these combinatorial dimensions are formally defined in Section 2.
Every (primal) class of functions can be identified with a dual class whose functions are of the form g_y : X → Z for y ∈ Y, defined by g_y(f) = f(y) for f ∈ X. When interpreting a function class as a matrix A whose rows and columns are indexed by X and Y, respectively, the dual class is simply given by the transpose matrix A^⊤. The (VC-, pseudo-, etc.) dimension of the dual class is defined as the corresponding dimension of the matrix A^⊤.
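Viewed computationally, passing to the dual class is nothing more than transposing the matrix. A minimal sketch (the toy matrix and the helper name transpose are ours, purely for illustration):

```python
# Rows of A are the primal functions f_x : Y -> Z; rows of the transpose
# are the dual functions g_y : X -> Z with g_y(f_x) = f_x(y) = A(x, y).
A = [
    [0, 1, 2],
    [1, 1, 0],
]  # rows indexed by X = {0, 1}, columns by Y = {0, 1, 2}

def transpose(M):
    """Return M^T as a list of rows; row y of M^T is the dual function g_y."""
    return [list(col) for col in zip(*M)]

A_T = transpose(A)

# Entry (y, x) of A^T equals entry (x, y) of A.
assert all(A_T[y][x] == A[x][y] for x in range(2) for y in range(3))
assert transpose(A_T) == A  # transposing twice recovers the primal class
```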
Assouad [2] showed the following relation between the primal VC-dimension VC(A) and the dual VC-dimension VC*(A):

VC*(A) < 2^{VC(A)+1}.    (1.1)

This has turned out to be a very useful inequality, e.g., in the context of so-called sample compression schemes [8]. In case VC*(A) is a power of two, this immediately yields VC*(A) ≤ 2^{VC(A)}. It is known that this bound is tight for all values of VC*(A), see, e.g., [7].
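For small finite matrices, the inequality can be probed by brute force. The following sketch (the helper vc_dim is ours; it only illustrates the inequality on one example, it proves nothing) checks it for the complete Boolean matrix B_3 with all 8 rows over 3 columns:

```python
from itertools import combinations, product

def vc_dim(M):
    """Brute-force VC-dimension: size of a largest set of columns of the
    Boolean matrix M (rows = functions) on which all 0/1-patterns occur."""
    best = 0
    for size in range(1, len(M[0]) + 1):
        for J in combinations(range(len(M[0])), size):
            patterns = {tuple(row[j] for j in J) for row in M}
            if len(patterns) == 2 ** size:
                best = size
                break
    return best

# B_3: all 8 Boolean rows over 3 columns; its dual class is the transpose.
B3 = [list(bits) for bits in product([0, 1], repeat=3)]
dual = [list(col) for col in zip(*B3)]

assert vc_dim(B3) == 3
assert vc_dim(dual) == 1                       # only 3 rows, so at most 1
assert vc_dim(dual) < 2 ** (vc_dim(B3) + 1)    # Assouad's inequality (1.1)
```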
The purpose of this work is to understand the relation between the primal and dual of combinatorial dimensions for multi-valued function classes, in particular for multi-valued functions where Z = {0, 1, . . . , k} for k ∈ ℕ. For the pseudo-dimension, as explained in Section 3, it can be shown that

Pdim*(A) < k · 2^{Pdim(A)+1},    (1.2)

which naturally generalizes Assouad's bound in (1.1). Again, under a suitable power-of-two assumption, this yields

Pdim*(A) ≤ k · 2^{Pdim(A)}.    (1.3)

Our first contribution is that the bound in (1.3) is in fact tight for every value of k and Pdim(A) (Theorem 4.2). In case Pdim(A) = 1, we give an improved bound of k + 2 (Theorem 4.1), and also show that this is tight (Theorem 4.2). We obtain similar bounds for the fat-shattering dimension (Theorem 4.5).
Remark 1.1. It is sometimes believed that Assouad's bound also holds for combinatorial dimensions beyond the VC-dimension, see, e.g., [4]. Our results show that this is, unfortunately, not correct.
Outline. We continue in Section 2 with all the necessary definitions and notations, in particular the formal definitions of all combinatorial dimensions considered in this work. Then, in Section 3, we outline known results regarding the relations between various combinatorial dimensions and their duals. After that, in Section 4, we summarize our results, followed by their proofs in Section 5.

Preliminaries
Let A : X × Y → Z be given, where X and Y are (possibly infinite) index sets and Z ⊆ ℝ. For x ∈ X, we define A_x : Y → Z by A_x(y) = A(x, y) and refer to A_x as a row of A. For y ∈ Y, we define A_y : X → Z by A_y(x) = A(x, y) and refer to A_y as a column of A. The transpose of A is defined as the function A^⊤ : Y × X → Z given by A^⊤(y, x) = A(x, y). As suggested by this terminology, we view A as a (possibly infinite) matrix with rows indexed by X, columns indexed by Y and with A^⊤ as its transpose. For k ∈ ℕ, we write [k] = {1, . . . , k} and [k]_0 = {0, 1, . . . , k}. For d ∈ ℕ, we let B_d denote the Boolean matrix with d columns whose 2^d rows run through all functions from the column set to {0, 1}. Note that B_d is unique modulo renaming rows and columns.
Definition 2.1 (Shattered sets). Let A : X × Y → Z, with Z ⊆ R, be a matrix and let J ⊆ Y be a subset of its columns.
1. Suppose that Z = {0, 1}. We say that J is VC-shattered by A if, for every function b : J → {0, 1}, there exists an x ∈ X such that, for every y ∈ J, we have A(x, y) = b(y).
2. We say that J is P-shattered by A if there exists a function t : J → ℝ such that the following holds: for every function b : J → {0, 1}, there exists an x ∈ X such that, for every y ∈ J, we have A(x, y) ≥ t(y) iff b(y) = 1.
3. Let γ > 0. We say that J is P_γ-shattered by A if there exists a function t : J → ℝ such that the following holds: for every function b : J → {0, 1}, there exists an x ∈ X such that, for every y ∈ J, we have A(x, y) ≥ t(y) + γ if b(y) = 1 and A(x, y) ≤ t(y) − γ if b(y) = 0.
4. We say that J is V-shattered by A if there exists a number t ∈ ℝ such that the following holds: for every function b : J → {0, 1}, there exists an x ∈ X such that, for every y ∈ J, we have A(x, y) ≥ t iff b(y) = 1.

5. Let γ > 0. We say that J is V_γ-shattered by A if there exists a number t ∈ ℝ such that the following holds: for every function b : J → {0, 1}, there exists an x ∈ X such that, for every y ∈ J, we have A(x, y) ≥ t + γ if b(y) = 1 and A(x, y) ≤ t − γ if b(y) = 0.
We will refer to t : J → ℝ occurring in the definition of P- and P_γ-shattered sets as the thresholds used for shattering J. Similarly, we will refer to t ∈ ℝ occurring in the definition of V- and V_γ-shattered sets as the uniform threshold used for shattering J.
Definition 2.2 (Combinatorial dimensions). Let A : X × Y → Z be a matrix. Let τ ∈ {VC, P, P γ , V, V γ } be one of the shattering types mentioned in Definition 2.1. The (primal) τ -dimension of A is the size of a largest set J ⊆ Y that is τ -shattered by A (resp. ∞ if there exist τ -shatterable sets of unbounded size). The dual τ -dimension of A is defined as the τ -dimension of A ⊤ .
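For finite matrices, the τ-dimensions can be computed by brute force directly from Definitions 2.1 and 2.2. The sketch below (the helper names shatters, pdim, vdim are ours, and thresholds are restricted to [k], in line with the convention adopted below) exhibits a matrix whose P-dimension strictly exceeds its V-dimension:

```python
from itertools import combinations, product

def shatters(M, J, thresholds):
    """Do the columns J realize all 0/1-patterns when column J[i] is
    thresholded at thresholds[i]?"""
    patterns = {tuple(int(row[j] >= thresholds[i]) for i, j in enumerate(J))
                for row in M}
    return len(patterns) == 2 ** len(J)

def pdim(M, k):
    """P-dimension: per-column thresholds t : J -> [k]."""
    ncols = len(M[0])
    return max((s for s in range(1, ncols + 1)
                for J in combinations(range(ncols), s)
                if any(shatters(M, J, t)
                       for t in product(range(1, k + 1), repeat=s))),
               default=0)

def vdim(M, k):
    """V-dimension: one uniform threshold t in [k] for all columns of J."""
    ncols = len(M[0])
    return max((s for s in range(1, ncols + 1)
                for J in combinations(range(ncols), s)
                if any(shatters(M, J, (t,) * s) for t in range(1, k + 1))),
               default=0)

# Column 0 takes values {0,1}, column 1 values {1,2}: per-column thresholds
# (1, 2) shatter both columns, but no single threshold separates both ranges.
A = [[0, 1], [0, 2], [1, 1], [1, 2]]
assert pdim(A, 2) == 2 and vdim(A, 2) == 1
```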
The matrix obtained by thresholding the columns of A : X × Y → Z at t : Y → ℝ is defined as the Boolean matrix B : X × Y → {0, 1} such that, for all x ∈ X and y ∈ Y, we have B(x, y) = 1 iff A(x, y) ≥ t(y). For I ⊆ X and J ⊆ Y, we denote the restriction of A to I × J by A_{I,J}. In other words: A_{I,J} is the submatrix of A whose rows are indexed by I and whose columns are indexed by J. A witness for the inequality Pdim(A) ≥ d is defined as a triple (I, J, t) such that the following holds: 1. I is a subset of X of size 2^d, J is a subset of Y of size d and t : J → ℝ. 2. The matrix obtained by thresholding the columns of A_{I,J} at t equals B_d (after a suitable renaming of rows and columns). When analyzing the P- or the V-dimension of a matrix with entries in [k]_0, we will assume that thresholds are taken from [k] whenever we find that convenient.

Known relations
In this section we review some known relations between the combinatorial dimensions defined in Section 2.

Bounding the P- in terms of the V-dimension
It follows directly from the definitions that Vdim(A) ≤ Pdim(A) and, likewise, V_γ(A) ≤ P_γ(A). This raises the question whether we can also bound the P- in terms of the V-dimension (resp. the P_γ- in terms of the V_γ-dimension). The gap between Pdim(A) and Vdim(A) cannot be bounded in general, as well-known examples with real-valued entries show. In order to bound the P- in terms of the V-dimension, the focus will therefore be on matrices of the form A : X × Y → [k]_0. According to the following results of Ben-David et al. [3] (here expressed in our notation), the P-dimension can exceed the V-dimension by a factor of k, but not by a larger factor. Alon et al. [1] have bounded the P_γ- in terms of the V_{γ/2}-dimension.
Proof. The thresholds t_1, . . . , t_d used for P_γ-shattering d := P_γ(A) many columns of A must belong to the interval [γ, 1 − γ]. Any threshold t_i can be rounded to the closest multiple of γ; denote the latter by t̃_i. The inequality (3.3) now becomes evident from the following observations. First, by using the thresholds t̃_i instead of t_i, the width of shattering may drop from γ to γ/2 (but not beyond). Second, t̃_1, . . . , t̃_d can take on at most r := ⌊1/γ⌋ distinct values. By the pigeonhole principle, there is some t ∈ {t̃_1, . . . , t̃_d} that can be used for V_{γ/2}-shattering d/r many points.
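The unbounded gap between Pdim and Vdim can be illustrated by the following finite truncation of a standard construction (our own sketch, not taken verbatim from the text): entries in column y live at scale 2^{−(y+1)}, so per-column thresholds shatter all columns, while a uniform threshold can split at most one column.

```python
from itertools import combinations, product

d = 3
# Rows indexed by x in {0,1}^d; the entry in column y is (x_y + 1) * 2^(-(y+1)),
# so column y takes one of the two values 2^(-(y+1)) or 2^(-y).
rows = [[(x[y] + 1) * 2.0 ** -(y + 1) for y in range(d)]
        for x in product([0, 1], repeat=d)]

# Per-column thresholds t(y) = 1.5 * 2^(-(y+1)) P-shatter all d columns ...
t = [1.5 * 2.0 ** -(y + 1) for y in range(d)]
assert len({tuple(int(r[y] >= t[y]) for y in range(d)) for r in rows}) == 2 ** d

# ... but no uniform threshold V-shatters even two columns: every value in a
# later column is <= every value in an earlier one, so on a pair of columns
# the pattern (0, 1) never occurs under a single threshold.
candidates = sorted({v for r in rows for v in r})
for y1, y2 in combinations(range(d), 2):
    for s in candidates:
        pats = {(int(r[y1] >= s), int(r[y2] >= s)) for r in rows}
        assert len(pats) < 4
```

Letting d grow shows that the P-dimension is unbounded while the V-dimension stays 1.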

Bounding dual dimension in terms of its primal
A well-known result due to Assouad [2], already mentioned in Section 1, which we will refer to as Assouad's bound, states that one can upper bound the dual VC-dimension in terms of the (primal) VC-dimension:

Theorem 3.5 (Assouad's bound). For every matrix A : X × Y → {0, 1}, it holds that

VC*(A) < 2^{VC(A)+1}.    (3.4)
Note that, under the assumption that VC*(A) is a power of two, this means

VC*(A) ≤ 2^{VC(A)}.    (3.5)

The bound in (3.5) is known to be tight for every value of VC(A), see, e.g., [7]. In Appendix A we show that Assouad's bound also holds for Vdim(A) and V_γ(A), based on the notion of uniform Ψ-dimension as defined in [1]. These observations are summarized in the following result.
Corollary 3.6. For every matrix A : X × Y → Z with Z ⊆ ℝ, it holds that Vdim*(A) < 2^{Vdim(A)+1} and V*_γ(A) < 2^{V_γ(A)+1}. Note that, in case the dual dimension is a power of two, this means Vdim*(A) ≤ 2^{Vdim(A)} (resp. V*_γ(A) ≤ 2^{V_γ(A)}). Combining Theorem 3.2 (applied to A^⊤) with Corollary 3.6, we directly obtain the following result: Theorem 3.7 (Folklore). For every matrix A : X × Y → [k]_0, the following holds: Pdim*(A) ≤ k · Vdim*(A) < k · 2^{Vdim(A)+1}; moreover, if Vdim*(A) is a power of two, then Pdim*(A) ≤ k · 2^{Vdim(A)}. Similarly, combining Theorem 3.4 with Corollary 3.6, we directly obtain the following result (Corollary 3.8).
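The primal relation Vdim(A) ≤ Pdim(A) ≤ k · Vdim(A) underlying the chain above can be sanity-checked by brute force on small random matrices (a test sketch with our own helper dim; it only probes the inequality on random instances, it proves nothing):

```python
import random
from itertools import combinations, product

def dim(M, k, uniform):
    """Brute-force P-dimension (uniform=False) or V-dimension (uniform=True)
    of a matrix with entries in {0, ..., k}, thresholds taken from [k]."""
    ncols, best = len(M[0]), 0
    for size in range(1, ncols + 1):
        for J in combinations(range(ncols), size):
            ts = ([(t,) * size for t in range(1, k + 1)] if uniform
                  else product(range(1, k + 1), repeat=size))
            if any(len({tuple(int(r[j] >= t[i]) for i, j in enumerate(J))
                        for r in M}) == 2 ** size
                   for t in ts):
                best = size
    return best

random.seed(0)
k = 2
for _ in range(25):
    M = [[random.randint(0, k) for _ in range(4)] for _ in range(8)]
    p, v = dim(M, k, False), dim(M, k, True)
    assert v <= p <= k * v  # Vdim(A) <= Pdim(A) <= k * Vdim(A)
```

The right-hand inequality also follows from a pigeonhole argument: among p column thresholds taken from [k], some value is used at least p/k times.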

Our results
In this section we describe our new contributions, which complement those mentioned in Section 3. We first discuss results related to the pseudo-dimension. We start with a result showing that the upper bound on Pdim*(A) in Theorem 3.7 can be improved by a factor of 2 (roughly) for matrices A with Vdim(A) = 1.

Theorem 4.1. For every matrix A : X × Y → [k]_0 with Vdim(A) = 1, it holds that Pdim*(A) ≤ k + 2.
The next result implies that the upper bound on Pdim * (A) in the second statement of Theorem 3.7 is tight for matrices with Vdim(A) ≥ 2, as well as the upper bound on Pdim * (A) in Theorem 4.1 whenever Vdim(A) ≥ 1.
Theorem 4.2 provides the two corresponding lower bounds. In combination with a technical tool defined in Section 5.2, we also obtain a corollary (Corollary 4.3) that stands in stark contrast to Assouad's bound for the VC-dimension. We next move to our results for the fat-shattering dimensions. The first result here implies that the upper bound on P_γ(A) from Theorem 3.4 is tight up to a small constant factor. Finally, our last result states that the bound on P*_γ(A) from Corollary 3.8 is tight up to a small constant factor.

Proofs
Section 5.1 is devoted to the proof of Theorem 4.1. In Section 5.2, we collect some observations that allow for an easier presentation of our lower bound constructions, which are given in Sections 5.3 and 5.4.

Proof of Theorem 4.1
For k = 1, the assertion of the theorem collapses to the claim that VC*(A) ≤ 3 for every Boolean matrix A with Vdim(A) = 1. This is an immediate consequence of (3.4). Suppose now that k ≥ 2. It suffices to show that Pdim*(A) ≥ k + 3 implies Vdim(A) ≥ 2 (i.e., we give a proof by contraposition). Pick a witness (I, J, t) for Pdim*(A) ≥ k + 3; more concretely:
• The matrix obtained by thresholding the rows of A_{I,J} at t equals B^⊤_{k+3} (note that, since we deal with the dual matrix, the thresholds are assigned to the rows of A_{I,J}).
We may assume that, after renumbering the rows appropriately, one has t_1 ≤ . . . ≤ t_{k+3}. We decompose the rows of A_{I,J} into maximal blocks such that the same threshold is assigned to every row from the same block. Since any threshold t_i is taken from [k], the total number k′ of blocks is bounded by k. A block that is different from the first and from the last block is said to be an inner block. We proceed by case analysis:
Case 1: One of the blocks contains 4 rows. All rows of this block share the same threshold, say t_b. Since the columns of B^⊤_{k+3} run through all patterns from {0, 1}^{k+3}, we can pick two columns on which these 4 rows realize all four patterns 00, 01, 10, 11. Thresholding all rows of A_{I,J} at the uniform threshold t_b leaves these two columns unchanged on the 4 rows. We may conclude that Vdim(A) ≥ 2.
Case 2: The first or the last block contains 3 rows. For reasons of symmetry, we may assume that the first block contains 3 rows. Consider the following (4 × 2)-submatrix of B^⊤_{k+3}:

0 0
0 1
1 0
1 1

The first three rows are taken from the first block and the last row is taken from the last block. Remember that the rows of the first block of A_{I,J} are thresholded at t_1 while the rows of the last block are thresholded at t_{k′} > t_1. Hence, if we threshold all rows of A_{I,J} at the uniform threshold t_1, then the above submatrix of B^⊤_{k+3} will remain unchanged. We may conclude from this discussion that Vdim(A) ≥ 2.
Case 3: One of the inner blocks contains 2 rows, say block b.
The argument is similar to that given in Case 2. The relevant submatrix of B^⊤_{k+3} (with one row of the first block, two rows of block b, and one row of the last block) now looks as follows:

0 0
0 1
1 0
1 1

Since t_1 < t_b < t_{k′}, thresholding all rows of A_{I,J} at the uniform threshold t_b will leave the above submatrix of B^⊤_{k+3} unchanged. We may conclude that Vdim(A) ≥ 2. Since A_{I,J} has k + 3 rows (with k ≥ 2), it is easy to argue that one of the three above cases must occur. Suppose first that k = 2. Then there are at most 2 blocks and 5 rows. It follows that the first or the last block contains at least 3 rows, so that Case 1 or Case 2 applies. Suppose now that k ≥ 3. If the first and the last block contain at most two rows each, then at least k − 1 rows are left for the k′ − 2 ≤ k − 2 inner blocks. By the pigeonhole principle, there must be an inner block with two rows, so that Case 1 or Case 3 applies. This completes the proof of Theorem 4.1.

Preliminaries for lower bound constructions
The matrix B_d satisfies the following conditions:
i) Distinctness Condition: the rows of B_d are pairwise distinct.
ii) General Balance Condition: for any choice of ℓ ∈ [d] columns of B_d, any pattern from {0, 1}^ℓ is realized within these columns by exactly 2^{d−ℓ} rows.
The general balance condition implies the following:
iii) 1st Balance Condition: each column of B_d has as many zeros as ones.
iv) 2nd Balance Condition: for any two distinct columns of B_d, any pattern from {0, 1}^2 is realized within these columns by the same number of rows.
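These conditions are easy to verify mechanically (a small sketch; the helper B builds the canonical matrix B_d from all bit patterns):

```python
from itertools import combinations, product

def B(d):
    """The matrix B_d: all 2^d Boolean patterns, one per row."""
    return [list(bits) for bits in product([0, 1], repeat=d)]

d = 3
Bd = B(d)

# Distinctness: the rows are pairwise distinct.
assert len({tuple(r) for r in Bd}) == 2 ** d

# 1st balance condition: every column has 2^(d-1) zeros and 2^(d-1) ones.
assert all(sum(row[j] for row in Bd) == 2 ** (d - 1) for j in range(d))

# 2nd balance condition: in any two columns, each of the four patterns
# 00, 01, 10, 11 is realized by exactly 2^(d-2) rows.
for j1, j2 in combinations(range(d), 2):
    for pattern in product([0, 1], repeat=2):
        count = sum(1 for row in Bd if (row[j1], row[j2]) == pattern)
        assert count == 2 ** (d - 2)
```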
Remark 5.1 (Proof templates). Consider a matrix A : X × Y → [k]_0. The following template for proving assertions like Pdim(A) ≤ d will prove quite useful.
• Assume for contradiction that Pdim(A) ≥ d + 1 and pick a witness (I, J, t) for this inequality.
• Exploit the fact that the matrix B obtained by thresholding the columns of A_{I,J} at t must be equal to B_{d+1}.
• Prove that B violates one of the conditions that B_{d+1} must satisfy.
Sometimes the following (slightly simpler) template can be used instead: • Take a fixed but arbitrary function t : Y → [k].
• Let B be the matrix obtained by thresholding the columns of A at t.
• Show that no more than d columns of B have at least 2^d zeros and at least 2^d ones.
This also shows that Pdim(A) ≤ d because no submatrix of B with d + 1 columns and 2^{d+1} rows would then satisfy the first balance condition.
We next introduce matrices that, though not being Boolean, are close relatives of the matrix B_d. The B_D-based matrix A with k column blocks of sizes d_1, . . . , d_k (where D = d_1 + · · · + d_k) is obtained as follows:
1. Partition the columns of B_D into k consecutive blocks such that block b ∈ [k] consists of d_b columns.
2. Obtain A from B_D by replacing any 1-entry (resp. 0-entry) in a column belonging to block b ∈ [k] by b (resp. by b − 1).
The B^⊤_D-based matrix with k row blocks of sizes d_1, . . . , d_k is defined analogously. Note that the matrix A resulting from the above procedure has the property that, for any two columns y_1 in block b_1 and y_2 in block b_2 > b_1 and any row x, we have A(x, y_1) ≤ A(x, y_2). We will refer to this property as block monotonicity.
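A sketch of this construction (the function name bd_based is ours), together with a mechanical check of block monotonicity on a small instance:

```python
from itertools import product

def bd_based(block_sizes):
    """B_D-based matrix with k column blocks: in the columns of block b
    (1-indexed), every 1-entry of B_D becomes b and every 0-entry b - 1."""
    D = sum(block_sizes)
    # block index for each of the D columns
    block_of = [b for b, size in enumerate(block_sizes, start=1)
                for _ in range(size)]
    all_rows = product([0, 1], repeat=D)  # the rows of B_D
    return [[block_of[y] - 1 + row[y] for y in range(D)] for row in all_rows]

A = bd_based([2, 1])  # k = 2 blocks of sizes d_1 = 2, d_2 = 1, so D = 3
assert len(A) == 8    # one row per row of B_3

# Block monotonicity: within each row, an entry in a later block is never
# smaller than an entry in an earlier block (here: columns 0,1 vs. column 2).
for row in A:
    assert row[0] <= row[2] and row[1] <= row[2]
```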
At this point we also bring into play the matrix Ȧ, which is defined as the matrix A augmented with a row of zeros. Formally, we assume that 0 ∉ X and define Ȧ : (X ∪ {0}) × Y → Z as the extension of A which satisfies Ȧ(0, y) = 0 for all y ∈ Y. The (technical) use of Ȧ will become clear in Section 5.4 (in particular, this is explained after Definition 5.7), but it is already included in the statements that follow. Next, set d_max = max_{j∈[k]} d_j and pick some index j_max ∈ [k] such that d_{j_max} = d_max. We still have to show that Vdim(A) = Vdim(Ȧ) = d_max. Thresholding the columns of A at the uniform threshold j_max, we obtain a matrix B that coincides with B_D within block j_max. This shows that Vdim(A) ≥ d_max. The inequality Vdim(Ȧ) ≤ d_max can be seen as follows. Pick a fixed but arbitrary J ⊆ [D] of size 1 + d_max and a fixed but arbitrary uniform threshold t ∈ [k]. Let B be the matrix obtained by thresholding the columns of Ȧ at t. The set J must contain two columns belonging to two different blocks, say column y_1 in block b_1 and column y_2 in block b_2 > b_1. By the block-monotonicity of A (which implies block-monotonicity for Ȧ as well), no row of B can assign label 1 to y_1 and label 0 to y_2. Since J and t were arbitrary choices, it follows that no set of size 1 + d_max can be V-shattered by Ȧ. Before we proceed with the proof, we fix some notation. For b = 1, . . . , k, let I_b denote the set of row indices in I that belong to block b of A. Set I_0 = I ∩ {0} and note that Σ_{b=0}^{k} |I_b| = |I| = 2^{d+1} = 2s, where s = 2^d denotes the block size. Let b_0 and b′_0 (resp. b_1 and b′_1) denote the smallest and second-smallest (resp. largest and second-largest) b ∈ [k] such that |I_b| ≠ 0. An obvious question is whether 0 ∈ I, that is, whether the extra all-zeros row is among the rows of B. We claim that this is not the case. Assume for contradiction that 0 ∈ I. We proceed by case analysis:
Case 1: |I_{b_0}| ≤ s − 1. Pick an arbitrary but fixed column j ∈ J of B.
In order to satisfy the first balance condition, the threshold t_j must be large enough so that in block b_0 of column j only 0-entries are found. It follows that any row of B belonging to block b_0 has 0-entries only and therefore coincides with the extra all-zeros row. This is in contradiction with the distinctness condition.
Case 2: |I_{b_0}| = s. Pick an arbitrary but fixed column j ∈ J of B. In order to satisfy the first balance condition, the threshold t_j must be large enough so that in block b_0 of column j at least s − 1 0-entries are found. Pick another column j′ ≠ j in B (also with at least s − 1 0-entries in block b_0). Then the pattern 00 occurs in columns j and j′ of B at least s − 1 times (one time in row 0 and at least s − 2 times in block b_0). But then B must have at least 4(s − 1) rows in order to satisfy the second balance condition. Hence 4(s − 1) ≤ 2s (because B has 2s rows). It follows that s ≤ 2, which is in contradiction with our assumptions that d ≥ 2 and s = 2^d ≥ 4.
In any case, we arrived at a contradiction, which proves the above claim that 0 ∉ I. To complete the proof, we still have to derive a final contradiction. We proceed by case distinction again.
Case A: 2 ≤ |I_{b_0}| ≤ s − 1. In order to satisfy the first balance condition, the threshold t_j of any column j ∈ J must be large enough so that in block b_0 of this column only 0-entries are found. Thus all rows of B belonging to block b_0 realize the all-zeros pattern, which is in contradiction with the distinctness condition.
Case B: |I_{b_0}| = 1 and |I_{b′_0}| ≤ s − 1. The argument is similar. Now the single row of B belonging to block b_0 and all rows of B belonging to block b′_0 realize the all-zeros pattern.
Case C: |I_{b_1}| ≥ 2 or (|I_{b_1}| = 1 and |I_{b′_1}| ≤ s − 1). Then, for reasons of symmetry, the last two rows of B both realize the all-ones pattern.
This contradicts our assumptions that d ≥ 2 and s = 2^d ≥ 4.
Hence it suffices to show that Pdim(Ȧ) ≤ 1. The rows of Ȧ have indices 0, 1, . . . , k + 2, where index 0 is reserved for the all-zeros row. Assume for contradiction that Pdim(Ȧ) ≥ 2 and pick a witness (I, J, t) for this inequality so that the following holds:
• I is a subset of the row indices of size 4 and J = {j_1, j_2} is a set of two columns with thresholds, say, t(j_1) = t_1 and t(j_2) = t_2.
• The matrix B obtained by thresholding the columns of Ȧ_{I,J} at t equals B_2 (with rows indexed by I and columns indexed by J).
Consequently, B satisfies the distinctness condition and the balance conditions. Consider the smallest index i_1 and the second-smallest index i_2 in I. Note that, since |I| = 4 and the last block is of size 2, neither i_1 nor i_2 belongs to the last block; that is, either i_1, i_2 ∈ {0, 1, 2} or i_2 belongs to one of the inner blocks consisting of a single row only. In order to establish the first balance condition for the matrix B, the thresholds t_1 and t_2 must be large enough so that only zeros are found in the first two components (indexed by i_1 and i_2) of the columns j_1 and j_2. Thus the first two rows of B both realize the all-zeros pattern, which is in contradiction with the distinctness condition.

Proofs of Theorems 4.4 and 4.5
Matrices A with the properties prescribed by Theorems 4.4 and 4.5 are easy to construct by means of a suitable operation that merges the matrices of a given family into a single matrix.
Definition 5.7 (Merge). Let (A_k)_{k≥1} with A_k : X_k × Y_k → [k]_0 be a given family of matrices. Let X (resp. Y) denote the disjoint union of the sets X_k (resp. Y_k) with k ≥ 1. Assume that X ∩ Y = ∅. For every x ∈ X, let k(x) denote the unique k such that x ∈ X_k. The notation k(y) is understood analogously. The matrix A : X × Y → [0, 1] given by A(x, y) = A_k(x, y)/k if k(x) = k(y) = k, and A(x, y) = 0 otherwise, is called the merge of the family (A_k)_{k≥1}.
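The operation can be sketched as follows. Note that the precise formula above was reconstructed here (we assume that entries of A_k are rescaled by 1/k so that the merge maps into [0, 1], and that entries pairing a row and a column from different families are 0), so treat these details as an assumption rather than as the paper's exact definition:

```python
def merge(families):
    """families maps k to a matrix (list of rows) with entries in {0, ..., k}.
    Rows/columns of the merge are tagged with their k, keeping index sets
    disjoint; the result is returned as a lookup function."""
    entries = {}
    for k, M in families.items():
        for x, row in enumerate(M):
            for y, entry in enumerate(row):
                entries[((k, x), (k, y))] = entry / k  # same family: rescale
    def A(x, y):
        return entries.get((x, y), 0.0)  # different families: entry 0
    return A

A = merge({1: [[0, 1], [1, 0]], 2: [[2, 0]]})
assert A((1, 0), (1, 1)) == 1.0
assert A((2, 0), (2, 0)) == 1.0   # entry 2 rescaled by 1/2
assert A((1, 0), (2, 0)) == 0.0   # cross-family entries vanish
```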
The merge-operation reveals why we introduced the matrix Ȧ: the pseudo-dimension (or any other combinatorial dimension, for that matter) of the matrix A restricted to the columns Y_k is nothing more than the pseudo-dimension of the functions in A_k augmented with an infinite number of functions that are zero everywhere. The pseudo-dimension of this function class clearly equals the pseudo-dimension of the matrix Ȧ_k. The merge-operation has the following properties: Lemma 5.8. Let A be the merge of the family (A_k)_{k≥1}. Then the following holds: Claim 3: Let k_1 denote the common k-value of y_1, . . . , y_d. Then any row x in B with k(x) ≠ k_1 has 0-entries only.
Proof. This is straightforward.
We conclude from Claims 2 and 3 that Pdim(A) = d ≤ Pdim(Ȧ_{k_1}) and, by assumption, the latter quantity is at most d_0, which concludes the proof. Theorem 4.4 is now a direct consequence of Lemma 5.8 in combination with Corollary 5.4, while Theorem 4.5 is a direct consequence of Lemma 5.8 in combination with Lemmas 5.5 and 5.6. In order to prove Corollary 4.3, note that Lemma 5.6 tells us that for every k there exists a matrix A_k such that Pdim(Ȧ_k) = 1 and Pdim*(Ȧ_k) ≥ Pdim*(A_k) = k + 2. We may then apply Lemma 5.8.

A On the derivation of Assouad's bound for uniform dimensions
We say that J ⊆ Y is VC-shattered by A : X × Y → {0, 1, *} if, for every function b : J → {0, 1}, there exists an x ∈ X such that, for every y ∈ J, we have A(x, y) = b(y). We first note that (3.4) is also valid for every matrix of the form A : X × Y → {0, 1, *}: the central observation in the proof is that B_d contains B^⊤_{⌊log d⌋} as a submatrix. This implies that VC(A) ≥ ⌊log VC*(A)⌋, which is equivalent to (3.4).
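The central observation can be checked by exhaustive search for small d (a brute-force sketch with our own helper; for d = 5 it finds a copy of B^⊤_2 inside B_5):

```python
from itertools import combinations, product
from math import log2

d = 5
m = int(log2(d))  # floor(log d) = 2
Bd = [list(bits) for bits in product([0, 1], repeat=d)]

def contains_B_transpose(M, m):
    """Does M contain B^T_m as a submatrix, i.e. m rows and 2^m columns
    whose columns realize every pattern of {0,1}^m exactly once?"""
    target = sorted(product([0, 1], repeat=m))
    for rows in combinations(range(len(M)), m):
        for cols in combinations(range(len(M[0])), 2 ** m):
            if sorted(tuple(M[r][c] for r in rows) for c in cols) == target:
                return True
    return False

assert contains_B_transpose(Bd, m)
```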
Consider now a matrix of the general form A : X × Y → Z with Z ⊆ ℝ. We make use of the concept of uniform Ψ-dimensions from [3]. Let Ψ be a family of functions ψ : Z → {0, 1, *}. Denote by ψ(A) the matrix obtained from A by replacing each entry A(x, y) with ψ(A(x, y)). The uniform Ψ-dimension of A is defined as Ψ_u(A) = sup_{ψ∈Ψ} VC(ψ(A)). Let Ψ^Y denote the set of all collections ψ̃ = (ψ_y)_{y∈Y} with ψ_y ∈ Ψ, and denote by ψ̃(A) the matrix obtained from A by replacing each entry A(x, y) with ψ_y(A(x, y)). The (non-uniform) Ψ-dimension of A is defined as Φ(A) = sup_{ψ̃∈Ψ^Y} VC(ψ̃(A)).
As noted in [3], several popular combinatorial dimensions can be viewed as (uniform or non-uniform) Ψ-dimensions. Here we are particularly interested in the P-, P_γ-, V- and V_γ-dimension. Remark A.1. We next explain how to interpret known dimensions as special cases of the Ψ-dimension.
The following calculation, with ψ ranging over all functions in Ψ, shows that Theorem 3.5 can be extended to any uniform Ψ-dimension in place of the VC-dimension:

Ψ*_u(A) = Ψ_u(A^⊤) = sup_{ψ∈Ψ} VC(ψ(A^⊤)) = sup_{ψ∈Ψ} VC(ψ(A)^⊤) = sup_{ψ∈Ψ} VC*(ψ(A)) < sup_{ψ∈Ψ} 2^{VC(ψ(A))+1} = 2^{Ψ_u(A)+1}.

We remark that a similar argument for the non-uniform Ψ-dimension fails, as it then no longer holds that ψ̃(A^⊤) = ψ̃(A)^⊤ (which is the argument we use in the third equality above).