Riding the saddle point: asymptotics of the capacity-achieving simple decoder for bias-based traitor tracing

We study the asymptotic-capacity-achieving score function that was recently proposed by Oosterwijk et al. for bias-based traitor tracing codes. For the bias function, we choose the Dirichlet distribution with a cutoff. Using Bernstein’s inequality and Bennett’s inequality, we upper bound the false-positive and false-negative error probabilities. From these bounds we derive sufficient conditions for the scheme parameters. We solve these conditions in the limit of large coalition size $c_0$ and obtain asymptotic solutions for the cutoff, the sufficient code length, and the corresponding accusation threshold. We find that the code length converges to its asymptote approximately as $c_0^{-1/2}$, which is faster than the $c_0^{-1/3}$ of Tardos’ score function.
MSC: 94B60


Traitor tracing
Forensic watermarking is a means for tracing unauthorized redistribution of digital content. Before distribution, the content is modified by embedding an imperceptible watermark, which plays the role of a personalized identifier. When an unauthorized copy of the content is found, a tracing algorithm outputs a list of suspicious users, based on the watermark detected in this copy.
The most powerful attacks against watermarking are collusion attacks, in which multiple attackers (the 'coalition') combine their differently watermarked versions of the same content; the observed differences point to the locations of the hidden marks and allow for a targeted attack.
Collusion-resistant codes have been specifically designed as a defense against collusion attacks: when codewords from such a code are embedded into the content, the surviving parts of the watermark, after the collusion attack, still contain enough information to identify (some of the) attackers, provided that the coalition is not too large.
In the past two decades, several types of collusion-resistant codes have been developed. The most popular type in the recent literature is the class of bias-based codes. These were introduced by G. Tardos in 2003. The code construction consists of two steps: first, a sequence of biases is generated, one for each position in the content; then, the watermark symbols for each user are randomly drawn according to these biases. The original paper [1] was followed by a flurry of activity, e.g., improved analyses [2][3][4][5][6][7], code modifications [8][9][10], decoder modifications [11][12][13][14], and various generalizations [15][16][17][18]. The advantage of bias-based versus deterministic codes is that they can achieve the asymptotically optimal relationship $\ell \propto c_0^2$ between the sufficient code length $\ell$ and the coalition size $c_0$ to be resisted.

Capacity-achieving simple decoder
Two kinds of tracing algorithm can be distinguished: (i) simple decoders, which assign a score to single users independent of the watermarks of other users, and (ii) joint decoders [11][12][13], which assign scores to sets of users and are typically more powerful but also require more computational resources. Efficient joint decoders typically employ a simple decoder as a bootstrapping step.
The performance of a traitor tracing code is often measured by looking at the sufficient code length as a function of the coalition size $c_0$ to be resisted and the imposed low error rate. Equivalently, one can look at the fingerprinting rate, which is defined as the fraction $(\log_q n)/\ell$, where q is the size of the alphabet, n is the number of users, and $\ell$ is the code length. The numerator corresponds to the number of q-ary symbols needed to point out one of the n users; the denominator is the number of symbols used to convey this 'message.' Hence, the fingerprinting rate has a natural interpretation as the fraction of codeword symbols that actually encodes the 'message,' i.e., the identifying information that allows for tracing. The fingerprinting rate is a figure of merit that can be used to fairly compare codes that have different alphabet sizes. The fingerprinting capacity, which can be computed information-theoretically, is an upper bound on the fingerprinting rate that can be achieved against colluders who employ an optimal strategy against the tracing scheme. It was found by Boesten and Škorić [19] that the asymptotic capacity is given by

$$C_q^{\infty} = \frac{q-1}{2 c_0^2 \ln q}.$$

Huang and Moulin [20] found the location of the corresponding asymptotic saddlepoint: the strongest attack is the so-called interleaving attack, and the best bias distribution is the Dirichlet distribution with concentration parameter one half. (See Section 2.) For the colluders as well as the tracer, it is bad to depart from the saddlepoint. If the colluders move away from it, the tracer can achieve a higher fingerprinting rate; if the tracer moves away, the colluders can launch a stronger attack which reduces the rate.
Oosterwijk et al. [21] devised a simple decoder that reaches asymptotic capacity. The possibility of such an achievement was foreseen in [20], where it was shown that the simple decoder capacity becomes equal to the joint decoder capacity as $c_0$ goes to infinity.

Contributions and outline
In this paper we analyze the performance of the capacity-achieving simple decoder of [21] in the Restricted Digit Model:
• Following the approach of [22], we use Bernstein's inequality and Bennett's inequality to upper bound the false-positive and false-negative error probability, respectively. From these bounds, we derive conditions on the code parameters (code length, cutoff, threshold) such that the error probabilities are sufficiently low.
• We determine the asymptotics of the sufficient code length in the direct vicinity of the saddlepoint.
• We find that the optimal choice for the cutoff τ is given by $\tau \propto c_0^{-\gamma}$, with γ slightly larger than one half. With this choice, the code length approaches its saddlepoint value with a correction term of order $c_0^{\gamma-1}$. Thus, convergence to the limit is faster than in the case of the binary Tardos score, where the correction is of order $c_0^{-1/3}$.
• Our analysis yields a recipe for placing the accusation threshold as a function of the innocent user score variance. This differs from the case of the Tardos score function [1,16], where the threshold is fixed.
In Section 2 we briefly review bias-based traitor tracing, the asymptotic saddlepoint, and the asymptotic-capacity-achieving score function. We also list the inequalities of Bernstein and Bennett. In Section 3 we study the statistical properties of an innocent user's score and the coalition's collective score. In Section 4 we derive the bounds on the error rates and the sufficient conditions on the code parameters. The asymptotics of the sufficient code length are treated in Section 5.

Notation
The number of users is denoted as n, and the code length (the number of positions in the content) as $\ell$. We define [n] = {1, . . . , n}. The alphabet is Q, with size |Q| = q. The symbols in the alphabet have no natural ordering. The bias in position i is denoted as $p^{(i)}$. The bias is a q-dimensional vector, with components $p^{(i)}_\alpha \geq \tau$ for $\alpha \in Q$. The parameter $\tau \ll 1$ is called the cutoff. For each i the bias satisfies $|p^{(i)}| = 1$, where $|\cdot|$ denotes the 1-norm, i.e., $\sum_{\alpha\in Q} p^{(i)}_\alpha = 1$. We will often use multi-index notation: for a scalar z, the notation $p^z$ stands for $\prod_{\alpha\in Q} p_\alpha^{z}$; for a vector m, the notation $p^m$ stands for $\prod_{\alpha\in Q} p_\alpha^{m_\alpha}$. We introduce the q-component vector $1_q = (1, 1, \ldots, 1)$. The notation $\delta_{xy}$ stands for the Kronecker delta.

Code generation
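The bias vectors $p^{(i)}$ are drawn independently from a (truncated) Dirichlet distribution F with concentration parameter κ > 0,

$$F(p) = \frac{1}{B_\tau(\kappa 1_q)} \prod_{\alpha\in Q} p_\alpha^{-1+\kappa} \quad \text{for } p \in [\tau,1]^q,\ |p| = 1, \qquad (2)$$

$$B_\tau(\kappa 1_q) = \int_{[\tau,1]^q} d^q p\; \delta(1-|p|) \prod_{\alpha\in Q} p_\alpha^{-1+\kappa}. \qquad (3)$$

The δ in the integral is a Dirac delta function; it ensures that the condition |p| = 1 is enforced. The τ is called the cutoff parameter. Note that $p_\alpha \in [\tau, 1-(q-1)\tau]$. Therefore, τ ≤ 1/q must hold, for otherwise the interval is empty (and we would get |p| > 1).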
For τ = 0 the normalization constant (3) evaluates to a generalized beta function. Let $z \in (0,\infty)^q$ be a vector; then the beta function B(z) is defined as

$$B(z) = \frac{\prod_{\alpha=1}^{q} \Gamma(z_\alpha)}{\Gamma\left(\sum_{\alpha=1}^{q} z_\alpha\right)}.$$

In the asymptotic saddlepoint, it holds that τ = 0 and κ = 1/2. For large but finite $c_0$, the saddlepoint lies close to the asymptotic saddlepoint, but it is not known exactly where. It is known that for finite $c_0$, the optimal bias distribution is a discrete distribution [8,10,23], with a number of discrete $p_\alpha$ values proportional to $c_0$. In spite of this, we will use the continuous probability density (2). Our motivation is that we only investigate asymptotics. The cutoff τ will depend on $c_0$.
The code word assigned to user j is denoted as a row vector $X_j = (X_{j1}, \ldots, X_{j\ell})$. The set of codewords is arranged in a code matrix X. The elements of the code matrix are independently generated according to the biases $p^{(1)}, \ldots, p^{(\ell)}$ as follows:

$$\Pr[X_{ji} = \alpha] = p^{(i)}_\alpha.$$
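The two-step code generation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that the truncated distribution is the ordinary Dirichlet density restricted to $p_\alpha \geq \tau$, which it samples by plain rejection. The function names `sample_truncated_dirichlet` and `generate_code` are hypothetical, and the alphabet is represented by the integers 0, ..., q−1.

```python
import random

def sample_truncated_dirichlet(q, kappa, tau, rng):
    """Draw one bias vector p (sum 1) with every component >= tau.

    Assumption: rejection sampling from a symmetric Dirichlet(kappa),
    keeping only draws that respect the cutoff tau.
    """
    if not 0.0 <= tau < 1.0 / q:
        raise ValueError("need 0 <= tau < 1/q")
    while True:
        # A Dirichlet draw is a normalized vector of Gamma(kappa, 1) variates.
        g = [rng.gammavariate(kappa, 1.0) for _ in range(q)]
        total = sum(g)
        if total == 0.0:
            continue
        p = [x / total for x in g]
        if min(p) >= tau:
            return p

def generate_code(n, length, q, kappa, tau, seed=0):
    """Return (biases, code): one bias vector per position, one row per user."""
    rng = random.Random(seed)
    biases = [sample_truncated_dirichlet(q, kappa, tau, rng) for _ in range(length)]
    symbols = list(range(q))
    # Each entry X_ji is drawn independently with Pr[X_ji = alpha] = p^(i)_alpha.
    code = [[rng.choices(symbols, weights=biases[i])[0] for i in range(length)]
            for _ in range(n)]
    return biases, code

biases, code = generate_code(n=5, length=20, q=3, kappa=0.5, tau=0.05, seed=1)
```

Rejection sampling is chosen here for transparency; it becomes slow when τ approaches 1/q, where almost all draws are rejected.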

Collusion attack
The coalition is a subset $C \subset [n]$ of users, with size |C| = c. We explicitly make the distinction between the actual coalition size c and the parameter $c_0$ in the code construction, which is the maximum coalition size that can be resisted. The colluders see a submatrix $X_C$ of X. The symbol 'tallies' are defined as follows:

$$m^{(i)}_\alpha = |\{ j \in C : X_{ji} = \alpha \}|.$$

In words, $m^{(i)}_\alpha$ is the number of colluders that received symbol α in position i. Based on $X_C$, the colluders produce an output $y = (y_1, \ldots, y_\ell)$. For our analysis we adopt the Restricted Digit Model as the attack model: for any $i \in [\ell]$, the output $y_i$ is only allowed to be a symbol that the colluders have observed in position i. The strategy for choosing an output is allowed to be probabilistic. We adopt a number of frequently made assumptions about the attack strategy:
1. Symbol symmetry. The strategy is invariant under permutation of the alphabet for each position independently. This assumption is motivated by the lack of a natural ordering of the alphabet.
2. Colluder symmetry. The strategy is invariant under permutation of the colluders. (In other words, the colluders equally share the risk.) This assumption is motivated by the fact that breaking colluder symmetry will make it easier for the tracer to find at least one colluder.
3. Position symmetry. The same strategy is applied in each position $i \in [\ell]$, and it does not depend on any $X_{jk}$ values with $k \neq i$. Motivation: asymptotically the optimal attack must be position-symmetric [24].
When assumptions 2 and 3 hold, the strategy can be parametrized by a set of probabilities that depend only on the 'local' tallies: in position i, the probability of outputting symbol $y_i$ is a function of only $m^{(i)}$. Omitting the position index, this is denoted as $\theta_{y|m}$. Furthermore, if assumption 1 holds as well, it is possible [6] to re-parametrize this as $\Psi_b(x) = \theta_{y|m}$ for {$m_y = b$, and m without the y component is x}.
In other words, $\Psi_b(x)$ is the coalition's probability of outputting a symbol given that it has tally b and that the other tallies are x. The probability $\Psi_b(x)$ is invariant under permutation of x.
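As an illustration of the attack model, the following Python sketch computes the tallies and applies two position-symmetric strategies that respect the Restricted Digit Model: the interleaving attack, which outputs symbol y with probability $m_y/c$, and majority voting. The code matrix layout (a list of rows, one per user) and the function names are assumptions made for this example. The deterministic tie-break in `majority_voting` is a simplification that breaks symbol symmetry on ties.

```python
import random
from collections import Counter

def tallies(code, coalition, i):
    """Return {symbol: number of colluders holding that symbol in position i}."""
    return Counter(code[j][i] for j in coalition)

def interleaving_attack(code, coalition, rng):
    """Per position, copy the symbol of a uniformly chosen colluder.

    A symbol with tally m_y is therefore output with probability m_y / c.
    """
    length = len(code[0])
    return [code[rng.choice(coalition)][i] for i in range(length)]

def majority_voting(code, coalition):
    """Per position, output the most frequent colluder symbol.

    Ties are broken by the smallest symbol label (a simplification).
    """
    out = []
    for i in range(len(code[0])):
        m = tallies(code, coalition, i)
        best = max(m.values())
        out.append(min(a for a, count in m.items() if count == best))
    return out

code = [[0, 1, 2, 0],
        [0, 2, 2, 1],
        [1, 1, 0, 1]]
coalition = [0, 1]
y = interleaving_attack(code, coalition, random.Random(7))
```

Both strategies only ever output a symbol that some colluder holds in that position, which is exactly the restriction imposed by the Restricted Digit Model.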

Simple decoder
The tracer notices the pirated copy with watermark sequence y 'in the wild.' Based on y and X, he tries to find at least one colluder. The asymptotic-capacity-achieving simple decoder of [21] works as follows: for each user $j \in [n]$, a score $S_j$ is computed,

$$S_j = \sum_{i=1}^{\ell} S_j^{(i)}, \qquad S_j^{(i)} = h(X_{ji}, y_i, p^{(i)}), \qquad h(x,y,p) = \frac{\delta_{xy}}{p_y} - 1. \qquad (7)$$

Note that we normalized the function h differently from [21], by a factor $\sqrt{q-1}$, for notational brevity.
The score function (7) has the special property of being 'strongly centered': for any p and y (we are omitting the position index), the expected score of an innocent user is zero.
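The strong-centering property can be verified in one line. The computation below assumes that the single-position score has the form $h(x,y,p) = \delta_{xy}/p_y - 1$ and that an innocent user's symbol x is drawn with probability $p_x$, independently of the coalition output y. The same steps give the conditional second moment of the innocent score.

```latex
\begin{align*}
\mathbb{E}_x\left[h(x,y,p)\right]
  &= \sum_{x\in Q} p_x\left(\frac{\delta_{xy}}{p_y}-1\right)
   = \frac{p_y}{p_y} - \sum_{x\in Q} p_x = 0,\\
\mathbb{E}_x\left[h(x,y,p)^2\right]
  &= p_y\left(\frac{1}{p_y}-1\right)^2 + (1-p_y)
   = \frac{1}{p_y} - 1 .
\end{align*}
```

Since $p_y \geq \tau$, each single-position score lies in the interval $[-1, 1/\tau - 1]$, and the second moment is at most $1/\tau - 1$.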
The collective score of the coalition is written as $S_C$,

$$S_C = \sum_{j\in C} S_j.$$

The tracer makes a list L of 'suspicious' users, whose score exceeds a threshold Z,

$$L = \{ j \in [n] : S_j > Z \}.$$

Whereas the Tardos scheme uses a fixed threshold, the score function h leads to a more complicated scheme where Z must be chosen as a function of the biases and the observed tallies and colluder outputs (see Section 3.1).
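The scoring and accusation steps can be sketched as follows. The sketch assumes the single-position score $h(x,y,p) = \delta_{xy}/p_y - 1$ and takes the threshold Z as a given number; how Z should be chosen is the subject of the later analysis. The function names and the tiny code matrix are hypothetical.

```python
def h(x, y, p):
    """Single-position score: 1/p_y - 1 on a match with the pirate symbol, -1 otherwise."""
    return (1.0 / p[y] if x == y else 0.0) - 1.0

def user_scores(code, y, biases):
    """Total score S_j of every user: the sum of h over all positions."""
    return [sum(h(row[i], y[i], biases[i]) for i in range(len(y))) for row in code]

def accuse(scores, Z):
    """Indices of users whose score exceeds the threshold Z."""
    return [j for j, s in enumerate(scores) if s > Z]

code = [[0, 1], [1, 1]]                 # two users, two positions, binary alphabet
y = [0, 1]                              # pirate output
biases = [[0.5, 0.5], [0.25, 0.75]]     # one bias vector per position
scores = user_scores(code, y, biases)
suspects = accuse(scores, Z=1.0)
```

A match on a rare symbol (small $p_y$) contributes a large positive score, while any mismatch costs exactly one point, irrespective of the bias.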

Measuring the performance
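Two types of error can occur: a false-positive, with $P_{\rm FP}$ defined as the probability that a fixed innocent user gets added to L, and a false-negative, with $P_{\rm FN}$ defined as the probability that none of the colluders is found:

$$P_{\rm FP} = \Pr[j \in L \mid j \notin C], \qquad P_{\rm FN} = \Pr[L \cap C = \emptyset].$$

The tracer demands that $P_{\rm FP} \leq \varepsilon_1$ and $P_{\rm FN} \leq \varepsilon_2$, where $\varepsilon_1$ and $\varepsilon_2$ are constants, typically with $\varepsilon_1 \ll \varepsilon_2$.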
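The code length and threshold Z are often parametrized as

$$\ell = A c_0^2 \ln\frac{1}{\varepsilon_1}, \qquad (12)$$

$$Z = B c_0 \ln\frac{1}{\varepsilon_1}.$$

This parametrization is motivated by the fact that asymptotically, for the Tardos code, A and B can be considered as constants. The relationship between the code length parametrization (12) and the fingerprinting rate is as follows. The rate is $R = (\log_q n)/\ell = (\ln n)/(A c_0^2 \ln q \, \ln \varepsilon_1^{-1})$. Let $\eta = \Pr[L \setminus C \neq \emptyset]$, i.e., the probability that at least one innocent user ends up in the list L. The η is a fixed small number (e.g., $10^{-6}$) that does not depend on n. It can be shown (Lemma 6 in [22]) for $n \gg 1$, $c \ll n$ that $\varepsilon_1 \approx \eta/n$. Then, $\ln \varepsilon_1^{-1} \approx \ln n - \ln \eta \approx \ln n$. (In the last approximation, we used that η is fixed.) Asymptotically, the rate satisfies $R \sim 1/(A c_0^2 \ln q)$.

Definition 1. The variance of an innocent user's score and the average and variance of the coalition score are written as

$$\tilde\sigma^2_{\rm inn} = \frac{1}{\ell}\sum_{i=1}^{\ell} \mathbb{E}\left[(S_j^{(i)})^2\right] \ (j \notin C), \qquad \tilde\mu = \frac{1}{\ell}\sum_{i=1}^{\ell} \mathbb{E}\left[S_C^{(i)}\right], \qquad \tilde\sigma^2 = \frac{1}{\ell}\sum_{i=1}^{\ell} \left( \mathbb{E}\left[(S_C^{(i)})^2\right] - \left(\mathbb{E}\left[S_C^{(i)}\right]\right)^2 \right),$$

with $S_C^{(i)} = \sum_{j\in C} S_j^{(i)}$. Here $\mathbb{E}$ stands for the expectation over all the probabilistic degrees of freedom: the biases $p^{(i)}$, the code matrix X, and the coalition output y. (The 'tilde' notation indicates that there is an average over positions.) Note that $\tilde\mu_{\rm inn} = 0$, as shown in (8).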
Remark. If assumption 3 holds (position symmetry, Section 2.1.3) then in Definition 1 the average over the positions is not necessary; in every position $\mathbb{E}[\cdots]$ has the same value.
In this paper, we introduce a rescaled version (β) of the threshold parameter B,

$$\beta = \frac{B}{\tilde\sigma_{\rm inn}}.$$

It will turn out that it is more natural to use the quantity β than B.
Asymptotically, the first and second moments completely determine the shape of the probability distribution of the score, for an innocent user as well as for the coalition score. (The distribution becomes Gaussian in accordance with the central limit theorem.) It was found [7] that the code length parameter (and hence the fingerprinting rate) then depends on $\tilde\mu$ and $\tilde\sigma_{\rm inn}$ as follows:

$$A = \frac{2\tilde\sigma^2_{\rm inn}}{\tilde\mu^2}.$$

In the asymptotic saddlepoint, the tracer uses the bias distribution (2) with τ = 0, while the coalition strategy is the interleaving attack, $\theta_{y|m} = m_y/c$. In the asymptotic saddlepoint, it holds [21] that $\tilde\mu^2/\tilde\sigma^2_{\rm inn} = q - 1$.
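The parametrization of this section can be turned into a small calculator. The sketch below assumes the relations $\ell = A c_0^2 \ln(1/\varepsilon_1)$, $Z = \beta\tilde\sigma_{\rm inn} c_0 \ln(1/\varepsilon_1)$ and $R \sim 1/(A c_0^2 \ln q)$, and it takes the Gaussian-limit value $A = 2\tilde\sigma^2_{\rm inn}/\tilde\mu^2$ as an assumption. The function names are hypothetical.

```python
import math

def gaussian_limit_A(mu, sigma_inn_sq):
    """Code length parameter in the Gaussian limit (assumed relation A = 2 sigma^2 / mu^2)."""
    return 2.0 * sigma_inn_sq / mu ** 2

def code_length_and_threshold(A, beta, sigma_inn, c0, eps1):
    """Code length (rounded up) and accusation threshold Z for the given parameters."""
    log_term = math.log(1.0 / eps1)
    length = math.ceil(A * c0 ** 2 * log_term)
    Z = beta * sigma_inn * c0 * log_term
    return length, Z

def asymptotic_rate(A, c0, q):
    """Asymptotic fingerprinting rate R ~ 1 / (A c0^2 ln q)."""
    return 1.0 / (A * c0 ** 2 * math.log(q))

q, c0 = 3, 10
A_sp = gaussian_limit_A(mu=q - 1, sigma_inn_sq=q - 1)   # saddlepoint: mu^2/sigma^2 = q - 1
rate_sp = asymptotic_rate(A_sp, c0, q)
length, Z = code_length_and_threshold(A_sp, beta=1.5, sigma_inn=math.sqrt(q - 1),
                                      c0=c0, eps1=1e-6)
```

With the saddlepoint ratio $\tilde\mu^2/\tilde\sigma^2_{\rm inn} = q-1$, this gives $A = 2/(q-1)$ and a rate of $(q-1)/(2c_0^2 \ln q)$. The value β = 1.5 in the usage example is arbitrary.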

Computing expectations
Following the previous work [6,16,22], we define (conditional) expectations as shown below. We omit the position index and write x as shorthand for $X_{ji}$ for a fixed innocent user $j \notin C$.
Here $P_1(b)$ is a marginal probability for a single fixed symbol to have tally b. The quantity $K_b$ is the probability, given that a certain symbol has tally b, for the colluders to output that symbol; i.e., for arbitrary fixed α, we have $K_b = \Pr[y = \alpha \mid m_\alpha = b]$. The sum rule $\sum_b P_1(b) K_b = 1/q$ holds [6], since the overall probability of outputting y = α is 1/q.

Lemma 3 (weaker form of Bennett's inequality).
Let b > 0 be a constant. Let $Y_1, \ldots, Y_\ell$ be independent zero-mean random variables, with
Proof. We substitute Property 1 in Lemma 2. This is allowed since the argument of ξ is positive.

Statistics of the innocent score and coalition score
We study the moments of the innocent score and coalition score in two cases: (i) interleaving attack and arbitrary bias distribution and (ii) the bias distribution is the Dirichlet distribution with τ = 0 and arbitrary concentration parameter κ; the attack is arbitrary.
These two scenarios represent two different ways of departing from the asymptotic saddlepoint. In the first one, the bias distribution is varied. In the second one, not only the attack is varied but also a limited change of the bias distribution is allowed (κ).
The results of this section do not all contribute directly to the analysis of the sufficient code length in Section 5, but they are important in their own right since they elucidate how the score moments behave in a variety of circumstances.

General result for the moments
We investigate the first and second moments of an innocent user's score and of the coalition score. We begin with a general result for position-symmetric colluder strategies. Then, we look more specifically at the interleaving attack.

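Lemma 4. If the coalition is employing a position-symmetric strategy, then, omitting the position index,

$$\tilde\sigma^2_{\rm inn} = \mathbb{E}\left[\frac{1}{p_y}\right] - 1, \qquad \tilde\mu = \mathbb{E}\left[\frac{m_y}{p_y}\right] - c, \qquad \tilde\sigma^2 = \mathbb{E}\left[\frac{m_y^2}{p_y^2}\right] - \left(\mathbb{E}\left[\frac{m_y}{p_y}\right]\right)^2.$$

Proof. We start from Definition 1. In all three definitions, the summation over i merely yields a factor $\ell$ which cancels against the factor $1/\ell$ in front of the summation. Thus, for $\tilde\sigma^2_{\rm inn}$ we can write, for arbitrary index i, and recalling that $\tilde\mu_{\rm inn} = 0$, $\tilde\sigma^2_{\rm inn} = \mathbb{E}[(S_j^{(i)})^2] = \mathbb{E}[1/p_y] - 1$. The results for $\tilde\mu$ and $\tilde\sigma$ follow directly from the fact that $S_C^{(i)} = \sum_{j\in C} S_j^{(i)} = m_y/p_y - c$.
Note that Lemma 4 allows the tracer to obtain an estimate of the score moments: he can replace the $\mathbb{E}$ by an empirical average over the codeword positions.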

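The case of the interleaving attack
Lemma 5. If the coalition is using the interleaving attack, then

$$\tilde\mu_{\rm Int} = q - 1, \qquad (\tilde\sigma^2_{\rm inn})_{\rm Int} = q - 1, \qquad \tilde\sigma^2_{\rm Int} = c(q-1) + q\,\mathbb{E}\left[\frac{1}{p_\alpha}\right] - q^2 - q + 1,$$

where α ∈ Q is arbitrary.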
Proof. For the interleaving attack, we have $\theta_{y|m} = m_y/c$. We will make use of the moments of the binomially distributed tallies $m_\alpha$.
Remark 1. Part of Lemma 5 ($\tilde\mu$ and $\tilde\sigma_{\rm inn}$) was already done in [21]. We show the proof again because of our modified normalization of the score function.
Remark 2. The result for $\tilde\mu_{\rm Int}$ and $(\tilde\sigma^2_{\rm inn})_{\rm Int}$ does not depend on the bias distribution F, but $\tilde\sigma_{\rm Int}$ does.
Remark 3. In the large-c limit, the variance of the coalition score tends to be large due to the c(q − 1) term as well as the expression $\mathbb{E}[1/p_\alpha]$ which blows up when τ becomes small.
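The interleaving moments can be checked by brute force for a fixed bias vector. The sketch below enumerates all symbol assignments to the c colluders and all choices of the copied colluder, assuming the score $h(x,y,p) = \delta_{xy}/p_y - 1$, so that the collective single-position score is $m_y/p_y - c$. The closed form used for comparison in `closed_form_variance` was derived for this example from those definitions, conditioned on p; it is not quoted from the source. All function names are hypothetical.

```python
from itertools import product

def interleaving_moments(p, c):
    """Exact per-position moments under interleaving, for a fixed bias vector p.

    Returns (mean of S_C, variance of S_C, second moment of an innocent score).
    """
    q = len(p)
    mean = second = inn_second = 0.0
    for xs in product(range(q), repeat=c):      # symbols received by the c colluders
        prob_xs = 1.0
        for x in xs:
            prob_xs *= p[x]
        for k in range(c):                      # index of the colluder whose symbol is copied
            y = xs[k]
            w = prob_xs / c
            s_c = xs.count(y) / p[y] - c        # collective score m_y / p_y - c
            mean += w * s_c
            second += w * s_c ** 2
            inn_second += w * (1.0 / p[y] - 1.0)  # E_x[h^2] given y and p
    return mean, second - mean ** 2, inn_second

def closed_form_variance(p, c):
    """Conditional-on-p coalition score variance (derived for this sketch)."""
    q = len(p)
    return sum(1.0 / x for x in p) + c * (q - 1) - q * q - q + 1

p = [0.2, 0.3, 0.5]
mean, var, inn = interleaving_moments(p, c=4)
```

For any bias vector the mean and the innocent second moment both come out as q − 1, while the coalition variance depends on p through $\sum_\alpha 1/p_\alpha$, in line with Remarks 2 and 3.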
Lemma 6. Let the coalition use a strategy that is colluder-symmetric and position-symmetric. Then the quantities $\tilde\mu$ and $\tilde\sigma_{\rm inn}$ can be written as
Furthermore, if the colluder strategy is also symbol-symmetric, then
Theorem 1. Let $c \gg 1$ and $\kappa \in (0, 1)$. Let the coalition use a strategy that is colluder-symmetric and position-symmetric. Then, both quantities $\tilde\mu$ and $\tilde\sigma_{\rm inn}$ are maximized by the minority voting attack and minimized by the majority voting attack.
Proof. For $c \gg 1$, we can use the τ = 0 approximation for $\tilde\mu$ and $\tilde\sigma_{\rm inn}$, i.e., Lemma 6. In (35) and (36), the $\theta_{y|m}$ in the y summation multiplies a decreasing function of $m_y$. Hence, the summand is maximized by outputting a symbol y with tally $m_y$ as small as possible (but nonzero because of the marking assumption) and, vice versa, minimized by outputting the symbol with the largest tally.
Theorem 1 gives insight into the trade-offs that the colluders have to deal with. They want to minimize $\tilde\mu$ and to maximize $\tilde\sigma_{\rm inn}$, since this leads to high error rates. However, the strategy that optimizes $\tilde\mu$ for them is the worst possible strategy regarding $\tilde\sigma_{\rm inn}$ and vice versa. The interleaving attack at the saddlepoint is 'in the middle' between minority voting and majority voting.
Remark. It is possible to obtain a tighter upper bound by treating the $m_y = c$ term separately in (35) and (36), since then $\theta_{y|m} = 1$. However, the improvement in tightness is minimal.
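The ordering stated in Theorem 1 can be illustrated numerically for the binary alphabet. The sketch below assumes q = 2, τ = 0, a symmetric Beta(κ, κ) bias, and the score $h(x,y,p) = \delta_{xy}/p_y - 1$, so that $\tilde\mu = \mathbb{E}[m_y/p_y] - c$ and $\tilde\sigma^2_{\rm inn} = \mathbb{E}[1/p_y] - 1$. The bias expectation is evaluated exactly through beta-function ratios. The function names are hypothetical, and the formulas were rederived for this example rather than taken from (35) and (36).

```python
import math

def beta_moment(a, b, kappa):
    """E[p^a (1-p)^b] for p ~ Beta(kappa, kappa), computed via log-gamma."""
    def log_beta(u, v):
        return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)
    return math.exp(log_beta(kappa + a, kappa + b) - log_beta(kappa, kappa))

def interleaving(y, m):
    """theta_{y|m} = m_y / c."""
    return m[y] / sum(m)

def vote(y, m, largest):
    """Majority (largest=True) or minority (largest=False) voting among observed symbols."""
    observed = [t for t in m if t > 0]
    target = max(observed) if largest else min(observed)
    winners = [a for a, t in enumerate(m) if t == target]
    return 1.0 / len(winners) if y in winners else 0.0

def binary_moments(theta, c, kappa):
    """Return (mu, sigma_inn^2) per position for q = 2 and tau = 0."""
    mu, var_inn = -float(c), -1.0
    for m0 in range(c + 1):
        m = (m0, c - m0)
        for y in (0, 1):
            t = theta(y, m)
            if t == 0.0:
                continue
            # Dividing p^m0 (1-p)^m1 by p_y lowers the exponent of the output symbol.
            a, b = (m0 - 1, c - m0) if y == 0 else (m0, c - m0 - 1)
            w = math.comb(c, m0) * t * beta_moment(a, b, kappa)
            mu += w * m[y]
            var_inn += w
    return mu, var_inn

mu_int, v_int = binary_moments(interleaving, 5, 0.5)
mu_min, v_min = binary_moments(lambda y, m: vote(y, m, False), 5, 0.5)
mu_maj, v_maj = binary_moments(lambda y, m: vote(y, m, True), 5, 0.5)
```

For c = 5 and κ = 1/2, interleaving returns $\tilde\mu = \tilde\sigma^2_{\rm inn} = 1 = q - 1$, with minority voting above and majority voting below for both quantities.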

Bounding the error probabilities
We use Bernstein's inequality and Bennett's inequality to upper bound the false-positive and false-negative error probability, respectively.

Bounding the false-positive probability
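Theorem 2. Let q ≥ 2. Let the coalition use any attack strategy. Then the false-positive probability for a fixed innocent user can be bounded as

$$P_{\rm FP} \leq \varepsilon_1^{\,\beta^2 / \left( 2A + \frac{2\beta}{3\tilde\sigma_{\rm inn} c_0 \tau} \right)}. \qquad (42)$$

Proof. For any coalition strategy, even one that breaks the position symmetry, the single-position scores $S_j^{(i)}$ for the innocent user are mutually independent [1]. Hence, we are allowed to use Bernstein's inequality. In Lemma 1 we set $U_i = S_j^{(i)}$ for the innocent user. This is allowed since $S_j^{(i)}$ has zero expectation value. We have

$$|U_i| \leq \max\{1, \tau^{-1} - 1\} = \tau^{-1} - 1.$$

In the last equality, we used τ ≤ 1/q (see Section 2.1.2). Thus, we are allowed to set a = 1/τ in Lemma 1. Furthermore, we note that by definition $\mathbb{E}[U_i^2] = \tilde\sigma^2_{\rm inn}$ for all i. Lemma 1 then gives

$$P_{\rm FP} = \Pr[S_j > Z] \leq \exp\left( - \frac{Z^2/2}{\ell\tilde\sigma^2_{\rm inn} + aZ/3} \right).$$

Substituting a = 1/τ, $\ell = A c_0^2 \ln\frac{1}{\varepsilon_1}$ and $Z = \beta\tilde\sigma_{\rm inn} c_0 \ln\frac{1}{\varepsilon_1}$ finishes the proof.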
Remark. In (42), we see that the bound on $P_{\rm FP}$ is a decreasing function of the product $c_0\tau$. Hence, it is advantageous to set τ such that $c_0\tau \gg 1$.
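The dependence of the false-positive bound on $c_0\tau$ can be explored numerically. The sketch below assumes the standard form of Bernstein's inequality, $\Pr[\sum_i U_i > T] \leq \exp\left(-\frac{T^2/2}{\sum_i \mathbb{E}[U_i^2] + aT/3}\right)$ for independent zero-mean $U_i$ with $|U_i| \leq a$, together with the substitutions a = 1/τ, $\ell = A c_0^2 \ln(1/\varepsilon_1)$ and $Z = \beta\tilde\sigma_{\rm inn} c_0 \ln(1/\varepsilon_1)$. The function names and the numerical parameter values are hypothetical.

```python
import math

def bernstein_fp_bound(A, beta, sigma_inn, c0, tau, eps1):
    """Bernstein upper bound on P_FP, evaluated from the code length and threshold."""
    log_term = math.log(1.0 / eps1)
    length = A * c0 ** 2 * log_term           # code length
    Z = beta * sigma_inn * c0 * log_term      # accusation threshold
    a = 1.0 / tau                             # bound on a single-position score
    exponent = (Z ** 2 / 2.0) / (length * sigma_inn ** 2 + a * Z / 3.0)
    return math.exp(-exponent)

def bernstein_fp_bound_closed_form(A, beta, sigma_inn, c0, tau, eps1):
    """Same bound written as a power of eps1, after cancelling common factors."""
    power = beta ** 2 / (2.0 * A + 2.0 * beta / (3.0 * sigma_inn * c0 * tau))
    return eps1 ** power

params = dict(A=1.0, beta=1.6, sigma_inn=math.sqrt(2.0), c0=20, eps1=1e-6)
loose = bernstein_fp_bound(tau=0.01, **params)   # c0 * tau = 0.2
tight = bernstein_fp_bound(tau=0.10, **params)   # c0 * tau = 2.0
```

Raising τ at fixed $c_0$ shrinks the term $2\beta/(3\tilde\sigma_{\rm inn} c_0\tau)$ in the exponent's denominator, which is why the bound tightens as $c_0\tau$ grows.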

Corollary 1.
Let q ≥ 2 and τ ≤ 1/2. Let the coalition use any attack strategy. Then, it holds that
Proof. The proof follows directly from Theorem 2.
Then the false-negative probability can be bounded as
Proof. We start from $P_{\rm FN} \leq \Pr[S_C \leq cZ]$, which holds because a false negative requires the score of every colluder to stay at or below Z. Because of the assumption that the collusion attack is position-symmetric, the random variables $S_C^{(i)}$ are mutually independent. We are then allowed to use Bennett's inequality (we take the weaker form, Lemma 3), which we do with the following parameters: where the last equality is a consequence of the assumption (46). We can see that the T is positive from the assumption $\tilde\mu A c_0 - \tilde\sigma_{\rm inn}\beta c > 0$.
Notice that at $c \gg c_0$ Theorem 3 no longer applies, because the condition $\tilde\mu A c_0 - \tilde\sigma_{\rm inn}\beta c > 0$ cannot be satisfied. In practical terms, this means that for $c > c_0$, the FN probability is no longer under control, and the colluders may evade detection with high probability.
Theorem 4. Let q ≥ 2. Let the coalition employ a position-symmetric strategy. Let $2 \leq c \leq c_0$. Let $\tilde\mu A - \tilde\sigma_{\rm inn}\beta > 0$. Let $\tau \leq 2/(2 + \tilde\mu)$. Then the false-negative probability can be bounded as
Proof. We start from Theorem 3. Due to the conditions $c \leq c_0$ and $\tilde\mu A - \tilde\sigma_{\rm inn}\beta > 0$, the condition $\tilde\mu A c_0 - \tilde\sigma_{\rm inn}\beta c > 0$ in Theorem 3 holds. Due to c ≥ 2 and $\tau < 2/(2 + \tilde\mu)$, the condition (46) holds. Since all the conditions are satisfied, we are allowed to apply Theorem 3. Finally, we make use of the fact that the expression (47) is an increasing function of c for $c \leq c_0$.

Asymptotics of the sufficient code length
The main aim of this paper is to determine the performance of the score system (7) at large but finite $c_0$. The performance at '$c_0 = \infty$' is known: the saddlepoint is given by the interleaving attack, combined with the κ = 1/2 Dirichlet distribution (with τ = 0) as the bias distribution; in this saddlepoint, the rate of the score system is equal to capacity. What we want to know is how the fingerprinting rate approaches capacity and how to optimally choose the cutoff τ as a function of $c_0$.

Sufficient code length
We aim for an analysis in the (unknown!) large-but-finite-$c_0$ saddlepoint:
- The saddlepoint ('SP') of the mutual information minimax game [20] is close to the asymptotic saddlepoint. The unknown strategy $\theta_{\rm SP}$ is close to interleaving. The unknown bias distribution $F_{\rm SP}(p)$ is some discrete distribution close to the Dirichlet distribution. We approximate F by the continuous Dirichlet distribution with cutoff τ because this is the only available constructive approach that we know of.
- A practical tracing system that uses the score function (7) cannot have a fixed threshold Z like the Tardos scheme, since the score statistics strongly depend on the colluder strategy. The threshold has to be chosen as a function of estimated values for $\tilde\sigma_{\rm inn}$ and $\tilde\mu$. (See Section 3.1 for the estimation method.) When attacking this tracing system, the best choice for the colluders is to use $\theta_{\rm SP}$ as their strategy, for otherwise they get caught faster. We will assume that the colluders use $\theta_{\rm SP}$, which in the analysis leads to a 'fixed' threshold Z that only has meaning in this context.
- Hence, we analyze the tracing system consisting of the bias distribution (2) and the score system (7), when pitted against an unknown attack close to interleaving. Our starting point will be the 'sufficient' conditions given by Corollaries 1 and 2. We know that $\tilde\mu_{\rm SP} = q - 1 - \Delta\mu$ and $(\tilde\sigma^2_{\rm inn})_{\rm SP} = q - 1 + \Delta\sigma^2_{\rm inn}$, and we have to carefully deal with the corrections $\Delta\mu$ and $\Delta\sigma^2_{\rm inn}$. On the other hand, the $\tilde\sigma$ appears only in the logarithm in (51) and hence any corrections with respect to Lemma 5 can be neglected.
Corollary 1 and the condition $\tilde\mu A - \tilde\sigma_{\rm inn}\beta > 0$ together define an interval for the sufficient code length parameter '$A_{\rm suff}$',