University of Birmingham On the complexity of finding and counting solution-free sets of integers

Given a linear equation L, a set A of integers is L-free if A does not contain any 'non-trivial' solutions to L. This notion incorporates many central topics in combinatorial number theory such as sum-free and progression-free sets. In this paper we initiate the study of (parameterised) complexity questions involving L-free sets of integers. The main questions we consider involve deciding whether a finite set of integers A has an L-free subset of a given size, and counting all such L-free subsets. We also raise a number of open problems.


Introduction
Sets of integers which do not contain any solutions to some linear equation have received a lot of attention in the field of combinatorial number theory. Two particularly well-studied examples are sum-free sets (sets avoiding solutions to the equation x + y = z) and progression-free sets (sets that do not contain any 3-term arithmetic progression x, y, z or equivalently avoid solutions to the equation x + z = 2y). A lot of effort has gone into determining the size of the largest solution-free subset of {1, . . . , n} and other sets of integers, and into computing (asymptotically) the number of (maximal) solution-free subsets of {1, . . . , n}.
In this paper we initiate the study of the computational complexity of problems involving solution-free subsets. We are primarily concerned with determining the size of the largest subset of an arbitrary set of integers A which avoids solutions to a specified linear equation L; in particular, we focus on sum-free and progression-free sets, but many of our results also generalise to larger families of linear equations. For suitable equations L, we demonstrate that the problem of deciding whether A contains a solution-free subset of size at least k is NP-complete (see Section 2); we further show that it is hard to approximate the size of the largest solution-free subset within a factor $(1 + \varepsilon)$ (see Section 3), or to determine for a constant c < 1 whether A contains a solution-free subset of size at least c|A| (see Section 6). On the other hand, in Section 5 we see that the decision problem is fixed-parameter tractable when parameterised by either the cardinality of the desired solution-free set, or by the number of elements of A we can exclude from such a set. We also consider the complexity, with respect to various parameterisations, of counting the number of solution-free sets of a specified size (see Section 7): while there is clearly no polynomial-time algorithm in general, the problem is fixed-parameter tractable when parameterised by the number of elements we can exclude from A; we show that there is unlikely to be a fixed-parameter algorithm to solve the counting problem exactly when the size of the solution-free sets is taken as the parameter, but we give an efficient approximation algorithm for this setting. Finally, in Section 8 we consider all of these questions in a variant of the problem, where we specify that a given solution-free subset B ⊂ A must be included in any solution.
Many of our results are based on the fact that we can set up polynomial-time reductions in both directions between our problem and (different versions of) the well-known hitting set problem for hypergraphs. In particular, in Section 2.1 we provide a construction that has several applications throughout the paper. In Section 4 we also derive some new lower bounds on the size of the largest solution-free subset of an arbitrary set of integers for certain equations L, which may be of independent interest. Our approach here utilises a trick of Alon and Kleitman [3] which transfers the problem into the setting of solution-free sets in cyclic groups.

Date: March 6, 2018. The first author is supported by a Personal Research Fellowship from the Royal Society of Edinburgh, funded by the Scottish Government, and the second author is supported by EPSRC grant EP/M016641/1.
Our aim is to provide a thorough introduction to the study of (parameterised) complexity questions involving L-free sets of integers. As such, some of the results presented have straightforward proofs, such as the parameterised complexity results discussed in Section 5, whilst other proofs are more involved. However, even the simplest of our results lead to natural open questions. In Section 9 we collect together a number of open problems which we hope will stimulate further interest in the topic.
In the remainder of this section, we give some background on solution-free sets in Section 1.1 and review the relevant notions from the study of computational complexity in Section 1.2. In Section 1.3 we outline the main results of the paper.
1.1. Background on solution-free sets. Consider a fixed linear equation L of the form
$$a_1 x_1 + \dots + a_k x_k = b, \tag{1}$$
where $a_1, \dots, a_k, b \in \mathbb{Z}$. We say that L is homogeneous if $b = 0$. If $\sum_{i \in [k]} a_i = b = 0$ then we say that L is translation-invariant. (Here $[k]$ denotes the set $\{1, \dots, k\}$.) Let L be translation-invariant. Then notice that $(x, \dots, x)$ is a 'trivial' solution of (1) for any x. More generally, a solution $(x_1, \dots, x_k)$ to L is said to be trivial if there exists a partition $P_1, \dots, P_\ell$ of $[k]$ so that: (i) $x_i = x_j$ for every i, j in the same partition class $P_r$; (ii) for each $r \in [\ell]$, $\sum_{i \in P_r} a_i = 0$. A set A of integers is L-free if A does not contain any non-trivial solutions to L. If the equation L is clear from the context, then we simply say A is solution-free.
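The definition of a trivial solution can be tested mechanically. The following is a minimal sketch, assuming the equation is given as a coefficient tuple `coeffs` and a right-hand side `b` (these names, and the function names, are illustrative and not taken from the paper). It uses the observation that, if any partition satisfying (i) and (ii) exists, then the partition that groups indices by the value of $x_i$ also satisfies them, since merging classes that share a value preserves both conditions.

```python
from collections import defaultdict


def is_solution(coeffs, b, xs):
    """Return True if sum(a_i * x_i) == b for the tuple xs."""
    return sum(a * x for a, x in zip(coeffs, xs)) == b


def is_trivial_solution(coeffs, b, xs):
    """Return True if xs is a trivial solution in the sense defined above.

    Indices are grouped by the value of x_i; the solution is trivial exactly
    when the coefficients in every group sum to zero.
    """
    if not is_solution(coeffs, b, xs):
        return False
    group_sums = defaultdict(int)
    for a, x in zip(coeffs, xs):
        group_sums[x] += a
    return all(total == 0 for total in group_sums.values())


# x + y - 2z = 0 (progression-free sets) is translation-invariant.
print(is_trivial_solution((1, 1, -2), 0, (5, 5, 5)))  # True: constant tuple
print(is_trivial_solution((1, 1, -2), 0, (1, 3, 2)))  # False: a genuine progression
```

For an equation that is not translation-invariant, such as x + y - z = 0, the function never returns True, which matches the fact that such equations have no trivial solutions.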
1.1.1. Sum-free sets. A set S (of integers or elements of a group) is sum-free if there does not exist x, y, z in S such that x + y = z. The topic of sum-free sets has a rich history spanning a number of branches of mathematics. In 1916 Schur [42] proved that, given r ∈ N, if n is sufficiently large, then any r-colouring of [n] := {1, . . . , n} yields a monochromatic triple x, y, z such that x + y = z. (Equivalently, [n] cannot be partitioned into r sum-free sets.) This theorem was followed by other seminal related results such as van der Waerden's theorem [46], and ultimately led to the birth of arithmetic Ramsey theory.
Paul Erdős had a particular affinity towards sum-free sets. In 1965 he [20] proved one of the cornerstone results in the subject: every set of n non-zero integers A contains a sum-free subset of size at least n/3. Employing the probabilistic method, Alon and Kleitman [3] improved this bound to (n + 1)/3 and further, Kolountzakis [33] gave a polynomial time algorithm for constructing such a sum-free subset. Then, using a Fourier-analytical approach, Bourgain [11] further improved the bound to (n + 2)/3 in the case when A consists of positive integers. Erdős [20] also raised the question of determining upper bounds for this problem: recently Eberhard, Green and Manners [18] asymptotically resolved this important classical problem by proving that there is a set of positive integers A of size n such that A does not contain any sum-free subset of size greater than n/3 + o(n). This result raises the question of whether one can decide efficiently whether a set A of non-negative integers contains a sum-free subset of size at least c|A| for some c > 1/3. As we shall see in Section 6, the answer is likely to be no. In Section 7 we consider the complexity, with respect to various parameterisations, of counting the number of sum-free sets of a specified size. A number of important questions concerning (counting) sum-free sets were raised in two papers of Cameron and Erdős [13,14]. In [13], Cameron and Erdős conjectured that there are $\Theta(2^{n/2})$ sum-free subsets of [n]. Here, the lower bound follows by observing that the largest sum-free subset of [n] has size $\lceil n/2 \rceil$; this is attained by the set of odds in [n] and by $\{\lfloor n/2 \rfloor + 1, \dots, n\}$. Then, for example, by taking all subsets of [n] containing only odd numbers one obtains at least $2^{\lceil n/2 \rceil}$ sum-free subsets of [n]. After receiving much attention, the Cameron-Erdős conjecture was proven independently by Green [27] and Sapozhenko [40].
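The elementary facts behind the lower bound can be checked by exhaustive search for small n. The sketch below is illustrative only: the function names are our own, and the search is exponential in n, so it is meant purely as a sanity check and not as an algorithm from the paper.

```python
from itertools import combinations


def is_sum_free(s):
    """True if no x, y, z in s (not necessarily distinct) satisfy x + y = z."""
    s = set(s)
    return not any(x + y in s for x in s for y in s)


def largest_sum_free_size(a):
    """Size of a largest sum-free subset of a, by exhaustive search."""
    a = sorted(set(a))
    for k in range(len(a), 0, -1):
        if any(is_sum_free(c) for c in combinations(a, k)):
            return k
    return 0


def count_sum_free_subsets(n):
    """Number of sum-free subsets of {1, ..., n}, including the empty set."""
    elems = range(1, n + 1)
    return sum(
        1
        for k in range(n + 1)
        for c in combinations(elems, k)
        if is_sum_free(c)
    )


for n in range(1, 9):
    odds = range(1, n + 1, 2)
    upper_half = range(n // 2 + 1, n + 1)
    print(n, largest_sum_free_size(range(1, n + 1)),
          is_sum_free(odds), is_sum_free(upper_half),
          count_sum_free_subsets(n))
```

For each n the second column should equal $\lceil n/2 \rceil$, and the last column should be at least $2^{\lceil n/2 \rceil}$.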
Given a set A of integers we say S ⊆ A is a maximal sum-free subset of A if S is sum-free and it is not properly contained in another sum-free subset of A. Cameron and Erdős [14] raised the question of how many maximal sum-free subsets there are in [n]. Very recently, this question has been resolved via a combinatorial approach by Balogh, Liu, Sharifzadeh and Treglown [7,8].
Sum-free sets have also received significant attention with respect to groups. One highlight in this direction is work of Diananda and Yap [15] and Green and Ruzsa [28] that determines the size of the largest sum-free subset for every finite abelian group. In each case the largest sum-free set has size linear in the size of the abelian group. Another striking result in the area follows from Gowers' work on quasirandom groups. Indeed, Gowers [31] proved that there are non-abelian groups for which the largest sum-free subset has sublinear size, thereby answering a question of Babai and Sós [5]. See the survey of Tao and Vu [45] for a discussion on further problems concerning sum-free sets in groups.

1.1.2. Progression-free sets. A set S (of integers or elements of a group) is progression-free if there do not exist distinct x, y, z in S such that x + y = 2z. The study of progression-free sets has focused on similar questions to those relating to sum-free sets.
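As a small illustration of the definition, the following sketch tests whether a finite set of integers is progression-free. The function name is our own; the check simply looks for two distinct elements whose midpoint also lies in the set.

```python
def is_progression_free(s):
    """True if s contains no distinct x, y, z with x + y = 2z."""
    s = set(s)
    for x in s:
        for y in s:
            # x < y forces the midpoint z to differ from both x and y.
            if x < y and (x + y) % 2 == 0 and (x + y) // 2 in s:
                return False
    return True


print(is_progression_free({1, 2, 4, 5}))  # True
print(is_progression_free({1, 3, 5}))     # False: 1, 3, 5 is a progression
```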
Unlike in the case of sum-free sets, one cannot ensure that every finite set of non-zero integers contains a progression-free subset of linear size. Indeed, a classical result of Roth [36] implies that the largest progression-free subset of [n] has size o(n). This has led to much interest in determining good bounds on the size of such a subset of [n]. See [10,19,29,39] for the state-of-the-art lower and upper bounds for this problem.
Roth's theorem has been generalised in various directions; most famously via Szemerédi's theorem [44] which ensures that, if n is sufficiently large, every subset of [n] of linear size contains arithmetic progressions of arbitrary length. Analogues of Roth's theorem have also been considered for finite abelian groups; see, for example, [12,24,34].
As in the case of sum-free sets, it is also natural to ask for the number of progression-free subsets of [n]. More generally, Cameron and Erdős [13] raised the question of how many subsets of [n] do not contain an arithmetic progression of length k. Significant progress on the problem has recently been made in [6,9,41].
We remark that there has also been much work on L-free sets other than the cases of sum-free and progression-free sets; see for example [32,37,38].
1.2. Computational complexity. In this paper we are concerned with determining which problems related to solution-free sets of integers are computationally tractable. In the first instance, we seek to classify decision problems as either belonging to the class P (i.e. being solvable in polynomial time) or being NP-hard and so unlikely to admit polynomial-time algorithms. For further background on computational complexity, the classes P and NP, and polynomial-time reductions we refer the reader to [26].
When dealing with sets of positive integers as input, it should be noted that the amount of space required to represent the input depends both on the cardinality of the set and on the magnitude of the numbers in the set. Given any finite set A ⊆ Z, we write max(A) and min(A) for the elements of A whose values are maximum and minimum respectively; we further define $\max^*(A) := \max\{|a| : a \in A\}$ and $\min^*(A) := \min\{|a| : a \in A\}$. We write size(A) for the number of bits required to represent A, and note that there exist positive constants $c_1$ and $c_2$ such that
$$c_1 \max\{|A| \log(\min^*(A)),\ \log(\max^*(A))\} \le \mathrm{size}(A) \le c_2 |A| \log(\max^*(A)).$$
We therefore consider a problem involving the set A to belong to P if it can be solved by an algorithm whose running time is bounded by a polynomial function of size(A); note that this is true if and only if the running time is bounded by a polynomial function of $|A| \log(\max^*(A))$. If $A = \{a_1, \dots, a_n\}$, we will assume that A is stored in such a way that, given i, we can read the element $a_i$ in time $O(\log(|a_i|))$.
There are two basic operations which we will need to consider in almost all of the algorithms and reductions discussed in this paper. First of all, we often need to determine whether a given $\ell$-tuple $(x_1, \dots, x_\ell)$ is a solution to the $\ell$-variable linear equation L. Note that we assume throughout that $\ell$ and all coefficients in L are constants, but $x_1, \dots, x_\ell$ are taken to be part of the input; thus, allowing for the time required to carry out the necessary arithmetic operations, we can certainly determine whether $(x_1, \dots, x_\ell)$ is a solution in time $O(\log(\max_{1 \le i \le \ell} |x_i|))$.
Secondly, we will in many cases need to determine whether a given set A ⊆ Z is L-free. We can do this in a naive way by considering all possible $\ell$-tuples, and checking for each one whether it is a solution. Since there are $|A|^\ell$ possible $\ell$-tuples, we see by the reasoning above that we can complete this procedure in time $O(|A|^\ell \cdot \log_2(\max^*(A)))$. Note that, as $\ell$ is a constant, this is a polynomial-time algorithm in terms of size(A).
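The naive procedure just described can be sketched as follows. This is a minimal illustration under our own conventions: the equation is written with all variables on the left, as in (1), and is passed as a coefficient tuple `coeffs` with right-hand side `b`; the function names are hypothetical. Triviality of a solution is decided by grouping indices by value, which is equivalent to the partition-based definition.

```python
from collections import defaultdict
from itertools import product


def is_trivial(coeffs, xs):
    """For a solution xs: trivial iff the coefficients sum to zero on each value class."""
    sums = defaultdict(int)
    for a, x in zip(coeffs, xs):
        sums[x] += a
    return all(v == 0 for v in sums.values())


def is_l_free(a, coeffs, b=0):
    """Naive test: scan all |A|^l tuples for a non-trivial solution of sum(c_i x_i) = b."""
    elems = sorted(set(a))
    for xs in product(elems, repeat=len(coeffs)):
        if sum(c * x for c, x in zip(coeffs, xs)) == b and not is_trivial(coeffs, xs):
            return False
    return True


SUM_FREE = (1, 1, -1)          # x + y = z
PROGRESSION_FREE = (1, 1, -2)  # x + y = 2z

print(is_l_free({1, 3, 5}, SUM_FREE))             # True
print(is_l_free({1, 3, 5}, PROGRESSION_FREE))     # False
print(is_l_free({1, 2, 4, 5}, PROGRESSION_FREE))  # True
```

The loop visits $|A|^\ell$ tuples and performs a constant number of arithmetic operations on each, in line with the bound above.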
In this paper, we will also discuss the parameterised complexity of various decision problems involving solution-free sets. Parameterised complexity provides a multivariate framework for the analysis of hard problems: if a problem is known to be NP-hard, so that we expect the running time of any algorithm to depend exponentially on some aspect of the input, we can seek to restrict this exponential blow-up to one or more parameters of the problem rather than the total input size. This has the potential to provide an efficient solution to the problem if the parameter(s) in question are much smaller than the total input size. A parameterised problem with total input size n and parameter k is considered to be tractable if it can be solved by a so-called fpt-algorithm, an algorithm whose running time is bounded by $f(k) \cdot n^{O(1)}$, where f can be any computable function. Such problems are said to belong to the complexity class FPT. The primary method for showing that a problem is unlikely to belong to FPT is to show that it is hard for some class W[t] (where t ≥ 1) in the W-hierarchy (see [23] for a formal definition of these classes). When reducing one parameterised problem to another in order to demonstrate the hardness of a parameterised problem, we have to be a little more careful than with standard NP-hardness reductions: as well as making sure that we can construct the new problem instance efficiently, we also need to ensure that the parameter value in the new problem depends only on the parameter value in the original problem.
Let Σ be a finite alphabet. We define a parameterised decision problem to be a pair (Π, κ) where $\Pi : \Sigma^* \to \{\text{YES}, \text{NO}\}$ is a function and $\kappa : \Sigma^* \to \mathbb{N}$ is a parameterisation (a polynomial-time computable mapping). Then, an fpt-reduction from (Π, κ) to (Π′, κ′) is an algorithm $\mathcal{A}$ such that (1) $\mathcal{A}$ is an fpt-algorithm; (2) given a yes-instance of Π as input, $\mathcal{A}$ outputs a yes-instance of Π′, and given a no-instance of Π as input, $\mathcal{A}$ outputs a no-instance of Π′; (3) there is a computable function g such that, if I is the input to $\mathcal{A}$, then $\kappa'(\mathcal{A}(I)) \le g(\kappa(I))$.
For further background on the theory of parameterised complexity we refer the reader to [17,23]. We will also consider parameterised counting problems in Section 7, and relevant notions will be introduced at the start of that section.
1.3. Summary of results. We conclude the introduction by informally stating the main results of the paper, just in the cases of sum-free and progression-free sets:
• We prove that the problem of determining whether a finite set of integers contains a sum-free (or progression-free) subset of a given size is NP-complete (see Theorem 4).
• We prove that the problem of determining whether a finite set of integers A contains a sum-free subset covering a cth proportion of A is NP-complete for any fixed c > 1/3 (see Theorem 18). This complements the aforementioned result of Erdős [20] that ensures any such set A contains a sum-free subset of size at least |A|/3.
• In contrast, the problem of determining whether a finite set of integers A contains a progression-free subset covering a cth proportion of A is NP-complete for any fixed c > 0 (see Theorem 19).
• In Theorem 8 we show that it is unlikely one can efficiently approximate the size of the largest sum-free (and progression-free) subset in a finite set of integers.
• It is essentially trivial to see that the problem of determining whether a finite set of integers A contains a sum-free (or progression-free) subset of a given size k is in FPT, when parameterised by k (see Proposition 12). Using [23, Theorem 1.14], it is straightforward to conclude the same assertion holds if instead we parameterise by |A| − k.
• A result of Thurley [43] easily implies that the analogous problem, for counting the number of sum-free (or progression-free) subsets of A of a given size k, belongs to FPT when parameterised by |A| − k (see Theorem 20).
• Perhaps surprisingly, it is unlikely that this problem is in FPT when instead we parameterise by k (see Theorem 23).

The decision problem
In this section, we consider the following problem, where L is any fixed linear equation.

L-Free Subset
Input: A finite set A ⊆ Z and k ∈ N.
Question: Does there exist an L-free subset $A' \subseteq A$ such that $|A'| = k$?
We show that this problem is closely related to the well-known Hitting Set problem, and exploit this relationship to show that the problem is NP-complete for a large family of equations L, including those defining both sum-free and progression-free sets. On the other hand, we see that the problem is polynomially solvable whenever L is a linear equation with only two variables.
We begin in Section 2.1 by showing how to construct an instance of L-Free Subset that corresponds in a specific way to a given hypergraph; we will make use of this same construction to prove many results in later sections of the paper. We exploit this construction to give an NP-completeness proof in Section 2.2, before considering the two-variable case in Section 2.3.

2.1. A useful construction. In this section we describe the main construction we will exploit throughout the paper, and prove its key properties. Recall that an $\ell$-uniform hypergraph is a hypergraph in which every edge has size exactly $\ell$. Given any set X and $\ell \in \mathbb{N}$, we write $X^\ell$ for the set of ordered $\ell$-tuples whose elements belong to X.

Lemma 1. Let L be a linear equation $a_1 x_1 + \dots + a_\ell x_\ell = by$ where each $a_i \in \mathbb{N}$ and $b \in \mathbb{N}$ are fixed, and let H = (V, E) be an $\ell$-uniform hypergraph. Then we can construct in polynomial time a set A ⊆ N with the following properties: (1) A is the disjoint union of two sets $A'$ and $A''$, where $|A'| = |V|$ and $|A''| = |E|$; (2) there exist bijections $\varphi_V : A' \to V$ and $\varphi_E : A'' \to E$; (3) for every $(x_1, \dots, x_\ell, y) \in A^{\ell+1}$, we have that $(x_1, \dots, x_\ell, y)$ is a non-trivial solution to L if and only if $x_1, \dots, x_\ell \in A'$, $y \in A''$ and $\{\varphi_V(x_1), \dots, \varphi_V(x_\ell)\} = \varphi_E(y)$; (4) $\log(\max(A)) = O(|V|)$.
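Proof. Write $a := \max_{1 \le j \le \ell} a_j$, let $V = \{v_1, \dots, v_n\}$, and let $d$ be a fixed integer which is sufficiently large compared with the $a_i$ and $b$. We define $A' := \{b d^i : v_i \in V\}$ and $A'' := \{a_1 d^{i_1} + a_2 d^{i_2} + \dots + a_\ell d^{i_\ell} : \{v_{i_1}, \dots, v_{i_\ell}\} \in E,\ i_1 < i_2 < \dots < i_\ell\}$, and set $A := A' \cup A''$. We define $\varphi_V : A' \to V$ by $\varphi_V(b d^i) := v_i$, and note that $\varphi_V$ is a well-defined bijection. We define $\varphi_E : A'' \to E$ by $\varphi_E(a_1 d^{i_1} + \dots + a_\ell d^{i_\ell}) := \{v_{i_1}, \dots, v_{i_\ell}\}$; to see that $\varphi_E$ is also a well-defined bijection it suffices to observe that, by the uniqueness of base-d representation of natural numbers, there is a unique way to write any $y \in A''$ in the form $a_1 d^{i_1} + a_2 d^{i_2} + \dots + a_\ell d^{i_\ell}$. It follows from the bijectivity of $\varphi_V$ and $\varphi_E$ that we have defined $A'$ so that for each vertex $v_i \in V$ there is a unique number $b d^i \in A'$ associated with it, and defined $A''$ so that for each edge $\{v_{i_1}, \dots, v_{i_\ell}\} \in E$ there is a unique number $a_1 d^{i_1} + \dots + a_\ell d^{i_\ell} \in A''$ associated with it. Notice that conditions (1), (2) and (4) of the lemma immediately hold. To prove (3) note that it suffices to prove the following claim.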
Claim. The only non-trivial solutions $(x_1, x_2, \dots, x_\ell, y)$ to L in A are such that each $x_j = b d^{i_j} \in A'$ for some $i_1 < i_2 < \dots < i_\ell$ and $y = (a_1 d^{i_1} + a_2 d^{i_2} + \dots + a_\ell d^{i_\ell}) \in A''$.

To prove the claim it is helpful to consider the natural numbers working in base d. We will use the coordinate notation $[c_0, c_1, c_2, \dots]$ to denote the natural number $c_0 d^0 + c_1 d^1 + c_2 d^2 + \dots$. So with respect to this notation, each $b d^i \in A'$ has a zero in each coordinate except the ith coordinate, which takes value b. Each $(a_1 d^{i_1} + a_2 d^{i_2} + \dots + a_\ell d^{i_\ell}) \in A''$ takes value $a_j$ in its $i_j$th coordinate, and zero otherwise.
Suppose we have $x_1, \dots, x_t \in A$ for some $t \in \mathbb{N}$. Define $\mathrm{coord}(x_1, \dots, x_t)$ to be the set of all integers i ≥ 0 such that for at least one of the elements $x_j$ in $\{x_1, \dots, x_t\}$, the ith coordinate of $x_j$ is non-zero. Note that (since d is sufficiently large compared with the $a_i$ and b) we have $|\mathrm{coord}(x)| = 1$ for all $x \in A'$ and $|\mathrm{coord}(y)| = \ell$ for all $y \in A''$. Moreover, for any $x, x' \in A$, $\mathrm{coord}(x) = \mathrm{coord}(x')$ if and only if $x = x'$.
Suppose $(x_1, \dots, x_\ell, y)$ is a non-trivial solution to L in A. (Note the choice of L ensures the only trivial solutions to L are such that $x_1 = \dots = x_\ell = y$.) Crucially we defined d to be large with respect to the $a_i$ and b. Thus, $\mathrm{coord}(x_1, \dots, x_\ell) = \mathrm{coord}(a_1 x_1 + a_2 x_2 + \dots + a_\ell x_\ell)$. That is, in coordinate notation, the coordinates of $(a_1 x_1 + a_2 x_2 + \dots + a_\ell x_\ell)$ that are non-zero are precisely those coordinates that are non-zero in at least one of the $x_j$. So this gives us that $\mathrm{coord}(x_1, \dots, x_\ell) = \mathrm{coord}(by)$.
If $y \in A'$ then $|\mathrm{coord}(by)| = 1$. Note in this case we obtain a contradiction if $|\mathrm{coord}(x_1, \dots, x_\ell)| \ge 2$. So it must be the case that $x_1 = \dots = x_\ell$ and $x_1 \in A'$. This means that there is some $i \in [n]$ such that $x_1 = \dots = x_\ell = b d^i$; $y = b d^i$; and further $a_1 + \dots + a_\ell = b$. Thus $(x_1, \dots, x_\ell, y)$ is a trivial solution to L, a contradiction to our assumption. Therefore $y \in A''$.
Suppose that all of the $x_j$ lie in $A''$. Then since $\mathrm{coord}(x_1, \dots, x_\ell) = \mathrm{coord}(by)$, we must have that $x_1 = \dots = x_\ell = y$. This implies $a_1 + \dots + a_\ell = b$ and hence $(x_1, \dots, x_\ell, y)$ is a trivial solution to L, a contradiction.
Altogether this implies that each $x_j \in A'$. The claim, and therefore the lemma, now immediately follows.
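The construction can be illustrated on a small example. The sketch below is not the paper's exact construction: the base `d` is our own assumed choice (any base large enough to prevent carries between base-d coordinates should serve), vertices are labelled 1, ..., n, and the function names are hypothetical. It builds the two parts of A for a graph (the case of two variables on the left-hand side) with the equation x_1 + x_2 = y, then lists all non-trivial solutions by brute force.

```python
from itertools import product


def build_instance(n, edges, a, b):
    """Build (phi_v, phi_e) for vertices 1..n and edges of size len(a).

    phi_v maps each element of A' to its vertex index, and phi_e maps each
    element of A'' to its edge. The base d below is an assumed choice.
    """
    ell = len(a)
    d = (ell + 1) * max(max(a), b) ** 2 + 1  # assumption: large enough to avoid carries
    phi_v = {b * d**i: i for i in range(1, n + 1)}
    phi_e = {}
    for edge in edges:
        idx = sorted(edge)  # i_1 < i_2 < ... < i_l
        y = sum(a_j * d**i for a_j, i in zip(a, idx))
        phi_e[y] = frozenset(edge)
    return phi_v, phi_e


def nontrivial_solutions(elements, a, b):
    """All (x_1, ..., x_l, y) with sum(a_j x_j) = b*y, excluding x_1 = ... = x_l = y."""
    elements = set(elements)
    found = []
    for xs in product(sorted(elements), repeat=len(a)):
        lhs = sum(a_j * x for a_j, x in zip(a, xs))
        if lhs % b == 0 and lhs // b in elements:
            y = lhs // b
            if any(x != y for x in xs):
                found.append(xs + (y,))
    return found


# Triangle on {1, 2, 3} with a pendant edge {3, 4}; equation x_1 + x_2 = y.
EDGES = [{1, 2}, {1, 3}, {2, 3}, {3, 4}]
PHI_V, PHI_E = build_instance(4, EDGES, a=(1, 1), b=1)
SOLUTIONS = nontrivial_solutions(set(PHI_V) | set(PHI_E), a=(1, 1), b=1)
for *xs, y in SOLUTIONS:
    print([PHI_V[x] for x in xs], "->", sorted(PHI_E[y]))
```

Every printed solution pairs the two endpoints of an edge with the number encoding that edge, as property (3) of Lemma 1 predicts for this example.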
A relationship between independent sets in H and L-free subsets of A now follows easily.
Corollary 2. Let L be a linear equation $a_1 x_1 + \dots + a_\ell x_\ell = by$ where each $a_i \in \mathbb{N}$ and $b \in \mathbb{N}$ are fixed, and let H = (V, E) be an $\ell$-uniform hypergraph. Let $A$ and $A''$ be as in Lemma 1 on input H and L. Then, for any k ∈ N, there is a one-to-one correspondence between independent sets of H of cardinality k and the L-free subsets of A of cardinality $|A''| + k$ which contain all the elements of $A''$.
Proof. The corollary follows immediately from Lemma 1. Indeed, let $\varphi_V$ and $\varphi_E$ be as in Lemma 1. Given an independent set I of H, note that $\varphi_V^{-1}(I) \cup A''$ is an L-free subset of A of size $|I| + |A''|$; by bijectivity of $\varphi_V$, the L-free subsets corresponding to independent sets $I_1 \ne I_2$ are distinct. Further, given any L-free subset $S \cup A''$ of A of size $|S| + |A''|$, we have that $\varphi_V(S)$ is an independent set in H of size |S|; again, by bijectivity of $\varphi_V$, we obtain a unique independent set for each such L-free subset.
Finally, if we are only interested in the existence of independent sets, we can drop one of the conditions on the L-free subsets.
Corollary 3. Let L be a linear equation $a_1 x_1 + \dots + a_\ell x_\ell = by$ where each $a_i \in \mathbb{N}$ and $b \in \mathbb{N}$ are fixed, and let H = (V, E) be an $\ell$-uniform hypergraph. Let $A$ and $A''$ be as in Lemma 1 on input H and L. Then, for any k ∈ N, H contains an independent set of cardinality k if and only if A contains an L-free subset of cardinality $|A''| + k$.
Proof. By Corollary 2, it suffices to show that, if A contains an L-free subset of cardinality $|A''| + k$, then in fact A contains such a subset which includes all elements of $A''$. To see this, let $A_1$ be an L-free subset of A of size $|A''| + k$ which does not contain all elements of $A''$; we will show how to construct an L-free subset of equal or greater size which does have this additional property.
Suppose $y \in A''$ is such that $y \notin A_1$. By Lemma 1(3) there is a unique choice of $x_1, \dots, x_\ell \in A'$ such that $(x_1, \dots, x_\ell, y)$ is a non-trivial solution to L in A. If one of these $x_j$ does not lie in $A_1$ we add y to $A_1$ without creating a solution to L. Otherwise, arbitrarily remove one of the $x_j$ from $A_1$ and replace it with y. Repeating this process, we obtain an L-free subset which contains $A''$ and is at least as large as $A_1$.
2.2. The case of three or more variables. The next result shows that for a range of linear equations, L-Free Subset is NP-complete. For example, the result includes the cases when L is x + y = z (i.e. sum-free sets) and x + y = 2z (i.e. progression-free sets). To prove that L-Free Subset is NP-hard we will use a reduction from the following NP-complete problem [26]. Recall that a hitting set S in a hypergraph H is a collection of vertices such that every edge in H contains at least one vertex from S.
$\ell$-Hitting Set
Input: An $\ell$-uniform hypergraph H and s ∈ N.
Question: Does H contain a hitting set of size s?

Theorem 4. Let L be a linear equation of the form $a_1 x_1 + \dots + a_\ell x_\ell = by$ where each $a_i \in \mathbb{N}$ and $b \in \mathbb{N}$ are fixed and $\ell \ge 2$. Then L-Free Subset is NP-complete.
Proof. Recall from the discussion in Section 1.2 that we can determine in time polynomial in size(A) whether a set A is L-free, so L-Free Subset is in NP. To show that the problem is NP-complete, we give a reduction from $\ell$-Hitting Set.
Let (H, s) be an instance of $\ell$-Hitting Set. We construct $A$ and $A'' \subseteq A$ as in Lemma 1 under input H and L (taking time polynomial in size(A)). It suffices to show that H has a hitting set of size s if and only if A contains an L-free subset of size k := |A| − s.
Observe that H has a hitting set of size s if and only if it has an independent set of size |H| − s, where |H| denotes the number of vertices of H (the complement of a hitting set is an independent set, and vice versa); by Corollary 3, this holds if and only if A has an L-free subset of size $|A''| + |H| - s = |A| - s = k$.
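The complementation step used in this proof can be checked directly on small hypergraphs. The following sketch uses hypothetical helper names and exhaustive search; it confirms on one 3-uniform example that a hitting set of size s exists exactly when an independent set of size |V| − s exists.

```python
from itertools import combinations


def is_hitting_set(edges, s):
    """True if every edge contains at least one vertex of s."""
    return all(edge & s for edge in edges)


def is_independent(edges, s):
    """True if no edge is entirely contained in s."""
    return not any(edge <= s for edge in edges)


def has_hitting_set(vertices, edges, size):
    return any(is_hitting_set(edges, set(c)) for c in combinations(vertices, size))


def has_independent_set(vertices, edges, size):
    return any(is_independent(edges, set(c)) for c in combinations(vertices, size))


# A small 3-uniform hypergraph on the vertex set {1, ..., 6}.
VERTICES = range(1, 7)
EDGES = [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}, {2, 4, 6}]

for s in range(len(VERTICES) + 1):
    print(s, has_hitting_set(VERTICES, EDGES, s),
          has_independent_set(VERTICES, EDGES, len(VERTICES) - s))
```

In each printed row the two truth values should agree, since a set is a hitting set precisely when its complement contains no edge.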
Note that since Lemma 1 outputs a set A of natural numbers, we have actually proved the following stronger result.
Theorem 5. Let L be a linear equation of the form $a_1 x_1 + \dots + a_\ell x_\ell = by$ where each $a_i \in \mathbb{N}$ and $b \in \mathbb{N}$ are fixed and $\ell \ge 2$. Then L-Free Subset is NP-complete, even if the input set A is a subset of N.

2.3. The two variable case. For any linear equation L in two variables, it is straightforward to see that L-Free Subset is in P. Our strategy here is to reduce to the problem of finding an independent set (rather than reducing from this problem as in the previous section) and to note that the graph we create must have a very specific structure.

Proof. Let $A = \{a_1, \dots, a_n\} \subseteq \mathbb{Z}$, and let k ∈ N. We now construct G to be the graph with vertex set A, where $a_i a_j \in E(G)$ precisely when $(a_i, a_j)$ is a non-trivial solution to L. Note that we can construct G in time bounded by a polynomial function of size(A). The construction of G ensures that a set $A' \subseteq A$ is L-free if and only if $A'$ is an independent set in G.
Notice that a vertex x could lie in a loop. However, in this case x is not adjacent to any other vertex in G. All other vertices in G have degree at most 2. Thus, G is a collection of vertex-disjoint paths, cycles, isolated vertices and loops. The largest independent set in both a path and an even cycle on t vertices has size $\lceil t/2 \rceil$; the largest independent set in an odd cycle on t vertices has size $\lfloor t/2 \rfloor$. So in time O(n) we can determine the size of the largest independent set in G, and thus whether A contains an L-free subset of size k.
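The argument above translates into a short procedure. The sketch below is an illustration under our own conventions, not the paper's implementation: the equation is written as a1*x + a2*y = b with a1 and a2 non-zero, the graph is built by a simple quadratic scan, and all function names are hypothetical. Vertices lying in a loop are discarded first, since they can never belong to an L-free set; the remaining components are paths or cycles and are handled by the formulas just stated. A brute-force routine is included for comparison.

```python
from itertools import combinations


def is_nontrivial_solution(x, y, a1, a2, b):
    """(x, y) solves a1*x + a2*y = b and is not a trivial solution."""
    if a1 * x + a2 * y != b:
        return False
    return not (x == y and a1 + a2 == 0)


def max_free_two_variable(a, a1, a2, b):
    """Largest L-free subset size for a1*x + a2*y = b, via paths and cycles."""
    # A vertex with a loop can never be used, so discard it up front.
    verts = {x for x in set(a) if not is_nontrivial_solution(x, x, a1, a2, b)}
    adj = {x: set() for x in verts}
    for x in verts:
        for y in verts:
            if x != y and is_nontrivial_solution(x, y, a1, a2, b):
                adj[x].add(y)
                adj[y].add(x)
    total, seen = 0, set()
    for start in verts:
        if start in seen:
            continue
        # Collect the connected component containing start.
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        t = len(comp)
        m = sum(len(adj[v]) for v in comp) // 2
        # m == t means a cycle (floor(t/2)); otherwise a path (ceil(t/2)).
        total += t // 2 if m == t else (t + 1) // 2
    return total


def brute_force(a, a1, a2, b):
    """Exhaustive search, used only to cross-check small instances."""
    elems = sorted(set(a))
    for k in range(len(elems), 0, -1):
        for c in combinations(elems, k):
            if not any(is_nontrivial_solution(x, y, a1, a2, b) for x in c for y in c):
                return k
    return 0


# The equation 2x - y = 0, i.e. y = 2x, on {1, ..., 12}.
print(max_free_two_variable(range(1, 13), 2, -1, 0), brute_force(range(1, 13), 2, -1, 0))
```

For y = 2x on {1, ..., 12} the components are the doubling chains 1-2-4-8, 3-6-12, 5-10 and the isolated vertices 7, 9, 11, so both routines should report 8.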

Approximating the size of the largest L-Free Subset
Thus far we have focused on decision problems involving solution-free sets ("Does the set A contain a solution-free set of a certain size?"), but it is also natural to consider a maximisation problem: "What is the size of the largest solution-free subset of A?" An efficient algorithm to answer the decision problem can clearly be used to solve the maximisation problem, as we can repeatedly run our decision algorithm with different target sizes; however, as we have demonstrated in Section 2 that in many cases such an algorithm is unlikely to exist, it makes sense to ask whether we can efficiently approximate the optimisation problem. We define the maximisation version of L-Free Subset formally as follows.
Maximum L-Free Subset
Input: A finite set A ⊆ Z.
Question: What is the cardinality of the largest L-free subset $A' \subseteq A$?
Given any instance I of an optimisation problem, we denote by opt(I) the value of the optimal solution to I (so, for example, the cardinality of the largest solution-free subset). Given a constant ρ > 1, we say that an approximation algorithm for the maximisation problem has performance ratio ρ if, given any instance I of the problem, the algorithm will return a value x such that
$$1 \le \frac{\mathrm{opt}(I)}{x} \le \rho.$$
Note that there is a trivial approximation algorithm for Maximum Sum-Free Subset with performance ratio 3: if we always return |A|/3 then, as |A|/3 ≤ opt(A) ≤ |A|, we have $1 \le \frac{\mathrm{opt}(A)}{|A|/3} \le 3$ as required.
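The performance ratio of the trivial algorithm can be computed exactly on small inputs. The following sketch assumes the input consists of non-zero integers (as required by the bound of Erdős quoted earlier) and finds opt(A) by exhaustive search; the function names are our own.

```python
from fractions import Fraction
from itertools import combinations


def is_sum_free(s):
    """True if no x, y, z in s (not necessarily distinct) satisfy x + y = z."""
    s = set(s)
    return not any(x + y in s for x in s for y in s)


def opt_sum_free(a):
    """Cardinality of a largest sum-free subset of a, by exhaustive search."""
    elems = sorted(set(a))
    for k in range(len(elems), 0, -1):
        if any(is_sum_free(c) for c in combinations(elems, k)):
            return k
    return 0


def trivial_ratio(a):
    """opt(A) divided by the value |A|/3 returned by the trivial algorithm."""
    return Fraction(3 * opt_sum_free(a), len(set(a)))


for a in [{1, 2, 3}, set(range(1, 7)), {-1, 1, 2}, {1, 3, 5, 7}]:
    print(sorted(a), trivial_ratio(a))
```

Each printed ratio should lie between 1 and 3; a ratio of exactly 3 means the whole set is already sum-free.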
We might hope to improve on this to obtain, given arbitrary positive $\varepsilon$, an approximation algorithm for Maximum L-Free Subset with performance ratio $1 + \varepsilon$. However, we will show in this section that in certain cases this is no easier than solving the problem exactly. Specifically, we show that for a large family of 3-variable linear equations (including those defining sum-free and progression-free sets), there is no polynomial-time approximation scheme unless P = NP.
A polynomial-time approximation scheme (PTAS) for a maximisation problem is an algorithm which, given any instance I of the problem and a constant $\varepsilon > 0$, returns, in polynomial-time, a value x such that
$$1 \le \frac{\mathrm{opt}(I)}{x} \le 1 + \varepsilon.$$
Note that the exponent of the polynomial is allowed to depend on $\varepsilon$. The complexity class APX contains all optimisation problems (whose decision version belongs to NP) which can be approximated within some constant factor in polynomial time; this class includes problems which do not admit a PTAS unless P = NP, so one way to demonstrate that an optimisation problem is unlikely to admit a PTAS is to show that it is hard for the class APX. In order to show that a problem is APX-hard (and so does not admit a PTAS unless P = NP), it suffices to give a PTAS reduction from another APX-hard problem.
Definition. Let $\Pi_1$ and $\Pi_2$ be maximisation problems. A PTAS reduction from $\Pi_1$ to $\Pi_2$ consists of three polynomial-time computable functions f, g and α such that: (1) for any instance $I_1$ of $\Pi_1$ and any constant error parameter $\varepsilon$, f produces an instance $I_2 = f(I_1, \varepsilon)$ of $\Pi_2$; (2) if $\varepsilon > 0$ is any constant and y is any solution to $I_2$, then $g(I_1, y, \varepsilon)$ is a solution to $I_1$; (3) whenever a solution y to $I_2$ has performance ratio at most $\alpha(\varepsilon)$, the solution $g(I_1, y, \varepsilon)$ to $I_1$ has performance ratio at most $1 + \varepsilon$.

We cannot immediately deduce results about the inapproximability of Maximum L-Free Subset from Corollary 3 together with the inapproximability of Independent Set, as the cardinality of the largest L-free subset in A will in general be dominated by the cardinality of $A''$. However, we can instead reduce from Max IS-3, the problem of finding the size of a maximum independent set in a graph of maximum degree 3, which was shown to be APX-hard by Alimonti and Kann [2]. For this reduction we imitate the approach of Froese, Kanj, Nichterlein and Niedermeier [25], who obtained a result analogous to Lemma 1 when reducing 3-Hitting Set to the problem of finding a maximum subset of points in general position.
Corollary 3 implies that we have a polynomial-time reduction from 3-IS to L-Free Subset (for suitable L) in which $|A''| \le 3|V|/2$. Since it is also well-known that in any graph G on n vertices with maximum degree at most ∆, every maximal independent set has cardinality at least $n/(\Delta + 1)$ (if the independent set is smaller than this, there must be some vertex which does not have any neighbour in the set and so can be added to the independent set), it follows that for every maximal independent set U in G we have $|U| \ge |V|/4 \ge \frac{1}{6}|A''|$. Using this observation, we can now define a PTAS reduction from Max IS-3 to Maximum L-Free Subset for certain 3-variable equations.
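The bound on maximal independent sets can be illustrated with a greedy procedure. The sketch below is our own example and not part of the reduction: it computes one maximal independent set in the 3-dimensional cube graph (every vertex has degree 3) and compares its size with the two bounds stated above. The function and variable names are hypothetical.

```python
def greedy_maximal_independent_set(adj):
    """Scan vertices in sorted order, adding each one not yet blocked by a chosen vertex."""
    chosen, blocked = set(), set()
    for v in sorted(adj):
        if v not in blocked:
            chosen.add(v)
            blocked.add(v)
            blocked.update(adj[v])
    return chosen


# The 3-dimensional cube graph: 8 vertices, every vertex of degree 3.
CUBE = {v: {v ^ 1, v ^ 2, v ^ 4} for v in range(8)}
U = greedy_maximal_independent_set(CUBE)
NUM_EDGES = sum(len(nbrs) for nbrs in CUBE.values()) // 2

print(sorted(U), len(CUBE) / 4, NUM_EDGES / 6)
```

Each chosen vertex blocks at most ∆ + 1 = 4 vertices (itself and its neighbours), which is the counting argument behind the bound $n/(\Delta + 1)$.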
Then there is a PTAS reduction from Max IS-3 to Maximum L-Free Subset.
Proof. We define the functions f , g and α as follows.
First, we let f be the function which, given an instance G of Max IS-3 (where G = (V, E)) and any ε > 0, outputs the set A ⊆ N described in Lemma 1; we know from Lemma 1 that we can construct this set in polynomial time.
Next suppose that B is an L-free subset in A. We can construct in polynomial time a set B̃, with |B̃| ≥ |B|, such that (1) A″ ⊆ B̃, and (2) B̃ is a maximal L-free subset of A. If B fails to satisfy the first condition, we can use the method of Corollary 3 to obtain a set with this property, and if the resulting set is not maximal we can add elements greedily until this condition is met. We now define g to be the function which, given an L-free set B ⊆ A and any ε > 0, outputs the independent set I := φ_V(B̃ ∩ A′). Finally, we define α to be the function ε ↦ 1 + ε/7.
Let us denote by opt(G) the cardinality of the maximum independent set in G, and by opt(A) the cardinality of the largest L-free subset in A. Note that opt(A) = opt(G) + |E|. To complete the proof, it suffices to demonstrate that, whenever B is an L-free subset in A such that opt(A)/|B| ≤ α(ε) = 1 + ε/7, we have opt(G)/|I| ≤ 1 + ε. Since we know that |E| ≤ 3|V|/2 and, by our assumptions on maximality of B̃ and hence I, we also know that |I| ≥ |V|/4, it follows that |E|/|I| ≤ 6. As |B| ≤ |B̃| = |I| + |E|, we can therefore conclude that opt(G)/|I| = (opt(A) − |E|)/|I| ≤ (α(ε)(|I| + |E|) − |E|)/|I| = α(ε) + (α(ε) − 1)|E|/|I| ≤ 1 + ε/7 + 6ε/7 = 1 + ε, as required.
We now obtain our main inapproximability result as an immediate corollary.
Theorem 8. Let L be a linear equation of the form a_1x_1 + a_2x_2 = by, where a_1, a_2, b ∈ N are fixed. Then Maximum L-Free Subset is APX-hard.

L-free subsets of arbitrary sets of integers
In much of the rest of this paper, we prove complexity results which hold whenever we can guarantee that our input set A will contain a reasonably large L-free subset. We already know that this is the case for sum-free subsets (in which case an arbitrary input set A of non-zero integers must contain a sum-free subset of size at least (|A| + 1)/3); in this section we extend this result to a much larger family of linear equations, proving that, given any homogeneous non-translation-invariant linear equation L, every finite set of non-zero integers contains an L-free set of linear size. Note that a homogeneous linear equation L is non-translation-invariant if and only if it can be written in the form a_1x_1 + ··· + a_kx_k = b_1y_1 + ··· + b_ℓy_ℓ for some fixed a_i, b_i ∈ N where a_1 + ··· + a_k ≠ b_1 + ··· + b_ℓ.
For this we will use a trick of Alon and Kleitman [3] which transfers the problem into the setting of solution-free sets in cyclic groups. We also utilise the following simple observation.
Theorem 10. Consider a non-translation-invariant homogeneous linear equation L. There exists some λ = λ(L) > 0 such that, if n ∈ N is sufficiently large, then any set Z ⊆ Z \ {0} so that |Z| = n contains an L-free subset of size more than λn.
Proof. Suppose that L is of the form a_1x_1 + ··· + a_kx_k = b_1y_1 + ··· + b_ℓy_ℓ for some fixed a_i, b_i ∈ N where a_1 + ··· + a_k ≠ b_1 + ··· + b_ℓ. Observation 9 implies that there is some λ′ = λ′(L) > 0 such that, if m′ ∈ N is sufficiently large, then [m′] contains an L-free subset of size at least λ′m′. We say a subset S of a group G is L-free if S contains no solutions to L. Set c := max{(a_1 + ··· + a_k), (b_1 + ··· + b_ℓ)}.
Claim. There is some λ := λ′/(2c) > 0 such that, if m ∈ N is sufficiently large, then Z_m contains an L-free subset of size at least λm that does not contain the zero element.
To prove the claim, suppose that an L-free subset of [m′], viewed as a subset of Z_m, contains a solution x_1, ..., x_k, y_1, ..., y_ℓ to L in Z_m. Viewing x_1, ..., x_k, y_1, ..., y_ℓ as integers we have that a_1x_1 + ··· + a_kx_k ≠ b_1y_1 + ··· + b_ℓy_ℓ; however, a_1x_1 + ··· + a_kx_k ≡ b_1y_1 + ··· + b_ℓy_ℓ mod m. Thus the difference between a_1x_1 + ··· + a_kx_k and b_1y_1 + ··· + b_ℓy_ℓ is at least m. This yields a contradiction since, by the definition of m′, neither of these numbers is bigger than m. Thus, as [m′] contains an L-free subset of size at least λ′m′ ≥ λm, Z_m contains an L-free subset of size at least λm avoiding the zero element, proving the claim.
The rest of the proof modifies the argument presented in [4, Theorem 1.4.1] that shows every set of n non-zero integers contains a sum-free subset of size more than n/3. Let n ∈ N be sufficiently large and consider any set Z = {z_1, ..., z_n} of non-zero integers. Let p be a prime so that p > 2 max(Z). Since n is sufficiently large and p > n, by the claim we have that Z_p contains an L-free subset S so that |S| ≥ λp and additionally 0 ∉ S.
Choose an integer x uniformly at random from {1, 2, ..., p − 1}, and define d_1, ..., d_n by d_i ≡ xz_i mod p where 0 ≤ d_i < p. For every fixed 1 ≤ i ≤ n, as x ranges over all numbers 1, 2, ..., p − 1, d_i ranges over all non-zero elements of Z_p. Therefore, P(d_i ∈ S) = |S|/(p − 1) > λ. So the expected number of elements z_i such that d_i ∈ S is more than λn. Thus, there is some choice of x with 1 ≤ x < p and a subset Z′ ⊆ Z of size |Z′| > λn such that xz_i (mod p) ∈ S for all z_i ∈ Z′. Since S is L-free in Z_p, and L is homogeneous, this implies Z′ is an L-free set of integers, as desired.
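For the special case x + y = z, the dilation argument above can be made concrete. The following Python sketch is only an illustration under stated assumptions: it takes S to be the middle third {k + 1, ..., 2k + 1} of Z_p for a prime p = 3k + 2 (the classical choice for sum-free sets, not the set S produced by the claim), it uses p > 2 max|z|, and it replaces the random choice of x by an exhaustive search over all multipliers. All function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test (adequate for small illustrative inputs)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_sum_free(S):
    """True if no x, y, z in S satisfy x + y = z (x = y is allowed)."""
    s = set(S)
    return all(a + b not in s for a in s for b in s)


def large_sum_free_subset(Z):
    """Return a sum-free subset of the non-zero integers Z of size > |Z|/3."""
    p = 2 * max(abs(z) for z in Z) + 1
    while not (is_prime(p) and p % 3 == 2):
        p += 1
    k = (p - 2) // 3
    lo, hi = k + 1, 2 * k + 1  # the middle third, which is sum-free in Z_p
    best = []
    for x in range(1, p):  # try every multiplier instead of sampling one
        hit = [z for z in Z if lo <= (x * z) % p <= hi]
        if len(hit) > len(best):
            best = hit
    return best
```

Since the average number of elements landing in the middle third exceeds |Z|/3, the best multiplier found by the exhaustive search must do at least as well.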
In the case when L is translation-invariant, Ruzsa [37] observed that the largest L-free subset of [n] has size o(n). So one cannot prove an analogue of Theorem 10 for such equations L.
The next result follows immediately from Theorem 10.
Theorem 11. Consider a non-translation-invariant homogeneous linear equation L. There exists some λ = λ(L) > 0 such that every finite set Z ⊆ Z \ {0} contains an L-free subset of size more than λ|Z|.
Note that it is necessary to restrict our attention to sets Z ⊆ Z \ {0} here since Z := {0} does not contain a non-empty L-free subset.
It is natural to ask how large λ(L) can be in the previous theorem. Let C(L) denote the set of all positive reals κ so that Theorem 11 holds with κ playing the role of λ, and define C′(L) analogously now with respect to Theorem 10. We claim that C(L) = C′(L). It is immediate that C(L) ⊆ C′(L). To see that there is no λ ∈ C′(L) \ C(L) consider the following observation: suppose Z ⊆ Z \ {0} is such that it does not contain an L-free subset of size more than λ|Z| for some λ > 0. Set z := |Z|. Then for every n ∈ N there is a set Z′ ⊆ Z \ {0} of size zn such that it does not contain an L-free subset of size more than λ|Z′|. Indeed, writing cZ as shorthand for {cz : z ∈ Z}, one can choose Z′ to be the union of c_1Z, ..., c_nZ where the c_i are positive integers chosen to ensure the sets c_iZ are pairwise disjoint. (Notice we required that 0 ∉ Z to ensure this.) Define κ(L) := sup(C(L)).
Write L as a_1x_1 + ··· + a_kx_k = b_1y_1 + ··· + b_ℓy_ℓ for some fixed a_i, b_i ∈ N where a_1 + ··· + a_k > b_1 + ··· + b_ℓ. We remark that it is easy to check that in the statement of Theorem 10, and therefore Theorem 11, one can set λ := (1/(2c)) · (1 − (b_1 + ··· + b_ℓ)/(a_1 + ··· + a_k)), where here we define c := max{(a_1 + ··· + a_k), (b_1 + ··· + b_ℓ)}. That is, κ(L) ≥ (1/(2c)) · (1 − (b_1 + ··· + b_ℓ)/(a_1 + ··· + a_k)). In the case when L is x + y = z we know that κ(L) = 1/3 and this supremum is attained. Indeed, recall that every set of n non-zero integers has a sum-free subset of size at least (n + 1)/3 [3] whilst there are sets of positive integers A of size n such that A does not contain any sum-free subset of size greater than n/3 + o(n) [18]. It would be interesting to determine κ(L) for other equations L.
Problem. Determine κ(L) for non-translation-invariant homogeneous linear equations L.

Parameterised complexity of the decision problem
Once one knows that a problem is NP-hard, there is interest in identifying the specific properties of a problem instance which contribute to the hardness; as mentioned in Section 1.2, one of the most natural ways to do this is to analyse the problem from the perspective of parameterised complexity, with the goal of establishing which parameterisations do or do not allow the design of fpt-algorithms. In this section we make two straightforward observations about the complexity of L-Free Subset with respect to two natural parameterisations, namely the number of elements in the L-free subset (k) and the number of elements not in this subset (|A| − k).
First, consider parameterisation by k. Note that whenever L satisfies the conditions of Theorem 11 we know that there exists an explicit constant λ = λ(L) > 0 such that any finite set A ⊂ Z \ {0} contains an L-free subset of size at least λ|A|. Thus, if |A| is large enough compared with k (namely |A| ≥ k/λ) we can immediately answer yes; otherwise |A| is bounded by a function of k and so we can solve the problem by brute force in time bounded by a function of k. This gives the following result.
To deal with the dual parameterisation |A| − k, we use the fact that the following problem is known to belong to FPT [23, Theorem 1.14].
p-card-Hitting Set
Input: A hypergraph G = (V, E) and s ∈ N.
Parameter: s + d, where d = max_{e∈E} |e|.
Question: Does G contain a hitting set of cardinality s?
The strategy is then to reduce L-Free Subset to p-card-Hitting Set.
Lemma 13. Let L be any fixed linear equation with ℓ variables, and let A ⊆ Z be finite. Then we can construct, in time polynomial in size(A), a hypergraph G on |A| vertices in which every edge contains at most ℓ vertices, such that there is a one-to-one correspondence between L-free subsets of A of cardinality k and hitting sets in G of cardinality |A| − k.
Sketch proof. Suppose without loss of generality that L is of the form a_1x_1 + ··· + a_ℓx_ℓ = b, where a_1, ..., a_ℓ, b ∈ Z. Let G be the hypergraph with vertex set A and edge set E := {{x_1, ..., x_ℓ} : (x_1, ..., x_ℓ) is a non-trivial solution to L}.
Note that x_1, ..., x_ℓ are not necessarily all distinct, so while every edge in E contains at most ℓ vertices, an edge may contain strictly fewer than ℓ vertices. A subset of A is L-free if and only if it contains no edge of G, that is, if and only if its complement in A is a hitting set. There is then a one-to-one correspondence between L-free subsets of cardinality k in A and hitting sets of cardinality |A| − k in G.
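As a sanity check of this correspondence, the following Python sketch builds the hypergraph for the equation x + y = z and verifies by brute force that a subset is sum-free exactly when its complement is a hitting set. The function names are hypothetical, and the sketch assumes that A consists of non-zero integers and that solutions with x = y count as non-trivial.

```python
from itertools import combinations


def is_sum_free(S):
    """True if no x, y, z in S satisfy x + y = z (x = y is allowed)."""
    s = set(S)
    return all(a + b not in s for a in s for b in s)


def solution_hypergraph(A):
    """Edges are the element sets of solutions x + y = z inside A."""
    s = set(A)
    return {frozenset((x, y, x + y)) for x in s for y in s if x + y in s}


def is_hitting_set(H, edges):
    """True if H meets every edge."""
    return all(H & e for e in edges)


def check_correspondence(A):
    """Check: B is sum-free  <=>  A \\ B is a hitting set, for all B."""
    edges = solution_hypergraph(A)
    universe = set(A)
    for r in range(len(A) + 1):
        for B in combinations(A, r):
            if is_sum_free(B) != is_hitting_set(universe - set(B), edges):
                return False
    return True
```

Note that the edge arising from 1 + 1 = 2 has only two vertices, which matches the remark that edges may contain fewer than ℓ vertices.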
The fixed parameter tractability of our problem with respect to the parameter |A|−k now follows immediately.
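Returning to the parameterisation by k considered at the start of this section, the argument can be sketched in Python for the sum-free case. This is a minimal illustration, not the paper's algorithm verbatim: it assumes A is a set of non-zero integers, uses the bound (|A| + 1)/3 quoted earlier for sum-free subsets, and the function names are hypothetical.

```python
from itertools import combinations


def is_sum_free(S):
    """True if no x, y, z in S satisfy x + y = z (x = y is allowed)."""
    s = set(S)
    return all(a + b not in s for a in s for b in s)


def has_sum_free_subset(A, k):
    """Decide whether the non-zero integers A contain a sum-free k-subset."""
    A = list(A)
    # Large input: a sum-free subset of size >= (|A| + 1) / 3 always exists.
    if 3 * k <= len(A) + 1:
        return True
    # Small input: here |A| < 3k - 1, so brute force costs a function of k.
    return any(is_sum_free(c) for c in combinations(A, k))
```

The brute-force branch only ever examines sets with fewer than 3k − 1 elements, so the number of candidate subsets depends on k alone.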

L-free subsets covering a given fraction of elements
We know, by Theorem 4, that there is unlikely to be a polynomial time algorithm to decide whether a set A has an L-free subset of size k, for arbitrary k ∈ N. It is therefore natural to ask whether we can efficiently solve a restricted version of the problem in which we want to determine whether a finite set A of (non-zero) integers contains an L-free subset that houses some fixed proportion of the elements of A. Given any linear equation L and 0 < ε < 1, we define the following problem.
ε-L-Free Subset
Input: A finite set A ⊆ Z \ {0}.
Question: Does there exist an L-free subset A′ ⊆ A such that |A′| ≥ ε|A|?
In the case when L is x+y = z we refer to ε-L-Free Subset as ε-Sum-Free Subset. Note that ε-L-Free Subset concerns finite sets of non-zero integers A; thus, the definition of κ(L) (given in Section 4) immediately implies that ε-L-Free Subset is in P for all ε ≤ κ(L), as in this case every instance is a yes-instance.
Further, recall from Section 1.2 that, given any fixed linear equation L, we can decide in time polynomial in size(A) whether a set A ⊆ Z is L-free, so ε-L-Free Subset clearly belongs to NP.
We will show in Section 6.2 that, for certain choices of L and ε, the ε-L-Free Subset problem is no easier than L-Free Subset. For this, we will actually restrict our attention to the case when we have input set A ⊆ N. In this case, we need to be able to add elements to A without creating any additional solutions; we prove results about this in Section 6.1.
6.1. Extending sets without creating additional solutions. In Section 6.2, and also later in Section 7, we will make use of the following lemma, which allows us to extend sets without creating additional solutions to an equation.
Proof. Write m := max(A) and set τ := min{c/(a + b), (a + b)/c}. Define N ∈ N to be the smallest natural number so that τN ≥ 2(a + b + c)m and N − τN ≥ t. This choice of N means that τ(N − 1) < 2(a + b + c)m or (N − 1) − τ(N − 1) < t. Thus, by Observation 9 and the choice of N, [N] contains an L-free subset I′ such that |I′| = t − |A|.
Chebyshev's theorem implies that there is a prime p so that abcm < p < 2abcm. Set I := pI′ and let B := A ∪ I. Note that we can clearly determine N and hence construct I′ in time bounded by a polynomial function of size(A) and t. We can determine an appropriate value for p (and then construct I) by exhaustively searching the specified interval and testing for primality in polynomial time (using the AKS test [1]). This set immediately satisfies (ii) and, since p > m, A and I are disjoint so (i) is satisfied. Moreover, it follows from (3) and the choice of p that max(B) ≤ max{2abctm/(1 − τ), 4m²abc(a + b + c)/τ + 2abcm} = O(t(max(A))²), so (iv) is satisfied.
To see that (iii) is satisfied, we first observe that there are no solutions to L in I . Since min(I ) > 2(a + b + c)m it is easy to check that there are no solutions to L in B which consist of two elements from A and one element from I . Suppose there is a solution to L in B which consists of two elements z 1 , z 2 from I and one element z 3 from A. Consider the case when az 1 + bz 2 = cz 3 (the other cases follow identically). Since every element of I is divisible by p we have that p divides cz 3 . So as c < p this implies p must divide z 3 . However, no element of A is divisible by p since max(A) = m < p, a contradiction. Hence B satisfies condition (iii). This completes the proof.
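The search for the prime p used in this proof is easy to sketch. The Python below is an illustration only: it uses naive trial division in place of the AKS test (so, unlike the procedure in the proof, it is not polynomial in the bit length of the input), and the function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test; a simple stand-in for the AKS test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_between(n):
    """Return the least prime p with n < p < 2n (one exists for every n >= 2)."""
    for p in range(n + 1, 2 * n):
        if is_prime(p):
            return p
    raise ValueError("no prime strictly between n and 2n; need n >= 2")


def dilate(I_prime, p):
    """Multiply every element of I_prime by p, as in the step I := p I'."""
    return [p * x for x in I_prime]


# Example with a = b = c = 1 and A = [1, 2, 5, 7], so that abcm = 7.
A = [1, 2, 5, 7]
p = prime_between(max(A))
I = dilate([3, 4], p)
```

Because every element of the dilated set is a multiple of p while no element of A is, the two sets are automatically disjoint.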
We can also prove an analogous result for equations L of the form ax + by = cz where a, b, c ∈ N are fixed and a + b = c. To do so, we will need the following fact.
Fact 16. Suppose L is a linear equation ax + by = cz where a, b, c ∈ N are fixed; a + b = c; and a ≤ b. Given any x 1 < x 2 < x 3 that form a solution (x, y, z) to L in N, we have that x 2 plays the role of z and cx 2 > ax 3 .
Proof. Consider any x 1 < x 2 < x 3 that form a solution (x, y, z) to L in N. Note that since a+b = c, we have cx 3 > max{(ax 1 + bx 2 ), (ax 2 + bx 1 )}. Thus x 3 cannot play the role of z. Further, x 2 must play the role of z. Indeed, otherwise x 1 plays the role of z and then we have ax + by > cz, a contradiction. Altogether this implies that cx 2 > ax 3 .
Lemma 17. Suppose L is a linear equation ax + by = cz where a, b, c ∈ N are fixed and a + b = c. Suppose A ⊆ N is a finite set and t ∈ N. Then there is a set B ⊆ N such that: Moreover, B can be computed in time polynomial in t and size(A).
Proof. Without loss of generality assume that a ≤ b. Note that (x, y, z) is a non-trivial solution to L if and only if (x, y, z) is a solution to L with x, y, z distinct.
Set m := max(A), and define A′ := {c^i · 2m : i ∈ [t]}; we can clearly construct B in time polynomial in t and size(A). We claim that B := A ∪ A′ is our desired set. Certainly (i), (ii) and (iv) follow immediately.
We now prove (iii). Suppose x_1 < x_2 < x_3 form a solution (x, y, z) to L in A′. Then by Fact 16 we must have that cx_2 > ax_3. However, by definition of A′, ax_3 ≥ x_3 ≥ cx_2, a contradiction. So A′ does not contain any non-trivial solutions to L. The same argument shows that there are no non-trivial solutions to L in B which contain two elements from A′ and one element from A. Finally suppose x_1 < x_2 < x_3 form a solution (x, y, z) to L in B where x_1, x_2 ∈ A and x_3 ∈ A′. As before we must have that cx_2 > ax_3. However, ax_3 > acm ≥ cx_2 by definition of A′, a contradiction. This proves (iii).
Lemma 17 will be applied in the next subsection to prove that for any equation L as in its statement, ε-L-Free Subset is NP-complete for any 0 < ε < 1.
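The construction in this proof is simple enough to check by brute force on small inputs. The following Python sketch builds A ∪ {c^i · 2m : i ∈ [t]} and compares the non-trivial solutions (those with x, y, z distinct) before and after the extension. The function names are hypothetical, and the check is an illustration rather than a proof.

```python
def extend_without_new_solutions(A, t, c):
    """Return A together with the t new elements c^i * 2m, where m = max(A)."""
    m = max(A)
    return sorted(set(A) | {c ** i * 2 * m for i in range(1, t + 1)})


def nontrivial_solutions(B, a, b, c):
    """All (x, y, z) in B with x, y, z distinct and a*x + b*y = c*z."""
    s = sorted(B)
    return {
        (x, y, z)
        for x in s for y in s for z in s
        if len({x, y, z}) == 3 and a * x + b * y == c * z
    }


# Example: 3-term arithmetic progressions, x + y = 2z.
A = [1, 2, 3, 5, 8]
B = extend_without_new_solutions(A, t=4, c=2)
```

The new elements grow geometrically with ratio c, which is what prevents the middle element of a solution from being large enough relative to the largest one.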

6.2. Hardness of ε-L-Free Subset. In this section we show that, in two specific cases, ε-L-Free Subset is NP-complete. We begin with the case of sum-free subsets. Note that if ε ≤ 1/3 then the problem is trivially in P as, by the aforementioned result of Erdős [20], the answer is always "yes"; also if ε = 1 then it suffices to check whether the input set is sum-free (which can be done in polynomial time). We now demonstrate that the problem is NP-complete for all other values of ε. Recall that, given a set X ⊆ N and y ∈ N, we write yX as shorthand for {yx : x ∈ X}.
Proof. Recall that ε-Sum-Free Subset belongs to NP. To show that the problem is NP-hard, we describe a reduction from Sum-Free Subset (restricted to inputs A ⊆ N), shown to be NP-hard in Theorem 5.
Suppose that (A, k) is an instance of Sum-Free Subset where A ⊆ N. We will define a set B ⊆ N such that B has a sum-free subset of size at least ε|B| if and only if A has a sum-free subset of size k. The construction of B depends on the value k.
First suppose that k ≤ ε|A|. Let d be the least positive integer such that ε(|A| + d) ≤ k + d, so that ⌈ε(|A| + d)⌉ = k + d. By Lemma 15 we can construct, in time bounded by a polynomial function of size(A), a set B ⊆ N of size |A| + d such that B has a sum-free subset of size k + d if and only if A has a sum-free subset of size k; by our choice of d, this means that B is a yes-instance for ε-Sum-Free Subset if and only if (A, k) is a yes-instance for Sum-Free Subset.
Now suppose that k > ε|A|. The result of [18] implies that there is a set S ⊆ N such that the largest sum-free subset of S has size precisely ε′|S| where 1/3 < ε′ < ε. In particular, through an exhaustive search, one can construct such a set S. Crucially, S is independent of our input (A, k) (so size(S), |S| and max(S) are all fixed constants). Set A* := A ∪ d_1S ∪ ··· ∪ d_rS, a disjoint union, for a suitable r ∈ N and suitable d_1, ..., d_r ∈ N. The choice of the d_i ensures the only solutions to x + y = z in A* are such that x, y, z ∈ A or x, y, z ∈ d_iS for some i ∈ [r]. The largest sum-free set in d_iS is of size ε′|S| = ε′|d_iS|. Define k* := k + rε′|S|, and observe that A has a sum-free subset of size k if and only if A* has a sum-free subset of size k*. By definition of r, rε′|S| ≤ rε|S| + ε|A| − k, so we see that k* ≤ ε(|A| + r|S|) = ε|A*|. Now we can argue precisely as in the first case: from A* one can construct a set B in time polynomial in size(A) so that A* has a sum-free subset of size k* if and only if B has a sum-free subset of size at least ε|B|. In particular, B will be a yes-instance for ε-Sum-Free Subset if and only if (A, k) is a yes-instance for Sum-Free Subset, as required.
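The padding parameter d from the first case of this proof can be computed directly. The Python sketch below is an illustration under the assumptions that ε is a rational with 0 < ε < 1, that k ≤ ε|A|, and that d is found by a simple linear search; the function name is hypothetical. Exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from math import ceil


def padding_size(n, k, eps):
    """Least positive integer d with eps * (n + d) <= k + d.

    Assumes 0 < eps < 1 and k <= eps * n, where n plays the role of |A|.
    """
    d = 1
    while eps * (n + d) > k + d:
        d += 1
    return d


# Example: |A| = 10, k = 3, eps = 1/2.
eps = Fraction(1, 2)
d = padding_size(10, 3, eps)
threshold = ceil(eps * (10 + d))  # least integer that is >= eps * |B|
```

In this example the padded set has 14 elements and the threshold is 7 = k + d, so asking for a subset covering an ε fraction of the padded set is the same as asking for k elements from A plus all d padding elements.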
We are also able to prove an NP-completeness result in the only other cases of three-variable equations L where κ(L) is known, using a slight variation on the method of Theorem 18. In particular, the following result covers the case of progression-free sets. Recall that if L is translation-invariant, Ruzsa [37] observed that the largest L-free subset of [n] has size o(n) (and so κ(L) = 0).
Theorem 19. Consider any rational 0 < ε < 1 and let L denote the equation ax + by = cz where a, b, c ∈ N and a + b = c. Then ε-L-Free Subset is NP-complete.
Proof. Fix ε and L as in the statement of the theorem; we will assume without loss of generality that b ≥ a. Recall that ε-L-Free Subset is in NP. To show NP-hardness, we once again give a reduction from L-Free Subset (restricted to inputs A ⊆ N), shown to be NP-hard in Theorem 5.
Suppose that (A, k) is an instance of L-Free Subset, where A ⊆ N. We will define a set B ⊆ N such that B has an L-free subset of size at least ε|B| if and only if A has an L-free subset of size k. The construction of B depends on the value k.
First suppose that k ≤ ε|A|. As in the proof of Theorem 18, we define d so that ⌈ε(|A| + d)⌉ = k + d. By Lemma 17 we can construct, in time bounded by a polynomial function of size(A), a set B ⊆ N of size |A| + d such that (by conditions (i)-(iii) of the lemma) B has an L-free subset of size k + d if and only if A has an L-free subset of size k. By our choice of d, this means that B is a yes-instance to ε-L-Free Subset if and only if (A, k) is a yes-instance to L-Free Subset.
Now suppose k > ε|A|. Since the largest L-free subset of [n] has size o(n), we can find by exhaustive search a set S ⊆ N such that the largest L-free subset of S has size ε′|S| for some 0 < ε′ < ε. Crucially, S is independent of our input (A, k) (so |S|, max(S) and size(S) are all fixed constants).
As in the proof of Theorem 18, we set We claim that the only solutions to ax + by = cz in A * are such that x, y, z ∈ A or x, y, z ∈ d i S for some i ∈ [r]. To see this, first suppose there are , and x 1 , x 2 , x 3 form a solution to L. Then by Fact 16 we have that cx 2 > ax 3 . However, we also know that, if i > 1, In either case this gives a contradiction. Next suppose there are , and x 1 , x 2 , x 3 form a solution to L. Suppose i > 1. Then d i divides x 2 and x 3 and so, by Fact 16, d i divides ax 1 or bx 1 . In particular, we have that bx 1 ≥ d i . However, since x 1 ∈ d i S, we have that x 1 ≤ d i−1 m and so bx 1 ≤ cm d i−1 < 3cm d i−1 = d i , a contradiction. The case i = 1 yields an analogous contradiction. Altogether this indeed proves the only solutions (x, y, z) to L in A * are such that x, y, z ∈ A or x, y, z ∈ d i S for some i ∈ [r]. Now we can continue as in the proof of Theorem 18. Note that the largest L-free set in d i S is of size ε |S| = ε |d i S|, and set k * := k + rε |S|, so that A has an L-free subset of size k if and only if A * has an L-free subset of size k * . By definition of r, we have k * ≤ ε(|A| + r|S|) = ε|A * |, so we can now argue precisely as in the first case to obtain a set B such that B is a yes-instance to ε-L-Free Subset if and only if (A, k) is a yes-instance to L-Free Subset.

Counting solution-free subsets of a given size
As mentioned in the introduction, much of the previous work on sum-free (and more generally solution-free) sets has focused on counting problems. In this section we consider such questions in the (parameterised) complexity setting. Consider the following counting problem.
#Given-Size L-Free Subset
Input: A finite set A ⊆ Z and k ∈ N.
Question: How many L-free subsets of A have cardinality exactly k?
It is clear that, whenever L satisfies the conditions of Theorem 4, there cannot be any polynomial-time algorithm for this problem unless P = NP, as such an algorithm would certainly tell us whether or not the number of such subsets is zero and hence solve the decision problem. However, it is interesting to consider the complexity of the counting problem with respect to the parameterisations k and |A| − k, as we saw in Section 5 that the decision problem is tractable with respect to both of these parameterisations.
In Section 7.2 we show that the counting problem is in FPT when parameterised by |A| − k; similar to the proof of Theorem 14, this result relies on a reduction to a counting version of Hitting Set. In contrast, we show in Section 7.3 that, for certain equations L, the counting problem is unlikely to admit an fpt-algorithm when parameterised by k; however, in many cases there is an efficient parameterised algorithm to solve this problem approximately, as we will see in Section 7.4. We begin in Section 7.1 with some background on the theory of parameterised counting complexity.
7.1. Parameterised counting complexity. We make use of the theory of parameterised counting complexity developed by Flum and Grohe [22,23]. Let Σ be a finite alphabet. A parameterised counting problem is formally defined to be a pair (Π, κ) where Π : Σ* → N_0 is a function and κ : Σ* → N is a parameterisation (a polynomial-time computable mapping). Flum and Grohe define two types of parameterised counting reductions, fpt parsimonious reductions and fpt Turing reductions. The latter is more flexible than the former, as it does not require us to preserve the number of witnesses as we transform between problems; rather we must be able to compute the number of witnesses in one problem using information about the number of witnesses in one or more instances of the other problem, which allows us to make use of several standard techniques for counting reductions (such as polynomial interpolation and matrix inversion).
Definition. An fpt Turing reduction from (Π, κ) to (Π′, κ′) is an algorithm A with an oracle to Π′ such that (1) A computes Π, (2) A is an fpt-algorithm with respect to κ, and (3) there is a computable function g : N → N such that for all oracle queries "Π′(I′) = ?" posed by A on input I we have κ′(I′) ≤ g(κ(I)).
In this case we write (Π, κ) ≤^fpt_T (Π′, κ′). There is an analogue of the W-hierarchy for counting problems; in order to demonstrate that a parameterised counting problem is unlikely to belong to FPT it suffices to show that it is hard (with respect to fpt-Turing reductions) for the first level of this hierarchy, #W[1] (see [23] for the formal definition of the class #W[1]).
A parameterised counting problem is considered to be efficiently approximable if it admits a fixed parameter tractable randomised approximation scheme (FPTRAS), which is defined as follows:
Definition. A fixed parameter tractable randomised approximation scheme (FPTRAS) for a parameterised counting problem (Π, κ) is a randomised approximation scheme that takes an instance I of Π (with |I| = n), and rational numbers ε > 0 and 0 < δ < 1, and in time f(κ(I)) · g(n, 1/ε, log(1/δ)) (where f is any computable function, and g is a polynomial in n, 1/ε and log(1/δ)) outputs a rational number z such that P((1 − ε)Π(I) ≤ z ≤ (1 + ε)Π(I)) ≥ 1 − δ.

Parameterisation by the number of elements not included in the solution-free set.
In this section we show that the counting problem, parameterised by |A| − k, is in FPT. This is a straightforward extension of the argument used for the decision problem in Section 5: since there is a one-to-one correspondence between L-free subsets of cardinality k and hitting sets of cardinality |A| − k in the construction described in Lemma 13, we can also make use of parameterised algorithms for counting hitting sets to count L-free subsets. Thurley [43] describes an fpt-algorithm for the following counting version of the problem.
#p-card-Hitting Set
Input: A hypergraph G = (V, E) and s ∈ N.
Parameter: s + d, where d = max_{e∈E} |e|.
Question: How many hitting sets in G have cardinality exactly s?
As for Theorem 14, our result now follows immediately.
Theorem 20. Let L be any fixed linear equation. Then #Given-Size L-Free Subset, parameterised by |A| − k, belongs to FPT.

7.3. Parameterisation by the cardinality of the solution-free set. In contrast with the positive result in the previous section, we now show that there is unlikely to be an fpt-algorithm with respect to the parameter k to solve #Given-Size L-Free Subset. To do this, we give an fpt-Turing reduction from the following problem, which can easily be shown to be #W[1]-hard by means of a reduction from p-#Clique (shown to be #W[1]-hard in [22]), along the same lines as the proof of the W[1]-hardness of p-Multicolour Clique in [21].
p-#Multicolour Clique
Input: A graph G = (V, E), and a partition of V into k sets V_1, ..., V_k.
Parameter: k
Question: How many k-vertex cliques in G contain exactly one vertex from each set V_1, ..., V_k?
When reducing from p-Multicolour Clique or its counting version, it is standard practice to assume that, for each 1 ≤ i < j ≤ k, the number of edges from V_i to V_j is equal. We can make this assumption without loss of generality because we can easily transform an instance which does not have this property to one which does without changing the number of multicolour cliques; note that if the input does not already satisfy this condition then k ≥ 3. We set q := max{e(V_i, V_j) : 1 ≤ i < j ≤ k} (where e(A, B) denotes the number of edges with one endpoint in A and the other in B), and for any pair of sets (V_i, V_j) where e(V_i, V_j) = q′ < q, we add vertices {u_1, ..., u_{q−q′}} to V_i and {w_1, ..., w_{q−q′}} to V_j, and the set of edges {u_rw_r : 1 ≤ r ≤ q − q′}; note that the largest cliques created by this process contain two vertices.
We in fact reduce p-#Multicolour Clique to a multicolour version of #Given-Size L-Free Subset, defined as follows.
p-#Multicolour L-Free Subset
Input: A k-tuple of disjoint subsets A_1, ..., A_k ⊆ Z.
Parameter: k
Question: How many L-free subsets of A = ⋃_{1≤i≤k} A_i contain exactly one element from each set A_1, ..., A_k?
It is easy to give an fpt-Turing reduction from p-#Multicolour L-Free Subset to #Given-Size L-Free Subset parameterised by k.
Proof. Let (A_1, ..., A_k) be the input to an instance of p-#Multicolour L-Free Subset. For each non-empty I ⊆ [k], we can use our oracle to #Given-Size L-Free Subset to find N_I, the number of L-free subsets of cardinality exactly k in the set ⋃_{i∈I} A_i. This requires Θ(2^k) oracle calls, and for each oracle call the parameter value is the same as for the original problem. We can now use an inclusion-exclusion method to compute the number of L-free subsets of size k in A that contain exactly one number from each of the sets A_i: this is precisely Σ_{∅≠I⊆[k]} (−1)^{k−|I|} N_I.
The main work in the reduction is in the next lemma, where we show that p-#Multicolour Clique can be reduced to p-#Multicolour L-Free Subset for certain equations L.
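The inclusion-exclusion step in this proof can be checked on a small example. In the Python sketch below, a brute-force counter for sum-free subsets stands in for the oracle to #Given-Size L-Free Subset; the function names and the choice of the equation x + y = z are assumptions made for illustration. A k-subset using exactly the colour classes in J is counted in N_I whenever J ⊆ I, and the alternating signs cancel every contribution except those with J = [k].

```python
from itertools import combinations, product


def is_sum_free(S):
    """True if no x, y, z in S satisfy x + y = z (x = y is allowed)."""
    s = set(S)
    return all(a + b not in s for a in s for b in s)


def count_free_k_subsets(A, k):
    """Brute-force stand-in for the counting oracle."""
    return sum(1 for c in combinations(sorted(A), k) if is_sum_free(c))


def count_colourful(parts):
    """Count sum-free sets with one element per part, via inclusion-exclusion."""
    k = len(parts)
    total = 0
    for r in range(1, k + 1):
        for I in combinations(range(k), r):
            union = set().union(*(parts[i] for i in I))
            total += (-1) ** (k - r) * count_free_k_subsets(union, k)
    return total


def count_colourful_direct(parts):
    """Direct enumeration, used only to check the formula."""
    return sum(1 for choice in product(*parts) if is_sum_free(choice))
```

The sketch makes 2^k − 1 oracle calls, each with the same parameter k, matching the count in the proof.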
Lemma 22. Let L be a linear equation of the form a_1x_1 + a_2x_2 = by, where a_1, a_2, b ∈ N are fixed. Then p-#Multicolour Clique ≤^fpt_T p-#Multicolour L-Free Subset.
Proof. Let (G, {V_1, ..., V_k}) be the input to an instance of p-#Multicolour Clique, where G = (V, E), and for each 1 ≤ i < j ≤ k let E_{i,j} denote the set of edges between V_i and V_j. We may assume that |E_{i,j}| = q for each 1 ≤ i < j ≤ k and that each V_i is an independent set.
Suppose that V = {v_1, ..., v_n}. We begin by constructing a set A ⊆ N as in Lemma 1; note that |A| = O(|V|²) and log(max(A)) is bounded by a polynomial function of |V|. For 1 ≤ i ≤ k we write A_i for the set of elements of A corresponding to vertices in V_i, and for 1 ≤ i < j ≤ k we write A_{i,j} for the set of elements of A corresponding to edges in E_{i,j}. For each t ∈ {0, 1, ..., (k choose 2)}, we define X_t ⊆ N to be a set of t(k choose 2) < k^4 natural numbers disjoint from A, chosen so that every solution to L in A ∪ X_t is contained in A. Without loss of generality, we may assume that k^4 < |V| (otherwise we would be able to execute a brute force approach in time bounded by a function of k alone), so it follows from Lemmas 15 and 17 that we can construct such a set X_t in time bounded by a polynomial function of |A| log(max(A)), and hence by a polynomial function of |V|. Moreover, the space required to represent X_t is also bounded by a polynomial function of |V|. We now partition X_t arbitrarily into (k choose 2) sets X^{i,j}_t for 1 ≤ i < j ≤ k, each of size exactly t. The set A_{i,j}[t] is then defined to be A_{i,j} ∪ X^{i,j}_t. We set A[t] := A ∪ X_t. It follows from Lemma 1 and the construction of A[t] that the only solutions to L in A[t] are of the form a_1x_1 + a_2x_2 = by, where y corresponds to an edge whose endpoints are the vertices corresponding to x_1 and x_2. We will say that a subset of A[t] is colourful if it contains precisely one element from each set A_i (for 1 ≤ i ≤ k) and one element from each set A_{i,j}[t] (for 1 ≤ i < j ≤ k).
Let N(A[t]) denote the number of L-free subsets of A[t] that are colourful. We can compute the value of N(A[t]) using a single call to our oracle for p-#Multicolour L-Free Subset with input (A_1, ..., A_k, A_{1,2}[t], ..., A_{k−1,k}[t]); note that the total size of the instance in such an oracle call is bounded by h(k) · |V|^{O(1)} for some function h, and that the value of the parameter in our oracle call depends only on k.
Given any subset U ⊆ V such that |U ∩ V_i| = 1 for each 1 ≤ i ≤ k, let us denote by N(A[t], U) the number of colourful L-free subsets of A[t] whose intersection with A′ is precisely φ_V^{−1}(U). We now claim that N(A[t], U) = (q + t)^{(k choose 2) − e(U)} (q + t − 1)^{e(U)}, where e(U) denotes the number of edges in the subgraph of G induced by U. To see that this is true, suppose that U = {w_1, ..., w_k}, where w_i ∈ V_i for each i. If w_i and w_j are not adjacent, then we can choose freely any element of A_{i,j}[t] to add to the set, without risk of creating a solution to L, so there are |A_{i,j}[t]| = q + t possibilities for the element of A_{i,j}[t] in the set; on the other hand, if w_i and w_j are adjacent, we must avoid the element of A_{i,j}[t] corresponding to w_iw_j, so there are |A_{i,j}[t]| − 1 = q + t − 1 possibilities for the element of A_{i,j}[t] in the set. Since we can choose the element of each set A_{i,j}[t] to include in the set independently of the others, this gives the expression above for N(A[t], U).
Observe now that N(A[t]) = Σ_U N(A[t], U), where the sum is over all subsets U ⊆ V such that |U ∩ V_i| = 1 for each 1 ≤ i ≤ k. For 0 ≤ j ≤ (k choose 2), let C_j denote the number of subsets U ⊂ V such that |U ∩ V_i| = 1 for each 1 ≤ i ≤ k and e(U) = j. We can then rewrite the expression above as N(A[t]) = Σ_{j=0}^{(k choose 2)} C_j (q + t)^{(k choose 2) − j} (q + t − 1)^j. If we now define p(z) := Σ_{j=0}^{(k choose 2)} C_j z^{(k choose 2) − j} (z − 1)^j, it is clear that p is a polynomial in z of degree (k choose 2), and moreover that p(z) = N(A[z − q]). Thus if we know the value of p(z) for (k choose 2) + 1 distinct values of z, we can interpolate in polynomial time to determine all the coefficients of p; we can obtain the required values of p(z) by using our oracle to evaluate N(A[t]) for t ∈ {0, 1, ..., (k choose 2)}.
To complete the reduction we must demonstrate that, once we know the coefficients of p(z), it is straightforward to calculate the number of multicolour cliques in G. We will in fact argue that we only need to determine the constant term of p(z). Note that, if 0 ≤ j < (k choose 2), then z is a factor of C_j z^{(k choose 2) − j} (z − 1)^j. Thus the constant term of p(z) is the same as the constant term of the polynomial C_{(k choose 2)} (z − 1)^{(k choose 2)}. This constant term is (−1)^{(k choose 2)} C_{(k choose 2)}, so the absolute value of the constant term in p(z) is precisely C_{(k choose 2)}. But C_{(k choose 2)} is by definition the number of subsets U ⊆ V such that |U ∩ V_i| = 1 for each i and e(U) = (k choose 2), that is the number of multicolour cliques in G.
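The interpolation step can be illustrated with exact rational arithmetic. In the Python sketch below, the oracle is simulated from a hypothetical list of values C_0, ..., C_M (with M playing the role of (k choose 2)), and the constant term of p is recovered by evaluating the Lagrange interpolating polynomial at 0. The function names and the toy numbers are assumptions for illustration only.

```python
from fractions import Fraction
from math import comb


def constant_term_from_values(points):
    """Evaluate at 0 the unique polynomial of degree < len(points)
    passing through the given (z, p(z)) pairs, by Lagrange interpolation."""
    total = Fraction(0)
    for i, (zi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (zj, _) in enumerate(points):
            if j != i:
                term *= Fraction(-zj, zi - zj)
        total += term
    return total


def count_cliques_via_interpolation(oracle, q, k):
    """Recover C_M from the values N(A[t]) for t = 0, ..., M, M = C(k, 2)."""
    M = comb(k, 2)
    points = [(q + t, oracle(t)) for t in range(M + 1)]
    return abs(constant_term_from_values(points))


def make_toy_oracle(C, q):
    """Simulated oracle: N(A[t]) = sum_j C_j (q+t)^(M-j) (q+t-1)^j."""
    M = len(C) - 1
    return lambda t: sum(
        C[j] * (q + t) ** (M - j) * (q + t - 1) ** j for j in range(M + 1)
    )
```

Only the value p(0) is needed, so a full recovery of all coefficients of p can be avoided in practice.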
This completes the fpt-Turing reduction from p-#Multicolour Clique to p-#Multicolour L-Free Subset.
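The interpolation step of this reduction can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: `oracle(t)` is a hypothetical stand-in for the oracle returning N(A[t]), `q` is the base size with |A_{i,j}[t]| = q + t, and Lagrange interpolation with exact rationals is one convenient way (not necessarily the intended one) to recover the constant term of p.

```python
from fractions import Fraction
from math import comb


def constant_term(points):
    """Value at 0 of the unique polynomial through the given (z, p(z)) pairs.

    Uses the Lagrange form with exact rational arithmetic; the z-values
    must be pairwise distinct.
    """
    total = Fraction(0)
    for i, (zi, yi) in enumerate(points):
        weight = Fraction(1)
        for j, (zj, _) in enumerate(points):
            if j != i:
                weight *= Fraction(-zj, zi - zj)
        total += yi * weight
    return total


def count_multicolour_cliques(oracle, k, q):
    """Recover C_{binom(k,2)} from binom(k,2) + 1 oracle calls.

    `oracle(t)` is assumed (hypothetically) to return N(A[t]) = p(q + t).
    """
    m = comb(k, 2)
    points = [(q + t, oracle(t)) for t in range(m + 1)]
    # The constant term of p is (-1)^m * C_m, so take the absolute value.
    return abs(constant_term(points))
```

Only the value p(0) is needed, so the sketch evaluates the interpolating polynomial at 0 rather than computing every coefficient.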
The main result of this section now follows immediately from Lemmas 21 and 22.
Theorem 23. Let L be a linear equation of the form $a_1 x_1 + a_2 x_2 = by$ where $a_1, a_2, b \in \mathbb{N}$ are fixed. Then #Given-Size L-Free Subset, parameterised by k, is #W[1]-hard with respect to fpt-Turing reductions.

7.4. Approximate counting.
For 3-variable homogeneous linear equations L, we have seen that there is unlikely to be an fpt-algorithm, with parameter k, to solve #Given-Size L-Free Subset exactly; however, the problem does admit an FPTRAS (for any non-translation-invariant homogeneous linear equation L). We prove the following result, following the method of [35, Lemma 3.4].
Lemma 24. Let L be any non-translation-invariant homogeneous linear equation. Let A ⊂ Z be finite and k ∈ N, and let N denote the number of elements of $A^{(k)}$ that are L-free. Then, for every ε > 0 and δ ∈ (0, 1) there is an explicit randomised algorithm which outputs an integer α, such that
$$\mathbb{P}\big[\,|N - \alpha| \le \varepsilon \cdot N\,\big] \ge 1 - \delta,$$
and runs in time at most $f(k)\,q(\mathrm{size}(A), \varepsilon^{-1}, \log(\delta^{-1}))$, where f is a computable function and q is a polynomial.
Sketch proof. The algorithm uses a simple randomised sampling method: we repeatedly select a random subset of A of cardinality k, and check whether it is L-free. After completing a suitable number of trials, we conclude that the observed proportion of L-free subsets is a good estimate for the overall proportion of such subsets. In order for this conclusion to be valid (without requiring a prohibitively large number of trials) we need to know that the proportion of L-free subsets of size k is not too small (at least $1/(f(k)\,q(\mathrm{size}(A)))$, where f is any computable function and q is any polynomial). This follows immediately (if |A| is sufficiently large compared with k; otherwise we can count exactly by brute force in time bounded by a function of k) from the fact that, by Theorem 11, A must contain a large L-free subset B, and any subset of B of size k will necessarily be L-free.
The existence of an FPTRAS now follows immediately.
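The sampling idea in the sketch proof can be illustrated as follows, specialised to sum-free sets (the equation x + y = z). This is a minimal sketch, not the algorithm of Lemma 24 itself: the function names are hypothetical, and the number of trials is left as a parameter rather than being derived from ε and δ.

```python
import random
from itertools import product
from math import comb


def is_sum_free(s):
    """True if s contains no x, y, z (not necessarily distinct) with x + y = z."""
    members = set(s)
    return not any(x + y in members for x, y in product(members, repeat=2))


def estimate_sum_free_k_subsets(A, k, trials, rng=None):
    """Estimate the number of sum-free k-element subsets of A by sampling.

    `trials` is an assumed, caller-chosen sample size; a real FPTRAS would
    pick it as a function of k, epsilon and delta.
    """
    rng = rng or random.Random()
    elements = sorted(set(A))
    hits = sum(is_sum_free(rng.sample(elements, k)) for _ in range(trials))
    # Scale the observed proportion by the total number of k-subsets.
    return round(comb(len(elements), k) * hits / trials)
```

For instance, every subset of a set of odd integers is sum-free, so on such an input the estimate equals the binomial coefficient regardless of the random choices.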
Theorem 25. Let L be any non-translation-invariant homogeneous linear equation. Then #Given-Size L-Free Subset admits an FPTRAS.

An extension version of the problem
A natural variant of the problem L-Free Subset is to ask whether, given A ⊆ Z and an (L-free) subset B ⊂ A, there is an L-free subset of A of cardinality k which contains B. This problem can be stated formally as follows.

L-Free Subset Extension
Input: A finite set A ⊆ Z, a set B ⊂ A and k ∈ N.
Question: Does there exist an L-free subset A′ ⊆ A such that B ⊆ A′ and |A′| = k?
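To make the problem statement concrete, here is a brute-force sketch for the special case of sum-free sets (x + y = z). The function names are hypothetical, and the exhaustive search is exponential in general; it only illustrates the definition and is not an algorithm from this paper.

```python
from itertools import combinations, product


def is_sum_free(s):
    """True if s contains no x, y, z (not necessarily distinct) with x + y = z."""
    members = set(s)
    return not any(x + y in members for x, y in product(members, repeat=2))


def extension_brute_force(A, B, k):
    """Return a sum-free set A' with B <= A' <= A and |A'| = k, or None."""
    A, B = set(A), set(B)
    if not B <= A or k < len(B):
        return None
    rest = sorted(A - B)
    # Try every way of adding k - |B| further elements of A to B.
    for extra in combinations(rest, k - len(B)):
        candidate = B | set(extra)
        if is_sum_free(candidate):
            return candidate
    return None
```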
We can make certain easy deductions about the complexity of this problem from the results we have already proved about L-Free Subset. Notice that we can easily define a reduction from L-Free Subset to L-Free Subset Extension by setting B = ∅; the next result follows immediately from this observation together with Theorem 4.
Proposition 26. Let L be a linear equation of the form $a_1 x_1 + \cdots + a_\ell x_\ell = by$ where each $a_i \in \mathbb{N}$ and $b \in \mathbb{N}$ are fixed and $\ell \ge 2$. Then L-Free Subset Extension is NP-complete, and the problem is para-NP-complete parameterised by |B|.
Perhaps the most obvious parameterisation of this problem to consider is k − |B|, the number of elements we want to add to the set B. It is straightforward to adapt our earlier results to demonstrate that, in the case of three-term equations, the problem is unlikely to admit an fpt-algorithm with respect to this parameterisation.
Proposition 27. Let L be a linear equation of the form a 1 x 1 + a 2 x 2 = by, where a 1 , a 2 , b ∈ N are fixed. Then L-Free Subset Extension is W[1]-hard, parameterised by k − |B|.
Proof. We prove this result by means of a reduction from the W[1]-complete problem p-Independent Set, which is defined as follows.
p-Independent Set
Input: A graph G and k ∈ N.
Parameter: k.
Question: Does G contain an independent set of cardinality k?
The W[1]-completeness of this problem can easily be deduced from that of p-Clique, shown to be W[1]-complete in [16].
Let (G, k′) be the input to an instance of p-Independent Set. Once again, we rely on the construction in Lemma 1 to give us the set A in our instance of L-Free Subset Extension; we set B := A and k := |B| + k′ (so the parameter of interest in our instance of L-Free Subset Extension is equal to the solution size in the instance of p-Independent Set). By Corollary 2, we know that there is a one-to-one correspondence between independent sets in G of cardinality k′ and L-free subsets of A of cardinality |A | + k′ that contain A , so it follows immediately that (A, B, k) is a yes-instance for L-Free Subset Extension if and only if (G, k′) is a yes-instance for p-Independent Set.
On the positive side, we observe that we can once again make use of fpt-algorithms for p-card-Hitting Set if we consider L-Free Subset Extension parameterised by the number of elements of A that are not included in the subset. Note that the standard bounded search tree method for p-card-Hitting Set (or its counting version) [23, Theorem 1.14] in fact gives an fpt-algorithm to find all hitting sets of size k. As it is easy to check in time polynomial in size(A) whether a given hitting set in the hypergraph defined in the proof of Lemma 13 contains a vertex corresponding to an element of B, we can use this method to count L-free subsets of A that contain B (and hence to decide whether there is at least one).
Proposition 28. Let L be any fixed linear equation. Then L-Free Subset Extension, parameterised by |A| − k, belongs to FPT; the same is true for the counting version of the problem with this parameterisation.
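The bounded search tree idea behind Proposition 28 can be sketched for the decision problem in the special case of sum-free sets (x + y = z). This is an illustrative sketch with hypothetical function names, not the hitting-set algorithm of [23]: whenever the current set contains a solution, at least one element of that solution outside B must be deleted, so we branch on at most three choices and decrease the deletion budget d.

```python
from itertools import product


def find_solution(S):
    """Return some (x, y, z) in S^3 with x + y = z, or None if S is sum-free."""
    for x, y in product(sorted(S), repeat=2):
        if x + y in S:
            return (x, y, x + y)
    return None


def delete_at_most(A, B, d):
    """Return a sum-free set S with B <= S <= A and |A - S| <= d, or None.

    The search tree has depth at most d and branching factor at most 3,
    so at most 3^d nodes are explored.
    """
    S, B = set(A), set(B)
    sol = find_solution(S)
    if sol is None:
        return S
    if d == 0:
        return None
    for a in set(sol) - B:  # some element of the solution must be deleted
        result = delete_at_most(S - {a}, B, d - 1)
        if result is not None:
            return result
    return None
```

Since every subset of an L-free set is L-free, a set found after fewer than d deletions can be shrunk to cardinality exactly |A| − d by discarding further elements outside B, provided |A| − d ≥ |B|. The sketch decides existence only; it does not count.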
Finally, we consider parameterising simultaneously by the number of elements we wish to add to B and the size of the set B; this is equivalent to parameterising by k, the total size of the desired L-free subset.
Proposition 29. Let L be a linear equation of the form $a_1 x_1 + a_2 x_2 = by$, where $a_1, a_2, b \in \mathbb{N}$ are fixed and $a_1 + a_2 \neq b$. Then L-Free Subset Extension, parameterised by k, belongs to FPT.
Proof. Let (A, B, k) be an instance of L-Free Subset Extension. If k < |B| then this is necessarily a no-instance, so we may assume without loss of generality that k ≥ |B|; we may also assume that B is L-free (we can check this in polynomial time and if B contains a solution to L we immediately return NO).
As a first step in our algorithm, we delete from A every a ∈ A \ B such that B ∪ {a} is not L-free: note that this does not change the number of L-free subsets containing B, as such an a cannot belong to any set of this kind. We call the resulting set $A_1$, and note that we can construct $A_1$ in time polynomial in size(A). Note that no set containing 0 can be L-free, so we know that $0 \notin A_1$.
Fix the constant λ = λ(L) as in equation (2). Our algorithm proceeds as follows. If $|A_1| < \frac{6|B|+1}{\lambda}(k - |B|) + |B|$, we exhaustively consider all k-element subsets of $A_1$ that contain B and check whether any of them is L-free; if $|A_1| \ge \frac{6|B|+1}{\lambda}(k - |B|) + |B|$ then we return YES. To see that this is an fpt-algorithm, note that, if $|A_1| < \frac{6|B|+1}{\lambda}(k - |B|) + |B|$, then $|A_1| = O(k^2)$ (since |B| ≤ k and λ is a constant), so we can perform the exhaustive search in time depending only on k. It is clear that we will return the correct answer whenever we perform the exhaustive search; in order to prove correctness of the algorithm, it remains to show that, if $|A_1| \ge \frac{6|B|+1}{\lambda}(k - |B|) + |B|$, then we must have a yes-instance.
To see that this is true, we first prove that there exists a large set $A_2 \subseteq A_1$ such that $B \subseteq A_2$ and no solution to L in $A_2$ involves an element of B. We will call a triple $(x, y, z) \in A_1^3$ bad if $a_1 x + a_2 y = bz$. By construction of $A_1$, note that every bad triple contains at most one element of B. We aim to bound the number of bad triples containing some fixed $u \in A_1 \setminus B$ and at least one element of B. First, we also fix v ∈ B, and bound the number of bad triples which contain both u and v. If we fix the positions of u and v in a triple, there is at most one $w \in A_1 \setminus B$ such that w completes the triple; as there are 6 options for the choice of positions of u and v in the triple, this means there are in total at most 6 bad triples involving both u and v. Summing over all possibilities for v, we see that there are at most 6|B| bad triples involving any fixed $u \in A_1 \setminus B$ and at least one element of B.
We can therefore greedily construct a set $C \subseteq A_1 \setminus B$ of size at least $\frac{|A_1| - |B|}{6|B|+1}$ such that every bad triple in B ∪ C is entirely contained in C. Indeed, initially set C := ∅ and $A' := A_1 \setminus B$. Move an arbitrary element u of A′ to C and delete all elements of A′ that form a bad triple with u and at least one element of B; by the reasoning above, this involves deleting at most 6|B| elements of A′. Repeat this process until A′ = ∅, and note that C is as desired; set $A_2 := B \cup C$. Now observe that every L-free subset of cardinality k − |B| in C can be extended to an L-free subset of $A_2$ of cardinality k which contains B. We know from Theorem 11 that there exists an L-free subset C′ ⊆ C of cardinality at least λ|C| (where λ is the constant defined in equation (2)). B ∪ C′ is then an L-free subset of A, and has cardinality at least
$$|B| + \lambda |C| \ge |B| + \lambda \cdot \frac{|A_1| - |B|}{6|B|+1} \ge |B| + (k - |B|) = k,$$
so we should indeed return YES.
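The greedy construction of C used in this proof can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes its input has already been preprocessed as in the proof (B is L-free and B ∪ {a} is L-free for every a in A_1 \ B); it builds C only and does not carry out the subsequent appeal to Theorem 11.

```python
from itertools import permutations


def is_bad(x, y, z, a1, a2, b):
    """True if (x, y, z) is a bad triple, i.e. a1*x + a2*y = b*z."""
    return a1 * x + a2 * y == b * z


def greedy_C(A1, B, a1, a2, b):
    """Greedily build C inside A1 minus B so that no bad triple in B | C mixes B and C.

    Assumes (as in the proof) that B together with any single element of
    A1 minus B is L-free, so a mixed bad triple has exactly one element
    of B and two distinct elements outside B.
    """
    B = set(B)
    remaining = sorted(set(A1) - B)  # plays the role of the working set A'
    C = []
    while remaining:
        u = remaining.pop(0)  # move an arbitrary element to C
        C.append(u)
        # Delete every w that forms a bad triple with u and some v in B,
        # in any of the six arrangements of (u, v, w).
        remaining = [
            w for w in remaining
            if not any(
                is_bad(*t, a1, a2, b)
                for v in B
                for t in permutations((u, v, w))
            )
        ]
    return set(C)
```

Each round removes at most 6|B| elements besides u, which is where the lower bound on |C| in the proof comes from.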
We note that the argument used in this proof can be adapted to demonstrate the existence of an FPTRAS for the counting version of this problem, using the ideas from Lemma 24. However, there is unlikely to be an fpt-algorithm to solve the counting version exactly, as we can easily reduce #Given-Size L-Free Subset to this problem (with the same parameter) by setting B = ∅.

Conclusions and open problems
We have shown that the basic problem of deciding whether a given input set A ⊆ Z contains an L-free subset of size at least k is NP-complete when L is any linear equation of the form $a_1 x_1 + \cdots + a_\ell x_\ell = by$ (with $a_i, b \in \mathbb{N}$ and $\ell \ge 2$), although the problem is solvable in polynomial time whenever L is a linear equation with only two variables. We also demonstrated that the maximisation version of the problem is APX-hard for equations L of the form $a_1 x_1 + a_2 x_2 = by$ (with $a_1, a_2, b \in \mathbb{N}$).
Two natural questions arise from these results. First of all, in our NP-hardness reduction, we construct a set A where max(A) is exponential in terms of |A|: is this problem in fact strongly NP-complete, so that it remains hard even if all elements of A are bounded by some polynomial function of |A|? Secondly, can either the NP-completeness proof or the APX-hardness proof be generalised to other linear equations L? A natural starting point for an equation that is not covered by Theorem 4 would perhaps be the case of Sidon sets (i.e. x + y = z + w).
On the positive side, we saw that the decision problem belongs to FPT for any homogeneous non-translation-invariant equation L when parameterised by the cardinality of the desired L-free subset, and that it belongs to FPT for any linear equation L with respect to the dual parameterisation (the number of elements of A not included in the L-free subset). While we have considered two natural parameterisations here, there is another natural parameterisation that we have not considered. We know that, for certain linear equations, there is some function $c^*_L$ such that every set A ⊆ Z is certain to contain an L-free subset of cardinality at least $c^*_L(|A|)$. It is therefore natural to consider the complexity parameterised above this lower bound: what is the complexity of determining whether a given subset A ⊆ Z contains an L-free subset of cardinality at least $c^*_L(|A|) + k$, where k is taken to be the parameter? The main difficulty in addressing this question is that the exact value of $c^*_L$ is not known for any linear equation L: even in the case of sum-free subsets, we only know that the bound on $c^*_L$ is of the form $|A|/3 + o(|A|)$.
We also considered the complexity of determining whether a set A ⊆ Z \ {0} contains an L-free subset containing a fixed proportion ε of its elements. We demonstrated that this problem is also NP-complete for the case of sum-free sets, and also for L-free sets whenever L is a 3-variable translation-invariant linear equation. It would be interesting to investigate how far these results can be generalised to other linear equations: given any non-translation-invariant, homogeneous linear equation L and any rational κ(L) < ε < 1, is ε-L-Free Subset NP-complete?
Concerning the complexity of counting L-free subsets, we have addressed the problem of counting L-free subsets containing exactly k elements. For equations L covered by the NP-hardness result of Theorem 4, even approximate counting is hard: there is no FPRAS for arbitrary k unless NP = RP (as if we could count approximately we could, with high probability, determine whether there is at least one L-free subset of size k). We also considered the complexity of this problem parameterised separately by k and |A| − k.
However, there are other natural counting problems we have not addressed here. For example, we might want to count the total number of L-free subsets of any size; here the decision problem ("Is there an L-free subset of any size?") is trivial, so there is no immediate barrier to an efficient counting algorithm. Alternatively, we might want to count the total number of maximal L-free sets. Our results do not have any immediate consequences for either of these problems, but the corresponding counting versions of the extension problem are necessarily #P-complete: by Corollary 2, we have a polynomial-time reduction to this problem from that of counting all (maximal) independent sets in an arbitrary ℓ-uniform hypergraph; for ℓ = 2 and ℓ = 3 this problem was shown to be #P-complete by Greenhill [30], and we can easily add further dummy vertices to each edge (and then require that the elements corresponding to the dummy vertices are included in our L-free subset) to deal with larger values of ℓ.