Counting elliptic curves with a rational $N$-isogeny for small $N$

We count the number of rational elliptic curves of bounded naive height that have a rational $N$-isogeny, for $N \in \{2,3,4,5,6,8,9,12,16,18\}$. For some $N$, this is done by generalizing a method of Harron and Snowden. For the remaining cases, we use the framework of Ellenberg, Satriano and Zureick-Brown, in which the naive height of an elliptic curve is the height of the corresponding point on a moduli stack.


Introduction
Let $E$ be an elliptic curve over $\mathbb{Q}$. An isogeny $\varphi : E \to E'$ between two elliptic curves is said to be cyclic of degree $N$ if $\ker(\varphi)(\overline{\mathbb{Q}}) \cong \mathbb{Z}/N\mathbb{Z}$. Further, it is said to be rational if $\ker(\varphi)$ is stable under the action of the absolute Galois group $G_{\mathbb{Q}}$. A natural question one can ask is: how many elliptic curves over $\mathbb{Q}$ have a rational cyclic $N$-isogeny? Henceforth, we will omit the adjective 'cyclic', since these are the only isogenies we will consider. It is classically known that for $N \le 10$ and $N = 12, 13, 16, 18, 25$ there are infinitely many such elliptic curves, so we order them by naive height. An elliptic curve $E$ over $\mathbb{Q}$ has a unique minimal Weierstrass equation $y^2 = x^3 + Ax + B$, where $A, B \in \mathbb{Z}$ and $\gcd(A^3, B^2)$ is not divisible by any 12th power. Define the naive height of $E$ to be $\mathrm{ht}(E) = \max\{|A|^3, |B|^2\}$.

Notation 1. For two functions $f, g : \mathbb{R} \to \mathbb{R}$, we say that $f(X) \asymp g(X)$ if there exist positive constants $K_1$ and $K_2$ such that $K_1 g(X) \le f(X) \le K_2 g(X)$ for all sufficiently large $X$. For a positive real number $X$ and a positive integer $N$, define
$$\mathcal{N}(N, X) = \#\{E/\mathbb{Q} : E \text{ has a rational } N\text{-isogeny and } \mathrm{ht}(E) \le X\}.$$
We are interested in finding a function $h_N(X)$ such that $\mathcal{N}(N, X) \asymp h_N(X)$. Note that $h_N(X)$ describes the rate of growth of $\mathcal{N}(N, X)$, rather than being an asymptotic; in this paper we will often call it the asymptotic growth rate of the function $\mathcal{N}(N, X)$.
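The minimality condition and the naive height are easy to compute directly. The following minimal Python sketch (the helper names `is_minimal` and `naive_height` are our own, hypothetical choices) checks that no prime $p$ satisfies $p^4 \mid A$ and $p^6 \mid B$, which is equivalent to $\gcd(A^3, B^2)$ having no 12th-power factor, and evaluates $\mathrm{ht}(E) = \max\{|A|^3, |B|^2\}$. It uses plain trial division, so it is only meant for small coefficients.

```python
from math import gcd


def is_minimal(A: int, B: int) -> bool:
    """True iff no prime p has p^4 | A and p^6 | B.

    Equivalent to gcd(A^3, B^2) not being divisible by any 12th power.
    Only primes dividing gcd(A, B) can violate the condition, so we
    trial-divide that gcd.
    """
    g = gcd(A, B)
    p = 2
    while p * p <= g:
        if g % p == 0:
            if A % p**4 == 0 and B % p**6 == 0:
                return False
            while g % p == 0:
                g //= p
        p += 1
    # What is left of g is either 1 or a single prime.
    return not (g > 1 and A % g**4 == 0 and B % g**6 == 0)


def naive_height(A: int, B: int) -> int:
    """Naive height max(|A|^3, |B|^2) of y^2 = x^3 + A x + B."""
    if 4 * A**3 + 27 * B**2 == 0:
        raise ValueError("singular Weierstrass equation")
    return max(abs(A) ** 3, B * B)


# (48, 320) = (2^4 * 3, 2^6 * 5) is not minimal; scaling by u = 1/2 gives (3, 5).
print(is_minimal(48, 320), is_minimal(3, 5), naive_height(3, 5))
```

The call prints `False True 27`: the pair $(48, 320)$ is a non-minimal model of the curve $y^2 = x^3 + 3x + 5$, whose naive height is $\max\{27, 25\} = 27$.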
Theorem 1.1. Maintaining the notation above, we have the following values of $h_N(X)$.
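\begin{tabular}{c|c||c|c}
$N$ & $h_N(X)$ & $N$ & $h_N(X)$ \\ \hline
2 & $X^{1/2}$ & 8 & $X^{1/6}\log X$ \\
3 & $X^{1/2}$ & 9 & $X^{1/6}\log X$ \\
4 & $X^{1/3}$ & 12 & $X^{1/6}$ \\
5 & $X^{1/6}(\log X)^2$ & 16 & $X^{1/6}$ \\
6 & $X^{1/6}\log X$ & 18 & $X^{1/6}$
\end{tabular}

Table 1. Values of $h_N(X)$, ordered by naive height.

This result is motivated by work of Harron and Snowden in [HS17]. In their paper they ask, for a given group $G$ from Mazur's list in [MG78, Theorem 2], how many elliptic curves over $\mathbb{Q}$ have torsion subgroup isomorphic to $G$, and they compute, for each group in Mazur's list, the exponent $d(G)$ governing the growth of this count. Our counting results are a generalization of those in their paper, as explained later in this section.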
Remark 1.1. Some of these counts are not new, but we are able to give new proofs for them. Counting elliptic curves with a rational isogeny of degree 2 is equivalent to counting elliptic curves with a rational 2-torsion point; this is covered in [HS17]. The case $N = 3$ was recently worked out by Pizzo, Pomerance and Voight [PPV20]. The case $N = 4$ was completed by Pomerance and Schaefer in [PS20], building on work by Cullinan, Keeney and Voight in [CKV20]. The case $N = 4$ also follows from recent work of Bruin and Najman in [BN20], which we say more about in Remark 1.2. We include these cases in this paper since our methods are different, and the case $N = 3$ serves as a good example for the purpose of exposition. An interesting phenomenon that occurs in this case is that of an 'accumulating subvariety', which appears in a variety of questions related to the original Batyrev-Manin conjecture for schemes. It turns out that the $X^{1/2}$ contribution comes from elliptic curves with $j$-invariant $0$, while the rest only contribute $X^{1/3}\log X$. In a later section, we explain the absence of the cases $N = 7, 10, 13$ and $25$.
1.1. Counting rational points on stacks. Let $X_0(N)$ be the compactification of the modular curve parametrizing pairs $(E, C)$, where $E$ is an elliptic curve and $C$ is a subgroup of $E$ isomorphic to $\mathbb{Z}/N\mathbb{Z}$.
We can rephrase our problem of finding $h_N(X)$ in terms of counting rational points on $X_0(N)$. Each pair $(E, C)$ has the non-trivial automorphism $[-1]$. Thus $X_0(N)$ has generic inertia $\mu_2$ and is not a scheme.
For $N \le 10$ and $N = 12, 13, 16, 18$ and $25$, the coarse space of $X_0(N)$ can be identified with $\mathbb{P}^1$ via a hauptmodul. The difficulty in counting rational points on the stack $X_0(N)$ arises from the fact that many rational points share the same value of the hauptmodul; these points are parametrized by quadratic twists.
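To see concretely why quadratic twists collapse to one coarse point while their heights differ, recall that the twist of $y^2 = x^3 + Ax + B$ by $d$ is $y^2 = x^3 + d^2Ax + d^3B$. The $j$-invariant $1728 \cdot 4A^3/(4A^3 + 27B^2)$ is unchanged, since numerator and denominator both scale by $d^6$, whereas the naive height grows. The Python sketch below (helper names are hypothetical) only checks the $j$-invariant, the coarse invariant available without any level structure; it does not evaluate a hauptmodul of $X_0(N)$.

```python
from fractions import Fraction


def j_invariant(A, B):
    """j = 1728 * 4A^3 / (4A^3 + 27B^2) for y^2 = x^3 + A x + B."""
    disc = 4 * A**3 + 27 * B**2
    if disc == 0:
        raise ValueError("singular Weierstrass equation")
    return Fraction(1728 * 4 * A**3, disc)


def twist(A, B, d):
    """Quadratic twist by d: d*y^2 = x^3 + A x + B is Y^2 = X^3 + d^2 A X + d^3 B."""
    return d * d * A, d**3 * B


def naive_height(A, B):
    """max(|A|^3, |B|^2)."""
    return max(abs(A) ** 3, B * B)


for d in (1, 2, -3, 5):
    A, B = twist(1, 1, d)
    print(d, (A, B), j_invariant(A, B), naive_height(A, B))
```

All four rows show the same $j$-invariant $6912/31$, while the naive height runs through $1, 64, 729, 15625$.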
Let $X$ be a modular curve that is a scheme and let $E : y^2 = x^3 + Ax + B$ be a rational point on it. Then $\max\{|A|^3, |B|^2\}$ is, up to a constant, the height with respect to the twelfth power of the Hodge bundle on $X$. A natural question is then whether the naive height also comes from geometry in the case of $X_0(N)$. A positive answer can be deduced from a forthcoming paper of Ellenberg, Satriano and Zureick-Brown [ESZ20], in which the authors establish a theory of heights on stacks. In particular, their height on $X_0(N)$ with respect to the 12th power of the Hodge bundle coincides with the naive height. We use this geometric interpretation of the naive height in §5 to count points on certain modular curves.
1.2. Outline of proof of Theorem 1.1. We use two main methods in the proof of the main theorem. For $N = 3, 4, 6, 8, 9, 12, 16, 18$, we generalize the methods of Harron and Snowden in [HS17] to count elliptic curves in families. The idea of Harron and Snowden is to use an equation for the universal family of elliptic curves with a given torsion subgroup to reduce to a problem in analytic number theory: counting polynomials over $\mathbb{Q}$ whose coefficients satisfy certain conditions. Counting elliptic curves with rational torsion isomorphic to $\mathbb{Z}/N\mathbb{Z}$ can be rephrased in terms of counting points on the modular curves $X_1(N)$ (see §2.1). For $5 \le N \le 10$ and $N = 12$, $X_1(N)$ is in fact a scheme isomorphic to $\mathbb{P}^1_{\mathbb{Q}}$, and so there exists a universal family $\mathcal{E} \to X_1(N)$ with a model $y^2 = x^3 + f(t)x + g(t)$ over $\mathbb{Q}(t)$. An elliptic curve $y^2 = x^3 + Ax + B$ over $\mathbb{Q}$ has a rational $N$-torsion point if and only if there exist $u, t \in \mathbb{Q}$ such that $A = u^4 f(t)$ and $B = u^6 g(t)$, and this is what Harron and Snowden exploit. The cases $N = 3, 4$ are handled by a slight modification of these methods.
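The substitution $A = u^4 f(t)$, $B = u^6 g(t)$ reflects the fact that $(x, y) \mapsto (u^2x, u^3y)$ identifies $y^2 = x^3 + Ax + B$ with $y^2 = x^3 + u^4Ax + u^6B$. Given a rational pair $(A, B)$, for instance a fiber $(f(t), g(t))$ at some $t \in \mathbb{Q}$, one can therefore choose $u$ to clear denominators and then divide out $(p^4, p^6)$ to reach the minimal model whose naive height is counted. The Python sketch below is our own illustration (the function names are hypothetical); it uses trial division, so it only suits small inputs.

```python
from fractions import Fraction
from math import gcd, lcm


def prime_divisors(n):
    """Distinct prime divisors of |n| by trial division (empty list for n = 0)."""
    n, ps, p = abs(n), [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps


def minimal_integral_model(A, B):
    """Minimal integral (A', B') with A' = u^4 A and B' = u^6 B for some rational u."""
    A, B = Fraction(A), Fraction(B)
    # Step 1: u = lcm of the denominators makes u^4 A and u^6 B integers.
    u = lcm(A.denominator, B.denominator)
    a, b = (A * u**4).numerator, (B * u**6).numerator
    # Step 2: rescale by u = 1/p while p^4 | a and p^6 | b.
    for p in prime_divisors(gcd(a, b)):
        while a % p**4 == 0 and b % p**6 == 0:
            a, b = a // p**4, b // p**6
    return a, b


# The curve y^2 = x^3 - 1/108 (the fiber E_{3,1/6} discussed in §4).
print(minimal_integral_model(0, Fraction(-1, 108)))
```

Here $u = 108 = 2^2 \cdot 3^3$ first gives $(0, -2^{10}3^{15})$; removing $2^6$ once and $3^6$ twice leaves $(0, -432)$, i.e. the minimal model $y^2 = x^3 - 432$.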
The problem with applying this idea to counting elliptic curves with an $N$-isogeny is that $X_0(N)$ is not a scheme, or even a stacky curve (in the sense of [VZ15]), so there is no nice universal family at our disposal. Instead, the proof of Theorem 1.1 reduces to a case where the framework of [HS17] can be applied. We show that, given an elliptic curve $E$ with an $N$-isogeny, there exists a twist $E^{\chi}$ of $E$ that can be interpreted as a rational point on a scheme or, at worst, on a stacky curve. This curve, which we construct in §2, is in fact a double cover of $X_0(N)$. Thus we reduce to counting quadratic twists within a framework very similar to the one used by Harron and Snowden in the $X_1(2)$ case. The technical counting theorems in [HS17] are not enough to give us the results that we need, so we prove a generalization of one of their theorems in §3.
For $N = 2$ and $5$ this strategy does not work, since the cover of $X_0(N)$ that we construct is still a $\mu_2$-gerbe over its rigidification (in fact, for $N = 2$ it is equal to $X_0(N)$). Instead, we use the interpretation of the naive height as the height with respect to the 12th power of the Hodge bundle. This allows us (in §5) to use different sections of the Hodge bundle to calculate the height of a rational point on $X_0(N)$, giving a height that differs from the naive height only by a constant. We do this for $N = 2, 3, 4, 6, 8$ and $9$ as well, thus giving a different proof of the asymptotic growth rates in these cases.

Remark 1.2. In recent work [BN20], Bruin and Najman use the structure of $X_0(2)$ and $X_0(4)$ as weighted projective lines to obtain $h_2(X) = X^{1/2}$ and $h_4(X) = X^{1/3}$. Their result holds over number fields as well. Although their proof is different from ours, the underlying idea of utilizing the stacky nature of these modular curves is the same.

1.3. Acknowledgements. We would like to thank Jordan Ellenberg for his valuable help and support. We would also like to thank David Zureick-Brown, John Voight and Jeremy Rouse for many helpful conversations and comments. We are also grateful to Andrew Snowden, Peter Bruin and Filip Najman for comments on an early draft of this paper. The second author would also like to thank Brandon Alberts and Libby Taylor.

Preliminaries
2.1. Modular curves. We start with some background and notation for modular curves. Most of this can be found in any standard textbook on modular curves; we recall it primarily in order to introduce the notation we will use for the rest of the paper. Let $S$ be a scheme. An elliptic curve over $S$ is a map $\pi : E \to S$, together with a section $z : S \to E$, such that every fiber of $\pi$ is a smooth projective curve of genus 1. Let $N$ be a positive integer. Let $Y_0(N)$ denote the modular curve such that for a scheme $S$,
$$Y_0(N)(S) = \{(E/S, C)\}/\cong,$$
where $E/S$ is an elliptic curve over $S$, $C$ is a cyclic subgroup scheme of $E$ of order $N$ defined over $S$, and the pair is taken up to isomorphism. Let $X_0(N)$ denote the compactification of $Y_0(N)$ (in the sense of Deligne and Mumford). Every point of this moduli space possesses the extra automorphism $[-1]$, and so $X_0(N)$ is a stack with generic inertia $\mu_2$.
Let $Y_1(N)$ denote the curve whose points are given by
$$Y_1(N)(S) = \{(E/S, P)\}/\cong,$$
where $E/S$ is an elliptic curve over $S$, $P \in E(S)$ is a point of order $N$, and the pair is taken up to isomorphism. Let $X_1(N)$ denote the Deligne-Mumford compactification of $Y_1(N)$. For $N \ge 5$, $X_1(N)$ is a scheme. There is a natural map $\Phi_N : X_1(N) \to X_0(N)$ which sends $(E, P)$ to $(E, \langle P \rangle)$, where $\langle P \rangle$ denotes the subgroup of $E$ generated by $P$. We remark here that the cusps of modular curves also have a moduli interpretation: they parametrize generalized elliptic curves with $\Gamma_0(N)$- or $\Gamma_1(N)$-structures. For a more detailed exposition, we refer the reader to [DR73] or [Con07]; a short summary can be found in Appendix A.
Definition 1. Let $\mathcal{M}$ denote any modular curve. For any point $S \to \mathcal{M}$, let $p : E \to S$ denote the corresponding elliptic curve. The Hodge bundle $\lambda_{\mathcal{M}}$ is the line bundle on $\mathcal{M}$ such that $(\lambda_{\mathcal{M}})_S = p_* \omega_{E/S}$. From the definition, one can see that if $\mathcal{M}$ is a modular curve parametrizing elliptic curves with some level structure, then $\lambda_{\mathcal{M}}$ is the pullback of $\lambda_{X(1)}$ along the forgetful map $\mathcal{M} \to X(1)$. For ease of notation, we will omit the subscript $\mathcal{M}$ whenever the underlying modular curve is clear from context. Modular forms of weight $k$ and level $N$ are sections of the $k$-th power of the Hodge bundle on $X_0(N)$. The coefficients $A$ and $B$ in the Weierstrass equation $y^2 = x^3 + Ax + B$ are, up to a scalar, the Eisenstein series $E_4$ and $E_6$ on $X(1)$ respectively. Thus $A^3$ and $B^2$ are sections of $\lambda^{\otimes 12}$ on $X_0(N)$. Hence counting elliptic curves of bounded naive height is the same as counting elliptic curves of bounded height with respect to $\lambda^{\otimes 12}$ on any modular curve that is a scheme (and, as we shall see later, also on moduli stacks).

Definition 2. Let $M$ denote the coarse space of a modular curve $\mathcal{M}$.
When $M \cong \mathbb{P}^1$, its function field is generated over $\mathbb{Q}$ by a single element; such an element is called a hauptmodul. Hauptmoduln parametrize elliptic curves with a given level structure, and can be used to write equations for modular curves.

2.2. Rationally defined subgroups. In this subsection, we describe a degree two cover of $X_0(N)$ that we will use in our counting problem. To this end, let $N \ge 3$ and let $G = (\mathbb{Z}/N\mathbb{Z})^\times$. Then $\Phi_N : X_1(N) \to X_0(N)$ is a branched $G$-cover of $X_0(N)$, with branch locus supported at irregular cusps and possibly at points with $j = 0, 1728$. Away from the branch locus, $G$ acts freely and transitively on the fibers of $\Phi_N$ via $a \cdot (E, P) = (E, aP)$. Let $H$ be an index two subgroup of $G$. We denote by $X_{1/2}(N)$ the quotient $X_1(N)/H$. One can make sense of this quotient at the cusps by using the moduli interpretation of cusps recalled in §2.1; this construction is carried out in Appendix A. We will denote by $Y_{1/2}(N)$ the quotient $Y_1(N)/H$.

Remark 2.1. Before we proceed, we make some comments about the curves $X_{1/2}(N)$.
(1) The curve $X_{1/2}(N)$ is not a novel construction: it can be understood classically as the quotient of the upper half plane by an index 2 subgroup of $\Gamma_0(N)$. Further, we do not claim that $X_1(N)/H$ is a scheme; in fact it is a stack in many cases (see §4).
(2) The notation $X_{1/2}(N)$ might be misleading, since $(\mathbb{Z}/N\mathbb{Z})^\times$ does not always have a unique index two subgroup. However, we will only consider those $H$ for which $G/H = \{H, -H\}$, or equivalently $-1 \notin H$. As an example, $(\mathbb{Z}/8\mathbb{Z})^\times \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$, which we write as $\{1, 3, 5, 7\}$. It has three index two subgroups: $H_1 = \{1, 3\}$, $H_2 = \{1, 5\}$ and $H_3 = \{1, 7\}$. The two cosets of $H_1$ are $H_1 = \{1, 3\}$ and $-H_1 = \{5, 7\}$, and similarly for $H_2$. However, since $-1 = 7 \in H_3$, the two cosets of $H_3$ are $H_3 = \{1, 7\}$ and $3H_3 = \{3, 5\}$. We therefore never pick $H_3$; the choice between $H_1$ and $H_2$ does not affect our final result.
(3) In the context of the remark above, we note that for some values of $N$ (namely $N = 5, 10, 13, 25$) there is no index 2 subgroup $H$ with $G/H = \{\pm H\}$. For these $N$, while the construction of $X_{1/2}(N)$ still makes sense, it does not have the properties that we want (see Lemma 2.1 and Proposition 4.1). Another way to rephrase the condition $G/H = \{\pm H\}$ is in terms of the subgroup $\Gamma_0(N) \subset \mathrm{SL}_2(\mathbb{Z})$. Consider the short exact sequence
$$1 \to \pm\Gamma_1(N)/\Gamma_1(N) \to \Gamma_0(N)/\Gamma_1(N) \to \Gamma_0(N)/\pm\Gamma_1(N) \to 1,$$
whose middle term is identified with $(\mathbb{Z}/N\mathbb{Z})^\times$ via $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto d \bmod N$, and whose first term maps onto $\{\pm 1\}$. For $N = 5, 10, 13, 25$ this sequence is non-split, while for the remaining $N$ it splits. This splitting enables us to construct a degree two cover of $X_0(N)$ without generic inertia.
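Which $N$ admit an index two subgroup $H$ with $-1 \notin H$ can be checked by brute force. The minimal Python sketch below (all helper names are our own) enumerates every subgroup of $(\mathbb{Z}/N\mathbb{Z})^\times$ by adjoining one element at a time to $\{1\}$, keeps those of index two, and discards those containing $-1 = N - 1$. It assumes $N \ge 3$.

```python
from math import gcd


def unit_group(N):
    """The elements of (Z/NZ)^x as integers in [1, N)."""
    return frozenset(a for a in range(1, N) if gcd(a, N) == 1)


def generated(N, gens):
    """Subgroup of (Z/NZ)^x generated by gens: closure of {1} under multiplication."""
    H, frontier = {1}, [1]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = x * g % N
            if y not in H:
                H.add(y)
                frontier.append(y)
    return frozenset(H)


def all_subgroups(N):
    """Every subgroup is reached by adjoining elements one at a time to {1}."""
    G = unit_group(N)
    seen, stack = {frozenset({1})}, [frozenset({1})]
    while stack:
        K = stack.pop()
        for g in G - K:
            L = generated(N, K | {g})
            if L not in seen:
                seen.add(L)
                stack.append(L)
    return seen


def good_index_two_subgroups(N):
    """Index-two subgroups H with -1 (= N - 1) not in H, so that G/H = {H, -H}."""
    G = unit_group(N)
    return sorted(sorted(H) for H in all_subgroups(N)
                  if 2 * len(H) == len(G) and N - 1 not in H)


for N in (3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 16, 18, 25):
    print(N, good_index_two_subgroups(N))
```

The output lists no admissible $H$ exactly for $N = 5, 10, 13, 25$, and for $N = 8$ it returns $H_1 = \{1, 3\}$ and $H_2 = \{1, 5\}$ but not $H_3$, in line with items (2) and (3).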
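We now explain the significance of the curves $X_{1/2}(N)$. Most of what follows is well known (see, e.g., [RZ15], [Gre+14]), but we recall it here for completeness. Let $E$ be an elliptic curve over $\mathbb{Q}$ with a rational $N$-isogeny. For notational convenience, we fix a Weierstrass form $y^2 = x^3 + Ax + B$, with $A, B \in \mathbb{Z}$, for $E$. Fix an isomorphism of the kernel of the rational $N$-isogeny with $\mathbb{Z}/N\mathbb{Z}$. The Galois action of $G_{\mathbb{Q}}$ on the kernel defines a homomorphism
$$\chi : G_{\mathbb{Q}} \to \mathrm{Aut}(\mathbb{Z}/N\mathbb{Z}) \cong (\mathbb{Z}/N\mathbb{Z})^\times.$$
Choosing $H$ as in Remark 2.1 gives an isomorphism $f : (\mathbb{Z}/N\mathbb{Z})^\times \cong \{\pm 1\} \times H \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/m\mathbb{Z}$. This allows us to factor $\chi$ into two characters $\chi_1 : G_{\mathbb{Q}} \to \mathbb{Z}/2\mathbb{Z}$ and $\chi_2 : G_{\mathbb{Q}} \to \mathbb{Z}/m\mathbb{Z}$; that is, we may write $\chi = \chi_1\chi_2$ using the isomorphism $f$. Since $\chi_1$ is a quadratic character, it factors through a quadratic extension $K = \mathbb{Q}(\sqrt{d})$, with $d$ a squarefree integer. Let $E^{\chi_1} : dy^2 = x^3 + Ax + B$ denote the quadratic twist of $E$ by $K$.

Lemma 2.1. Maintaining the above notation, $E^{\chi_1}$ has a rational $N$-torsion subgroup on which $G_{\mathbb{Q}}$ acts via $\chi_2$. That is, the Galois action on this $N$-torsion subgroup factors as
$$G_{\mathbb{Q}} \xrightarrow{\ \chi_2\ } \mathbb{Z}/m\mathbb{Z} \cong H \subset (\mathbb{Z}/N\mathbb{Z})^\times.$$

We have thus proved the following for $N \in \{3, 4, 6, 7, 8, 9, 12, 16, 18\}$.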

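Proposition 2.2. Fix an appropriate index 2 subgroup $H \subset (\mathbb{Z}/N\mathbb{Z})^\times$ and consider the corresponding curve $X_{1/2}(N) = X_1(N)/H$. Let $(E, C)$ be a rational point of $Y_0(N)$, i.e. an elliptic curve $E/\mathbb{Q}$ together with the kernel $C$ of a rational $N$-isogeny. Then there exists a unique squarefree $d \in \mathbb{Z}$ such that the corresponding twist $(E^{\chi_1}, \varphi(C))$ satisfies: (1) $(\varphi(C))^\times$ has an index two subgroup $H_C$ defined over $\mathbb{Q}$, and therefore (2) $(E^{\chi_1}, H_C)$ defines a rational point of $X_{1/2}(N)$.

Proof. This follows from combining the interpretation of $X_{1/2}(N)$ as a fiberwise quotient of $X_1(N)$ with Lemma 2.1.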
A nice example of Proposition 2.2 is given by the cases $N = 3, 4, 6$, where $(\mathbb{Z}/N\mathbb{Z})^\times \cong \mathbb{Z}/2\mathbb{Z}$. In these cases $H$ is trivial and $X_{1/2}(N) = X_1(N)$. For these values of $N$, Proposition 2.2 says that if $E$ has a rational $N$-isogeny, then some quadratic twist of $E$ has a rational $N$-torsion point.
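The factorization $\chi = \chi_1\chi_2$ rests on the decomposition $(\mathbb{Z}/N\mathbb{Z})^\times = \{\pm 1\} \times H$: every unit $a$ is uniquely $\varepsilon h$ with $\varepsilon = \pm 1$ and $h \in H$, and $\chi_1$ records $\varepsilon$ while $\chi_2$ records $h$. The Python sketch below (hypothetical helper names) computes this decomposition for $N = 9$, $H = \{1, 4, 7\}$ and rewrites the twist $dy^2 = x^3 + Ax + B$ in short Weierstrass form. It does not model any Galois action; it only illustrates the group theory behind Lemma 2.1.

```python
from math import gcd


def split_unit(a, N, H):
    """Write a unit a mod N as eps * h with eps in {1, -1} and h in H.

    Assumes H has index two and -1 is not in H, so the decomposition is unique.
    """
    a %= N
    if gcd(a, N) != 1:
        raise ValueError("a is not a unit mod N")
    if a in H:
        return 1, a
    if (-a) % N in H:
        return -1, (-a) % N
    raise ValueError("H is not a complement of {1, -1}")


def twist(A, B, d):
    """d*y^2 = x^3 + A x + B becomes Y^2 = X^3 + d^2 A X + d^3 B via X = d x, Y = d^2 y."""
    return d * d * A, d**3 * B


N, H = 9, {1, 4, 7}
for a in (a for a in range(1, N) if gcd(a, N) == 1):
    print(a, split_unit(a, N, H))
```

The sign component is multiplicative, which is exactly what makes $\chi_1$ a character.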

2.3. Automorphisms and universal families.
In this section, we briefly recall the relation between automorphisms and the existence of universal families. For more details, we refer the reader to [KM85], Chapter 4 and Appendix A.4. Let $\mathcal{F}$ be a functor on the category $\mathbf{Ell}$ of elliptic curves over a ring $R$. Let $F$ denote the corresponding functor on the category of $R$-schemes, sending an $R$-scheme $S$ to the set of isomorphism classes of pairs $(E/S, \alpha)$, where $E$ is an elliptic curve over $S$ and $\alpha \in \mathcal{F}(E/S)$ is an '$\mathcal{F}$-level structure'. The functor $\mathcal{F}$ (resp. $F$) is representable if there exists a universal elliptic curve $\mathbb{E}$ over a scheme $M$ (resp. a scheme $M$) such that $\mathcal{F}(E/S) = \mathrm{Hom}(E/S, \mathbb{E}/M)$ (resp. $F(S) = \mathrm{Hom}(S, M)$). Note that the representability of $\mathcal{F}$ guarantees the existence of $M$, and therefore implies the representability of $F$. The functor $\mathcal{F}$ is said to be rigid if for any $E/S \in \mathbf{Ell}$ and any $\alpha \in \mathcal{F}(E/S)$, the pair $(E/S, \alpha)$ has no non-trivial automorphisms. In general, if $\mathcal{F}$ is representable, then $\mathcal{F}$ is rigid. The following proposition tells us when the converse is true.

Proposition 2.3 ([KM85], 4.7.0). Suppose that for every elliptic curve $E/S$, the functor on the category of $S$-schemes defined by $T \mapsto \mathcal{F}(E_T/T)$ is representable by a scheme. Suppose further that $\mathcal{F}$ is affine over $\mathbf{Ell}$, that is, the morphism $\mathcal{F}_{E/S} \to S$ is affine. Then $\mathcal{F}$ is representable if and only if $\mathcal{F}$ is rigid.
In this paper, we will be interested in the functors of points corresponding to $X_0(N)$, $X_1(N)$ and the intermediate quotient $X_{1/2}(N)$. For the verification that the two hypotheses of Proposition 2.3 are satisfied in these cases, we refer the reader to [KM85]. Thus we may move freely between the existence of universal families and rigidity.
2.4. Counting lattice points in a region. In this section, we state a theorem of Davenport on a principle of Lipschitz [Dav51]. Let $R$ be a closed and bounded region in $\mathbb{R}^n$. Suppose $R$ satisfies the following two conditions:
(1) Any line parallel to one of the coordinate axes intersects $R$ in a set that is a union of at most $h$ intervals.
(2) The same is true (with $n$ replaced by $m$) for any of the $m$-dimensional regions obtained by projecting $R$ onto an $m$-dimensional coordinate subspace ($1 \le m \le n - 1$).
Let $V(R)$ be the volume of the region $R$ and $N(R)$ the number of lattice points in it. Then the following theorem holds.

Theorem 2.4 ([Dav51]). For $R$ satisfying (1) and (2),
$$|N(R) - V(R)| \le \sum_{m=0}^{n-1} h^{n-m} V_m,$$
where $V_m$ is the sum of the ($m$-dimensional) volumes of the $m$-dimensional projections of $R$, and $V_0 = 1$. We will use this theorem repeatedly in the next section.
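As a sanity check of Theorem 2.4, take $R$ to be the closed disk of radius $\rho$ in $\mathbb{R}^2$ (we write $\rho$ to avoid a clash with the region $R$). Every line parallel to an axis meets it in at most one interval, so $h = 1$; both one-dimensional projections are intervals of length $2\rho$, so $V_1 = 4\rho$, and $V_0 = 1$. The bound therefore reads $|N(R) - \pi\rho^2| \le 1 + 4\rho$. The Python sketch below is our own example, not taken from [Dav51]; it counts lattice points exactly with integer square roots and compares the error with the bound.

```python
import math


def lattice_points_in_disk(rho: int) -> int:
    """Exact count of (x, y) in Z^2 with x^2 + y^2 <= rho^2."""
    total = 0
    for x in range(-rho, rho + 1):
        # For fixed x, admissible y form the interval |y| <= isqrt(rho^2 - x^2).
        total += 2 * math.isqrt(rho * rho - x * x) + 1
    return total


def davenport_bound_disk(rho: int) -> int:
    """sum_{m=0}^{n-1} h^(n-m) V_m with n = 2, h = 1, V_0 = 1, V_1 = 2*rho + 2*rho."""
    n, h = 2, 1
    V = [1, 4 * rho]
    return sum(h ** (n - m) * V[m] for m in range(n))


for rho in (1, 10, 100, 1000):
    count = lattice_points_in_disk(rho)
    error = abs(count - math.pi * rho**2)
    print(rho, count, round(error, 2), davenport_bound_disk(rho))
```

The actual error is far smaller than $1 + 4\rho$; Davenport's bound is crude but completely uniform, which is what the counting arguments of the next section need.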

Counting quadratic twists in families
From §2, we see that in order to count elliptic curves in $X_0(N)(\mathbb{Q})$ with respect to naive height, we must count elliptic curves for which there exists a quadratic twist that gives a rational point of $X_{1/2}(N)$. In this section we state and prove the counting results that will enable us to do so.

Proposition 3.1 ([HS17], Theorem 4.1). Let $f, g \in \mathbb{Q}[t]$ be coprime polynomials of degrees $r$ and $s$ respectively. Let $\max\{r, s\} > 0$ and let $m$ and $n$ be coprime integers such that
Assume that either $n = 1$ or $m = 1$. Let $S(X)$ be the set of pairs $(A, B) \in \mathbb{Z}^2$ such that

As we will see in Section 4, this proposition is not enough for all the cases that we are interested in: for $N = 3$, the condition 'either $n = 1$ or $m = 1$' is not satisfied. We will thus prove a generalization of this proposition.
Remark 3.1. We note here that we do not prove the most general possible version of Theorem 3.2, since we do not need it. It might be an interesting exercise in analytic number theory to prove such a version, independent of the interpretation as counting points on a moduli space.

Theorem 3.2. Let $f, g \in \mathbb{Q}[t]$ be coprime polynomials of degrees $r$ and $s$ respectively. Let $\max\{r, s\} > 0$ and let $m$ and $n$ be coprime integers such that
Let $S(X)$ be the set of pairs $(A, B) \in \mathbb{Z}^2$ such that
Then $|S(X)| \asymp X^{(m+1)/(6n)} \log X$.
Remark 3.2. The hypotheses on $m$, $n$, $r$ and $s$ leave few choices of these variables that satisfy all of them together. The degree conditions for $X_0(3)$, which give $m = 3$, $n = 2$, $h = 1$ and $w = 2$, are perhaps the only moduli problem of interest that satisfies them. However, stating the theorem in this manner instead of using numbers makes the method less opaque and more amenable to generalization.
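With the values of Remark 3.2, the exponent in Theorem 3.2 and the identity $\frac{m+1}{n} - (w+1) = -1$ invoked at the end of the proof of Lemma 3.7 can be checked directly:

```latex
% X_0(3): m = 3, n = 2, h = 1, w = 2 (Remark 3.2)
\frac{m+1}{6n} = \frac{3+1}{6 \cdot 2} = \frac{1}{3},
\qquad
\frac{m+1}{n} - (w+1) = \frac{4}{2} - 3 = -1 .
```

So Theorem 3.2 yields the growth rate $X^{1/3}\log X$, which is the contribution of curves with $j \neq 0$ described in Remark 1.1.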
3.0.1. Proof of Theorem 3.2. The proof of this theorem closely follows that in [HS17]; we provide the key parts of the proof here for the sake of completeness. We prove the upper bound and the lower bound separately, and for the reader's convenience we outline each proof first.

Notation 2. For any two real valued functions $h(X)$ and $k(X)$, we say that $h(X) \lesssim k(X)$ if there is a positive constant $C$ such that $h(X) \le C k(X)$.
Upper bound. Our goal is to reduce the problem of counting pairs in $S(X)$ to the problem of counting tuples of integers in a bounded region, possibly with some divisibility conditions. Let $S_1(X)$ be the set of pairs $(u, t)$ such that $(u^2 f(t), u^3 g(t)) \in S(X)$. Counting $S_1(X)$ gives an upper bound for $|S(X)|$. We will express $u$ and $t$ as $qc^{-1}db^n$ and $ab^{-m}$ respectively, for some integers $a, b, c, d$ and some rational number $q$; Lemmas 3.3, 3.4 and 3.5 enable us to do this. The next key observation is that there are only finitely many possibilities for $q$. Thus, for the kind of upper bound that we are looking for, we can count 4-tuples of integers in a particular region. Lemma 3.6 gives the bounds for such a region, and Lemma 3.7 determines the divisibility conditions these integers must satisfy and counts the resulting tuples.
Furthermore, we can take $c_p = 1$ for all sufficiently large $p$.
Let $S_1(X)$ be the set of pairs $(u, t)$ such that $(u^2 f(t), u^3 g(t)) \in S(X)$.
Lemma 3.4. For each prime $p$ there is a constant $C_p$ such that for all $(u, t) \in S_1(X)$, we have:
for some $|\epsilon'| \le C_p$. Moreover, we can take $C_p = 1$ for all $p$ sufficiently large.
Proof. The proof of this lemma closely follows that of Lemma 2.3 in [HS17]. Fix a prime $p$. Since $A$ and $B$ must be integral, we have that: Note that if $\mathrm{val}_p(u) \ge K + 2$, then replacing $u$ by $p^{-2}u$ still gives an integral pair, so $p^4 \mid A$ and $p^6 \mid B$; that is, $p^{12} \mid \gcd(|A|^3, B^2)$, contradicting minimality. Thus we must have $K \le \mathrm{val}_p(u) \le K + 1$. The rest of the proof goes exactly as in [HS17]. Suppose $\mathrm{val}_p(t) < 0$. Pick $K_1 \ge 0$ such that $|\mathrm{val}_p(f(t)) - r\,\mathrm{val}_p(t)| \le K_1$ and $|\mathrm{val}_p(g(t)) - s\,\mathrm{val}_p(t)| \le K_1$ for all such $t$. Note that $K_1$ can depend on $p$ and can be taken to be $0$ for large enough $p$. Then, where $|\epsilon| < K_2$ for some $K_2$. Thus we have: for $\mathrm{val}_p(t) < 0$. Now consider the case when $\mathrm{val}_p(t) \ge 0$. By Lemma 3.3, there exists $K_3$ such that $\min(\mathrm{val}_p(f(t)), \mathrm{val}_p(g(t))) \le K_3$, and $K_3 = 0$ for $p \gg 0$. Thus $-\mathrm{val}_p(u) \le K_4$ for some constant $K_4$. Since $\mathrm{val}_p(t) \ge 0$, there is a $K_5$ such that $\mathrm{val}_p(f(t)) \ge K_5$ and $\mathrm{val}_p(g(t)) \ge K_5$ for all such $t$. Appealing again to (4), this gives a lower bound on $-\mathrm{val}_p(u)$. Thus there is a constant $K_7$ such that $|\mathrm{val}_p(u)| \le K_7$. To avoid confusion, we remark that all the $K_i$ are constant with respect to $t$ and $u$, but do depend on $p$, $f$ and $g$. This gives us the first part of the lemma. For the second part, we need only take, as in [HS17], $p \gg 0$ such that: (1) the coefficients of $f$ and $g$ are $p$-integral, (2) the leading coefficients of $f$ and $g$ are $p$-units, and (3) the constant $c_p$ in Lemma 3.3 can be taken to be $1$. Since $K \le \mathrm{val}_p(u) \le K + 1$, we can only get $C_p = 1$ for $p \gg 0$.
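The reason $K_1$ can be taken to be $0$ for large $p$ is elementary: if the coefficients of $f$ are $p$-integral and its leading coefficient is a $p$-unit, then for $\mathrm{val}_p(t) < 0$ the leading term $c_r t^r$ has strictly smaller valuation than every other term, so $\mathrm{val}_p(f(t)) = r\,\mathrm{val}_p(t)$ exactly. The Python sketch below illustrates this with a hypothetical polynomial $f(t) = 7t^2 + 2t + 3$ (not one of the families in this paper): the identity holds at $p = 5$, but fails at $p = 7$, where the leading coefficient is not a unit.

```python
from fractions import Fraction


def val_p(x, p):
    """p-adic valuation of a nonzero rational number."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("val_p(0) is +infinity")
    v, num, den = 0, abs(x.numerator), x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def poly_eval(coeffs, t):
    """Evaluate sum(coeffs[i] * t**i); coeffs are listed from the constant term up."""
    t = Fraction(t)
    return sum(Fraction(c) * t**i for i, c in enumerate(coeffs))


f = [3, 2, 7]  # hypothetical f(t) = 7t^2 + 2t + 3, degree r = 2
r = len(f) - 1

# p = 5: coefficients 5-integral, leading coefficient a 5-unit.
for t in (Fraction(1, 5), Fraction(3, 25), Fraction(7, 125)):
    print(t, val_p(poly_eval(f, t), 5), r * val_p(t, 5))

# p = 7 divides the leading coefficient: f(1/7) = 24/7.
t = Fraction(1, 7)
print(val_p(poly_eval(f, t), 7), r * val_p(t, 7))  # -1 versus -2
```

At a small prime such as $p = 7$ the discrepancy is bounded but nonzero, which is why the lemma only asserts $C_p = 1$ for $p$ sufficiently large.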
The next step is to prove an analogue of Lemma 2.4 in [HS17], which will enable us to reduce our problem to that of counting lattice points in a region. We start with some notation. Recall that $w = \max\{3h/s, 2h/r\}$. For a given pair of integers $(a, b)$ with $b > 0$, we say that a prime $p$ satisfies $(*)$ if $p \mid b$ and $p^w \mid a$.

Lemma 3.5. Suppose $(u, t) \in S_1(X)$. There is a finite set $\mathcal{Q} \subset \mathbb{Q}^\times$ (independent of $u$ and $t$) such that we can write $t = ab^{-m}$ and $u = qc^{-1}db^n$, where: (1) $a, b \in \mathbb{Z}$, with $b > 0$, (2) $\gcd(a, b^m)$ is $m$-th power free, (3) $d$ is a squarefree integer, (4) $q \in \mathcal{Q}$, and (5) $c \in \mathbb{Z}$ is such that $\mathrm{val}_p(c) \le h$ for all $p$, and $\mathrm{val}_p(c) > 0$ if and only if $p$ satisfies $(*)$.
Proof. Given $t \in \mathbb{Q}$, one can always write $t = ab^{-m}$ satisfying (1) and (2); pick any such representation. We now analyze $ub^{-n}$ and show that $\mathrm{val}_p(ub^{-n})$ satisfies the required constraints. For convenience, we fix an integer $N_0$ such that $C_p = 1$ for $p \ge N_0$; such an $N_0$ exists by Lemma 3.4.
We divide the set of all primes into two groups: Therefore, for any $p$, we have If $p \le N_0$, we have no control over $C_p$, but there are only finitely many possibilities for the $N_0$-smooth part of $ub^{-n}$, since $|\mathrm{val}_p(ub^{-n})| \le C_p + h$ (here, the $N_0$-smooth part means the part of the numerator or denominator that is divisible only by primes less than or equal to $N_0$). For $p \ge N_0$, we have: In the case that $p \nmid b$ and $p \ge N_0$, we see that $\mathrm{val}_p(t) \ge 0$. Further, in the proof of the previous lemma, $N_0$ is chosen so that for $p \ge N_0$ we have $\mathrm{val}_p(f(t)) \ge 0$ and $\mathrm{val}_p(g(t)) \ge 0$, with at least one of them an equality. In particular, this implies that $\mathrm{val}_p(ub^{-n}) \ge 0$ for such $p$. Similarly, for $p \mid b$ (so that $\mathrm{val}_p(t) < 0$) and $p \ge N_0$, we can take $\epsilon' = 0$. Thus, $\mathrm{val}_p(ub^{-n})$ We factor the $p \ge N_0$ part of $ub^{-n}$ as $c^{-1}d$ as follows: if either $p \nmid b$, or $p \mid b$ and $\mathrm{val}_p(ub^{-n}) = 1$, we set $\mathrm{val}_p(d) = \mathrm{val}_p(ub^{-n})$; otherwise we set $\mathrm{val}_p(d) = 0$. The previous paragraph shows that $d$ is a squarefree integer and that $\mathrm{val}_p(c) \le h$.
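The first step of the proof of Lemma 3.5, writing $t = ab^{-m}$ with $b > 0$ and $\gcd(a, b^m)$ $m$-th power free, can be made explicit. Writing $t = P/Q$ in lowest terms, take $b = \prod_{p \mid Q} p^{\lceil \mathrm{val}_p(Q)/m \rceil}$, the smallest $b > 0$ with $tb^m \in \mathbb{Z}$, and $a = tb^m$. For $p \mid Q$ we get $\mathrm{val}_p(a) = m\lceil \mathrm{val}_p(Q)/m\rceil - \mathrm{val}_p(Q) < m$, which forces $\gcd(a, b^m)$ to be $m$-th power free. The Python sketch below (hypothetical helper names, trial-division factorization) produces this particular choice; the lemma allows any representation satisfying (1) and (2).

```python
from fractions import Fraction
from math import gcd


def factor(n):
    """Prime factorization {p: e} of a positive integer by trial division."""
    fac, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            fac[p] = fac.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac


def a_b_representation(t, m):
    """Return (a, b) with t = a / b^m and b > 0 minimal; then gcd(a, b^m) is m-th power free."""
    t = Fraction(t)
    b = 1
    for p, e in factor(t.denominator).items():
        b *= p ** (-(-e // m))  # p^ceil(e/m)
    a = t * b**m  # an integer by the choice of b
    return a.numerator, b


def is_mth_power_free(n, m):
    """True iff no prime appears in n with exponent >= m."""
    return all(e < m for e in factor(abs(n)).values())


a, b = a_b_representation(Fraction(5, 12), 3)
print(a, b, gcd(a, b**3))  # 90 6 18
```

For $t = 5/12$ and $m = 3$ this gives $b = 6$, $a = 90$ and $\gcd(90, 216) = 18 = 2 \cdot 3^2$, which is indeed cube-free.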
We now explain the condition $(*)$, which comes from the fact that $A$ and $B$ are required to be integers. For any $p$: where $K_1$ is a non-negative constant that can be taken to be $0$ for $p \gg 0$. Similarly, for $B$, we get that $\mathrm{val}_p(u^3 g(t)) = 3\,\mathrm{val}_p(q) + K_1 + s\,\mathrm{val}_p(a) - 3\,\mathrm{val}_p(c)$. Since $q$ is $N_0$-smooth, for $p$ large enough the integrality of $A$ and $B$ translates directly into condition $(*)$. Further, since we are only interested in an upper bound for the asymptotic growth, not imposing conditions on, say, $2\,\mathrm{val}_p(q) + K_1$ for small $p$ causes us no harm.

Now consider $(u, t) \in S_1(X)$ and write them as in Lemma 3.5. The fact that $\max\{|A|^3, B^2\} < X$ implies bounds for $a$, $b$, $c$ and $d$, which we now find.

Lemma 3.6. Let $(u, t) \in S_1(X)$. Represent $u = qc^{-1}db^n$ and $t = ab^{-m}$ as in Lemma 3.5. Then, Let $K$ be a positive constant such that $\max(|f(t)|^{1/2}, |g(t)|^{1/3}) > K$ for all $t$ (such a $K$ exists since $f$ and $g$ are coprime and $\max\{r, s\} > 0$). Thus $|u| \le K^{-1} X^{1/6}$. Let $M_2 = K^{-1}(\max_{q \in \mathcal{Q}} |q|^{-1})$. Thus, we have that: We now turn to bounding $a$ $(= tb^m)$. Suppose $|t| < 1$. Then, by the above bound for $b$, we have $|a| < M_2 X^{m/6n} c^{m/n} d^{-m/n}$. If $|t| \ge 1$, then we can find a constant $M > 0$ such that $M^2|t|^r \le |f(t)|$ and $M^3|t|^s \le |g(t)|$. Thus we have: and so $|c^{-1}da^{n/m}| < M^{-1}(\max_{q \in \mathcal{Q}} |q|^{-1}) X^{1/6}$. Thus, we see that:

Lemma 3.7. Under the hypotheses of Theorem 3.2, $|S_1(X)| \lesssim X^{(m+1)/(6n)} \log X$.

Proof. Fix $c > 1$. Let $S_1(X; c)$ denote the set of all $(a, b, d) \in \mathbb{Z}^3$ such that: (1) We will only consider the case $m + 1 > n$. Further, since $m = n$, the error term above just becomes . Summing over $(h+1)$-th power free $c$, with $c < X^\alpha$ (for any $\alpha$), since $\frac{m+1}{n} - (w+1) = -1$, we have $|S_1(X)| \lesssim X^{(m+1)/(6n)} \log X$.
Lower bound. The outline of the proof of the lower bound is as follows. We know that if $(u, t) \in S_1(X)$, then $u$ and $t$ have expressions as in Lemma 3.5. Instead of counting all of these, we only count those of the form $u = c^{-1}b^n$ and $t = ab^{-m}$, where $a$, $b$ and $c$ lie within appropriate bounds. Let $S_2(X)$ be the set of such triples $(a, b, c)$. There is a map $S_2(X) \to S(X)$, and the bulk of the proof is in showing that this map has bounded fibers. To do so, we form an intermediate set $S_3(X)$, describe maps $S_2(X) \to S_3(X) \to S(X)$, and bound the fibers of these maps. This will enable us to find a lower bound for $|S(X)|$ by finding one for $|S_2(X)|$ instead.
Since we only need a lower bound, observe that by changing $u$ to $Mu$ for a large enough integer $M$, we can assume that $f(t), g(t) \in \mathbb{Z}[t]$. For a triple $(a, b, c) \in \mathbb{Z}^3$, set $u = c^{-1}b^n$ and $t = ab^{-m}$. Let $A = u^2 f(t)$ and $B = u^3 g(t)$. Fix some constant $\kappa > 0$. Define $S_2(X)$ to be the set of triples $(a, b, c) \in \mathbb{Z}^3$ such that: 0 (where $A$ and $B$ are as defined above). Note that if $(a, b, c) \in S_2(X)$, then for a suitable value of $\kappa$ we get $(A, B) \in S(X)$, since , and similarly for $B$.
Notation: Define $S_3(X) \subset \mathbb{Z}^2$ to be the set of pairs $(A, B) \in \mathbb{Z}^2$ coming from $S_2(X)$. We then have a map $S_3(X) \to S(X)$. The following lemma will help us bound the fibers of this map.

Lemma 3.8. There exists a non-zero integer $D$ (depending only on $f$ and $g$) with the following property: if $(a, b, c) \in S_2(X)$, then $\gcd(A^3, B^2)$ can be factored as $M_D \beta$ such that $M_D$ divides $D$ and $p \mid \beta \Rightarrow p \mid b$.
Proof. We follow the same method of proof as Harron and Snowden. Let $(a, b) \in S_2(X; c)$ and let $p$ be a prime. Let $M_1$ be a constant such that $|3\,\mathrm{val}_p(f(t)) - 3r\,\mathrm{val}_p(t)| \le M_1$ and $|2\,\mathrm{val}_p(g(t)) - 2s\,\mathrm{val}_p(t)| \le M_1$ for all $t \in \mathbb{Q}$ with $\mathrm{val}_p(t) < 0$. Let $M_2$ be a constant for which $\min\{3\,\mathrm{val}_p(f(t)), 2\,\mathrm{val}_p(g(t))\} \le M_2$ for all $t \in \mathbb{Q}$ with $\mathrm{val}_p(t) \ge 0$. Note that $\max\{M_1, M_2\}$ is $0$ for $p \gg 0$ (specifically, for $p \ge N_0$, as defined in the proof of Lemma 3.5). Now consider the case where $\mathrm{val}_p(t) < 0$; in particular, $p \mid b$. Let $\mathrm{val}_p(b) = k$ and $\mathrm{val}_p(a) = l$ $(< m)$. We then have: where $|\epsilon| \le M_1$ and $|\delta| \le M_1$. Let $M_0 = \min\{3rm, 2sm\}$. Let and take $D = \prod_{p \le N_0} p^{e_p}$. This proves the lemma.
Remark 3.3. We find that $D$ is $N_0$-smooth and that every prime dividing $\beta$ is a prime $p \ge N_0$ dividing $b$. It is crucial that $D$, $M_0$, $M_1$ and $M_2$ do not depend on $(a, b, c)$ in any way; they depend only on $f$ and $g$.
We now use this lemma to bound the fibers of $S_3(X) \to S(X)$ in our case of interest, namely when $\min\{3rm - 6h, 2sm - 6h\} \le 6$.
Lemma 3.9. There exists a constant $N$ such that the size of each fiber of $S_3(X) \to S(X)$ is bounded by $N$.
Proof. The fiber over a point (A Thus, for any $(A, B) \in S_3(X)$, the size of the fiber above the pair is bounded above by the number of 12th powers dividing $\gcd(|A|^3, B^2)$. We show that this is exactly the number of 12th powers dividing $D$ from Lemma 3.8, i.e. that no 12th power greater than 1 divides $\beta$.
with torsion point $(0, 0)$ [Kub76, Table 3]. The $j$-invariant of the universal family is This gives exactly eight values of $v$ producing a curve of $j$-invariant $0$. An explicit computation with the torsion point confirms that the stacky points correspond to the roots of $v^2 - v + 1$, which are defined over $\mathbb{Q}(\sqrt{-3})$.
If $N = 8$, recall from §2 that the choice of index 2 subgroup of $(\mathbb{Z}/N\mathbb{Z})^\times$ is not unique, and we choose one that works for us. That is, write the set of generators of $C$ as $\{P, 3P, -3P, -P\}$ and suppose we chose the subgroup $\{1, 3\}$, so that the fiber above $(E, C)$ consists of the points $(E, \{P, 3P\})$ and $(E, \{-P, -3P\})$. Neither pair has extra automorphisms. If $N = 5$, the set of generators of $C$ is $\{P, 2P, -2P, -P\}$, and $(\mathbb{Z}/5\mathbb{Z})^\times$ has a unique index 2 subgroup, $\{\pm 1\}$. Thus the fiber above $(E, C)$ has two points, $(E, \{P, -P\})$ and $(E, \{2P, -2P\})$, and each of these points still has the automorphism $[-1]$. This proves the proposition for $N = 5$. What this proposition tells us is that if $N \in \{3, 4, 6, 7, 8, 9, 12, 16, 18\}$, then there is an open substack $\mathcal{U}$ of $X_{1/2}(N)$ that is isomorphic to a scheme. Therefore $\mathcal{U}(\mathbb{Q})$ can be parametrized via the universal family over $\mathcal{U}$. For $N \in \{4, 6, 8, 9, 12, 16, 18\}$, the non-stacky locus contains $Y_{1/2}(N)$, and thus there exist coprime $f_N, g_N \in \mathbb{Q}[t]$ such that every elliptic curve arising from a rational point on $Y_{1/2}(N)$ is isomorphic to one of the form
$$E_{N,t} : y^2 = x^3 + f_N(t)x + g_N(t).$$
Thus, by Proposition 2.2, we have the following: To find the asymptotic growth of $\mathcal{N}(N, X)$ in these cases, we use Proposition 3.1 to find the value of $h_N(X)$, given in Table 2 below.
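The fiber computation for $N = 8$ and $N = 5$ in the paragraph above can be phrased combinatorially: the points of $X_{1/2}(N)$ above $(E, C)$ correspond to the $H$-orbits on the generators $kP$ of $C$ ($k \in (\mathbb{Z}/N\mathbb{Z})^\times$), and $[-1]$ acts by $k \mapsto -k$. The Python sketch below (hypothetical helper names; it ignores the branch locus and only models the group action on generators) lists the orbits and checks whether $[-1]$ preserves each of them, i.e. whether the automorphism $[-1]$ survives on the fiber.

```python
from math import gcd


def generator_orbits(N, H):
    """H-orbits on the generators kP of a cyclic group <P> of order N, as sets of k."""
    units = [k for k in range(1, N) if gcd(k, N) == 1]
    orbits, seen = [], set()
    for k in units:
        if k not in seen:
            orbit = frozenset(h * k % N for h in H)
            seen |= orbit
            orbits.append(orbit)
    return orbits


def minus_one_survives(N, H):
    """True iff [-1]: kP -> -kP maps every H-orbit to itself (equivalently, -1 is in H)."""
    return all(frozenset(-k % N for k in O) == O for O in generator_orbits(N, H))


print(generator_orbits(8, {1, 3}), minus_one_survives(8, {1, 3}))
print(generator_orbits(5, {1, 4}), minus_one_survives(5, {1, 4}))
```

For $N = 8$ the orbits $\{1, 3\}$ and $\{5, 7\}$ are swapped by $[-1]$, while for $N = 5$ both orbits $\{1, 4\}$ and $\{2, 3\}$ are fixed, matching the two cases above.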
For $N = 3$, the situation is slightly different. $X_{1/2}(3) = X_1(3)$ has one stacky point, lying above the elliptic curve with $j$-invariant $0$. Let $\Phi_3 : X_{1/2}(3) \to X(1)$ be the usual forgetful map, and set $Y = Y_{1/2}(3) \setminus \Phi_3^{-1}(\{j = 0\})$. Then, for a suitable embedding $Y \hookrightarrow \mathbb{A}^1$, there is a universal family $E_{3,t}$ over $Y$ (see, e.g., [HS17]) given by: Every elliptic curve with non-zero $j$-invariant and a rational 3-torsion point is isomorphic to one of the above form for some $t \in \mathbb{Q}$. However, this family does not extend to a universal family over $t = 1/6$. Indeed, $E_{3,1/6}$ is given by $y^2 = x^3 - \frac{1}{108}$, and its torsion subgroup of order 3 is generated by the rational point $(1/3, 1/6)$. On the other hand, all curves $E_D : y^2 = x^3 + D^2$, $D \in \mathbb{Q}$, contain the rational 3-torsion point $(0, D)$ and have $j$-invariant $0$, but none of them is isomorphic to $E_{3,1/6}$ over $\mathbb{Q}$ (an isomorphism would force $D^2 = -u^6/108$ for some $u \in \mathbb{Q}^\times$). For this reason, we separate our counting function into two pieces: By Theorem 3.2, we have the following proposition.

Proposition 4.2. Maintaining the notation as above,

In order to find the asymptotics for $\mathcal{N}(3, X)_{j=0}$, we observe the following. By Lemma 3.4 in [HS17], any elliptic curve that has $j$-invariant $0$ and a rational 3-torsion point, but is not of the form $E_{3,t}$ for any $t \in \mathbb{Q}$, admits an equation of the form $y^2 = x^3 + D^2$ with $D \in \mathbb{Z}$. Thus the curves missing from our count are the quadratic twists of these exceptional curves. That is, they are elliptic curves of the form for some $u, t \in \mathbb{Q}$ with $u^3t^2$ integral and minimal. This is the same as counting elliptic curves $y^2 = x^3 + b$ with $b^2 < X$ and $b$ sixth-power free, and this number is asymptotic to a constant times $X^{1/2}$.

Remark 4.1. Our result agrees with that in [PPV20]. In fact, the argument for $\mathcal{N}(3, X)_{j=0}$ is exactly the same as in their paper, albeit stated slightly differently.
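Two facts used in the $N = 3$ discussion are easy to verify numerically. A point $(x, y)$ with $y \neq 0$ on $y^2 = x^3 + Ax + B$ has order 3 exactly when $x$ is a root of the 3-division polynomial $\psi_3(x) = 3x^4 + 6Ax^2 + 12Bx - A^2$. And the number of positive sixth-power-free $b \le X^{1/2}$ is $\sum_k \mu(k)\lfloor X^{1/2}/k^6 \rfloor$, so counting both signs gives roughly $\frac{2}{\zeta(6)} X^{1/2}$, with $\zeta(6) = \pi^6/945$. The Python sketch below (our own helper names) checks both; it uses $b^2 \le X$ rather than $b^2 < X$, which does not affect the growth rate.

```python
from fractions import Fraction
import math


def psi3(A, B, x):
    """3-division polynomial of y^2 = x^3 + A x + B."""
    return 3 * x**4 + 6 * A * x**2 + 12 * B * x - A**2


def on_curve(A, B, x, y):
    return y * y == x**3 + A * x + B


def mobius(k):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= k:
        if k % p == 0:
            k //= p
            if k % p == 0:
                return 0
            result = -result
        p += 1
    return -result if k > 1 else result


def count_j0(X):
    """#{b != 0 : b^2 <= X, b sixth-power free}, via 2 * sum_k mu(k) * floor(isqrt(X) / k^6)."""
    bmax = math.isqrt(X)
    total, k = 0, 1
    while k**6 <= bmax:
        total += mobius(k) * (bmax // k**6)
        k += 1
    return 2 * total


# E_{3,1/6}: y^2 = x^3 - 1/108 with the point (1/3, 1/6).
B = Fraction(-1, 108)
print(on_curve(0, B, Fraction(1, 3), Fraction(1, 6)), psi3(0, B, Fraction(1, 3)) == 0)

# E_D: y^2 = x^3 + D^2 with the point (0, D), here D = 5.
print(on_curve(0, 25, 0, 5), psi3(0, 25, 0) == 0)

X = 10**12
print(count_j0(X), count_j0(X) / math.isqrt(X), 2 * 945 / math.pi**6)
```

For $X = 10^{12}$ the count is $1965908$, and the ratio to $X^{1/2}$ already agrees with $2/\zeta(6) \approx 1.96591$ to about five decimal places.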
To complete the proof of the main theorem, for each N we need only calculate r, s, m and n in the notation of Proposition 3.1 and Theorem 3.2.In Table 2, we give the components required to compute r and s in each of the cases of interest (in the notation of the above theorems).

Values of invariants
Remark 4.2. We now explain why $\mathcal{X}_0(7)$ is omitted from our asymptotics. Our general strategy of counting points on $\mathcal{X}_0(7)$ by counting quadratic twists of points on $\mathcal{X}_{1/2}(7)$ still makes sense. However, the universal family that we obtain over the relevant subscheme of $\mathcal{X}_{1/2}(7)$ is less convenient for counting. More precisely, let $\mathcal{Y}$ denote the largest substack of $\mathcal{X}_{1/2}(7)$ that is isomorphic to a scheme and does not contain any cusps. Then there exist $f, g \in \mathbb{Q}[t]$ such that every $E$ coming from $\mathcal{Y}(\mathbb{Q})$ is isomorphic to an elliptic curve of the form $y^2 = x^3 + f(t)x + g(t)$ for some $t \in \mathbb{Q}$. However, $f$ and $g$ are not coprime. For instance, if $t$ is taken to be the Hauptmodul $(\eta_1/\eta_7)^4$, then $f$ and $g$ have the common factor $t^2 + 13t + 49$. One might hope to resolve this by choosing $f$ and $g$ more cleverly, but that is not possible. This is an artifact of $\mathcal{X}_{1/2}(7)$ having two stacky points, neither of which is rational, which makes it impossible to move the lack of semistability to $\infty \in \mathbb{P}^1$.
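The factor $t^2 + 13t + 49$ in Remark 4.2 can be illustrated on $X_0(7)$ itself. The sketch below assumes the classical formula $j = (t^2+13t+49)(t^2+5t+1)^3/t$ for the Hauptmodul $t = (\eta(\tau)/\eta(7\tau))^4$ (our reading of $(\eta_1/\eta_7)^4$; the formula is not taken from this paper). Using plain integer polynomial arithmetic, it checks that the numerator of $j - 1728$ is a perfect square, so there are no elliptic points of order 2, while $j$ has simple zeros at the two roots of $t^2 + 13t + 49$, which are therefore the two elliptic points of order 3 above $j = 0$. The quadratic has discriminant $-27$, so neither root is rational, which is consistent with the remark.

```python
def poly_mul(p, q):
    """Product of integer polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_sub(p, q):
    """Difference p - q of coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]


elliptic = [49, 13, 1]             # t^2 + 13t + 49
other = [1, 5, 1]                  # t^2 + 5t + 1
square_root = [-7, 70, 63, 14, 1]  # t^4 + 14t^3 + 63t^2 + 70t - 7

# j = elliptic * other^3 / t, so t * (j - 1728) = elliptic * other^3 - 1728 t
j_num = poly_mul(elliptic, poly_mul(other, poly_mul(other, other)))
j_minus_1728_num = poly_sub(j_num, [0, 1728])

# the numerator of j - 1728 is a square: no elliptic points of order 2
assert j_minus_1728_num == poly_mul(square_root, square_root)

# t^2 + 13t + 49 has discriminant -27, so its roots lie in Q(sqrt(-3)), not in Q
disc = elliptic[1] ** 2 - 4 * elliptic[2] * elliptic[0]
print(disc)  # -27
```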

Counting points of bounded height on stacks
In this section, we prove Theorem 1.1 for $N = 2, 3, 4, 5, 6, 8, 9$ using results from [ESZ20]. As we have seen, one can define a height on $\mathcal{X}_0(N)$, namely the naive height. The question is whether this height comes from geometry. We know that this is true for modular curves that are schemes (see §2.1): the naive height is the height with respect to the twelfth power of the Hodge bundle. It follows from the work in [ESZ20] that the same is true for moduli stacks of elliptic curves, and we use their machinery to count the number of points of bounded height. Before we proceed, we set some notation.
Notation 3. Recall that we use $\mathrm{ht}(E)$ for the naive height of a point $E$ on any modular curve. Let $\mathcal{X}$ be a stack and $V$ a vector bundle on it. We let $\mathrm{ht}_V$ denote the logarithmic height with respect to $V$ as defined in [ESZ20], and $\mathrm{Ht}_V$ the corresponding multiplicative height; that is, $\mathrm{Ht}_V = \exp(\mathrm{ht}_V)$.
We will not define $\mathrm{ht}_V$ here, but we will use the fact that if $V = \lambda^{\otimes 12}$ on $\mathcal{X}_0(N)$, then for an elliptic curve $E$ corresponding to a rational point $x : \operatorname{Spec}\mathbb{Q} \to \mathcal{X}_0(N)$, we have $\log \mathrm{ht}(E) = \mathrm{ht}_V(x) + O(1)$ (see Example 5.1 below). Thus, for the purpose of finding $h_N(X)$, we may count
$$\#\{x \in \mathcal{X}_0(N)(\mathbb{Q}) : \mathrm{Ht}_{\lambda^{\otimes 12}}(x) < X\}$$
instead of $\mathcal{N}(N, X)$.

5.1. Computing heights on stacks.
Throughout this subsection, $\mathcal{X}$ will be a proper Artin stack over $\operatorname{Spec}\mathbb{Z}$ with finite diagonal. A $\mathbb{Q}$-rational point $x$ of $\mathcal{X}$ is a map $x : \operatorname{Spec}\mathbb{Q} \to \mathcal{X}$. Let $V$ be a vector bundle on $\mathcal{X}$. Consider for a moment the special case where $\mathcal{X} = X$ is a proper scheme and $V$ is an ample line bundle on it. To compute the height of a point on $X$, we use a power of $V$ to embed $X \hookrightarrow \mathbb{P}^n$ for some $n$, and then take the naive height of the image of the point in $\mathbb{P}^n$. This makes computations easier. For a stack, the analogue would be mapping it into a weighted projective space. In [ESZ20], the authors show that this works; we recall the specific result below.
Consider the special case where $V$ is a metrized line bundle $L$ (see [ESZ20] for the precise definition). Suppose $s_1, s_2, \ldots, s_k$ are sections of $L$. Then $L$ is said to be generically globally generated by $s_1, \ldots, s_k$ if the cokernel of the corresponding morphism $\mathcal{O}_{\mathcal{X}}^{\oplus k} \to L$ vanishes over the generic point of $\operatorname{Spec}\mathbb{Z}$. In particular, the cokernel is then supported over finitely many primes.
Proposition 5.1 ([ESZ20], Proposition 2.27). Let $\mathcal{X}$ be a stack over $\operatorname{Spec}\mathbb{Z}$, and let $L$ be a line bundle on $\mathcal{X}$ such that $L^{\otimes n}$ is generically globally generated by sections $s_1, s_2, \ldots, s_k$. Let $x : \operatorname{Spec}\mathbb{Q} \to \mathcal{X}$ and, for each $i$, let $x_i = x^*(s_i)$ (after picking an identification of $x^*L$ with $\mathbb{Q}$). Scale $x_1, \ldots, x_k$ so that each $x_i \in \mathbb{Z}$ and, for every prime $p$, there is some $x_i$ with $v_p(x_i) < n$. Then
$$\mathrm{ht}_L(x) = \frac{1}{n}\log \max_{1 \le i \le k} |x_i| + O(1),$$
where $|\cdot|$ is the usual archimedean absolute value.
Note that we have only stated the version of the proposition that we require, i.e. over $\operatorname{Spec}\mathbb{Q}$ and $\operatorname{Spec}\mathbb{Z}$; a more general version holds for other global fields. We say that a tuple $(x_1, \ldots, x_k) \in \mathbb{Z}^k$ is minimal if it satisfies the last condition in the proposition: for each prime $p$, there is some $i \in \{1, \ldots, k\}$ such that $p^n \nmid x_i$.
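The rescaling in Proposition 5.1 and the minimality condition just defined can be sketched as follows. Changing the identification of $x^*L$ with $\mathbb{Q}$ multiplies every $x_i$ by a common $u^n$ with $u \in \mathbb{Q}^\times$, and a tuple is minimal exactly when $\gcd(x_1, \ldots, x_k)$ is $n$-th-power-free. The helpers `minimal_tuple` and `log_height` are hypothetical names; `log_height` returns $\frac{1}{n}\log\max_i |x_i|$ for the minimal tuple and ignores any bounded error term, as in Example 5.1.

```python
from fractions import Fraction
from math import gcd, lcm, log


def minimal_tuple(xs, n):
    """Rescale rational x_1..x_k by a common factor u^n (u rational) so that
    all entries are integers and, for every prime p, some entry is not
    divisible by p^n (the minimality condition above)."""
    xs = [Fraction(x) for x in xs]
    d = lcm(*(x.denominator for x in xs))
    ints = [int(x * d**n) for x in xs]  # exact: d^n clears every denominator
    g = gcd(*ints)
    q = 2
    while q**n <= g:
        # a composite q never divides here: its prime factors were removed earlier
        while g % q**n == 0:
            g //= q**n
            ints = [x // q**n for x in ints]
        q += 1
    return ints


def log_height(xs, n):
    """(1/n) log max |x_i| for the minimal rescaling (no O(1) term)."""
    return max(log(abs(x)) for x in minimal_tuple(xs, n) if x != 0) / n


print(minimal_tuple([Fraction(1, 2), 3], 2))  # [2, 12]
print(minimal_tuple([4096, 4096], 12))        # [1, 1]
```

The second call is the tuple $(A^3, B^2)$ with $n = 12$ for $(A, B) = (16, 64)$, which rescales to $(1, 1)$.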
Example 5.1. Let $L = \lambda$, the Hodge bundle on $\mathcal{X}(1)$. Then the global sections of $\lambda^{\otimes 12}$ are weight 12 modular forms, and it is a classical fact that $E_4^3$ and $E_6^2$ generically globally generate $\lambda^{\otimes 12}$. An elliptic curve $E : y^2 = x^3 + Ax + B$ determines a point $x : \operatorname{Spec}\mathbb{Q} \to \mathcal{X}(1)$, and $x^*(E_4^3)$ and $x^*(E_6^2)$ are fixed nonzero constant multiples of $A^3$ and $B^2$. The assumption about scaling the sections corresponds to choosing a minimal Weierstrass equation for $E$. Proposition 5.1 then says that
$$\mathrm{ht}_\lambda(x) = \frac{1}{12}\log\max\{|A|^3, |B|^2\} + O(1),$$
which is, up to $O(1)$, one twelfth of the logarithm of the naive height of $E$. Thus $\mathrm{Ht}_\lambda(x)^{12}$ is bounded above and below by constant multiples of the naive height $\mathrm{ht}(E)$ as defined in §1.

5.2. The ring of modular forms of low level.
Since modular forms are sections of powers of the Hodge bundle, we rely heavily on the structure of the rings of modular forms for $\Gamma_0(N)$. This subsection summarizes part of the work of Hayato and Tomohiko in [HT11].
Notation 4. Let $M_k(N)$ denote the space of modular forms of weight $k$ for $\Gamma_0(N)$, and let $M(N) = \bigoplus_k M_k(N)$ be the entire ring of modular forms for $\Gamma_0(N)$.
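For $\mathcal{X}(1)$ the recipe of Proposition 5.1 reduces to computing a minimal Weierstrass equation. The sketch below (hypothetical helper names, our own trial-division implementation) scales a rational pair $(A, B)$ to an integral model, which is allowed because isomorphic models have coefficients $(u^4A, u^6B)$ for $u \in \mathbb{Q}^\times$, removes every $p$ with $p^4 \mid A$ and $p^6 \mid B$, and returns $\max\{|A|^3, |B|^2\}$. As an example it recovers the minimal model $y^2 = x^3 - 432$ of $E_{3,1/6} : y^2 = x^3 - 1/108$ (here $u = 6$).

```python
from fractions import Fraction
from math import lcm


def minimal_weierstrass(A, B):
    """Integral minimal model (a, b) of y^2 = x^3 + A x + B with A, B rational.

    Isomorphic short Weierstrass models have coefficients (u^4 A, u^6 B);
    we choose u so that a, b are integers and no prime p has p^4 | a and p^6 | b.
    """
    A, B = Fraction(A), Fraction(B)
    d = lcm(A.denominator, B.denominator)
    a, b = int(A * d**4), int(B * d**6)
    assert 4 * a**3 + 27 * b**2 != 0, "singular curve"
    q = 2
    while q**4 <= abs(a) or q**6 <= abs(b):
        # composite q never triggers: its prime factors were handled earlier
        while a % q**4 == 0 and b % q**6 == 0:
            a, b = a // q**4, b // q**6
        q += 1
    return a, b


def naive_height(A, B):
    """ht(E) = max(|a|^3, |b|^2) for the minimal model, as in Section 1."""
    a, b = minimal_weierstrass(A, B)
    return max(abs(a) ** 3, abs(b) ** 2)


print(minimal_weierstrass(0, Fraction(-1, 108)))  # (0, -432)
print(naive_height(0, Fraction(-1, 108)))         # 186624
```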
• $E_k$: the classical Eisenstein series of weight $k$, normalized to have constant coefficient equal to 1. Note that $E_k \in M_k(1)$ for even $k \ge 4$.
• For a modular form $f$ and an integer $h$, let $f$
• For certain $d \in \mathbb{Z}_{>0}$, Hayato and Tomohiko define modular forms $\alpha_d$ and $\beta_d$. We refer the reader to [HT11] for the precise definitions, since we do not need them. The crucial properties of these modular forms that we use are their weight, their level and the fact that they have integral coefficients.
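Since Example 5.1 uses $E_4^3$ and $E_6^2$, and the notation above relies on normalized Eisenstein series with integral coefficients, here is a small check of these normalizations. It builds $E_4 = 1 + 240\sum_{n \ge 1}\sigma_3(n)q^n$ and $E_6 = 1 - 504\sum_{n \ge 1}\sigma_5(n)q^n$ and verifies the classical identity $E_4^3 - E_6^2 = 1728\Delta$ (a standard fact, not something proved in the paper) to a few terms. The function names are ours.

```python
def sigma(k, n):
    """Sum of the k-th powers of the positive divisors of n."""
    return sum(d**k for d in range(1, n + 1) if n % d == 0)


def eisenstein(k, prec):
    """q-expansion of E_k (k = 4 or 6) up to q^(prec - 1), constant term 1."""
    c = {4: 240, 6: -504}[k]
    return [1] + [c * sigma(k - 1, n) for n in range(1, prec)]


def mul(f, g):
    """Product of two q-expansions, truncated to the shorter precision."""
    prec = min(len(f), len(g))
    return [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(prec)]


prec = 8
E4, E6 = eisenstein(4, prec), eisenstein(6, prec)
diff = [a - b for a, b in zip(mul(mul(E4, E4), E4), mul(E6, E6))]
assert all(c % 1728 == 0 for c in diff)  # E4^3 - E6^2 = 1728 * Delta
delta = [c // 1728 for c in diff]
print(delta)  # [0, 1, -24, 252, -1472, 4830, -6048, -16744]
```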
We obtain the upper bound by counting integer triples $(a, b, c)$ without the minimality condition (†). Equation 6 can be rearranged to one of the form:
For any integer $n$, let $r_2(n)$ denote the number of ways of writing $n$ as a sum of two squares. An upper bound can then be proved by summing $r_2(a^4)$ over all $a < X^{1/6}$.
Lemma 5.7 ([Bei66], Chapter XV). Let $n \in \mathbb{Z}_{>0}$ have factorization $n = 2^{a_0} p_1^{e_1} \cdots p_r^{e_r} q_1^{2f_1} q_2^{2f_2} \cdots q_s^{2f_s}$, where the $p_i$ are $\equiv 1 \bmod 4$ and the $q_i$ are $\equiv 3 \bmod 4$. Define $B(n) = \prod_{i=1}^{r}(e_i + 1)$. Then $r_2(n) = 4B(n)$.
Remark 5.1. This is a well-known result. Note that the constant in front of $B$ depends on whether one takes signs and order into account. This does not affect our result, since we are only interested in the asymptotic growth rate.
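Lemma 5.7 is easy to test numerically. The sketch compares a brute-force count of $r_2(n)$, counting signs and order as in the lemma, with $4B(n)$. As a convenience we extend $B$ by $0$ to integers in which some prime $q \equiv 3 \bmod 4$ appears to an odd power, where $r_2(n) = 0$; that extension is ours. The last lines compute the sum of $r_2(a^4)$ used for the upper bound, for a small cutoff `Y` standing in for $X^{1/6}$.

```python
from math import isqrt


def r2(n):
    """Brute-force count of (x, y) in Z^2 with x^2 + y^2 = n (signs and order count)."""
    count = 0
    for x in range(-isqrt(n), isqrt(n) + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            count += 1 if y == 0 else 2
    return count


def factorize(n):
    """Prime factorization of n >= 1 by trial division, as {p: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def B(n):
    """B(n) from Lemma 5.7; returns 0 if some q = 3 mod 4 has odd exponent."""
    prod = 1
    for p, e in factorize(n).items():
        if p % 4 == 3 and e % 2 == 1:
            return 0
        if p % 4 == 1:
            prod *= e + 1
    return prod


assert all(r2(n) == 4 * B(n) for n in range(1, 3000))

# the sum used for the upper bound: r_2(a^4) summed over 1 <= a < Y
Y = 50
print(sum(r2(a**4) for a in range(1, Y)))
```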
Proposition 5.8. Maintaining the above notation, there is a constant $c > 0$ such that for any $0 < \delta < 1/6$,
Proof. Consider the Dirichlet series
$$\sum_{n \ge 1} \frac{B^{(4)}(n)}{n^s}.$$
By multiplicativity, this can be written as the Euler product:
We now simplify this expression.
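The definition of $B^{(4)}$ is not reproduced here. Assuming it means $B(n^4)$, as the sum of $r_2(a^4)$ above suggests, we get $B^{(4)}(n) = \prod_{p \equiv 1 \bmod 4}(4v_p(n) + 1)$, so the Euler factor at $p \equiv 1 \bmod 4$ is $\sum_{k \ge 0}(4k+1)p^{-ks} = (1 + 3p^{-s})/(1 - p^{-s})^2$, and at every other prime it is $(1 - p^{-s})^{-1}$. The sketch verifies that power-series identity and prints $S(Y)/(Y\log(Y)^2)$ for $S(Y) = \sum_{n \le Y} B^{(4)}(n)$. Heuristically, near $s = 1$ the factor at $p \equiv 1 \bmod 4$ behaves like $(1 - p^{-s})^{-5}$, and since half of the primes are $\equiv 1 \bmod 4$ this points to a pole of order $3$ and growth $Y\log(Y)^2$, consistent with the $X^{1/6}\log(X)^2$ rate used in the proof that follows (with $Y = X^{1/6}$). The ratio converges slowly because of lower-order terms.

```python
from itertools import accumulate
from math import log


def B4_table(N):
    """B4[n] = product over primes p = 1 (mod 4) of (4 v_p(n) + 1), for n <= N.

    Assumption: this is B(n^4) in the notation of Lemma 5.7, which is how we
    read B^(4) here; it is multiplicative in n.
    """
    B4 = [1] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if composite[p]:
            continue
        for m in range(2 * p, N + 1, p):
            composite[m] = True
        if p % 4 == 1:
            for m in range(p, N + 1, p):
                v, t = 0, m
                while t % p == 0:
                    t //= p
                    v += 1
                B4[m] *= 4 * v + 1
    return B4


def series_mul(f, g):
    """Product of two power series given as coefficient lists, truncated."""
    K = min(len(f), len(g))
    return [sum(f[i] * g[k - i] for i in range(k + 1)) for k in range(K)]


# Euler factor at p = 1 (mod 4), with x = p^(-s):
#   sum_k (4k + 1) x^k  ==  (1 + 3x) / (1 - x)^2
K = 20
geometric = [1] * K  # 1 / (1 - x)
rhs = series_mul([1, 3] + [0] * (K - 2), series_mul(geometric, geometric))
assert rhs == [4 * k + 1 for k in range(K)]

# partial sums S(Y) = sum_{n <= Y} B4(n), compared with Y (log Y)^2
N = 10**5
partial = list(accumulate(B4_table(N)[1:]))  # partial[Y - 1] = S(Y)
for Y in (10**3, 10**4, 10**5):
    print(Y, partial[Y - 1] / (Y * log(Y) ** 2))
```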
Proof. The main ingredient here is the upper bound proved in Proposition 5.8. To refine this to an asymptotic growth rate, we must count only the minimal triples $(a, b, c)$. If a triple is non-minimal, then there exists a prime $p$ such that $p^2 \mid a$, $p^4 \mid b$ and $p^4 \mid c$. Fix such a prime $p$. Then the triples that are non-minimal at $p$ are in bijection with the ways of writing $a^4$ as a sum of two squares, say $a^4 = A^2 + B^2$, such that $p^4 \mid A$ and $p^4 \mid B$. This is the same as the number of ways of writing $(a/p^2)^4$ as a sum of two squares. Therefore the number of triples that are non-minimal at $p$ is
$$\sum_{|n| < X^{1/6}/p^2} B^{(4)}(n).$$
By Proposition 5.8, this has the same asymptotic growth rate as
$$\frac{c\,X^{1/6}}{p^2}\left(\log\frac{X}{p^{12}}\right)^2 = \frac{c}{p^2}X^{1/6}\left(\log(X)^2 - 2\log(X)\log(p^{12}) + \log(p^{12})^2\right) = c\,X^{1/6}\log(X)^2 \cdot \frac{1}{p^2}\left(1 - \frac{24\log p}{\log X} + \frac{144(\log p)^2}{(\log X)^2}\right),$$
where $c$ is independent of $p$. For $p^{12} \ge X$ the range $|n| < X^{1/6}/p^2$ contains no nonzero $n$, so only primes with $p^{12} < X$ contribute. Removing the non-minimal triples at each such prime leaves us to examine the product
$$\prod_{p^{12} < X}\left(1 - \frac{1}{p^2}\left(1 - \frac{24\log p}{\log X} + \frac{144(\log p)^2}{(\log X)^2}\right)\right) = \prod_{p^{12} < X}\left(1 - \frac{1}{p^2}\left(1 - \frac{12\log p}{\log X}\right)^2\right).$$
This product is bounded both above and below by positive constants. Indeed, for $p^{12} < X$ we have $0 < 1 - 12\log p/\log X < 1$, so each factor lies between $1 - 1/p^2$ and $1$; hence the product lies between $\prod_p (1 - 1/p^2) = 6/\pi^2$ and $1$. The proposition follows.
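To see the final bound concretely: expanding $c\,X^{1/6}p^{-2}\left(\log(X/p^{12})\right)^2$ against the main term $c\,X^{1/6}\log(X)^2$ shows that the proportion of triples removed at $p$ is $(1 - 12\log p/\log X)^2/p^2$. The sketch evaluates $\prod_{p^{12} < X}\left(1 - (1 - 12\log p/\log X)^2/p^2\right)$ in floating point for a few values of $X$ (passed as $\log X$ to avoid overflow) and checks that it lies between $6/\pi^2$ and $1$. It is only a numerical illustration under this reading of the proof, not a replacement for it; the function names are hypothetical.

```python
from math import exp, log, pi


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]


def local_product(log_X):
    """Product over primes p with p^12 < X of 1 - (1 - 12 log p / log X)^2 / p^2."""
    prod = 1.0
    for p in primes_up_to(int(exp(log_X / 12)) + 1):
        t = 12 * log(p) / log_X
        if t < 1:  # i.e. p^12 < X
            prod *= 1 - (1 - t) ** 2 / p**2
    return prod


for e in (24, 48, 60):  # X = 10^e
    value = local_product(e * log(10))
    print(e, value, 6 / pi**2 <= value <= 1)
```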

Open questions
This paper raises several questions, some of which we believe can be answered by pushing the methods used here further, and some of which require different approaches. The first question is about $\mathcal{X}_0(7)$. We believe that the ideas of §2 and §3 can be generalized to count points on $\mathcal{X}_0(7)$, since $\mathcal{X}_{1/2}(7)$ is a stacky curve with two stacky points. In this case, one must generalize Proposition 3.1 to the case where $f$ and $g$ are not necessarily coprime. The main difficulty here turns out to be the analogue of Lemma 3.4.
One might wonder whether one can count rational points on $\mathcal{X}_0(7)$ via the framework in [ESZ20], as we did for some values of $N$ in §5. The issue is that for each level not listed in Table 3, the ring of modular forms is quite complicated, and using the relations between the generators of these rings to count points on $\mathcal{X}_0(N)$ can lead to very hard counting problems. For instance, the problem of counting rational points on $\mathcal{X}_0(7)$ can be rephrased in terms of counting integral points on the intersection of one cubic and two quadric hypersurfaces in $\mathbb{A}^5$. This gets more complicated for higher $N$, at least as far as the description in [HT11] goes. For these higher $N$, if one were to find a smaller set of modular forms that generically globally generate $\lambda^{\otimes 12}$ and satisfy simpler relations, then one could perhaps count points on the corresponding $\mathcal{X}_0(N)$ more easily. We do not know at this time whether that is possible.
There is, of course, the question of an exact asymptotic as opposed to an asymptotic growth rate. More precisely, one can ask whether the limit
$$\lim_{X \to \infty} \frac{\mathcal{N}(N, X)}{h_N(X)}$$
exists and, if so, what its value is. The case $N = 2$ is known due to [HS17], $N = 3$ due to [PPV20] and $N = 4$ due to [PS20]. It would be interesting to calculate the values for other $N$.
The stacky Batyrev-Manin-Malle conjecture.
For a scheme $X$ and an ample line bundle $L$ on it, the Batyrev-Manin conjecture predicts that there are constants $a(L)$ and $b(L)$ such that the number of rational points of $X$ of height bounded by $B$ grows like $B^{a(L)}\log(B)^{b(L)}$.
Here the height refers to the height with respect to the line bundle $L$. The weaker analogue states that the number of rational points should grow like $B^{a(L)+\epsilon}$. In [ESZ20], the authors make a similar conjecture for stacks, which they call the 'Weak stacky Batyrev-Manin-Malle conjecture'. For each of the modular curves considered in this paper, as well as those in [HS17], the asymptotic growth rate appears to be of the predicted form, but it would be interesting to verify whether the constants match those predicted in [ESZ20]. This is work in progress.