Considering Fluctuation Energy as a Measure of Gyrokinetic Turbulence

In gyrokinetic theory there are two quadratic measures of fluctuation energy, left invariant under nonlinear interactions, that constrain the turbulence. The recent work of Plunk and Tatsuno [Phys. Rev. Lett. 106, 165003 (2011)] reported on the novel consequences that this constraint has on the direction and locality of spectral energy transfer. This paper builds on that work. We provide detailed analysis in support of the results of Plunk and Tatsuno but also significantly broaden the scope and use additional methods to address the problem of energy transfer. The perspective taken here is that the fluctuation energies are not merely formal invariants of an idealized model (two-dimensional gyrokinetics) but are general measures of gyrokinetic turbulence, i.e. quantities that can be used to predict the behavior of the turbulence. Though many open questions remain, this paper collects evidence in favor of this perspective by demonstrating in several contexts that constrained spectral energy transfer governs the dynamics.


Introduction
The free energy (also referred to as incremental or perturbed entropy) has been identified by many authors as an important measure of fluctuations in gyrokinetic turbulence. Collisional dissipation of this quantity is known to be a necessary feature of the true steady state for gyrokinetic turbulence [3]. It enters standard expressions for entropy balance and is directly connected to transport [4]. Its usefulness is identified by Hallatschek [5], who calls it the "generalized grand canonical potential." It is the central quantity in the free energy (or entropy) cascade [6,7,8,9,2,10,11] and is also used in a non-cascade theory [12] that describes gyrokinetic turbulence as a process of both injection and decay on the same scale.
Studies of two-dimensional gyrokinetics [6,2] have focused on another quantity that is distinct from free energy but is also a fundamental measure of fluctuations. It measures the intensity of fluctuations in the electromagnetic fields and is conserved under nonlinear interactions (Candy and Waltz [13] call it "field energy"). In the electrostatic approximation, we refer to it as electrostatic energy.
The nonlinear conservation of free energy and electrostatic energy, both being quadratic measures of fluctuation intensity, constitutes a constraint on nonlinear interactions in electrostatic gyrokinetics. It is the goal of this paper to study the implications of this constraint. We build on the recent publication of Plunk and Tatsuno [1], although the scope of the present work is quite a bit broader. As with Plunk and Tatsuno [1], we make repeated reference to the theory of Fjørtoft [14] and its generalization to gyrokinetic plasmas. This theory is simple, being essentially just the careful accounting of the conservation laws imposed on nonlinear energy transfer between different modes of the system. We wish to answer the basic question "how will the energy of fluctuations redistribute spectrally from a given initial state?" We believe that the answer to this question can lead to insight into the broader question: "How can one predict the features of fully developed gyrokinetic turbulence?" Predicting the features of turbulent states (i.e. statistical properties such as saturation amplitudes, frequency and wavenumber spectra, transport fluxes, etc.) is a basic goal in studies of gyrokinetic turbulence. Ideally, one would like to identify generic measures that are useful for various instability drives, magnetic geometries and basic plasma parameters. The present work identifies a predictor, namely the spectral distribution of free energy. We will give evidence that a single measure of this distribution, the ratio of free energy to electrostatic energy, seems by itself to be a fairly good predictor of the direction of the nonlinear energy transfer.
We focus on a minimal form of gyrokinetics that retains nonlinear interactions but neglects other effects such as linear instability, collisionless damping, and wave phenomena.
This is the two-dimensional electrostatic gyrokinetic system in a homogeneous background. It has been argued before that two-dimensional gyrokinetics is a paradigm for kinetic magnetized plasma turbulence [2] and that the tendency toward nonlinear transfer exhibited by this system should be retained in the full three-dimensional system [1]. This second assertion is a conjecture. However, there is a precise feature of gyrokinetics that motivates it. The peculiar spectral transfer that occurs in two-dimensional ideal fluids, leading e.g. to self-organization into large vortices and inverse cascade, can be traced to the existence of global integrals that are conserved under nonlinear interactions in two dimensions, but not in three dimensions. The nonlinearity of gyrokinetics takes the same form in both two and three dimensions, and for this reason the nonlinear invariants of two-dimensional gyrokinetics are retained in three dimensions -that is, they continue to be conserved under nonlinear interactions. In particular, these quantities are preserved separately for each drift plane, (the plane perpendicular to the magnetic field).

Overview and results
The structure of the paper is as follows. Section 2 discusses the basic assumptions and then provides the definitions and equations that will be needed throughout the paper. In Section 2.3 we discuss energy generally in gyrokinetics. This material only serves as background for the remainder of the paper but should be interesting to a broad audience as the subject of energy and conservation laws in gyrokinetic theory is a matter of longstanding debate. We note that although the parallel nonlinearity introduces a non-trivial energy invariant (closely related to the electrostatic energy on which we focus) this quantity is not suitable as a measure of fluctuation energy as it is not quadratic in fluctuations and so does not provide a useful constraint on spectral energy transfer. We then demonstrate how the quantity we call electrostatic energy enters in "physical" energy balance, showing that although the energy of the electric field is negligible in a non-relativistic plasma, changes in the electrostatic energy do track the flow of physical kinetic energy between the parallel and perpendicular directions. Section 3 is concerned with the spectral representation for the velocity-space dependence of the distribution function. Because we put considerable focus on spectral energy transfer, the spectral theory is a very central element to the work and we provide as much detail as possible, without sparing technical aspects. We find that the basic requirement that finite free energy be finite significantly constrains the space of allowable distribution functions and enables us to derive an appropriate set of basis functions. Establishing the space of functions also leads to limits on the ratio of free energy to electrostatic energy. We call this ratio the "κ factor." It is a quantity that enters frequently in the following analysis and so the rigorous bound proves useful.
In Section 4 we present Fjørtoft's theory, generalized to gyrokinetics. This is in line with the analysis presented in Plunk and Tatsuno [1] but the scope is broadened to include scales above and below the Larmor radius and also the case of a modified adiabatic response that leads to preferential generation of zonal flows. In Section 4.1 we first review the elementary three-scale transitions considered by Plunk and Tatsuno [1]. In Section 4.2 we generalize the arguments to energy transfer involving an arbitrary number of scales, which is an important extension for application to turbulence. The point of Section 4.1 and Section 4.2 is to establish precise limits on the spectral redistribution of electrostatic energy subject to a fixed amount of free energy. A quantity that naturally arises in the derivation is the κ factor, which we identify as a predictor of nonlinear-transfer direction. The strength of these derivations lies in their simplicity and freedom from assumptions. The weakness is that the result only establishes constraints and limits on how the energy transfer may proceed. In Section 4.3 we extend these results with additional arguments to make a prediction of cascade direction. We review the arguments of Plunk and Tatsuno [1] and also discuss our expectations for scales above the Larmor radius.
In Section 5 we turn our attention to scales larger than the Larmor radius, giving special attention now to a modified Boltzmann or adiabatic response, which captures the enhanced role of zonal flows in closed flux surface geometry. (Note that we do not introduce any non-uniformity in the background but simply represent the "flux surface" average as an average in the y-direction.) We make some general comments about the long-wavelength limit and then proceed to the case of the modified response. This presents an opportunity to compare the so-called generalized Hasegawa-Mima (GHM) equation with gyrokinetics and in particular contrast the role of inverse cascade as a mechanism for zonal flow regulation. We show that the GHM equation generates zonal flows by inverse cascade but that the appearance of finite Larmor radius (FLR) effects in gyrokinetics can drastically modify the nonlinear behavior allowing for both quiescent states (in the absence of collisional dissipation) composed only of stationary zonal flows and states of suppressed zonal flows characterized by large fluctuations.
In Section 5.2 we investigate a simple two-field gyrofluid model, derived from the two-dimensional gyrokinetic equation but with the addition of ad-hoc linear drive terms. We discuss the derivation of this model in detail in Appendix D and conclude that the long wavelength limit k 2 ρ 2 1 (with the modified Boltzmann electron response) is a singular limit, casting doubt on the (quantitative) validity of any gyrofluid model that relies on a small argument expansion of the Bessel function. Nevertheless, we argue that such simple models can be used to study qualitative behavior if finite Larmor radius (FLR) terms are retained at sufficient order. We perform direct numerical simulations of these gyrofluid equations, and find a nonlinear critical transition associated with the presence of the relative amplitude of the temperature fluctuations. We interpret the results in terms of our generalized Fjørtoft theory: What might otherwise be identified as a reversal in the sign of the turbulent viscosity on the zonal flows, we interpret as a cascade reversal induced by the increase of the κ factor of the unstable eigenmodes.
In Section 5.2 we investigate spectral transfer by linearizing the gyrokinetic equation about a monochromatic initial condition. We modify the theory of Plunk [15] to allow for arbitrary κ factor of the initial condition, κ p . We consider three cases: (1) the sub Larmor scales with zero response (for comparison with Plunk and Tatsuno [1]) (2) super Larmor scales assuming a standard Boltzmann response and (3) super Larmor scales assuming a modified response. For this final case we assume a zonal mode initial condition (this problem was previously investigated by Rogers et al. [16] and the instability termed "tertiary instability"). In each case we find a transition associated with the parameter κ p , which we interpret in terms of the Fjørtoft theory developed in Section 4. Previously reported properties of this tertiary instability are given a new explanation in terms of basic considerations of mode coupling and energy conservation.
In Section 7 we present previously unpublished results of nonlinear gyrokinetic simulations corresponding to the numerical runs of Plunk and Tatsuno [1]. We compute the shell-filtered spectral transfer function, which is based on a standard summation of the elementary triad interaction terms used in computations of neutral fluid turbulence. The results demonstrate that the spectral evolution of free energy observed by Plunk and Tatsuno [1] was due to nonlinear interaction and clearly show the signatures of the three modes of spectral transfer: (1) nonlocal inverse transfer, (2) local inverse transfer and (3) forward transfer.
In Section 8 we discuss the results of this paper and point out possible applications and future work. We argue that two-dimensional gyrokinetics is relevant to driven systems with nontrivial magnetic geometry and point to recent evidence in support of this view.

The Larmor scale
For a given particle species, we assume there is a characteristic velocity associated with fluctuations in the distribution function. We take the characteristic velocity to be the thermal velocity of the background distribution, v th = T /m. We make this assumption by conjecture, but argue as follows that it is reasonable: First, we require on physical grounds that the free energy of the fluctuations (henceforth referred to simply as the free energy) be a finite quantity. Therefore the perturbed distribution function must fall off in velocity space reasonably fast as compared to the background distribution function, which itself falls off on the scale of v th . (At minimum, finite free energy requires |δf | 2 exp(v 2 /(2v 2 th )) → 0 as v → ∞.) Also note that the drive terms in the gyrokinetic equation are proportional to the background distribution function (i.e. they are "Maxwellian-like") and thus so are the unstable linear eigenmodes modes, which stimulate the turbulence. Although the distribution function will evolve (by nonlinear interaction) away from the form of linear modes, the distribution function should fall off in velocity space in a manner that is consistent with Maxwellian-like source terms. Indeed, modification of the distribution function by the nonlinear term is done by the incompressible flow of gyrocenters around the domain, which leaves the density of gyrocenters unchanged along the motion of the flow. This implies that a distribution function evolving only under the influence of the nonlinearity, must retain the amplitude set by an initial condition, locally in velocity space. For example, consider the motion of a collection of tracer particles, representing positive and negative gyrocenter density, distributed initially in velocity (a) Particle motion.
(b) Evolution of local distribution function. Figure 1: Advection of passive particles by gyro-averaged E ×B motion due to a random static electrostatic potential: The process is depicted with (a) the motion of a sample of tracer particles and (b) initial and final distribution of tracer particles as measured at the central (red) point. The local gyrocenter distribution of tracers is initialized to be Maxwellian, with density proportional to local value of the electrostatic potential, which is illustrated by contour plot and colored level sets. Tracer particles which arrive simultaneously at the central red point are selectively plotted, with paths shown in white. The tracer distribution function at the central point is shown for initial and final times. Phase mixing evolves distribution of traces away from the initial Maxwellian, but a Maxwellian envelope is retained.
space as a local Maxwellian and having magnitudes proportional to the value of a random electrostatic potential. As depicted in Figure 1b, the tracer distribution function will develop structure in velocity space as particles are advected by E × B motion, but an envelope is retained that reflects the initially local Maxwellian shape. Thus, we expect that for turbulence driven by "Maxwellian-like" source terms, the distribution function will attenuate in velocity space on a scale comparable to the Maxwellian.
The assumption of the characteristic velocity scale v th implies a corresponding spatial scale, the thermal Larmor radius ρ. This scale marks the transition between two asymptotic regimes, the sub-Larmor and super-Larmor ranges. We will give some results that apply in both regimes, and also study them separately.

Normalized Equations and Definitions
Let us consider a two-dimensional slab geometry, where fields vary only in the direction perpendicular to a mean magnetic field, which is itself uniform and points in theẑdirection. Wherever integration over velocity space is present, integration over the velocity component parallel to the guide field v is implied; the same goes for integration over position space where integration over z is implied. We consider a single species to be kinetic, with the second species satisfying a simple adiabatic response model. Thus the regime of applicability is limited to scale ranges where an adiabatic response is valid to reduce the dynamics to a single kinetic species, that is k ∼ ρ −1 i . Henceforth, we use the normalized variables and notation of Plunk et al. [2].
T /m) is the normalized perpendicular velocity and the normalized wavenumber is k ⊥ ρ → k where thermal Larmor radius is ρ ≡ v th /Ω c and Ω c = qB/m. The two-dimensional gyrokinetic equation in nondimensional form is written as follows in terms of the gyrocenter distribution function g(R, v, t), where R =xX +ŷY is the gyrocenter position: where the Poisson bracket is {a, b} =ẑ × ∇a · ∇b and the gyro-average is defined A(r) R = 1 2π 2π 0 dϑA(R + ρ(ϑ)), where the Larmor radius vector is ρ(ϑ) =ẑ × v = v ⊥ (ŷ cos ϑ −x sin ϑ) and ϑ is the gyro-angle. (Note also that the quantity inside of the collision operator is h = g + ϕ R F 0 .) We leave the collision operator unspecified, other than noting that it is in general an integrodifferential operator (with appropriate conservation properties) that acts to smooth phase-space structure of the distribution function, evolving the plasma to thermodynamic equilibrium. Quasi-neutrality, under the assumption of an adiabatic response, yields the electrostatic potential ϕ(r, t), where r =xx +ŷy is the position-space coordinate: where the angle average is defined A(R) r = 1 2π 2π 0 dϑA(r − ρ(ϑ)), and the term T ϕ is the adiabatic density response. The constant T is typically the temperature ratio of the background plasma, but can also be taken as an operator to capture modifications to the conventional Boltzmann response. The operator where I 0 is the zeroth-order modified Bessel function. Note that this is the only explicit place (other than the collision operator) where the background distribution function enters the theory. Indeed, the k-dependence ofΓ 0 contributes to the transition in the physics of the turbulence across the Larmor scale. However, to satisfy finite (perturbed) entropy, as discussed later, the distribution function g also must have dependence on the characteristic scale of the background v th , which imposes a characteristic spatial scale, the thermal Larmor radius ρ = v th /Ω c . Using Equation (3), we can write quasineutrality in Fourier space: where β = 2π We use the term "nonlinear invariant" to mean a quantity that is conserved under the sole action of the nonlinearity, i.e. in the absence of collisions, linear instability or linear collisions damping. These quantities are exact invariants of the collisionless twodimensional gyrokinetic system with a homogeneous equilibrium (uniform background Maxwellian and uniform guide magnetic field). The kinetic free energy is a nonlinear invariant, as is any suitably weighted velocity integral vdvG(v)w(v), where w(v) is a weighting function; indeed, the volume average of any power of g is a nonlinear invariant but we focus here on quadratic invariants. The electrostatic energy is also invariant under nonlinear interactions, The presence of these two quadratic invariants establishes the analogy with twodimensional fluid turbulence that has inspired work of this and other papers. Note that our arguments for the tendencies of spectral transfer, based on those of Fjørtoft, depend only on the existence of these invariants (and the spectral representations of the following section) but do not depend on details of the actual dynamics; indeed the gyrokinetic equation, Equation (1), is not needed for these arguments and did not even appear in Plunk and Tatsuno [1].

Energy in gyrokinetics
In this section we discuss the role of energy in gyrokinetic turbulence and explain why our choice is appropriate for the study of turbulence. By comparing different meanings of "energy" in gyrokinetics we also place our work in a broader context of gyrokinetic theory. The remaining parts of the paper do not depend on this discussion, but it is recommended for those readers who are generally interested in gyrokinetic conservation laws. We also recommend general discussions of energy by other authors such as those found in Refs. [7,5,17,18] We have identified two quantities, G and E, each which represent a kind of fluctuation energy, being quadratic measures of fluctuations. The quantity G is related to the free energy W , which is the traditional focus of what is called the turbulent "free energy cascade" (or entropy cascade) in the electrostatic limit (for purely electrostatic fluctuations, this quantity is proportional the perturbed or incremental entropy): where The electrostatic energy E is a nonlinear invariant of collisionless 2D gyrokinetics, but has more general meaning in gyrokinetics.
There are two general camps in gyrokinetic theory. One we will call Hamiltonian gyrokinetics (some use the terms Lagrangian or "modern" gyrokinetics) [19,20,21]. The other is sometimes referred to as "iterative gyrokinetics" [22] (since the asymptotic derivation is done by iterative procedure, order-by-order) and its extension to transport is called "multiscale" gyrokinetics [23,24,25,18] because it assumes scale separation additional to that of traditional gyrokinetic ordering. These approaches differ in the treatment of dissipation and conservation laws.
Multiscale gyrokinetics gives energy balance in the form of transport equations by extending the gyrokinetic derivation to an order higher than that needed to solve for fluctuations. Collisional dissipation is included at each order. Energy balance is derived in a straightforward fashion by taking the kinetic energy moment of the Vlasov-Maxwell equations in the phase-space variables of the particles, i.e. s d 3 v mv 2 2 (Df s /dt = r C[f s , f r ]), after which Poynting's theorem can be used to evaluate the exchange terms that describe flow between particle kinetic energy and electromagnetic field energy. (In the limit we consider, i.e. the non-relativistic electrostatic limit in the presence of a static guide field, this energy is only due to particle kinetic energy as the energy of the electric field is negligible.) This is not a procedure for deriving dynamical invariants of the system. It is a way to establish the balance of what, for lack of a better term, we will call "physical" energy (a quantity that is already known to be conserved by the collisional Vlasov-Maxwell system) under the application of the various ordering assumptions of gyrokinetics.
As is standard with Hamiltonian theories, Hamiltonian gyrokinetics includes collisional dissipation after the equations of motion have already been derived. In this approach, invariants (analagous to those of the Vlasov-Maxwell system) are derived in the absence of collisional effects and written in terms of gyrokinetic variables, i.e. the phase-space variables of quasi-particles. This leads to a pleasantly self-contained theory, but also one where familiar quantities such as energy and momentum take on a more abstract meaning. One might ask whether these quantities are the same as the "physical" conserved quantities of the Vlasov-Maxwell system, just simply written in different variables. This is not the case for the very simple example below.
Extending the homogeneous 2D gyrokinetic equation, Equation (1), into 3D (where g = g(R, z, v ⊥ , v )) we may write where we have retained the so called parallel-nonlinearity, which has an explicit factor of = ρ/L due to the normalization we have taken. In accordance with the asymptotic limit → 0, iterative gyrokinetics doesn't include this term at this order but includes it at higher order to correctly calculate the energy balance. Indeed, numerical studies confirm that this formally small quantity does not have a significant effect on the turbulence [26]. However, by including the term here, a non-trivial energy is conserved † To demonstrate that H is conserved, combine the v 2 /2 moment of Equation (10) with Equation (14). Note that despite the factor of −1 = L/ρ, these terms are of the same order if we take the spatial integral of g to vanish at dominant order (which we can as the evolution of this quantity is order ). Thus g must absorb part of δf 2 to enforce the conservation of H. Although a conserved quantity like H may be useful for numerical application, we note that it contains a term that is linear in fluctuations. It thus does not constitute a useful measure for constraining spectral energy transfer in the way quantities like the free energy and electrostatic energy do; see also Hallatschek [5] who discusses the value of measures that are quadratic in fluctuations. Also, it is worth noting that H is not the physical energy, which in the quasi-neutral approximation is just the kinetic energy of the particles. We can identify the terms involving ϕ in Equation (11) as the electrostatic energy E defined in Equation (7) for our 2D system. What is this quantity E? It may seem paradoxical that there is an energy associated with the electric potential, since the physical energy of the electric field (∝ |E| 2 ) of a non-relativistic plasma is negligible compared to the other contributions to energy (kinetic energy or energy of magnetic fluctuations, if included). But in gyrokinetics the electrostatic field does carry some kind of "energy." In what we've just presented, and generally in Hamiltonian gyrokinetics (cf. [28]), E is part of the total conserved energy and in that context it can be interpreted as the interaction energy of gyrocenters (or simply the gyrocenter potential energy), from which the E × B nonlinear term (and also higher order nonlinear terms) in the gyrokinetic equation originates.
Multiscale gyrokinetics also has something to say about energy conservation. It provides a framework for tracking irreversible and reversible energy flows on both †In absence of the parallel nonlinearity, the first order physical energy is trivially conserved 2 2 δf 1 . Note also that we consider only a very simple case of gyrokinetics and the subject of exact conservation laws for general δf gyrokinetics -that is, gyrokinetic theory that treats fluctuations separately -has been covered extensively in the recent work of Brizard [27]. turbulent and heating timescales. The electrostatic energy E can be interpreted in this context as well. We assume periodicity in position space (thus fluxes give no contribution in the global energy balance). A key observation is that, as a simple consequence of charge neutrality and continuity, the electrostatic field E = −∇ϕ can do no work on the particles: For this reason, the electrostatic work does not enter the global energy balance equation (nor does the electrostatic energy). However, separating out the part of energy balance parallel to the guide field (i.e., the mv 2 /2-moment of the Vlasov-Maxwell equation), one finds where E = −∂ z ϕ is the parallel electrostatic field and K is the parallel kinetic energy of the particles (summed over species) ‡. The work done by the parallel electric field also appears in the balance equation for E derived by multiplying the gyrokinetic equation by ϕ R , integrating over the velocity and spatial domains and summing over species: Note that evidence from driven [29] and decaying [30] simulations indicates that the collisional dissipation of E, written on the right-hand side of this equation, seems to be exceedingly weak in comparison to that of W §. On the other hand, the parallel work term is quite substantial in the overall energy balance as it must balance input (not included here) of electrostatic energy in stationary state. We expect this term to be positive, on average, when Landau damping of the electrostatic field is operating, but can also be negative in the presence of instability. Balancing the change in gyrokinetic electrostatic energy against the parallel work, we see by Equations (13) and (12) that "collisionless" damping of E must be accompanied by a simultaneous flow of (physical) kinetic energy between the parallel and perpendicular directions (acceleration of particles in the direction parallel to the magnetic field and deceleration of particles in the perpendicular direction). Since the electrostatic energy is quadratic in fluctuations, the kinetic energy that it balances with must appear at higher order in the distribution function, i.e. δf 2 fluctuations, or on timescales much longer than those of the turbulence. ‡The quantity K formally contains contributions from the bulk plasma (slow evolution of background) and second order fluctuations on the turbulent timescale, i.e. the fluctuations in δf 2 for which one traditionally does not solve. As the volume average of δf 1 is invariant, the total particle kinetic energy contributed at first order is invariant and generally taken to be zero. §The weak collisional damping of E is expected from the cascade theory of W because the amount of E that reaches the dissipation scale is asymptotically less than the amount of W , as the collision rate is sent to zero.
In other words, gyrokinetics shows a tendency toward anisotropization of the kinetic energy of the particles. And it is this anisotropization that provides a physical interpretation of both the gyrokinetic electrostatic energy E and the parallel nonlinearity. In terms of physical energy, the role of E is to track the higher order exchange of particle kinetic energy between the parallel and perpendicular components, as we've just described. Inclusion of the parallel nonlinearity forces this exchange to be accounted for in small fluctuations δg ∼ g, though it does not affect the conservation of physical energy, which is satisfied automatically in iterative gyrokinetics at each order in . ¶ Having put the energetics into context, let us now return to the two-dimensional system introduced in Section 2.2. We are interested in how this system nonlinearly redistributes energy among modes and so we will first need to establish some details of a spectral theory.

Discrete Spectral Representations
In Plunk et al. [2], a spectral formalism for velocity space was introduced based on the Hankel transform, which has a Bessel function as its kernel. This approach is mathematically convenient due to the frequent appearance of Bessel functions in gyrokinetics. However, physically realizable systems are finite, so the continuousvariable Fourier and Hankel transforms are, in some sense, more than is needed (but are useful sometimes for simplifying mathematical arguments). In this section we will present discrete-variable spectral representations for velocity space. In addition to aiding in theoretical arguments, these representations may prove useful for numerical solutions of the gyrokinetic equation.
In position space we assume finite volume, V = L 2 , where L is the size of the two-dimensional domain. The velocity space domain, however, is formally semi-infinite [0, ∞) but there are physical constraints, such as finite density and energy moments, which put definite limits on the distribution function. The constraint we are interested in is finite free energy, which implies W g0 ≡ vdvG(v)/F 0 (v) < ∞. This defines the space of functions in which g(v) must be contained. In Section 3.1 we will use this condition to construct a simple Bessel series representation of g(v). In Section 3.2, we will present another spectral representation based on orthogonal polynomials that span the space of functions g(v) having finite W g0 . We will arrange the notation so that either representation can be used with the subsequent results of the paper. ¶Now that we have examined the electrostatic case, the electromagnetic version of Equation (14) can be interpreted in the same manner by identifying work terms due to magnetic fluctuations in the parallel energy balance equation. Because the magnetic field does no work on particles, these work terms again function as anisotropic energy exchange.

Bessel Series
In Plunk and Tatsuno [1] we introduced a discrete spectral theory for velocity space using a Bessel series. It proved useful for studying sub-Larmor fluctuations, and gives an intuitive definition of scale due to its close relationship to the Fourier series. The Bessel-series representation relies on the approximation that the distribution function is zero above a cutoff in velocity space v cut . We justify this approximation, as noted above, by reasoning that if W g0 is finite then g(v) must fall off at least as fast as a Maxwellian at large velocity. Thus we take v cut 1, and argue that the error introduced should be negligible, falling off exponentially in v cut . In Section 3.2 we will introduce an exact spectral representation that does not assume a velocity cutoff.
The Bessel series is built on the orthogonality relation where λ n is the nth zero of J 0 (x). We use the unit step function Θ in the Bessel series expansion of g to provide the velocity cutoff explicitly: where p j = λ j /v cut . Now, taking the limit p j v cut 1 , we may simplify Equation (16) by evaluating J 1 in the coefficient of the expansion using the large argument expression J n (x) ≈ 2/(πx) cos(x − nπ/2 − π/4) and correspondingly λ j ≈ π(j + 3/4). Thus we obtain the form of the Bessel series used in Plunk and Tatsuno [1], Using Equation (17), the free energy W g1 = 2π ∞ 0 vdv G(v) can be written compactly as a summation over the spectral density in k-p space: A subtlety now arises. We require J 0 (kv cut ) = 0, so that quasi-neutrality, Equation (4), can be written in terms of a single component of the Bessel series. (The reason for this will become apparent in Section 3.3.) This, however, cannot be satisfied uniformly for all k with a single value of v cut . Thus, we actually need k-dependence in the This may be justified by confining attention to sub-Larmor scales k 1 and ordering p ∼ k. Alternately, we may assume v cut to be large enough to ensure p j v cut 1 for all p j of interest.
cutoff, v cut = v cut (k). However, we can ensure this dependence is weak by writing cut and δv cut is the smallest quantity that satisfies p j = k for some j * * . We argue that if the system we are studying is well approximated with a finite velocity cutoff, then it will remain well-approximated under small k-dependent adjustments to this cutoff velocity. Combining Equations (15) and (16) we find that quasi-neutrality becomes simplŷ Note that to obtain Equation (19) we need not modify the bounds of integration in Equation (2) because the step function in Equation (16) provides explicit velocity cutoff. From Equation (19) we can express the spectral density of electrostatic energy as follows Now, taking the limit k 1 and v (0) cut /v cut ≈ 1, and combining Equations (18) and (20) we can write which is equivalent to Equation (6) of Plunk and Tatsuno [1].

Representation Based on Orthogonal Polynomials
As noted above, physical realizations of the distribution g correspond to finite free energy, i.e. finite W g0 . Equivalently, the normalized distribution function g = g/F 0 must be a member of the weighted Hilbert space with inner product defined (g 1 , g 2 ) = ∞ 0 e −u g 1 (u)g 2 (u)du, where u = v 2 /2. That is, g has finite norm ||g || = (g , g ) < ∞. It follows that we can decompose g in a series of orthogonal functions that are a basis for this space. One choice is the Laguerre polynomials. However, we would also like a simple relationship between the distribution function and the electrostatic potential as provided by the Bessel series; see Equation (19). We can achieve this by constructing a set of orthogonal functions using the Bessel function J 0 (kv). The set is denoted where P is a normalized polynomial of degree n. We construct P is determined by requiring it has unit norm and is orthogonal to P (k) 0 ; the higher order polynomials P (k) n are then determined, in increasing order, by requiring orthogonality with lower order * * Since the period of oscillation of J 0 is roughly 2π, the quantity kv cut can be made a zero of J 0 with a magnitude of δv cut that is at most π/(2k). Thus, since v cut 1, we have δv cut v cut as long as k is not too small. If k 1, as assumed in Plunk and Tatsuno [1], then δv cut /v n−1 . In Appendix A we prove that the set G is complete. We may now expressĝ as a series in these functions. Defining u = v 2 /2 we havê and, recalling F 0 = e −u /2π, we have Plugging Equation (23) into Equation (2), we have another compact expression of quasineutrality,φ which implies (see Equation (7)) In addition to W g1 and W g0 , a third "free energy" quantity will be useful, which we denote W z because its role appears analogous to enstrophy (conventionally denoted Z) at both sub-Larmor and super-Larmor scales. For constant T , the following quantity is an invariant: Let us accordingly define W z (k, j) = W g0 (k, j) − T δ(j)E(k) so that W z (k, j) = W z . Then, because of the asymptotic form ofΓ 0 , Equation (B.3), it is easy to see that W z (k, j) ≈ W g0 (k, j) for k 1. However, the two quantities diverge at k 2 1.

The Fjørtoft Constraint
Many approaches are available for explaining energy flows in 2D fluids. Examples include absolute statistical equilibria of the ideal (inviscid) fluid [31,32,33,34,35], the dual cascade (including non-zero viscosity) [32,36,37] and variational methods that minimize enstrophy while conserving energy and other constants of motion [38]. Such approaches generally seek equilibrium solutions subject to some non-trivial assumptions. Impressively, precise predictions have been validated in experimental systems, though none universally due to the non-universal applicability of the fundamental assumptions, e.g. ergodicity or conservation laws. We argue that Fjørtoft's theory remains an attractive way to view tendencies of energy transfer because it lays bare the basic mechanism of constrained nonlinear energy transfer without additional assumptions. Supplemented with simple hypotheses, this constraint yields a quick and (mathematically) intuitive prediction of transfer direction. We are interested in the particular relationship between the spectral densities of free energy and electrostatic energy E. We call this relationship the "Fjørtoft constraint." This constraint affects the tendencies of spectral energy transfer. However, a spectral representation is non-unique, in the sense that we can choose (or invent) any one we like. Furthermore, each representation, along with each weighted integral of G(v), or free energy, yields a different relationship or constraint. Thus, to keep the discussion general let us define a generic "free energy" quantity W, which can represent any of those free energies defined, i.e. W g1 , W g0 or W z , each having been assigned an appropriate spectral representation. In Section 4 we will then be able to consider the constrained spectral transfer of E and this generic free energy W. † † We denote the spectral density of free energy as W(k, j), where j is an index for the velocity-space spectrum and j = 0 represents the density component. In the case of the Bessel representation this is the p = k component. We will also call this component the "density component," departing from the convention of Plunk and Tatsuno [1]. The previous terminology ("diagonal" component) was justified because this component is due to integration over velocity space with a factor of J 0 (kv) in the integrand (see Equation (4)), and so represents an effective velocity-space wavenumber p = k. However, our use of the generic spectral index j in this paper makes the new terminology more appropriate. With this general notation we may write the Fjørtoft constraint as This establishes a constraint on how nonlinear interactions can spectrally redistribute E and W simultaneously. We would like to understand precisely how it does this; this is a central goal of this paper. The asymptotic form of q at small and large scales affects the physics of the turbulence at those scales. We calculate this for the various spectral densities and collect the results here for reference. For the Bessel series, we are only interested in the k 1 limit. Re-arranging Equation (21) we have Evaluating W g0 (k, j) at j = 0 and taking the ratio with E(k), using Equation (5), we obtain the expression (30) † †The question may be raised, "why limit consideration to only one free energy quantity when there are an infinite number of choices?" There is no rigorous justification. However, we note that the full hierarchy of invariants is not retained under spectral truncation [39]. For instance the invariance of W g0 is retained under the formal assumption of a finite extent of the spectral domain [j min , j max ], but other measures of free energy are not. Thus W g0 can be considered "rugged" [35] and perhaps in some sense more important.
Note that unlike Equation (29), Equation (30) is valid at all scales. Using the asymptotic form ofΓ for small and large argument (see Equation (B.3)), we can obtain the following asymptotic forms of this ratio Note that at large scales, k 2 1, the ratio tends to a constant. By subtracting the part of W g0 that is proportional to E at small k, we can obtain a free energy that has non-trivial scaling q(k) at all k. This quantity is W z , defined already in Equation (27). We have which for k 2 1 implies W z (k, 0)/E(k) ≈ (1 + T )k 2 , and for k 1 the ratio becomes (1 + T ) √ 2πk as expected from Equation (31), since W z (k, 0) ≈ W g0 (k, 0) in that limit. It is comforting to note that at sub-Larmor scales, all of these definitions of q have linear scaling in k ‡ ‡, giving consistency across arbitrary definitions.

On the spectral redistribution of energy
The main object of study in this work is energy transfer by the nonlinear interaction via the quadratic nonlinearity of the gyrokinetic equation. Spectrally, this occurs by particular three-wave interactions. We will present numerical work in Section 7 that examines nonlinear transfer directly in terms of a sum of three wave interaction terms. We approach the problem another way in Section 6 by solving a linearization of the gyrokinetic equation directly to find unstable modes; these solutions are then examined to draw conclusions about spectral energy transfer. The question of energy transfer can also be approached in an "equation-free" sense by simply considering how energy transfer is constrained by the existence of the two invariants, W and E. This approach, initiated by Fjørtoft [14], is the subject of Section 4. We are not concerned here with how precisely the transfer occurs (in terms of three-wave interactions), but only with the quantities of energy transfered between different scales. We begin by revisiting the three-scale theory of Plunk and Tatsuno [1] and then in Section 4.2 we will generalize it to an arbitrary number of scales.

Three-scale transitions
In Plunk and Tatsuno [1], we considered energy transfer among three scales. We repeat this analysis here, but with some simple modifications. As mentioned we would like to keep this discussion general, so let us refer to the general quantity W, along with the general constraint given by Equation (28). Let us define a scale, denoted by the pair (q i , j i ), to be the collection of Fourier components having wavenumbers k satisfying q(k) = q i and velocity-space index j i . We keep the form of q(k) general but note that for the isotropic β(k) defined in Equation (5), there is a mapping q → k since q is a monotonic function of the modulus k. However, in order to handle anisotropic cascades (i.e. those which generate zonal flows; see Section 5.1), it is convenient to allow for anisotropic q.
We assume that, by some nonlinear interactions, some amount of W (and the corresponding amount of E) is transfered between three different scales, (q 1 , j 1 ), (q 2 , j 2 ) and (q 3 , j 3 ). We define the free energy at ( Conservation of W and E implies that the changes sum to zero: As depicted in Figure 2 there are three types of transitions possible . Without loss of generality we take q 1 < q 2 < q 3 ; for isotropic cases (that is, for β = β(|k|), these q correspond to k 1 < k 2 < k 3 , so q serves as proxy for k = |k|. In type I transitions (see Figure 2a) all three scales correspond to density components, j 1 = j 2 = j 3 = 0. In this case, transfers are constrained in precisely the same way as considered by Fjørtoft, with the substitution k 2 i → q i = q(k i ). We call these Fjørtoft-type transitions. From Equations (28), (33) and (34) we can obtain the expressions equivalent to those obtained in [14]: The quantities in parentheses are positive (because q 1 < q 2 < q 3 ). Thus, as noted by Fjørtoft, it is only the intermediate component, q 2 , which can be a source for both the two remaining components. Type II transitions (Figure 2b) involve two density components (j 1 = j 2 = 0 but j 3 = 0) and also lead to inverse transfer of E but only if the quantity of free energy transfered to the non-density component is positive. To see this, first note that ∆E 3 = 0 because j 3 = 0. Solving Equations (33) and (34), we find ∆E 1 = −∆E 2 and That is, any transfer of free energy from density to non-density components must be accompanied by a simultaneous inverse transfer of E (i.e., free energy, along j = 0, to scales of smaller q); note this is unlike the case of a transition involving three density components, where some non-zero amount of energy travels to both smaller and larger k Clearly, transitions involving only one density component (having non-zero energy change) are forbidden as they do not conserve E. Type III transitions (Figure 2c) involve no density components, so are unconstrained. Of course, actual turbulence involves a large number of scales simultaneously and in the next section, we will generalize these arguments to an arbitrary number of scales.

Transitions involving an arbitrary number of scales
Using Equation (28), we can derive a bound on the ratio of the invariants as follows. For fixed k we clearly have W(k) = j W(k, j) ≥ q(k)E(k), with equality only when all of the free energy is in the density component. Thus the ratio of the invariants κ is bounded below by the extreme case where all of the free energy W is in the density moment j = 0 (that is, W(k, j) = 0 for j = 0): The quantity on the right-hand side,q, is an energy-weighted average of the function q(k). Note that from the large-k asymptotic form of q(k) given in Equation (31),q is proportional to the energy centroid for fluctuations at sub-Larmor scales, i.e. for W g0 we haveq ≈k √ 2π(1 + T ) wherek = kE(k)/ E(k). We now consider redistribution of E and W involving an arbitrary number of scales. Let us assume an ordering q 1 < ... < q n < q n+1 < ... < q M , where q M is the maximum q. We have defined κ = W/E, which, being the ratio of two invariants, is also an invariant itself. In hydrodynamics, the analogue to this quantity is the enstrophy divided by the energy, which is equal to the energy-averaged squared wavenumber. The fact that q ∝ k at sub-Larmor scales suggests that κ may, analogously, be interpreted as something like a wavenumber there.
Let us define q N , which we will use to divide the system into small-and largewavenumber components. Our goal is to establish an upper bound on the fraction of energy that can be found above q N , as a function κ. We write E as a sum of large-and small-scale components We rewrite the free energy, separating the sum into the low-q j = 0 terms (domain , and the remainder of wavenumber pairs (domain P): Equation (41) can be rewritten as where q is defined to be the ratio of the first sum in Equation (41) to the first sum in Equation (40), and q is defined to be the ratio of the second sum in Equation (41) to the second sum in Equation (40). Combining Equations (40) and (42), we find Let's consider the consequences of Equation (43). It is easy to see that q <q < q (and so q < κ) and q ≤ q N < q . Thus we may infer that the maximum of F N is reached by taking q → q 1 and q → q N . Thus we obtain the bound From this result we conclude that the fraction of energy that can be found at large (effective) wavenumbers q > q N is limited by the parameter κ and becomes negligible with increasing q N . Note that although Equation (44) (and also Equation (43)) is valid for any value of κ, it is trivial if κ > q M since, by definition F N ≤ 1. On the other hand, this bound is strong for κ q M and indicates that the electrostatic energy content at large (effective) scales is preserved during dynamical evolution; it can be inferred also that if a turbulent cascade is driven with a sufficiently small κ, then E will cascade inversely (to smaller q).

Interpretation
Let us now discuss the implications that the expressions derived in the previous section have for the direction of spectral energy transfer. The original work of Fjørtoft, concerning two-dimensional fluid turbulence, is simple to interpret: Energy at a wavenumber k, if it is to be redistributed spectrally, must be transfered simultaneously to both smaller and larger wavenumbers. If a turbulent cascade develops, such as that proposed by Kraichnan [32] (with a driven stationary state characterized by a constant fluxes of energy/enstrophy through inertial scales larger and smaller than the injection scale) nearly all of the energy flux will be carried to large scales and nearly all of the enstrophy flux to small scales. This is a basic consequence of the constraint Z NS (k) = k 2 E NS (k) where E NS (k) and Z NS (k) are the spectral densities of energy and enstrophy respectively, for two-dimensional Navier-Stokes turbulence. Viscosity acting at large wavenumbers dissipates asymptotically more enstrophy than energy, due to the factor of k 2 , and so asymptotically more enstrophy must flow to large k in the steady state.
In fact a similar conclusion for gyrokinetics can be drawn from Equation (28). That is, the amount of free energy cascading to fine (collisional) scales must exceed that of E by at least a factor of q ∝ k so that we can expect E to be effectively conserved in the weakly collisional limit, even if the collisional dissipation of W tends to a non-zero constant in this limit.
In the case of gyrokinetics, the Fjørtoft constraint only governs transfer among components that contribute to particle density, i.e. density components. Thus, we can infer that Fjørtoft-type transitions (see Equations (35)-(36)) will promote inverse transfer of E, but it is not a priori obvious how the kinetic-type transitions, involving non-density components, will affect this tendency. From Equation (37), we can see that free energy transfer from density to non-density components will (further) cause inverse transfer of E, i.e. to components of smaller q. But it is also possible for free energy to flow in the opposite sense, from non-density to density components, inducing the reverse effect in the transfer of E.
In Section 4.2, we identified the ratio of free energy to electrostatic energy κ as a control parameter that limits spectral redistribution of E. The maximal fraction of free energy F N identified by Equation (44) corresponds to an extreme configuration where all the free energy is in the density components, all of the energy at large scales (small q) resides at the absolute largest scale available to the system q 1 and all of the energy at small scales (large q) is found just above the cutoff q N . This extreme distribution of energy is not likely to be spontaneously generated, and so we generally expect the fraction of E that will be transfered to small scales to be significantly smaller than the maximum of Equation (44). Generally, we will see that the parameter κ seems to be a good predictor of transfer direction. We argue that this is because it measures the relative distribution of free energy between density and non-density components and so it determines whether the kinetic type transitions (see Equation (37)) will occur in the positive or negative direction (i.e. positive or negative ∆W 3 ).
The limits set by the Fjørtoft constraint are clearly not sufficient to alone predict transfer direction. To complete the argument for transfer direction of E, a simple conjecture was made in Plunk and Tatsuno [1] that free energy that is initially concentrated in the density component about a single wavenumber k 0 will spontaneously "spread out" in k-p space. Because of the volume factor of k in the two-dimensional (k, p)-plane, we estimated that this would lead to a distribution of free energy W (k) ∼ kq(k)E(k) ∼ k 2 E(k) for k ∼ k 0 (this is also the expected scaling from entropy cascade theory [6,2]). Thus, for initial conditions composed of energy around k 0 , we identified the parameter κ/k 2 0 for the sub-Larmor range to delineate three regimes; see Section 6 and Section 7 for more details. When this quantity is much less than 1, we found that there is strong nonlocal inverse transfer of E. As κ/k 2 0 approached 1 the behavior changed to a more conventional local inverse cascade. For κ/k 2 0 > 1, a transition was found where the transfer completely changed directions.
In the present work we are also interested in what happens at k < 1. This is an important range for fusion plasmas, where instability at scales somewhat larger than the ion Larmor radius drive turbulence. Generally speaking, we expect that low-κ forcing should lead to inverse transfer of E while high-κ forcing should cause a cascade reversal. It is, however, not obvious what constitutes "high" and "low" κ at k < 1. We investigate this question in Section 5 and Section 6 and find evidence that the transition occurs at κ ∼ O(1).

Super-Larmor scales
At scales larger than the Larmor radius, nonlinear phase mixing loses potency. This is because the Larmor orbit of a typical particle does not allow it to sample much variation in the electrostatic potential, and nearly all particles move with approximately the same E×B velocity. Thus, nonlinear interactions take on a fluid character. However, generally speaking, the long-wavelength limit does not admit rigorous closures in the fluid moment hierarchy, and the phase-space cascade is preserved in some form. To what extent this cascade remains relevant is a subtle question that we will explore in this section and in Appendix D.
In a particular limit, 2D gyrokinetics reduces to a single-field fluid equation (the HM equation), which possesses a nonlinear invariant (enstrophy) additional to those exhibited by 2D gyrokinetics [2]. This fact warns us generally against applying gyrokinetic results directly to the long-wavelength regime without considering how the system is reduced in that limit. In the special HM limit, the non-density components of the distribution function (i.e. those which give zero particle density) are dynamically decoupled from the density components [2]. That is, the mechanism for transfer of free energy between density and non-density components is removed entirely. Thus W z (see Equation (27)) can be written as a sum of two quantities, one being the enstrophy, that are individually conserved.
For finite T , the components do not decouple and the non-density components can become dynamically important, controlling the spectral transfer of E (and, crucially, breaking the conservation of enstrophy that occurs in the HM limit). In taking moments of the gyrokinetic equation, one encounters an infinite hierarchy of equations (moment hierarchy) where the different fields are coupled by finite-Larmor radius effects (phase mixing terms). This fluid moment hierarchy is, of course, equivalent to the full gyrokinetic equation and no reduction is necessarily achieved this way. (However, viewing the system in this manner shows that the coupling between velocity components becomes a formally local process, in that the effect of nonlinear phase mixing is to couple neighboring moments in the moment hierarchy.) We will consider this moment hierarchy in detail in Appendix D in a special limit (modified Boltzmann/adiabatic electron response) where zonal flows are preferentially generated. But first, let us consider the form of the Fjørtoft constraint in the limit k 1. We have from Equation (32) that W z (k) ∝ k 2 E(k). It could be argued that the fact that k 2 appears in this relationship (instead of k in the sub-Larmor case) should strengthen the tendency toward inverse cascade of E at super-Larmor scales. Indeed, consider three-scale transitions involving a fixed set of wavenumbers k 1 < k 2 < k 3 . For Fjørtoft-type transitions, Equations (35)- (36) imply that the ratio of energy transfered inversely to that transfered forward is . However, though the stronger scaling of q at k 2 1 may promote inverse cascade of E, the weakening of phase mixing at k < 1 may act counter to this by suppressing transfer of free energy from density to non-density components. Thus, it is not a priori apparent what to expect at k < 1.
Let us now proceed to the case of modified Boltzmann electrons, where zonal flows play a special role in the turbulence. This is especially relevant for fluctuations at scales somewhat larger than the ion Larmor radius, which strongly affect the confinement properties of tokamaks and other devices with closed flux-surface geometry.

Zonal Flows
The so-called adiabatic response (contained within the constant β(k)) strongly affects the asymptotic analysis of the long-wavelength limit. The adiabatic response of electrons (i.e. the response to ion-scale gyrokinetic fluctuations) can be corrected to take into account the special role of zonal components [40,41] (see also section 8.1 of [42]). The correction dramatically changes the nonlinear dynamics, rendering the turbulence strongly anisotropic. We can include this correction by giving k-dependence to T , where δ is the discrete delta function and τ is a constant. For small τ a simple fluid model may be derived called the generalized Hasegawa Mima (GHM) equation [43,44]. This system exhibits zonal flow generation by an anisotropic inverse cascade, which can be understood as a simple extension of the inverse cascade that occurs in fluid turbulence. We review this mechanism in Appendix C, noting that it lacks an essential mechanism to bring about zonal flow saturation, and so we now turn to gyrokinetics for a more nuanced description of how cascade dynamics can induce and regulate zonal flows. The subject of zonal flow generation and regulation in magnetized plasma turbulence is a subject that has been studied intensely. We note, however, that a clear description is still lacking of the precise role of inverse cascade (or more generally, constrained spectral transfer of energy) for generation and regulation of zonal flows in a fully gyrokinetic system. This description requires that we consider, once again, the constraint placed on spectral energy transfer by the gyrokinetic conservation laws. Let us introduce a free energy quantity suited for the long-wavelength regime with the modified Boltzmann response. We defineW z = W g0 − τ E and find Although the effect of the modified Boltzmann response is modest at k 1, it is very important at k < 1. Indeed, for k 2 1 we have 1 +τ which can be compared with Equation (32). § § Thus the long-wavelength zonal component corresponds to the smallest possible value of q and thus represents the largest effective scale of the system. So, we conclude that if an inverse cascade occurs then it is anisotropic, leading to the accumulation of E into components with k 2 x 1 and k y = 0. However, just as we have seen with the isotropic sub-Larmor cascade [1], the inverse cascade is not guaranteed, but in fact could be tempered or reversed by an appropriate choice of forcing. We will demonstrate this in Section 6.3 using linear instability theory. Below we demonstrate this with simulations employing a simple fluid model.

Gyrofluid simulations of stationary driven turbulence
Unlike the small-τ limit of the HM equation, the finite-τ long-wavelength limit has a closure problem in the fluid moment hierarchy. Still, low order truncations or other closure schemes can yield a simple set of equations, which may provide physical insight into plasma behavior.
In the long-wavelength limit (kρ i 1), so-called finite Larmor radius (FLR) terms account for nonlinear phase mixing and the cascade in velocity-space formally becomes local in the sense that the evolution equation for each moment involves only neighboring moments in the hierarchy. We write down this moment hierarchy and describe some § §Note that by comparing Equations (32) and (47), one can identify the well-known difference between the nonlinear physics of electron-temperature-gradient (ETG) and ion-temperature-gradient (ITG) driven turbulence that originates from the modified Boltzmann response, favoring generation of strong zonal flows in the ITG case. of its properties in Appendix D. The important conclusion is that there are critical problems with performing an FLR expansion because it will fail to model nonlinear phase mixing in a quantitatively correct way. Nevertheless, we derive a gyrofluid system, which truncates the fluid moment hierarchy in an ad-hoc manner, as a simple toy model to address basic questions.
In this section we use this gyrofluid model to demonstrate the "cascade reversal" phenomenon in a driven and anisotropic context. The model satisfies exact nonlinear conservation of an approximate version of E and approximate nonlinear conservation of a "truncated" version of W g0 ; see Appendix D. The gyrofluid model is where the zonal and non-zonal parts of ϕ are denoted ϕ andφ, respectively (see Equation (C.1)), and we define and Note that the so-called finite Larmor radius (FLR) terms are contained in Equation (51); these nonlinearly couple the potential and the ion perpendicular temperature perturbations. The operator D = ν D ∇ 2 L D (where ν D is a constant) provides highk dissipation to the system by filtering out |k| < k d , this filtering being performed by L D . Note that we implement dissipation so that it does not act on ϕ; this allows us to focus on nonlinear damping of zonal flows as a saturation mechanism. The linear terms involving operators A 11 , A 12 , A 21 and A 22 on the right-hand side of Equations (48) and (49) are constructed to give (for ν D = 0) linear modes having complex frequencies and growing eigenmodes with components that have the ratiô The left-hand-sides of Equations (48) and (49) are derived by truncating the moment hierarchy above the perpendicular temperature in such a way as to approximately conserve a truncated version of the free energy W g0 , i.e. the first two terms in the summation of Equation (D.23). The model is related closely to the gyrofluid model of [16] (derived from gyro-fluid equations of [40]) but deviates (somewhat arbitrarily) in this closure (and also in the ad-hoc linear drive terms).  We solve Equations (48) and (49) by a fully spectral method. For the runs of Figure 3 we use 220 Fourier modes (with a spectral domain corresponding to the upperhalf plane in k x -k y space with grid space of 0.1 and maximum wavenumber 1.0) and linear drive parameters v * = 1, G = 0.3, φ 0 = −π/2, k w = 0.25, k d = 0.3 and ν D = 0.05. Viscous dissipation is provided at large wavenumbers but set to zero for the zonal component of the potential.
We find a nonlinear critical transition associated with the parameter R 0 of the linear drive. Because the growth-rate spectrum is unchanged in these simulations, this transition demonstrates the importance of the relative injection of free energy to electrostatic energy (i.e. the κ factor) in determining the overall behavior of the turbulence. In Figure 3 we plot the saturated energy values, each point corresponding to a separate run. In Figure 4 we show a representative snapshot of the electrostatic field for sub-critical and super-critical cases.
We note that this transition is absent if the FLR terms are neglected; see Figure D2. The behavior of the gyrofluid system at zeroth order is qualitatively very different from the behavior of the system with the appropriate FLR terms retained (even when k 2 1 is satisfied for all Fourier components). Thus we conclude that k 2 1 is a singular limit of gyrokinetics (with the modified Boltzman response for electrons) in the sense that formally small (FLR) terms remain important as the size of these terms tends to zero. This is due to the nonlocal interactions in k-space, e.g. due to the tertiary instability that regulates the zonal flows (see Section 6), which has very fine scale contributions to its eigenfunction. Actually, this fact casts some doubt on the quantitative validity of any fluid models of gyrokinetics that use the expansion k 2 1.

Linearization and Secondary/Tertiary Instability
The linearization of the dynamical equations about an exact solution (a monochromatic wave, which in our case is a stationary solution) gives another way to investigate the spectral evolution of fluctuations. If an instability is present, its growth rate may be calculated analytically. If the modes of largest growth rate are found at scales larger (smaller) than those represented in the initial condition, then it may be argued that inverse (forward) spectral transfer should be expected generally to result from such initial conditions. This line of thinking represents an alternative way to address the problem of spectral transfer direction, based upon actual dynamics -though the calculation is made only with a linearization of the dynamical equations and thus strictly applies only to a nearly monochromatic initial condition. The Fjørtoft argument has the advantage that it applies to the fully nonlinear problem, with arbitrarily complex states.
An instability theory has been investigated previously for gyrokinetic linear eigenfunctions corresponding to local ion/electron temperature gradient (ITG/ETG) instability [15] under the name "secondary instability theory." In that case, the velocity dependence of the initial condition (primary mode) is determined by the ITG/ETG linear theory, and thus the κ of the primary mode, which we call κ p , is also determined. Here we will take an artificial parameterization of the initial condition that allows us to arbitrarily set κ p .
We follow [15] with the modification that the velocity dependence of the initial perturbation is chosen by hand to give the desired ratio κ p in the initial condition. Let us define k 0 to be the wavenumber of the initial condition (or 'primary' mode). We will examine the problem for k 0 > 1 and k 0 < 1, considering both the Boltzmann response and the modified Boltzmann response for the second case.
6.1. k 0 > 1 First let us consider sub-Larmor scales k 0 = |k 0 | 1. We find that the instability is strongly affected by κ p in a way that agrees with expectations from the results of Plunk and Tatsuno [1]. We take the 'no response' limit (T = 0), which is isotropic (β = β(|k|)) so we arbitrarily choose the wavenumber to be in the y-direction, k 0 = (0, k 0 ), without loss of generality. The primary mode g p is written ¶ ¶ The terms proportional to a 0 and a 1 are meant to provide the density and non-density contributions to free energy, respectively. The constant σ = 1 is a factor that determines the effective velocity-space wavenumber of the non-density component. Plugging this expression into Equation (4), and using Equation (A.4), we calculate the electrostatic potential of the primary mode ϕ p as Using Equation (B.2), we see that for large arguments the second term is proportional to exp[−k 2 0 (1 − σ) 2 /2] and so is negligible for sufficiently large k 0 . The free energy W g0 (see Equation (24)) of this mode is Using Equation (7) the electrostatic energy E may be computed as The primary mode has a definite amount of W and E, the ratio of which establishes the modal κ factor κ p . Taking the ratio of the energies computed in Equations (56) and (57) and neglecting the terms proportional to a 1 in E, we have ≈ κ min 1 + a 2 1 a 2 0 σ , (59) ¶ ¶Note that the convention used in [15] is to express the distribution function using h instead of g. Translating into that form we have where κ min = (1 + T ) √ 2πk 0 . By setting a 1 = 0, we are thus able to reach the absolute theoretical minimal value of κ as predicted by Equations (31) and (39); this corresponds to all of free energy being focused in the density component. With a non-zero choice of a 1 , we can make κ p arbitrarily large.
Returning the calculation of the instability, note that g p is a stationary solution of Equation (1), being composed of a single Fourier mode. The stability of perturbations to this solution is what we are concerned with. Taking g = g p + g s , where g s g p is the "secondary" mode, the gyrokinetic equation reduces to a linear equation that represents the driving of g s by g p . We assume a normal mode solution, that is g s proportional to e iωst+ikxX and ϕ s proportional to e iωst+ikxx with ω s the complex frequency (recall that x is the coordinate in position space and X is the coordinate in gyrocenter space). The y-dependence of the mode is captured by an eigenmode, which is composed of harmonics of the primary wavenumber k x ; see Appendix D.5 for a simple illustration of this type of problem. An integral equation can be derived that can be solved for the eigenmode and the corresponding eigenvalue ω s ; see Eqn. 12 of [15]. We solve this integral equation numerically and plot the solutions in Figure 5.
Generally the instability depends on the parameters a 0 , a 1 , σ and k 0 . However, we focus on the single measure κ p /k 2 0 , which seems to be a good predictor of the general features of the growth rate curve. The case of minimal κ p (that is, a 1 = 0) exhibits familiar features [15]. The growth rate curve varies smoothly from k x = 0, reaches a single clearly recognizable maximum after which it falls again to zero at a cutoff wavenumber equal to the primary wavenumber k 0 . It can be concluded from this evidence that the tendency of turbulence driven by modes like this g p will be to produce large-scale features in the electrostatic potential, i.e. cascade the electrostatic energy inversely.
As the free energy of the primary mode is increased, we expect the instability to transform; this is indeed what happens. From the arguments of Plunk and Tatsuno [1] (see also Section 4.3) our expectation is that the behavior will be most strongly affected by the control parameter κ/k 2 0 . This quantity measures the relative distribution of free energy between accessible density and non-density modes.
In Figure 5, we show how the growth rate curve of the instability varies with κ p /k 2 0 . Although the instability does vary somewhat with other parameters like σ, the case plotted in Figure 5 (T = 0, k 0 = 40.0, σ = 0.75) seems representative of the overall shape (e.g. peak, magnitude, cutoff) of the growth rate curve for the cases that we calculated. We observe that (1) as κ p /k 2 0 is increased, the growth rate is strengthened at the large-wavenumber part of the spectrum; (2) at a critical value of κ p /k 2 0 ∼ O(1), the mode is destabilized above the cutoff k 0 ; (3) as κ p /k 2 0 is increased further, the global peak of the spectrum eventually appears above the primary wavenumber k 0 , signaling a shift from inverse spectral transfer to forward spectral transfer (transfer from low-k to high-k). These observations echo the transition observed in the fully nonlinear numerical experiments observed by Plunk and Tatsuno [1] and thus serve as further confirmation of the theory. Note that the appearance of a larger growth rate at lower k x may appear We solve for "secondary" modes that are perturbations to a large-amplitude monochromatic "primary" mode with a specified energy ratio κ p . Plotted is the growth rate spectrum of such modes for 6 values of κ p ranging from the theoretical minimum case (black) to the supercritical case (blue). The black curve corresponds to a primary mode with no density-moment free energy, i.e. a mode of minimal κ p . For all cases, we take T = 0, k 0 = 40 and σ = 0.75.
to be in contradiction with our general conclusions. This is not so. As they become more unstable, the sidebands (k y = ±k 0 ) and higher harmonics of these low-k x modes grow to larger amplitudes so that over half the energy in the mode actually resides at k = k 2 0 + n 2 k 2 x > k 0 , where n is the harmonic number of the secondary eigenfunction; see [15]. This is in contrast to the small-κ p case where the sidebands have small relative amplitudes and most of the energy is contained in the k y = 0 component; See Figure 6.
Other curious features are observed, such as multiple branches of the instability, ranges in k x of stability, and oscillations in the growth rate curve. We did not give much attention to these features; one might argue that such details are less important in fully developed turbulence than the overall shape of the growth rate spectrum. Our focus is on the qualitative trend of the growth rate on what appears to be the strongest control parameter. We conclude that the parameter κ p /k 2 0 , although not alone sufficient to completely characterize the instability, does constitute a reasonable predictor for the direction of spectral transfer.  Figure 6: The relative amplitudes of sidebands (and higher harmonics not shown) of the secondary mode are higher for large κ p . These figures correspond to the growth rates computed for Figure 5 at k x = 9. This effect is even more pronounced in the quasi-singular k 0 1 limit and the inclusion of higher harmonics becomes necessary to accurately compute the secondary growth rate.

k 0 < 1 (Boltzmann electron response)
Now let us turn to scales larger than the Larmor radius. We first consider the conventional Boltzmann response for the non-kinetic species. This is most relevant for scales a bit larger than the electron Larmor radius (but much smaller than the ion Larmor radius), e.g. the range of energy injection for the electron temperature gradient (ETG) driven turbulence in a tokamak. We represent the non-density free energy using P (k) 1 , defined in Section 3.2, instead of the Bessel function used in the previous section. We again consider an initial condition with the mode aligned arbitrarily in the y-direction, k 0 = (0, k 0 )): and recall that P The function P can be easily computed explicitly to give By construction, the term proportional to χ in Equation (60) gives no contribution to ϕ p . The electrostatic energy of the primary mode is simply and the free energy is So the κ factor for the initial mode is κ p = 2π(1 + χ 2 )/(βΓ 0 ), which for k 2 1 takes the form κ p ≈ (1 + χ 2 )T . See Figure 7 for a plot of the growth rate curve as a function of κ(χ). We take T = 1 and k 0 = 0.3 and vary χ. For small χ, instability is observed only for k x < k 0 and gradually weakens as χ is increased from zero. We observe absolute stability at about χ = 0.7. As κ p is increased further, the instability returns and presents at k x > k 0 for sufficiently large κ p . For large κ p , the growth rate peaks at k x > k 0 (and also develops very fine-scale structure in the y-dependence of its eigenmode) signaling a reversal in spectral transfer direction like the one observed for k 0 1 in Plunk and Tatsuno [1].

k 0 < 1 (modified Boltzmann response)
Taking T =τ = τ (1 − δ(k y )) as in Section 5.1, we once more examine the case of a modified Boltzmann response for electrons. This is the case corresponding to ion Larmor scale turbulence set with an equilibrium magnetic field lying in closed flux surfaces. The modified response introduces anisotropy to the problem, so the direction of k 0 is now important.
It is informative to briefly revisit the simple case of the GHM equation. Let us consider Equation (C.3), but neglect the linear terms (i.e. v * = L * = 0). If we take an initial condition with wavenumber in the y-direction, ϕ p = e ik 0 y + c.c., and consider perturbations about that, we find unstable modes of the form ϕ s = e i(kxx−ωst) nφ n e ink 0 y , which are unstable for k 2 x < τ + k 2 0 . There are a number of ways of deriving this instability criterion but we find an elegant method in terms of the Fjørtoft analysis of Section 4. See Figure 8. They key idea is that the three-scale result of Equations (35)- (36) is actually applicable to any three sets of contiguous scales, given that the energy changes at scales within a set are all the same sign. One simply replaces q 1 , q 2 and q 3 with the energy-weighted averages for each of the three sets,q 1 ,q 2 andq 3 . Recalling the definition of q GHM given in Equation (C.6) we note that the harmonics of the perturbation ϕ s have q GHM (k xx + nk 0ŷ ) = τ (1 − δ(n)) + k 2 x + n 2 k 2 0 . Let us take the first set to be the zeroth harmonic of the secondary mode,q 1 = q GHM (k xx ) = k 2 x , the second set to be the primary mode,q 2 = q GHM (k 0 ) = τ + k 2 0 , and the third to be all the remaining harmonics of the secondary mode, i.e.q 3 is the energy-weighted average of q GHM for remaining harmonics -we do not need to evaluate this explicitly but just note that this quantity must be at least as big as the first harmonic, i.e. q 3 ≥ q GHM (k xx + k 0ŷ ) = τ + k 2 x + k 2 0 . By Fjørtofts argument, it is a simple consequence of energy and enstrophy conservation that only an intermediate scale can be a source to other scales involved in energy exchange. Thus instability can only occur if the zeroth harmonic of ϕ s is at larger effective scale than ϕ p and so the instability criterion is q 1 <q 2 or k 2 x < τ + k 2 0 . Now if we take the initial condition to be zonal, i.e. ϕ p = e ik 0 x + c.c., we may apply the same logic to show that perturbations are absolutely stable for k 0 < τ . This is because the mode ϕ s = e i(kyy−ωst) nφ n e ink 0 x is composed of harmonics that all have  ( ...) Figure 8: The energy flow associated with the growth of a secondary instability must satisfy Fjortoft's constraint. The primary mode (red dot) behaves as an energy source for the harmonics of secondary mode (blue dots). This leads to the instability criterion q 1 <q 2 or equivalently k 2 q GHM (k) > q GHM (k 0 ) since the zonal components represent the largest effective scale of the system. We will find below that the strict stability of zonal modes is not enforced for gyrokinetics: instability can indeed occur if the zonal mode has sufficiently large κ factor. Let us take another detour and examine the instability driven by a poloidal k 0 = (0, k 0 ) mode (secondary instability) using our gyrofluid model; the fully gyrokinetic version of this instability has been studied and was not found to be especially sensitive to kinetic effects [15]. Nonlinear transfer function plots from the numerical simulation of Bañón Navarro et al. [11] suggests that part of the electrostatic energy transfer is made via local inverse cascade. We believe this part is composed of non-zonal fluctuations interacting to build up the zonal component; this is indeed what was observed for the gyrofluid simulations presented in Section 5.2. Thus the "zeroth-order" model (Equations (D.6), (D.9) and (D.10)) derived in Appendix D should be valid for the calculation of this instability as it is derived in the k 2 1 limit under the assumption of local interactions. We derive the growth rate of the instability exactly in Appendix D.5. We find that the growth rate of the mode is proportional to the factor √ 1 − r − r * where r = T ⊥0 /ϕ 0 is the ratio of the complex amplitudes of the temperature and potential of the primary mode. This factor does show that the relative amplitude and phasing of the primary mode can have a significant effect on the secondary mode. Note that the mode is most unstable when the real part of r is large and negative Now we return to the fully gyrokinetic calculation. We focus on the instability driven by a zonal k 0 = (k 0 , 0) mode (tertiary instability). As reported by Rogers et al. [16] the tertiary instability is very sensitive to the presence of non-density moments in the zonal mode (they study temperature fluctuations). Let us consider an initial condition composed of the zonal mode given by Equations (60) and (61) with k 0 = (k 0 , 0).
The tertiary mode has the following features. It is uniformly stable at sufficiently small κ p . Above a critical value of κ p (This value is numerically challenging to determine precisely due to the fact that a large number of harmonics are needed to resolve the mode near the critical κ p ; see Figure 10.) an instability presents that generally peaks at k y > k 0 . The gyrofluid model of Rogers et al. [16] was found to peak at k y ∼ √ k 0 k 0 , which we find to be consistent with our observations but we note that the constant of ( ...) Figure 9: The energy flow associated with the growth of a secondary instability must satisfy Fjortoft's constraint. This means that non-density free energy must be present in the zonal mode (red dots) in order for it to function purely as a source of electrostatic energy for the tertiary mode (blue dots). The zonal mode resides atq 1 and the tertiary mode is atq 2 (zeroth harmonic) andq 3 (remaining harmonics).
proportionality is quite sensitive to κ p . The radial eigenfunction of the mode has no contribution from the k x = 0 component. So, the instability is composed exclusively of harmonics satisfying |k| > k 0 and thus we expect long-wavelength zonal modes to release their energy in a forward cascade. We can explain this behavior as follows. As described in Section 5.1, zonal wavenumbers at k < 1 represent the largest effective scales of the system. Thus, energy flow from the zonal component at k < 1 to the non-zonal component constitutes spectral transfer in the "forward" direction and generally needs a reservoir of non-density free energy to draw from. This follows from the same kind of three-scale argument as made above for the GHM system; see Figure 9. If the zonal mode has only free energy in the density component, then the energy flow must occur by Fjørtoft-type transitions (see Figure 2b) and thus the zonal mode cannot serve purely as a source for the tertiary mode. With sufficient free energy in the zonal mode, this restriction is lifted by the inclusion of Kinetic-type transitions (operating in the reverse sense of Figure 2b). Furthermore, the instability must have k k 0 because the FLR terms are required in the k 2 1 limit in order for electrostatic energy to flow from zonal to non-zonal components; this is explained in detail in Section Appendix D.

Method
Our simulations of decaying gyrokinetic turbulence were done using the MPI-parallelized f95 gyrokinetic code AstroGK [45]. AstroGK solves initial value problems of the gyrokinetic equation, Equation (1) (with a collisional term added to the right-hand side), in the straight slab plasma with periodic boundary conditions perpendicular to a uniform background magnetic field in the z-direction. To keep the problem two-dimensional, uniformity is enforced along z, i.e. ∂/∂z = 0. The system size is L x = L y = 2π in order to focus on sub-Larmor scales. The highest resolutions employed in our runs are 256 2 in position space and 192 2 in velocity space, respectively, which required about 20 hours using 9,216 processor cores.
Initial conditions are constructed in Fourier space using the following functions. A density component may be constructed by usinĝ This corresponds to free energy focused narrowly about the diagonal component p = k 0 . Non-density components are introduced using the initial condition For each k, the complex coefficient a 0 is chosen with a random phase and random amplitude from (0, A 0 ). The coefficients a j , j > 0 are also randomly phased with amplitudes chosen from (0, A 1 ). The p j 's are chosen randomly from the normal distribution of averagep and dispersion p w . We combine the two functions for a general initial condition We focus on three main cases: the small κ/k 2 0 case (see Figure 13a) uses k 0 = 40, k w = 1, A 0 = 1 and A 1 = 0. The medium κ/k 2 0 case (see Figure 13b) has k 1 = 20, k w1 = 8, N rand = 50, A 0 = 0, A 1 = 1,p = 20 and p w = 4. The large κ/k 2 0 case (see Figure 13c) has k 0 = 5, k w = 1, k 1 = 20, k w1 = 5, N rand = 50, A 0 = 0.04, A 1 = 1,p = 0 and p w = 3.

Spectral Evolution
We focus on the sub-Larmor limit of Plunk and Tatsuno [1]. We adopt the conventions of that paper in this section: the free energy quantity is W = 2v (0) cut (1+T ) W g1 , and the spectral representation is the simplified Bessel series of Equation (17). We reproduce the main figures of Plunk and Tatsuno [1] in Figure 11 and Figure 12.
The initial (t = 0) spectral density is concentrated on the diagonal for the small κ/k 0 case (see Figure 11a), broadened for the medium κ/k 0 case (see Figure 11c) and then further extended to the off-diagonal components for the large κ/k 0 case (see Figure 11e).
As time proceeds, the first two cases (κ/k 0 2) show inverse cascade along the diagonal; however, the largest κ run (κ/k 2 0 = 29) shows the opposite behavior. In this case, the initial spectra around k ∼ 20 does not contribute a significant density component. At a later time shot, t = 0.27 (see Figure 11f), the free energy becomes distributed in k-p plane, but it has a much lower density on the strip along the diagonal as required for the conservation of E. The initial diagonal component around k ∼ p ∼ 5 is the one that constitutes initial E, but not at a sufficient magnitude to support the conventional inverse cascade behavior. Thus the peculiar behavior of cascade reversal may be attributed to electrostatic energy "starvation." That is, the system is forced into a new mode of operation for lack of a basic resource. Figure 12 shows the time evolution of the spectral density of E(k), from which the locality and direction of the nonlinear interaction may be inferred. As shown in Figure 12a, there is a secondary peak around k ∼ 15 at t = 5.8, which is separated from the initial one at k = 40. This shows nonlocal interaction, which is expected from the linear instability whose growth rate is indicated with the dotted curve; see Section 6.1. On the other hand, Figure 12b does not show any secondary peak, and the gradual change of the spectrum indicates local inverse cascade. Figure 12c and Figure 12d demonstrate nonlocal and local forward cascades of E. The initial E spectrum for the case of Figure 12c is peaked at k = 5, but a secondary peak appears around k ∼ 12 at t = 0.11, which is separated from the initial one. In Figure 12d a mixture of forward transfer and inverse transfer of E is observed.
We can analyze spectral transfer more quantitatively in terms of a transfer function for the electrostatic energy. This construction is a particular sum of triad interaction terms that produces a function having two arguments, a "from" wavenumber and a "to" wavenumber. There is no unique way to construct this two-scale transfer function (some authors prefer to examine the three wavenumber triad interaction terms directly) but the definition we employ is standard and also a "natural" definition as we explain below; see also Bañón Navarro et al. [11] and Nakata et al. [46] who use this type of transfer function to study simulations of tokamak turbulence.
We first introduce a shell filter in Fourier space. Following Tatsuno et al. [10], we define the shell-filtered electrostatic field as where K = {k : K − 1/2 ≤ |k| < K + 1/2}. Likewise, we may define the shell-filtered distribution function The electrostatic energy at shell K is defined Multiplying the collisionless gyrokinetic equation, Equation (1) with the collision operator neglected, by ϕ K R and integrating over R and v, one obtains an evolution where we have expanded ϕ = Q ϕ Q , g = P g P and introduced the three-argument electrostatic energy transfer rate T (E) (Q, P ; K), defined This function represents the rate of change of energy in shell K due to three-wave interactions involving wavenumbers contained in shells P and Q. The transfer function T (E) (P, Q; K) is quite general (we need not even partition the spectral domain using shells) but we note that it does not satisfy symmetry under exchange of P and Q like the triad interaction function of [46], though such a symmetrized transfer function can be constructed from it. For our purposes, the deficiency in T (E) (P, Q; K) is that there is no clear identification of a source and sink for the energy, just three interacting shells. It turns out that by requiring three simple properties a two-scale transfer function T (E) (Q, K) can be uniquely determined. These properties are The first condition is a basic requirement of energy conservation under shell-to-shell transfer: energy accumulates in shell K due to loss at shell Q. The second condition is just the requirement that when the transfer function is summed over all shells Q, nothing that contributes to the energy change at shell K is missing. The third condition is that the transfer function should be constructed only out of the three-shell transfer terms explicitly present in equation for the evolution of E K , i.e. those terms defined by Equation (72). Note that if this last condition is relaxed, a more general class of transfer functions T (E) (Q, K) can be constructed as a sum of terms T (E) (P, S; R) (that is, the final index is not fixed to be K) [47]. These three conditions are satisfied by simply summing T (E) (P, Q; K) over the index P , i.e. by taking η Q P,S,K = δ Q,S . Thus we arrive at the definition for the electrostatic energy transfer rate from shell Q to shell K, T (E) (Q, K): As shown in Figure 13, we have computed the spectral transfer function for the cases studied in Plunk and Tatsuno [1]. These correspond to three regimes: (1) nonlocal inverse transfer (2) local inverse transfer and (3) forward transfer. These transfer plots offer a direct glimpse of the energy flows between different scales and confirm that the spectral evolution observed by Plunk and Tatsuno [1] was indeed due to nonlinear interaction.

Discussion
Let us review the key questions posed by this work and discuss the results we find. As argued in Section 2.3, the free energy and electrostatic energy are measures that constrain gyrokinetic turbulence, with the former admitting some freedom in the exact definition. The formulation of a spectral representation (see Section 3) leads to a precise statement of this constraint via Equation (28). We then are able to systematically extend the arguments of Fjørtoft to establish the basic mechanism that governs the direction of spectral energy transfer. Building on the work of Plunk and Tatsuno [1], we show that our constraint supports inverse cascade of electrostatic energy (spontaneous transfer to large scales), but we also find that there is additional freedom that allows for reversal of the cascade direction. As evidence of these conclusions, we present numerical evaluation of the spectral transfer function, corresponding to the simulations of Plunk and Tatsuno [1]; see Section 7. By linearizing the nonlinear gyrokinetic equation about a monochromatic initial condition, we also calculate an instability that exhibits the same transition predicted by the Fjørtoft arguments; see Section 6. We have also extended our analysis to scales larger than the Larmor radius (see Section 5) and have included the modified Boltzmann response, an essential effect for application of gyrokinetics to the closed flux surface geometries of magnetic confinement experiments like tokamaks and stellarators.
By attempting to derive a simple fluid reduction of the gyrokinetic system (see Appendix D) we arrive at an important conclusion, namely that the long-wavelength limit is singular in the nonlinear interactions between fluctuations and zonal flows. Thus the regulation of zonal flows by turbulence appears to be an inextricably kinetic phenomenon. That is, nonlinear phase mixing persists in the long-wavelength regime, acting as a channel for the transfer of free energy, and the turbulent viscosity on zonal flows can in principle be changed from positive to negative by the same basic mechanism as that which caused the "cascade reversal" observed by Plunk and Tatsuno [1]; we demonstrate this reversal in Section 5.2 with simulations using a simple gyrofluid model. These findings casts doubt on the validity of any gyrofluid model that treats FLR effects asymptotically. However, it leaves open the possibility of a more sophisticated gyrofluid model with special care taken to model kinetic nonlinear interactions between zonal and non-zonal fluctuations.
In focusing on nonlinear spectral transfer, the Fjørtoft theory avoids questions involving sources and sinks (in fully gyrokinetic turbulence). Does the dual cascade persist under these conditions? Or, more generally, is the constraint on spectral transfer a useful tool for understanding the behavior of driven and dissipated gyrokinetic turbulence? These questions are generally open, but there is some evidence suggesting that the answers are both yes. There is numerical evidence that the sub-Larmor inertial-range theory [6,2] gives the correct result in ion temperature gradient (ITG) tokamak turbulence [11]. There is also evidence that gyrokinetic turbulence [48] obeys an inertial-range scaling in the super-Larmor range, where most of the fluctuation energy and transport physics resides. The scaling theory of Barnes et al. [48] may be obtained by assuming a two-dimensional (k x -k y ) cascade and adding to this the addendum that structure in the third dimension (k ) will develop so as to always keep the parallel streaming term large enough to compete with the nonlinear term. However, despite the fact that this term remains large enough to be dynamically relevant, the effective damping of the electrostatic field by parallel phase mixing (linear Landau damping) seems to be sub-dominant to the nonlinear cascade rate in the inertial range. If this were not the case, the energy spectrum would decay exponentially in a local cascade; see i.e. Howes et al. [49]. Indeed, numerical diagnostics by Navarro et al. [29] show the parallel damping to be localized to the outer scale, with the effective parallel damping rate decreasing with k. Thus, at least in ITG tokamak turbulence, the electrostatic energy appears to retain the status of invariant in the inertial range (along with the free energy), and the inverse cascade of E is indeed observed in simulations by Bañón Navarro et al. [11].
In the present work, we have identified a control parameter for gyrokinetic turbulence, namely κ. The role of κ must be tested in realistic gyrokinetic systems, where characteristics of the magnetic field geometry and background plasma come into play. We have focused on the nonlinear term in the gyrokinetic equation. Other terms, associated with the background temperature and density gradients, and magnetic field geometry, may be considered generally as energy sources for the turbulence. It is natural to then ask what the relative input of free energy to electrostatic energy is for these sources; in other words, at what "level of κ" is the turbulence driven. One way to approach this question is to calculate the κ factor for linearly unstable eigenmodes. An evaluation of the local gyrokinetic dispersion relation for the ITG mode seems to indicate that κ generally increases with the strength of the background ion temperature gradient, 1/L T . Thus, in this case, the effect of increasing κ may not be observed independently from the effect of increasing the growth rate of the instability. But such a local calculation does not take into account the influence of plasma shape and the question of how κ depends such system parameters is open.
The control parameter κ may help explain how the behavior of small-scale turbulence depends on large-scale features of plasma; it could also have predictive capability, especially for classes of plasma configurations that have a large parameter space, e.g. stellarators, where optimizing the shape of the magnetic field for turbulence could lead to enhanced performance. While it is natural to seek optimized magnetic configurations that minimize instability by reducing growth rates or increasing the domain of stability, the present study suggests that it is also desirable that instabilities present with optimal κ. In the case of ion-scale turbulence (see Section 5.1), small κ seems desirable to facilitate the generation of zonal flows and thereby reduce the amplitude of the turbulence in steady state. On the other hand, in cases where the formation of large scale structures are seen to increase turbulent transport (i.e. formation of streamers in ETG turbulence), one might seek to maximize κ. A third possible case was identified by [1] where high-κ fluctuations at sub-Larmor scales served as an energy sink for turbulence driven at large scales. Could small-scale instabilities (like trapped particle modes) be optimized to damp the more deleterious large-scale turbulence? Of course, further work is necessary to explore these ideas.
Other open areas include the generalization of the results of this work to systems with electromagnetic fluctuations and multiple kinetic species. In principle, both of these extensions can be made in a straightforward fashion by considering the generalization of the electrostatic energy and free energy; the multi-species version of E is given already in the appendix of Schekochihin et al. [7]. Electromagnetic fluctuations will bring qualitatively new physics but it is still worthwhile investigating constrained spectral energy transfer in this context.
Perhaps the biggest open challenge is to make quantitative predictions about the spectrum of steady-state turbulence, driven by physical instabilities. In particular we would like to know what sets the characteristic amplitude and dominant scale of the turbulence. We would also like to characterize the anisotropy of the turbulence -at what scale do zonal flows (or streamers, etc.) form and what is the relative amplitude of these structures compared with the non-zonal fluctuations? These issues are for the most part beyond the scope of the present work. Still, we can make some speculative remarks: It seems that sufficiently small scales, to a certain extent, behave as if governed by inertial range physics. As such, the spectrum of these fluctuations should fall off in k-space as a universal power law, set by the constancy of the nonlinear flux of free energy. We are left with the challenge of describing the turbulence at the scale of energy input. For ion-scale turbulence, it seems plausible, given the present study, that the amplitude of fluctuations relative to zonal flows depends on κ, as measured by the linear instability. In other words, if the zonal flow amplitude is set by the usual saturation rule of γ E ∼ γ L (where γ E is the E × B shearing rate and γ L is the linear growth rate), then the amplitude of non-zonal fluctuations may be found to be proportional the zonal flow amplitude, with the constant of proportionality being an increasing function of κ. This seems to be anecdotally supported by the gyrofluid simulations of Section 5.2; note the trend of increasing non-zonal energy at constant zonal energy for super-critical R 0 shown in Figure 3. Perhaps similar conclusions could be drawn for other types of gyrokinetic turbulence, such as ETG or trapped-particle-driven turbulence. It is our hope that future work will uncover such basic effects of constrained spectral energy transfer on turbulent steady states in gyrokinetics.

Acknowledgements
constant is within the span of G. We drop the superscript (k) in what follows.
First let us define a set of polynomials {Q 1 , Q 2 , Q 3 , ...} such that Q n (u) = q n + u n where q n is a constant. We construct this set in terms of P 1 , P 2 , etc., as follows. We take Q 1 = P 1 , then it is easy to see that Q 2 can be constructed in terms of Q 1 and P 2 . Then Q 3 can be written in terms of Q 1 , Q 2 and P 3 and so forth. Now note the power series expansion for J 0 (x): Using this and recalling the definition P 0 (u) =Γ , we can see that the quantity To show that c is nonzero we simply multiply Equation (A.2) by e −u P 0 and integrate over u (the Q m are orthogonal with P 0 by construction) to obtain c = λ 0Γ 1/2 0 (k)/Γ 0 (k, 0), where the generalized functionΓ 0 (k 1 , k 2 ) is defined Note that c/λ 0 will become quite large at k 1 becauseΓ 0 (0, k) is exponentially small (see Equation (B.2)). Thus, the set G may be a more practical representation for k 1.

Appendix B. Asymptotic Forms of Bessel Functions
The nth-order Bessel function of the first kind J n (x) has the following asymptotic form for Using the identity I 0 (x) = J 0 (ix), we can infer the large argument form of I 0 to be I 0 (x) ≈ e x / √ 2πx. Applying this to the Equation (A.4), we can obtain: and, sinceΓ 0 (k) =Γ 0 (k, k), we also havê For small argument, the asymptotic expressions for J 0 and Γ 0 are

Appendix C. Generalized Hasegawa-Mima Equation
To illustrate a simple and familiar example of zonal flow generation by anisotropic inverse cascade, ket us consider a long-wavelength (k 2 1) limit of gyrokinetics that yields the HM equation, but with the modified electron response described in Section 5.1. This is called the generalized Hasegawa Mima (GHM) equation [43,44]. We define the zonal component of ϕ as the average over the system in the y-direction and the non-zonal part as the rest of ϕ.
The GHM equation can be derived in the limit τ ∼ k 2 1. It is written where we have included a source term on the right-hand side that gives linear instability.
In the absence of this term, this system conserves two quantities, E GHM and Z GHM , whose spectral densities are given These spectral densities satisfy the constraint This ratio sets an effective wavenumber that corresponds to the square root of the quantity q GHM . Thus, the development of anisotropic flows in this system is not only due to the anisotropy of the drift waves (introduced by the term v * ∂ yφ ) but also by anisotropy in the nonlinearity (and nonlinear invariants). This effect is very strong. In fact, the inclusion of unstable linear modes in this model leads a system with pathological behavior, which can be seen directly from energy enstrophy balance equations as follows.
For the remainder of this section we drop the subscript on q GHM .
Consider the following thought experiment. Stationary turbulence of the driven GHM system must satisfies energy and enstrophy balances and Now let us partition the spectrum using q(k): Large-scale zonal flows reside at q ≤ τ . We assume there is instability (γ(k) ≥ 0) from τ (because non-zonal modes have q > τ ) to q and damping (γ(k) < 0) at larger q. The rate of change of energy at these scale ranges due to the linear source term can be expressed Then by energy and enstrophy balance we have ε p + ε Z + ε D = 0, (C.12) and q p ε p + q Z ε Z + q D ε D = 0, (C. 13) where it can easily be shown that (C.14) A trivial solution of Equations (C.12) and (C.13) is ε p = ε Z = ε D = 0; this can be realized if neutrally stable zonal flows grow, by nonlinear interaction with non-zonal fluctuations, to large amplitude and quench non-zonal fluctuations, yielding a state of stationary zonal flows and no fluctuations. Non-trivial solutions to Equations (C.12) and (C.13) (i.e. those with non-zero fluctuations) require ε p , ε Z , ε D = 0. This should all be quite familiar, following the discussion around Equations (35)- (36). By analogy with those results, we can conclude that if injection at the intermediate scales is nonzero, i.e. ε p > 0 (this must be true of any components in the drive range are at non-zero amplitude), then we must have damping at both the zonal scales ε Z < 0 and the dissipation scales ε D < 0. This result should not be surprising. Indeed, if energy injection remains positive, then it is expected to continually cascade to larger and larger scales unless there is some sink to absorb the flux.
This mechanism of zonal flow generation by inverse cascade makes the GHM equation a questionable candidate for modeling tokamak turbulence because linear processes other than collisional dissipation do not damp all zonal flows [51]. In gyrokinetic turbulence, finite Larmor radius (FLR) effects couple the zonal flow dynamics to fluctuations in the ion temperature, which break the enstrophy conservation satisfied by the GHM nonlinearity. This opens up a channel for energy flow from the zonal flows back into the non-zonal fluctuations, which provides a nonlinear mechanism to limit the zonal flow amplitude and allows for a (non-trivial) saturated state to exist in absence of linear damping on the zonal flows.
Appendix D. Two-dimensional gyrokinetics in the long-wavelength limit, and including zonal flows In this Appendix, we examine the fluid moment hierarchy that arises in the longwavelength limit of the two-dimensional gyrokinetic equation. We find it useful to examine this in detail because an asymptotic limit can impose new features on the problem and these features may not be anticipated from a general understanding of the gyrokinetic system. An interesting feature of this limit is that it is quasi-singular in the sense that the electrostatic energy of non-zonal fluctuations is perfectly conserved at zeroth order (and the zonal flows give no contribution to energy). One must include high order terms in order to to capture nonlinear phase mixing and the energetics of the zonal flows. This is reminiscent of energy conservation in the gyrokinetic equation (for fluctuations), where a higher order term, the parallel nonlinear (PNL), must be included to give non-trivial energy conservation; see the discussion in Section 2.3. In that case the PNL term is not dynamically relevant (but its appearance in the gyrokinetic equation may be justified by other considerations such as the value of conservation laws for certain numerical schemes). Here, however, we argue that it is asymptotically consistent to include the additional FLR terms and that the energy balance that this captures is an important part of the dynamics. This is due to the fact that the limit k 2 1 is singular in that ostensibly small terms are made relevant by nonlocal interactions.
We define the small parameter lw = k 2 (which is distinct from and subsidiary to the gyrokinetic expansion parameter = ρ/L lw ). Let us expand the gyrokinetic equation (neglecting collisions), where we use the small-argument forms of J 0 and Γ 0 of Appendix B. Including terms up to O( 2 lw ) we obtain * * * ∂g ∂t and Equation (2) becomes where the zonal and non-zonal components have been defined in Equation (C.1) and we have taken the modified Boltzmann response for electrons by setting T =τ in the definition of β: ) and δ is the discrete delta function having value 1 when its argument is zero. An important feature of this asymptotic limit is that there is a singular relationship between the zonal parts of ϕ and g: which implies that the zonal component of ϕ is due to small corrections in g: where we have expanded ϕ = ϕ (0) + ϕ (1) + ... and g = g (0) + g (1) + ... with superscripts indicating ordering in lw . Because of the singular relationship between ϕ and g, we will find the dynamics of ϕ at one order higher in the expansion of Equation (D.1).
Appendix D.1. Zeroth-order system The fluid moment hierarchy begins with equations for ϕ found by integrating Equation (D.1) over velocity and using Equation (D.2). We find first the equation forφ at dominant order Then at next order, where T ⊥ is half the (ion gyrocenter) perpendicular temperature, Now we may obtain a dynamical equation for ϕ (0) by averaging Equation (D.7) The equation for T (0) ⊥ is found directly taking from the zeroth order part of Equation (D.1), One is tempted to stop here. Indeed, Equations (D.6), (D.9) and (D.10) comprise a closed zeroth-order system (ZOS), i.e. a system having dependence only on zeroth order quantities ϕ (0) , T ⊥ and not coupling to any other moments in the fluid moment hierarchy. However, such a set of equations has no mechanism for energy transfer from the zonal flows to the non-zonal fluctuations. This can be demonstrated (though we do not do it here) by solving the tertiary instability problem [16] (which asks how non-zonal fluctuations can spontaneously grow in the presence of a large amplitude zonal flow). Another way to see this is to note that the energy ofφ (0) is conserved under interactions with ϕ (0) as is evident from Equation (D.6). Indeed, the electrostatic energy under this expansion is at dominant order just the energy of the zeroth-order non-zonal field, In other words the ZOS conserves just the non-zonal energy (the first term in Equation (D.11)), so although the zonal flows (being energetically irrelevant) can grow (or be damped) under the influence of the non-zonal fluctuations, they cannot feed back in a way that causes significant growth or decay of those fluctuations. (Indeed, Diamond et al. [52] refer to zonal flows as "modes of minimal inertia.") In the ZOS, the effect of zonal flows on non-zonal fluctuations is only conservative shearing, as captured by Equations (D.6) and (D.10). Note that whatever order the fluctuations are calculated (i.e. ϕ (0) , ϕ (1) , ϕ (2) , etc.), there will always be an energetically irrelevant part of ϕ that is, however, dynamically relevant. However, the tertiary instability, by which energy flows from zonal to non-zonal fluctuations, has been shown to be an important mechanism for regulation of zonal flows [16]. As the role of zonal flows is so central to the turbulent state (see for instance [53]), processes that regulate zonal flows will regulate the full turbulent state. † † † We will later give a simple two-field system that includes FLR terms that capture this tertiary "energy channel", but let us complete the computation of the gyrofluid system under the naive k 2 1 expansion to include the dynamics of first order fields.

Appendix D.2. Higher-order systems
Now we introduce some more fields in the moment hierarchy and then give equations for these quantities up to first order. We define the next two perpendicular velocity † † †Note that the ZOS may be adequate for regimes where the tertiary instability is not important, such as below the nonlinear critical gradient for ITG where zonal flows can completely quench the turbulence on timescales smaller than collisional damping. ⊥ , χ (0) , etc., cascade independently to larger k, though all moments higher than T (0) ⊥ do so passively (without affecting ϕ). Thus, at zeroth order, each moment has an independent quadratic invariant V −1 d 2 r (T (0) ⊥ ) 2 , (χ (0) ) 2 , ... , that contributes to the total free energy. The mixing of these energy channels is obtained by including FLR terms. Continuing the expansion we have and so forth. Here the nonlinear phase mixing appears, opening a channel for the flow of free energy. From this moment hierarchy we now wish to obtain a simple two-field system that conserves W g0 . To do this, we employ the Laguerre polynomials as a basis for g (see Appendix D.4) and truncate at second order in the Laguerre hierarchy. Note that our solution for ϕ (0) ,φ (1) and T ⊥ = T (0) ⊥ is equivalent to solution of g 0 and g 1 at zeroth and first order; see Equations (D.25)-(D.28). We may then truncate the hierarchy by setting g 2 = 0, g 3 = 0, etc., and so by Equation (D.29) we can set χ (0) = 2T

Appendix D.3. Simple gyrofluid model
There is a fundamental problem in solving the system of equations, up to first order fluctuations, separately as presented in the previous section. Taking Equation (D.7) at face value, we see that the crucial FLR terms affect the evolution of small corrections ϕ (1) . Ifφ (1) is really dynamically relevant then it must feed back on ϕ (0) , which it does not do in this system. In fact, the system comprised of Equations (D.6), (D.7) and (D.10) is actually not closed since ϕ 1 , which appears in the third term of Equation (D.7), is undetermined (the dynamical equation for ϕ 1 can be obtained at one order higher but we will not give it here).
We can justify the inclusion of the FLR terms in the equation for ϕ 0 , however, by allowing for "nonlocal interactions." We argue that they contribute at dominant order and so they are not really higher order. Implicit in derivation so far is an assumption of locality of interactions, i.e. the ordering in terms of k 2 is assumed to be true "scale-by-scale," and nonlinear interactions are assumed to occur among fluctuations on comparable scales. This is actually not correct for the tertiary instability, which couples large-scale zonal flows to fine-scale fluctuations. It was shown by Rogers et al. [16] that the tertiary growth rate peaks for k t ∼ √k , where k t is the characteristic wavenumber of the tertiary instability andk is the wavenumber of the zonal flow. Thus we include the FLR terms in the equation for ϕ (0) and, dropping superscripts, write Combining the zeroth order and first order equations for T ⊥ and employing the truncation as described above, we obtain which is Equation (49) with the forcing terms on the right-hand side omitted. In summary, Equations (D.18) and (D.20) comprise a model for nonlinear interactions in 2D gyrokinetics that includes energy flow from zonal to non-zonal components, allowing for saturation of zonal flows by nonlinear interaction. Note that although the invariant E gf is exactly conserved, the inclusion of FLR terms in these equations means that the truncated version of W g0 is now only approximately conserved. This is due to the appearance of higher order terms in the equation under the change of variables between fluid moments ϕ, T ⊥ , etc. and the Laguerre components g 0 , g 1 , etc.. Schematically, the picture of electrostatic energy flows consistent with this ordering may be summarized by Figure D1, where we consider local forcing at the largest scale of the system. This picture is also compatible with the spectral transfer of E observed in gyrokinetic ITG simulations [11], which shows the coexistence of nonlocal forward transfer and local inverse transfer (though not separated into zonal and non-zonal components).
By direct numerical simulation we have compared the gyrofluid model that includes FLR effects with the ZOS (Equations (D.6), (D.9) and (D.10)). These runs were made with just 60 fourier modes but the linear drive terms and corresponding parameters are otherwise the same as those reported in Section 5.2. Figure D2 shows the striking   difference in the behavior of these systems. (Note that although for these runs the spectrum peaks around the instability drive k 0.25, we have found that the difference does not diminish even if the magnitudes of the wavenumbers of the system are decreased further to represent the asymptotic limit k 2 → 0.) The nonlinear critical transition is absent for the ZOS and the amplitudes of the fluctuations at saturation are different by well over an order of magnitude.

Appendix D.4. Fluid moment hierarchy in Laguerre polynomials
A convenience of the long-wavelength limit is that the Bessel function may be expanded as a low-order polynomial and one can easily use a standard set of orthogonal polynomials to represent the fluid moment hierarchy and decompose the velocity space dependence of g. We choose the Laguerre polynomials, which have the desired weighting function so that the free energy W g0 can be written as a simple sum of squared Laguerre coefficients. The Laguerre polynomials, denoted L n (u) where u = v 2 /2, are orthogonal with due −u L n (u)L m (u) = δ(n − m) (D. 21) It follows that if we define g = 1 2π n L n e −u g n (R, u), (D. 22) then the quantity W g0 is simply 1 + {ϕ (1) , g 1 } + {ϕ (0) , g 1 } +{∇ 2 ϕ (0) , − 1 2 g 2 } = 0, (D. 33) which upon truncating the hierarchy, g 2 → 0, is equivalent to the gyrofluid equations of the previous section.
we consider a monochromatic primary mode, with wavenumber k 0 , but now include a corresponding temperature wave T ⊥p = T ⊥0 exp ik 0 y + c.c. (D. 38) where T ⊥0 is a complex amplitude. This introduces two new parameters: δθ = Arg[T ⊥0 ], the phase difference between the perturbed temperature and electrostatic potential, and |T ⊥0 |/ϕ 0 , the relative amplitude of the perturbed temperature. The secondary mode now consists of two parts, ϕ s ϕ p and a temperature perturbation T ⊥s T ⊥p . The secondary instability equations are where z = γs kxk 0 ϕ 0 and r = T ⊥0 ϕ 0 .