A fully-neoclassical finite-orbit-width version of the CQL3D Fokker–Planck code

The time-dependent bounce-averaged CQL3D flux-conservative finite-difference Fokker–Planck equation (FPE) solver has been upgraded to include finite-orbit-width (FOW) capabilities which are necessary for an accurate description of neoclassical transport, losses to the walls, and transfer of particles, momentum, and heat to the scrape-off layer. The FOW modifications are implemented in the formulation of the neutral beam source, collision operator, RF quasilinear diffusion operator, and in synthetic particle diagnostics. The collisional neoclassical radial transport appears naturally in the FOW version due to the orbit-averaging of local collision coefficients coupled with transformation coefficients from local (R, Z) coordinates along each guiding-center orbit to the corresponding midplane computational coordinates, where the FPE is solved. In a similar way, the local quasilinear RF diffusion terms give rise to additional radial transport of orbits. We note that the neoclassical results are obtained for ‘full’ orbits, not dependent on a common small orbit-width approximation. Results of validation tests for the FOW version are also presented.


Introduction
Finite-difference Fokker-Planck (FP) codes are essential computational tools for the theoretical interpretation of existing plasma RF and NBI heated experiments and for projections to new experiments. All present-day FP codes use various simplifications because a blunt solve of 6D kinetic equation (3 spatial + 3 velocity) is a formidable problem even for modern super-computers. Useful reduction of dimensionality can be achieved for most of the plasma volume in high temperature tokamaks and other toroidally symmetric fusion energy devices, by averaging over periodic particle trajectories. A bounce-averaging (BA) procedure for toroidally symmetric magnetic geometry is based on assumption that bounce time (more generally, poloidal transit time) τ b of charged particles is much smaller than the collision time or a characteristic time associated with any other diffusive process. This is commonly the case for energetic particles and most of the bulk electrons and ions in tokamaks. It is also assumed that time variations of plasma parameters and the distribution function are much slower than the bounce time and gyro-period. Then, the fundamental kinetic equation can be averaged over all three periodic motions of particles, therefore reducing the dimensionality from 6D to 3D [1][2][3][4].
The CQL3D (Collisional QuasiLinear 3D) conservative finite-difference bounce-averaged Fokker-Planck equation (FPE) solver [5,6] is built on the collision model in the 2D-in-velocity, 0D-in-radius, zero-orbit-width (ZOW) bounce-average CQL code [7,8]. The ZOW CQL3D was motivated by the FPP code [9], and is similar to other BA FP codes [10][11][12][13][14][15][16], but is different with regard to: greater generality, being multispecies, having a fully relativistic option and having larger number of synthetic diagnostic tools. It uses flux-conserving differencing, and thus conserves particles, has a fully nonlinear Coulomb collision operator, and derives quasilinear RF diffusion terms from wave data that is usually exported from the general, all-frequencies GENRAY ray-tracing code [17]. It also models a neutral beam source of ions, and is coupled with RF full wave solver [18] and transport codes as part of the SciDAC Integrated Plasma Simulator project [19]. Until recently CQL3D was restricted by a ZOW approximation which neglects guiding-center drift perpendicular to magnetic flux surfaces; the present work describes the new version of the code with the guiding-center finiteorbit-width (FOW) capabilities.
The outline of the paper is as follows. In section 2, a general view of the FOW-FPE is summarized, first in terms of action-angle variables which make orbit averaging simple, then transforming to more 'convenient' variables. In section 3, the derivation of transformation coefficients is given. Section 4 provides the Jacobian for the transformation. In section 5, the bounce-average coefficients are derived in terms of the local diffusion tensor components and transformation coefficients. In section 6, we show that the FOW-FPE converges to the ZOW form in the limit of thin banana orbits. The particle source operator and the RF quasilinear operator are described in sections 7 and 8. Questions of the initial setup of the distribution function and its rescaling at later time steps are discussed in section 9. The boundary conditions are discussed in section 10. Test results, based on National Spherical Torus Experiment, NSTX [20] equilibria, are provided in section 11. Section 12 is the conclusion. Implementation of the internal boundary conditions connecting trapped and passing orbits is discussed in appendix.

General formulation of CQL3D-FOW
In the ZOW approximation, all orbits that cross the midplane at a given major radius coordinate (out-board point of a magnetic flux surface) are confined to the same flux surface; therefore, the bounce-averaging is reduced to averaging over poloidal angle [8]. Then, the bounce-average FPE can be expressed through the computational coordinates (ρ, u 0 , θ 0 ) where ρ is the flux surface label, u 0 is the particle speed (relativistically, momentum per rest mass) and θ 0 is the pitch angle at the outer-board midplane point (minimum B point) for a given flux surface. Subscript '0' refers to the minimum-B point for each given surface 'ρ'. The details of BA procedure in the ZOW approximation can be found in Killeen [8] and the CQL3D code manual [6].
In contrast to the ZOW limit, bounce-averaging of the FPE with finite-width (FOW) orbits cannot be reduced to averaging over poloidal angle. A general method for derivation of the FPE in this case is based on two main steps [1][2][3]: (1) Using a Lagrangian formulation, write the 6D kinetic equation in the canonical action-angle space (J, Θ) (note that the Jacobian of such transformation is one); (2) Perform integration over canonical angles, which correspond to the three periodic motions: gyration around magnetic field, orbital transit or bouncing in poloidal angle, and toroidal precession of orbits. The reduced equation is a function of the remaining three action variables J, where the summation convention is used. A particle source such as from NBI is represented by S. The 'orbit-averaged' (canonical-angle averaged, to be exact) diffusion coefficients D ij and advection-like coefficients F i are expressed through the corresponding spatially local values D uu and F u at points along orbit The local coefficients D uu may include the collisional plus RF diffusion coefficients, while F u coefficients are from the collisional drag (and/or toroidal dc electric field). In these equations, u is the particle local momentum-per-rest-mass, enabling a relativistic treatment.
In the equations, one of the canonical action variables J corresponds to the magnetic moment μ, another-to the canonical toroidal angular momentum p φ , and the third represents the toroidal magnetic flux enclosed by an orbit in the poloidal cross-section. The last one is not particularly suitable for numerical coding, especially for setting a computational grid. In general, another set of invariants can be selected that identifies each orbit and which is convenient for setting the computational grids. A number of choices for such a set of invariants had been considered in literature. One straightforward choice (not used here) is I = (u, μ/u 2 , p φ ). The derivation of the Jacobian of transformation from the canonical action variables to these type of I variables is shown in [21]. One should notice, however, that this set of variables does not contain an isolated configuration space coordinate. Instead of the mixed velocity-radial variable p φ , other authors use a selected value of a radial-type coordinate along the guiding center orbit, such as the maximum value of the poloidal or toroidal magnetic flux along orbit [15,22], or the orbit-average value of poloidal flux [4], or the flux surface radius at the innermost point of a passing orbit and radius at the bounce point of trapped particle orbit [23].
For the FOW version of the CQL3D, our choice of 'convenient' variables is dictated by the desire to keep computational grids as close as possible to the original ZOW version, and to maintain direct visualization of the results in velocity and configuration space. Thus, we adopt I = (u 0 , θ 0 , R 0 ), where R 0 is the major radius coordinate along the equatorial plane (the midplane of tokamak, in case of up-down symmetrical equilibrium); for each given R 0 point, the value of particle speed (momentum-per-mass) u 0 and value of the pitch-angle θ 0 at this point determine a unique orbit. The same as in CQL3D-ZOW notation, subscript '0' refers to points on the midplane. Such choice of I also yields an intuitively clear meaning of the solution of FPE: the distribution function f (R 0 , u 0 , θ 0 ) is the local distribution function of guiding centers at point R 0 on the midplane, as shown in figure 1. For any other point away from the midplane, the distribution function, which is constant along each orbit, can be readily reconstructed from the set of computed distributions f (R 0l , u 0l , θ 0l ) at radial-grid points R 0l . It is seen that each orbit contributes to two distribution functions on the midplane: f (R 0a , u 0a , θ 0a ) and f (R 0b , u 0b , θ 0b ), where 'a' and 'b' refer to the two points where orbit crosses the midplane (two 'legs' of an orbit), and, of course, the value of f must be equal at these points.
The transformation from canonical action space J to the adopted computational space I = (u 0 , θ 0 , R 0 ) results in the general form of the bounce-averaged FPE for the particle distribution function f 0 (using notations of [4]): The Jacobian for the transformation, / I ≡ ∂ ∂ J I, is such that f 0 is the number of particles in the volume element I I d 3 . Distinctive from equations (2.1)-(2.3), the bounce-averaged diffusion coefficients and advection terms are now expressed through the transformation coefficients ∂I/∂u, rather than ∂J/∂u, where tensor D uu describes a local diffusion, such as caused by collisions or resonant RF heating, and vector F u corresponds to the collisional friction. Note that Coulomb collisions and RF QL diffusion are spatially local phenomena, changing only particle velocity u, not position. However, scattering locally in velocity changes the spatial trajectory, and through this change the bounce-average radial diffusion terms are formed.
In general, the rhs of equation (2.4) may include other terms. For instance, the presence of dc toroidal electric field at particle g.c. position adds the following term into the local FPE, In addition, equation (2.4) also includes the particle source term corresponding to NBI. It must be expressed through the same set of variables as f (R 0 , u 0 , θ 0 ). The derivation of the source term is given in section 7.

Transformation coefficients
The physical meaning of the transformation coefficients in equations (2.5) and (2.6), for our choice of I = (u 0 , θ 0 , R 0 ), is to describe how an infinitesimal change in local u-space at an arbitrary point (R, Z) of particle orbit modifies the values of (u 0 , θ 0 , R 0 ) at the corresponding orbit position at the equatorial plane. The derivation of transformation coefficients is based on unperturbed orbits specified by the conservation of energy, magnetic moment and toroidal angular momentum, which we write in normalized form as (assuming no radial electric field potential) = u u 0 (3.1a) where m is the mass, q is the charge, R is the major radius, b φ = B φ /B is the ratio of the toroidal component of the magnetic field to the magnetic field strength, and Ψ pol is the poloidal magnetic flux normalized by 2π. For brevity, equation (3.1c) for p φ conservation is written for the case of positive plasma current I φ and positive toroidal field B φ (counter-clockwise, if viewed from top). In the general case for I φ and B φ directions, the pitch-angle θ between particle velocity and the direction Guiding center orbits representing distribution function at a given radial grid point R 0 = 135 cm, for one energy level E = 15 keV. Particles are launched with different initial pitch angles equi-spaced over [0; π] range. Illustration is made for D + ions in an NSTX equilibrium field. The dashed line corresponds to the last closed flux surface, and the magnetic axis is marked with the '+'. Also shown is the reference flux surface that passes through the R 0 = 135 cm coordinate on the equatorial plane. Orbits that reside inside the surface are started as co-current-going particles. of the magnetic field is defined in such a way that cosθ > 0, if the particle travels in co-current direction.
In the matrix form, the components of ∂I/∂u are The lower row is composed of zeros because the gyro-phase is an ignorable coordinate in the local u space. The elements in the top row are found by applying ∂/∂u to each side of equations (3.1),   Also, in the absence of the radial electric field, / θ ∂ ∂ u 0 = 0.

The Jacobian for the transformation
It is convenient to consider a transformation from the canonical action variables to an intermediate set of invariants, and then a subsequent transformation to our set I = (u 0 , θ 0 , R 0 ). In relativistic form, the unperturbed Hamiltonian can be written as and the action variables J 1 and J 3 are In the above, m is the rest mass and q is the particle charge, with sign. For the J 2 , we only need to know that / / π τ ∂ ∂ = H J 2 2 b , the bounce (or transit) frequency. Also, is the gyro-frequency; / ∂ ∂ H J 3 is the angular frequency of orbital toroidal precession-the exact expression is not needed.
Consider now the intermediate set of invariants K = (H, qJ 1 /mH, mJ 3 ), and calculate the Jacobian of the transformation from this set to the canonical action space J. We have and the other terms are not needed. Then, the Jacobian is Finally, the resulting Jacobian for the transformation to I is found from equations (4.4) and (4.9) as Note that the expression under the absolute value sign, if divided by cosθ 0 , is the same as the denominator in the transformation coefficients, as in equation (3.4). This expression becomes zero for the stagnation orbits-the g.c. drift orbits that collapse to a point in the (R, Z) plane but still move toroidally because ∥ u is not zero [24]. On the other hand, the numerator in equation (3.4) or (3.6) also becomes zero for such orbits. To avoid singularity in the code, the transformation coefficients like ∂R 0 /∂u or / θ ∂ ∂ R 0 are set to zero in the proximity of the stagnation orbits' surface in (u 0 , θ 0 , R 0 ).

The components of the diffusion tensor
We now return to the equations (2.5) and (2.6) for the bounceaverage diffusion tensor components. Adopting notations used in ( [8], chapter 2) or CQL3D manual [6], the components of the local diffusion tensor are written as Also, for the collisional drag term, we have For example, in the case of diffusion by Coulomb collisions, the coefficients A, B, C, D, F are expressed through derivatives of the 'Rosenbluth potentials' [25]. A background local distribution function f b (R, Z, u, θ) at a given point of particle orbit is expanded in Legendre polynomials as c os . are further used to define integrals like where / γ = +u c 1 2 2 . These integrals are incorporated into the definition of the Rosenbluth potentials and their derivatives. Details on the definition of the local collision coefficients A through F can be found in ( [8], chapter 2) or the CQL3D manual [6]. The background species in equation (5.3) may include not only Maxwellian ions and electrons, but also the species for which the FPE is being solved (referred to as 'general species' in the code manual). In this case the collision operator becomes non-linear. The complexity of its evaluation in the code comes from the fact that for the orbit averaging, the local distribution f b (R, Z, u, θ) must be known at many discretized (R, Z) points along each orbit in the plasma cross-section, while the solution f 0 ≡ f (R 0 , u 0 , θ 0 ) is only obtained at the equatorial plane. To accelerate this calculation, the reconstruction of the local f from the equatorial f 0 is accomplished by a fast mapping procedure without the need of tracing an orbit from the local point back to the midplane. Essentially, the procedure utilizes a set of arrays defined over uniform grids in (u, μ, p φ ); these arrays store information on radial coordinates of orbit legs at the midplane (R 01 and R 02 ) and corresponding pitch-angles (θ 01 and θ 02 ). For a given local (R, Z, u, θ) point, the values of (μ, p φ ) are readily evaluated, and the nearest indices in the (u, μ, p φ ) grids are identified, 'pointing' to the values of (R 0 , u 0 , θ 0 ), whence the local f b (R, Z, u, θ) is reconstructed.
For the RF-induced quasilinear diffusion, the local coefficients B, C and F are given in section 8. Here, we focus on the general form of the bounce-average coefficients in terms of the local components and transformation coefficients. Based on ∂I/∂u form given by equation (3.2) and D uu tensor in equation (5.1), the components of bounce-average tensor 23 32 5f) The components of bounce-average collisional friction term The first two coefficients, F 1 and F 2 , affect the shape of the distribution function in velocity space (u 0 , θ 0 ), while F 3 coefficient can modify the radial profile of f 0 (R 0 , u 0 , θ 0 ), so it is an advection/pinch term. For the collision coefficients, the bounce-averaging is performed numerically by evaluating the values inside … brackets at a set of points along the unperturbed orbits. Typically, around 100 points are selected along each orbit. In contrast to particle simulation codes, these orbits are only used as a framework for calculating the bounceaverage terms, finding the loss cone in (R 0 , u 0 , θ 0 ) space, etc. Therefore, the orbits are only needed to be traced once, and the number of orbits is relatively small-determined by the grid sizes in (R 0 , u 0 , θ 0 ). For the RF diffusion operator, if being constructed from ray-tracing data, the bounceaveraging procedure does not require the data on the whole orbit. The local (u, θ) space at the intersection of a given orbit with a ray element can be mapped directly onto the midplane, without the need of other points along the orbit. More details are given below.
In the absence of a radial electric field, the expressions for the D ij and F i can be simplified by recalling that ∂u 0 /∂θ = 0 and ∂u 0 /∂u = 1. The other transformation coefficients in this case are given by equations (3.4)-(3.7).
Finally, we rewrite the FPE from equation (2.4) as This form for the bounce-average FPE, together with equations (5.5) and (5.6), provides a general prescription for adding any type of local velocity-space diffusion process into the equation, under the condition that it can be defined similar to the form of equation (5.1), with local coefficients being a function of (R, Z, u, θ).
The above expressions exhibit the basic physics of FOW neoclassical transport in our constants of motion (COM) coordinates: the local collision/RF velocity diffusion leads to scattering in (u, θ) space at each point along the orbit, weighted by the dependence of the COM coordinates on local velocity, ∂I i /∂u, and multiplied by the local collision and RF diffusion coefficients. We emphasize that this formulation of neoclassical transport applies to the full g.c. particle orbits, with no assumption, as is the usual neoclassical theory, that g.c. orbits have a narrow radial extent compared to the plasma radius. Moreover, the radial electric field is amenable to this treatment, but is reserved for future work. We expect that examination of the individual particle flows in our COM space will provide additional insight into neoclassical transport, beyond what is obscured by the distribution function averaging in moment theory; for example, net radial particle, toroidal momentum, and energy transport may be comprised of low and high speed particle flows in opposite radial directions as discussed in 2D (u, ρ) space [26].

ZOW limit
It is instructive to obtain a ZOW form of FPE from the FOW form, to compare with the original version in CQL3D [6]. Starting from equation (3.1c) for the conservation of p φ , we can simply set the terms with b φ or b φ0 to zero, resulting in conservation of Ψ pol along particle orbit. In this case, the numerator in equations (3.4) and (3.6) becomes zero, and then θ θ sin cos sin cos . Also, ∂u 0 /∂u = 1, as before. All other transformation coefficients are equal to zero. Then, with other terms being zero. Then the FPE casts into (source term is omitted) where the Jacobian becomes It is clear that in the ZOW limit, the FPE is reduced to a 2D differential equation, to be solved in (u 0 , θ 0 ) for each radial coordinate. In the Jacobian, the factors that are not dependent on u 0 or θ 0 can be canceled out. After substituting coefficients from equations (6.2) and using u = u 0 , the ZOW-FPE takes the form of This equation can be recognized as the one used in the CQL3D-ZOW version [6]. From comparison of the FOW form given by equation (5.7) with the ZOW form in equation (6.3), one can see what kind of new physics has been added under the FOW formulation. We will refer to the new terms F 3 , D 13 = D 31 , D 23 = D 32 and D 33 as the 'radial transport terms', or 'R-transport' terms, for brevity. These terms give rise to the neoclassical bootstrap current, radial diffusion, and also pinches (term F 3 ), caused by orbit modifications by collisions, RF heating, and also by toroidal electric field. During tests, we can easily disable those R-transport terms to study their role. This is another advantage of using the explicit radial coordinate in our set of I = (u 0 , θ 0 , R 0 ).

The bounce-averaged NBI source operator
The particle source operator must be formed in terms of the equatorial coordinates I = (u 0 , θ 0 , R 0 ) in which the FPE is being solved. Suppose a particle is born at a point (R, Z) in plasma cross-section, with velocity u. In modeling of a neutral beam source of fast ions, the birth points for each of many particles, for each of several beam energies, are weighted by a particle source rate which reflects the geometry and physics considerations in the neutral beam injection/deposition module. In CQL3D, the neutral beam module is based on the Monte Carlo beam ionization code NFREYA [27], generalized in beam geometry and with updated ionization cross-section data. A single ion birth point source can be written in Cartesian coordinates as . After transforming to the canonical action-angle variables (J, Θ) (with and performing averaging over angles Θ, we have an averaged source s k , . The physical meaning of equations (7.1) and (7.2) is not the same because the former is 'counting' individual particles, while the latter is counting orbits as a whole. It may happen that two particles born at different (R, Z) points have exactly same J k vector, i.e. same set of invariants, and thus the same guiding center orbit. In the code, each of these particles contributes to the bounceaveraged operator, effectively accumulating the weight factor mentioned above.
The next step is a transformation to our set I = (u 0 , θ 0 , R 0 ), but before we proceed, a discussion on our technical realization of a delta-function such as used in equation (7.2) is useful. Numerically, the delta-function can be represented by a linear hat function (considering one dimension for clarity): In another example, with twice larger ΔJ ε = 2ΔJ grid , the hat function contributes to the three adjacent grid points, and we have Switching now from J space to I = (u 0 , θ 0 , R 0 ), the source function is transformed into . In contrast to equation (7.2), where an orbit is detected at a single point J k , the source function in equation (7.4) contributes to the two points that correspond to the two legs of an orbit at the equatorial plane. We can write for leg 'a' on the midplane, and similarly-for leg 'b'. In equation (7.5) all quantities are evaluated at the midplane (subscript '0' is omitted here, for clarity). Numerically, the source strength assigned to each leg of an orbit must be same because in the bounce-averaged FPE the source is a function of COM, and thus it takes one value along an orbit with given J k . This condition can be satisfied by the following consideration. The representation of delta-functions by hat functions is not independent at the two legs of orbit. Numerically, an orbit is represented by an orbit having finite extent ΔJ ε . At the equatorial plane, the 'footprint' of such orbit is ΔI ε,a = ΔJ ε /|∂J/∂I| a at leg 'a', and ΔI ε,b = ΔJ ε /|∂J/∂I| b at leg 'b'. Since the value of ΔJ ε is the same at two legs, while the values of Jacobian |∂J/∂I| are different, the 'footprints' ΔI ε,a and ΔI ε,b are also different. Based on this consideration, the following procedure is used for the definition of hat functions. First, the values of ΔR grid |∂J/∂I| are compared at the two legs of an orbit. Suppose that ΔR grid,a |∂J/∂I| a > ΔR grid,b |∂J/∂I| b . Then, at leg 'a' we set ΔR ε,a = ΔR grid,a so that the source value is assigned to the nearest radial grid point l = l a , In the above, we also assumed that the half-width of the hat function in u and θ is Δu ε,a = Δu grid,a and Δθ ε,a = Δθ grid,a . At leg 'b', however, the half-width of the radial hat function should be consistent with the requirement ΔR ε,a |∂J/∂I| a = ΔR ε,b |∂J/∂I| b (and we keep Δu ε,b = Δu grid and Δθ ε,a = Δθ grid where the grid sizes in u and θ are the same for all radial points). Thus, ΔR ε,b = ΔR grid,a |∂J/∂I| a /|∂J/∂I| b .
Then, the source value assigned to the nearest radial grid g rid grid 3 (7.7) It can be seen that it is equal to the source at l = l a in equation (7.6). However, since ΔR ε,b > ΔR grid,b , which follows from the assumption ΔR grid,a |∂J/∂I| a > ΔR grid,b |∂J/∂I| b , there are also contributions from the same orbit (of width ΔJ ε ) to several other radial points next to l = l b , given by The largest value of L in the above is determined by With this procedure, the condition of having the same source value assigned to each leg is satisfied, and also we recover the normalization ( ) / at leg 'a' and at leg 'b', if the integral (in numerical sense) is taken over the 'footprint' of the orbit, i.e. over the grid points [l b − L; l b + L]. This approach allows solving the FPE at each leg independently. As an alternative approach, the source operator (also the diffusion and advection terms) could be defined at one leg of an orbit only, and the solution found at such leg could be simply copied to the other leg. However, this means that at many radial grid points the distribution would be assembled out of numerous pieces taken from different radial and pitch-angle grid points. Not only does this approach introduce a technical difficulty in determining which part of the distribution function at a particular radial coordinate should be found by solving the FPE and which should be copied from elsewhere, but also it can add an excessive numerical noise from discontinuities in such patched distribution function.
Finally we note that in the limit of 'thin' orbits, legs 'a' and 'b' of an orbit fall into the same radial bin, and we have (see equation (6.4)) This expression can be recognized as the source operator in CQL3D-ZOW [6].

Quasilinear operator
The general form of the bounce-averaged diffusion coefficients for the RF operator is the same as that for the collision operator, as specified by equation (5.5). The difference is, of course, in the particular expressions for the local diffusion coefficients in equation (5.1). Here, we consider how to obtain the quasilinear diffusion coefficients based on data passed to CQL3D from a ray-tracing code, such as GENRAY [17]. The details of such procedure in the ZOW case can be found in the CQL3D manual [6], and here we generalize to the FOW form. We assume that each ray trajectory has an effective perpendicular width Δw in the poloidal plane; Δw is associated with the spectral distance between rays, and can be found from the power ( ) ∥ ∥ ∆ = ∆ P P k k flowing in a ray channel and the energy flux S pol in the poloidal plane, The factor 2πR comes from the assumption that the ray channel is toroidally uniform. The effective width Δw is measured perpendicular to S pol . Then, where S pol is the energy flux [28] per unit component of the discretized wave spectrum. The values of S pol , ΔP, ∥ ∆k and RF electric field components along each ray are obtained from ray-tracing data.
In addition, each ray is discretized into spatially local segments ('ray elements') in the poloidal plane. The length Δl of a given element is generally chosen so that ray radial extent is less than the smallest radial distance between flux surfaces that go through the computational radial grid points. For each ray element, the local quasilinear coefficients to be used in equations (5.1) and (5.5) can be expressed through the parallel/perpendicular components as s in 2 cos sin c os 2 2 (8.3c) where θ is the local pitch angle. In accord with the random phase approximation for each wave-particle interaction, summation of the diffusion coefficients from all the elements gives the global RF diffusion coefficient to be used in equation (5.7). For the ion species, in the non-relativistic limit, the parallel/perpendicular components can be written as [29,30] ( where the argument of the Bessel functions J n is /ω ⊥ ⊥ k u c , ω c is the cyclotron frequency, ω is the wave angular frequency, ⊥ k and ∥ k are the wavenumber components perpendicular and parallel to the ambient magnetic field, and RF field comp onents E x and E y are in the direction perpendicular to the ambient magnetic field. The summation is meant to cover all negative and positive numbers, but the resonance condition sets practical limits for the harmonic number n to be within a narrow range. Relativistic expressions (used in CQL3D for electron heating applications) are given in [31], and analysis of the affected momentum space regions-in the CQL3D manual [6].
For bounce-averaging, we consider a crossing of a given ray element with a drift orbit. The local orientation of the orbit path is described by the guiding center drift velocity V gc,pol (the projection onto the (R, Z ) plane), while the direction of the ray element is characterized by the direction of the energy flux in the poloidal plane S pol which is proportional to the ray group velocity projected onto the poloidal plane. The contribution of each ray element to the bounce-averaged coefficient is determined by the portion of orbit path overlapped by the ray element. For instance, if the ray is crossing a particle orbit at right angle, the length of the overlapping path is Δw. The integral over the orbit is reduced to where D loc is a combination of the local coefficients B, C, F from equations (8.3)-(8.5) multiplied by proper transformation coefficients, i.e. one of expressions inside the angle brackets in equation (5.5). Since the length of the ray element is smaller than the distance between flux surfaces, several ray elements may contribute to the same radial bin, each with a weight factor proportional to Δl/ΔR 0l where ΔR 0l is the radial grid size of the radial bin 'l' at the midplane that contains the leg of the orbit that goes through the ray element. Therefore, the contribution from a given ray element must include the Δl/ΔR 0l factor as well as the Δw factor; so, the contribution is proportional to the area of ray element, ΔlΔw. In fact, the orientation of the ray element with respect to the orbit path is not important. If the angle between S pol and V gc,pol is less than 90°, the overlapping path becomes larger than Δw but the projection of the ray element onto the midplane (following the orbit) becomes smaller. The overall contribution is still proportional to the ray element area ΔlΔw. Then, for further derivations, we assume that the crossings are at right angles. For a proper mapping of a ray element length onto the midplane, attention should be paid that the volume in configuration space (and also in phase space) associated with an orbit changes from a local (R, Z) point to the corresponding equatorial point. For a given ray element, we consider two orbits passing through the two end points of the element, so that they are separated by distance Δl. At the midplane, the separation between the two orbits is Δl 0 . By expanding the toroidal angular momentum at the midplane and also at the ray element position, we find where the numerator is evaluated at the ray element position, e norm being the unit vector normal to the orbit path (such that e norm · V gc,pol = 0) and e R being the unit vector in the R-direction; the denominator is evaluated at the midplane. Note that in the ZOW limit, the terms with b φ are absent, and the unit vector e norm is parallel to the gradient of Ψ pol . Then, equation (8.6) is simplified to Δl 0 /Δl = (RB p ) ray-el /(RB p ) 0 , which is incidentally a description of how the distance between the flux surfaces is changed with the poloidal distance. Going back to equations (8.4), we notice that the physical values, including components of wavenumber, RF field components and cyclotron frequency are specific to a given ray element, and they are obtained from a ray-tracing code. On the other hand, the delta-function in these equations defines a resonant line in the local velocity space, which may span over a large range in energies and pitch angles. We can write , a fixed value for a given ray element. Numerically, the delta function is replaced by the linear hat function where Δu ε is the half-width of the hat function, and 1/Δu ε is its height. The value of Δu ε is set to be large enough to cover the biggest grid cell in computational velocity space. The hat function defines a strip in the local (to ray element) velocity space. This strip region can be populated with ( ) ⊥ u u , points or, alternatively, with ( ) θ u, points that are mapped onto the midplane by following the particle that has ( ) θ u, velocity components at the ray element position. Thus, in contrast to the ZOW case, a ray element may contribute to different radial bins at the midplane, depending on the particular value of ( ) θ u, within the local resonance strip. In a summary, the contribution from a given ray element, from a single point ( ) θ u, of the local resonance strip is given by The indices ( j, i, l) correspond to (u 0j , θ 0i , R 0l ) grid point at the midplane to where an orbit is traced, for a particle with (u, θ) at the ray element position. The R 0l grid point is associated with the radial bin width ΔR 0l . The additional factor (∂θ 0 /∂θ)Δθ corresponds to mapping of the local θ-grid bin to the midplane. We assume that the local θ-grid is over-dense, so that several θ-points from the resonance strip can contribute to the same (u 0j , θ 0i , R 0l ) point. Then, the contribution from one θ-point is proportional to the ratio of the Δθ 'footprint', which is (∂θ 0 /∂θ)Δθ, to Δθ 0,j,i,l , which is the accumulated value of (∂θ 0 /∂θ)Δθ, contributed by other θ-points. There is no similar factor for the Δu → Δu 0 mapping because here we assume that u = u 0 , and the u-grid is the same at the ray element and at the midplane. Note that the D loc (u,θ) coefficient includes the weight factor from equation (8.8). To calculate the total contribution from one ray element, the equation (8.9), without Δθ 0,j,i,l , is evaluated for each point in the resonance strip, with values accumulated in corresponding (u 0j , θ 0i , R 0l ) grid points at the midplane; afterwards, the result is divided by Δθ 0,j,i,l . The procedure is repeated for all ray elements and rays, with values of D loc added up at the midplane grid points, to form the quasilinear operator for the FPE equation. The described procedure is quite different and more complicated than that for the ZOW form of QL operator [6]. For instance, in the ZOW version, there is no need to form a local θ-grid within the resonance strip-all points are mapped into one radial bin on the midplane, so that the mapping procedure can be easily done in reverse, i.e. from the midplane θ 0 -grid points back to the local θ-space. In spite of the technically different approach, tests have shown, as consistency requires, that in the ZOW-limit the FOW-QL diffusion coefficients converge to the original ZOW-QL coefficients with good accuracy.
In a typical application of CQL3D to RF ion heating, the distribution function of ions (the 'general' species) may develop a strong non-Maxwellian tail, and therefore the power deposited to ions may well exceed the value calculated in the ray tracing code via linear damping mechanism (with all species being Maxwellian). The power deposited to ions at a given radial bin R 0l is determined by the power flowing in ray channels, which makes up the QL diffusion coefficients, and also by the shape of the ion distribution function, i.e. the solution of FPE found at each time step. As the distribution function evolves, more power is absorbed by ions and consequently less power in ray channels becomes available for the remaining portion of ray elements. The code iterates between calculation of the QL coefficients, the consistent distorted distribution functions, and the damping of the ray energy along the ray channels, to achieve self-consistent non-Maxwellian distributions and damping of the RF.
In the final note to this section, we would like to underline the main difference in formation of the collision and quasilinear operators. The BA collision operator utilizes a set of unperturbed 'sampling' orbits traced for each grid point (u 0j , θ 0i , R 0l ), plus the COM-table that maps a local (R, Z, u, θ) point back to the midplane point(s) for the reconstruction of the local distribution function. In contrast, the formation of the quasilinear operator does not require the data on a set of points along each orbit. The QL operator is formed by a direct COM-mapping from a local (u, θ) space at each ray element position back to the midplane points where the operator is formed. Thus, every ray element is accounted for, as long as the resonance condition is satisfied.

Initial distribution function and rescaling
The initial distribution function over the (u 0 , θ 0 , R 0 ) computational space must be set in such a way that it has the same value at both legs of each particle orbit. This requirement is dictated by the fact that in general (at the bounce time scale) the solution of FPE must be a function of COM, and as such, it has the same value at any point along orbit. If we want to set the initial f 0 (u 0 , θ 0 ) at each given radial point R 0 as a Maxwellian type distribution, the density and temperature in such function can only be a function of COM, rather than the local coordinate R 0 . Technically, this is achieved by the following procedure. For each grid point (u 0j , θ 0i ) at a given radial point R 0l , the value of bounce-average poloidal flux Ψ pol is found during initial orbit tracing and stored in an array over grid indices ( j, i, l). We also assume that the initial profiles of density n and temperature T are set as a function of the generalized radial coordinate ρ, which can be uniquely determined from Ψ pol . Then, we use the value of Ψ pol = Ψ j i l pol , , to determine n(Ψ pol ) and T(Ψ pol ), and to define the value of the Maxwellian-like distribution function for the particular orbit. This value is attributed to both legs of the orbit. It should be noted that with such procedure, the local distribution function f (u 0 , θ 0 ) at each given radial point R 0l is not isotropic in θ 0 because orbits that start at R 0l have different values of Ψ pol (see figure 1). In fact, this deviation from the isotropic distribution accounts, in a typical case, for about 25% of the total bootstrap current, as discussed in section 11, and can be interpreted as a finite orbit width magnetization current; but it is not the actual bootstrap current until the collisional transport (particularly the radial terms) 're-adjusts' the distribution.
The requirement of having the same value at both legs of each orbit must be also maintained when a rescaling of the solution is done during time iterations. In the ZOW version, such rescaling is applied to the distribution functions at each radial coordinate independently, so that the radial density profile remains unchanged during simulations. In the FOW variant, the values of the distribution function at different radial coordinates that are coupled by the same COM must be scaled by the same factor. The easiest way to satisfy this condition is to rescale the distribution function at every radial coordinate by one and the same factor based on total number of particles in plasma volume. Such rescaling keeps the total number of particles in plasma volume unchanged; however, the profile of density calculated from f 0 is allowed to evolve. For a given solution of FPE f 0 (u 0 , θ 0 , R 0 , t), the total number of particles in plasma can be calculated from where the integral over R 0 includes only one leg of each orbit. The integrand in equation (9.1) times (du 0 dθ 0 dR 0 ) gives the number of particles in a 'particle tube' defined by du 0 , dθ 0 and dR 0 at the (u 0 , θ 0 , R 0 ) point and the particle trajectories. The initial value of total number of particles, N(t = 0), and the value obtained at a given time step, N(t), are used to rescale the distribution function f 0 (u 0 , θ 0 , R 0 , t) by the N(t = 0)/N(t) factor-same at each radial coordinate.
It should be noted that in ZOW limit, the Jacobian in equation (9.1) is reduced to the form given by equation (6.4), and then we have 0 0 can be interpreted as a field-line density for a flux surface with out-board radial coordinate R 0 , that is, the total number of particles in a flux tube defined by the perpendicular-to-B area at R 0 .

Boundary conditions
In general, the differencing of the FPE is formulated through the particle flux in the (u 0 , θ 0 , R 0 ) space. The expressions in the square brackets of equation (5.7) are the corresponding three components of the flux. For an internal point (i, j, l) of the (u 0 , θ 0 , R 0 ) grid, we consider a trapezoidal cell with faces at ± ± ± i j l , , 2 half-mesh points where the advection and diffusion coefficients F-D from equation (5.7) must be defined. For the cells at the edge of each grid, we impose the peripheral boundary conditions. They are same as in the ZOW version [8,32], except the fluxes in u 0 and θ 0 direction now include radial terms, and also there is an additional radial flux. As a rule, the fluxes through the outer face of the edge cell are assumed to be zero. For example, for the last point in u 0 grid, j = j max , the coefficients F 1 , D 11 , D 12 and D 13 at j = j max + 1/2 point are set to zero, which implies a zero flux leaving or entering the upper boundary in u 0 . An exception from this rule is F 1 coefficient in case when it includes a dc toroidal electric field in addition to the collisional friction, and the effect of particle acceleration is larger than the collisional drag-in this case a free streaming is allowed off the velocity grid [32]. For the θ 0 = 0 and θ 0 = π edge points, an additional condition is imposed in form of is added. For the borders in the radial grid, we impose a zero radial flux condition at the outboard edge. At the first radial grid point, which is usually selected to be close to the magnetic axis, we do not add the radial transport terms F 3 , D 31 , D 32 and D 33 . This can be justified by assuming / ∂ ∂ = f R 0 0 0 at the plasma center, and also assuming that the radial derivatives of the F 3 , D 31 , D 32 and D 33 terms are also close to zero in the plasma core. In future work, we plan to extend the radial grid to the inner edge of the plasma. This is dictated by necessity to include the counter-passing orbits that reside at R 0 < R axis ; those are the orbits close to O-type stagnation orbits [24]. With such extension of the radial grid, the boundary will be at the plasma edge only. Here, we only gave a brief summary on the peripheral boundary conditions, as the approach to their formulation is not so different from the ZOW case.
The main complication in the FOW formulation is rather the Internal Boundary Conditions (IBC) which must be applied to the regions in (u 0 , θ 0 , R 0 ) space that are connected by the pinch orbits [24]. A set of orbits converging to the pinch orbit is shown in figure 2. In the figure, the barely trapped 'kidney'-shaped orbit with legs at R 0 = 145 cm and R 0 ≈ 68 cm can scatter by collisions into a pinch orbit, and then further into a counter-passing orbit with legs at R 0 ≈ 112 cm and R 0 ≈ 71 cm. In this example the IBC connects fluxes of particles at the (R 0a = 145 cm, θ 0a ≈ 45°) point with fluxes at the (R 0b ≈ 112 cm, θ 0b ≈ 129°) point. The IBCs are form ulated in such a way that the total flux of particles across the two boundaries is conserved, i.e. the flux of particles leaving the trapped region must equal the flux entering the passing region. Also, the symmetry of fluxes at the trapped side of each boundary is used-these fluxes are equal in magnitude but opposite in θ, as they describe the same trapped particle being scattered towards (or away from) each boundary at equal rate. An important simplification in the ZOW case is that the trapped-passing borders are straight lines in (u 0 , θ 0 ) space, and that each IBC connects two boundaries at the same given radial coordinate. In the FOW case, the two legs of the largest banana orbit cross the midplane at different radii; hence, the IBCs should connect the fluxes at different radial grid points.
Before discussing the IBC in more detail, it is useful to describe the other important boundaries in (u 0 , θ 0 ) space for each given R 0 , and their correspondence to the boundaries in a more commonly used COM space (E, μ, p φ ) of energy, magnetic moment and toroidal angular momentum [33][34][35][36].
Let us start with the (p φ , μ) E map and specify the boundaries in it. The subscript E signifies that this map is a slice of (E, μ, p φ ) space at a given energy level. Based on equations (3.1), we can eliminate cosθ 0 , and express μ as a function of p φ , In this form, p φ and μ are the variables, and R 0 (and u 0 ) can be considered as parameters. Parameters R, Ψ pol and B-field are subscripted by '0' as a reminder that they are evaluated at the midplane (although in general, they could be evaluated at any other (R, Z) point). In the (p φ , μ) E space, this equation defines an inverted parabola; the position of its vertex, which corresponds to a cosθ 0 = 0 point, depends only on R 0 . The parabola for orbits started at R 0 = 131 cm is plotted with the dashed curve in figure 3. The left arm of the parabola corresponds to counter-going orbits (including counter-passing and trapped orbits started with cosθ 0 < 0 at R 0 = 131 cm); the lowest point at this arm corresponds to cosθ 0 = −1. Similarly, the right arm of the parabola corresponds to co-going orbits (co-passing and trapped orbits started with cosθ 0 > 0 at R 0 = 131 cm); the lowest point at this arm corresponds to cosθ 0 = +1. In configuration space, the orbits corresponding to the dashed line are shown in figure 4. Also, in figure 3, two other important parabolas are plotted: a smaller one (centered at abscissa value −0.93)-for orbits started at the smallest plasma radius, R 0 = R min , and the large one-for orbits started at the largest plasma radius, R 0 = R max . The major radii points R min and R max mark the equatorial positions of the vacuum chamber wall. In the smaller parabola, the right arm represents co-going orbits that are passing out of plasma, and therefore are lost, while in the larger parabola, the left-arm/counter-going orbits are lost.
The stagnation point-like orbits are marked with o1 and o2 lines. We follow the designations used in [24,37] and in particular [38]. Line o2 corresponds to the point-like orbits that exist at R 0 > R axis region; these orbits become ∥ u = 0 orbits in the ZOW limit. Line o1 corresponds to the point-like orbits that are located in part of region at R 0 < R axis . The x1 line corresponds to the pinch orbits. In the (R, Z) plane, each pinch orbit can be viewed as being composed of two orbits passing through the one and same X-point on the midplane: a smaller counter-passing orbit, and a larger orbit that looks like a copassing but actually a 'kidney'-shaped trapped orbit (the sign of ∥ u alters not far from the midplane, as shown in figure 2). These two orbits have the same values of (E, μ, p φ ); therefore, they are represented by the same line in figure 3 marked x1  (x2, x3), meaning that the 'complete' pinch orbit x1 is the combination of the two orbits x2 and x3 described above. However, one of the orbits (x3) can be lost to the chamber wall, and then the pinch orbit becomes 'incomplete', having only the x2 part. Therefore the length of x2 and x3 lines in figure 3 is not the same. In fact, the only portion of the line where orbits x3 are confined is between points A and S in the figure.
We also add one more line that corresponds to orbits that have ∥ u 0 = 0 at the equatorial plane. By setting cosθ 0 to zero, the equations for such line become In figure 3, this line is designated as 'tp' (for 'trappedpassing'); it is plotted by varying the value of R 0 in the above equations. It consists of two arms-the lower arm corresponds to the range R 0 < R axis , and the upper arm to R 0 > R axis . It is seen that the upper arm closely approaches the o2 stagnation   line. Therefore, the stagnation orbits in this region have the values of pitch angle close to π/2 ( ∥ u 0 ≅ 0); the distinction of their pitch angle from π/2 grows with energy. Also, the lower arm of the 'tp' line follows closely parallel to the x1 pinchorbit line. The separation between these two lines reflects the fact that the point where ∥ u becomes zero along the 'kidney' part of the pinch orbit is not exactly at the equatorial plane but rather at some finite distance off the midplane.
The points where the dashed parabola crosses other curves are of particular importance. For example, point 'a' in the figure separates trapped from co-passing orbits. Points 'b' and 'd' are two other crossing points of the dashed line with the 'tp' line. The narrow region between 'b' and 'd' corresponds to a group of tiny co-passing orbits (in R, Z space) that are nested on both sides from the point-shaped stagnation orbit 'c'. Orbit 'd' starts with ∥ u 0 = 0 at R 0 = 131 cm, and encloses the group of tiny co-passing orbits just to the right of orbit 'c'. Orbit 'e' is the last confined trapped orbit. All orbits between 'd' and 'e' are trapped orbits with the first leg being at R 0 = 131 cm, and the other leg-somewhere at R 0 > 131 cm. For the orbit 'e', the other leg is at R 0 = R max . Note that, on the dashed curve, point 'e' is the trapped orbit started as a counter-going particle, while on the parabola for R 0 = R max orbits, it is the same orbit but it is started as a cogoing particle. Point 'f' is the intersection of the dashed line with the pinch-stagnation line x1. Points below 'f' on the dashed line are the counter-passing orbits (in figure 4, these orbits are labeled as 'ff'). Points between 'e' and 'f' are lost banana orbits, with starting leg at R 0 = 131 cm, and the other leg being outside of plasma (R 0 > R max ). Point 'f' itself corresponds to the pinch orbit and two other associated orbits, as discussed above: the pinch orbit launched from R 0 = 131 cm with the other leg being outside plasma; the counter-passing orbit with starting leg at R 0 = 131 cm and the other leg at R 0 = 34 cm (the location of the X-pinch point); the 'kidney' orbit that has one leg at R 0 = 34 cm and the other-outside of plasma. The intersection of the dashed line with 'tp' line just below the point 'f' is not important in present context; it marks an orbit that has ∥ u 0 = 0 at the midplane at R 0 ≅ 34 cm; however, the other leg of this orbit is not at R 0 = 131 cm, but rather somewhere outside of plasma; it is close in size to the 'kidney' orbit mentioned above. The very same point also represents a counter-passing orbit with legs at R 0 = 131 cm and R 0 = 35 cm (no ∥ u = 0 point along orbit), i.e. one of 'ff' orbits in figure 4.
Most of the complication in describing of the (p φ , μ) E space comes from the fact that a single point may represent two orbits, which might have no common points in configuration space. Also, the (p φ , μ) E map contains no information on the orbit leg positions at the midplane.
Our computational space ( ) θ u , R 0 0 0 is shown in figure 5, for the selected radial point R 0 = 131 cm at the midplane. Each node of the (u 0 , θ 0 ) 2D grid is depicted with a point (these may fuse visually into radial lines), except in the loss region. Solid lines are calculated with the help of (p φ , μ) E maps by numerically finding the intersection points between the parabola representing orbits at R 0 = 131 cm with all relevant boundary lines such as tp, o1, x1, and the two limiting parabolas (orbits launched from R min and R max ). This procedure is repeated for a set of (p φ , μ) E maps with different energy levels, i.e. different u 0 . The values are saved and then plotted in ( ) θ u , R 0 0 0 map as a parametric function (u bndy (u 0 ), θ bndry (u 0 )), for each boundary line. The only exclusion from this procedure is the stagnation line o2; it is found analytically. In comparing to the (p φ , μ) E map, figure 3, we should follow the dashed line '50 keV' in the figure. The points labeled a through f correspond to the same orbits in both maps. Orbits between points a and b are the trapped orbits with the other leg at R 02 < 131 cm, and orbits between points d and e are the trapped orbits with the other leg at R 02 > 131 cm. Orbits between b and d are the tiny passing particle orbits, including the stagnation orbit c. Orbits between e and f are the lost orbits (banana orbits with the other leg outside of plasma). There are no 'complete' pinch orbits at energy level 50 keV, in the sense of line 'S-A' in figure 3. Orbit f can be viewed as the half of pinch orbit-the counter-passing branch that is confined, while the 'kidney'-shaped branch is lost. The 'complete' pinch orbits appear at lower energies in figure 5, where dark triangular regions are present. The size of the right-hand-side triangle, in the co-passing cone, is determined by the length of x3 line in (p φ , μ) E space, and this length depends on u 0 . Note that in the (p φ , μ) E space, although all three borders x1, x2, x3 are shown with a single line, the length of x3 is shorter than that of the other two; line x3 is only present over the dark triangle, between points A and S.
The analysis of ( ) θ u , R 0 0 0 map is important for the FOW model. It gives an insight on how the IBC should be defined. In figure 5, the trapped-passing boundaries (they start at the origin of coordinates and pass through points a, b, d, and f ) are seen to be curved and limited in space. However, as discussed above, we are not generally interested in the boundaries between trapped and passing particles, but only in those associated with pinch orbits. For example, point a in the figure corresponds to a smooth transition from copassing elliptical orbits to D-shaped orbits that have ∥ u 0 = 0 point. These D-shaped orbits shrink in size as the pitch angle approaches point b in the figure, where the tiny D-shaped orbits smoothly transform into tiny elliptical orbits of passing particles. There is no pinch orbit involved at points a, b or d, so there is no need in IBC. In figure 3, the region where 'complete' pinch orbits exist is shown with the short line 'S-A', but, if the energy dimension is added, it becomes a surface that expands towards low energies and vanishes at high energies. The projection of this surface onto our ( ) θ u , R 0 0 0 space in figure 5 is shown with the line that starts at the origin and ends at point B; we designate this boundary as '0-B'. It appears to be an initial part of the '0-a' line, but, to be exact, the two lines do not coincide; they merge together only at zero-energy limit. Any selected point within '0-B' boundary corresponds to the co-going, 'kidney'-shaped branch of a pinch orbit. By collisional pitch angle scattering a barely passing orbit on the co-current side can be transformed into a 'kidney' orbit and then into the 'kidney' branch of the pinch orbit. At the X-point on the midplane, the particle may jump onto the counter-going branch of the pinch orbit, and then further transform into the counter-passing orbit. Such a counter-passing orbit has its starting leg at different radius R 02 , smaller than the given R 0 = 131 cm. Therefore, the IBC should connect a point at the '0-B' boundary with the counter-passing side boundary, such as '0-C' in figure 5, but at a different radial coordinate. The other-side radial coordinate depends on the energy of a particle, implying that the boundary '0-B' at each given radius is linked to many other radial points, except in the low-energy ZOW limit when they are at the same radial coordinate.
For the formulation of IBC, for a given energy (u 0 value) we consider the two pitch-angle grid nodes that are nearest to the '0-B' physical boundary on two sides of it. Starting from one of these nodes at one side of the '0-B' line, the value of the distribution function is extended into the 'virtual point' on the other side of the '0-B' line. The same procedure is carried out for the opposite side of the '0-B' line. This procedure allows formulating the pitch-angle fluxes at each side of the boundary. As shown above, the boundary '0-B' is connected to the boundary '0-C' at another radius, where the same consideration of fluxes across boundary is made. Thus, for each given energy, we have four unknown values of virtual distribution function. The four equations that are needed to find these values consist of the two continuity conditions for distribution function across each boundary (one at '0-B' line and one at '0-C' line), the symmetry of θ 0 -flux on the trapped-particles side of '0-B' line with that of '0-C' line, and the conservation of total θ 0 -flux through the '0-B' and '0-C' boundaries. More details are given in the appendix.

Application of the FOW version for ion heating
For illustration and verification of the full-FOW capabilities, we consider a typical NSTX scenario of plasma heating by means of NBI and high harmonic fast waves (HHFW). Fast ions produced by NBI can effectively interact with HHFW through wave-particle resonant finite-Larmor-radius effects, that is, through the The general results are presented in figure 6, showing average-energy profiles, which are obtained by integration of the distribution function, i.e. the computed solution of FPE, over velocity space at each R 0 midplane coordinate. Although not a flux-surface averaged (FSA) characteristic, such diagnostic quantity is a good reflection of a particular distribution at a given R 0 coordinate; the surface averaging would mix together the midplane distribution functions from many R 0 radial points. As seen in the figure, after the NBI+HHFW heating is applied, the energy is increased in the plasma core from 1.5 keV (which is energy ≅ 3/2 T i at t = 0, before heating) to about 3 keV; the increase is from non-thermal tail ions. In computations without the radial transport terms F 3 , D 13 , D 31 , D 23 , D 32 and D 33 ('no R-transp' in the figure), the energy profile exhibits strong variations of magnitude at different radial points. Similar energy profile variations are also observed in ZOW runs and are a consequence of localized ion heating by several resonances present in the plasma cross-section. However, when the R-transport terms in the col lision and QL operators are added, the isolated peaks in energy profile are spatially diffused and the profile becomes very smooth (bold line in figure 6). In general, the effect of the R-transport terms is a reduction of energy content due to additional transport caused by D 13 , D 31 , D 23 , D 32 and D 33 terms in equation (5.7); without these terms, the particles are lost only through pitch-angle diffusion into the loss cone.
More insight into the effects of radial transport can be gained from inspection of the distribution function plots, such as shown in figures 7 and 8. With the radial terms being For the radial coordinate ρ = 0.5 (R = 138 cm) the effect from the R-transport terms is a reduction of the high-energy tail at specific pitch angles. However, in the plasma core, as seen in figures 8(a) and (b), the radial transport seems to enhance the tail; apparently, it is populated by fast ions from neighboring radial points. It is also instructive to look at the plots of the radial diffusion coefficients. In figure 9, the contour levels of the radial diffusion term D 33 from equation (5.7) are plotted. The collision diffusion coefficient, shown in part (a), is largest at near-thermal energies, where it reaches 10 4 cm 2 s −1 . At about 3V th,i it drops to 2 × 10 3 cm 2 s −1 for the trapped particles. For the passing particles, it is typically below 100 cm 2 s −1 . In contrast, the RF QL radial diffusion coefficient, shown in part (b), exhibits several peaks at energies around 50 keV with values up to 4 × 10 4 cm 2 s −1 and vanishes at thermal energies. The isolated peaks are related to the presence of several IC resonances in plasma. In general, at energies above 20 keV the radial diffusion by RF heating of ions seems to prevail. It should be noted that the shape and the magnitude of the radial transport terms D 13 , D 23 and D 33 in the QL operator strongly depends on the particular wave type and on the injected RF power, however, in most types of RF ion heating one would expect elevated values in the high-energy tail. Another important characteristic obtained in the FOW calculations is the plasma current. To facilitate the comparison with available models, we calculated the FSA profile of plasma current density ∥ j FSA . This is done by reconstructing the local distribution function at a set of poloidal points for each magnetic surface, from the computed solution of the FPE at the midplane. It is clear that for a given magnetic surface, the orbits from different radial grid points at the midplane can contribute to the value of ∥ j FSA , in contrast to the ZOW limit in which each surface confines orbits that have one and the same radial coordinate at the midplane. The profile of ion current density ∥ j FSA in the case of NBI+HHFW heating is shown in figure 10(a). Most of the current at ρ < 0.2 is produced by NBI, while at ρ > 0.6 it is due to bootstrap current. In calculations presented in figure 10(b), the NBI and RF heating were turned off, therefore the calculated ∥ j FSA is only the bootstrap current. This is compared to the profile of current based on the analytical fit model for the bootstrap current [39,40], considering in our case the ion component only. For the analytic model, the profiles of FSA density and energy must be provided; those are obtained from the FOW run (the initially nearly parabolic profiles are slightly changed by the radial transport, especially at the plasma edge). Remarkably, the two current profiles are quite similar. In the FOW calculations, all radial transport terms D 13 , D 31 , D 23 , D 32 and D 33 were enabled. Without them, the profile of ∥ j FSA from the distribution function is significantly lower, up to only 40-50% at ρ = 0.4-0.8. A deviation from the analytical model profile can be attributed to wide ion orbits in the NSTX-level equilibrium field. It is noteworthy that the current computed with CQL3D is lower than that from the model at ρ < 0.8. This is in agreement with results obtained with the NEO code [41], particularly at low collision rate. A more detailed comparison between CQL3D-FOW and NEO using the same plasma profiles and equilibrium data is in consideration for a future work. Further, figure 10(c) shows the results for the ion bootstrap current in a modified equilibrium B: both the poloidal and toroidal magnetic field components are multiplied by a factor of four. The larger B-field makes the orbits four times narrower, and now the match between the analytical model (assumes narrow orbits) and ∥ j FSA derived from solution of FPE is almost perfect. The peak at the edge appears to be caused by the losses of counter-current going ions, which are not accounted by the analytical model.
Interestingly, the computed profile of bootstrap current is not very sensitive to imposing of internal boundary conditions, the implementation of which was described in the previous section. Most of the current is generated by merely enabling the D 13 , D 31 , D 23 , D 32 and D 33 terms, while the IBC are disabled. This result appears to be consistent with the simplified model for bootstrap current as a correction to ZOW-type bounce-averaged FPE [42,43]. Effectively, in terms of our FOW equations, the simplified model only considers one radial term, namely D 23 = D 32 given by our equation (5.5e), and only the last term in it, which legs of trapped orbit. Therefore, the change of the [(∂θ 0 /∂θ) (∂R 0 /∂θ)] factor from one leg of the trapped orbit to another is of order the banana width, and we recover the simplified bootstrap-current generating term in [42], which is proportional to the width of the trapped orbit at trapped-passing boundary and bounce-average pitch-pitch diffusion coefficient. Thus, it appears that merely enabling the radial term D 23 is sufficient to generate the bootstrap current, at least in the limit of narrow orbits. The weak dependence of the calculated current on the four extra equations corresponding to the internal boundary conditions (see the end of section 10) suggests that the presence of the D 13 , D 31 , D 23 , D 32 and D 33 terms largely enforces the conservation of flux and other conditions that were used in the formulation of IBC.

Discussion and conclusions
The finite-difference bounce-averaged Fokker-Planck code CQL3D has been upgraded to include the finite-orbit-width (FOW) effects. This is achieved by transforming the FP equation written in canonical action variables to another set of COM coordinates. The distinctive feature of our approach from that considered by other authors is the selection of the major radius at the equatorial plane as one of the COM coordinates. Derivations are provided for the transformation coefficients from a local point along orbit to the midplane COM coordinates, and also for the Jacobian of transformation from the canonical action variables to our coordinates. It is shown that they converge to a familiar form used in the ZOW limit of bounce-average FPE. The details are given for the computational implementation of the FOW forms of the col lision operator, quasilinear RF operator and the NBI particle source. Also we discuss how to start the initial distribution function and to rescale the distribution function at each time step, in order to compensate for the particle losses through the radial and pitch-angle diffusion. The outlined rescaling procedure is consistent with the requirement of keeping the value of distribution function constant along the orbit, but as a consequence, the profile of plasma density must be allowed to evolve; this is different from rescaling the distribution function in the ZOW version when the profile could be kept unchanged. A significant portion of this paper is dedicated to the identification of internal boundaries associated with the pinch orbits, which connect the fluxes of nearly trapped co-and counter-passing particles at different radial coordinates. The analysis shows how the different types of particle orbits and boundaries in our (u 0 , θ 0 ) space at a given radial coordinate R 0 are related to boundaries in a more commonly used ( p φ , μ, E) space. Effectively, this analysis helps to understand the forms of the loss cone, trapped particle regions and other features of the local distribution at a given radial coordinate. The implementation of the IBCs turned out to be the most difficult technical problem in development of the FOW version. Improvements are ongoing. The main challenges are related to the fact that the physical boundaries in (u 0 , θ 0 ) space do not match the grid lines corresponding to constant pitch-angle, and also that the trapped orbit width sizes generally are not commensurate with the distance between the radial grid points. However, there are several properties of boundaries that lessen the impact of complications with the IBCs. First, we have shown that the number of links between the relevant (u 0 , θ 0 ) boundaries at different R 0 coordinates is quickly reduced with an increase of ion energy. For NSTX conditions, there are no links and therefore no IBC needed at energies above 57 keV. Second, we observed that the results of simulations are not very sensitive to the presence or absence of IBC. As a consequence, a numerical implementation of IBC may allow a certain leeway.
In contrast, the radial transport terms D 13 , D 31 , D 23 , D 32 and D 33 that appear in the FOW formulation are found to be crucial in calculation of the bootstrap current and the radial diffusion. We emphasize that these radial terms are evaluated using realistic drift orbits rather than a first-order expansion of orbits around flux surfaces. In general calculations, all four possible coefficients related to the radial coordinate must be included, which are F 3 (the advection/pinch term), D 13 = D 31 , D 23 = D 32 and D 33 . Similar radial terms (except F 3 ) appear in the bounce-average QL operator, but their structure in (u 0 , θ 0 ) velocity space is very different from that in the collision operator.
The fully-neoclassical FOW version of CQL3D has been applied to an NSTX ion heating scenario with NBI and HHFW sources. The fast ions produced by NBI are slowing down by collisions, but on the other hand they are diffused to higher energies by high harmonic fast waves. The simulation results demonstrate the code capability to describe the physics of transport phenomena in plasma with auxiliary heating, in par ticular, the enhancement of the radial transport of ions by RF heating and the occurrence of the bootstrap current. Solution of the full 3D coupled difference equations, giving a large sparse set of order a million equations, is accomplished iteratively using the SPARSKIT package [44]. Because of the bounce-averaged type of the FPE, the results are obtained in a relatively short computational time compared to higher dimensional solutions. A typical FOW run time is 1 h using 140 MPI cores. Due to a time-implicit differencing, calculations with a large time step (tested up to dt = 0.5 s) remain stable.

Appendix. Implementation of IBC
Schematically the internal boundaries in (u 0 , θ 0 ) space are shown in figure A1. The dashed lines represent the physical boundaries, such as lines '0-B' and '0-C' in figure 5. The solid lines designate the θ 0 -grid nodes. We designate the physical boundary with θ 0 < π/2 as the 'lower trapped-passing boundary', and the nearest θ 0 -grid points as 'ipl' for the last passing particle and 'itl' for the first trapped particle. Strictly speaking, the orbit corresponding to pitch angle θ 0 (ipl) may not be a passing particle orbit but rather a 'kidney' shaped trapped particle orbit, as discussed in section 10, but this is not important for the formulation of IBC. Similarly, we designate the physical boundary with θ 0 > π/2 as the 'upper trapped-passing boundary', and the nearest grid points as 'itu' for the last trapped particle and 'ipu' for the first passing particle. Although the upper and lower boundaries are pictured together in one plot, in general those boundaries that are linked by IBC are at different radial coordinates, and the radial distance between such boundary points becomes larger with particle energy. This is illustrated in figure A2, where the IBC 'links' are shown for 30 keV deuterium ions in NSTX equilibrium conditions. The vertical dashed lines mark the position of radial grid points. The pitch angle grid could be shown by a set of horizontal lines, but these are omitted for clarity. It is seen that for 30 keV particles, there are only several links present between the upper and the lower trapped-passing boundaries. Each link connects the two legs of the largest trapped particle orbit. We designate the radial grid index at the lower boundary as 'l_a', and the radial grid index at the upper boundary as 'l_b'.
Returning to the diagram in figure A1, we consider the pitch-angle fluxes H at each boundary. This is the expression under the ∂/∂θ 0 operator in equation (5.7) that incorporates partial derivatives of the distribution function, such as ∂f 0 /∂θ 0 , and therefore requires the knowledge of the distribution function at 27 adjacent grid points (three for each dimension) for the flux value of H(R 0 , u 0 , θ 0 ) at a given grid point. In particular, for the 'ipl' point, we need to use the adjacent ipl − 1 and ipl + 1 points. The distribution function at ipl + 1 point is not the value of f 0 at the 'itl' point, but rather it is a virtual ('ghost') value to be found from the conservation of H fluxes. We use a star symbol to distinguish such virtual points, as in -0. H itu+0. 5 and H ipu-0.5 Figure A1. Schematic diagram for internal boundaries. The lower (right-hand side) and upper (left-hand side) portions of pitch-angle nodes are generally at different radial coordinates, depending on particle energy. At the low-energy limit, these boundaries are at the same radial coordinate.
vector of solution to be found, while the matrix of coefficients contains four extra rows corresponding to the IBC above; these four extra unknowns and four extra rows are present in each (R 0 , u 0 ) grid section.