The chirally improved quark propagator and restoration of chiral symmetry

The chirally improved (CI) quark propagator in Landau gauge is calculated in two flavor lattice Quantum Chromodynamics. Its wave-function renormalization function $Z(p^2)$ and mass function $M(p^2)$ are studied. To minimize lattice artifacts, tree-level improvement of the propagator and tree-level correction of the lattice dressing functions is applied. Subsequently the CI quark propagator under Dirac operator low-mode removal is investigated. The dynamically generated mass in the infrared domain of the mass function is found to dissolve continuously as a function of the reduction level and strong suppression of $Z(p^2)$ for small momenta is observed.


I. INTRODUCTION
The quark propagator is one of the fundamental objects in Quantum Chromodynamics (QCD). The mass function of the quark propagator reveals the value of the running quark mass in the deep ultraviolet (UV) where interactions are weak due to the asymptotic freedom of QCD. In the infrared (IR), the dynamical generation of mass which is associated with the spontaneous breaking of chiral symmetry is exhibited by the mass function. The IR is not accessible with perturbative methods; lattice QCD provides a nonperturbative ab initio approach to QCD and thus is a well adapted tool to study the IR physics of the strong nuclear force.
The quark propagator is a gauge dependent object and thus the gauge has to be fixed in order to study its properties; we adopt the manifestly Lorentz covariant Landau gauge for the present work. The Landau gauge quark propagator has been studied on the lattice with various fermionic actions. Some initial investigations using (improved) Wilson fermions have been reported in Ref. [1,2]. A series of studies using standard Kogut-Susskind [3] and Asqtad [4] quarks found that staggered quarks are well suited to explore the properties of the quark propagator on the lattice [5][6][7][8][9][10].
Lattice Dirac operators that fulfill the Ginsparg-Wilson (GW) equation allow for lattice fermions that have an exact chiral symmetry at nonzero lattice spacing. The overlap operator [11,12] provides a solution to the GW equation. The quark propagator from the overlap action has been examined in [13][14][15][16][17][18][19]. The drawback of overlap fermions is their very high computational cost which renders them impractical for full dynamical simulations.
In this letter we analyze the quark propagator from the so-called chirally improved (CI) Dirac operator [20,21] which fulfills the GW equation not exactly, but only approximately. Nevertheless, the gain in simulation time of * mario.schroeck@uni-graz.at roughly one order of magnitude, in comparison to overlap fermions, allows for an investigation of the propagator on full dynamical configurations [22,23]. The better chiral properties of the CI operator as opposed to Wilson's fermion action make it well suited to explore effects of spontaneous chiral symmetry breaking on the lattice.
Banks and Casher formulated a relation of the density of the smallest nonzero eigenvalues of the Dirac operator to the chiral condensate [24]. In [25] we have studied the effects of removing the lowest eigenmodes of the Hermitian CI Dirac operator γ 5 D CI on the meson spectrum and found signals for the restoration of chiral symmetry (the masses of the ρ and a 1 became approximately degenerate, cf. [26]) whereas confining properties persisted. The authors of [27] expand the Wilson loop in terms of Dirac operator eigenmodes and detect that removing the lowest modes does not influence the static quark potential qualitatively.
A portion of this study aims at answering the question, how change the quark wave-function renormalization function Z(p 2 ) and the quark mass function M (p 2 ) under Dirac low-mode removal? It is expected that the mass function flattens out in the IR once chiral symmetry is restored. Yet another question of interest is how the Dirac eigenmode truncation level at which chiral symmetry was found to be approximately restored [25], compares to the loss of dynamical mass generation in M (p 2 ) as a function of the truncation level.
The remainder of this work is as follows: in Sec. II we briefly summarize the defining equations of lattice Landau gauge fixing. In Sec. III we first remind the reader of the main steps in the construction of the D CI operator, followed by a discussion of Z(p 2 ) and M (p 2 ) from the D CI at tree-level and in the full interacting case. In order to reduce the dominant lattice artifacts we apply tree-level improvement and test a multiplicative and an hybrid scheme of tree-level correction. In Sec. IV we investigate Z(p 2 ) and M (p 2 ) from the D CI under Dirac low-mode removal and in Sec. V we summarize and conclude.

II. GAUGE FIXING
The continuum Landau gauge condition, can be realized on the lattice by requiring the maximization of the gauge functional with respect to gauge transformations g(x) ∈ SU (3) where The sum in Eq. (2) runs over the four Dirac components µ and all lattice sites x. Once such a gauge transformation is found, the discrete Landau gauge condition A measure for the achieved Landau gauge "quality" is given by here the trace goes over the color indices, N c is the number of colors and V is the number of lattice points. In the later discussion of the CI quark propagator we will choose θ < 10 −10 as the stopping criterion for the gauge fixing algorithm. We accelerate the costly task of lattice gauge fixing by utilization of the graphics processing unit (GPU) with NVIDIA R 's CUDA TM (Compute Unified Device Architecture) programming environment, as pointed out in the Appendix A.
For a general discussion of lattice gauge fixing and its problems we refer to [28].

III. THE CI QUARK PROPAGATOR
In the present section we analyze the lattice dressing functions from the CI quark propagator after having repeated the main steps in the construction of the CI Dirac operator.

A. The CI Dirac operator
The so-called chirally improved Dirac operator D CI was introduced in [20] and first analyzed in [21] where also its spectral properties were studied. An initial quenched hadron spectroscopy using the D CI was examined in [29] before dynamical configurations including two light degenerate CI quarks have been generated in order to calculate the hadron spectrum in a series of papers [22,23,30,31]. Renormalization factors of quark bilinears of the D CI were studied in [32,33].
The CI Dirac operator is an approximate solution to the GW equation. It is constructed by expanding the most general Dirac operator in a basis of simple operators, where the sum runs over all elements Γ i of the Clifford algebra. The coefficients c (i) xy (U ) consist of path ordered products of the link variables U connecting lattice sites x and y. Inserting this expansion into the GW equation then turns into a system of coupled quadratic equations for the expansion coefficients of the D CI . That expansion provides for a natural cutoff which turns the quadratic equations into a simple finite system.
The ansatz is constructed such that all symmetries of the fermionic action are maintained and moreover γ 5hermiticity is imposed. The so-called clover term [34] is included for O(a) improvement where the c sw parameter is set to its tree-level value (one).

B. Configurations
For the analysis of the CI quark propagator we use 125 gauge field configurations [22, 23] of lattice size 16 3 × 32 and lattice spacing a = 0.144(1) fm. The configurations include two light degenerate dynamical CI quark flavors with the mass parameter set to m 0 = −0.077 and a resulting bare AWI-mass of m = 15.3(3) MeV. For the simulation of the gauge fields as well as for our valence quarks, paths up to length four are used in the ansatz Eq. (7) and the corresponding coefficients can be found in [22].

C. Nonperturbative quark propagator
The continuum quark propagator at tree-level reads where m is the bare quark mass. In a manifestly covariant gauge like Landau gauge, the interacting renormalized quark propagator S(µ; p) can be decomposed into Dirac scalar and vector parts S(µ; p) = ip /A(µ; p 2 ) + B(µ; p 2 ) −1 (9) or equivalently as In the last equation we introduced the wave-function renormalization function Z(µ; p 2 ) = 1/A(µ; p 2 ) and the mass function M (p 2 ) = B(µ; p 2 )/A(µ; p 2 ). On the lattice, the regularized quark propagator is calculated and consequently it depends on the cutoff a. The regularized quark propagator S L (p; a) can then be renormalized at the renormalization point µ with the momentum independent quark wave-function renormalization constant Z 2 (µ; a), Whereas the mass function M (p 2 ) is independent of the renormalization point µ (and equivalently of the cutoff scale a), the wave-function renormalization function Z(µ; p 2 ) differs at different scales but can be related from different scales by multiplication with a constant, i.e., by the ratio of the two different quark renormalization constants.
Below we extract the nonperturbative functions M (p 2 ) and Z(p 2 ) ≡ Z 2 (µ; a)Z(µ; p 2 ) directly from a lattice calculation as it was discussed in great detail in, e.g., Ref. [35]. We perform a cylinder-cut [1] on all our data and average over the discrete rotational and parity symmetries of S L (p; a) to increase the statistics.

D. The lattice quark propagator at tree-level
For the sake of easier notation we will suppress the a dependence of the lattice quark propagator and write Z L (p) and M L (p) as functions of p rather than p 2 in the following discussion.
The lattice quark propagator at tree-level S (0) L (p) differs from the continuum case, Eq. (8), due to discretization artifacts, The dressing function A (0) L (p) is by construction equal to one at tree-level (at least without tree-level improvement) and thus the function B   The result is consistent with the analytically derived expression for the D CI momenta given in Appendix B.
The tree-level mass function aM

E. The interacting propagator
We expect the interacting propagator to have a similar form to the continuum case Eq. (10), hence we write The functions aM L (p) and Z L (p) extracted from the lattice Monte Carlo simulation are shown in Fig. 2 and Fig. 3 (blue triangles), respectively. The shape of aM L (p) is similar to aM (0) L (p) and also Z L (p) strongly deviates from the expected monotonically growing behavior, thus is clearly altered by discretization errors. To get a handle on the lattice artifacts, i.e., to retain the shapes of the wave-function renormalization function and the mass function familiar from earlier lattice works as well as from Dyson-Schwinger equation studies [36], we discuss improvement and tree-level correction in the forthcoming subsections.

F. Improvement
The Symanzik improvement program [37] offers a systematic way to reduce the errors of the fermionic action to O(a 2 ). All terms that have the correct dimensionality and the symmetries of the QCD fermionic Lagrangian must be included into the action: The terms L 3 and L 5 can be accounted for by a redefinition of the bare parameters m and g. L 2 and L 4 are only needed for off-shell quantities like hadronic matrix elements or the quark propagator [38]. Thus for onshell quantities it is sufficient to take the clover term [39] (which corresponds to L 1 ) into account. Note that whereas exact GW fermions are automatically O(a) improved, the CI operator fulfills the GW equation only approximately and thus the clover term is included in the CI action.
Since the quark propagator is an off-shell quantity we would like to include the terms L 2 and L 4 as well. In [40] it is shown that at tree-level L 2 and L 4 can be eliminated by a transformation of the fermion fields according to Improvement beyond tree-level requires tuning of the coefficients of the fermion field transformations [41] which we do not attempt. Hence we adopt the above fermion field transformations under which the quark propagator turns into [1,2] S I (x, y) ≡ (1 + am)S(x, y; U ) − a 2 δ(x, y) where the index I denotes improvement. In Eq. (22), S(x, y; U ) is obtained by inverting the D CI operator on each configuration and the brackets denote Monte Carlo integration over the gauge fields U . All results that follow have been tree-level improved according to the above prescription.

G. Tree-level correction
In order to blank out the lattice artifacts which are already present at tree-level, we now focus on the derivation of the interacting propagator from its tree-level form.
For the renormalization function Z L (p) we adopt a multiplicative tree-level correction As can be seen in Fig. 3 (red circles), this procedure together with the tree-level improvement from the previous subsection flattens Z L (p), hence reduces the dominant lattice artifacts. However, the fact that the function is still not monotonically growing indicates that the improvement coefficients are not sufficiently adjusted to remove all O(a) errors when simply picking their tree-level values.
In order to apply a multiplicative tree-level correction to the mass function of the form we have to carry out an additive mass renormalization of the tree-level function B (0) L (p) in order to avoid divergences, i.e., where am add is chosen such that B As a result, the multiplicative tree-level correction for the mass function is Alternatively, we may adopt an hybrid tree-level correction which is based on the ideas developed in Ref. [2]: if p < p ′ , then perform an additive tree-level correction and for momenta larger than p ′ apply a multiplicative tree-level correction The momentum parameter p ′ should be adjusted thereby such that M L (p) is continuous and smooth at p = p ′ which we found to be the case for p ′ = 1.5 GeV. Both possibilities of tree-level correction for the mass function M L (p) are plotted in Fig. 4. We observe that the pure multiplicative correction (blue crosses) results in an infrared enhanced function, enhancement occurring from 1.25 GeV on downwards and appearing to be rather steep. The hybrid scheme (red circles), on the other hand, exhibits a wider range of IR mass generation (from 2.5 GeV on downwards), gives a higher IR mass and yields flattening of the mass function in the deep IR. The hybrid scheme allows for an earlier mass generation due to the fact that the multiplicative correction therein (for p ≥ p ′ ) does not require an additive mass renormalization since the zero-crossing of the tree-level function is handled by the additive tree-level correction (p < p ′ ).
When comparing these results to lattice quark propagator studies from a different fermionic action, for example to the (quenched) overlap quark propagator [13][14][15][16][17][18][19], we find better agreement for the hybrid scheme. It has to be stressed however that the parameter p ′ introduces a small arbitrariness to the procedure whereas the simpler pure multiplicative scheme provides a straightforward comparison of the interacting mass function to its tree-level counterpart while still yielding qualitatively the correct physics. Consequently, for the next section we adopt the simpler multiplicative scheme for the analysis of the effects of Dirac low-mode removal on the quark propagator mass function in order to avoid possible systematic errors related to the tuning of p ′ .

IV. RESTORATION OF CHIRAL SYMMETRY
The lowest Dirac eigenmodes are known to play a crucial role for dynamical chiral symmetry breaking as stated by the Banks-Casher relation [24]. The latter relates the chiral condensate to the density of the smallest nonzero Dirac eigenmodes. As a consequence, when removing the Dirac eigenmodes near the origin from the theory, the chiral condensate vanishes and chiral symmetry becomes "artificially restored" [25].
The aim of the current work is to analyze the effects of artificial chiral restoration on the dressing functions of the quark propagator. Consider the Hermitian Dirac operator D 5 ≡ γ 5 D which is normal and thus has real eigenvalues µ i . D can be written in terms of the spectral representation of D 5 , We split the quark propagator S = D −1 into a low-mode part (lm) and a reduced part (red), e.g., using the eigenvalues and eigenvectors of D 5 , Hence we can obtain the reduced part of the propagator by subtracting the low-mode part from the full propagator We calculate the quark wave-function renormalization function Z L (p) and the quark mass function M L (p) from the reduced propagators of Eq. (33) with varying reduction levels k = 2 − 512. We tree-level improve the modified propagators and apply the multiplicative tree-level correction scheme, cf. Sec. III. The dressing functions from reduced propagators are presented in Fig. 5 and Fig. 6.   Fig. 6, demonstrates a similar behavior: it gets suppressed in the IR when removing more and more eigenmodes until the dynamic generation of mass completely ceases at truncation stage k ≈ 128.
In Fig. 7 we compare the deep IR mass of the CI quark propagator from M L (p 2 min ), at the smallest available momentum p min = 0.1345 GeV, as a function of the reduction level to the mass splitting of the vector meson ρ and its chiral partner the axial vector current a 1 , taken from Ref. [25]. Note that the reduction level k can be translated to an energy scale which is given in the lower   The infrared mass ML(p 2 min ) of the reduced CI quark propagator as a function of the reduction level compared to the mass splitting between the ρ and the a1 from Ref. [25]. The upper abscissa shows the truncation level k and the lower abscissa gives the corresponding energy scale, the relation between the two is obtained by integrating the histograms of the D5 eigenvalues.
abscissa of the figure and was derived in [25] by integrating the histograms of the eigenvalues.
The mass splitting between the ρ and the a 1 rapidly drops down and reaches a plateau after subtracting about 16 eigenmodes; it does not go down to zero which can most likely be attributed to the small explicit chiral symmetry breaking by the nonvanishing quark mass. In contrast, the dynamically generated mass of the quark propagator, M L (p 2 min ), decreases slower and reaches its plateau only after subtracting more than 128 Dirac eigenmodes.

V. CONCLUSIONS
The wave-function renormalization function Z(p 2 ) and the mass function M (p 2 ) from the CI quark propagator have been analyzed on configurations with two light degenerate CI quark flavors. It has been demonstrated that the combination of tree-level improvement and a multiplicative or hybrid tree-level correction scheme drastically reduce the dominant lattice artifacts.
Removing the lowest Dirac eigenmodes out of the quark propagator strongly suppresses the wave-function renormalization function in the IR and completely dissolves the dynamically generated mass displayed by M (p 2 ). Under Dirac low-mode removal, the mass function is found to reveal a smoother transition towards chiral restoration than the splitting of the vector and axial vector currents. In the current Appendix we discuss how the process of lattice gauge fixing with the overrelaxation algorithm can be accelerated by using NVIDIA R 's CUDA TM (Compute Unified Device Architecture) programming environment for GPUs. We compare the performance of the overrelaxation algorithm on one GPU (NVIDIA GeForce R GTX 580) with conventional calculations on the CPU and apply techniques to relax the bandwidth restrictions of the GPU.
In the recent years, many groups in the lattice QCD community have taken advantage of the cost effective opportunity to adopt GPUs for high-performance lattice QCD computations. Whereas the pioneering work of GPU calculations in lattice QCD was reported in [42], some more recent examples are given by [43][44][45][46][47][48][49][50].

Gauge fixing via (over)relaxation
The underlying idea of the relaxation algorithm [51] is a local optimization of the gauge functional F g [U ], i.e., for all x the maximum of Re tr [g(x)K(x)] is wanted, where The solution of the aforementioned subtask is given, in the case of the gauge group SU(2), by and for SU(3) one iteratively operates in the three subgroups of SU (2). From equations (A1) and (A2) it is evident that one can optimize all sites in each of the black and white checker sub-lattices simultaneously.
In order to reduce the critical slowing down of the relaxation algorithm on large lattices, the authors of [52] suggested to apply an over relaxation algorithm which replaces the gauge transformation g(x) by g ω (x) in each step of the iteration. This method has widely been studied and the value of ω was found to be well adapted at around 1.7, see [28] and references therein.

Lattice QCD on the GPU
Since CUDA supports only lattices up to three dimensions natively, one single index that runs over all four dimensions of the space-time lattice is used. We assign each lattice site to a separate thread and start 32 threads per multiprocessor. Better occupancy would be achieved with more threads per multiprocessor but we are restricted by the 48 KB of L1 cache.
A function which is called from the host system and which performs calculations on the GPU is called a kernel. We implemented two kernels, one which checks the current value of the gauge fixing functional, Eq. (2), and the gauge precision, Eq. (6), after every 100th iteration step and a second which does the actual work, i.e., which performs an overrelaxation step. The latter is invoked for black and white lattice sites consecutively.

Optimization
The GPU can read data from global device memory in an efficient way only if the data is accurately coalesced; that means the largest memory throughput is achieved when consecutive threads read from consecutive memory addresses. In order to do so we rearrange the gauge field into two blocks, one for even and one for odd lattice sites. Moreover, for the same sake of memory coalescing, we choose the site index running faster than color and Dirac indices.
Applying the overrelaxation algorithm to one lattice site requires 2253 floating point operations and we have to read and write eight SU(3) matrices for every site; thus the required data transfer in single precision is 1152 bytes per site. Comparing the ratio data transfer per floating point operation, 1152/2253 ≈ 0.51, with the theoretical peak performance of the GTX 580, 192/1581 ≈ 0.12, we clearly see that we are solely constrained by memory bandwidth and not by the maximum number of arithmetic instructions.
In order to reduce memory traffic we make use of the unitarity of SU(3) matrices and reconstruct the third line of each matrix on the fly when needed instead of storing it [43]. A minimal 8 parameter reconstruction turned out to be numerically not stable enough for our purpose since we not only read but also write the modified gauge fields in each step of the iteration. For more details see [43,44] and references therein.

Performance
We generated quenched SU(3) configurations with the heat-bath algorithm with 24 3 spatial lattice sites and a varying temporal extent from 4 to 128. On these configurations we compare the performance of our GPU kernels on the GTX 580 in single and double precision to the performance of the equivalent code (with a data alignment more appropriate for the CPU) on one core of the Intel R Core TM i7-950 ("Nehalem Bloomfield") processor run at 3.06 GHz, whereby the CPU code is optimized through SSE4 instructions generated by the compiler. The results of the performance test are given in Fig. 8: in the l.h.s. plot we show the performance of the algorithm using a 12 parameter reconstruction and a full 18 number representation in single and double precision together with the performance of the same code run on the CPU in single precision. We achieve a maximum performance of 135 Gflops (independent of the lattice 1 Using the Intel compiler (12.0.0) with the compiler flag SSE4.2. size) for the case of the 12 parameter reconstruction in single precision. On the r.h.s. of Fig. 8 we present a summary of the time needed to fix the gauge for the various settings up to the test accuracy of θ < 10 −6 . Overall, we find that for the task of gauge fixing with the overrelaxation algorithm the computational power of one GPU is equivalent to approximately 40 CPU cores (under the assumption of ideal scaling).
The wave-function renormalization function is equal to one at tree-level by construction.