Towards efficient density functional theory calculations without self-interaction: The Fermi-L\"owdin orbital self-interaction correction

The Fermi-L\"owdin orbital (FLO) approach to the Perdew-Zunger self-interaction correction (PZ-SIC) to density functional theory (DFT) is described and an improved approach to the problem of optimizing the Fermi-orbitals in order to minimize the DFT-SIC total energy is introduced. To illustrate the use of the FLO-SIC method, results are given for several applications involving problems where self-interaction errors are pronounced.


Introduction
Density functional theory (DFT) is the workhorse of computational condensed matter physics. Nowhere is this more evident than in a materials genome project where DFT calculations, typically using the Perdew-Burke-Ernzerhof (PBE) [1] generalized gradient approximation (GGA) to the exchange-correlation functional, are performed for thousands of materials to create a large database of calculated results that can be mined to discover property trends. A number of these projects were discussed at CCP2018. PBE is used for these calculations because it combines reliability with efficiency, i.e. accurate predictions of materials properties can be produced at relatively low computational cost. A subtle factor in the success of this enterprise is that the calculations are done for atomic configurations at or near equilibrium. In such situations PBE and similar functionals generally perform well; however, the reliability of these methods deteriorates when the atoms are far from equilibrium. For example, chemical transition states typically involve strongly stretched bonds and PBE and other functionals significantly underestimate the corresponding reaction barriers. The culprit behind this breakdown is self-interaction error (SIE), which is present in all approximate semilocal density functionals due to an inexact cancellation of self-Coulomb and self-exchange-correlation energies. In addition to too-low reaction barriers, SIE causes a raft of other problems in DFT calculations, from orbital energies that are too high, to problems in the description of charge transfer.
The Perdew-Zunger self-interaction correction (PZ-SIC) [2] was introduced in the 1980's to remove electron self-interaction from DFT. Early tests indicated that many SIE-related problems were indeed fixed by PZ-SIC, but the method was not widely adopted for two reasons. First, the method is computationally demanding relative to uncorrected DFT, and second, while the method can improve the success of DFT in situations where SIE is important, it degrades the performance of PBE and related functionals in settings where DFT already works well [3,4]. The Fermi-Löwdin orbital self-interaction correction (FLO-SIC) [5][6][7] is a recent implementation of PZ-SIC that offers a potential computational advantage over earlier approaches. A long-term goal of our research is to use FLO-SIC to explore new approaches of combining SIC with DFT to remove the effects of SIE without diminishing the otherwise good performance of DFT. A first step in this direction is to more fully assess the method for problems where SIE is known to be important. In this paper we present a brief overview of the FLO-SIC formalism, including a discussion of recent methodological developments that improve its efficiency and functionality. We then present illustrative examples to demonstrate how FLO-SIC successfully addresses the effects of electron self-interaction. We conclude with a reflection on remaining challenges to implementing FLO-SIC more broadly.

Theoretical background
In DFT, the electrostatic Coulomb energy of the electrons E H includes the interaction of each electron with itself. In exact DFT, this self-Coulomb energy is exactly cancelled by the exchangecorrelation (XC) energy E XC , but residual self-interaction remains when an approximate XC functional is used instead. In PZ-SIC, this residual energy is subtracted out on an orbital-byorbital basis [2] E where n σ i are single orbital densities of spin σ, E DFT is the DFT energy functional, E XC is the exchange-correlation functional used and E H is the Hartree, or Coulomb, energy Schrödinger-like DFT-SIC equations are derived analogously to the Kohn-Sham (KS) equations using a standard variational method, yielding HereĤ DFT σ is the DFT Hamiltonian, V SIC iσ is the SIC potential of orbital i, andĤ DFT-SIC i is the corresponding SIC Hamiltonian. The λ σ ji are the Lagrange multipliers introduced in the variational procedure to ensure orthogonality between the M σ occupied spin orbitals.
A challenging feature of PZ-SIC is that the total energy E DFT-SIC and the DFT-SIC equations are orbital-dependent. This means that individual orbital densities must be optimized in order to obtain the lowest E DFT-SIC . The key idea behind FLO-SIC is to use localized Fermi orbitals (FO) in the PZ energy functional [5][6][7]. A FO is defined as where ψ jσ (r) is any set of orbitals that spans the occupied orbital space (i.e. Mσ j=1 |ψ jσ (r)| 2 = n σ (r)). The a iσ are called Fermi-orbital descriptors (FODs). They are M σ distinct points in space that characterize the transformation. The FO are normalized, but not orthogonal, so the Löwdin method [8] is applied to obtain orthonormal Fermi-Löwdin orbitals φ iσ (FLOs).
Minimizing E DFT-SIC requires that both the total density and the FLOs are optimized. The latter is accomplished by rearranging the FOD positions. As discussed further below, the FOD positions can be optimized by using the gradients of the total energy with respect to the a i , which we refer to as FOD forces. A flowchart summarizing a FLO-SIC energy minimization is given in Fig. 1.

Improving FOD optimization
In FLO-SIC, the total energy is minimized by optimizing the FOD positions. Pederson [6] derived an expression for the gradients g i = ∂E DFT-SIC /∂a i and used these in a conjugate gradient algorithm to optimize the FODs for some closed-shell atoms. The method is akin to that used to optimize the arrangements of atoms in a molecule. Using a similar approach, we determined minimum-energy FODs for all the atoms Li to Kr [9]. In practice, the conjugate gradient algorithm may take hundreds of steps to determine optimal FOD positions, and the number of steps required grows as the number of FODs increases. It is clearly important to make the FOD optimization as efficient as possible.
To gain insight into the nature of the FOD energy surface, we computed second derivatives of the energy with respect to the FOD positions at the minimum energy FOD arrangement for several closed-shell atoms. For a particular FOD a, the second derivative h kl = ∂ 2 E DFT-SIC /∂a k ∂a l was calculated numerically via finite differences of the corresponding gradients where g k is the gradient in the k direction (k = x, y or z) and ±∆x l is an FOD displacement in the l direction. We used displacements of 0.001 Bohr to compute the derivatives. The average of the diagonal element (h D = 1 3 k h kk ) is a measure of how steeply curved the energy surface is for the given FOD. For a typical atom, the curvature is similar for all FODs corresponding to local orbitals in the same electronic shell. The averaged, per-shell values of h D for the closed-shell atoms Ne, Mg, Ar, Ca, Zn, Kr are presented in table 1. h D for the 1s FOD is at least an order of magnitude larger than those for valence FODs for all atoms. For example, for Kr the 1s second derivative (81 a.u.) is four orders of magnitude larger than that of the 4s4p FODs (0.0022 a.u.). This implies that for similar displacements of the FODs from equilibrium, the 1s FOD gradient would be vastly larger than a 4s4p gradient. This is a signature of an ill-conditioned optimization problem that can result in slow convergence to a solution using a straightforward application of gradient-based algorithms.
To accelerate the convergence, we used the averaged diagonal elements h D of the second derivative matrix to scale the gradients (g) and the FOD coordinates (a) as follows prior to using the coordinates and gradients in the optimization algorithm. The effect is to make the scaled second derivatives ∂g k /∂a l approximately the same size for all FODs. To carry out this preconditioning step for atoms not listed in table 1, we fit the values of h D for each shell vs atomic number Z. For each shell, a cubic polynomial gives an excellent fit to the data in the table. Values for any atom can be extracted from these fits.
The scaled gradients and coordinates can be used in conjunction with any gradient-based optimization algorithm. We used them with the conjugate gradient (CG) and L-BFGS algorithms.
To compare the performance of these methods with and without preconditioning, we considered the Kr and Cr atoms. We used the same starting set of FODs for all methods. To create the starting sets, we displaced each FOD from its ideal position given in Ref. [9], while maintaining the same approximate distance to the nucleus. For the scaled cases, we simply applied the scaling transformations to the gradient and coordinate vectors immediately before the call to the optimization subroutine and reversed the scaling for the updated coordinate vector returned from the subroutine. Each FOD was then moved using the unscaled update.
The total energy versus the optimization step for each method is shown in Fig. 2. It is clear that scaling improves both the CG and L-BFGS algorithms. The scaled L-BFGS performs the best of the methods tested. For both Kr and Cr, the minimum energy is reached after approximately 40 steps. None of the other methods converges to the lowest energy within the 60 steps shown. The unscaled CG method has the slowest convergence for both Kr and Cr. The scaled CG and unscaled L-BFGS show similar performance after 60 steps, although the scaled CG leads to a faster initial drop in energy.
To implement the scaled L-BFGS method for molecules, we simply use the h D value for the appropriate shell of the nearest atom to the FOD position. This is clear for the core orbitals, since the core FODs have positions quite close to the nuclear position. For valence states and particularly bonding states, the FODs can lie between atoms, potentially complicating the choice of scaling parameters; however, we find that simply using the h D value for the valence shell of the nearest atom is effective. In tests for molecules, we obtain speedups for the scaled L-BFGS method similar to those shown in Fig. 2.

Atomic orbital energies
Because the potential an electron "sees" in a DFT calculation includes an interaction with itself, at a large distance r from a neutral atom the potential goes exponentially to zero, rather than exhibiting the physically correct −1/r behavior. This leads to several problems in the description of atoms and other finite systems. In this section we focus on orbital energies, which are too high in DFT calculations, making them poor approximations of electron removal energies. For example, the 1s orbital in the H atom should be at −13.6 eV. As seen in Fig. 3, the actual orbital energy in a spin-polarized DFT-LDA calculation is −7.3 eV. On the other hand, the FLO-SIC treatment of H (again spin-polarized), which contains only one electron, is exact and the 1s orbital energy in FLO-SIC-LDA equals the experimental removal energy. FLO-SIC is not exact for He, but the He 1s energy level is clearly brought into much better agreement with experiment in FLO-SIC-LDA than in the uncorrected theory. It is interesting to look at other elements and at different energy levels (valence, semi-core, core) to determine whether all energy levels are corrected in a FLO-SIC calculation. This is done in Fig. 4, where the orbital energies of the 2s2p as well as 3s3p shells for the atoms from Na-Ar are shown. Even these deeper energy levels are strongly corrected towards the experimental removal energies when comparing DFT-LDA and FLO-SIC-LDA. (All calculations involving open shell atoms are done spin-polarized. The splitting of the majority and minority spin levels is not visible on the scale of Fig. 4, though all levels are plotted.) Taking all the orbital levels for all elements from H to Zn and comparing them to experiment, we find the errors given in table 2. It can be seen that not only the value of the highest occupied orbital is corrected [9,10], but all other orbital levels are as well, down to the deepest core levels. Accordingly, FLO-SIC restores to all orbital energies the physical significance of experimental removal energies.

Dissociation Curves
To investigate the performance of FLO-SIC in stretched bond situations, we computed dissociation curves by systematically removing a single H from LiH, BeH 2 , BH 3 , CH 4 , NH 3 , H 2 O, and HF, while holding the remaining atoms fixed. An example of such a curve for the CH 4 molecule is shown in Fig. 5, where a comparison between LDA, FLO-SIC-LDA and reference CASPT2 calculations is shown. The LDA and FLO-SIC-LDA calculations were done without  spin polarization. Energies are plotted with respect to the separated limit of an isolated H atom and an isolated molecular fragment. For a more quantitative analysis of the results for all seven molecules, the error (calculation−reference) at four equidistant points along the energy curves is determined for LDA, FLO-SIC-LDA, PBE, and FLO-SIC-PBE. The first of these points is the equilibrium X-H distance where the energy has the value E = E b , the binding energy of the bond. The second is the separation at which E = E b /2. The third and fourth points are then determined by making two additional steps of the same length outward from the second point. The mean error and the mean absolute error with respect to the reference values for all seven molecules is given in Fig. 6. It can be seen that FLO-SIC-LDA always improves the LDA energy, while FLO-SIC-PBE tends to over-correct the already good description of PBE. For the largest separations, FLO-SIC reduces the mean absolute error in both LDA and PBE, for LDA dramatically. Table 2: Mean absolute error (MAE) and mean absolute percentage error (MAPE) of the atomic orbital energies for H-Zn in comparison to experimental values. The notation 'all' means that all orbital energies were taken into account (from valence to core) while 'HOO' refers to the highest occupied orbital and '1s' to the lowest occupied orbital. The agreement of FLO-SIC with experimental data is significantly better than that of the parent LDA functional and close to Hartree-Fock [11].
Orbitals 9.9 1.9 1.8 Figure 5: Energy with respect to the dissociated limit E diss for moving a single H away from CH 4 calculated with LDA, FLO-SIC-LDA and CASPT2. The distances used in the error calculation are indicated by dashed lines.

Magnetic Exchange Couplings
As shown in Section 4.1, SIE strongly affects orbital energy levels and hence it is also expected to impact molecular properties. One property of particular interest for possible nanotechnology applications are magnetic exchange couplings (J) of transition metal systems. The couplings gauge the nature and strength of spin interaction among metal centers [12,13]. The effect of SIE on J has been analyzed in the past for transition metal complexes empirically using several widely available hybrid functionals with different amount of exact Hartree-Fock exchange [14].
Here we show the effect of explicitly removing SIE on the magnetic exchange couplings of the transition metal complex [Fe 2 OCl 6 ] 2− using the FLO-SIC method.
The computational burden associated with optimizing hundreds of FODs in a large system can be reduced by using effective core potentials (ECPs), where only the valence electrons (and hence valence FODs) are treated explicitly. This simple strategy enables calculations on large systems by substantially reducing the computational time. The use of ECPs has been shown to work well for a number of organic radicals as well as for another transition metal complex ([Cu 2 Cl 6 ] 2− ) [15]. Here, we used the Stuttgart effective core potentials [16] for Fe and Cl atoms with the corresponding uncontracted basis sets, whereas the O atom was treated in an allelectron manner. We calculate the magnetic exchange couplings (J) using the spin-projected approach of Noodleman [17], where E HS and E BS are the energies for the high spin state (HS) and a broken-symmetry spin state (BS), respectively, and S A and S B are the nominal spins on the two centers.
In table 3, we present the J values obtained from plain LDA and FLO-SIC-LDA (dubbed FLO-SIC in the table), along with the ones from multiple hybrid functionals. The experimental value is listed as a reference. The J value with plain LDA is overestimated by a large amount when compared to the reference value. However, with FLO-SIC it is significantly corrected and the resulting J value is close to the one obtained from experiment. All hybrid functionals considered in this work correct the exchange coupling toward the experimental reference similar to FLO-SIC and lie between the LDA and experimental values. The hybrid functionals partially correct for self-interaction by including a fraction of exact Hartree-Fock exchange in the exchangecorrelation energy.

Charge transfer and self-interaction
SIE also impairs the description of charge transfer in DFT calculations. For example, it has been shown that heteronuclear dimers dissociate to fractionally charged atoms in DFT, due to unphysical positions of the atomic energy levels caused by self-interaction [19]. A related, but more subtle effect is the description of the charge distribution in polar molecules at equilibrium bond lengths. For molecules near equilibrium, PBE gives a good description of properties like binding energies, yet properties like dipole moments do not agree well with experimental values. This is a result of electron self-interaction that makes the anions relatively less stable in DFT calculations, resulting in less charge transfer and smaller dipole moments. This can be seen in table 4 where results from PBE and FLO-SIC-PBE are compared with reference CCSD(T) values [20]. Experimental geometries were used for all calculations. PBE results in dipole moments that are systematically too small, while FLO-SIC-PBE produces values in much closer agreement with the reference values. For the six molecules shown in the table, the average PBE result underestimates the reference values by 5.2%, while the FLO-SIC-PBE values have an average error of only 1.8%, an improvement of roughly 3 times over PBE.
We also show calculated Mulliken charges for the anions in table 4. The increase in the dipole moments in FLO-SIC-PBE is clearly due to an increase in the charge transferred to the anion. For the six molecules shown, the charge on the anion is greater on average by 0.07 electrons.

Summary and future directions
The applications of the FLO-SIC method presented above illustrate the success of FLO-SIC and the PZ-SIC formalism in removing the effects of SIE from commonly used exchange-correlation functionals in situations where these errors are important. Similar improvements over uncorrected functionals have been found in other benchmarking applications carried out by our group [21,22] and others [4]. Challenges remain to making FLO-SIC more efficient for practical calculations. We are currently working on creating improved starting points for FOD positions, based on optimized results for the free atoms and accumulating experience gained in describing molecules. Good starting points can greatly reduce the number of optimization steps needed to find energy-minimizing FLOs. Further improvements to the optimization schemes are also possible, with the goal of accelerating convergence still further. We are also pursuing algorithmic improvements to make the solution of the DFT-SIC equations more efficient. Also needed is the implementation of periodic boundary conditions, to make FLO-SIC calculations possible for crystaline materials. A more fundamental goal is to improve the performance of FLO-SIC for problems where the underlying DFT functional already works well. As illustrated by the data for R 1 in Fig. 6, FLO-SIC-PBE yields worse binding energies than PBE for molecules near their equilibrium geometries. More work is needed to determine how to best pair SIC with functionals such as PBE and the promising new SCAN functional [23] in order to remove SIE when it is prominent, without diminishing their performance when it is not. This, combined with more efficient computational tools, can make self-interaction-free DFT calculations a reality.