Phoebe: a collection of Phonon and Electron Boltzmann Equation solvers

,


I. INTRODUCTION
As technology use and its energy footprint continue to grow, so does the importance of improving the properties of the materials which underlie such technology. Materials transport properties, and in particular the capacity of a crystal to conduct heat and electricity, are an important set of materials characteristics, critical to the function of virtually any electronic device. These properties are determined by the traveling of electrons through a crystal lattice, the propagation of lattice vibrations, and the interactions between these excitations.
The task of engineering better materials can be greatly accelerated through computational studies. In particular, the prediction of materials transport properties has reached great levels of accuracy in recent years, thanks to the combination of first-principles simulations and numerical Boltzmann Transport Equation (BTE) solvers, the former capable of accurately parametrizing the characteristics of crystal excitations and the latter providing macroscopic transport properties from microscopic data. Both components are still the subject of study and are continuously evolving, as there are several challenges, both technical and conceptual, behind both the calculation of the relevant microscopic interactions, as well as in increasing the sophistication at which the BTE can be solved. There are several projects actively implementing these simulation techniques in software packages, the challenge being to achieve the best possible accuracy at the smallest computational cost. Various existing BTE codes are currently available, including EPW [1], BoltzWann [2], PERTURBO [3], and BoltzTrap [4] for electronic properties, as well as phonopy/phono3py [5,6], ShengBTE [7], and ALAMODE [8] for the calculation of phonon transport. All of these have been used with various successes for the study of materials transport properties. The physics of heat and electricity conduction are intrinsically linked one to the other, and, as we will discuss in further detail, the electron and phonon BTE also share several common features. Despite the similarities, no currently developed software package is capable of describing both kinds of transport processes.
Our vision behind the development of this new software was to take advantage of the affinity between thermal and electrical conduction to provide a single framework to fully characterize materials transport properties. In addition, we aimed not only to provide a tool for the prediction of materials transport properties; instead, we developed a package containing the abstraction and generic data structures needed to calculate conduction. In doing so, we created a flexible computational framework designed to easily accommodate extensions and improvements, as ongoing work in this field continues to reveal new and exciting developments. Lastly, we took note of the rapid evolution of computational hardware, in particular with respect to the rise of GPU accelerators and the increasingly heterogeneous architecture offerings of modern supercomputing facilities. As a result, we aspired to develop a new software which could run efficiently regardless of the available computing environment.
Therefore, in this work we present the new open-source software package Phoebe -a suite of PHOnon and Electron Boltzmann Equation solvers -for the prediction of electron and phonon transport properties. Phoebe implements the construction and solution of the Boltzmann transport equation for crystals using a scattering matrix formalism that takes into account for electron-phonon, phonon-phonon, boundary, and phonon-isotope scattering contributions to transport properties. Phoebe interfaces with ab-initio codes to construct the BTE in a parameter-free approach, allowing for unbiased prediction of materials' transport properties. Additionally, Phoebe has been developed to tackle the intensive computational workload of ab-initio transport calculations in order to take advantage of contemporary supercomputing facilities. The package is written in object-oriented C++ and utilizes a hybrid MPI-OpenMP parallelization scheme, together with GPU acceleration, distributed memory data-structures, and parallel HDF5 input/output operations. We discuss the capability of Phoebe to scale over supercomputing hardware and to reduce the computational cost of the calculation with respect to other codes, streamlining the calculation of transport properties. Additionally, because of its flexible, object-oriented structure, Phoebe is designed for easy implementation of additional features, and the possibility of interfacing with additional ab-initio software. Going forward, we aim to expand the capabilities described here to include interfaces with additional ab-initio packages for the calculation of interaction matrix elements, more interacting quasiparticle properties, and further transport coefficients.
We showcase Phoebe's capabilities with the calculations of the transport properties of GaN, a III-V semiconductor used for high-power electronics. After validating the results against experimental measurements, we argue that Phoebe is able to accurately and efficiently predict a wide range of material transport properties such as the electrical and thermal conductivity and thermoelectric performance in large and complex crystal structures.

II. FUNCTIONALITIES
Phoebe is a software designed to characterize the transport properties of materials from first principles. In this article, we discuss the first release of Phoebe, which aims to provide the tools necessary for a complete study of electrical, thermal, and thermoelectric material properties in a single package. Phoebe interfaces with the output of ab-initio software and postprocesses it in order to predict transport properties. With Phoebe, it is possible to compute a wide range of properties, such as: • Wannier/Fourier interpolation of electron and phonon properties; • Electron and phonon BTE solvers, from relaxation time models to exact solutions; • Linewidths and lifetimes of electrons, limited by electron-phonon interaction and boundary scattering; • Linewidths and lifetimes of phonons, limited by three-phonon, isotope and boundary scattering; • Lattice thermal conductivity; • Electrical conductivity, mobility, thermal conductivity and Seebeck coefficient; • Phonon and electron viscosity; • Electron-phonon averaged approximation; • Wigner distribution description of electron and phonon transport; • Electron and phonon band-structure, density of states, and velocity operator; with more functionalities planned for future releases.
Phoebe is made available freely on GitHub [9] under a permissive MIT open-source license, and has been written using object-oriented C++, with the intention of allowing future extensions and implementation of new features with minimal effort. We rely on a number of open-source libraries, including Eigen [10] for small linear algebra problems and ScaLAPACK [11] for massively parallel linear algebra problems, as well as Spglib [12] for analyzing symmetries, PugiXML [13] for reading XML files, and HighFive [14] to handle I/O based using HDF5 [15].
Additionally, Phoebe has been written to take advantage of contemporary HPC infrastructure. Parallelization is achieved with hybrid MPI and OpenMP parallelization, while the most computationally expensive parts of the code are accelerated with Kokkos [16], allowing the use of GPU accelerators, if available. Finally, we take advantage of several GitHub functionalities, such as GitHub pages to create a static website for user accessing the code and GitHub Actions to perform a continuous integration. The documentation built with Doxygen [17] and Sphinx [18] is hosted online on Read The Docs [19].

III. METHODOLOGY
In this section, we report a brief summary of the main algorithms implemented in Phoebe, and refer to external works for the details of the derivation and the physics behind the methodology.

A. Phonon Properties
In several crystals, the motion of atoms is simply an oscillation around their equilibrium positions, and is thus described with the atomic displacements u(sαR), where R is the Bravais lattice vector labeling a unit cell, s is an index over the unit cell basis of atoms, and α is the Cartesian direction of the atomic displacement. To lowest orders in perturbation theory, the lattice potential energy is described with a Taylor expansion [20], where the first two important terms are the second derivative V (2) of the crystal energy with respect to atomic displacements and the third derivative V (3) while higher order derivatives are here neglected. These quantities can be computed using external ab-initio codes. In particular, Phoebe interfaces with phonopy/phono3py [5] and Quantum-ESPRESSO [21] for the calculation of the second derivative and with phono3py [6] and ShengBTE [22] for the calculation of the third derivative. These two tensors of derivatives allow the characterization of phonon properties and of their scattering mechanisms by means of a Fourier transform. For example, the Fourier transform of the second derivative tensor provides the dynamical matrix D defined as [23] where q is an arbitrary wavevector, and one Bravais lattice vector component of the second derivative V (2) has been set to a reference unit cell 0 without loss of generality thanks to the translational periodicity of the crystal. The phonon harmonic properties are then found by diagonalizing the dynamical matrix where ν is an eigenvalue index (i.e. the phonon branch),hω qν the phonon energy and z sα qν the phonon eigenvector. Polar corrections to the dynamical matrix are taken into account in Phoebe as explained in Ref. [23].
The Fourier transform of the third derivative V (3) instead describes the coupling strength of a 3-phonon scattering event, and is computed as [24,25] where again one Bravais lattice vector component can be set to a reference unit cell 0 thanks to the translational periodicity.
The equations 3 and 5 allow us to compute the phonon properties at arbitrary wavevectors, starting from the derivatives computed in real space that can be obtained from ab-initio codes. This interpolation procedure is a major computational bottleneck due to the large number of tensor indices and the fact that it needs to be computed for all wavevectors used to sample the Brillouin zone (BZ). As a result, it is critical to implement this interpolation procedure, in particular for the 3-phonon coupling strength, in the most efficient way possible, which will be discussed in Section IV E.

B. Electron Properties
In order to compute electronic transport properties, it is necessary to characterize electrons and the electron-phonon interaction. Phoebe relies on an external ab-initio code to compute the electronic energies kb and wavefunctions |ψ kb , where k, the electron wavevector, and b, the band index, are Bloch quantum numbers (spin is omitted for simplicity), which represent eigenvalues/eigenstates of the electronic Hamiltonian Once the electronic wavefunction is known, it is possible to compute other properties such as the interaction of electrons with the lattice. The electron-phonon coupling g is defined as [26] g b bν (k, q) = 1 2mω qν and provides the coupling strength of the interaction between two electrons and a phonon, and g 0 is introduced for later convenience (as it's a quantity smoother to interpolate than g), and ∂ qν V is the potential originating in response to the perturbation of the lattice with the ionic displacements of a phonon mode qν.
For the first release of Phoebe we interface with Quantum-ESPRESSO [21] in order to obtain values of the electronic energies and the electron-phonon coupling. In principle, ab-initio codes can directly compute electronic properties at any wavevector k. However, the calculation of transport properties requires extremely accurate sampling of k throughout the Brillouin zone, making a direct first-principles calculation unfeasible in most cases. To circumvent this issue, transport calculations typically adopt an interpolation scheme, in which Quantum-ESPRESSO computes electronic properties on a coarse grid of wavevectors, {k co }, and Phoebe interpolates these properties to an arbitrary fine mesh of wavevectors {k f i }.
We implemented the Wannier interpolation technique, interfacing Phoebe with Wannier90 [27], which provides the construction of the maximally-localized Wannier-function (MLWF) [28,29]. To briefly explain the MLWF procedure, we first note that the wavefunction |ψ kb is not unique, as it has a gauge degree of freedom: the wavefunction can be multiplied by an arbitrary factor e iφ(k,b) . Since the phase φ(k, b), for our practical purposes, is the result of the numerical diagonalization of a Hamiltonian matrix, it can be assumed to be random, and as a result the wavefunction does not vary smoothly with k. The MLWF procedure consists in finding a gauge that makes the wavefunction smooth -phase factors included -across the BZ. The Wannier function |mR computed by Wannier90 is defined as where m is a Wannier function index, and the rotation matrix U is found by Wannier90 as the matrix that smoothens the wavefunction in reciprocal space. The interpolation of electronic energies proceeds as follows from Ref. [29]: we can first compute the Hamiltonian matrix in the basis of Wannier functions 0n|H|Rm as where N k is the total number of wavevectors in the coarse grid {k co }. From the Hamiltonian in the Wannier representation, we interpolate electronic energies at any arbitrary wavevector k by computing a Fourier back-transform followed by a diagonalization of the resulting matrix which provides an interpolation of the electronic energies kb and rotation matrix U k at any desired wavevector. The electron-phonon coupling can be interpolated in a conceptually similar way to the electronic energies. Following the approach of Ref. [30], we first note that the tensor g 0 is smoother than g (as defined in Eq. 7) and thus more suitable for an interpolation algorithm. The tensor g 0 (k co , q co ) (with band indices omitted for simplicity) is computed by Quantum-ESPRESSO on a coarse grid of wavevectors and is then transformed by Phoebe to the Wannier representation as where N q is the number of phonon wavevectors in the coarse sampling, z is the phonon eigenvector introduced in Eq. 4, and R e (R p ) is the set of Bravais lattice vectors associated to the electron (phonon) Fourier transform of the set {k co } ({q co }). Once this transformation is performed, the electron-phonon coupling tensor is finally interpolated to arbitrary wavevectors k and q as where the U and z matrices at the interpolating wavevectors can be found by constructing the dynamical matrix at the desired q wavevector and by interpolating the electronic energy at wavevector k. As a result of this interpolation procedure, one can accurately sample the the electron-phonon coupling over the BZ and compute transport properties.
As for the interpolation of the three-phonon coupling, the interpolation of the electron-phonon coupling is a computationally expensive effort due to the large number of indices and the sheer number of interpolating wavevectors. The implementation of the coupling interpolation for both electron-phonon and three-phonon coupling will be discussed together in detail in Section IV E, as Eq. 5 and Eq. 13 are similar from a mathematical point of view.
Finally, polar corrections to the electron-phonon coupling are implemented in Phoebe following the procedure outlined in Ref. [31]. The coupling is decomposed into short-and long-range contributions, where the long-range contribution is given by and the short-range contribution is calculated on the coarse grid by subtracting the long-range term from the total electron-phonon coupling calculated by the ab-initio code. Here, Z * s is the Born effective charge tensor of atom s, and 0 and ∞ are the low-and high-frequency permittivities. Note that all factors except the overlap only depend on the single point q. We therefore precompute all of these factors before the scattering matrix calculation, leaving only multiplication by the overlap for the expensive double loop over k and q.

C. Velocity
Transport properties require knowledge of the velocity operator, which characterizes the diffusion velocity of a single heat or charge carrier. For charge carriers, we compute the matrix elements of the velocity operator v as the derivative with respect to the wavevector of the non-interacting Hamiltonian [32] v αk,bb = where H W is the Hamiltonian matrix in the Bloch representation computed at k with the Wannier interpolation technique described above, and δ α is a small displacement vector along Cartesian direction α. If the eigenvalue kb is non-degenerate, the diagonal matrix element v αk,bb represents the electronic group velocity. However if kb is a degenerate eigenvalue, we cannot readily identify the velocity matrix element v αk,bb with the group velocity: in this case, we select the degenerate subspace of the matrix v αk and diagonalize it. After this subspace diagonalization, the new diagonal matrix elements represent the group velocity of the degenerate state. The definition of the phonon velocity is extremely similar (to the point that we use the same code for both). Simply, the matrix H is replaced with the square root of the dynamical matrix D q and the rotation matrix U is replaced with the phonon eigenvector z [24].

D. Phonon Boltzmann Transport Equation
Given the microscopic quantities discussed in the previous sections, we can discuss now how to obtain the lattice thermal conductivity. In Phoebe, we describe lattice thermal transport with the phonon Boltzmann transport equation (BTE). This semiclassical model describes the motion of a phonon (wavepacket) through the crystal. A phonon state (q, ν) at thermal equilibrium is populated withn qν phonon quanta, wheren qν = 1/(eh ωqν /k B T − 1) is the Bose-Einstein distribution. The application of a thermal gradient ∇T to the crystal forces the system out-of-equilibrium, so that each phonon state is then populated with an out-of-equilibrium population n qν . The phonon BTE provides a way to find such out-of-equilibrium population and consists of a balance between two terms: The left-hand side is the diffusion operator, and describes the phonon drift in presence of a thermal gradient ∇T , while the right-hand side describes how scattering changes the population number of a phonon state. Here, V is the crystal unit cell volume, Ω qq νν is the scattering matrix and δn qν = n qν −n qν is the deviation from equilibrium population. For computational convenience, we also work with a rescaled scattering matrix A defined as A qq νν = 1 √n qν (nqν +1)n q ν (n q ν +1) Ω qq νν . In Phoebe, the phonon scattering matrix is built taking into account the three-phonon interaction, phonon-isotope and phonon-boundary scattering [24] as where we introduced the three-phonon scattering rates for coalescence events as while the scattering rate for phonon decay is where G is a reciprocal lattice vector, N q is a normalization number of phonon wavevectors, and V (3) (qν, q ν , q ν ) is the three-phonon coupling strength. Additionally, the scattering rate for phonon-isotope scattering is [33] P isot qν,q ν = π 2N q ω qν ω q ν n qνnq ν +n qν +n q ν 2 where the coupling strength g isot s depends on the variance of masses as where m is is the atomic mass of atom of species s and isotope i, f is is the abundance of the isotope (which by default is set to natural abundances [34]) and m s is the average atomic mass. Finally, the boundary scattering rate is described with the term where L is the user-specified macroscopic crystal size. Using the first-principles phonon properties discussed in Section III A it is possible to construct the phonon BTE and in particular the scattering matrix. We will later discuss how to solve both the electron and phonon BTEs to obtain the out-of-equilibrium population n qν . Assuming that n qν is found, the thermal conductivity tensor k αβ is computed as where v αqν is the phonon group velocity.

E. Electron Boltzmann Transport Equation
The semiclassical description of electron transport shares several similarities with the phonon transport theory. In the phonon case, we solved for out-of-equilibrium phonon populations. Now, for the case of electron transport, we deal with the problem of finding the out-of-equilibrium occupation number of a charge carrier. At equilibrium, an electronic state (k, b) is occupied according to the Fermi-Dirac distribution f kb = 1/(e kb /k B T −µ − 1), where µ is the chemical potential and kb is the electronic energy. When we apply a thermal gradient to a material, we can expect to induce the diffusion of carriers just as in the phonon case, resulting in an out-of-equilibrium population f kb . However, each carrier also possesses a charge e, and therefore can diffuse also when an external electric field E is applied to the material. As we are only concerned with the linear response to the perturbation induced by the thermal gradient or the electric field, the electronic BTE consists of two separate problems for the two different perturbations E and ∇T . The BTE describing the response to an electric field in direction α is where δf kb = f kb −f kb is the deviation from the equilibrium population, v αkb is the electronic group velocity in direction α, and A is the electronic scattering matrix. The BTE describing the response to a thermal gradient perturbation closely resembles the phonon case and is When taking the electron-phonon scattering into account, the scattering matrix A can be computed as [35] A where g bb ν (k, k ) is the coupling strength of the electron-phonon interaction. Electron scattering against surfaces or boundaries can be taken into account by adding to A a term of the form v αkb L δ k,k δ bb , where L is a characteristic length.
The solution to the electronic BTE is discussed in the next section. Solving the BTE provides a way of calculating the linear response of the electronic out-of-equilibrium population in response to an electric field δf Eα or a thermal gradient δf T α in direction α, defined as δf αkb = δf Eα E α and δf αkb = δf T α ∇ α T respectively. These perturbations induce both a charge flux J and an energy flux Q defined as These fluxes can be compared with the Onsager relations from which we can define and compute the electronic transport coefficients, such as the electrical conductivity σ and electronic thermal conductivity k With these definitions, we are finally ready to predict the thermal and electronic transport properties of crystals.

F. BTE Solvers
The similarity between electron and phonon BTEs allows us to readily implement solvers which can be applied generically to either the electron or phonon cases. We first look for a solution that is linear in the external perturbation of the system, i.e. an out-of-equilibrium population in the form f λ → f λ E or f λ → f λ ∇T , with λ an index running on the relevant Bloch-state indices (phonon or electron indices). Under this linear approximation, we write the BTE (Eqs. 17 25 and 26) in the more concise form of a linear algebra problem where b is the diffusion term, A is the scattering matrix, all chosen appropriately according to the transport problem under consideration (Eqs. 17, 25 or 26). On the surface, the BTE appears as a trivial linear algebra problem; however, there are several complexities which arise when solving the BTE. First, the scattering matrix is not invertible [36] due to the presence of zero-valued eigenvalues (arising from the property of energy conservation and/or charge conservation). It is thus not possible to directly invert the scattering matrix to solve the problem. The second challenge lies in the sheer size of the problem: in order to obtain converged results, the summation over the Brillouin zone (λ includes a wavevector index) must be performed for an extremely fine mesh of wavevectors. It is not uncommon to use more than 10 5 wavevectors. Therefore, Phoebe contains a number of different solvers which operate at various levels of complexity and computational cost, each with distinct advantages and disadvantages, and let the user decide which one is the best suited for a particular material.

Constant Relaxation Time Approximation
The constant relaxation time approximation (CRTA) is a rudimentary, approximate solution to the BTE. In this case, the scattering matrix is replaced with a single scalar where τ is a user-specified constant relaxation time. Most of the complexity of the BTE is removed and the BTE is readily solved with the out-of-equilibrium population f λ ≈ τ b λ . The CRTA may provide reasonable results for the Seebeck coefficient, as this often has a weak dependence on the scattering properties, and may be used for fast estimates of other transport properties. However, the choice of τ is largely arbitrary and in many crystals a more detailed description of scattering processes must be included to accurately predict transport properties.

Relaxation Time Approximation
The relaxation time approximation (RTA) is another approximate solution to the BTE, in which only the diagonal elements of the scattering matrix are retained so that where τ λ is a carrier's lifetime, and the off-diagonal matrix elements are neglected. Specifically, the electronic lifetime is defined as A kbk b = 1 τ kb , while for phonons the lifetime is Ω qνqν = 1 τqν . With this approximation, the out-of-equilibrium population is readily found as f λ ≈ b λ τ λ . The RTA is a fast and memory-efficient BTE solver, and can be used when more accurate solvers become prohibitively expensive. For electronic transport, it often shows good agreement with more accurate solvers. On the other hand, there are reported cases for which this approximation has failed, particularly for materials where the conservation of crystal momentum is of the utmost relevance [37][38][39]. In such case, one must consider solvers that utilize the complete scattering matrix.

Iterative solver
The iterative BTE solver [40][41][42] constructs the out-of-equilibrium populations with a geometric series. First, we decompose the scattering matrix into its diagonal terms A out and off-diagonal components A in . A solution is found by iteratively correcting the RTA solution, b A out = bτ , as where j is an iteration index that can be stopped when transport coefficients are converged. Note also that because A out is a diagonal matrix, its inverse is trivially constructed. Additionally, we implemented symmetries in this solver to further accelerate simulations (see Section III K below). However, it's worth pointing out that the iterative solver can only converge as a geometric series, i.e. only if the absolute value of the determinant | det( 1 A out A in )| is smaller than 1. If the convergence criterion is not satisfied, the algorithm may diverge after a few iterations. The iterative method can be used both by opting to store the scattering matrix in memory, which allows a fast solution once the scattering matrix is built but at the expense of a large usage of memory, or not storing A in memory, so that we lower memory requirements but require to compute the action of the scattering matrix A · f several times.

Variational solver
The variational BTE solver [35,43] takes advantage of the following quadratic functional of the population f Since the scattering matrix is semi-positive definite [35,43,44] (a property related to the ever increasing value of entropy [20]), the functional F has a single minimum which coincides with the solution of the BTE. In Phoebe, we implemented a conjugate gradient algorithm [43] that minimizes the functional F and finds the exact solution to the BTE. In contrast to the iterative solver, the variational solver is guaranteed to converge to the solution provided that the scattering matrix is semi-positive definite. Like the iterative solver, the user can choose to store the matrix in memory to quickly build the solution, or not, so that the solution is found more slowly but with smaller memory requirements.

Relaxons
The last solver is based on a direct diagonalization of the BTE [45,46]. The scattering matrix is diagonalized with a ScaLAPACK solver and we compute the complete set of eigenvalues and eigenvectors where α is an eigenvalue index, τ α is the inverse eigenvalue and θ α λ the eigenvector. Thanks to the completeness of the basis set of eigenvectors, we expand the out-of-equilibrium population as where we introduced auxiliary coefficients f α . Next, we write the BTE in the eigenvector (relaxons) basis as and note that in this form we can analytically solve the BTE by substituting f α = τ α λ b λ θ α λ in Eq. 41. This last solver has the benefit of always providing the exact solution, as it's not based on any iterative scheme and therefore has none of the related numerical instabilities. Furthermore, this analytical solution to the BTE allows for further physical insights, such as the calculation of viscosity [47] (see Section III H). On the other hand, this solver has the longest time-to-solution and may not suitable for rapid calculations of transport properties.

G. Electron-Phonon Averaged approximation
The Electron Phonon Average (EPA) approximation is another approximation for the calculation of electronic transport properties. Although the Wannier function technique discussed in Section III B provides an accurate interpolation of the electronic properties, the construction of the MLWF is a challenging task, not yet fully automated (progress is being made in this direction [48]), and therefore is not always suitable for materials studies. EPA provides a way to estimate transport properties without Wannier functions, making it a much faster and automatable method suitable for materials screening applications.
Consider a set of electronic energies (k co ) (at fixed band index) computed by a first-principles code on a coarse grid of wavevectors in the BZ. To interpolate the energies, we implemented a plane-wave based interpolation method [49], which fits an electronic band with the periodic functioñ where R m is a set of m = 0, 1, . . . , M − 1 Bravais lattice vectors chosen as all R m such that |R m | < R cut , where R cut is a user-specified parameter, and c m is an expansion coefficient. The expansion coefficient are found by optimizing a target error function against a set of training points. Further details can be found in Ref. [49]. This technique doesn't involve Wannier functions and is thus simpler to initialize. The drawback, compared to the Wannier interpolation, is that it requires a denser sampling of wavevectors {k co } in order to obtain a good interpolation quality and converged transport results. The next component of the EPA approach is a scheme for estimating the electron-phonon coupling. The idea underlying the EPA method is to replace integrations of the BZ with integrations over the quasiparticle energies, simplifying some of the microscopic details of electronic transport to gain computational performance. The electronphonon coupling is approximated as which replaces a 6-dimensional wavevector dependence on k and q with a 2-dimensional dependence on electronic energies. Additionally, as g has a weak energy dependence, it is possible to sample g 2 with just a few values of energies. There are several ways to construct the function g 2 [50] and in Phoebe we implemented the moving-least-squares (MLS) method [51], which constructs g 2 by minimizing the integral where σ gauss is a user-specified smoothing scale parameter. The energy dependence on 1 and 2 is implemented numerically with a user-specified uniform sampling of an energy interval. Once the coupling is constructed, the energy-dependent electronic lifetime is calculated as Nq q ω qν is the average phonon frequency of the phonon branch ν, f and n are energy-dependent Fermi-Dirac and Bose-Einstein distributions respectively, ρ is the electronic density of states, and g s is the spin-multiplicity (equal to 2 for non spin-polarized crystals).
Using this lifetime, the BTE is readily solved at the relaxation time approximation level, and it is possible to evaluate the electronic transport coefficients using an energy-dependent approximation for Eqs. 32, 33, 34, as discussed in greater details in Ref. [50]. Typically, the computational bottleneck for this method in Phoebe is the construction of the electronic DoS, which simply has a cost linear in the number of wavevectors used to sample the BZ.

H. Viscosity
In some materials with large conductivities, transport doesn't follow the conventional macroscopic diffusion equations such as Fourier's law for heat or Ohm's law for charge currents. Recent studies highlighted the necessity to complement conventional transport coefficients such as conductivity with hydrodynamic transport coefficients such as viscosity. In Phoebe, we implemented the phonon viscosity as described in Ref. [47], which is defined from the response coefficient η where i, j, k, l are Cartesian indices, u is the phonon drift velocity in direction k, r is the position, and Π is the crystal momentum flux Π ij = 1 Nq qνh q i δn D,j qν , where δn D,j qν is the out-of-equilibrium phonon population in response to a perturbation of the drift velocity u j . The out-of-equilibrium population can be found solving a modified version of the BTE, as reported in full details in Ref. [47]. Finally, the phonon viscosity µ is found by symmetrizing η as The phonon viscosity can be used to parametrize a generalization of the Fourier's law for describing heat transport. In Phoebe, we also implemented the analogous expression for electronic viscosity, which is readily obtained by replacing phonon properties such as group velocity, equilibrium distribution and lifetimes with their electronic counterparts.

I. Transport with the Wigner distribution
There have been some recent developments in the description of transport using the Wigner distribution. In crystals where several carriers' energies overlap with each other, it is possible to observe important contributions to transport that are not described by the BTE, for both electron and phonon transport. This is relevant for example in disordered systems [52] or narrow-gap semiconductors [53]. In Phoebe, we implemented the transport equation for the phonon Wigner distribution N , which is where where ∂N (r,k,t) ∂t coll is the phonon scattering operator, Ω(q) is a matrix of energies with elements Ω νν (q) = δ νν qν , [, ] is a commutator and {, } an anticommutator. The electronic version of the Wigner transport equation is where ∂F (r,k,t) ∂t coll is the electron scattering operator, (k) is a matrix of energies with elements bb (k) = δ bb kb , and d bb (k) is the electronic dipole [53]. Further details behind these methods are described in Refs. [52,53]. Here we just mention that the diagonal elements of these equations coincide with the electron and phonon BTE discussed previously, and that results for transport coefficients can be interpreted as corrections to the BTE predictions. Furthermore, these Wigner corrections to the BTE transport coefficients only depend on the carriers lifetimes. Because these quantities are already computed when constructing the scattering operator for the BTE, the computational cost of the Wigner corrections is negligible compared to the cost of constructing and solving the BTE.

J. Dirac delta
The calculation of materials properties such as scattering rates often requires a numerical approximation of the Dirac-delta function, for example, the density of states (DOS) In Phoebe, we offer three different schemes for the numerical approximation of the Dirac-delta for the user to choose from: 1. Gaussian broadening: the Dirac-delta is replaced with a Gaussian function where σ is a user-specified broadening parameter. This method is simple and fast, but requires the user to converge results with respect to σ in conjunction with the number of wavevectors used for the BZ integration.
2. Adaptive Gaussian broadening: the Gaussian broadening parameter σ introduced above can be approximated [54,55] as where C is the matrix describing the crystal lattice vectors, N β is the size of the Monkhorst-Pack mesh in crystal direction β, and the parameter A is a prefactor that can be fixed to 1 [55]. As this method self-adjusts σ with respect to the chosen BZ sampling and the local derivative of the carrier's energy, it tends to provide easier-to-converge results than the fixed-broadening Gaussian scheme.
3. Tetrahedron method : discussed in Ref. [56], this method provides a parameter-free linear approximation of the integrand. For simplicity, the tetrahedron method is currently implemented only for the calculation of the DoS. The method is accurate and parameter-free, although considerably slower than the alternatives.

K. Symmetries
A crystal is characterized by a set of symmetries S, each symmetry consisting of a rotation R and a fractional translation S = {R, t} such that the application of a translation and a rotation to the atomic positions leaves the crystal overall unchanged. In reciprocal space, symmetries ensure that a point in the BZ can be mapped to another symmetry-equivalent point. Rather than working with a set of wavevectors {k} that uniformly samples the BZ, we can select a subset of irreducible wavevectors {k * }, such that the complete set of wavevectors can be obtained by applying rotations on the irreducible wavevectors set k = Rk * . As a result, integrations over the BZ can be sped up by restricting the integration to the irreducible wedge of the BZ.
We recall here that microscopic carrier properties also obey symmetry relations. For example, scalars such as the carriers energy or lifetimes are left invariant by symmetry operations between an irreducible point k * and its symmetry equivalent point k = Rk * [46] and Instead, the group velocity transforms like a vector as and similarly, the out-of-equilibrium population [46] transform as As a result, the calculation of several properties can be restricted to the irreducible wedge of the BZ, for example when computing the DoS, or solving the BTE at the RTA level.
The relaxation time solvers of the BTE support symmetries, while the only exact solver that supports symmetries in Phoebe is the iterative one (see Section III F 3). In fact, the BTE for an irreducible point k * (omitting band indices for simplicity) is where we used the symmetry properties of the out-of-equilibrium population, andÃ is a tensor describing the scattering properties limited to the irreducible BZ wedge. As a result of symmetries, the BTE's linear algebra problem can be solved with a much smaller number of Bloch states, and thus with a better computational efficiency.

IV. SOFTWARE
Phoebe is written in C++ following an object-oriented paradigm. Since Phoebe computes a range of extremely different physical properties, each of these properties are organized in Apps. A main function simply initializes global properties (such as starting the MPI or Kokkos environments), reads the user input and launches the appropriate App through a class factory. Currently, Phoebe offers the following Apps: • ElectronWannierBandsApp, ElectronFourierBandsApp, PhononBandsApp: calculation of electron or phonon energies along a BZ path; • ElectronWannierDosApp, ElectronFourierDosApp, PhononDosApp: calculation of electron or phonon DOS; • ElPhQeToPhoebeApp: processing of the electron-phonon coupling from Quantum-ESPRESSO to a Phoebe format. An EPA processing computes the electron-phonon averaged coupling, while the Wannier processing transforms the coupling to its real-space Wannier representation; • ElectronLifetimesApp and PhononLifetimesApp: calculation of carriers' lifetimes along a BZ path; • ElectronWannierTransportApp: calculation of electronic transport coefficients with Wannier interpolation method; • TransportEpaApp: calculation of electronic transport coefficients with the EPA approximation; • PhononTransportApp: evaluation of phonon transport properties.
The physical properties computed by these Apps are either printed to standard output or written to JSON files for user's convenience. We also provide a few Python scripts to generate some of the most common plots of physical properties. Each App uses a number of classes built to reduce code duplication and to provide convenient data structures. In fact, a central goal of the project is to guarantee a simple deployment of methods and algorithms for future releases. The most important among the several classes available are • Crystal: description of the crystal unit cell and crystal symmetries; • Points: handling of wavevectors in the BZ; • HarmonicHamiltonian: base class for the calculation of non-interacting carriers properties; subclassed into PhononH0, ElectronH0Wannier and ElectronH0Fourier for evaluating phonon properties, Wannier-and Fourierinterpolated electron properties respectively; • BandStructure: storage of carriers' energies, velocities and eigenvectors; • Context: description of user input variables; • MPIcontroller: wrapper for calls to MPI functions; • Matrix: storage of MPI-distributed matrices and a wrapper to ScaLAPACK subroutines; • StatisticsSweep: calculation and description of values for temperature, chemical potentials and carrier concentrations; • Interaction: base class for the evaluation of the scattering coupling strength. Its subclasses describe the three-phonon interaction, electron-phonon coupling with Wannier interpolation and the EPA coupling; • ScatteringMatrix: base class for the calculation, storage and manipulation of the electron or phonon scattering matrix; • VectorBTE: data structure for storage and manipulation of the out-of-equilibrium population; • Observable: base class of a number of classes for evaluating transport coefficients.
In the following we will discuss some of these classes in greater detail.

A. Points class
The Points class in Phoebe manages the wavevectors used to sample the BZ. We defined two constructors, one for instantiating a class for uniform sampling of the BZ and another for paths of wavevectors in the BZ. When uniformly sampling the BZ with a Monkhorst-Pack grid, the complete list of points (wavevectors) is not explicitly stored for memory management convenience. This implicit list contains points sorted by their coordinates (incrementing the coordinates z, y and x in this order). When describing BZ paths instead points are not sorted and their coordinates are explicitly stored in a list. Points are internally manipulated in crystal coordinates (i.e. in scaled lattice vectors units), and we provide options to output points in Cartesian coordinates or to fold them in the first BZ.
In the Points class we implemented the logic for reducing a set of wavevectors to the set of symmetry-irreducible wavevectors in the BZ. The class method setIrreduciblePoints(), takes the set of crystal symmetries (from the Crystal class) and finds the list of irreducible BZ points, i.e. finds the irreducible set of points k * such that all remaining points k can be found applying a rotation R to k * .
It's worth noting that symmetry operations are not unique, so that a symmetry operation may rotate a wavevector correctly as k = Rk * but other quantities such as the group velocity may not be rotated correctly v(k) = Rv(k * ). Since the carrier populations transform like group velocities, the method setIrreduciblePoints() can optionally find a set of symmetries that rotate both points and velocities.
The Points class also has a setActiveLayer() method that allows discarding some points that don't provide a sizeable contribution to a physical observable, a functionality that will be explained in Section IV B in conjunction with the ActiveBandStructure class. After a call to this method, the functionalities in this class operate as if only the restricted set of points are present in the class.
Finally, several other parts of the code require the lookup of a wavevector in the list of points and, due to the large number of points used to sample the BZ (often in the excess of 10 4 ), this search operation can take a substantial part of the calculation if not implemented effectively. If Points describes a uniform BZ sampling with a complete Monkhorst-Pack grid, the point search in the list can be done in o(0) operations, taking advantage of the implicit ordering of points. If the list of points is sorted (e.g. after a call to setActiveLayer()) the lookup operation can be done with a binary search algorithm, which scales logarithmically with the number of points. Only if points are not sorted (e.g. when they describe a BZ path), the point search is made with a slow element-by-element comparison.

B. Band structure classes
We developed a data-structure dedicated to the storage of the band structure, that is, the energies ( kb andhω qν ), eigenvectors (U kmb and z sα qν ) and velocities (v αkbb and v αqνν ) of carriers over the BZ, which are needed to integrate several materials properties.
To handle this information, we defined two classes: FullBandStructure and ActiveBandStructure. Both classes take a Points instance as input, since Points defines the set of wavevectors k over which the carriers' properties are computed. While both classes store the band structure, there is an important difference. The FullBandStructure class stores energies, eigenvectors and velocities for all wavevectors defined in the Points instance. For example, given N k points and N b electronic bands, FullBandStructure stores the complete list of N k × N b values of e.g. energies kb (and similar for eigenvectors or velocities). Since wavevector meshes can be extremely large, carriers properties can optionally be stored in a MPI-distributed fashion, using the Matrix class described in Section IV C.
ActiveBandStructure contains similar functionalities of FullBandStructure in that it stores information of carriers properties (in fact they share the same virtual parent class). However, the constructor of this class discards some Bloch states (kb) that don't contribute significantly to the calculation of transport properties. Because transport properties at finite temperature only depend on electron (phonon) states that are thermally excited, there are many states which do not contribute and are readily discarded to save computational resources. For example, only long-wavelength acoustic phonon states contribute to transport at low temperatures T as their excitation energy goes to 0 with T . Similarly, only electrons with energy close to the chemical potential µ contribute to charge currents and thus we can restrict calculations to electronic states such that | kb − µ| is smaller than a few k B T . As a result, states that are not thermally excited can be discarded, and in Phoebe we term this Bloch state filtering procedure as a "window".
In Phoebe, we implemented two kinds of windows. The first window type imposes an energy cutoff and discards all Bloch states with energy larger than a user-specified cutoff cutoff : for phonons we retain states such that |hω qν | < cutoff and for electrons | qν − µ| < cutoff . However, this kind of window is impractical, as the cutoff is temperature dependent and, in practice, a user would need to perform a convergence test on cutoff .
We implemented a second window type, noting that transport coefficients contain terms of the formf kb (1 −f kb ) for electrons orn kb (1 +n kb ) for phonons (which arise from the energy or temperature derivative of equilibrium distributions). For both carriers, these terms decay exponentially to zero if the carrier's energy is much larger than k B T , thus suppressing contributions of high-energy carriers to transport properties. We therefore define a "population window" by retaining only states such that, for electrons or for phononsn where δ is a small number. Since the temperature dependence is inside the definition of Bose-Einstein and Fermi-Dirac distributions, we found that δ can be hard-coded to a fixed value and that transport properties are not affected. In this way, we achieve a reduction in the number of Bloch states used in the various calculations, and avoid the introduction of an additional parameter to be converged, for greater user convenience. As a result of this filtering procedures, the number of electron bands N b or phonon branches N ν is no longer a constant but depends on the wavevector index. As a result, quantities like the electronic energy kb cannot be stored as a matrix of size N k × N b . Instead, energies are stored as a flattened vector, and we provide class methods to access energies through Bloch state indices. A similar strategy is adopted for storing phonon properties, as well as eigenvectors or velocities.

C. Matrix
As described in Sections III D, III E, the BTE is a linear algebra problem and due to the number of Bloch states involved, requires distributed matrices and parallel linear algebra operations. In order to perform these massively parallel dense linear algebra operations, we utilize the ScaLAPACK library [11]. In order to both avoid explicit calls to this library in the main body of code and to allow future flexibility should we desire to swap ScaLAPACK routines with those of a different library, we developed the Matrix class as a wrapper to this library.
Because ScaLAPACK is mostly built in Fortran and C languages, it follows an imperative procedural programming paradigm, i.e. a set of routines that must be called in the right order, resulting in a departure from the objectoriented programming style adopted in the rest of Phoebe. The Matrix class has therefore been designed to map this procedural-style library into an object-oriented style. Therefore, the class constructor takes care of ScaLAPACK initializations, such as starting the BLACS environment and allocating the memory locations, while deallocations take place in the class destructor. Various class methods have been implemented to provide wrappers around the needed linear algebra operations, such as matrix-matrix multiplication and matrix diagonalization, with the benefit of hiding within the class method all the necessary auxiliary calls (such as temporary memory allocations or index mappings) in order to call use the ScaLAPACK library. As a result, the rest of Phoebe's code can act on distributed memory with a simplified interface that doesn't obfuscate the main body of the transport code.
C++ class templates further allow us to generalize the class to store different data-types (in particular, real and complex floating point values). We also allow the code to be compiled without the ScaLAPACK or MPI libraries. In this case, the Matrix class falls back to a "serial matrix" which relies on the Eigen library, losing all MPI-parallel characteristics (therefore becoming mostly suitable for debugging purposes).

D. Scattering Matrix
The ScatteringMatrix class, as the name suggest, implements operations related to BTE's scattering operator.
A key challenge in developing this class has been to try generalize this code to achieve different purposes. In order to implement the calculation of scattering rates, we defined ScatteringMatrix as a virtual base class that defines a virtual method builder(); the method is implemented in the subclasses PhononScatteringMatrix and ElectronScatteringMatrix and evaluates Eqs. 27 or 18. In this way, auxiliary methods such as the calculation of a matrix-vector product or the scattering matrix diagonalization can be defined directly in the base class and are shared by the two different carriers.
The scattering matrix has a few modes of operation. If the BTE is to be solved only at the RTA level, only the calculation of lifetimes is performed by the class instance. For exact BTE solutions, for which the full scattering matrix must be dealt with, we distinguish two cases: first, when the complete scattering matrix is stored in memory or alternatively, when only the product of the scattering matrix on a population vector is computed (without the need of saving the matrix in memory). In the former case, the calculation has a larger memory requirement, however, this allows to quickly solve the BTE. In the latter case, the memory footprint of the calculation is reduced, but as a drawback each iteration of a BTE solver will require more computational time. We thus let the user choose the desired memory impact, which will depend on both the material studied and on the available hardware. To summarize, the builder() method, depending on the input, is tasked with computing 1) the carriers' lifetimes, 2) the scattering matrix or 3) its action on a vector. Note that due to the size of the scattering matrix A νν , it is stored as a class member using an MPI-distributed Matrix instance.
Great care has been taken to optimize the efficiency of the scattering rate calculations implemented in builder, as these are some of the most computationally intensive functions in the code. Significant improvements to the calculation speed have been achieved by precomputing all independent particle properties whenever possible. Additionally, we used MPI parallelization to accelerate loops over k and k wavevectors, while other loops over band indices are accelerated with OpenMP parallelization. The calculation of the interaction coupling strength and its GPU acceleration is described in the next section.

E. Kokkos (Interaction)
In Phoebe, great care has been taken to accelerate the calculation of interaction matrix elements and scattering rates. The calculation of the scattering coupling strength for either three phonon and electron-phonon scattering in reciprocal space, f αβγ (k, k ), starting from a real-space representation, g abc (R, R ), is obtained with a tensor operation in the form where U are rectangular matrices indicating Wannier rotation matrices or phonon eigenvectors, a, b and c are indices labeling Wannier functions or atomic displacements, α, β and γ are Bloch band indices, k is a wavevector, and R is a Bravais lattice vector. In order to maximise performance, we first note that this expression typically appears inside a double loop on k and k . Therefore, if k is the outer loop, it is convenient to precompute the quantitỹ at fixed value of k, and then complete the transformation inside the loop over k . It is necessary to adopt the most convenient tensor contractions in order to minimize the size of loops needed to perform the interpolation. Additionally, because GPUs are extremely effective at performing tensor operations, we implemented the calculation of the coupling with the Kokkos library [57]. This library provides a hardware-agnostic programming model for parallelizing operations, so that tensor operations are implemented once following Kokkos' syntax and then, at compilation time, the code is optimized for the desired architecture. The code can thus run both on standard CPU-only architecture as well as on GPUs when available without code duplication. We therefore achieve strong parallel performance across a varied range of underlying architectures.
In our preliminary tests, we observed that the interpolation at fixed values of k and k is extremely fast on a GPU, causing communication latency between GPU and CPU to be the dominant bottleneck. To counteract this effect, we increase the workload on the GPU by performing the interpolation for a fixed value of k and a set of values for {k } at once. The size of the set {k } is optimized at runtime so that most of the on-board GPU's memory is used. Finally, as the size of the electron-phonon coupling tensor in the Wannier representation g bb ν (R e , R p ) may be too large to fit in a single GPU's memory, we adopt an additional layer of MPI parallelization. The electron-phonon coupling can be distributed over the index R e across a "pool" of MPI processes, and we allow the user to specify the size of the pool at runtime through the command line.

F. Quantum-ESPRESSO interface for electron-phonon coupling
Phoebe has been interfaced with Quantum ESPRESSO ph.x and Wannier90 in order to obtain the relevant ab-initio input for the electron-phonon coupling calculations. However, this interface presented an important technical challenge.
As mentioned in Section III B, gauge fixing is a critical component of the Wannier interpolation. In order for the Wannier interpolation to work, it is critical that the gauge of the wavefunctions used by these two separate codes stays the same throughout the calculations. Unfortunately, there is no built-in QE function to fix the gauge of a wavefunction and therefore the wavefunction may take a random phase every time it is computed. As a result, we developed a gauge-fixing procedure that ensures the wavefunction has a consistent gauge across pw.x (the QE code that finds the ground state wavefunctions), ph.x (the QE code that computes the electron-phonon coupling) and wannier90 (which computes the maximally-localized Wannier functions).
Therefore, in addition to the main Phoebe code, we provide a patched version of QE that must be used to compute electronic properties, which guarantees the gauge of the wavefunctions is appropriately taken into account. In order to explain this gauge fixing procedure, we first recall that QE is a plane-wave ab-initio code, so that the wavefunction is computed as where G runs over a set of reciprocal lattice vectors (typically defined up to a user-defined cutoff value on |k + G| 2 ), c are the plane wave coefficients, and r is the position. The subroutine c_bands() of QE is responsible for building and diagonalizing the Hamiltonian matrix, and therefore also computes the eigenvalues kb and the plane wave coefficients c kb (G). The phase arbitrariness manifests as a random phase on the complex plane-wave coefficients c kb (G). In our patch, we modified c_bands() so that if c_bands() is called by pw.x and QE has found the ground state wavefunction (i.e. a scf calculation), we save c kb (G) to file. These coefficients will serve as a reference wavefunction to fix the gauge of all subsequent calculations. If c_bands() is called by another code, for example, when generating auxiliary data for Wannier90 or ph.x, we read the reference plane-wave coefficients and use them to fix the gauge of the newly computed wavefunction. In this way, we can ensure that -at the phase level -the same wavefunction is used throughout these multiple calculations. An additional complexity in this patch is caused by the use of symmetries in the ab-initio code. The reference plane-wave coefficients built in a pw-scf calculation are typically computed only for wavevectors {k * } in the irreducible BZ wedge. Therefore, we save to file only the coefficients c k * b (G) computed at irreducible points. The wavefunction at a symmetry-equivalent wavevector k can be reconstructed from the irreducible wavevector as which in terms of plane-wave coefficients implies where S = {R, t} is a crystal symmetry as introduced in Section III K. An additional symmetry that is taken into account is the time reversal operation which in terms of plane-wave coefficients is A last constrain comes from the translational symmetry of the crystal, i.e.
or equivalently Symmetries in the presence of magnetism are currently being developed. Using these relations and knowing the wavefunction at k * , it is possible to reconstruct the wavefunction at any symmetry-equivalent wavevector k. However, in QE, the wavefunction cannot be built from symmetries just by mapping c k * ,b (G) into c k,b (G). In fact, the set of G vectors is chosen such that |G + k| 2 < δ, where δ is a user-specified cutoff. As a result, we cannot map all c k * ,b (G) into c k,b (G) as some plane-wave coefficients might be missing at the rotated wavevector, causing errors in the normalization of the wavefunction. To address this, we first rotate the reference wavefunction |ψ ref where |ψ arb−gauge is the wavefunction with an arbitrary gauge that has been computed by QE at wavevector k. With a complete basis set O would be a unitary matrix, however, due to the finite set of G-vectors, the unitary property is violated by a small amount ∆ defined as We can re-enforce the unitary property of O by first noting that O is expected to be semi-positive definite (due to the lack of completeness of the finite set of G vectors). We then write where L is the Cholesky decomposition of the matrix LL † = 1 + O † ∆O and introduced an approximation that has an error of order ∆ 3 (expected to be small for well-converged calculations). By construction, the matrix OL is a unitary matrix which can now be used to rotate the wavefunction computed at a symmetry-equivalent point to the desired gauge, while preserving the orthonormality condition. We therefore fix the gauge of the wavefunction at k by applying the rotation This procedure makes the wavefunctions obey the symmetries of the crystals. This allows for a great simplification of the electron-phonon coupling calculation. As explained in Ref. [30], if the wavefunction obeys crystal symmetries, the electron-phonon coupling satisfies the symmetry relation where S is a symmetry operation of a crystal and due to the periodicity of the crystal. As a result, we can compute the electron-phonon coupling on a set of irreducible wavevectors, then construct the remaining matrix elements by means of the symmetry relations, reducing the burden of the ab-initio calculation.
We stress that this patch avoids the need for re-implementing calculations of the electron-phonon matrix elements ψ k+q,b |∂ qν V |ψ kb , which are already computed in the code ph.x. As a result, Phoebe can readily take advantage of all features supported by ph.x, such as ultra-soft or PAW pseudopotentials [58], U corrections [21] and others, without the need for re-implementing them. The only modification necessary in the code ph.x has been the introduction of a subroutine that unfolds the symmetry of the coupling g and writes it to file output. The Wannier90 code has not been explicitly modified, as it simply reads the wavefunctions pre-computed by QE. As a last detail, we stress that the gauge problem must also be taken care when dealing with the phonon eigenvector z, and the eigenvector z present in Eq. 12 must be exactly the same phonon eigenvector (phase factor included) used by ph.x to compute the electron-phonon coupling. Additionally, also z must obey the symmetries of the crystal, as discussed in Ref. [30].  . Generally, the user must first perform ab-initio simulations (blue boxes), using our patched version of Quantum-ESPRESSO for studying electron-phonon interactions or any DFT code that can provide anharmonic phonon interactions. Next, shown in purple boxes, the user must launch the Phoebe executable. The electronic transport properties calculation must be preceded by a one-time post-processing of the electron-phonon coupling taken in output from Quantum-ESPRESSO, while the phonon transport calculation can be directly started from the output of ab-initio calculations. Finally, results are provided in output either in the standard output file, or as a parsable JSON format. Now, having introduced all the major features and computational structures of Phoebe, we demonstrate how the code can be used to calculate the transport properties of materials. In Fig. 1 we schematically show a user's workflow for computing electronic and thermal transport properties with Phoebe. The electronic transport workflow, as shown in the top panel, starts with the user going through the ab-initio calculation. The first step requires the calculation of the ground state wavefunction and electronic energies. Due to the wavefunction gauge requirements explained in Section IV F, the ground state calculation must be performed using our patched version of QE. Next, the user proceeds to compute the phonon properties with the patched version of ph.x. The output of this calculation, specifically files containing the force constants and the electron-phonon coupling, are later passed to Phoebe. At the same time, the user must also compute the Wannier functions with Wannier90, which outputs the real-space representation of the electronic Hamiltonian and the U rotation matrices that must be used to rotate the wavefunction to the Wannier gauge. After these ab-initio calculations have concluded, the user must run the qe2PhoebeApp, which uses the data from the ab-initio calculations to compute the electron-phonon coupling in the Wannier representation g(R e , R p ) starting from g(k, q) computed by ph.x, as described in Eq. 12. This operation needs to be run only once per material, and generates an output (HDF5) file with the electron-phonon coupling in real space, ready to be used for the interpolation. Finally, the user has all the input data necessary to compute the electron transport properties by launching the electronWannierTransportApp. Results are output as JSON files, easily parsed and analyzed by the user.
The workflow for computing the phonon transport properties is shown in the lower panel of Fig. 1. First, the user must select an ab-initio code (from any that work with phono3py or ShengBTE) to calculate the equilibrium ground state of the crystal. After a reference equilibrium crystal structure is found, the user runs an external library, e.g. phonopy and phono3py or ShengBTE, in conjunction with the ab-initio code to generate all atomic displacements needed to compute the second and third derivatives of the total energy (see Sec. III A). Once both V (2) and V (3) are computed, the PhononTransportApp is ready to be launched. This app will interpolate the phonon-phonon coupling, compute phonon lifetimes, and then solve the BTE to evaluate the phonon transport properties such as the thermal conductivity. As in the electronic transport case, the results are written in easily parsable JSON files that can be conveniently analyzed. Next, we showcase the calculation of transport properties by applying these two computational workflows to GaN, a semiconductor often used for high-power applications.

A. GaN
Gallium nitride (GaN) is a well-studied semiconductor that crystallizes in a wurtzite structure. We calculate the electronic structure of this material via Quantum-ESPRESSO using optimized norm-conserving Vanderbilt (ONCV) pseudopotentials [59,60] and the local-density approximation (LDA) exchange-correlation functional [61]. For the phonon workflow, we construct the force constants in a 4×4×3 supercell, and the third derivative using a 3×3×2 supercell. The supercell DFT calculation is performed using a 2×2×2 mesh for the k-point integration and a wavefunction cutoff of 110 Ry. Next, the phonon BTE is built using the adaptive smearing method for approximating the Dirac-delta and converged with a 15×15×15 mesh of q-points. For the electron transport workflow, ab-initio properties are computed using a 110 Ry plane-wave energy cutoff, a 12×12×7 coarse mesh for both k-and q-points and the same pseudopotential choice. The maximally-localized Wannier function basis for this material was established using 14 random Wannier centers. The electronic BTE is build using an adaptive smearing scheme, and states are discarded with the population window. The BTE is integrated using a variable mesh of k-points, starting from the densest mesh of 170×170×106 at the lowest temperature of 100K and gradually decreased down to 90×90×56 at the highest temperature of 450K, as the convergence properties depend on temperature.
In Fig. 2 we plot some of the transport properties of GaN as a function of temperature. In 2a, we plot the lattice thermal conductivity as a function of temperature, and compare results of the BTE at various levels of approximation agains the experimental values [62]. We find a good agreement between our simulations and experiments, validating the results of this work. The RTA solution tends to underestimate the thermal conductivity predicted by exact BTE, as expected due to the variational principle underlying the formalism [20,24]. To solve the phonon BTE exactly, we use the variational and the relaxons solvers, mainly to note how the two solvers are able to provide, as expected, approximately the same solution to the BTE.
The electron drift mobility at a n-carrier concentration of about 2 · 10 17 cm −3 is plotted versus temperature in Fig. 2b and compared against some of the available experiments [63][64][65]. Here we distinguish two regimes. At higher temperatures, where a sufficient number of phonons is thermally excited, we expect electron-phonon scattering to be the dominant factor limiting charge transport in GaN. Indeed, for temperatures above approximately 150K, we obtain a good agreement with experimental values, and correctly reproduce the decreasing trend of mobility with respect to temperature. At lower temperatures we do observe a departure of experimental values, which tend to decrease as temperature goes to zero, while our results are showing the opposite trend. This is likely due to the fact that, as temperature is decreased, fewer phonons are thermally excited and electron have a higher probability of scattering against defects or impurities in the crystal, which for simplicity have not been taken into account for this simulation. We also note that here the RTA already provides a good approximation to the mobility (and also for the Seebeck coefficient and the resistivity), with the variational and relaxon solvers providing essentially the same estimates. Overall, we do observe a good agreement with experiments for simulations in the phonon-limited regime of charge transport.
In Fig. 2c, we plot the Seebeck coefficient as a function of temperature. Here, we do obtain a good qualitative agreement between our results and the experimental values for large temperatures [66], i.e. larger than 150K. Since the Seebeck coefficient is strongly related to the density of states of the material [67], the results seem to indicate that the DFT simulation provided a good approximation to the electronic band structure. Also in this case, we observe a departure between our results and experiments at the lowest temperatures. As in the case of the drift mobility, a possible explanation can be attributed to the presence of different scattering mechanisms. Another alternative explanation could be attributed to the presence of phonon drag, which for example has been proposed to exist in GaAs [68], and typically manifests as a large sudden change in the values of the Seebeck coefficient at low temperatures. Therefore, our results suggest that a detailed investigation of phonon-drag in GaN could be relevant to understand the transport properties of this material at this carrier concentration.
In the top panels of Fig. 3 we plot the phonon (left panels) and electronic (right panels) band structure. On top of the band structure, we superimpose the carrier linewidths computed at a temperature of 300K. These pictures are produced with Python scripts that have been included in the Phoebe release for user's convenience. The top panel provides an easy way to identify some particular bands or areas of the Brillouin zone which tend to scatter the most, as the linewidths are a measure of the total scattering rate. For this particular material, we can observe how optical phonons tend to scatter more, especially at the Γ or A high-symmetry points. In panel Fig. 3b instead we can observe how the holes close to the chemical potential have larger linewidths than electrons close to the band gap, but otherwise we don't observe any pocket of large scattering rates. The bottom panels describe the energy-dependence of the lifetimes, which sometimes is useful to derive simplified models of transport. For phonons, in Fig. 3c, we can clearly see that phonon lifetimes tend to be the largest for the three acoustic phonon modes whose energies tend to zero at the Γ point. Generically, lifetimes tend to decrease as the energy increases, in agreement with standard expectations. However, we observe significant deviations from a simple power-law behavior of the energy dependence, especially for optical phonon modes. This complicated energy-dependence of the phonon lifetimes is indicative that simple constant-relaxation time models are inadequate for studying the phonon transport in GaN, and a more accurate microscopic description, as done here, is necessary. In panel 3d we instead show the energy dependence of carrier lifetimes for p-doped GaN. While we observe a small spike for carriers closest to the band edge due to the vanishing density of states, the relaxation times simply show a weak energy dependence.

B. Performance and Scaling
Phoebe uses MPI, OpenMP and Kokkos for scaling and acceleration at different levels. MPI enables large-scale simulations on multiple nodes, while OpenMP allows shared-memory parallelization within one node. As explained in Section IV E, the interpolation of the particle interaction is implemented with Kokkos, so that if no GPU is available, Kokkos's OpenMP backend will add to the intra-node parallelization, otherwise its CUDA, HIP, SYCL and OpenMPTarget backends compile the code for GPUs from different vendors.
In this section, we show some examples of the computational performance of Phoebe. As a benchmark, we take the GaN calculation of electronic transport properties discussed in the previous section, and integrate the BTE at a temperature of 300 K on a grid of 90 × 90 × 56 wavevectors. For this test, we reduced the coarse ab-initio grid of wavevectors to 7 × 7 × 4. The simulation is run on the TACC Longhorn supercomputer, where each node has 4 NVIDIA V100 GPUs and 2 IBM Power 9 AC922 20-core CPUs.
In Fig. 4 we plot the scaling of the ElectronWannierTransportApp with respect to the two levels of parallelization (MPI and OpenMP), while also demonstrating how the GPU accelerates the calculation (with 1 GPU per MPI process). The Figure 5. System load as a function of time during a simulation of GaN electronic transport properties. The top panel shows a CPU-only simulation performed with 1 MPI process and 8 OpenMP, where a vast majority of Phoebe's work is well parallelized with OpenMP, except for the input file reading at the beginning of the simulation. In the bottom panel, the same simulation, but GPU-accelerated, shows a much shorter total wall time, and a lower system load during the calculation of the scattering properties, attributable to the latency spent transferring data between GPU and CPU.
the ideal scaling of a calculation -a straight line in this logarithmic scale -and the color code labels simulations performed with (blue) or without (red) GPU acceleration.
In the left panel of Fig. 4, we test the MPI scaling by increasing the number of nodes used, with 4 MPI ranks, 4 GPUs and 160 OpenMP threads per node. When running without GPUs, the overall scaling of the simulation is very close to ideal. In this case, the calculation of the scattering matrix dominates the total cost of the simulation and since it's efficiently distributed across MPI processes over the wavevectors, we achieve a good scaling performance. The GPUs, shown in blue lines, greatly accelerate the calculation of the scattering matrix, which is now a much smaller contribution to the overall computation time. Remarkably, we achieve a GPU acceleration of a factor 2 or 3 to the overall simulation wall time. Dashed lines in Fig. 4 show in detail the time taken by the construction of the scattering matrix. This part, dominant for larger systems and grids, is very close to ideal scaling.
The right panel of Fig. 4 showcases the scaling performance of Phoebe with respect to the number of OpenMP threads per MPI process. Here, we used a single node with 4 GPUs and 4 MPI ranks, while the number of OpenMP threads per MPI rank is gradually increased from 1 to 8. Because not all steps of the calculation (see Fig. 5) can be parallelized with OpenMP (in particular, the input files reading at the start of the calculation), the total simulation time deviates from ideal scaling as more threads are used. Nevertheless, we achieve good OpenMP scaling performance when GPUs are not used, especially for the construction of the scattering matrix. When GPUs are used, we observe a significant deviation from the linear scaling, which can be traced to the lack of OpenMP scaling of the scattering matrix construction. This is not surprising: since the majority of the scattering matrix calculation is offloaded to the GPU, it becomes largely independent of the number of OpenMP threads being used on the CPU, which are only responsible for minor precomputations and data transfer. Figure 5 shows the timeline of the system load during a single electronic transport calculation. This simulation is computed using a 80 × 80 × 50 k-mesh, and has been run on one node of Harvard University's Cannon supercomputer with 4 MPI processes and 8 OpenMP threads, where each node has 4 V100 GPUs and 2 Intel Xeon Gold 6142 16-core CPUs. The simulation is profiled with the NVIDIA Nsight Systems software. In the top panel, the simulation has been run without GPU acceleration, and we report the system load from the main thread and the first OpenMP thread (the remaining threads behave identically to this thread); in the bottom panel, the GPU is being used and we thus reported the system load for the GPU as well.
At the start of the calculation, where Phoebe is reading input files, the vast majority of time is spent reading the large HDF5 files containing the electron-phonon coupling in real space precomputed from Eq. (12), an operation that cannot be OpenMP accelerated and thus only MPI processes are active. After this step, OpenMP threads are used and remain active for the vast majority of the remaining execution time.
In the next step of the calculation, Phoebe precomputes electron and phonon properties as described in sections III A and III B, i.e. energies, eigenvectors, and velocities at all points k, q and k + q. This operation is trivially parallelized with both MPI processes and OpenMP threads (notice the load almost at 100%), thus driving the overall scaling towards linearity. At this stage, we also precompute all factors in the long-range polar correction (Eq. 15), except for the wavefunction overlap which is added during the calculation of the scattering rates. In this way, part of the calculation of the polar correction has a linear scaling with q, rather than being embedded in a loop over k during the construction of the scattering matrix.
In the last part of the simulation, Phoebe computes the scattering matrix, which is the largest contribution to the total wall time. When the GPU is not used, the OpenMP threads are accelerating only the interpolation of the electron-phonon coupling strength. However, the OpenMP system load is at around 75%, indicating how the coupling interpolation represents most of the simulation cost. When the GPU is being used, as shown in the lower panel, we can see how the calculation of the scattering matrix is significantly accelerated. The relatively low GPU utilization for this production-scale simulation indicates that the tensor products involved in the calculation runs so quickly on the GPU that preparing the data transfer between CPU and GPU becomes the bottleneck of the acceleration. Finally, the calculation of the transport coefficients, here made at the RTA level, is too fast to be visible in the picture.
Overall, we find that Phoebe takes good advantage of multi-level parallelism and GPU acceleration even for a moderately expensive calculation. With just a few GPU nodes, the calculation is fast enough that it becomes dominated by unavoidable overhead tasks like file reading and data transfer. We mention in passing that future releases of Phoebe have still room for improving the performance, in particular by increasing the fraction of work performed by the GPU when available.

VI. CONCLUSIONS
In this article we present Phoebe, a HPC software for characterising electron and phonon transport properties from first-principles. We provide various tools to predict transport properties at various levels of theory and accuracy, such as the Boltzmann transport equation or models based on the Wigner distribution. Phoebe has been interfaced with Phono3py and ShengBTE to obtain the first-principles coupling strength of the phonon-phonon interaction, and the electron-phonon interaction has been interfaced with Quantum-ESPRESSO. The accurate integration of the Boltzmann transport equation and related transport properties is achieved with Fourier/Wannier based interpolation techniques or with energy-averaging approximations. Lastly, the code has been optimized for speed and scaling, using a mix of parallelization levels based on MPI, OpenMP and GPU (through Kokkos). The code has been made publicly available on Github (https://mir-group.github.io/phoebe/) under an open-source basis (MIT license).
The overall goal for this project has been to provide the community with an efficient software to perform simulation of materials transport properties, while providing an extensible framework for implementing new methodologies as they are being developed. We believe the current software framework allows us to achieve these goals, and we intend continue our work on the project, both improving performance and adding new features to future releases.