BTE-Barna: An extension of almaBTE for thermal simulation of devices based on 2D materials

We present BTE-Barna (Boltzmann Transport Equation - Beyond the Rta for NAnosystems), a software package that extends the Monte Carlo (MC) module of the almaBTE solver of the Peierls-Boltzmann transport equation for phonons (PBTE) to work with nanosystems based on 2D materials with complex geometries. To properly capture how the phonon occupations evolve in momentum space as a result of scattering, we have supplemented the relaxation-time approximation with an implementation of the propagator for the full linearized version of the PBTE. The code can now find solutions for finite and extended devices under the effect of a thermal gradient, with isothermal reservoirs or with an arbitrary initial temperature distribution in space and time, writing out the temperature and heat flux distributions as well as their spectral decompositions. Besides the full deviational MC solver, a number of useful approximations for highly symmetric devices are also included.


Introduction
The continuous shrinking of electronic components, following Moore's Law [1], is pushing bulk semiconductorbased devices, such as silicon transistors, to their fundamental limits. In addition, this increase in the integration level leads to ever higher power densities and raises the Herculean challenge of dissipating the generated heat [2].
In this context, two dimensional materials (2DMs), thanks to their atomic thickness, low surface roughness and density of dangling bonds [3], together with the possibility of stacking them to create heterostructures with tuned properties and their compatibility with CMOS technology, are quite promising candidates to replace III-V compounds and silicon in transistor channels [4,5]. Understanding thermal transport in 2DMs is essential to optimize heat management in such devices. Since phonons are the main heat carriers in semiconductors, heat flux can be described through the Peierls-Boltzmann Transport Equation (PBTE) [6]. For highly symmetric structures (e.g.: bulk systems, nanowires, thin-films. . . ) the relaxation time approximation (RTA), where it is assumed that each phonon mode relaxes to equilibrium independently, has a long tradition of being used, but more recent advances enable an iterative solution beyond that crude approximation [7,8]. Furthermore, the inclusion of phonon properties calculated from first-principles (frequencies, scattering rates. . . ) makes it possible to solve the PBTE even for novel materials where simpler models to describe those properties are lacking [9,10]. These iterative first-principles-based PBTE solvers are available to the community in software packages such as ShengBTE [11], almaBTE [12] or Phono3py [13].
Despite those advances in solving the PBTE for simple systems, using direct or iterative methods for its solution becomes impractical for more intricate configurations, such as the ones required by micro or nanosized devices. A usual approach to overcome such limitations is to use a Monte Carlo method to integrate the PBTE. Notwithstanding their success in other applications, in classical Monte Carlo methods to solve the PBTE the accuracy is hindered by several factors, most notably the inability of the scattering algorithm to conserve energy. Indeed, additional algorithms are required to keep the system energy constant [14,15,16], which can bias the distribution in unknown ways [17]. Moreover, classical methods suffer from high statistical noise plus the fact that most of the computational time is wasted simulating the equilibrium part of the distribution [12,18,17]. In that context, RTA-based deviational energy Monte Carlo methods have proven themselves as good alternative ways to overcome the limitations [18,19,17] of more traditional methods, as they naturally conserve the energy by operating with bundles of energy instead of phonons and reduce the required number of computational particles by simulating only the deviation from a reference equilibrium distribution.
However, the validity of the RTA approach for 2D materials is questionable, and it has been shown to yield a very poor description of thermal properties for several of them [20,21]. For those cases one might need to use an energy deviational Monte Carlo method based on the full collision operator [22].
In this work we present the BTE-Barna software package, an extension of almaBTE to tackle 2D systems both within and beyond the RTA, so that now it can address finite and/or periodic 2D materials and their heterojunctions under the effect of thermal gradients and isothermal reservoirs. We analyze a selection of test cases and discuss the validity of the RTA. Additionally, the iterative solver in almaBTE is extended to provide the effective thermal conductivity for nanoribbons and nanowires.
The paper is structured as follows: after displaying the general structure of BTE-Barna in Sec. 2 and discussing the theoretical background in Sec. 3, we provide test cases of the implementation in Sec. 4 and present illustrative example applications of our package simulators in Sec. 5. Our summary and conclusions are given in Sec. 6. The appendices provide a discussion about the solution of the PBTE in nanowires, additional examples and detailed documentation. Fig. 1 shows the different pieces of BTE-Barna package and how they relate to each other. The whole package heavily relies on almaBTE library routines, which have been extended to allow for the solution of PBTE in finite devices based on 2D materials. Moreover, all the executables use the mode-resolved phonon properties as inputs, which are read from almaBTE-generated HDF5 files. A more detailed explanation of all the executables, their inputs, and outputs can be found in the Appendix A for the iterative solver, and in Appendix B for the Monte Carlo solvers, their input generators, and post-processing tools.

Effective thermal conductivity for nanoribbons
Despite the fact that an exact solution of the PBTE for highly symmetric systems like nanoribbons or nanowires would require a discretization in space, it is possible to obtain an approximate solution by using averages under the assumption of fully dispersive boundaries [8]. In such a way, it becomes possible to obtain an effective thermal conductivity (κ nano ) by simply introducing suppression factors in the lifetimes (τ nano λ = τ 0 λ S nano λ ) and then solving the PBTE like in bulk under homogeneous gradients [11].
For nanoribbons contained in the XY plane, the suppression factors (S nr λ ) can be calculated by evaluating the integrals in Eq. (8) of Ref. 8 (see Appendix C for nanowires), obtaining: where u is a normalized vector pointing along the unbounded direction of the system, v λ and τ λ are the velocity and bulk lifetime of λ-th mode, L is the nanoribbon width and e 1 is the first column of identity matrix. Details on implementation, executable, inputs and outputs can be found in Appendix A.

RTA Monte Carlo
The implementation for 2D materials of the RTA Monte Carlo is based on code already in almaBTE [12], whose formulation was proposed by Péraud et al. [19]. The algorithm simulates the space and time evolution of deviational power (emitted by sources and absorbed by sinks) by splitting its distribution into discrete packets-the deviational particles-and tracking their trajectories in a linearized regime. The validity of the existing implementation rests on the assumption that differences in temperature are small enough that a single reference temperature can be defined for the whole system. We now take a look at the main improvements of our code upon that baseline; see Appendix D for a detailed explanation of the whole algorithm.
The original implementation, steady montecarlo1d, was designed to investigate one-dimensional steady-state situations in materials/heterostructures embedded between two isothermal reservoirs having the same cross section, which owing to finite thickness might not be true for  2D-material-based systems/devices. Consequently, we extended the geometric algorithms to deal with 2D systems using the boost::geometry library [23]. In this implementation the system is composed of different computational boxes, which are defined by the user as convex hulls of points, therefore enabling the creation of complex geometries. Additionally, as the finiteness of real 2D devices requires dealing with boundary scattering, we implemented a full diffusive condition in which the out-state is randomly selected from a Lambert cosine law distribution [24] defined by transition probabilities from a state i to f : where v i and ω i are the group velocity and frequency of the i − th phonon mode,ê ⊥ is the wall normal vector pointing inwards the material and δ(ω j − ω i ) is regularized using adaptive smearing [11]. Besides the steady-state, the code allows for the exploration of time evolution determined by boundary conditions. This is done by sampling the trajectories on a time grid on top of the spatial grid, in such a way that the contribution from a trajectory path inside a computational box to deviational energy (e d ) and heat flux (j) grid point is: where k is the computational box id, V k is the volume of the k-th box, σ is the particle sign, ε d is the deviational power carried per particle, v is the particle velocity and represents the interval between t 0 and t f that belongs to the i-th point of the time grid, with Θ representing the Heaviside function. Such contribution is computed each time a particle changes its state or its spatial bin. Finally, by integrating Eqs. (4)- (5) one can obtain the evolution of temperature T (k, t i ) and heat flux J(k, t i ): where T ref is the reference temperature and C v (T ref ) is the volumetric heat capacity at T ref .

A note on extended systems with applied gradients
For extended systems where no isothermal boundaries are present, e.g. an infinitely long nanoribbon with an applied thermal gradient, particles cannot exit the structure. It is therefore necessary to introduce an alternative way to collect them. For this purpose, and taking into account that transient to steady-state is normally not of interest for those cases, we implemented a modified version of the algorithm originally proposed by Randrianalisoa et al. [25,26] and adapted to such cases by Péraud et al. [17]. We now list our modifications to the algorithm, with respect to the finite systems described in Appendix D, for dealing with extended systems under applied gradients: 1. Generate particles from a gradient (source) generator, evolve them until they scatter (intrinsically or at borders or interfaces) and compute their contribution to steady-state. It is noteworthy that the cancellation scheme in step 2 introduces a non-negligible error of second order with respect to the space mesh [17].

Interface model for stacked layered systems: localized diffuse mismatch model
The traditional diffuse mismatch model (DMM) implemented for the treatment of interface scattering in steady montecarlo1d is a purely elastic model, allowing the coupling between modes at each side of the interface with energy conservation as the sole requirement. Despite this crude approach, the DMM has been proven to qualitatively describe the interface thermal resistance (ITR) for several interfaces of bulk 3D materials such as Si/Ge [12]. However, for systems comprised of stacked layers, such as the interface between graphene and encapsulated graphene [27], energy matching as the single condition for transmission is no longer a valid assumption. Under such conditions, well localized modes in unconnected layers-e.g.: encasing hBN layers vs. the bare monolayer at the graphene/encapsulated-graphene interface-would be predicted to be coupled, which is quite unrealistic and would lead to too low (or even negligible or negative) ITR values (see Fig. 2) [27]. Consequently, we developed the localized DMM (LDMM) to account for mode localization at layers. To that end, we define the localization vector at the I-th layer for a given mode (L A I,λ ) as: where I is the layer index of A side, λ is the phonon mode index, j is the atomic index, α is the cartesian axis and ζ λ,α,j is the eigenvector. Therefore, when calculating the  coupling strength we multiply the classical DMM expression by the coupling factor C A,B λ,λ : where JSD(L A λ ||L B λ ) ∈ [0, 1] is the Jensen-Shannon divergence between localization functions. Therefore, C A,B λ,λ is one for perfectly matching localization and zero for modes fully localized at different layers. As the Jensen-Shannon divergence requires both vectors to be of same length, for different materials the localization vectors are modified in such a way that connected layers remain untouched and paired while the remaining unconnected layers have their values summed up and added at the back of the localization vector; we provide an example for reference: In this example, layers i and j of C are connected to the α and β layers on the D side, and k and µ denote all unconnected layers on each side or a zero term if no additional layers exist. In Fig 3 we show results for the same case studied in Fig. 2 but using the LDMM, enabling us to obtain a non-negligible ITR as expected in this kind of system [27]. It should be noted that, despite the improvement in the results, the model is still incapable of describing the thermal rectification predicted in such systems [27,28] because of the intrinsic symmetry of the DMM and the Jensen-Shannon divergence used to model the interface scattering.

Beyond RTA Monte Carlo
Given the RTA's failure to describe the thermal properties of 2DMs [20,21] Graphene hBN-encapsulated Graphene  [22] simulator to overcome such limitation. This LAIP-LVDSMC algorithm, henceforth referred to as beyond RTA (bRTA), solves the deviational energy linearized PBTE: where n d i is the deviational energy distribution, n 0 i is and B ij is the linearized scattering operator comprising three-phonon and isotopic (mass disorder) scattering terms: where P 3ph s+m→l and P 3ph s→m+l are the intrinsic transition probabilities for three-phonon absorption and emission processes, derived from perturbation theory [6,11] and P iso s→m is the intrinsic isotopic transition calculated using Tamura's model [29]. The latter was not included in Landon's original algorithm. The PBTE is then solved using a Monte Carlo scheme with each step split in two parts: advection, in which we introduce, evolve and scatter particles at boundaries (see Appendix E for more details) and scattering within the material, in which the distribution evolves according to n(t + ∆t) = P (∆t)n(t) = e B∆t n(t) (13) where the propagator P (∆t) is a non-Markovian transition matrix, as it has negative or higher than unity elements [30], which makes its direct implementation difficult. A more implementation-friendly form can be obtained by using the power series: (14) can be implemented stochastically for a particle in state j with sign σ through the following strategy: 1. Sample a random number R in [0, 1), and find the lower bound f of k |P kj (∆t)|/P j for R. 2. Set the new sign to σ = sgn(P f j σ). 3. If σ σ, generate two particles with state j and time t.
Finally, the deviational energy density ρ d and the heat flux J are obtained from the distribution in the i-th computational box as: where ε d is the deviational energy per particle and r j is the position of the j-th particle.

B ij calculation and enforcement of conservation laws
As noted by Landon et al., it is important for B ij to respect crystal symmetries and microscopic reversibility [22]. The original almaBTE routines for calculating scattering amplitudes lead to vialoations of those constraints because the smearing method implemented there does not enforce the symmetry between emission and absorption processes [11,31].
To enforce symmetry, matrix elements are built from a single representative of each equivalence class in the quotient group of q points using a new symmetric adaptive smearing scheme for energy conservation where i, j and k are the phonon modes taking part in the three-phonon process, µ indicates a reciprocal-space lattice vector, α indicates a Cartesian axis, G µα is the reciprocal lattice basis matrix, N µµ is a diagonal matrix whose elements are the size of the q-point grid, and a is a broadening factor. The theoretically optimal value of a is 1, but it can often be decreased with significant gains in performance and little degradation in accuracy. Next, those matrix elements are expanded using crystal symmetry and microscopic time reversibility and averaged to eliminate possible asymmetries. This lookup, together with matrix building, is parallelized via MPI and OpenTBB, with stable summations following Neumaier's algorithm for matrix collapse and gathering [32].
Finally, it must be noted that B ij requires a rather strict conservation of energy, which is violated by the broadening scheme. Therefore, we add a correction extracted from a Lagrange-multiplier approach [22] to our matrix to enforce it. On top of that, Landon et al. also discussed the necessity of including a momentum correction to make normal processes conserve the momentum. However, we found that including such correction was unnecessary and, in fact, results in spurious effects such as nonnegligible fluxes in directions perpendicular to the thermal gradient for homogeneous bulk systems.

Efficient propagator calculation
The scattering algorithm requires the explicit calculation of the propagator matrix P (∆t) = e B∆t [see Eq. (14)]. Although the matrix exponential is a well defined mathematical operation given by: its practical computation is cumbersome and still a topic under active research, with lots of methods available [33]. One of the most common approaches to computing e A is the scaling-and-squaring method [34,35], also chosen by Landon in his original work [30]. The method is based on squaring the matrix to reduce its norm, then computing the exponential using a Padé approximant and undoing the squaring, with an overall computational cost of at best 20N 3 operations for dense matrices of size N [35]. For reference, the B-matrix of the prototypical 2DM, graphene, contains approximately 1.5×10 9 elements when a 80 × 80 × 1 grid is used, so the scaling and squaring method is not suited for our problem.
In contrast, Krylov subspace methods are especially suited for big matrices, where the action of the exponential matrix (e A ) on vector (b) can be approximated using much more smaller matrices. The Krylov subspace (K n (A, b)) of order n is a vector subspace spanned by {b, Ab, A 2 b, . . . , A n−1 b}, an orthonormal basis (S n ) of which can be build via Arnoldi iteration [36]. The problem can be then recast in terms of K n (A, b) as [33,37]: where H n is the projection of A on the basis S n (of size n×n) and e 1 is the first column of the identity matrix. The Krylov subspace size and therefore the dimensions of H n may be truncated down to a desired precision via the error bound [38]. In fact, small values of n tend to give good approximations and enable a calculation of the small n × n sized e Hn -matrix efficiently through the scaling and squaring method. Despite its efficiency and suitability for our case, the Krylov subspace method is limited to the calculation of arbitrary matrix-vector products e A b, not of e A itself. Nevertheless, one can easily recover each column of e A by using canonical basis vectors as b-vectors. This way of calculating e A has the added advantage of being straightforward to parallelize, as each column can be calculated independently.

Linear interpolation of the propagator for systems
with multiple reference temperatures From Eq. (12) it becomes clear that different reference temperatures would require different propagators. This is not problematic per se, but the fact that each propagator occupies a big amount of RAM can be a problem for simulations with variable reference temperatures [17]. To relieve the memory burden for such simulations we use on-the-fly linear interpolation of P (∆t) between pairs of temperatures, thus requiring memory storage only for a few reference propagators. Linear interpolation was chosen because it ensures energy conservation at the interpolated temperatures. We tested the performance of these linear interpolants against the corresponding exact propagators by calculating the error per element between the phosphorene propagator at 305 K and the interpolated result using 300 K and 310 K as knots. We also did the same for 320 K using 300 K and 340 K as knots. In both cases, we obtained an error per element in the order of 10 −6 .

Code Validation
As previously mentioned, 2DMs are being extensively studied as possible substitutes of silicon in MOSFET [39,40,41]. Amid all candidates to succeed silicon, the monolayer, also known as phosphorene, and few-layer black phosphorous (bP) have attracted lots of attention due to its electronic properties, such as its high mobility when compared to other candidates like transition metal dichalcogenides [42]. Indeed, is it possible to find several examples of fully functional MOSFETS based on few-layer bP [43,44,42]. The work of Wu et al. is of interest, presenting high-performance MOSFETs with reconfigurable polarities [45]. Furthermore, phosphorene has been proposed to be an important actor in the survival of Moore's law down to atomic sizes [46] thus increasing the importance of controlling heat transport for phosphorene at the device level.
Therefore, in this section we present phosphorene-based test cases. To that end, we have used first-principles Second-order interatomic force constants were renormalized to enforce crystal symmetry, translational invariance and rotational invariance necessary for a proper description of quadratic acoustic bands [48], the broadening parameter was fixed to 1 for energy conservation and the layer thickness was set to 0.533 nm [49]. The phonon properties and the propagator were calculated on a Γcentered q-mesh of 50 × 50 × 1 points, for which thermal bulk conductivity is found to be converged-with less than a 5% change with respect to a higher quality mesh of 100 × 100 × 1 points-at 300 K. The propagator for the bRTA calculations was calculated using a time step of 0.25 ps.

RTA code validation
To validate the RTA code, we simulated an infinitely large piece of phosphorene with an applied thermal gradient represented as a source generator (see Sec. 3.2.1 and Appendix D) as depicted in Fig. 4.
κ MC AC,RTA and κ MC ZZ,RTA were calculated via Fourier's law from fluxes (J h = −κ l ∇ r T ). The results are quite close to the ones obtained using almaBTE's bulk thermal conductivity calculator, kappa Tsweep (see Table 1); with the differences being less than the typical experimental error of 5% for thermal conductivities [50].
We conducted an additional test of this RTA algorithm by comparing it to an RTA version of bRTA (see Appendix F). To do so we simulated an infinite nanoribbon (NR) in the AC direction with an applied gradient of 0.2 K nm −1 along the NR. The heat flux profiles from both methods, plotted in Fig. 5, are in excellent agreement.

Beyond RTA: B-matrix validation
To validate our B ij construction algorithm we used the resulting matrix to obtain the lattice thermal conductivity   (κ l ) by iteratively solving the linear system: using the RTA solution (n d,RT A j = 1 Bjj ∂n 0 j ∂T v j · ∇ r T ) as an initial guess, and we then compared the results against almaBTE's kappa Tsweep for the case of phosphorene (see Table 2). The agreement between both methods and the fact that they are in line with other theoretical calculations provide support to our methodology.

Beyond RTA: Propagator and bRTA validation
To validate P (∆t) together with the rest of the bRTA implementation, we simulated an infinitely large piece of phosphorene with an applied thermal gradient represented as a source generator [see Eq. (D.2)], as shown in Fig. 4.
The MC heat fluxes for the infinite phosphorene under thermal gradients along the ZZ and AC directions are plotted in Fig. 6. κ MC AC and κ MC ZZ are calculated via Fourier's law (J h = −κ l ∇ r T ) to be 27.4 ± 0.2 W/(m · K) and 82.8 ± 0.5 W/(m · K) respectively. Those results show an excellent agreement between iterative and MC solutions, thus validating our bRTA implementation.

Phosphorene devices
In this section we present thermal transport results for different phosphorene-based configurations/devices using the simulators developed in previous sections.

Nanoribbons
Among the simplest 2D-based systems used in devices are nanoribbons [40,41]. We have calculated the heat transport in infinite phosphorene AC nanoribbons under the effect of a thermal gradient of 0.2 K nm −1 . The results of the normalized heat flux relative to the bulk value, together with an RTA rescaled version for three different widths are shown in Fig. 7.
As expected, boundary scattering increases and becomes dominant over other mechanisms in thinner nanoribbons. This is clearly seen in the reduction of heat flux with decreasing width and the fact that the difference between the RTA and beyond-RTA methods vanishes for smaller ribbons (see the 4 nm case in Fig. 7a),since boundary scattering is not dependent on the approach used to describe intrinsic anharmonic and isotopic scattering. Consequently, for wide nanoribbons in which anharmonic and isotopic scattering are dominant it should be possible to obtain a good approximation to bRTA results by simply using a κ almaBTE /κ RTA -rescaled RTA (see Fig. 7b). Indeed, the main differences between this estimate and bRTA for the 400 nm-ribbon are in the regions near the boundaries, in which the bRTA flux is lower than the rescaled version. Although also visible in other cases, this is more pronounced in the wider nanoribbon.
In view of the above, the RTA clearly overestimates the momentum destruction due to intrinsic scattering leading to more diffusive flux profiles when compared to bRTA results. The latter yields more Poiseuille-like profiles, with stronger hydrodynamic features, by properly capturing the coupling between phonon modes [20,53].
To further explore those hydrodynamic signatures, we fitted our nanoribbon results to a mesoscopic equation based on Sellitto et al.'s work [54]: where W is the nanoribbon width, x is the distance from the center of the nanoribbon, is the non-local length [55] and C is related to wall properties, taking a value of 2 in our case because we have assumed completely diffusive walls.
In Figs. 8 and 9 the fits of our RTA and bRTA simulator results to hydrodynamic mesoscopic equation for nanoribbons are shown; in both we obtain a set of parameters that accurately match the simulation results. The agreement afforded by the beyond-RTA method is, however, slightly better.
As for the fitted parameters, we can observe a relative fast convergence of thermal conductivity towards bulk values as the nanoribbon gets wider and boundary scattering effects become negligible in the middle of the strip (see   10). The value of the non-local length ( ) rises towards a converged value as the ribbon becomes wider (see Fig. 11), in agreement with what is expected from a microscopic description of the value [55]. The observed differences between RTA and the higher beyond-RTA values, especially for larger widths, can be easily interpreted by keeping in mind that in RTA all scattering processes are deemed as resistive and introduce artifactual modifications of the heat flux. It should be noted that the theoretical formulas used to obtain the RTA,iso values are derived under the assumption of isotropy and are therefore are expected to be useful only as approximations in our case. For reference, the results for ZZ-nanoribbons are also given in Appendix G. Finally, we have also obtained κ nano for nanoribbons of several widths with both types of edges using the methodology for solving the PBTE in systems with edges described in Sec. 3.1 (see Fig. 12). For consistency check, we compared the effective flux obtained via Fourier using κ nano with the flux average over width obtained from Monte Carlo simulators, obtaining, as can be seen in Fig.13, an excellent match between both methods.

RTA, bRTA and Fourier heat equation comparison
Taking into account that operational frequencies of microprocessors are limited to the GHz by cooling constraints [46,56], it is of interest to be able to study heating dynamics at short times. As an example of capabilities to      simulate short heating dynamics, we have studied the temperature time evolution for a piece of phosphorene initially at 300 K with periodic boundary conditions in the AC direction and sandwiched between two isothermal reservoirs, at 300 K and 302 K, in the ZZ direction (see Fig. 14). Fig. 15 shows the RTA, bRTA and Fourier ( ∂T ∂t = α∇ 2 T , where α is the thermal diffusivity: α = κ/C v ) heat profiles at two different times. The difference between the Fourier heat equation and BTE results at short times is a well known shortcoming of the former [57,58]. Regarding PBTE solutions, bRTA and RTA results clearly differ at this instance. The bRTA shows a faster heating, which is not surprising since RTA scattering completely randomizes momentum, thus dampening the fluxes and leading to a lower thermal conductivity.
In Fig. 17 we plot the spectral decomposition of the contributions to the deviational temperature at the middle and near the hot edge of the beam. In keeping with the fact that differences between the RTA and bRTA solutions are the largest in the middle of the bar (see Fig. 15), spectral decompositions of the deviational temperature deviate the most at the middle as opposed to the edges. They also vanish with time as both tend to the same temperature profile (see Figs. 17a-17d). Moreover, from the spectral decomposition it can also be seen that differences are more prominent at low frequencies, corresponding to phonons with longer intrinsic lifetimes (see Fig. 16), which indicates that the decay of such modes is clearly much more overestimated than for high-frequency ones. This explains the large disparity between the RTA and bRTA conductivities. It is therefore advisable to resort to the bRTA method for modeling fast/short heat dynamics, for example when studying heat dissipation in state-of-the-art electronic devices, as less sophisticated approximations fail to describe it accurately.  [ps] Figure 16: Bulk-phosphorene lifetimes as function of frequency at 300 K.

Finite device examples
As previously mentioned, being able to predict thermal transport in complex devices and geometries is of key importance. To this end, we show examples for more complex systems, either because they have geometrical elements which are difficult to model computationally, such a wedge geometry (see Fig. 18), or because they present interesting elements from a simulator capability point of view such as more than two terminals, which is a common experimental setup.
Steady-state temperature profiles and heat fluxes for the wedge-like geometry within and beyond the RTA for two different configurations, depending on which terminal is put at 301 K (the one at the top or the one at the bottom) while the other is kept at the reference temperature of 300 K, can be seen at Figs. 19 and 20 respectively. Similar asymmetric devices are used as thermal rectifiers, but we cannot expect to detect rectification here because  this model is based on the bulk spectrum and therefore does not account for phenomena such as device-reservoir interactions or size-dependent vibrational spectra [59,60]. Moreover, it should be noted that thermal differences used here are too small for any sign of thermal rectification to be significant over statistical noise [59], or to activate the rectification mechanism based on different temperaturedependent behavior [59,61,62]. Higher thermal differences are however unattainable with the current implementation as the error introduced by the linearization of the collision operator would be too high.
The heat fluxes and temperature profiles for the multiterminal structure (Fig. 21) are presented in Figs. 22 and 23, showing the capability of our simulator to properly account for several sources/drains (isothermal reservoirs). Indeed, for the bRTA case, there is even an additional population of phonons due to the initial conditions, as the initial temperature profile was set to the RTA estimate in order to accelerate its convergence to the steady-state.

Results: Example of material junction
Finally, to show the capability of our improved RTA simulator to describe devices with different materials, we present the temperature profile (Fig. 24) in a finite device structure composed by graphene on one half and h-BN encapsulated graphene on the other half (see Fig. 25). The first-principles data needed for this calculation were obtained from Ref. 63 in the case of graphene, where those properties had been calculated on a 80×80×1 q-mesh with a broadening parameter of 1 and a conventional thickness of 0.345 nm [22]. Regarding h-BN encapsulated graphene, the required first-principles data were obtained using density functional theory using Perdew-Burke-Ernzerhof functional [64] plus the D3 [65,66] correction to energy due to van der Waals interactions between layers as implemented in VASP [67,68,69] with a Γ-centered k-mesh of 7 × 7 × 1 points for minimization, and using Phonopy [70] and thirdorder.py [11] with a supercell of 7 × 7 × 1 to obtain second and third order interatomic force constants, respectively. The phonon properties of h-BN-encapsulated graphene were calculated using a 40 × 40 × 1 Γ-centered q-mesh with a broadening parameter of 1 and setting the stack thickness to 1.001 nm [22,71,72].

Conclusions
In this work we have presented BTE-Barna, a software package that extends the almaBTE package to calculate the thermal properties of devices and systems based on 2D materials. We have showcased the new capabilities with an extensive set of tests and examples. For instance, the package was used to highlight the differences in the heat flux profile for the case of Poiseuille flow in a nanoribbon, for the case of RTA and beyond the RTA. Amid all new features the most relevant are:

Appendix A. Effective thermal conductivity in simple nanosystems: inputs, outputs and executable
The calculation of the effective thermal conductivity κ nano for nanowires and nanoribbons at both the RTA and beyond RTA levels is implemented in the kappa Tsweep nanos executable. The calculation is performed as in kappa Tsweep [12], but using τ nano λ in place of τ 0 λ . Because of boundaries breaking crystal symmetry (i.e.: S nano λ and consequently τ nano λ not possessing crystal symmetry), one needs to solve the linear system in the full Brillouin Zone; hence, kappa Tsweep nanos first recalculates and symmetrizes all bulk processes in the whole q-mesh, and if beyond RTA is required it solves the linear system using sparse matrices. The executable parameters are the same as for kappa Tsweep but with an additional parameter to control the number of TBB threads in which the recalculation and symmetrization will take place.
where u is the transport direction.
where i, j and k are the phonon modes taking part in the three-phonon process, µ indicates a reciprocal-space lattice vector, α indicates a cartesian axis, Gµα is the reciprocal lattice, Nµµ is a diagonal matrix with the q-grid size and a is the scalebroad factor) which, because of how σ ijk is calculated, breaks the microscopic reversibility as one process can be registered but not the reciprocal or it can be registered, but with different value. Moreover, the way smearing is constructed from products of velocities over reciprocal lattice vectors can cause symmetry equivalent pairs/triplets to give different values.