NanoNET: an extendable Python framework for semi-empirical tight-binding models

We present a novel open-source Python framework called NanoNET (Nanoscale Non-equilibrium Electron Transport) for modelling electronic structure and transport. Our method is based on the tight-binding method and non-equilibrium Green's function theory. The core functionality of the framework is providing facilities for efficient construction of tight-binding Hamiltonian matrices from a list of atomic coordinates and a lookup table of the two-center integrals in dense, sparse, or block-tridiagonal forms. The framework implements a method based on $kd$-tree nearest-neighbour search and is applicable to isolated atomic clusters and periodic structures. A set of subroutines for detecting the block-tridiagonal structure of a Hamiltonian matrix and splitting it into series of diagonal and off-diagonal blocks is based on a new greedy algorithm with recursion. Additionally the developed software is equipped with a set of programs for computing complex band structure, self-energies of elastic scattering processes, and Green's functions. Examples of usage and capabilities of the computational framework are illustrated by computing the band structure and transport properties of a silicon nanowire as well as the band structure of bulk bismuth.


Introduction
The tight-binding (TB) method coupled with the nonequilibrium Green's function (NEGF) formalism is a widely used method for simulations of electronic devices at the atomic level [1] including large-scale FinFETs [2], nanowire FETs [3,4,5], single-atom transistors [6,7], etc. The TB method is a method to tackle large-scale electronic structure problems [8,9,10,11] by both limiting the size of a basis set and taking into account only interactions between a finite number of neighbouring atoms. As a result, even for large numbers of atoms, the Hamiltonian matrix of a system in the TB representation is sparse. Computing the inverse and eigenvalues of this matrix scales almost linearly with the system size using approximate numerical technique like Krylov subspace methods [12,13,14,10,11]. NEGF theory, often taking TB Hamiltonians as an input, provides a set of tools to solve electron transport problems for systems with both elastic and inelastic scatterings [15,16,17], spatially inhomogeneous parameters and open boundary conditions [18,19].
The Hamiltonian operator in the tight-binding representation may be written in the form: where H n,m is an element of the Hamiltonian matrix H, c † n and c m are the creation and annihilation operators respectively, and indices n and m identify basis functions. In the tight-binding representation each of the indices correspond to a set consisting of a position vector of an atomic site, and a set of quantum numbers associated with a localized basis set for each site. The algorithms presented in this work are aimed at efficiently constructing the matrix H n,m given a list of atomic coordinates and a lookup table of two-center integrals as input parameters. Note that a matrix representation of the Hamiltonian using a linear combination of atomic orbitals (LCAO) is not unique and can be obtained in various ways, resulting in matrices with varying sparsity patterns depending on the order at which atoms are addressed when the Hamiltonian matrix is built (see Fig. 1 for an illustrative example). Although permutations of atomic coordinates may lead to physically equivalent matrix representations of the Hamiltonian, some of them work better with certain numerical methods resulting in faster computations.
There are several ways to extract physically relevant information from the Hamiltonian. For isolated systems in thermodynamic equilibrium, one may compute properties by diagonalizing the Hamiltonian matrix. For the systems with open boundary conditions, one computes the so-called Green's function matrix, G [19]: where E is the energy, 0 + is a small positive infinitesimal, H is the Hamiltonian and Σ is the self-energy describing an exchange of electrons through open boundaries. In general, a numerically exact computation of the matrix inverse scales as O(N 3 ) with the number of diagonal elements N. Better scalablity may be achieved exploiting the sparsity of the matrix and taking into account that only some but not all matrix elements of the inverse are of interest for physical applications [19]. There are two main strategies for this. One is to construct a hierarchical method such as the method developed by Ozaki [20] using a set of formulae derived from a recursive application of a block LDL T factorization using the Schur complement to a structured matrix obtained by a nested dissection for the sparse matrix. The method scales as O[N log 2 N 2 ], O N 2 , and O N 7/3 for one-, two-, and three-dimensional systems respectively. The second strategy is to minimize the bandwidth of the matrix and to utilize its block-tridiagonal structure to improve the computational efficiency by using the recursive Green's function (RGF) algorithm to compute the inverse as a sequence of layers, scaling as O( k j N 3 j ), where k is the number of diagonal blocks and N j is the number of diagonal elements in the j th block. If all blocks are equal in size this corresponds to a k 2 improvement in both memory and run-time. These examples motivate development of a software framework that provides facilities to construct TB Hamiltonians in different formats defined by a particular application.
In this work, we propose a method to construct TB Hamiltonian matrices in several possible forms (with different sparsity patterns) including dense, sparse, and block-tridiagonal matrices. Section 2 describes algorithms providing facilities for efficient building of such matrices. Section 3 presents the software architecture of NanoNET consisting of two Python packages tb and negf: the first one implements new algorithms for composing TB matrices, and the latter contains tools for computing NEGF and other physical quantities related to the elec-tron transport. Relationships between TB matrices, NEGF, and other quantities of interest are briefly listed in Section 4. Section 5 contains several illustrative examples of NanoNET usage.
NanoNET is an open-source software providing a framework for electron transport computations based on the TB method. Although its functionality somewhat overlaps with other software such as NEMO5 [21], Kwant [22], pybinding [23], Transiesta [24], Smeagol [25], Gollum [26], DFTB+ and combined ASE [27] and GPAW [28], the framework has following distinctive features: it provides tools for processing and outputting TB matrices with desired properties, it focuses on the semiempirical tight-binding method with a flexible setting of tightbinding parameters together with coordinate representations of an atomic structure, and it provides a Python application programming interface to NEGF computations.
A framework architecture can be seen as a set of more or less independent building blocks and pipeline schemes that can be used in a user-written code. This flexibility allows the user to pick algorithms that suit best for his particular problem to achieve the best performance. Due to its compact syntax, Python seems to be a particularly convenient programming language for implementing programming interfaces for the software frameworks (examples of popular Python frameworks are Tensorflow from Google [29] for building neural networks, Flask for web development etc.). For the electronic transport problems, examples of an existing software framework are Kwant and the stack ASE+GPAW. In these frameworks, the user has multiple choices of computational methods at each stage of the calculations as well as the flexibility to modify dataflow in many possible ways, which is useful for building software interfaces and post-processing. Comparing to Kwant, our framework is more oriented on the processing of atomic coordinates, while, comparing to the ASE+GPAW stack, we provide a tool to work with semi-empirical tight-binding parameters and detecting the tri-blockdiagonal structure of matrices.
The framework interacts with a user via a command line interface or Python application programming interface inspired by the ASE [27] and GPAW [28] software packages. In future we will provide additional interfaces to achieve compatibility with ASE and Kwant in order to enhance mutual functionality of these frameworks.

Construction of Hamiltonian matrices
Using a list of atomic coordinates, L, the Hamiltonian matrix H is constructed in the framework of the TB method via Algorithm 1. Each entry in L contains information of an atomic site site j ; the numerical index of the site, j, the name of chemical element, label, and Cartesian coordinates, coords. The label is used to search a look-up table, OrbitalsT able, which associates a list of orbitals with each of the chemical elements.
The algorithm starts with one of the sorting procedures defined by a user or embedded in the framework. Different matrix representations are determined by a particular order of atomic coordinates which are in turn computed by a user-defined sorting procedure or one of the embedded sorting algorithms (see Section 2.4 for more details). Typically sorting algorithms scale as O(N log N) with the number of atoms. The location of matrix elements in the resulting matrix can be quickly determined using a pre-computed set of index offsets, even though each site can possess a different number of orbitals (see lines 3-5, 10 and 13 of Algorithm 1). When the order of atoms is determined, the Hamiltonian matrix element can be computed by addressing each atom in sequence and finding its nearest neighbours with which the atom has non-zero coupling. In order to speed up the nearest neighbour search procedure, the atomic coordinates are arranged in a kd-tree -a space-partitioning data structure for organizing points in a k-dimensional space [30,31]. The nearest neighbours search itself is a querying of the constructed tree. Building the tree scales as O(kN log N), though one query can be done in O(log N) time. The total time of the nearestneighbour search for all atoms scales as O(N log N). After arranging all atomic coordinates into a kd-tree, Algorithm 1 loops over all atoms (line 7 of Algorithm 1) and their orbitals (line 9), finds nearest neighbours for each site (line 8), loops over orbitals of the neighbours (lines 11 and 12), and computes the matrix elements of H (line 14). Note that in such an implementation, the algorithm does not run over pairs of atoms that are not coupled. Therefore each time the algorithm invokes the function get matrix element(), it most likely returns a nonzero value if the coupling is allowed by orbital symmetry. As a result, the Hamiltonian matrix can be directly constructed in one of several existing sparse matrix formats, since only nonzero elements are processed and their indices, j 1 and j 2 , are explicitly computed.

Periodic boundary conditions and coupling between unit cells
The periodic boundary conditions are determined by a set of translation vectors T defining a primitive cell of the periodic structure. The Hamiltonian describing coupling between primitive cells H C is computed according to Algorithm 2. The algorithm generates the list of atomic coordinates L t s by acting translation operators on the coordinates of each atom inside the primitive cell (line 6) in addition to the list of atomic coordinates L in Algorithm 1. For large structures, the computational burden can be reduced by translating solely the atoms adjacent to the surfaces of the primitive cell. Comparing to the list L, the entries of the list L t s have one additional field, origin, which is the index of the original atom being a pre-image of an atom after the translation. The rest of the steps of Algorithm 2 are similar to those of Algorithm 1 with the main distinction that the kd-tree is built only for the atoms outside the primitive cell and the matrix elements are computed for the pairs of atoms interacting across the borders of the primitive cell. Thus, when periodic boundary conditions are applied, two kd-trees are built: one is for atoms within a primitive cell and another one is for the atoms from neighboring/adjacent primitive cells. The matrix H C has same shape as H.

Computing Hamiltonian matrix elements
Usually in TB methods, the electronic structure problems are formulated in terms of two-center integrals neglecting threecenter contributions. Computation of these integrals is performed by a function get matrix element() in Algorithms 1 and 2. In the semi-empirical version of the TB method, twocenter integrals depend on a relatively small number of empirical parameters which, according to Ref. [32], can be reduced to the two-center integrals of a diatomic molecule, designated by a tuple of three quantum numbers, e.g. (ssσ), (ppπ) etc. Mapping those parameters to the two-center integrals is achieved via coordinate transformations from the diatomic molecule frame to the crystal frame. Explicit formulas for those transformations have been derived by Slater and Koster for many possible combinations of angular momentum quantum numbers [32]. Since the number of the coordinate transformations is rather large, we have computed angular dependence of the diatomic two-center integrals using an approach proposed by Podolskiy and Vogt [33] instead of using a table of explicit formulas. Podolskiy and Vogt have derived compact close analytic expressions for the angular dependence of tight-binding Hamiltonian matrix elements in the two-center approximation that are well suited for numerical calculations and are valid for all angular momentum quantum numbers. In our codebase, the diatomic two-center integrals are introduced using meta-programming features of Python. Namely, the parameters are arranged as a module-level variable created at runtime. Each variable is a Python dictionary with a certain naming convention (see Section 4) containing a set of parameters for coupling between a pair of elements.
The matrix elements can be computed beyond the first nearest neighbour coupling if the corresponding TB parameters are provided. In NanoNET one can specify two types of radial dependence functions: one, depending on the distance between atoms, outputs an integer number which allows for location of a certain set of tight-binding parameters (the number is used as a part of name convention and designates first, second, third etc. nearest neighbours), and the other outputs a float number which is then multiplied on the matrix element. If the radial dependence functions are not specified, NanoNET will search for and use a single set of TB parameters. Also, if not all orbitals in the basis set have same spin quantum number and the spin- : Sparsity patterns and detected block-tridiagonal structure of TB matrices for a) the hydrogen passivated silicon nanowire with 77 atoms in the unit cell and sp 3 d 5 s * basis set for silicon atoms and single s-orbital for hydrogen, and b) 2D quantum billiard consisting of 1888 atoms, each has a single s-orbital respectively. c) and d) distribution of block size cubes for silicon nanowire and quantum billiard respectively computed with a greedy algorithm and optimized recursive algorithm. orbit coupling energy is set, the Hamiltonian matrix includes the spin-orbit coupling terms computed for p-orbitals.

Sorting the list of atomic coordinates and computing block-tridiagonal representation of a matrix
The sparsity pattern of the Hamiltonian matrix H depends on the order of sites taken from the list L. This statement is illustrated in Fig. 1; the order that we store the sites of the system in the Hamiltonian matrix results in different sparsity patterns. Sorting is performed by a function specified by a user during instantiation of the Hamiltonian object. Alternatively, one can use one of the function embedded in the framework. If the sorting function is not specified the order of matrix elements follows the order of entries in the input list of atomic coordinates. In the current version we have implemented three sorting routines: sorting by lexicographical order over atomic coordinates, sorting using projections of the position vectors on an arbitrary vector as sort keys, and sorting over keys determined by a potential function over atomic coordinates.
Sorting by lexicographical order implies that atoms are arranged in a sequence of slices along an axis specified by the first element in the tuple of coordinates. Although this kind of sorting does not guarantee the minimal bandwidth, in most cases it results in matrices with relatively narrow bandwidth (see Fig. 2 for an illustrative example).
Algorithm 3: Greedy algorithm to compute blocktridiagonal structure of matrix with fixed sizes of leftmost and rightmost blocks function compute blocks (l, r, P, P * , N) Input : Sizes of leftmost and rightmost blocks, l and r, profiles of the sparsity pattern, P and P * , and N is the size of P. Output: An array of sizes of diagonal blocks, blocklist The keys computed as projections on a vector can be used for one-dimenssional structures with collinear leads, when we want to arrange atoms into slices cut along a specific direction in Euclidean space (normally parallel to leads). An example of the matrix sparsity pattern with elements sorted according to this method for a silicon nanowire is shown in Fig. 3a. The presented TB Hamiltonian is for the hydrogen-passivated silicon nanowire with the primitive cell containing 77 atoms with the sp 3 d 5 s * basis set for silicon atoms and a single s-orbital for hydrogen.
The third embedded sorting method can be used for an arbitrary two-terminal device. For this method, the atomic grid is approximated by a capacitance model [34], where atoms are represented by a charge nodes with interactions between neighbours modeled by a capacitor. In this case, the sorting keys are determined by a vector V j which is a solution of the matrix equation CV = R, where C is the capacitance matrix and R is a charge distribution. The capacitance matrix can be computed from the adjacency matrix, A, of the graph describing connec-Algorithm 4: Computing optimal block-tridiagonal representation function compute optimal blocks (P, P * , l, r, N) Input : Profiles of the sparsity pattern, P and P * , sizes of leftmost and rightmost blocks, l and r, and N is the size of P. Output: Array of diagonal blocks, blocklist. j, blocklist, l , r ← find optimal cut(P, P * , l, r, N) f lag ← false // Concatenate two lists if they are compatible with the size constraints. Otherwise the algorithms outputs the block list computed in the line 2 if flag then blocklist ← blocklist + blocklist tivity of the nodes: The elements of vector R equal minus one for nodes contacting with left lead, plus one for nodes contacting with right lead and zero for all other nodes. The matrix equation has been solved using the LGMRES method [35]. We illustrate this sorting procedure by applying it to a two-dimenssional twoterminal quantum billiard consisting of 1888 atoms, each has a single orbital. The resulting sparsity pattern is shown in Fig.  3b.
Note that all three sorting methods are heuristic approaches, meaning that they do not guarantee the minimal bandwidth of TB matrices, but lead to a significant reduction of the bandwidth comparing to an arbitrary ordering of atomic coordinates. Another way to perform sorting of atomic coordinates efficiently can rely on the graph partitioning technique described in [36].
Algorithm 5: Finding the best truncation point giving smallest value of the goal function, Eq. (6) function find optimal cut (P, P * , l, r, N) Input : Profiles of the sparsity pattern, P and P * , sizes of leftmost and rightmost blocks, l and r, and N is the size of P. Output: Index of the optimal cut, j, an array of sizes of diagonal blocks, blocklist, and sizes of the diagonal blocks adjacent to the cut, l and r L ← empty list E ← empty list // Loop over all possible cutting indices for j ∈ [l + 1, N − r] do // Compute sparsity pattern profiles for the left block This technique will be embedded in the codebase in the next release.
It is useful for many applications that our matrix be split into a block-tridiagonal structure. In this work we propose a greedy algorithm that can detect optimal block-tridiagonal representation of the band matrix. The proposed algorithm first evaluates the bandwidth of a matrix defined by the expression: where K is the index set of the matrix. The matrix bandwidth is then given by the maximal element of the array B. According to this expression the bandwidth is computed for each row as a difference between the largest index of the non-zero elements and the index of the element on the main diagonal (see Fig.  4). If no such element exists, the bandwidth is equal to zero. profile B * defined as (see Fig. 4): Sizes of diagonal blocks of the original matrix are computed using the function compute blocks(l, r, P, P * , N) which takes fixed sizes of the leftmost and rightmost blocks, l and r, as well as the sparsity pattern profiles, P and P * , and the size of the matrix, N, as input arguments. The sizes of blocks l and r are defined by the maximum index of non-zero elements in the corresponding coupling Hamiltonians H −1 and H 1 for the open systems or by l = P[0] and r = P * [0] for the isolated structures. From the mathematical point of view, these conditions are sufficient for the algorithm to work but they are not necessary, meaning that in some cases they can be relaxed in some cases. Strictly speaking, the size of the coupling matrix should be smaller than the sum of the sizes of the two leftmost or rightmost diagonal blocks. The algorithm compute blocks implements Algorithm 3 according to which if sizes of left and right blocks are small enough (see Algorithm 3 for explicit conditions), the algorithm defines new smallest leftmost and rightmost blocks that fit into the sparsity pattern profiles. Also at this stage the algorithm determines new sparsity pattern profiles being subsets of the original profiles taken from l to N − r, and invokes the function compute blocks with new parameters again (see Fig. 5a). At each function call, new block sizes l and r are determined by the sparsity pattern profiles. Since those sizes are the minimal allowed sizes taken at the current step, the algorithm belongs to the class of greedy algorithms [37]. The sequential calls break in one of three cases illustrated in Fig. 5b-d: the spacing between left and right blocks is not sufficient for the algorithm to continue, the function returns the ordered set of two elements (max[l, r])∪(N − max[l, r]) (see Fig.  5b); the sum of the left and right block sizes is equal to the size of original matrix, the function returns an ordered set consisting of two elements (l, r) (see Fig. 5c); and the left and right blocks overlap, the function returns the size of the original matrix (N) (see Fig. 5d).
A greedy algorithm does not guarantee the optimal solution. Indeed, during analysis of our algorithm we observed instances that lead to very large block sizes when the bandwidth varies substantially over the matrix. However, we found that the solution can be further improved by sequentially splitting the largest blocks. In order to do that we start with splitting the original matrix A into two sub-matrices, A 1 and A 2 (see Fig. 6), determined by a value of some cutting index, s (see Algorithm 4). At the cutting point, the sparsity pattern profiles determine sizes of new rightmost and leftmost blocks, r and l , for matrices A 1 and A 2 shown by grey shaded area in Fig. 6. To determine the most favorable choice of the cut s, we apply the previously discussed algorithm compute blocks to each of the matrices A 1 and A 2 for a range of values s in the interval (l, N − r) and minimize the goal function: where N j is the size of j-th block, (see Algorithm 5 implementing the function find optimal cut). Minimization of the goal function represented by a sum of cubes of diagonal blocks optimizes the computation time for recursive Green's function algorithms which scale as O(N 3 j ). If the sizes of the matrices A 1 and A 2 are larger than a sum of sizes of leftmost and rightmost blocks, further improvement can be achieved by repeating this procedure recursively for each of the matrices A 1 and A 2 (see Algorithm 4). We find the points of these optimal cuts are almost always where the bandwidth is at its widest.
We have applied the algorithms described above for two structures whose TB matrices have sparsity patterns shown in Fig. 3 a and b. In the first case, the original size of the matrix is 446 × 446. The algorithm outputs the diagonal block sizes of 131, 97, 121, and 97 (shown in Fig. 3a as red rectangles). Note that the algorithm suggests a solution with four blocks, and this corresponds to the number of silicon atomic layers in the primitive cell along the direction of translation. For the second case with the matrix size 1888×1888, the number of blocks is 51. In Fig. 3 c and d we also compare output of the function compute blocks that computes the block sizes in a single run with the output of the function compute optimal blocks computing an optimized solution. In the last case the sizes of block are more equalized.

Software Architecture
A generic control flow for applications developed with NanoNET [38] is shown schematically in Fig. 7. The input parameters are the list of atomic coordinates and a table of twocenter integrals. The framework contains two packages tb and negf. The package tb is the core responsible for composing and diagonalizing Hamiltonian matrices. The negf package processes TB matrices; it contains subroutines for computing Green's functions, namely implementing the Recursive Green's Function (RGF) algorithm [19].
tb represents a library of Python classes facilitating building Hamiltonian matrices, imposing periodic boundary conditions and computing electronic structure for molecules and solids using the TB method in the single-particle approximation. The Hamiltonian matrices are built from an XYZ file containing atomic coordinates and a list of TB parameters.
The software architecture relies on the object-oriented paradigms -the framework represents a library of classes whose UML diagram is shown in Fig. 8. The central class of the framework is called Hamiltonian and contains all necessary information and functionality to construct Hamiltonian matrices that represents its main goal. This class inherits properties and member functions from classes BasisTB and StructureDesignerXYZ -abstractions for basis sets and geometrical configuration of atoms correspondingly. The first one stores a look-up table that allows one to associate a set of orbitals with a label denoting a chemical element. The second one stores a kd-tree built from the list of atomic coordinates.
The class CyclicTopology is used when periodic boundary conditions are applied. It translates atomic coordinates according to translation vectors and creates a kd-tree for atoms outside the primitive cell, needed to compute the Hamiltonian matrix responsible for coupling between neighbouring primitive cells.
The orbital sets are created using facilities of the class Orbitals. This class is the parent class for all basis sets. The current version of the software package contains predefined basis sets: sp 3 d 5 s * model for silicon [39], SiliconSP3D5S, single s-orbital for hydrogen [4], HydrogenS, and an sp 3 model for bismuth [40], Bismuth.

Software Functionalities
The interface to NanoNET is provided either by an application programming interface represented by a library of Python classes, or by a command line user interface for Unix-like operating systems.
The functionality of the package includes: • building TB Hamiltonian matrices (Algorithms 1 and 2); • detecting block-tridiagonal matrix structure (Algorithms 3 -5); • diagonalizing Hamiltonian matrices using standard numerical methods implemented in NumPy and SciPy packages; • computing a set of non-equilibrium Green's functions as well as observables that can be derived from them such as density of states, transmission function and charge density. The Green's functions are computed using the recursive Green's function algorithm [19].

Extracting physical quantities of interest from the Hamiltonian matrix and Green's functions
Although NanoNET can be interfaced with any other software by extracting Hamiltonian matrices in any favorable format and passing them as an input to other programs, for the sake of completeness we provide basic routines for computing some physical quantities of interest from Hamiltonian matrices. Time-independent observables, such as energy spectrum, density of states, and stationary and/or equilibrium electron density distribution, can be extracted from the eigenvalues and eigenvectors of the Hamiltonian matrix obtained by the direct matrix diagonalization or computed via the Green's functions formalism that requires computing the matrix inverse. The direct matrix diagonalizaion is implemented as a member function of the class Hamiltonian in the package tb, while all other algorithms performing post-processing of the tight-binding matrices are placed in the package negf.

Band structure
For the periodic structures the band structure can be found solving the eigenvalue problem H(k)ψ k = E(k)ψ k , where ψ k and E(k) are the eigenvectors and eigenvalues of the TB matrix, and the wave vector k is set as a parameter. This problem is solved by a direct matrix diagonalization performed in the class Hamiltonian using LAPACK [41] routines with a Python interface provided via Numpy package.

Complex band structure
Recently much attention has been drawn to the complex band structure computation in the context of molecular junction transmission [42] and Majorana modes of nanowires [43]. The complex band structure can be computed from the TB Hamiltonian written in the form: H(k) = H 0 +H −1 e −ik +H 1 e ik , where H 0 is the intracell TB Hamiltonian and H −1 and H 1 are the Hamitonians of intercell coupling. This leads to the eigenvalue problem: Here, instead of the wave vector, we can set energy E as a parameter and find the eigenvalues in the form λ = e ik [44,45] from the matrix:

Self-energies for periodic leads
A quasi periodic structure is often used to model electrodes or leads that can be attached to a nanostructure. The leads modify the nanostructure by providing an additional source of elastic and/or inelastic scattering. The effect of the leads on the Hamiltonian or Green's functions associated with the nanostructure can be conveniently expressed through the selfenergies. In the case of the two-terminal devices we have left and right leads whose self-energies can be expressed in terms of the eigenvectors and eigenvalues obtained from Eq. (7). Following the procedure described in [45], the eigenvectors ψ E and eigenvalues λ E can be divided in two classes, left-and right propagating modes, depending on the position of complex eigenvalues in the complex plane. The eigenvectors and eigenvalues for the left-propagating modes are collected in matrices Ψ < and Λ < respectively, while the right-propagating modes are described by the matrices Ψ > and Λ > . With this notation, the self-energies describing couplings to the semi-infinite leads reads [45]: Another set of useful quantities employed in the nonequilibrium Green's function computations are the in-scattering and out-scattering matrices: where j ∈ {L, R}, and f j (E, µ j ) is the Fermi-Dirac distribution function with the chemical potential µ j , and Γ j = i Σ j − Σ † j is the coupling matrix.

Green's functions
Computing Green's functions is based on the recursive Green's function algorithm published in [19]. The algorithm implies a steady-state regime of the electron transport and discretized real space. The program outputs following functions as a result: retarded Green's function G = [EI − H − Σ] −1 , electron correlation function G n = GΣ in G † and hole correlation function G p = GΣ out G † . In the literature [16,15], the G n and G p are also known as the lesser and greater Green's functions, −iG < and +iG > , respectively.
The retarded Green's function contains information related to properties of a system in thermodynamic equilibrium such as density of states, transmission function (Caroli formula [46]), and conductance in the linear transport regime [47], where f 0 (ε) is the equilibrium distribution function, and the prefactor 2 corresponds to spin-degeneracy. The electron and hole correlation functions can be used to compute physical parameters such as current density, associated with the non-equilibrium regime [19].
It is necessary to note that most of the mentioned above transport properties can be also computed using the spectral methods [4,1,2,3]. Such methods do not require matrix inversion. The efficiency of the spectral methods can be significantly enhanced using the Chebyshev expansion method [48,49], kernel polynomial method [50,51,23,22] or reduced mode space techniques [5,52] which is planned for a future release.

NanoNET: illustrative examples
As an example of NanoNET usage we have computed the electronic structure for a silicon nanowire, discussed in Section 2.5, and bulk bismuth. The silicon nanowire represents the case of 1D periodic structure with large number of atoms in the primitive cell (the atomic coordinates of atoms in the primitive cell are shown in Supplementary material). The bulk bismuth is a representative example where second and third nearest neighbour interactions have to be taken into account.

Silicon nanowire example
First we load all necessary modules. Here, we make an instance of the class Hamiltonian via reading atomic coordinates from the XYZ file. The resulting band structure is visualized in Fig. 9a. Now we proceed to computing the electron transmission function.
In order to get the Hamiltonian matrices H 0 , H −1 and H 1 for Eq.
(7) and evaluate their blocktridiagonal structure, we invoke the member function t r [ j ] = np . r e a l ( np . t r a c e ( gamma l @\ g t r a n s @\ gamma r @\ g t r a n s . c o n j ( ) . T ) ) ) The resulting density of states and transmission function are visualized in Fig. 9b and c respectively.

Bulk Bi crystal example
from n a n o n e t . t b i m p o r t H a m i l t o n i a n , O r b i t a l s from n a n o n e t . t b i m p o r t s e t t b p a r a m s , g e t k c o o r d s First, we form sp 3 LCAO basis set for bismuth atoms [40]. The basis set is represented by an instance of the class Orbitals. Each orbital possesses a label, energy value, and a set of quantum numbers. Note, the basis set includes orbitals with different spins since the spin-orbit coupling has to be taken into account. The atomic coordinates may be defined in a XYZ file or directly in a Python variable containing a formated text string like in the example below. Here the variable so coupling specifies the value of the spin-orbit coupling.
For the bismuth crystals, coupling between first, second and third nearest neighbours have to be taken into account. For that we need to specify a so-called radial-dependence function that classifies nearest neighbours in one of mentioned above categories depending on the inter-nuclei distance. The function returns one of four labels. The TB parameters can be set via the function set tb params. This function follows certain naming convenction for the TB parameters: PARAMS <el1> <el2><order>, where <el1> and <el2> are chemical elements of a pair of atoms, and <order> is a number specifying the order of neighbours.
The TB parameters and orbital energies for Bi are taken from [40]: The function initialize() takes a radial-dependence function as an argument. If it is not specified the None label is used for the TB parameters. The computed Hamiltonian matrix is related to the isolated nanocrystal whose atomic coordinates are stored in the variable xyz coords. The crystal structure is defined setting the periodic boundary conditions with translation vectors defining a primitive cell. The band structure is computed for a set of points in the kspace defined below (see Ref. [53] for a definition of these high symmetry points): The resulted band structure is shown in Fig. 10.

Software performance
The overall execution time of the applications built with the NanoNET framework depends on the considered atomic structure and choice of algorithms. To illustrate this point, we measure the execution time of the components of the script computing the transmission probability for an electron in the Si nanowire from the Section 5.1 with different geometrical sizes. All numerical experiments have been carried out on one processor core of a 24-core Intel Xeon Scalable 'Cascade Lake' processor.
The time required to form the Hamiltonian matrices is determined by the nearest-neighbour search based on the kd-tree. Building such tree scales as N log N with the number of orbitals (see red shaded area in see Fig. 11 a and b). The most favorable way to increase the number of atoms in this structure from the computational point of view is increasing the length of the nanowire in the direction perpendicular to the leads. Since the area of the leads does not change in this case, the computational time spent for the leads self-energy is independent of the number of orbitals (see blue shaded area). The execution time for the recursive Green's function algorithm in this case depends linearly on the number of orbitals, since the algorithm for detecting the tri-blockdiagonal structure of the Hamiltonian matrix finds new blocks when more orbitals are added (see Fig.  11 a). In the limit of extremely long nanowires the time spent for the recursive Green's function algorithm can overcome the time needed to compute the self-energies. Note that building Hamiltonian matrices has to be done only once at the beginning of the script while the self-energies and Green's functions have to be computed for each value of quasi-particle energy (the wall time has been evaluated for 50 points). In Fig. 11 b, we show the execution time for the same structure without splitting the Hamiltonian into blocks. In this case, the execution time for the recursive Green's function algorithm grows as N 3 becoming dominant for the structures with more than 2000 orbitals. This time is determined by the execution time of LAPACK gelsy algorithm outputting the least-squares solution to a linear matrix equation. The least favorable case from a computational point of view is when new atoms are added to the leads and to the nanowire in the direction parallel to leads. The algorithm for the computing leads self-energies is based on solving the eigenvalue problem which scales as N 3 and is performed by the LAPACK algorithm cggev for the non-symmetric generalized eigenvalue problem [41]. The recursive Green's functions algorithm also cannot perform efficiently in this case, since adding new atoms leads to growing block sizes.
The data in Fig. 11 c show performance of the algorithms generating matrices in the tri-blockdiagonal form for a set of longitudinal sizes of the nanowire. If none of those algorithms are applied, some time is spent to split the Hamiltonian into the device region part, and left-lead and right-lead coupling parts (the line with square markers). Computing the tri-blockdiagonal structure with the greedy algorithm takes approximately twice as long with the same scaling (the line with triangle markers). The optimization algorithm is aimed to even out the block sizes and it is polynomial in time (the line with circle markers). However, since all the algorithms mentioned above operate on block sizes and the profiles of the sparsity patterns, not the Hamiltonian matrix itself, even in the last case the overall execution time is only a few seconds. When the Hamiltonian matrices are computed the most time is spent on the nearest-neighbour search.  Figure 11: Stack diagrams for the execution time of computing transport properties of the silicon nanowire a) using algorithm detecting tri-blockdiagonal structure and b) without detecting tri-blockdiagonal structure. c) The execution time spent detecting the tri-blockdiagonal structure of the Hamiltonian matrices.

Conclusions
We have introduced a novel open-source Python framework NanoNET for the electronic structure and electron transport modelling based on the tight-binding method. The framework provides user with facilities to build the tight-binding Hamiltonian matrices in dense, sparse or block-tridiagonal forms taking a list of atomic coordinates and a table of two-center integrals as an input. The framework can be extended by a user in several ways: adding subroutines for sorting atomic coordinates in a way that is required by a certain application and defining a radial distance dependence of tight-binding parameters. NanoNET consists of two Python packages tb and negf: the first one is for building TB matrices and the latter contains a set of subroutines that compute NEGF and related physical quantities. In future we plan to extend the package tb by additionally facilities to generate reduced models using the tightbinding mode-space technique [52].