Active Learning in Bayesian Neural Networks for Bandgap Predictions of Novel Van der Waals Heterostructures

The bandgap is one of the most fundamental properties of condensed matter. However, an accurate calculation of its value, which could potentially allow experimentalists to identify materials suitable for device applications, is very computationally expensive. Here, active machine learning algorithms are used to leverage a limited number of accurate density functional theory calculations to robustly predict the bandgap of a very large number of novel 2D heterostructures. Using this approach, a database of ≈2.2 million bandgap values for various novel 2D van der Waals heterostructures is produced.


Introduction
The electronic bandgap (BG) is one of the fundamental properties of materials. It arises directly from the configuration of electronic structures and corresponds to the minimum energy that an electron requires to be excited into the conduction band. [1][2][3][4] The bandgap is one of the most fundamental properties of condensed matter. However, an accurate calculation of its value, which could potentially allow experimentalists to identify materials suitable for device applications, is very computationally expensive. Here, active machine learning algorithms are used to leverage a limited number of accurate density functional theory calculations to robustly predict the bandgap of a very large number of novel 2D heterostructures. Using this approach, a database of %2.2 million bandgap values for various novel 2D van der Waals heterostructures is produced.
than the transport bandgap. These two definitions have physical meaning and correspond to quantities that can be measured experimentally. In contrast, the fundamental bandgap refers to the energy separating unoccupied from occupied one-electron states, and it is meaningful only within a theoretical model. In general, the experimental bandgap does not correspond to the fundamental bandgap. Some approaches that go beyond density functional theory (DFT) have been developed to consider electron excitation for bandgap calculations. However, within the commonly used DFT approximations (e.g., generalized gradient approximation [GGA], meta-GGA, and hybrid functionals used to solve the Kohn-Sham equations for the ground state), optical and fundamental bandgaps are equivalent. [18,19] Furthermore, some studies posit a weak electron-hole binding energy in selected vdWHS and, therefore, to the first approximation they can be treated as the same quantity. [20] For the simplest approximation of the exchange correlation functional, DFT largely underestimates the value of the bandgap. When calculated within the local density approximation (LDA), the values are typically %40% less than experimental values. [21,22] To improve the accuracy of the calculations, functionals considering the gradient of the charge density (GGA) are used. However, significant underestimation of the bandgap values persists and, in many cases, it leads to an incorrect description of the electronic and optical properties of a material. [23][24][25] To improve the prediction of bandgap calculations, Hartree-Fock exact exchange interactions can be included in the Kohn-Sham equations, leading to the so-called hybrid functional approximations. [26,27] Heyd et al. developed one of the most widely used functionals (HSE06) for predicting electronic structures and properties of solid-state materials, including bandgaps. [28,29] However, the inclusion of exact exchange leads to significantly higher computational costs compared with simpler LDA or GGA functionals, a very important issue when a large number of calculations is required. [30,31] Due to the large number of theoretically possible 2D monolayers (%6000 structures), it is possible to generate %20 million unique novel bilayer heterostructures by a direct stacking of these 2D monolayers (N b ¼ N m (N m þ 1)/2).[https://2dmatpedia.org/), [32] ] Calculation of the electronic structures and properties of this many complex materials is currently intractable, even for simple DFT calculations, and the lack of experimentally fabricated vdWHs makes validation challenging.
We have previously shown how machine learning (ML) models can be used to leverage results from DFT calculations of hybrid 2D materials, generating outstanding results. [32] Here we demonstrate how a combination of a relatively small number of computationally expensive DFT-HSE06 calculations and a ML method called active learning (AL) can reliably predict bandgap values for a large set of stable semiconducting bilayers. [32] We show how these ML models can be used to create a database of reliable bandgap values for approximately 2.2 million vdWHs, using a very limited number of first-principles calculations (%400 bilayers).

Results
Here, an AL model was built starting from an initial training set of 109 randomly selected bilayers (X L ), where each bilayer consists of two semiconducting monolayers from the 2matpedia database.
Members of this set have an IE, E ≤ À1.0 eV Å À2 , ensuring that the interactions are vdW, and contain a subset with HSE06 bandgaps relatively close to literature values (see Table 1). Possible discrepancies between our results and the literature data may originate from the use of different computational methodologies and/or from configuration of the supercells used in the calculations.
The 109 HSE06 bandgap calculations were used to build the initial Bayesian neural network (BNN) model. Five different versions of the initial model were produced using different selections of the train-test sets, selected by clustering using the kmeans algorithm. [32] For each bilayer, bandgap values were calculated using the five versions of the initial model, resulting in five different databases consisting of 2.2M structures (Y 1 ,Y 2 ,… Y 5 ). After calculating the mean and standard deviation of each bilayer bandgap, a set of %100 bilayers (X AL1 ) that had the largest standard deviations were selected. In other words, the initial BNN models could not find a mapping function of sufficient accuracy for that set of bilayers. The process, shown schematically in Figure 1, was repeated four times, each time selecting a new set of %100 poorly predicted bilayers (X AL1 …X AL3 ), until the quality of the model reached convergence. At that point, the training set contained 473 structures whose bandgaps were predicted with an R 2 of 0.81 and mean absolute percentage error (MAPE) of 0.16, and the test set was predicted with an R 2 of 0.92 and MAPE of 0.11. An additional set of 52 structures (X AL4 ) was subsequently added to the training set to test for convergence of the parameters ( Table 2). The X AL subsets used to expand the training set are represented in the uniform manifold approximation and projection (UMAP) in Figure 2, showing how the DFT calculations are distributed over the whole set of bilayers. UMAP provides a 2D, visual, and intuitive representation of possible structural-functional similarities of the structures in the hyperspace of descriptors. Points that are close together in the UMAP correspond to structures with similar physicochemical properties. As the training set is approximately uniformly distributed Table 1. Bandgap (in eV) comparison between the value calculated here using HSE06 and other HSE06 calculations found in the literature. [58][59][60] Discrepancies may result from different methodologies used in the approximation of the vdW potential and/or configuration of the supercells used.  [58] OTl 2 -GeI 2 1.83 1.45 [58] Br 2 Mg-Cl 2 Zn 5.50 5.49 [58] Cl 2 Zn-CdCl 2 5.10 5.29 [58] OTl 2 -O 2 Pt 0.62 0.86 [58] I 2 Yb-Br 2 Ge 1.30 1.04 [58] InSe/AsP 1.26 1.07 [59] HfS 2 /MoTe 2 0.59 0.35 [60] www.advancedsciencenews.com www.advintellsyst.com across the UMAC, the 2.2M structures predicted by the ML models lie within the domain of applicability for the model, so should be predicted with reasonable accuracy.
To demonstrate how efficient the AL model was we generated an additional BNN model trained on 425 randomly selected bilayers. This model did not predict the training and test set well, with no R 2 values greater than 0.51. This indicated the fundamental role that AL played in reducing the computational cost and increasing the accuracy when leveraging HSE06 calculations of 2D heterostructures.
The distribution of bandgaps and relative errors associated with the predictions, calculated from 600 trial Bayesian networks, as a function of the monolayer building blocks, is shown as a heatmap in Figure 3. This shows a logarithmic distribution of bandgaps over the 0.1-8.0 eV range, with %9% of the structures having a direct bandgap configuration. Notably, it also indicates a relatively large error (but still below 30%) associated with low bandgap values, apparent around the monolayer 1 ¼ 650-750.
Although the error appears to be associated with specific monolayers, this is due to the sampling used to build the training set that explores the hyperspace of bilayer descriptors in a partly inhomogeneous way. Therefore, any relationships between the BNN error and the nature of chemical bonds or chemical composition of the bilayers are not significant. This is also confirmed by the distribution of bandgaps and largest error structures in the UMAP across the whole set (see Figure 4).
The bilayers were grouped by bandgap energy and labeled with their position in the optical spectrum as follows: infrared The number of bilayers within each range is shown in Table 3.
Within each bandgap group, we counted the frequency of each monolayer in the 2.2M bilayer dataset and shows the most frequent representatives in Table 4.
To elucidate the contribution to the bandgap due to inclusion of the exchange term in the DFT functionals, the correlation between the bandgaps calculated within the GGA-PBE and HSE06 approximations was assessed (see Figure 5). The Table 2. R 2 , root mean-squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) test and train set predictions for the BNN bandgap models. The results are labeled progressively by four steps where each step adds additional data point sets (X AL1 …X AL3 ) to the initial 109 bilayers, selected using an AL algorithm. The fifth run was carried out using additional 52 bilayers (X AL4 ) to test the convergence of the parameters.   www.advancedsciencenews.com www.advintellsyst.com Pearson and Spearman correlations between PBE and HSE06 bandgaps were 0.68 and 0.60, respectively, indicating a significant (linear) correlation between the results of the two approaches. The feature importance (FI) of each descriptor and the one of the GGA-PBE bandgaps were calculated, and the results show that the GGA-PBE bandgap FI is only marginally higher than the largest FI of the other descriptors (0.13 vs 0.08). In addition, a least absolute shrinkage and selection operator (LASSO) regression analysis indicated that the GGA-PBE calculated bandgap is a less-effective descriptor in the BNN models compared with the structural descriptors used here, validating the relevance of the descriptors used to represent the bilayers in this study (listed in the Supporting Information). [32][33][34] We also assessed the change in the bandgaps as a function of the twist angle. Monolayers with different symmetries along the x-y plane (AgI, AlTe, BN, Br 3 Cr, ITaTe 4 , and GaSe) were selected, and supercells with twist angles of 0 , 30 , 45 , 60 , and 90 were built. The calculated total energy of each structure elucidated the resultant effect of charge transfer on the bandgap. The charge transfer suggested a relatively small change at different twist angles because of the weak vdW interactions, resulting in a negligible bandgap change (shown in the Supporting Information).
Although the bandgap change with the twist angle was small, care must be taken in interpreting this outcome because of the limited number of twist angles that produce a supercell small enough to make the calculations tractable. The literature suggests that for specific twist angles, the bandgap can vary up to %15%, providing a novel way to use geometric parameters of a bilayer to adjust the electronic properties. [35][36][37][38] Our work identified the crucial role of crystal symmetries in the band structure configuration. The point symmetries, rotation, reflection, and inversion determine the nature of the bands, supporting the validity of the  www.advancedsciencenews.com www.advintellsyst.com present analysis within the high-symmetry/low-energy/bilayer configurations of the vdWHs. [39] Furthermore, the potential correlations of the HSE06 bandgap with the IE and elastic constant along the z-axis (C 33 ), calculated in our previous work were analyzed. [32] These two quantities represent macroscopic features that originate from the charge redistribution upon vdWHs formation, and therefore they may, in principle, have an influence on the band alignment that determines the bandgap. However, the Pearson and Spearman coefficients between the HSE06, and the IE and C 33 were calculated to be %0.06 and 0.03, respectively, indicating a negligible correlation between these quantities, again confirming that the complex nature of the bandgap goes beyond intuitive considerations.   www.advancedsciencenews.com www.advintellsyst.com

Conclusions
We have used DFT calculations and AL to generate a database of %2.2 million novel vdWHs bandgaps, potentially identifying those with significant potential for technological and scientific utility. No correlations between the BG and other fundamental properties of vdWHs such as IE and C 33 were found, highlighting the BG calculation as a fundamental problem rather than an emerging property correlated to macroscopic observables. This work did, however, find a relatively strong correlation between the PBE and HSE06 calculated BG. Furthermore, the potential of active ML to very substantially accelerate convergence of ML model prediction accuracies using modest training set sizes was demonstrated, demonstrating the capabilities of the combined ML þ DFT approaches used in materials discovery.

Experimental Section
DFT Calculations: To calculate the energy of the structures by DFT, we employed a projector-augmented waves (PAW) approach as implemented in VASP, within both the GGA (Perdew et al. [PBE]) and HSE06. [42][43][44] In HSE06, the van der Waals correlation correction was applied using the method of Grimme, with Becke-Jonson damping (DFT-D3). [40,41,45] A (3 Â 3 Â 1) point k-mesh, where x and y are in the plane of the 2D layers and z represents the orthogonal stacking axis for the monolayers, and a basis set energy cut-off of 700 eV were used for all geometry optimization calculations. A larger grid of (9 Â 9 Â 1) was used for non-self-consistent bandgap calculations. The energy minimization tolerance was 10 À6 eV, and the force tolerance was 10 À2 eV Å 2 . A vacuum of %15 Å along the z-axis was chosen to avoid interactions with replicas in the periodic boundary conditions. Structural information was obtained for 6,138 monolayers from an online database (https://2dmatpedia.org/) and 2,132 semiconducting structures were selected to build the 2 270 142 (2.2M) bilayer database. A subset of bilayers was used for the DFT calculations, selected by applying two constraints: the lattice mismatch, L m < 2%; and the number of atoms in the cell, N a < 200. This ensured both a reliable convergence of DFT calculations and a reasonable computational time.
Only one twist angle between the two monolayers was considered, corresponding to the lowest energy configuration. However, some additional considerations were also required. In both homo-and heterostructures, the weak interlayer forces allowed the angles between the monolayers along the x-y plane to adopt different values, with only a small change in the interlayer energy (IE; %30 meV). However, as a consequence of the angle-dependent symmetry of the resulting bilayer and the resulting formation of dipoles, the band structure may be significantly affected by the twist angle (changes reported in the literature are between 5% and 15%). [35][36][37]46] Although this may be useful for engineering the bandgaps of structures, it adds additional complication to the bandgap analysis. [38,47,48] We found a negligible change in the bandgap at high symmetry twist angles for selected bilayers, supporting the accuracy of our calculations within low energy configurations.
Bayesian Neural Networks: We used a Bayesian neural network (BNN) ML algorithm to predict the bandgaps of a large number of heterostructures. From the Bayesian point of view, regressions were formulated using probability distributions rather than point estimates. [49] The target property, or response, was not estimated as a single value, but was assumed to be drawn from a probability distribution using a dropout approach. [50][51][52] Therefore, our BNN also predicted the confidence interval for each value, indicative of the quality of the prediction for each individual heterostructure. [53] Here, a BNN with two hidden layers composed of 32 neurons each was used, where the dropout probability was 0.1. The dropout regularized the network and avoided overfitting by creating a distribution over the calculated response. This was averaged over 600 trial networks giving the mean response value and the associated standard deviation. A detailed description of the BNN used can be found in previous studies. [32] Active Learning: In evaluating the exact exchange energy density, HSE06 calculations have a large computational cost and large memory requirements compared with GGA. The central processing unit (CPU) time ratio of a singlepoint calculation carried out within HSE06 approximations was %5 and 7 times larger than for GGA-PBE calculations, consistent with the literature. [54] Thus, the use of HSE06 calculations to compute the bandgap for selected representative bilayers in the training set was very computationally demanding. Here, an active learning (AL) approach was adopted to restrict the number of HSE06 calculations required to train the BNN. [55,56] The main advantage of AL was that the data used for BNN training can be chosen selectively, leading to a better performance, while requiring substantially less data than traditional static learning methods. It was important to ensure that the training dataset was representative of the true distribution of the data.
The data were divided into a very small, labeled dataset that included up to a few hundred bilayers (the seed X L ), and a large unlabeled dataset that included up to %2.2 million bilayers (X U ). Typically, there was an arbitrary partitioning between the labeled and unlabeled data. After splitting the data, the seed can be used to train the model. Once the model had been trained N times using a different training-test split by k-means clustering, it can be used to predict the response for the unlabeled data (Y i ¼ f i (X U )). Each element of the unlabeled data will therefore have N different predicted values. By calculating the mean and standard deviation of the target property (response) for each unlabeled data over the N runs, the worse performing structure was selected, its properties calculated using HSE06, and added to the seed for the next BNN training iteration. With this new labeled dataset, the learner can be retrained, iterating the process until the required accuracy was achieved, evaluated by parameters such as R 2 , mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The measures of dispersion were preferred as they are less dependent on the number of parameters in the model and the number of materials in the data set. [57] Supporting Information Supporting Information is available from the Wiley Online Library or from the author.