Introduction

Accurate materials property prediction using crystal structure occupies a primary and often critical role in materials science, particularly when screening a near-infinite space of candidate materials for desirable performance. Upon identification of a candidate material, one has to go through either a series of hands-on experiments or intensive density functional theory (DFT) calculations, which can take hours, days, or even months depending on the complexity of the system. Hence, the ability to accurately predict the properties of interest prior to synthesis can be extremely useful for prioritizing available resources for simulations and experiments, which can significantly accelerate the process of materials exploration and discovery. Owing to significant advances in materials theory1,2,3 and computational power, it has become possible to compute several materials properties of a compound using DFT. This has led to the creation of large DFT databases4,5, which, when combined with various advanced data mining techniques, have extensively contributed to enhanced property prediction models6,7,8,9,10,11,12,13 and catalyzed the development of the field of materials informatics14,15,16,17,18,19,20.

Since the size of the data available for training has a significant impact on the quality of predictive models21,22,23, reliable and accurate models are still limited to a few selected materials properties that are relatively easy to compute. Several works have attempted to improve model performance for small datasets24,25,26,27,28. However, the quality of the predictions in these studies relies on materials property-specific feature engineering performed prior to training, making them less applicable for generalized use across various properties. Alternatively, transfer learning (TL), an advanced data mining technique that utilizes knowledge learned from a large collection of historical data, is often applied to scarce-data problems29,30,31,32,33,34,35. For instance, it can use the knowledge of a model for a given property trained on a large DFT dataset to build a model of the same property on a small experimental dataset. However, the absence of a large collection of historical data for most materials properties prohibits the broad application of this same-property transfer learning, i.e., where both source and target properties are the same. Gupta et al.36,37,38 attempted to address this by introducing cross-property transfer learning, which allows training models on target properties for which corresponding big source datasets may not be readily available. However, those models were confined to taking only composition as input. Although composition-only predictive models can be helpful for screening and identifying potential material candidates without requiring structure as an input, they are by design incapable of distinguishing between structure polymorphs of a given composition; such polymorphs end up as duplicates in the data and thus need to be removed before ML modeling. This prevents us from applying transfer learning in cases where the datasets contain large amounts of structure polymorphs, as the removal of duplicate entries might result in significantly less data available for model training. It might also prevent the implementation of cross-materials-class transfer learning, thereby limiting the application of transfer learning to the same materials class only. Thus, composition-based models may have limited applicability in the materials discovery process, as structure information is critical to define the material and to perform DFT computations and further experiments for validation. Further, composition-only models could have substantial errors in the predicted values as compared to ground truth, since different structure polymorphs of a given composition can have drastically different properties. These shortcomings of models trained on composition-based inputs can be mitigated by incorporating structure-based inputs, and hence structure-based modeling presents bigger opportunities than composition-based modeling to advance the discovery process in the field of materials science.

In this work, we present a framework that combines advanced data mining techniques with a structure-aware graph neural network (GNN) to improve the predictive performance of models for materials properties with sparse data. The overall workflow of the proposed framework is shown in Fig. 1. Here, we first apply a structure-aware GNN-based deep learning architecture to capture the underlying chemistry associated with existing large datasets containing crystal structure information. The resulting knowledge is then transferred and used during training on the sparse dataset to develop reliable and accurate target models. For simplicity, we refer to the large body of available data as the source dataset, the model trained on the source dataset as the source model, the sparse data as the target dataset, and the model trained on the target dataset as the target model. The transfer of information can be performed by either fine-tuning or feature extraction. Fine-tuning uses the weights from the pre-trained model as the preliminary weight initialization for the network, which are further refined using the target dataset. In the feature extraction method, we treat the pre-trained model as a feature extractor to extract robust features for the target dataset and use them to build the target model using representation learning. In this work, we use the structure-aware GNN-based model ALIGNN39 as the source model architecture, as it has been shown to significantly outperform several other contemporary models (SchNet40, CGCNN41, MEGNet31, DimeNet++42) on materials property prediction tasks across a wide variety of datasets (MP4, QM943, JARVIS5) with up to 52 solid-state and molecular properties of different data sizes using crystal structure information as the model input. Interested readers can refer to the publication39 for more details. We implement fine-tuning-based TL for ALIGNN and design an ALIGNN-based feature extractor for feature extraction-based TL using atom, bond, and angle-based features. Therefore, all the models developed in this work are structure-aware, which facilitates better screening and identification of potential material candidates, making it easier for domain scientists to perform follow-up DFT computations and experiments, thereby saving time and resources in the process of future materials discovery. We compare models obtained using the proposed framework with models trained from scratch (SC). Note that the proposed framework can be easily adapted to the ever-increasing datasets and ever-advancing data mining techniques to improve the models further. The significant improvements gained by using the proposed framework are expected to help materials science researchers more gainfully utilize data mining techniques to screen and identify potential material candidates more reliably and accurately for accelerating materials discovery.
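As a concrete illustration of the fine-tuning route, the following minimal PyTorch sketch initializes a target model with weights saved from a source model and refines them on a small target set; the SourceGNN class, checkpoint filename, and toy data are illustrative placeholders rather than the actual ALIGNN implementation.

```python
import torch
import torch.nn as nn

# Generic stand-in for the structure-aware GNN used as the source model (e.g., ALIGNN);
# a simple MLP keeps the sketch self-contained and runnable.
class SourceGNN(nn.Module):
    def __init__(self, in_dim=64, hidden_dim=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.SiLU())
        self.head = nn.Linear(hidden_dim, 1)  # regression head for one property

    def forward(self, x):
        return self.head(self.body(x))

# Pretend this checkpoint came from training on the large source dataset
# (e.g., MP formation energy); here we simply create and save one.
source_model = SourceGNN()
torch.save(source_model.state_dict(), "source_model_mp_formation_energy.pt")

# Fine-tuning: initialize the target model with the source weights, then refine
# them on the small target dataset.
target_model = SourceGNN()
state = torch.load("source_model_mp_formation_energy.pt", map_location="cpu")
target_model.load_state_dict(state, strict=False)  # strict=False tolerates a swapped head

optimizer = torch.optim.AdamW(target_model.parameters(), lr=1e-3, weight_decay=1e-5)
loss_fn = nn.L1Loss()  # MAE, the metric used throughout this work

# Toy target data standing in for the small (structure, property) dataset
X_target, y_target = torch.randn(128, 64), torch.randn(128)
for epoch in range(5):  # in practice, many more epochs with validation-based model selection
    optimizer.zero_grad()
    loss = loss_fn(target_model(X_target).squeeze(-1), y_target)
    loss.backward()
    optimizer.step()
```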

Fig. 1: Outline of the proposed framework.
figure 1

First, a data mining model (e.g., the Atomistic Line Graph Neural Network (ALIGNN)39, comprising ALIGNN layers and Graph Convolutional Network (GCN) layers) is trained from scratch on a big source dataset (e.g., the Materials Project (MP)4) using structure files (e.g., atomic positions in the Vienna Ab initio Simulation Package (POSCAR) format) to produce a knowledge model. Next, the data mining model is trained on smaller target datasets (e.g., the Joint Automated Repository for Various Integrated Simulations (JARVIS)5) with different properties, using the information contained within the knowledge model to further improve the predictive ability of the model.

Results

Datasets

We use nine datasets of DFT-computed and experimental properties in this work: Materials Project (MP)4, Joint Automated Repository for Various Integrated Simulations (JARVIS) 3D with 46 properties and 2D with 32 properties5, Flla44 with three properties, Dielectric Constant (DC)45 with five properties, Piezoelectric Tensor (PT)46 with two properties, Experimental Formation Energy (EFE)47 with one property, Kingsbury Experimental Formation Energy (KEFE)48 with one property, Kingsbury Experimental Bandgap (KEB)49 with one property, and Harvard Organic Photovoltaic Dataset (HOPV)50 with 24 properties. The MP dataset was downloaded from ref. 39; JARVIS-3D (https://figshare.com/collections/ALIGNN_data/5429274), JARVIS-2D (https://ndownloader.figshare.com/files/26808917), and HOPV (https://ndownloader.figshare.com/files/28814184) were obtained from their respective figshare links; and the rest of the datasets were obtained using Matminer51.

A model trained on the formation energy of the MP dataset39 is used as the source model to perform fine-tuning and feature extraction-based transfer learning, as formation energy has been shown to lead to meaningful representations from large source datasets36, which can then be applied during model training on the smaller target datasets to improve their predictive performance. The rest of the datasets are used to perform target model training followed by materials property prediction and evaluation. The target datasets are randomly split with a fixed random seed into training, validation, and holdout test sets in the ratio of 80:10:10. The data size for every materials property in each of the datasets is shown in Supplementary Tables 1, 2, and 3, and modifications made to some of the target datasets' materials properties to suit the model input are shown in Supplementary Table 4. We use mean absolute error (MAE) as the primary evaluation metric for all models. We also include a ‘Base’ model, which always uses the average property value of all the training data provided to it as the predicted property of a test compound, as a naive baseline for comparison with scratch (SC) and transfer learning (TL) methods. Note that due to the large number of materials properties investigated in this work and the limited computational resources, we do not investigate the aleatoric uncertainty caused by the random initialization of the models.
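For concreteness, the following sketch shows a fixed-seed 80:10:10 split and the naive ‘Base’ baseline evaluated with MAE; the seed value and synthetic property values are illustrative, not those used in this work.

```python
import numpy as np

def split_80_10_10(n_samples, seed=123):
    """Random 80:10:10 train/validation/test split with a fixed seed
    (the seed value here is illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train, n_val = int(0.8 * n_samples), int(0.1 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def base_model_mae(y_train, y_test):
    """'Base' baseline: always predict the mean of the training targets."""
    prediction = np.mean(y_train)
    return np.mean(np.abs(y_test - prediction))

# Example with synthetic property values standing in for a target dataset
y = np.random.default_rng(0).normal(size=1000)
train_idx, val_idx, test_idx = split_80_10_10(len(y))
print(base_model_mae(y[train_idx], y[test_idx]))
```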

ALIGNN-based Feature Extractor

We use a structure-aware GNN-based architecture, ALIGNN39, as our base architecture for training the source models, performing transfer learning using the fine-tuning method, and extracting structure-based features, as it has been shown to significantly outperform other known GNN models31,40,41,42,52 for materials property prediction across a wide variety of datasets with different data sizes39 using crystal structure information as the model input. For the initial set of input features used to train ALIGNN, please refer to the publication39. To extract structure-based features from ALIGNN, we design an ALIGNN-based Feature Extractor, which is shown in Fig. 2.

Fig. 2: Outline of the ALIGNN-based feature extraction method.
figure 2

Blue color indicates atom-based features, orange color indicates bond-based features and green color indicates angle-based features.

The structure file containing information on the lattice geometry and ionic positions of a compound is divided into atom, bond, and angle-based features before being fed into the ALIGNN-based Feature Extractor, where we perform feature extraction. As the graph neural network (ALIGNN) used for extracting features comprises an intricate arrangement of layers, simply extracting features from every layer would yield nearly 100 variations of possible features without any definite meaning. If each of these sets of features were used as model input to perform deep learning-based model training, it would make the entire process too costly and time-consuming. Hence, instead of extracting features from every layer, we define several analytical checkpoints, mainly after the ALIGNN layers and GCN layers (which contain two edge-gated graph convolution layers53 and one edge-gated graph convolution layer, respectively). This yields a more generalized mechanism for performing feature extraction-based TL that is both meaningful and saves time and resources during model training for the proposed framework. After performing feature extraction from the pre-defined analytical checkpoints, we obtain 9 sets of atom-based features, 9 sets of bond-based features, and 5 sets of angle-based features, each providing a different 256-dimensional vector representation of the compound. We also test the effect of features on the performance of the model by combining atom-bond and atom-bond-angle features from the same checkpoint. Moreover, as it is known that features extracted from the last layer of a given architecture are also helpful when performing transfer learning (also known as TL based on the freezing method54), we also combine the last set of atom, bond, and angle-based features (called atom-bond-angle features(last)) to see its effect on the performance. Note that we do not try all possible combinations of atom, bond, and angle-based features extracted from different checkpoints, in order to facilitate further generalizability of the workflow. Due to the nature of the source model architecture, all the features extracted from the feature extractor are structure-aware. For a detailed explanation of the pre-processing of the structure-based features associated with the feature extractor, please refer to the Methods section. Next, we perform model training using the above-defined sets of features as input for a deep neural network, where we use a 17-layered neural network comprising stacks of fully connected layers with ReLU as the activation function, inspired by21,22,55, as the base architecture, and the formation energy of the JARVIS-3D dataset as the materials property for the property prediction task; the results are shown in Table 1. In this work, we use a very basic deep neural network to perform model training on the extracted features to gauge the potential of the extracted features for predicting materials properties.
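The following minimal PyTorch sketch illustrates how per-atom (or per-bond/per-angle) 256-dimensional activations could be captured at such checkpoints using forward hooks; the ToyBlock and ToySourceModel classes are simplified stand-ins for the actual ALIGNN and GCN layers, not the real architecture.

```python
import torch
import torch.nn as nn

# Simplified stand-ins for the checkpointed blocks of the source model.
class ToyBlock(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.lin(x))

class ToySourceModel(nn.Module):
    def __init__(self, dim=256, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList([ToyBlock(dim) for _ in range(n_blocks)])
        self.readout = nn.Linear(dim, 1)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return self.readout(x.mean(dim=0))  # pooled graph-level prediction

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Store the per-node (or per-bond/per-angle) 256-dim activations at this checkpoint.
        captured[name] = output.detach()
    return hook

model = ToySourceModel()
for i, block in enumerate(model.blocks):
    block.register_forward_hook(make_hook(f"checkpoint_{i}"))

# One forward pass over a compound with, say, 12 atoms fills `captured`
# with a (12, 256) tensor per checkpoint.
_ = model(torch.randn(12, 256))
print({name: feats.shape for name, feats in captured.items()})
```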

Table 1 Prediction performance benchmarking for the prediction task of ‘Atomistic Line Graph Neural Network (ALIGNN) based Feature Extractor’ on formation energy of JARVIS-3D dataset.

Table 1 shows that, in general, feature representations containing structure-aware atom-based features tend to perform better than only bond- or angle-based features. Moreover, the combination containing the last set of atom, bond, and angle-based features, called atom-bond-angle features(last), performs the best among the 38 sets of features used for the analysis. Hence, for the rest of the analysis, we use only atom-bond-angle features(last) as the feature set to perform feature extraction-based TL, for generalizability. Moreover, we use only the model with the least validation error (among the fine-tuning and atom-bond-angle features(last) based TL models) to perform model testing on the holdout test set, to have a fair comparison with the SC model, i.e., both the TL and SC models look at the holdout test set only once during testing.

JARVIS-3D database

Here, we demonstrate the performance of TL models on different target materials properties in the JARVIS-3D dataset. We compare the performance of TL models with the SC models, i.e., ALIGNN trained directly on the target dataset from scratch. Table 2 presents the prediction accuracy of the best SC and best TL model on the test set for each of the 46 target properties.

Table 2 The table shows the test MAE of the SC model, proposed TL model, and % error change for each of the target materials properties for the prediction task of ‘JARVIS-3D Database’.

Table 2 indicates that TL models outperform the SC models in 42/46 cases, i.e., in ≈91% of the cases. We observe higher percent error improvement in the TL models for materials properties with fewer data points (below ~19,000 data points). Supplementary Table 5 shows that among the TL models, the fine-tuning-based TL model performed the best for 27/42 target properties, and the feature extraction-based TL model performed the best for 15/42. The results illustrate the benefit of using the proposed framework even when the materials properties of the source and target datasets are different, using structure-based features as model input. We believe this is because the source model was able to learn and extract useful and widely applicable features during model training on the source data.

Other DFT-based databases

In the previous section, we used only a single DFT-computed dataset to perform model training using the proposed framework to improve the performance of the target model. However, as various DFT-computed datasets are calculated using different computational settings and can show significant discrepancies with respect to each other56, these differences may affect the performance of the target model when applying TL. Hence, here we investigate the effect of using the same source model, trained on the formation energy of the MP dataset, on other small DFT-based databases.

Table 3 indicates that TL models outperform the SC models in 10/10 cases, i.e., in 100% of the cases. Supplementary Table 6 shows that among the TL models, the fine-tuning-based TL model performed the best for 2/10 target properties, and the feature extraction-based TL model performed the best for 8/10. It is interesting to see that on smaller DFT databases, not only does feature extraction-based TL give the more accurate model for a large fraction of the evaluated properties, but the best TL model is also quantitatively much more accurate than the best SC model, underscoring the power of structure-aware feature extraction-based TL for small datasets.

Table 3 The table shows the test MAE of the SC model, proposed TL model, and % error change for each of the target materials properties for the prediction task of ‘Other DFT-based Databases’.

JARVIS-2D database

In the previous sections, we used different DFT-computed datasets containing 3D materials to perform model training using the proposed framework to improve the performance of the target model. However, there also exists a class of materials that exhibit plate-like 2D shapes, whose physical and chemical properties may differ in nature from those of 3D materials. Hence, here we investigate the effect of using the same source model, trained on a 3D materials dataset, with TL to build target models on datasets containing 2D materials. Table 4 presents the prediction accuracy of the best SC and best TL model on the test set for each of the 32 target properties in the JARVIS-2D database.

Table 4 The table shows the test MAE of the SC model, proposed TL model, and % error change for each of the target materials properties for the prediction task of ‘JARVIS-2D Database’.

Table 4 indicates that TL models outperform the SC models in 27/32 cases, i.e., in ≈84% of the cases. As most of the materials properties have a small number of data points, we observe an even larger improvement in the performance of the TL models. Supplementary Table 7 shows that among the TL models, the fine-tuning-based TL model performed the best for 5/27 target properties, and the feature extraction-based TL model performed the best for 22/27. The results demonstrate that our proposed framework is able to improve the performance of the predictive model even when the source model trained on 3D materials is applied to 2D materials across different materials properties.

Other materials class data

So far, we have observed the advantages of using the proposed framework on a variety of materials properties from different DFT-computed datasets of crystalline solids, where TL models typically outperform SC models. However, as there are different classes of materials available, it would be interesting to see if the knowledge learned from one class of materials can be helpful in building a more accurate model for another class of materials. Hence, in this section, we explore the effectiveness of our proposed framework by applying it to datasets comprising molecular properties.

Table 5 indicates that TL models outperform the SC models in 22/24 cases, i.e., in ≈92% of the cases. We also observe that for some specific materials properties, such as scharber jsc, scharber pce, and scharber voc, the improvement in performance is consistently very small. It would be interesting to see if it is possible to analyze and quantify relations between materials properties from different materials classes, which could lead to further improvement in the performance of the target model in cross-property transfer learning scenarios in future work. Supplementary Table 8 shows that among the TL models, the fine-tuning-based TL model performed the best for 7/22 target properties, and the feature extraction-based TL model performed the best for 15/22. It is quite encouraging to observe that the proposed TL models outperform the SC models even when using properties from another materials class as the target properties in most cases. This shows that the ALIGNN model is able to successfully and automatically capture relevant atom, bond, and angle-based domain knowledge features from source data and effectively and appropriately apply that information to build improved predictive models for a variety of target properties on small target datasets across different materials classes using the proposed structure-aware TL framework.

Table 5 The table shows the test MAE of the SC model, proposed TL model, and % error change for each of the target materials properties for the prediction task of ‘Other Materials Class Data’.

Experimental data

Here, we demonstrate the performance of our proposed framework on experimental datasets with formation energy and band gap as materials properties.

Table 6 indicates that TL models outperform the SC models in 3/3 cases, i.e., in 100% of the cases. Supplementary Table 9 shows that among the TL models, the fine-tuning-based TL model performed the best for 1/3 target properties, and the feature extraction-based TL model performed the best for 2/3. It is very encouraging to observe this improvement in performance not only for computational datasets but also for experimental datasets. This, along with the other results, demonstrates that the proposed framework can significantly and consistently help improve the prediction of materials properties across various domains and classes, thereby potentially saving time and resources in the process of future materials discovery.

Table 6 The table shows the test MAE of the SC model, proposed TL model, and % error change for each of the target materials properties for the prediction task of ‘Experimental Data’.

Discussion

In this paper, we presented a framework that combines a structure-aware GNN architecture with advanced data mining techniques to build a powerful source model, whose information is then used to build significantly and consistently more accurate target models for various materials properties from smaller datasets, enabling enhanced materials property prediction across various domains and materials classes. To show the benefit of the proposed approach, we built source models using a structure-aware GNN-based architecture called ALIGNN on the MP dataset, using only formation energy as the source materials property. This trained model was then used to perform transfer learning on 115 different dataset-property combinations, and we found that the proposed framework yields highly accurate and robust models even when the source property and target property are different, which is expected to be especially useful in building predictive models for properties for which big datasets are not available. We compare the performance of the TL models with ALIGNN models trained from scratch.

To check the robustness of the proposed framework even further, we perform empirical and statistical analyses to examine the performance difference between SC and TL models. First, we describe the empirical analysis, where we perform training size-based and extrapolation-based analyses using formation energy from the JARVIS dataset as the materials property (as it is one of the most studied properties). For the training size-based analysis, we perform model training with different training data sizes using the same test set (10% of the total data size) to create a learning curve with prediction error as a function of the training set size. Figure 3 shows that the TL model outperforms the SC model for all training sizes for formation energy prediction.

Fig. 3
figure 3

Training curve for predicting formation energy in the JARVIS dataset for different training data sizes on a fixed test set.
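A minimal sketch of the training size-based analysis is given below, assuming a generic train_and_evaluate routine supplied by the user; the training-set fractions and the naive baseline used in the example are illustrative.

```python
import numpy as np

def learning_curve(X, y, train_and_evaluate, fractions=(0.1, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Train on increasing subsets of the training pool and evaluate on one fixed
    test set (10% of the data), returning (train_size, test_MAE) pairs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(0.1 * len(y))
    test_idx, pool_idx = idx[:n_test], idx[n_test:]
    curve = []
    for frac in fractions:
        sub = pool_idx[: int(frac * len(pool_idx))]
        mae = train_and_evaluate(X[sub], y[sub], X[test_idx], y[test_idx])
        curve.append((len(sub), mae))
    return curve

# Toy usage with the naive 'Base' predictor standing in for a trained model
base = lambda Xtr, ytr, Xte, yte: np.mean(np.abs(yte - ytr.mean()))
X_demo, y_demo = np.random.rand(1000, 8), np.random.rand(1000)
print(learning_curve(X_demo, y_demo, base))
```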

For the extrapolation-based analysis, we divide the whole dataset into different splits, where data points corresponding to the bottom 10% of formation energy values are set aside as the ‘Extrapolation test set’ and the remaining data is divided into training, validation, and test splits (the test split serving as the ‘Interpolation test set’). Lower formation energy values indicate a more stable compound, and it is desirable to have a model that can predict the lower values accurately and even extrapolate. The scatter plot of the prediction error for the ‘Extrapolation test set’ and ‘Interpolation test set’ is shown in Fig. 4. It shows that the best TL model (in this case, the fine-tuning-based TL model) performs better than the best SC model for both test splits.

Fig. 4
figure 4

Prediction error analysis with mean absolute error (MAE) as the error metric for predicting formation energy in the JARVIS dataset using the best scratch (SC) and best transfer learning (TL) models.
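The extrapolation split described above can be sketched as follows; the bottom fraction, random seed, and synthetic formation energies are illustrative.

```python
import numpy as np

def extrapolation_split(formation_energy, bottom_fraction=0.10, seed=0):
    """Set aside the lowest `bottom_fraction` of formation energies as the
    extrapolation test set; split the rest into train/val/interpolation-test sets."""
    order = np.argsort(formation_energy)
    n_extrap = int(bottom_fraction * len(formation_energy))
    extrap_idx = order[:n_extrap]          # most negative (most stable) values
    remaining = order[n_extrap:]

    rng = np.random.default_rng(seed)
    remaining = rng.permutation(remaining)
    n_train = int(0.8 * len(remaining))
    n_val = int(0.1 * len(remaining))
    return (remaining[:n_train], remaining[n_train:n_train + n_val],
            remaining[n_train + n_val:], extrap_idx)

# Synthetic formation energies standing in for the JARVIS data
energies = np.random.default_rng(1).normal(loc=-1.0, scale=1.5, size=5000)
train_idx, val_idx, interp_test_idx, extrap_test_idx = extrapolation_split(energies)
```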

Next, we perform statistical analysis, comprising uncertainty and statistical significance analyses, using different materials properties. For the uncertainty analysis, we perform 9-fold cross-validation (as the datasets were divided in an 8:1:1 ratio) for the SC and proposed TL models with the best modeling configuration, using formation energy and bandgap (as they are widely studied materials properties) of the JARVIS-3D, JARVIS-2D, and experimental datasets. Supplementary Table 10 shows the distribution of performance for the models across different train/test splits, where we observe that TL outperforms SC in terms of MAE for all six cases. Additionally, to see if the observed MAEs are statistically distinguishable from one another, we perform a corrected resampled t-test57 and obtain p < 0.01 for all cases. This shows that the MAE obtained using the proposed TL model is statistically distinguishable from the MAE obtained using the SC model at α = 0.01.

For the statistical significance analysis, we estimate a one-tailed p-value to compare the test MAEs obtained on the 115 target datasets (out of which TL models outperformed SC models on 104 target datasets) in order to see whether the observed improvement in the accuracy of TL models over SC models is significant. Here, as we are dealing with different properties obtained from different datasets, whose differences in MAE may not be directly comparable58, we use the sign test59 to estimate the one-tailed p-value. The null hypothesis is ‘the TL model is not better than the SC model’ and the alternative hypothesis is ‘the TL model is better than the SC model’. After performing the statistical testing using a sign test calculator60, we obtain p < 0.00001, thus rejecting the null hypothesis at α = 0.01. This suggests that the difference in test MAE between SC and TL models is unlikely to have arisen by chance, and thus we can infer that, in general, the proposed TL models perform significantly better than SC models.

Additionally, we train ALIGNN on multiple materials properties simultaneously for both the source and target models to compare its performance with training the source and target models on just a single property, as performed in this study. We use formation energy and bandgap as the materials properties, where the source model is trained on the MP dataset and the target model is trained on the JARVIS-3D dataset. Supplementary Table 11 shows the test MAE of the SC model and proposed TL model when the source and target models are trained on single and multiple materials properties. When training the model on a single materials property, we observe that using the corresponding source model, as well as formation energy as the source property, helps improve the performance of the model. When training the model on multiple materials properties, we observe a decrease in model accuracy for formation energy and a negligible difference in accuracy for bandgap. This suggests that training models on multiple materials properties simultaneously for both the source and target datasets is not beneficial for improving the accuracy of the model.
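To make the two significance tests concrete, the sketch below computes a one-tailed sign test over per-property wins/losses and a corrected resampled t-test over per-fold MAE differences (using the 1/k + n_test/n_train variance correction); the counts and inputs shown are placeholders.

```python
import numpy as np
from scipy import stats

def sign_test_one_tailed(n_wins, n_losses):
    """One-tailed sign test: probability of at least n_wins wins out of
    n_wins + n_losses comparisons under the null p = 0.5 (ties dropped)."""
    n = n_wins + n_losses
    return stats.binom.sf(n_wins - 1, n, 0.5)

def corrected_resampled_ttest(diffs, n_train, n_test):
    """Corrected resampled t-test over per-fold metric differences `diffs`
    (e.g., MAE_SC - MAE_TL across k folds), with the variance inflated by
    the factor (1/k + n_test/n_train); one-tailed p tests 'TL better than SC'."""
    diffs = np.asarray(diffs, dtype=float)
    k = len(diffs)
    var = diffs.var(ddof=1)
    t = diffs.mean() / np.sqrt((1.0 / k + n_test / n_train) * var)
    p = stats.t.sf(t, df=k - 1)
    return t, p

# Placeholder counts: TL beats SC on 104 of 115 properties
print(sign_test_one_tailed(104, 11))

# Placeholder per-fold MAE differences for a 9-fold cross-validation
fold_diffs = [0.012, 0.009, 0.015, 0.010, 0.008, 0.013, 0.011, 0.009, 0.014]
print(corrected_resampled_ttest(fold_diffs, n_train=8000, n_test=1000))
```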

We also observe that out of the 115 materials properties analyzed in our work, the SC model performed the best for 11 properties, the fine-tuning-based TL model performed the best for 42 properties, and the feature extraction-based TL model performed the best for 62 properties (Supplementary Figure 1). In general, fine-tuning-based TL models perform better for larger target datasets, and feature extraction-based TL models perform better for smaller target datasets, which is consistent with a previous study on composition-based cross-property TL36. Additionally, we plot the percent error change of the TL model relative to the SC model as a function of dataset size, with a histogram, in Supplementary Figures 2 and 3, and observe a larger improvement in model accuracy for smaller datasets as compared to larger datasets. The mean ± standard deviation, 1st quartile, median, 3rd quartile, minimum, and maximum percent error changes are -11.95 ± 20.23, -15.16, -5.48, -2.54, -96.09, and 34.97, respectively. Although we only used formation energy as the source materials property to train the feature extractor (source model) and a basic deep neural network to build target models using the extracted features, feature extraction-based TL was found to perform better for a larger number of materials properties than fine-tuning-based TL for small datasets. This shows the powerful ability of the feature extractor to learn relevant, robust, and versatile sets of features that can be leveraged even with relatively simple data mining techniques, thereby providing flexibility and interoperability. We also observe that transfer learning works equally well not only for classical quantities such as Deltae (5.25%) but also for electronic properties such as bandgap (6.19%). The TL-based improvements are also mostly isotropic, e.g., improvements in the Meps (x,y,z) components are similar. While some properties like PMDiEl show substantial improvements, the underlying reasons for this remain unclear; a potential future utility could involve a GNNExplainer-like tool61 for the ALIGNN architecture.

Hence, the proposed method can help improve the robustness and accuracy of the target model on small datasets by incorporating the rich set of hierarchical features that can be learned using ever-increasing data and ever-improving data mining techniques. The proposed framework is thus flexible, can leverage state-of-the-art data mining techniques to improve performance, and can be applied to other materials properties across various domains and materials classes for which enough source data may not be available. Although transfer learning is not always effective for all kinds of materials properties with varying data sizes, we observe that the benefit of transfer learning is greater for materials properties with a smaller number of data points; that transferring knowledge from periodic (e.g., crystalline) to non-periodic (e.g., molecular) properties, i.e., performing cross-materials-class transfer learning to increase the accuracy of the target model, is possible when using structure-based modeling (albeit with smaller benefits); and that there is a larger improvement in performance for ‘extrapolation’ than for ‘interpolation’ problems. Further, the proposed framework is expected to be easily adaptable to other scientific domains beyond materials science. The presented framework is conceptually easy to implement, understand, use, and build upon.
For future work, it would be interesting to explore the effect on the performance of the target model when materials properties other than formation energy are used as the source materials property and when GNN architectures other than ALIGNN are used for training the source model. Although in the current study we have used DFT-relaxed structures, which originate one way or another from experimental crystal structures, we plan to use such TL models for crystal generative models as well62, where property predictions and pre-screening with TL-performance-boosted models will be useful. It would also be interesting to explore the uncertainty associated with materials property prediction by incorporating neural network components that help perform uncertainty estimation, such as dropout within the network architecture, or by creating an ensemble model using multiple graph neural networks and/or input from multiple checkpoints. One could also explore different sets of features to train the neural network or use more sophisticated neural network architectures for the target model in a bid to boost the performance of the target model for a specific materials property.

Methods

Scratch and transfer learning models

In this work, we implement a scratch (SC) model and two types of transfer learning (TL) models. For SC models, the model training is performed directly on the small target dataset without providing the model with any form of knowledge from source data. We use the graph neural network model ALIGNN as the model architecture for the SC model. For TL models, we use a model pretrained on the MP dataset with formation energy as the materials property, using ALIGNN as the model architecture. The TL techniques comprise traditional fine-tuning and a feature extraction method for a graph neural network. Fine-tuning uses the weights from the pre-trained model as the preliminary weight initialization for the network (which has the same architecture as used during source model training), which are then further refined using the small target dataset. In the feature extraction method, we treat the pre-trained model as a feature extractor and extract atom, bond, and angle-based features from a given layer, each containing a variable number of rows, depending on the number of atoms, bonds, and angles present in the input file, and 256 columns of features for each row. For example, consider a hypothetical compound AaBbCc where a + b + c = x, the number of bonds = y, and the number of angles = z (generally, number of angles > number of bonds > number of atoms), and suppose we extract the features from a checkpoint. Then, the dimensions of the extracted matrices will be (x, 256) for atom-based features, (y, 256) for bond-based features, and (z, 256) for angle-based features. In order to pre-process them into a form that can be given to the deep learning (DL) model, which takes a one-dimensional vector as input, we take the mean of all features across each column. This creates a (1, 256) vector representation for each of the structure-based features (atom, bond, and angle) for a given compound of the target dataset. The extracted features from a given layer can then be either concatenated or used separately as input for any DL model. For example, if we use atom-based features from a given layer as the materials representation, each compound will be represented as a 256-dimensional feature vector. Similarly, for atom+bond-based features it will be a 512-dimensional feature vector, and for atom+bond+angle-based features it will be a 768-dimensional feature vector representation. For our analysis, we only use atom+bond+angle (last) as the set of features for the feature extraction-based TL. The ‘Base’ model used in this work always uses the average property value of all the training data provided to it as the predicted property of a test compound, as a naive baseline for comparison with the SC and TL methods.
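The pre-processing described above can be sketched as follows; the row counts for the hypothetical compound are illustrative.

```python
import numpy as np

def pool_and_concatenate(atom_feats, bond_feats, angle_feats):
    """Mean-pool each (n_rows, 256) feature matrix to a (256,) vector and
    concatenate the atom, bond, and angle parts into one 768-dimensional
    representation of the compound."""
    pooled = [feats.mean(axis=0) for feats in (atom_feats, bond_feats, angle_feats)]
    return np.concatenate(pooled)  # shape: (768,)

# Hypothetical compound with x = 5 atoms, y = 20 bonds, z = 60 angles
atom_feats = np.random.rand(5, 256)
bond_feats = np.random.rand(20, 256)
angle_feats = np.random.rand(60, 256)
compound_vector = pool_and_concatenate(atom_feats, bond_feats, angle_feats)
print(compound_vector.shape)  # (768,)
```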

Network settings and model architecture

ALIGNN was implemented using PyTorch, and the 17-layered neural network (NN-17) was implemented using TensorFlow 2 (with Keras). The detailed configuration of the network architecture is [FC1024-Re x 4]-[FC512-Re x 3]-[FC256-Re x 3]-[FC128-Re x 3]-[FC64-Re x 2]-[FC32-Re]-FC1, where the notation [...] represents a stack of model components comprising a sequence (FC: fully connected layer, Re: ReLU activation function). The number of layers for the neural network was decided based on the analysis performed in55, which investigates the performance of deep learning models of different depths and shows that the error improves with the number of layers up to 17 layers, after which the accuracy stagnates. The hyperparameters used in ALIGNN comprise the following: Sigmoid Linear Unit (SiLU) as the base activation function, Adaptive Moment Estimation with decoupled weight decay (AdamW) as the optimizer with a normalized weight decay of 10−5, a mini-batch size of 64 (32 or 16 where the holdout test set is small or the size of the input files is larger than the available GPU memory), and a learning rate of 0.001. We train all ALIGNN models for 300 epochs with a fixed random seed, as done in the original work39. The hyperparameters used in NN-17 comprise the following: rectified linear unit (ReLU) as the base activation function after each layer (except the last layer), Adaptive Moment Estimation (Adam) as the optimizer, a mini-batch size of 64, and a learning rate of 0.0001. We used early stopping with a patience of 200 to stop the model training if the validation loss does not improve for 200 epochs, to prevent overfitting. All NN-17 model training used a fixed random seed. Readers interested in in-depth hyperparameter settings for the ALIGNN and NN-17 models are referred to the respective publications22,39,55 for details. We use mean absolute error (MAE) as the loss function as well as the primary evaluation metric for all models. We use DFT-relaxed or experimentally determined structures as input for all the models trained in this study.
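For reference, a sketch of the NN-17 architecture in TensorFlow 2/Keras is given below; the 768-dimensional input corresponds to the concatenated atom+bond+angle (last) features, and details not stated in the text (e.g., the maximum number of epochs) are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_nn17(input_dim=768):
    """NN-17: [FC1024-Re x 4]-[FC512-Re x 3]-[FC256-Re x 3]-[FC128-Re x 3]-[FC64-Re x 2]-[FC32-Re]-FC1."""
    widths = [1024] * 4 + [512] * 3 + [256] * 3 + [128] * 3 + [64] * 2 + [32]
    inputs = tf.keras.Input(shape=(input_dim,))
    x = inputs
    for w in widths:
        x = layers.Dense(w, activation="relu")(x)   # FC layer followed by ReLU
    outputs = layers.Dense(1)(x)                    # linear output for regression
    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="mae", metrics=["mae"])      # MAE as loss and metric
    return model

model = build_nn17()
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=200,
                                              restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=64, epochs=1000, callbacks=[early_stop])
```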