The application of machine learning for predicting the methane uptake and working capacity of MOFs

Multiple linear regression analysis, as part of machine learning, is employed to develop equations for the quick and accurate prediction of the methane uptake and working capacity of metal–organic frameworks (MOFs). Only three crystal characteristics of MOFs (geometric descriptors) are used to develop the equations: surface area, pore volume and crystal density. The values of these geometric descriptors can be obtained much more cheaply, in terms of time and other resources, than by running gas sorption calculations or performing experimental work. In this work, sets of equations are provided for the different cases studied: a series of MOFs with NbO topology, a set of benchmark MOFs with outstanding methane storage and working capacities, and the whole CoRE MOF database (11 000 structures).


Introduction
Methane, the major component of natural gas, is considered an alternative fuel to oil. Natural gas is much cleaner than gasoline or diesel. 1 Methane is a gas at room temperature and pressure and has a very low volumetric energy density. To overcome this problem, natural gas is stored in automobiles in two forms: liquefied for trucks and compressed for personal cars. MOFs have been studied for years as promising adsorbents for automobile applications. 1][22][23] The US Department of Energy (DoE) has established targets of 50 wt% and 263 cm³ (STP) cm⁻³ for methane storage methods suitable for such use. On the one hand, recent results show that it is almost impossible to meet both the gravimetric and the volumetric methane working capacity targets simultaneously at room temperature. 24 On the other hand, some recent improvements show the great potential of MOFs for effective methane storage. 25
…performance are identified and generated. A generative adversarial artificial neural network has been created to produce 121 crystalline porous materials, employing inputs in the form of energy and material dimensions. 37 Finally, a useful overview of ML algorithms for the chemical sciences has been published. 38
Experimental and/or computational work needs to be done to reveal the desired property of a structure. To identify the sorption properties of experimentally obtained MOFs, either computer simulations must be performed or specific equipment must be employed. This is acceptable for studying a few crystals, but for a large family of structures conventional approaches are too costly in terms of time, money, workforce, etc. ML can considerably reduce both the experimental and the computational effort needed to reveal the properties of MOFs. Despite the wide use of ML techniques, they have very rarely been used to develop equations describing the sorption properties (including the working capacity) of MOFs.
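The DoE targets quoted above mix a gravimetric quantity (wt%) and a volumetric one (cm³ (STP) cm⁻³), while the uptakes discussed later are gravimetric, in cm³ g⁻¹. The conversions between them are simple arithmetic, sketched below under stated assumptions: STP is taken as 273.15 K and 1 atm with an ideal-gas molar volume of 22 414 cm³ mol⁻¹, and "50 wt%" is read as 0.5 g of CH₄ per g of sorbent. Neither assumption is spelled out in the text. The gravimetric-to-volumetric step multiplies by the crystal density, the same quantity later used as the descriptor Dc.

```python
# Assumptions (not stated in the text): STP = 273.15 K and 1 atm, where one mole of an
# ideal gas occupies about 22 414 cm3; "50 wt%" is read as 0.5 g CH4 per g of sorbent.
MOLAR_VOLUME_STP_CM3 = 22414.0  # cm3 (STP) per mol, ideal gas
MOLAR_MASS_CH4 = 16.04          # g per mol


def gravimetric_to_volumetric(uptake_cm3_per_g: float, crystal_density: float) -> float:
    """Convert cm3(STP) g^-1 to cm3(STP) cm^-3 by multiplying by the crystal density (g cm^-3)."""
    return uptake_cm3_per_g * crystal_density


def g_per_g_to_cm3_per_g(g_per_g: float) -> float:
    """Convert grams of CH4 per gram of sorbent to cm3(STP) of CH4 per gram of sorbent."""
    moles = g_per_g / MOLAR_MASS_CH4
    return moles * MOLAR_VOLUME_STP_CM3


if __name__ == "__main__":
    # Gravimetric target expressed in the cm3(STP) g^-1 units used for the uptakes in this work.
    print(f"0.5 g/g ~ {g_per_g_to_cm3_per_g(0.5):.1f} cm3(STP)/g")
    # A hypothetical material with Dc = 0.5 g cm^-3 storing 526 cm3(STP) g^-1 reaches 263 cm3(STP) cm^-3.
    print(gravimetric_to_volumetric(526.0, 0.5))
```

Because the volumetric value is the gravimetric one multiplied by Dc, a low-density framework needs a proportionally larger gravimetric uptake to reach the same volumetric target.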
Linear regression is a supervised ML algorithm. Simple linear regression uses the slope-intercept form $f(x) = kx + b$, where $x$ is the input (independent variable), $f(x)$ is the prediction (dependent variable), $k$ is the slope coefficient for $x$ and $b$ is the intercept. Both $k$ and $b$ are adjusted during learning to give accurate predictions. Multiple linear regression is the most popular form of linear regression. It describes the relationship between one dependent variable and two or more independent variables: $f(x, y, z) = ix + jy + kz + b$, where $x$, $y$ and $z$ are the independent variables, $f(x, y, z)$ is the dependent variable and $i$, $j$, $k$ and $b$ are the adjustable parameters.
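The fitting step for $f(x, y, z) = ix + jy + kz + b$ can be sketched with ordinary least squares, which is the standard way to estimate the parameters of a multiple linear regression. This is a minimal NumPy sketch, not the authors' code. The three columns of `X` are assumed to hold the descriptors in a fixed order, for example (Sa, PV, Dc). The function names `fit_mlr` and `predict_mlr` are illustrative.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit y ~ X @ w + b by ordinary least squares; returns (w, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the intercept b is fitted together with the slopes.
    A = np.column_stack([X, np.ones(X.shape[0])])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_mlr(X, w, b):
    """Evaluate f = X @ w + b for each row (one structure) of X."""
    return np.asarray(X, dtype=float) @ w + b


if __name__ == "__main__":
    # Synthetic, noise-free data generated from known parameters (i, j, k, b) = (2, -3, 0.5, 4),
    # used only to check that the fit recovers them; these are NOT values from this work.
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(50, 3))
    y = X @ np.array([2.0, -3.0, 0.5]) + 4.0
    w, b = fit_mlr(X, y)
    print(np.round(w, 6), round(float(b), 6))
```

Solving the least-squares problem with `np.linalg.lstsq` avoids forming and inverting the normal-equation matrix explicitly. This is numerically safer when descriptors are correlated, as surface area and pore volume often are.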
The main goal of this work is to show that multiple linear regression analysis is an outstanding tool for revealing the structure-property relationships of MOFs. More importantly, multiple linear regression analysis yields analytical equations showing that the methane total uptake and working capacity at different thermodynamic conditions can be calculated from three variables based on the crystal characteristics of MOFs (geometric descriptors): surface area, pore volume and density. These descriptors can be obtained routinely, and much faster than by GCMC simulations or experimental work, using well-known and highly efficient simulation packages such as Poreblazer, 39,40 Zeo++, 41 etc.
Therefore, an experimentalist or theoretician who has a file with a crystal structure, or a set of such files, can easily obtain the methane total uptake and working capacity by simply applying the equations. The performance of the model can be measured by several characteristics: the mean absolute error (MAE), mean square error (MSE), root-mean-square error (RMSE) and the coefficient of determination, R², as described below, where $x_i$ is the value obtained from experiments or GCMC simulations, $y_i$ is the value predicted by multiple linear regression and $\bar{y}$ is the average of the predicted values.
A higher value of R² and lower values of MAE, MSE and RMSE indicate better accuracy of the ML model. For a least-squares fit with an intercept, evaluated on the data used for fitting, R² lies between 0 and 1. A value of 1 means that the geometric descriptors reproduce the data without error. A value of 0 means that the descriptors explain none of the variation in the data.
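The four accuracy measures above can be computed in a few lines. This sketch uses the conventional definitions: MAE is the mean of |xᵢ − yᵢ|, MSE is the mean of (xᵢ − yᵢ)², RMSE is the square root of MSE, and R² = 1 − Σ(xᵢ − yᵢ)² / Σ(xᵢ − x̄)². Here xᵢ are the reference (GCMC or experimental) values and yᵢ the predictions. One assumption needs flagging: the mean in the R² denominator is taken over the reference values, as is standard. The text above defines ȳ as the average of the predicted values, so the authors' exact formula may differ.

```python
import numpy as np


def regression_metrics(x_ref, y_pred):
    """MAE, MSE, RMSE and R^2 of predictions y_pred against reference values x_ref."""
    x = np.asarray(x_ref, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    err = x - y
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    # Conventional R^2: residual sum of squares relative to the spread of the reference data.
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((x - x.mean()) ** 2))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from this work.
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

RMSE is never smaller than MAE. A large gap between them suggests that a few structures carry much larger errors than the rest. Also, for predictions on structures that were not used in the fit, this R² can become negative.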

Application of multiple linear regression analysis for developing equations from a set of MOFs with one topology (NbO)
Multiple linear regression analysis has shown very promising results for developing equations for the methane total uptake and working capacity of a family of MOFs with the same topology. Multiple regression analysis has been applied to reveal structure-property relationships, employing data for MOFs with NbO topology studied by different groups, pioneered by Chen et al., 42 followed by Schröder et al. [43][44][45] and Bai et al. 4 The following descriptors of the crystal structures of MOFs of different sizes are used to develop the equations for predicting the gravimetric total uptake and working capacity of methane at 298 K over the pressure range 65-5 bar: surface area (Sa), crystal density (Dc) and pore volume (PV). The crystal structure parameters, together with the total uptake (at 65 bar) and working capacity (between 65 and 5 bar) at 298 K, are summarized in Table 1.
The equations developed are shown below:
For the family of MOFs with the same (NbO) topology, multiple linear regression gives R² = 0.931 for the total uptake and R² = 0.913 for the working capacity. The MAE is small: 7.59 cm³ g⁻¹ and 9.33 cm³ g⁻¹ for the total uptake and working capacity, respectively. The RMSE values are moderate: 10.56 cm³ g⁻¹ and 12.31 cm³ g⁻¹, respectively.
The main conclusion from this part is that, using only the crystal structure parameters of a series of MOFs with NbO topology, the methane working capacity and total uptake can be calculated from the equations developed quickly, easily and with high precision. This is particularly useful for discovering new structures and screening MOFs with the same topology. For example, a user can draw and optimize a MOF in Materials Studio (or another program), obtain the values of the geometric descriptors with Poreblazer, Zeo++ or the tools in Materials Studio, and then use the equations to obtain the methane total uptake and working capacity. The same approach can, in principle, be extended to MOFs with other topologies. Once the equations are developed, no further simulations or experimental work are needed to obtain these estimates.
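The screening workflow described above comes down to evaluating one linear equation per candidate structure and sorting the results. The sketch below assumes the descriptors have already been computed with a tool such as Poreblazer or Zeo++. It does not call those programs. The coefficients are **hypothetical placeholders**, not the fitted values from this work, because the published equations are needed for real predictions. The descriptor order (Sa, PV, Dc) and the units in the comments are likewise assumptions. The class, function and candidate names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Geometric descriptors of one structure (units are assumptions)."""
    name: str
    sa: float  # surface area, assumed m2 g^-1
    pv: float  # pore volume, assumed cm3 g^-1
    dc: float  # crystal density, g cm^-3


# HYPOTHETICAL placeholder coefficients (i, j, k, b) for f = i*Sa + j*PV + k*Dc + b.
# They are NOT the fitted values from this work; substitute the coefficients of the real equations.
HYPOTHETICAL_COEFFS = (0.05, 100.0, -50.0, 20.0)


def predict(d: Descriptors, coeffs=HYPOTHETICAL_COEFFS) -> float:
    """Apply a three-descriptor linear equation to one structure."""
    i, j, k, b = coeffs
    return i * d.sa + j * d.pv + k * d.dc + b


def rank_candidates(candidates, coeffs=HYPOTHETICAL_COEFFS):
    """Return the candidates sorted by predicted value, highest first."""
    return sorted(candidates, key=lambda d: predict(d, coeffs), reverse=True)


if __name__ == "__main__":
    # Hypothetical descriptor values for illustration only.
    pool = [
        Descriptors("candidate-A", sa=2500.0, pv=1.1, dc=0.60),
        Descriptors("candidate-B", sa=3200.0, pv=1.4, dc=0.45),
        Descriptors("candidate-C", sa=1800.0, pv=0.8, dc=0.80),
    ]
    for d in rank_candidates(pool):
        print(f"{d.name}: {predict(d):.1f}")
```

Keeping the coefficients as a parameter lets the same code apply the total-uptake and working-capacity equations, or equations for other conditions, without changes.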

Application of multiple linear regression analysis for developing equations from the set of benchmark MOFs
Another case considered is the set of benchmark MOFs reported by different groups around the globe: NiMOF-74, UTSA-20, MOF-505, PCN-14, HKUST-1, NOTT-109, NU-135, UTSA-80, NOTT-101, UTSA-76, NOTT-103, NOTT-102, NOTT-122a/NU-125, NU-800, ZJU-36, NU-140, NU-111 and Al-soc-MOF-1. These frameworks have outstanding methane total uptakes and working capacities, and their data are used here to develop equations with multiple linear regression analysis. An empirical equation has previously been developed for predicting the methane storage capacity of the benchmark MOFs at 65 bar and 270 K. 51 Its average deviation was found to be below 4%, which shows good applicability. That empirical equation predicts the methane storage of MOFs from two parameters: crystal density and pore volume. The set of benchmark MOFs is routinely used for comparison with new MOFs obtained experimentally or theoretically. 24,52,53
The following geometric descriptors are employed to develop the equations for predicting the gravimetric total uptake and working capacity: Sa, Dc and PV. The crystal structure parameters, together with the total uptake (at 65 bar) and working capacity (between 65 and 5 bar) at 240 K, 270 K and 298 K, are shown in Table 2.
In contrast to the previous case, several equations are developed for the set of benchmark MOFs, which differ in topology, metal nodes, etc. Lower R² and higher MAE and RMSE were therefore expected than for the single-topology case. The equations are developed for the quick and accurate prediction of the methane total uptake and working capacity using only three crystal characteristics of the MOFs (descriptors): surface area, pore volume and density. The equations are shown below:
At 298 K
The equations developed in this section make it possible to estimate the methane total uptake and working capacity of newly designed MOFs. The values of R², MAE and RMSE show the robustness of the models obtained. The coefficients of determination for the working capacity are R² = 0.979 at T = 298 K, R² = 0.987 at T = 273 K and R² = 0.990 at T = 240 K. The coefficients of determination for the total uptake are R² = 0.965 at T = 298 K, R² = 0.980 at T = 273 K and R² = 0.984 at T = 240 K. An interesting trend is observed: the lower the temperature, the higher the R².
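Since a separate equation and R² are reported for each temperature, the descriptor matrix stays the same and only the target column changes. A minimal sketch of that setup follows. It fits one independent least-squares equation per condition label and reports its in-sample R², using the conventional definition with the mean over the reference values. The function names and the use of temperatures as dictionary keys are illustrative assumptions, not the authors' code.

```python
import numpy as np


def r_squared(y_ref, y_pred):
    """Coefficient of determination, with the mean taken over the reference values."""
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - np.mean(y_ref)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fit_per_condition(X, targets):
    """Fit an independent equation y = X @ w + b for every condition label in `targets`.

    X       : (n_mofs, n_descriptors) array, shared by all conditions
    targets : dict mapping a condition label (e.g. temperature in K) to n_mofs target values
    returns : dict mapping the label to (w, b, R^2 on the fitting data)
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([X, np.ones(X.shape[0])])  # ones column -> intercept b
    results = {}
    for label, y in targets.items():
        y = np.asarray(y, dtype=float)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        results[label] = (coef[:-1], float(coef[-1]), r_squared(y, A @ coef))
    return results


if __name__ == "__main__":
    # Synthetic targets from known linear relations; illustration only, not data from this work.
    rng = np.random.default_rng(3)
    X = rng.uniform(size=(18, 3))  # 18 rows, matching the size of the benchmark list
    targets = {
        240: X @ np.array([1.0, 2.0, 3.0]) + 4.0 + rng.normal(0.0, 0.05, 18),
        298: X @ np.array([0.5, 1.0, 1.5]) + 2.0 + rng.normal(0.0, 0.05, 18),
    }
    for T, (w, b, r2) in fit_per_condition(X, targets).items():
        print(T, np.round(w, 2), round(b, 2), round(r2, 3))
```

With only 18 structures and four fitted parameters per equation, in-sample R² values are optimistic. A held-out or cross-validated check gives a more conservative picture.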

Application of multiple linear regression for developing equations from the CoRE MOF database
Multiple linear regression analysis has been employed to develop equations describing the methane working capacity between 35 and 5.8 bar, using the data from high-throughput GCMC calculations 61 on the CoRE MOF database (11 000 MOFs). The details of the GCMC simulations of methane sorption are presented in that work. The CoRE MOF database contains a wide range of MOFs with different topologies, linkers, metal nodes, etc., so R² is expected to be somewhat lower, and MAE and RMSE somewhat higher, than for MOFs with one topology. The equation developed from only three geometric descriptors (surface area, pore volume and density of the MOFs) is shown below:
Interestingly, no increase in the accuracy of the model is observed when the equation with six descriptors is employed: the R² values are the same, and the MAE, MSE and RMSE are almost the same. With three descriptors, these characteristics are even slightly better.
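For a least-squares fit, in-sample R² cannot decrease when descriptors are added to a nested model. A comparison of the three- and six-descriptor equations is therefore most informative on data held out from the fit. The sketch below uses k-fold cross-validation for this, which is a swapped-in, standard technique; the text does not say how its comparison was made. The six descriptors are not named in this section, so the extra three columns in the usage example are hypothetical pure-noise features.

```python
import numpy as np


def cv_rmse(X, y, k=5, seed=0):
    """k-fold cross-validated RMSE of an ordinary least-squares fit y ~ X @ w + b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)  # shuffle rows before splitting
    folds = np.array_split(idx, k)
    sq_err = np.empty(n)
    for test in folds:
        train = np.setdiff1d(idx, test)
        A_train = np.column_stack([X[train], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([X[test], np.ones(len(test))])
        # Each structure is predicted by a model that never saw it.
        sq_err[test] = (y[test] - A_test @ coef) ** 2
    return float(np.sqrt(sq_err.mean()))


if __name__ == "__main__":
    # Synthetic data: the target depends on three columns only; three extra columns are noise.
    rng = np.random.default_rng(42)
    X3 = rng.uniform(size=(200, 3))
    y = X3 @ np.array([3.0, -2.0, 1.0]) + 5.0 + rng.normal(0.0, 1.0, size=200)
    X6 = np.column_stack([X3, rng.uniform(size=(200, 3))])
    print("3 descriptors:", round(cv_rmse(X3, y), 3))
    print("6 descriptors:", round(cv_rmse(X6, y), 3))
```

Using the same seed for both calls keeps the fold assignment identical, so the two RMSE values differ only because of the descriptor sets.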

Conclusions
This work aimed to show that multiple linear regression analysis is a fast and highly efficient approach for revealing the methane total uptake and working capacity of MOFs. Only three variables need to be employed to develop the equations for the methane total uptake and working capacity: geometric descriptors obtained from the crystal structure, namely surface area, pore volume and density of the MOFs. The analytical equations obtained can predict these values with high accuracy from these three descriptors alone. The descriptors can be obtained much faster than by GCMC simulations or experimental measurement of sorption isotherms, using simulation packages such as Poreblazer or Zeo++, or tools in Materials Studio.
A set of equations is developed for predicting the methane total uptake and working capacity of MOFs with the same topology (NbO, in the case studied). The model exhibits very high accuracy. Several equations are developed for the set of benchmark MOFs, which differ in topology, metal nodes, etc. The values of R², MAE and RMSE show the robustness of the models obtained. For example, for the working capacity R² = 0.979 at 298 K, R² = 0.987 at 273 K and R² = 0.990 at 240 K. The GCMC results for the CoRE MOF database are used to develop an equation for predicting the methane working capacity that takes into account only three parameters, with R² = 0.899, MAE = 9.23 cm³ g⁻¹ and RMSE = 12.60 cm³ g⁻¹. Extending the model with more descriptors does not increase its accuracy. Both experimentalists and theoreticians can therefore obtain the methane total uptake and working capacity simply by applying the equations to the values of the three descriptors computed from a crystal structure file (or files).