Efficient Exploration of Adsorption Space for Separations in Metal–Organic Frameworks Combining the Use of Molecular Simulations, Machine Learning, and Ideal Adsorbed Solution Theory

Adsorption-based separations using metal–organic frameworks (MOFs) are promising candidates for replacing common energy-intensive separation processes. The so-called adsorption space formed by the combination of billions of possible molecules and thousands of reported MOFs is vast. It is very challenging to comprehensively evaluate the performance of MOFs for chemical separations through experiments. Molecular simulations and machine learning (ML) have been widely applied to make predictions for adsorption-based separations. Previous ML approaches to these issues were typically limited to smaller molecules and often had poor accuracy in the dilute limit. To enable exploration of a wider adsorption space, we carefully selected a diverse set of 45 molecules and 335 MOFs and generated single-component isotherms for 15,075 MOF–molecule pairs using grand canonical Monte Carlo (GCMC) simulations. Using this database, we developed accurate (r² > 0.9) ML models that predict adsorption isotherms of diverse molecules in large libraries of MOFs. With this approach, we can efficiently make predictions for large collections of MOFs for arbitrary mixture separations. By combining molecular simulation data and ML predictions with Ideal Adsorbed Solution Theory (IAST), we tested the ability of these approaches to predict adsorption selectivity and loading for challenging near-azeotropic mixtures.

In addition to the Supporting Information below, code that implements all models described in the manuscript is available via GitHub at https://github.com/tdytjd/mof_diverse_isotherm_prediction/. The repository contains three directories. The 'classification' and 'regression' directories correspond to Sections 3.I and 3.II, respectively. The 'IAST' directory contains the modified pyIAST code and the scripts used to calculate binary adsorption properties. Ten supplementary data files are also available in a ZIP file accompanying this publication. The names of the data files and what they represent are listed below.
• Data S1.csv: Data used for the classification of Set 1

S1.1 Addition of Intermediate State Points to Isotherms
To obtain more reliable fits of single-component isotherms, we added intermediate state points to some isotherms. To determine when this step is needed, the loading at each state point is assumed to follow a Gaussian distribution whose mean and standard deviation are the mean loading and its standard deviation from the RASPA simulation. For each state point, we drew 10 samples from this distribution. Each set of sampled loadings was fitted to continuous isotherm functions as described in the main manuscript, giving 10 fitted isotherms per MOF–molecule pair, and we calculated the selectivities of a near-azeotropic molecule pair from all 100 possible combinations of the two sets of 10 isotherms. When the uncertainty of these selectivities was larger than 25%, we calculated, for each of the 10 fitted isotherms, the pressure corresponding to 0.35 × the saturation loading, and then simulated a new state point at the average of these 10 pressures. We double-checked the calculated pressures visually, and a small number of them were manually updated to more reasonable values. We completed this process for all 13 near-azeotropic pairs in the 335 MOFs.
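The decision rule above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: a single-site Langmuir model (fitted by linear least squares on its double-reciprocal form) stands in for the continuous isotherm functions of the main manuscript, the dilute-limit ratio (q_sat·K)_a/(q_sat·K)_b stands in for the IAST selectivity, and the "uncertainty" of the selectivities is taken to be the relative standard deviation of the 100 values. All function names are hypothetical.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def fit_langmuir(p, q):
    """Fit q = q_sat*K*p / (1 + K*p) using its linear form 1/q = (1/(q_sat*K))*(1/p) + 1/q_sat."""
    slope, intercept = np.polyfit(1.0 / p, 1.0 / q, 1)
    q_sat = 1.0 / intercept
    k = 1.0 / (slope * q_sat)  # slope = 1/(q_sat*K)
    return q_sat, k


def sample_isotherms(p, q_mean, q_std, n_samples=10):
    """Draw n_samples loading sets from per-point Gaussians and fit a Langmuir isotherm to each."""
    fits = []
    for _ in range(n_samples):
        q = rng.normal(q_mean, q_std)  # one draw per state point
        q = np.clip(q, 1e-12, None)  # keep loadings positive for the 1/q transform
        fits.append(fit_langmuir(p, q))
    return fits


def henry_selectivity(fit_a, fit_b):
    """Dilute-limit selectivity (q_sat*K)_a / (q_sat*K)_b, used here in place of IAST."""
    return (fit_a[0] * fit_a[1]) / (fit_b[0] * fit_b[1])


def needs_extra_point(fits_a, fits_b, threshold=0.25):
    """Flag the pair if the relative std of all len(a) x len(b) selectivities exceeds threshold."""
    s = np.array([henry_selectivity(a, b) for a, b in itertools.product(fits_a, fits_b)])
    return bool(s.std() / s.mean() > threshold), s


def new_state_point(fits, fraction=0.35):
    """Mean pressure at which each fitted isotherm reaches fraction * q_sat."""
    # Langmuir: q/q_sat = K*p / (1 + K*p) = fraction  ->  p = fraction / ((1 - fraction)*K)
    return float(np.mean([fraction / ((1.0 - fraction) * k) for _, k in fits]))
```

For a Langmuir isotherm the pressure at 0.35 × the saturation loading has the closed form p = 0.35/(0.65 K); with other isotherm models it would have to be found numerically, for example by root finding on each fitted curve.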

S1.3 Calculation of Diameter Descriptors
We used three diameter descriptors for each adsorbate in the ML training, calculated using the minimum enclosing ellipsoid method (ref. 1). This method finds the smallest ellipsoid that fully encloses the convex hull formed by the 3D coordinates of the atom centers of an adsorbate molecule. The 3D coordinates of cyclic molecules were obtained from DFT optimization at the PBE-D3 level. The 3D coordinates of non-cyclic molecules that are neither linear nor spherical were taken from their most elongated conformations, in which all bond angles along the molecule's backbone are set to the equilibrium angles of the force field. The enclosing ellipsoid has three principal axes, and the three diameter descriptors are the lengths of these axes. Code to perform this calculation can be found at https://github.com/tdytjd/mof_diverse_isotherm_prediction/.
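The ellipsoid calculation can be sketched with Khachiyan's algorithm, a standard iterative method for the minimum-volume enclosing ellipsoid; whether the repository uses this exact algorithm is an assumption on our part. Because the minimum enclosing ellipsoid of a point set is the same as that of its convex hull, the sketch works on the atom-center coordinates directly. The function names and the tolerance are illustrative.

```python
import numpy as np


def min_volume_enclosing_ellipsoid(points, tol=1e-7, max_iter=10000):
    """Khachiyan's algorithm: return (A, c) such that (x - c)^T A (x - c) <= ~1 for every point."""
    pts = np.asarray(points, dtype=float)
    n, d = pts.shape
    Q = np.vstack([pts.T, np.ones(n)])  # lifted points, shape (d + 1, n)
    u = np.full(n, 1.0 / n)  # one weight per point, initially uniform
    for _ in range(max_iter):
        X = Q @ np.diag(u) @ Q.T
        # M_i = q_i^T X^{-1} q_i for every lifted point q_i
        M = np.einsum("ij,ji->i", Q.T @ np.linalg.inv(X), Q)
        j = np.argmax(M)
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step  # shift weight toward the most "outside" point
        converged = np.linalg.norm(new_u - u) < tol
        u = new_u
        if converged:
            break
    c = pts.T @ u  # ellipsoid center
    A = np.linalg.inv(pts.T @ np.diag(u) @ pts - np.outer(c, c)) / d
    return A, c


def diameter_descriptors(coords):
    """Full lengths of the three principal axes (2 x semi-axis), largest first."""
    A, _ = min_volume_enclosing_ellipsoid(coords)
    semi_axes = 1.0 / np.sqrt(np.linalg.eigvalsh(A))
    return np.sort(2.0 * semi_axes)[::-1]
```

The returned values are full axis lengths, consistent with calling the descriptors diameters; coordinates in Å give diameters in Å.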

S1.4 pyIAST Modifications
We used pyIAST to fit isotherms and modified its fitting procedure to give more accurate results in the low-pressure regime. Accurate isotherm fits in the low-pressure regime are essential for quantitative IAST predictions of adsorption selectivity (ref. 2). pyIAST calculates the root mean square error (RMSE, Equation (1)) between the fitted and simulated loadings, and we chose the adsorption model with the lowest RMSE.
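Equation (1) itself is not reproduced in this section. Assuming it takes the standard form, with $q_i^{\mathrm{sim}}$ the simulated loading and $q_i^{\mathrm{fit}}$ the fitted loading at state point $i$ of $N$, it reads:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(q_i^{\mathrm{fit}} - q_i^{\mathrm{sim}}\right)^{2}} \tag{1}
```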
Because absolute errors at low loadings are much smaller in magnitude than those at higher loadings, they contribute little to the RMSE. To account for this effect, our modified algorithm uses log10(loading) values for loadings < 0.1 mol/kg when calculating the RMSE, which makes their contributions comparable to those of the other state points and emphasizes the accuracy of the fit in the low-pressure regime. Fig. S3 shows that the modified fitting significantly reduces the relative errors in the low-pressure regime, while the distribution of relative errors at the other state points is similar to that obtained with the original pyIAST fitting. In Fig. S4, selectivities were calculated from two sets of fitted isotherms, one from the original pyIAST fitting and one from the modified fitting. The modified algorithm fits the low-pressure loadings accurately, which led to a very different selectivity. The modified code can be found at https://github.com/tdytjd/mof_diverse_isotherm_prediction/IAST.
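A minimal sketch of the modified error metric and model selection, written as standalone NumPy code rather than as a patch to pyIAST (the actual modification lives in the repository's IAST directory and may differ in detail). Two choices here are our assumptions: the 0.1 mol/kg threshold is applied to the simulated loadings, and fitted loadings are clipped to a small positive value before taking log10. The names `modified_rmse` and `select_model` are hypothetical.

```python
import numpy as np


def modified_rmse(q_sim, q_fit, threshold=0.1):
    """RMSE in which points with simulated loading < threshold (mol/kg) are compared on a log10 scale."""
    q_sim = np.asarray(q_sim, dtype=float)  # simulated loadings, assumed > 0
    q_fit = np.asarray(q_fit, dtype=float)
    low = q_sim < threshold
    resid = q_fit - q_sim  # ordinary residuals for the remaining points
    # Low-loading residuals become log10 differences, i.e. order-of-magnitude errors.
    resid[low] = np.log10(np.clip(q_fit[low], 1e-12, None)) - np.log10(q_sim[low])
    return float(np.sqrt(np.mean(resid ** 2)))


def select_model(q_sim, candidate_fits):
    """Return the candidate (name -> fitted loadings) with the lowest modified RMSE, plus all scores."""
    scores = {name: modified_rmse(q_sim, q_fit) for name, q_fit in candidate_fits.items()}
    return min(scores, key=scores.get), scores
```

For example, with simulated loadings of 0.01, 0.05, 1.0 and 2.0 mol/kg, a fit that doubles both low-loading values has a lower plain RMSE than a fit that is off by 0.05 mol/kg at the two high loadings, but the modified RMSE ranks the second fit higher.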

Fig. S3: Comparison of relative errors of all fitting results in (a) the low-pressure regime and (b) other state points

Fig. S6: The workflow of our ML approach for predicting binary adsorption for the MOF/molecule pairs

Fig. S8: Distribution of the target values (loadings) for ML predictions before and after scaling the original loadings obtained from GCMC simulations

Fig. S11: Parity plots of loadings (left) and selectivities (right) for the separation of an equimolar mixture of methane and 4-hexen-2-one in 335 MOFs
Fig. S13

The remaining supplementary data files are:
• Data S2.csv: Data used for the classification of Set 2
• Data S3.csv: Data used for the regression of Set 1
• Data S4.csv: Data used for the regression of Set 2
• Data S5.csv: Data used for the regression of the extra 12 molecules in 6 MOFs
• Data S6.json: Full Isotherm Database
• Data S7.xlsx: Binary adsorption properties calculated for the 13 near-azeotropic pairs
• Data S8.csv: Binary adsorption properties calculated for the extra 12 molecules in 6 MOFs
• Data S9.xlsx: Tabulated data of all figures in the manuscript
• Data S10: Example Simulation Input Files

Table S1: Information on the 45 molecules simulated

Table S6: 15 top-performing MOFs for propene/propane separation identified by IAST calculations using GCMC-simulated and ML-predicted single-component isotherms, ranked by adsorption selectivity. Materials that appear on both lists are indicated in italics.