Machine learning-assisted crystal engineering of a zeolite

It is shown that Machine Learning (ML) algorithms can usefully capture the effect of crystallization composition and conditions (inputs) on key microstructural characteristics (outputs) of faujasite-type zeolites (structure types FAU, EMT, and their intergrowths), which are widely used zeolite catalysts and adsorbents. The utility of ML (in particular, Geometric Harmonics) for learning input-output relationships of interest is demonstrated, and a comparison with Neural Networks and Gaussian Process Regression, as alternative approaches, is provided. Through ML, synthesis conditions were identified that enhance the Si/Al ratio of high-purity FAU zeolite to the hitherto highest level (i.e., Si/Al = 3.5) achieved via direct (not seeded), organic-structure-directing-agent-free synthesis from sodium aluminosilicate sols. The analysis of the ML algorithms' results offers the insight that reduced Na2O content is key to formulating FAU materials with a high Si/Al ratio. An acid catalyst prepared by partial ion exchange of the high-Si/Al-ratio FAU (Si/Al = 3.5) exhibits improved proton reactivity (as well as specific activity, per unit mass of catalyst) in propane cracking and dehydrogenation compared to the catalyst prepared from the FAU with the previously reported highest Si/Al ratio (Si/Al = 2.8).

These microstructural characteristics are the output of a batch crystallization process, whose inputs include the chemical composition of the mixture, the chemicals and sequence of steps used to prepare this mixture, the temperature and time of crystallization, and the extent of mixing during crystallization (e.g., static or rotating autoclaves) 21,22. Additional variations further expand the range of synthesis inputs that can affect the crystallization output. For example, mid-synthesis changes in composition and temperature during crystallization can have a significant effect on crystal size and framework type 11,13. Crystallization mixtures used for zeolite synthesis contain species ranging from small ions to colloidal particles and gels, whose interconversions and interactions cannot be predicted quantitatively 23. Therefore, the ability to determine the effect of crystallization inputs on the microstructural outcome (output) is very limited, and microstructural optimization requires a large number of experiments exploring the many possible input combinations 24,25. Here, it is demonstrated that ML algorithms can be used to quantitatively capture the effect of crystallization inputs on key microstructural characteristics (outputs) of faujasite, which is widely used as a catalyst in fluid catalytic cracking and as an adsorbent for oxygen/nitrogen separation 21,26,27. A wide range of combinations of crystal morphology, composition, and phase purity is reported, and improved catalytic properties are demonstrated.

Results
The focus is on the synthesis of the zeolite faujasite, and we aim to prepare faujasite crystals with a combination of characteristics (outputs): Si/Al ratio, crystal size, particle size, FAU/EMT ratio, and microporosity. Figure 1 summarizes the experiments performed initially to outline the region of composition space that results in pure faujasite (i.e., FAU, EMT, or FAU/EMT intergrowths); details are provided in Supplementary Fig. 1. The initial selection of the synthesis region in Fig. 1 is based on our prior works 13,28 and on prior work by Rimer et al. 29, which empirically explored and broadened the boundaries of faujasite synthesis conditions. Within this region, we performed 174 synthesis experiments. Of these, 86 experiments (indicated by A1-A86 in Fig. 1, Supplementary Fig. 1, and Supplementary Table 2) did not produce pure FAU or FAU/EMT and are excluded from further analysis. The remaining 88 experiments (indicated by 1-88 in Fig. 1, Supplementary Fig. 1, and Supplementary Table 1) were used for training (81 entries) and testing (7 entries) of the ML algorithm, which then suggested 4 more entries as prediction points. The exception is the crystal size analysis, for which we excluded crystal sizes larger than 60 nm and used 46 experiments (42 entries for training and 4 entries for testing).
Our synthesis involves 5 parameters representing the crystallization mixture composition (x, y, z, m, n): x SiO2 : y Al2O3 : z Na2O : m H2O(initial) (n H2O(final)). The initial and final water contents indicate the water present during the aging and crystallization steps, respectively. In some synthesis experiments they are the same, i.e., the water content is not adjusted. In others, the water content is reduced by freeze drying to set the ratio H2O(final)/H2O(initial) equal to … In total, we have 9 independent parameters (inputs) that describe the synthesis (processing) conditions. For the 88 experiments that gave FAU or FAU/EMT with no other phases, we determined 5 microstructural characteristics (structure):

- the Si/Al ratio, by ICP;
- the particle size, from TEM and/or SEM images;
- the crystal size, from XRD peak broadening (in our analysis, we only considered crystal sizes smaller than 60 nm);
- the degree of intergrowth, represented as FAU/(FAU + EMT) and determined by analysis of XRD data;
- the Ar adsorption at p/p0 = 0.01, as an indication of microporosity.

These five quantities (microstructural characteristics) represent the outputs of the crystallization process. We have also considered, as a separate sixth output, the ratio of particle size to crystal size (for crystal sizes smaller than 60 nm) as a measure of the level of aggregation. The characterization results for the 88 experiments are presented in Section S2 (Tables S3-S25 and Supplementary Figs. S2-S24).
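Crystal size from XRD peak broadening is commonly estimated with the Scherrer equation, τ = Kλ/(β cos θ). The text does not say which relation or constants were used. The sketch below therefore makes several assumptions:

- the Scherrer form, with shape factor K = 0.9;
- Cu Kα radiation (λ = 0.15406 nm);
- an optional instrumental width removed in quadrature.

It also computes the log10 particle-to-crystal size ratio used as the aggregation output, and applies the 60 nm cut-off to the result. The peak width and sizes in the example are hypothetical.

```python
import math

# Assumed constants: the source does not state the relation or values used.
K_SHAPE = 0.9            # Scherrer shape factor (assumption)
CU_KALPHA_NM = 0.15406   # Cu K-alpha wavelength in nm (assumption)
MAX_RELIABLE_NM = 60.0   # crystal-size cut-off used in the analysis

def scherrer_size_nm(fwhm_deg, two_theta_deg, instrumental_fwhm_deg=0.0,
                     k=K_SHAPE, wavelength_nm=CU_KALPHA_NM):
    """Crystal size (nm) from XRD peak broadening via the Scherrer equation.

    fwhm_deg is the measured full width at half maximum in degrees 2-theta;
    instrumental broadening is removed by subtraction in quadrature.
    """
    beta_deg = math.sqrt(max(fwhm_deg**2 - instrumental_fwhm_deg**2, 0.0))
    if beta_deg == 0.0:
        raise ValueError("peak is not broader than the instrumental profile")
    theta = math.radians(two_theta_deg / 2.0)
    return k * wavelength_nm / (math.radians(beta_deg) * math.cos(theta))

def log_aggregation_ratio(particle_size_nm, crystal_size_nm):
    """log10(particle size / crystal size); values near 0 suggest single crystals."""
    return math.log10(particle_size_nm / crystal_size_nm)

# Hypothetical 0.3 deg FWHM for a faujasite reflection near 6.2 deg 2-theta.
size = scherrer_size_nm(0.3, 6.2)
usable = size < MAX_RELIABLE_NM   # only sizes below 60 nm enter the analysis
print(round(size, 1), usable, round(log_aggregation_ratio(265.0, size), 2))
```

In this example the estimate is roughly 26.5 nm, well inside the 60 nm window. A hypothetical 265 nm particle then gives a log ratio close to 1, which indicates aggregation.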
In many branches of materials science, both experimental and computational (including metal additive manufacturing 30, polymer science 31, and drug design 32 and delivery 33), there have been extensive studies of structure-property relations with the help of ML. An important ingredient of these studies is knowledge (whether from first principles, experience, or intuition) of the appropriate structural features that correlate with the properties of interest. ML holds the promise of turning the discovery of such correlations from an informed art into a reliable, data-driven, computer-assisted process; learning such correlations can then lead to the educated design and optimization of materials [33][34][35]. The processing/fabrication of materials with a desired structure is an equally challenging (if not more challenging) problem. Data science and ML have the potential to be transformative in deriving processing-structure relations, leading to breakthroughs in ultimately establishing the ideal "processing-structure-property" pathway to materials design [30][31][32][33][34][36][37][38][39].
ML algorithms have proven useful for predicting both quantitative and discrete characteristics of various zeolitic materials. Carr et al. 40 constructed a classifier that assigns zeolites, based on their topology, to different mineral types and framework types 41,42. Coudert et al. 43,44 used ML algorithms on data from DFT computations to link structural properties to mechanical properties. Moliner et al. 45 discussed the potential of ML in zeolite synthesis (a) in the construction of high-throughput platforms, (b) in the prediction of stable zeolite structures and the guidance of syntheses targeting different structures, and (c) in automated data extraction. Gurney et al. 46 presented different ML tools that can be key elements of ML-based design and discovery of zeolites and other crystalline materials. Ducamp et al. 38 used DFT data to construct a structure-property relation between features of the geometry, topology, and porosity of zeolitic materials and their thermal properties. Jensen et al. 47 built a text-mining pipeline for extracting zeolite synthesis data from a database of ~70,000 relevant journal articles. They further constructed, through ML, an input-output relationship between synthesis conditions and the framework density of zeolitic materials.
Here, we study the synthesis (ingredients, composition, processing conditions, operating protocols) leading to the fabrication of faujasite zeolite. We also explore the capabilities and avenues that ML opens toward the optimization of desired microstructural characteristics such as the framework Si/Al ratio. Selecting synthesis conditions that lead to a particular set of microstructural characteristics is challenging, since the current understanding of the crystallization mechanism is not adequate to derive predictive models 45. ML algorithms can be used to construct experimentally informed candidate input-output (processing/structure) relationships from data in the absence of closed-form (physics-driven) expressions. In our case, given processing/structure information for zeolite fabrication, we aim to construct a function that maps synthesis conditions to final structure. Positing such a model allows us to estimate, predict, and even optimize the structure of a zeolite material for unexplored synthesis conditions, thus guiding further experimentation.
Learning a candidate mapping between inputs and outputs can be attempted through several, in principle comparable, ML approaches, including Neural Networks (NN), Gaussian Process Regression (GPR), and Geometric Harmonics (GH). This paper focuses primarily on predicting via GH, but we also provide comparisons with the other two methods in order to illustrate the qualitative similarity of corresponding results. All methods use input and corresponding output data from a (posited) function of interest to construct a surrogate model (an approximation) of the true function. To the best of our knowledge, Diffusion Maps/Geometric Harmonics have not been previously used in this context. We provide a brief description of each method below and additional details in the SI (Section S2, Supplementary Figs. S25-S27).
Geometric Harmonics (GH) uses the input-output data to numerically construct a hierarchical set of data-driven basis functions (to be exact, basis vectors, that constitute discretized versions of basis functions) in the space of inputs. Any function of the inputs (e.g., a structural characteristic of the resulting material) can be approximated as a linear combination of the leading (data-driven) basis functions, in the same spirit as a function of space can be approximated by a truncated sum of its Fourier components 48,49 .
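The following is a minimal sketch of the Geometric Harmonics idea. It assumes a Gaussian kernel with a hypothetical bandwidth `eps` and a truncation to `n_modes` eigenvectors; in practice both would be chosen by cross-validation. The eigenvectors of the kernel matrix on the training inputs play the role of the discretized basis functions, and the output is expanded in the leading ones. The expansion is then extended to new inputs with the Nyström formula. This is a generic illustration, not the implementation described in the SI.

```python
import numpy as np

def gaussian_kernel(A, B, eps):
    """Kernel matrix exp(-||a - b||^2 / eps) between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / eps)

class GeometricHarmonics:
    """Sketch of a GH regressor: truncated eigen-expansion plus Nystrom extension."""

    def __init__(self, eps=1.0, n_modes=10):
        self.eps = eps          # kernel bandwidth (hyperparameter)
        self.n_modes = n_modes  # number of leading basis vectors kept

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        W = gaussian_kernel(self.X, self.X, self.eps)
        lam, phi = np.linalg.eigh(W)                   # symmetric kernel, ascending order
        keep = np.argsort(lam)[::-1][: self.n_modes]   # indices of the leading eigenpairs
        self.lam, self.phi = lam[keep], phi[:, keep]
        self.coef = self.phi.T @ np.asarray(y, dtype=float)  # projections <phi_j, y>
        return self

    def predict(self, X_new):
        K = gaussian_kernel(np.asarray(X_new, dtype=float), self.X, self.eps)
        # Nystrom extension: phi_j(x) = sum_i k(x, x_i) phi_j(x_i) / lam_j
        phi_new = K @ self.phi / self.lam
        return phi_new @ self.coef

# Toy one-dimensional example: learn sin(2 pi x) from 30 samples.
X = np.linspace(0.0, 1.0, 30)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
gh = GeometricHarmonics(eps=0.05, n_modes=10).fit(X, y)
y_new = gh.predict(np.array([[0.25], [0.6]]))
```

Keeping only the leading modes plays the role of truncating a Fourier series. Too many modes bring in eigenvectors with tiny eigenvalues, which make the Nyström extension unstable.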
Similarly, a Neural Network (NN) can construct a surrogate function $f$ between inputs (synthesis conditions), $x$, and outputs (structural characteristics), $y = f(x;\theta)$, by adapting the values of the parameters $\theta$ to achieve the best function approximation 50. The parameters are selected in an optimization stage called training. During training, the network computes derivatives with respect to its parameters $\theta$ in order to minimize a loss function across the training data 46,50,51.
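The sketch below shows the training loop just described for a one-hidden-layer tanh network, with backpropagated derivatives of the mean squared error. For brevity it uses plain full-batch gradient descent, whereas the text reports Adam. The layer width, learning rate, epoch count, and toy data are illustrative assumptions.

```python
import numpy as np

def train_mlp(X, y, hidden=16, lr=0.05, epochs=2000, seed=0):
    """Fit y = f(x; theta) with one tanh hidden layer by gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)), "b2": np.zeros(1),
    }
    losses = []
    for _ in range(epochs):
        h = np.tanh(X @ theta["W1"] + theta["b1"])        # hidden activations
        pred = (h @ theta["W2"] + theta["b2"]).ravel()
        err = pred - y
        losses.append(float(np.mean(err**2)))
        # Backpropagation: derivatives of the MSE loss with respect to each parameter
        g_pred = 2.0 * err[:, None] / n
        grads = {"W2": h.T @ g_pred, "b2": g_pred.sum(axis=0)}
        g_pre = (g_pred @ theta["W2"].T) * (1.0 - h**2)   # through the tanh nonlinearity
        grads["W1"] = X.T @ g_pre
        grads["b1"] = g_pre.sum(axis=0)
        for key in theta:
            theta[key] -= lr * grads[key]
    return theta, losses

def mlp_predict(theta, X):
    return (np.tanh(X @ theta["W1"] + theta["b1"]) @ theta["W2"] + theta["b2"]).ravel()

X = np.linspace(-1.0, 1.0, 40)[:, None]
y = X[:, 0] ** 2
theta, losses = train_mlp(X, y)
```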
Gaussian Process Regression (GPR) models an output function $f$ of the inputs as a collection of jointly normal random variables that describe one's knowledge about $f(x)$ at each point $x$ in the function's domain 52. The user first specifies, via a kernel function, how these random variables are correlated with each other. Conditional probability then allows one to predict the function value $y' = f(x')$ at another input point $x'$. The result is a Gaussian distribution, whose mean may serve as the prediction and whose variance serves as a measure of uncertainty in the estimate 53.
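The GPR prediction step can be sketched with the standard conditional-Gaussian formulas. The sketch assumes a zero prior mean and a squared-exponential (RBF) kernel, with hypothetical values for the length scale, variance, and noise. In the text these would instead be set by log-likelihood maximization. The returned mean is the prediction and the variance quantifies its uncertainty.

```python
import numpy as np

def rbf(A, B, length=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length**2)

def gp_posterior(X, y, X_star, length=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the test inputs X_star."""
    K = rbf(X, X, length, variance) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    K_s = rbf(X, X_star, length, variance)                # cov(train, test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)                 # diagonal of posterior covariance
    return mean, np.maximum(var, 0.0)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
mean, var = gp_posterior(X, y, np.array([[0.0], [1.0], [2.0], [10.0]]))
```

Near the training points the variance collapses toward the noise level. Far from the data (x = 10), the mean reverts to the prior mean of zero and the variance returns to the prior variance.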
We characterize the five outputs of each experiment as functions of nine input quantities. Six of the inputs are numerical and three (oven type, silica source, and alumina source) are categorical. The input for each experiment is therefore represented as a vector in 12-dimensional space (see Section S4 for an explanation of the number of dimensions). We developed forty-four GH (nine for Si/Al ratio, etc.; see Section S4.2), two NN, and five GPR models, which are described in detail in Section S4 of the SI. Fig. 2 compares experimental and predicted microstructural characteristics (outputs) for one of these models, namely GH with rescaled inputs and 10-fold cross-validation ("rescaled 10-CV"). For crystal size, we only include experiments and predictions for sizes smaller than 60 nm to ensure that instrumental broadening does not affect the experimental measurements. In addition to the five outputs discussed earlier, we also consider the particle-to-crystal size ratio as a measure for differentiating single from aggregated/intergrown crystals. Figure 2 shows that ML can approximate the output functions of interest (entry numbers for the points in Fig. 2(a-f) are labelled in Supplementary Fig. 28). We use a set of 81 experiments as training points (blue points in Fig. 2(a-f)) and 7 experiments as testing points (red points in Fig. 2(a-f)). The green points correspond to unseen experiments, which are discussed later. The blue, red, green color scheme, indicating training, testing, and prediction, respectively, is used in Figs. … All our GH, GPR, and NN models were able to learn a surrogate model from the training data. Hyperparameter values were chosen by cross-validation (GH), log-likelihood maximization (GPR), and Adam minimization of the training mean squared error (NN).
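One way to form the 12-dimensional input vector is to concatenate the six numerical inputs with one-hot encodings of the three categorical ones, then rescale each column using training-set statistics. The sketch makes three assumptions:

- each categorical input has two levels, which gives 6 + 6 = 12 dimensions;
- "rescaled" means min-max scaling to [0, 1];
- the level names and numerical values are hypothetical placeholders.

The actual encoding and rescaling are described in Section S4.

```python
import numpy as np

# Hypothetical category levels (assumption); the actual encoding is described in Section S4.
CATEGORIES = {
    "oven": ["static", "rotating"],
    "silica_source": ["colloidal_silica", "other"],
    "alumina_source": ["aluminum_powder", "other"],
}

def encode(numeric_rows, categorical_rows):
    """Concatenate numerical inputs with one-hot encodings of the categorical inputs."""
    blocks = [np.asarray(numeric_rows, dtype=float)]
    for name, levels in CATEGORIES.items():
        values = [row[name] for row in categorical_rows]
        blocks.append(np.array([[1.0 if v == lev else 0.0 for lev in levels] for v in values]))
    return np.hstack(blocks)

def minmax_rescale(X_train, X_other):
    """Scale columns to [0, 1] with training min/max (constant columns are only shifted)."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X_train - lo) / span, (X_other - lo) / span

# One experiment: six numerical inputs (hypothetical values) plus three categorical ones.
X = encode([[10, 8, 400, 400, 100, 18]],
           [{"oven": "rotating", "silica_source": "colloidal_silica", "alumina_source": "other"}])
```

Fitting the scaling on the training set only keeps the test points from leaking information into the model.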
These models also performed well on the test set, which consisted of experiments that had already been performed but were set aside for this purpose. Predictions from the different algorithms are provided in the Supporting Information. To compare the reliability of these models, we carried out an error analysis and calculated the R² and MSE values for each model and each output on the training (blue entries) and test (red entries) sets, as listed in Supplementary Table 26. The error analysis (Supplementary Table 26) shows that, when MSE is used as the performance metric, "rescaled 10-CV" is the best model for Si/Al (second-smallest training MSE and smallest test MSE). It is followed by "rescaled LOOCV (w/o categorical)" (ranking third for training MSE and second for test MSE).
By comparison, "rescaled 10-CV (w/o categorical)" has the smallest value for MSE training but ranks fourth for MSE test. Therefore, the algorithm "rescaled 10-CV" is the most reliable one among different CV schemes, and correlations developed by this algorithm are provided in Fig. 2. Correlations for the algorithm "rescaled LOOCV (w/o categorical)" are provided in Supplementary Fig. 36.
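The error metrics and cross-validation schemes named above can be sketched as follows. `kfold_cv_mse` and `MeanModel` are hypothetical helpers: the former accepts any model factory exposing `fit`/`predict`, and the latter is a trivial baseline used only for illustration. Setting `k` equal to the number of samples gives leave-one-out cross-validation (LOOCV), and `k = 10` corresponds to the 10-fold scheme. Random fold assignment with a fixed seed is an assumption.

```python
import numpy as np

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def kfold_cv_mse(X, y, make_model, k=10, seed=0):
    """Mean validation MSE over k folds; k = len(X) gives leave-one-out (LOOCV)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = make_model().fit(X[train], y[train])
        errors.append(mse(y[fold], model.predict(X[fold])))
    return float(np.mean(errors))

class MeanModel:
    """Baseline that predicts the training mean (illustration only)."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)
```

In a hyperparameter search, `kfold_cv_mse` would be evaluated for each candidate setting (for GH, e.g., `eps` and `n_modes`), and the setting with the smallest value kept.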
An example of a training synthesis is Entry 3, with a molar composition of 10 SiO2 : 1 Al2O3 : 8 Na2O : 400 H2O, heated at 100°C for 18 h in a static autoclave. It yields large, pure FAU crystals with a Si/Al ratio of 1.6. In our prior work, we used this FAU material to investigate the accessibility and reactivity of protons located within the sodalite cages of the FAU framework that become accessible during ion exchange 54.
An example of a test synthesis is Entry 81, with an initial molar composition of 12 SiO2 : 1 Al2O3 : 10 Na2O : 180 H2O. After aging at 25°C for 24 h, its water content was reduced by freeze drying to 80 H2O, and it was then heated at 50°C for 4 days in a rotating autoclave (6 rpm). The trained model predicts the outcome of this test synthesis, which yields non-aggregated nanocrystals with high FAU content and a Si/Al ratio of 1.3. Such low-Si/Al-ratio nanocrystals could be useful for the fabrication of FAU membranes.
ML was then used to suggest inputs aimed at the desirable output of FAU with a Si/Al ratio higher than 3. The target is direct synthesis (i.e., without dealumination treatments) 21 from sodium aluminosilicate sols/gels in the absence of organic structure-directing agents (OSDA). Under these conditions, exceeding a Si/Al of 3 remains elusive despite many decades of effort in developing FAU synthesis methods, e.g., OSDA structure design (bottom-up) 22,26 and post-synthesis dealumination (top-down) 21,26. It is a highly desirable outcome, as it may lead to robust catalytic properties 29, improved stability 26, and lower manufacturing cost 11. We test the ability of the best algorithm to predict properties for unseen synthesis conditions by using it as a surrogate model for optimization.

Fig. 2 (caption, partially recovered): … (e) Uptake values at P/P0 = 0.01 from Ar-adsorption isotherms. (f) log10(particle size/crystal size); a lower value means lower aggregation of FAU particles. Blue dots represent training points for model identification, red dots represent testing points for the identified model, and green dots represent prediction points toward obtaining high-silica FAU zeolites. Entry numbers for the dots in (a-f) are labelled in Supplementary Fig. 28. Fewer points appear in (c) and (f), since only entries with crystal sizes smaller than 60 nm are considered in the crystal size analysis.

Since our input space is 12-dimensional, we use plots that vary two quantities at a time while keeping the others fixed. These produce hyperplane "slices" of the complete model in the space of synthesis conditions. Fig. 3 provides such projective views of the GH predictions for the Si/Al ratio as a function of different pairs of process parameters (input variables). Entry 20 corresponds to the synthesis reported in prior work by Rimer et al. 29, which holds the highest Si/Al ratio reported to date for high-silica faujasite zeolites prepared via template-free routes. The gradients were computed with entry 20 of our training set (Supplementary Table 1) as the base point. Supplementary Fig. 39 includes two kinds of counterpart plots:

- similar prediction plots for a model that instead uses a Matérn(0.5) kernel (Supplementary Fig. 39(a)(b); see the method description in Section S4);
- counterparts developed by the algorithm LOOCV (Supplementary Fig. 39(c)(d)).

Taken together, the predicted contours and gradients from multiple ML models suggest decreasing Na2O (or reducing pH) and increasing the crystallization time to achieve Si/Al > 3. Simply plotting the raw input/output correlations illustrates the interdependence of these variables and the complexity of zeolite synthesis itself. It also indicates that low Na2O increases the Si/Al ratio (as shown in entries of Supplementary …). Entries in Supplementary Table 2 with a Na2O/Al2O3 ratio of 3.5, however, have been excluded from the training set because they yield amorphous or impure (e.g., FAU + LTA mixture) products. In particular, entries A22 and A20 were performed at 100°C (i.e., the same temperature as entries 4-8 discussed above) for 3 and 7 days, respectively. Their products are amorphous (A22) or amorphous mixed with some FAU (A20).
A potential explanation for this observation is that as Na2O is reduced, the associated pH reduction slows down the crystallization kinetics. Therefore, a possible path to FAU with Si/Al > 3 would be to increase the time and/or temperature of crystallization. Since increasing the temperature to 120°C (entries A16-A19 and A21) also yields FAU with amorphous or LTA impurities, we decided to explore longer crystallization times at 100°C. Entries 89-92, with crystallization times of 9-13 days, yield pure FAU with Si/Al larger than 3.
The best-performing algorithm (based on the MSE metrics in Supplementary Table 26), "rescaled 10-CV", predicts this outcome (see Fig. 2(a) and Supplementary Fig. 28(a)). Of the remaining models, the second-best performer, "rescaled LOOCV (w/o categorical)", also makes good predictions. The rest, except for two models (normalized 10-CV and rescaled 5-CV), fail to predict Si/Al > 3. The "rescaled 10-CV" model also successfully predicts additional outcomes of the syntheses for entries 89-92 (particle size, FAU/(FAU + EMT) ratio, and uptake values), which are shown in Fig. 2 and Supplementary Fig. 28. The crystal sizes for entries 89-92, as determined from XRD peak broadening, are larger than 60 nm, so these entries are not included in the plots of Supplementary Fig. 28(c) and 28(f). Once the Si/Al ratio exceeds 2, the FAU fraction increases to near unity, and the high-Si/Al materials are pure FAU products (Fig. 4(b)). Similarly, particle size consistently increases with increasing Si/Al (Fig. 4(c)).
Despite the small size of the training and test sets, it can be concluded that the selected best-performing Geometric Harmonics algorithm ("rescaled 10-CV") is successful in predicting the outcome of unseen experiments with different combinations of properties (outputs). These predictions can also steer experimental conditions (inputs) toward desirable outcomes. In contrast, Neural Networks and Gaussian Process Regression were not successful in providing good predictions.
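The two-dimensional "slices" and base-point gradients used to steer the synthesis (Fig. 3) can be sketched for any surrogate that exposes a vectorized `predict` function. The helper names, the toy surrogate, and the three-input base point are all hypothetical. The gradient is taken by central finite differences as an assumption; an analytic gradient of the surrogate could be used instead.

```python
import numpy as np

def slice_grid(predict, base, i, j, values_i, values_j):
    """Evaluate predict on a 2-D slice: vary inputs i and j, hold all others at base."""
    base = np.asarray(base, dtype=float)
    grid = np.empty((len(values_i), len(values_j)))
    for a, vi in enumerate(values_i):
        for b, vj in enumerate(values_j):
            x = base.copy()
            x[i], x[j] = vi, vj
            grid[a, b] = predict(x[None, :])[0]
    return grid

def fd_gradient(predict, base, h=1e-4):
    """Central finite-difference gradient of a scalar surrogate at a base point."""
    base = np.asarray(base, dtype=float)
    grad = np.zeros_like(base)
    for k in range(base.size):
        step = np.zeros_like(base)
        step[k] = h
        grad[k] = (predict((base + step)[None, :])[0]
                   - predict((base - step)[None, :])[0]) / (2.0 * h)
    return grad

def toy_surrogate(X):
    # Hypothetical stand-in for a trained model with three inputs.
    return 3.0 - 0.5 * X[:, 0] + 0.1 * X[:, 1] ** 2

base = np.array([4.0, 5.0, 100.0])   # hypothetical base point
grid = slice_grid(toy_surrogate, base, 0, 1, np.linspace(3, 5, 5), np.linspace(1, 10, 4))
grad = fd_gradient(toy_surrogate, base)
```

The sign of each gradient component indicates which way an input should move to increase the predicted output, which is how such plots suggest, for example, lowering one input and raising another.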
The dominant role of Na2O is evident from the input/output correlation in Fig. 4(a). It also becomes evident in SHAP (Shapley-value-based) analyses [55][56][57] (Fig. 4(d) and (e), Supplementary Fig. 44). The Shapley values measure the average contribution of each feature's (variable's) value to the prediction, and thus provide a sense of how a change in a variable might affect the output 56,57. We applied the model-agnostic exact explainer algorithm 57 to the Si/Al model trained with LOOCV, with rescaling as preprocessing and without categorical inputs (namely the "rescaled LOOCV (w/o categorical)" model). This model was selected for two reasons: its performance (second best based on its MSE metrics), and the fact that, with the categorical inputs excluded, all of its inputs are continuous. The SHAP analysis of the Si/Al model trained with "rescaled 10-CV" is provided in the SI (Supplementary Fig. 45). We generate the summary plot (Fig. 4(d)) for all the training points to get a sense of the relative contribution of each variable (synthesis condition). In Fig. 4(d), the x-axis is the Shapley value, which indicates the contribution of a particular feature to the output, and the y-axis lists the variables (synthesis conditions). The color indicates whether the value of each variable is high or low. The variables are sorted in descending order of contribution. SHAP suggests that Na2O contributes the most to the output (Si/Al), and that decreasing the relative Na2O amount (Na2O/Al2O3) contributes positively to the output (increases the Si/Al). The SHAP analysis for the model "rescaled 10-CV" provides a similar conclusion regarding the role of Na2O/Al2O3, being most prominent and …

For the four prediction points after optimization (89, 90, 91, and 92), the Shapley values were computed separately. The waterfall plot for entry 91 is shown in Fig. 4(e), and the waterfall plots for the remaining prediction points are included in Supplementary Fig. 44. In the waterfall plot, the x-axis corresponds to the normalized values of the output variable (Si/Al ratio), $f(x)$. To obtain the true output values $f_{\mathrm{True}}(x)$, denormalization is needed: $f_{\mathrm{True}}(x) = f(x)\,\sigma_{\mathrm{Si/Al}} + \mu_{\mathrm{Si/Al}}$. Here $\sigma_{\mathrm{Si/Al}} = 0.67$ is the standard deviation and $\mu_{\mathrm{Si/Al}} = 1.91$ the mean of the training set. The Shapley value of each feature is given by the length of its bar; positive contributions are colored red and negative contributions blue. The (absolute) Shapley values show how much a single variable affects the prediction. Fig. 4(e) suggests that the values of Na2O and crystallization temperature positively affect (increase) the output prediction of the model.
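To illustrate what an exact explainer computes, the sketch below evaluates Shapley values by brute-force enumeration of feature subsets. It uses the common interventional definition, in which features outside a subset are filled in from background (training) rows. This is a generic sketch, not the SHAP library's code, and the linear model and data are hypothetical. Its cost grows as 2^d, which is manageable for a handful of continuous inputs.

The values are additive: the base value plus the sum of the Shapley values equals the normalized prediction. That prediction can then be mapped back to a Si/Al ratio with the training-set statistics quoted above.

```python
import itertools
import math
import numpy as np

SIGMA_SI_AL, MU_SI_AL = 0.67, 1.91  # training-set standard deviation and mean of Si/Al

def exact_shapley(predict, x, background):
    """Exact Shapley values of predict at x; absent features come from background rows."""
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    d = x.size

    def value(subset):
        # v(S): mean prediction with features in S set to x, the rest from background
        Z = background.copy()
        idx = list(subset)
        if idx:
            Z[:, idx] = x[idx]
        return float(np.mean(predict(Z)))

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for S in itertools.combinations(others, r):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

def denormalize(f_norm, sigma=SIGMA_SI_AL, mu=MU_SI_AL):
    """Map a normalized model output back to a Si/Al ratio: f * sigma + mu."""
    return f_norm * sigma + mu

w = np.array([2.0, -1.0, 0.5])

def linear_model(Z):
    # Hypothetical model acting on normalized outputs.
    return Z @ w

bg = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])
phi = exact_shapley(linear_model, x, bg)
base_value = float(np.mean(linear_model(bg)))
```

For a linear model, each Shapley value reduces to the weight times the feature's deviation from its background mean, which makes the example easy to check by hand.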
Next, we compared this as-synthesized faujasite (Na-FAU3.5) with the previously reported highest-Si/Al-ratio faujasite made by direct synthesis (Na-FAU2.8, with a Si/Al ratio of 2.8, prepared from the composition 12 SiO2 : 1 Al2O3 : 4 Na2O : 160 H2O) 29. XRD patterns (Fig. 5(a)) show that both faujasite materials are pure FAU without EMT intergrowths or other impurities. According to the Ar-adsorption isotherms (Fig. 5(b)), Na-FAU3.5 has a lower uptake than Na-FAU2.8 at P/P0 = 0.01, yet the corresponding ion-exchanged forms, H-FAU3.5 and H-FAU2.8, exhibit similar isotherms. Ion exchange for both was performed using 1 M NH4NO3 solution for 1 h, with 0.25 g zeolite powder per 40 cm³ of ammonium solution. As shown in the SEM images (Fig. 5(c)(d)), Na-FAU3.5 has a larger particle size than Na-FAU2.8. Solid-state 27Al NMR (Fig. 5(e)) showed that neither Na-FAU material contained octahedral Al species (typically observed at a chemical shift δ of 0 ppm) 29 prior to ion exchange. This reflects the integrity of the FAU framework and the absence of extra-framework Al. Extra-framework Al species formed only after ion exchange 16,54. The framework Si/Al ratios (Table 1, columns 4 and 5; Supplementary Table 27) were estimated from the 29Si NMR spectra (Fig. 5(f)) based on Loewenstein's rule (Eq. 1) 58, which stipulates the absence of Al-O-Al linkages in the zeolite framework.
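The sketch below assumes the widely used form of the Loewenstein-based estimate, which may differ in notation from Eq. 1. It computes the framework Si/Al from the integrated 29Si NMR intensities $I_n$ of the Si(nAl) signals as Si/Al = Σ I_n / Σ (n/4) I_n, with n = 0-4. The peak areas would come from deconvolution of spectra such as those in Fig. 5(f); the values in the example are hypothetical.

```python
def framework_si_al(intensities):
    """Framework Si/Al from 29Si NMR peak areas of Si(nAl) species, n = 0..4.

    intensities[n] is the integrated area of the Si(nAl) signal. Under
    Loewenstein's rule (no Al-O-Al), each Al has four Si neighbours, so each
    Si(nAl) unit accounts for n/4 framework Al atoms.
    """
    total = sum(intensities)
    al_weighted = sum(0.25 * n * area for n, area in enumerate(intensities))
    if al_weighted == 0:
        raise ValueError("no Al-coordinated silicon signal; Si/Al is unbounded")
    return total / al_weighted

# Hypothetical areas for Si(0Al), Si(1Al), Si(2Al), Si(3Al), Si(4Al).
example = framework_si_al([0.20, 0.40, 0.30, 0.10, 0.0])
```

The limiting case of only Si(4Al) signal gives Si/Al = 1, the lowest value allowed by Loewenstein's rule.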
These framework Si/Al ratios (…). … Infrared spectra recorded upon adsorption of pyridine (Supplementary Fig. 46) show that pyridine molecules titrate only protons (at ~3640 cm−1) 54,59 located within the supercages of these two H-FAU materials. Protons within sodalite cages can be fully titrated only when the framework partially collapses (e.g., steam treatment of FAU materials to prepare ultra-stable Y) 60. Both H-FAU3.5 and H-FAU2.8 still retain bulk framework stability after ion exchange with 1 M NH4NO3 solution. This is evidenced by infrared spectra recorded upon adsorption of pyridine (Supplementary Fig. 46) and by Ar-adsorption isotherms (Fig. 5(b)). Our prior work compared the reactivities and selectivities for protolytic reactions of propane between protons within opened sodalite cages and protons within supercages of high-silica faujasite zeolites 54. In that work, infrared spectra of H-D exchange with deuterated propane showed that sodalite cages could be fully opened when ion exchange was performed with 0.6 M NH4NO3 solution 54. Thus, upon ion exchange with 1 M NH4NO3 solution, protons within both sodalite cages and supercages could be titrated by propane. Infrared spectra after dehydration at 603 K (Fig. 5(g)) reflect that H-FAU3.5 and … (Table 1, column 8).

Fig. 4 (caption, partially recovered): a … and Si/Al ratio of product via ICP (output). b Correlation between Si/Al ratio via ICP (output) and FAU fraction (output), the latter referring to FAU/(FAU + EMT). c Correlation between FAU fraction (output) and particle size (output). In (a-c), blue/red/green dots represent training/testing/prediction points, respectively.

We also observed in our prior work that low-silica FAU zeolites contain protons on site II and site III within supercages, whereas high-silica FAU zeolites contain protons only on site II within supercages 61.
Thus, H-FAU3.5 and H-FAU2.8 zeolites possess similar H_SOD/H_SUP ratios and the same atomic configurations (i.e., protons on site II within supercages, and protons on site I′ within sodalite cages).
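The H_SOD/H_SUP ratios discussed above come from deconvolving the OH-stretching region (Fig. 5(g)). The minimal sketch below makes these assumptions:

- Gaussian bands with centers fixed at ~3640 cm−1 (SUP) and ~3550 cm−1 (SOD);
- hypothetical fixed widths, so that a linear least-squares problem gives the band amplitudes;
- a ratio of extinction coefficients set to 1 when converting band areas to site ratios, purely as an assumption.

A full fit would usually refine centers and widths as well, and the synthetic spectrum is hypothetical.

```python
import numpy as np

def gaussian(nu, center, fwhm):
    """Unit-height Gaussian band."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((nu - center) / sigma) ** 2)

def deconvolve_oh_bands(nu, absorbance, centers=(3640.0, 3550.0), fwhms=(40.0, 40.0)):
    """Integrated areas of Gaussian bands with fixed centers and widths (linear least squares)."""
    basis = np.column_stack([gaussian(nu, c, w) for c, w in zip(centers, fwhms)])
    amplitudes, *_ = np.linalg.lstsq(basis, absorbance, rcond=None)
    sigmas = np.asarray(fwhms) / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return amplitudes * sigmas * np.sqrt(2.0 * np.pi)   # Gaussian area = A * sigma * sqrt(2 pi)

def h_sod_over_h_sup(areas, eps_sup_over_eps_sod=1.0):
    """Site ratio from band areas: (A_SOD / A_SUP) * (eps_SUP / eps_SOD)."""
    area_sup, area_sod = areas
    return (area_sod / area_sup) * eps_sup_over_eps_sod

# Synthetic spectrum (hypothetical): SUP band twice as intense as SOD band.
nu = np.linspace(3400.0, 3800.0, 801)
spectrum = 2.0 * gaussian(nu, 3640.0, 40.0) + 1.0 * gaussian(nu, 3550.0, 40.0)
ratio = h_sod_over_h_sup(deconvolve_oh_bands(nu, spectrum))
```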
Having now established the similarities of the two materials in terms of phase purity, porosity, particle size, extra-framework Al, and acid site density and location, we proceed to compare their catalytic performance. We compared the reactivities of protons on H-FAU3.5 and H-FAU2.8 zeolites by using monomolecular dehydrogenation and cracking of propane (Eqs. 2 and 3) as probe reactions.
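For reference, the overall stoichiometries of the two probe reactions (monomolecular dehydrogenation and protolytic cracking of propane) are written below in their standard form. The notation may differ from that of Eqs. 2 and 3.

```latex
\begin{aligned}
\mathrm{C_3H_8} &\rightarrow \mathrm{C_3H_6} + \mathrm{H_2} && \text{(dehydrogenation)}\\
\mathrm{C_3H_8} &\rightarrow \mathrm{CH_4} + \mathrm{C_2H_4} && \text{(cracking)}
\end{aligned}
```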
Gounder et al. 62 reported that alkane dehydrogenation can be promoted over extrinsic active sites on carbonaceous deposits formed during reaction. They noted that the removal of remnant reactive carbon species should be taken into account to assess intrinsic H+-catalyzed propylene formation rates precisely. Sample pretreatment in H2 and an H2 co-feed in the inlet stream were therefore incorporated in the experimental protocol to mitigate on-stream deposition of reactive carbon species. The two H-FAU samples were pretreated in H2/He mixtures (p_H2 = 35 kPa, H2/He = 1:2). During reaction, H2 was co-fed in the inlet stream (H2/C3H8/Ar/He = 3/3/1.5/60, p_H2 = 5.3 kPa) at different temperatures (818, 833, 848, 863, 878, and 893 K). Ion exchange at NH4NO3 concentrations exceeding 0.6 M renders protons within sodalite cages accessible through partial framework collapse. Propane dehydrogenation and cracking then occur over H+ sites in both the sodalite cages and the supercages, as we reported previously 54. Since these two H-FAU samples share similar proton density ratios (H_SOD/H_SUP in Table 1, column 8), we directly normalized rates per overall H+ site. By comparison, H-FAU3.5 exhibits higher propane dehydrogenation and cracking rate constants per overall H+ site than H-FAU2.8 (Fig. 6). Despite its lower proton density (Table 1, column 7), H-FAU3.5 also exhibits higher rate constants per gram (Supplementary Fig. 47). We surmise that both samples were partially dealuminated under the harsh ion exchange conditions used (1 M NH4NO3), as supported by the 27Al NMR spectra (Fig. 5(e)). Lercher et al. 63 examined solid-state reactions that occur in partially dealuminated zeolites. They noted IR, NMR, and EXAFS spectroscopic signatures of extra-framework Al in close proximity to Brønsted acid sites in H-ZSM-5 materials. Active centers with adjacent Brønsted acid sites and partially dislodged framework Al species showed higher rates for H2/D2 exchange and protolytic butane cracking. The authors noted that the accessible pore space in the zeolite was adjusted to better accommodate alkane cracking transition states, leading to higher entropies of activation. The higher reactivity of H-FAU3.5 for protolytic alkane activation relative to H-FAU2.8 likely arises from a similar tuning of pore size and space upon ion exchange.

Fig. 5 (caption, partially recovered): … e Solid-state 27Al NMR spectra for different FAU zeolites; asterisks denote spinning side bands 29. f Solid-state 29Si NMR spectra for different FAU zeolites, reflecting Q4 silicon species coordinated with n OAl bonds (where n = 0, 1, 2, and 3). g Infrared spectra after dehydration at 603 K over H-FAU3.5 and H-FAU2.8 zeolites. Black dashed lines centered at ~3640 and ~3550 cm−1 are ascribed to protons located within supercages (SUP) and sodalite cages (SOD), respectively. Black solid lines refer to cumulative lines from deconvolution.
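The rate normalizations used above can be sketched as simple arithmetic. Because the inlet H2/C3H8 ratio is 3/3, the propane pressure equals p_H2 = 5.3 kPa. The sketch further assumes a rate that is first order in propane at low conversion (k = r/p_C3H8), and normalizes by a proton density in mol H+ per gram. The function names and example numbers are hypothetical. The example shows how a sample with fewer protons per gram can still rank higher on a per-site basis.

```python
P_H2_KPA = 5.3
P_C3H8_KPA = P_H2_KPA * 3 / 3   # inlet H2/C3H8 = 3/3, so the two partial pressures are equal

def first_order_k(rate_per_gram, p_c3h8_kpa=P_C3H8_KPA):
    """Rate constant (per g per kPa), assuming rate = k * p_C3H8 at low conversion."""
    return rate_per_gram / p_c3h8_kpa

def per_proton(k_per_gram, protons_per_gram):
    """Normalize a mass-based rate constant by proton density (mol H+ per g)."""
    return k_per_gram / protons_per_gram

# Hypothetical numbers: sample A has fewer protons per gram but a higher per-gram rate.
k_a = first_order_k(2.0e-7)
k_b = first_order_k(1.6e-7)
site_a = per_proton(k_a, 0.8e-3)
site_b = per_proton(k_b, 1.0e-3)
```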

Discussion
Herein, we reported that an ML-based model, created using an in-house set of synthesis data, directed us to explore synthesis routes that enhance the Si/Al ratio of FAU zeolites via OSDA-free direct synthesis. The ML algorithm was trained on 81 synthesis inputs and outcomes and validated with 7 testing points. It then suggested synthesis conditions that elevated Si/Al to the hitherto highest level (i.e., Si/Al = 3.5). Compared to a previously reported high-silica FAU zeolite made by direct synthesis (H-FAU2.8, entry 24), H-FAU3.5 exhibits 2.5- and 2-fold increases in propane dehydrogenation and cracking rate constants per H+ site, respectively. This demonstrates the potential of ML-directed synthesis to improve catalytic performance.
Two solutions were prepared during synthesis. Solution A (Si precursor solution) was prepared by adding a given amount of sodium hydroxide solution to a given amount of deionized water, followed by addition of a given amount of LUDOX HS-30 colloidal silica (or other Si precursors). Solution A initially formed a gel and was then heated in an oven at 343 K for 15-30 min until a clear sol formed. Solution B (Al precursor solution) was prepared by adding a given amount of sodium hydroxide solution to a given amount of deionized water, followed by dissolving a given amount of aluminum powder (or other Al precursors) in the prepared solution. (Note: the reaction is exothermic and produces hydrogen, hence the addition of aluminum powder should be performed with caution and appropriate safety protocols in place.) Solutions A and B were cooled to ambient temperature, and solution B was then added dropwise into solution A in a Teflon bottle while stirring. The synthesis mixture was aged with stirring at ambient temperature for 24 h. Where required, water was then removed to a desired level (e.g., H2O final/H2O initial = 0.47) by freeze drying in a lyophilizer at ambient temperature and a pressure of 20 mTorr. The vessel was then heated in a static oven at a given temperature for a given duration (see details in Supplementary Tables 1 and 2). The products were washed by repeated centrifugation and redispersion in deionized water until the pH dropped to 9-10, and then dried at 343 K overnight.
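A batch-preparation calculation for a target molar composition can be sketched as follows. The sketch rests on these assumptions:

- LUDOX HS-30 is 30 wt% SiO2 in water, and its sodium content is ignored;
- NaOH and Al powder are pure;
- water is tracked on an oxide basis: 2 NaOH supply Na2O + H2O, and aluminum dissolution consumes water (2 Al + 3 H2O → Al2O3 + 3 H2).

These conventions, the function name, and the batch scale are assumptions, not the authors' recipe. The optional `n` argument returns the water to be removed by freeze drying.

```python
M_SIO2, M_AL, M_NAOH, M_H2O = 60.08, 26.98, 40.00, 18.015  # molar masses, g/mol

def batch_masses(x, y, z, m, n=None, scale_mol=0.01, silica_wt_frac=0.30):
    """Reagent masses (g) for x SiO2 : y Al2O3 : z Na2O : m H2O (oxide basis).

    scale_mol is the number of moles of one composition unit (moles of Al2O3 when y = 1).
    """
    s = scale_mol
    m_sio2 = x * s * M_SIO2
    m_ludox = m_sio2 / silica_wt_frac                 # colloidal silica suspension
    water_from_ludox = (m_ludox - m_sio2) / M_H2O     # mol H2O carried by the suspension
    water_from_naoh = z * s                           # 2 NaOH -> Na2O + H2O
    water_consumed_by_al = 3 * y * s                  # 2 Al + 3 H2O -> Al2O3 + 3 H2
    water_to_add = m * s - water_from_ludox - water_from_naoh + water_consumed_by_al
    if water_to_add < 0:
        raise ValueError("target water content is below what the reagents supply")
    out = {
        "ludox_g": m_ludox,
        "al_powder_g": 2 * y * s * M_AL,
        "naoh_g": 2 * z * s * M_NAOH,
        "water_g": water_to_add * M_H2O,
    }
    if n is not None:  # freeze drying from m to n H2O per composition unit
        out["water_to_remove_g"] = (m - n) * s * M_H2O
        out["h2o_final_over_initial"] = n / m
    return out

entry_3 = batch_masses(10, 1, 8, 400)            # 10 SiO2 : 1 Al2O3 : 8 Na2O : 400 H2O
entry_81 = batch_masses(12, 1, 10, 180, n=80)    # freeze-dried from 180 to 80 H2O
```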

Argon physisorption
Measurements were performed at 87.3 K using an automatic manometric sorption analyzer (Quantachrome Instruments Autosorb iQ MP). Prior to the adsorption measurements, the samples were outgassed at 573 K for 10 h under turbomolecular pump vacuum (<0.003 Torr). Cumulative pore volume curves were calculated from the isotherms by applying an advanced NLDFT method. This method assumes that argon adsorption at 87 K occurs in spherical siliceous zeolite pores in the micropore range and in cylindrical silica pores in the mesopore range 59 .
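A cumulative pore volume curve can be read at the IUPAC micropore limit of 2 nm pore width to separate micropore volume from the volume in larger pores. The sketch below shows one way to do this by linear interpolation. It assumes the NLDFT output is available as paired arrays of pore width (nm) and cumulative volume (cm³ g⁻¹). The function names are hypothetical, and the curve values in any usage are placeholders, not data from this study.

```python
import numpy as np


def cumulative_volume_at(width_nm, cum_volume_cm3_g, cutoff_nm=2.0):
    """Linearly interpolate a cumulative pore volume curve at `cutoff_nm`."""
    w = np.asarray(width_nm, dtype=float)
    v = np.asarray(cum_volume_cm3_g, dtype=float)
    order = np.argsort(w)  # np.interp requires increasing x values
    return float(np.interp(cutoff_nm, w[order], v[order]))


def split_pore_volume(width_nm, cum_volume_cm3_g, cutoff_nm=2.0):
    """Return (micropore volume, volume in pores wider than cutoff_nm)."""
    v_micro = cumulative_volume_at(width_nm, cum_volume_cm3_g, cutoff_nm)
    v_total = float(np.max(cum_volume_cm3_g))
    return v_micro, v_total - v_micro
```

Linear interpolation is adequate when the curve is sampled densely near 2 nm. For a sparse curve, the cutoff value can depend noticeably on the sampling.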

Scanning electron microscopy (SEM)
SEM images of the samples were acquired using a JEOL JSM-6500 scanning electron microscope operated at 5 kV. SEM specimens were prepared by suspending the sample powder in ethanol with ultrasonication for 30 min. The suspension was then dropped onto a silicon chip and dried at room temperature.

Transmission electron microscopy (TEM)
TEM images were taken using a Tecnai T12 microscope operated at 120 kV with a LaB6 filament. Specimens were prepared by dispersing the sample powder in ethanol with ultrasonication for 30 min. The suspension was then dropped onto a Formvar-coated Cu grid and dried at room temperature.

Solid-state magic angle spinning (MAS) nuclear magnetic resonance (NMR) spectroscopy
27Al and 29Si MAS NMR spectra were acquired with a Bruker DSX-500 spectrometer (11.7 T magnet) and a 4 mm Bruker MAS probe. The spectral frequencies were 78.2 MHz and 99.4 MHz for the 27Al and 29Si nuclei, respectively.

Infrared spectroscopy
Infrared (IR) spectra of adsorbed pyridine were collected for H-FAU samples on a Nicolet™ iS50 Fourier transform infrared spectrometer with a Hg-Cd-Te (MCT) detector cooled to 77 K with liquid N2. Each spectrum was obtained by averaging 128 scans at 2 cm⁻¹ resolution in the 600-4000 cm⁻¹ range. Spectra were referenced to an empty-cell background collected under dynamic vacuum (~0.01 Torr) at 498 K. Self-supporting wafers (0.01-0.03 g cm⁻², 13 mm in diameter) were sealed within an IR transmission cell with ZnSe windows (High Temperature Transmission Cell, Harrick Scientific Products Inc.). Wafer temperatures were measured by K-type thermocouples (Omega) attached to the sample holder. The IR cell was connected to a glass vacuum manifold, which was used to expose the sample to controlled amounts of gaseous pyridine. The temperature program was as follows. The sample was first dehydrated by raising the cell temperature from ambient to 673 K at 0.033 K s⁻¹ and holding it at 673 K for 6 h. The cell was then cooled to 498 K, and pyridine was introduced until saturation, indicated by invariance among successive spectra.
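The wafer specification and temperature program above imply two quantities that are useful when planning a measurement: the sample mass per wafer and the duration of the dehydration ramp. The short sketch below computes both. "Ambient" is taken as 298 K, which is an assumption because the text does not give a value. The helper names are hypothetical.

```python
import math


def wafer_mass_g(areal_density_g_cm2: float, diameter_mm: float = 13.0) -> float:
    """Mass of a self-supporting wafer with the given areal density and diameter."""
    radius_cm = diameter_mm / 20.0  # mm -> cm (divide by 10), diameter -> radius (divide by 2)
    return areal_density_g_cm2 * math.pi * radius_cm ** 2


def ramp_time_h(t_start_k: float, t_end_k: float, rate_k_s: float) -> float:
    """Duration of a linear temperature ramp, in hours."""
    return (t_end_k - t_start_k) / rate_k_s / 3600.0


# Assumed ambient temperature of 298 K (not stated in the text).
low_g, high_g = wafer_mass_g(0.01), wafer_mass_g(0.03)  # about 0.013 g and 0.040 g
ramp_h = ramp_time_h(298.0, 673.0, 0.033)               # about 3.2 h before the 6 h hold
```

Under these assumptions, the dehydration step takes roughly 9 h in total (ramp plus 6 h hold) before the cell is cooled for pyridine dosing.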

Catalytic tests
Proton-catalyzed monomolecular propane reactions were performed in a tubular glass-lined stainless steel reactor (6.35 mm O.D., 4 mm I.D., SGE Analytical Science) equipped with a thermocouple to monitor the reaction temperature. The catalyst sample was heated in flowing helium (0.083 cm³ s⁻¹, Matheson) from ambient temperature to the reaction temperature at atmospheric pressure. Prior to data acquisition, samples were pretreated in H2/He mixtures (pH2 = 35 kPa, H2/He = 1:2, total flow rate = 0.5 cm³ s⁻¹) for 20 min to remove any remnant reactive carbon species. The molar composition of the feed was fixed at H2/C3H8/Ar/He = 3/3/1.5/60, with Ar serving as an internal standard. The space velocity was 3600 cm³(C3H8) gcat⁻¹ h⁻¹ and the total flow rate was 1.125 cm³ s⁻¹. H2 is included in the inlet stream to mitigate on-stream deposition of organic species. The reactor effluent was vented to atmospheric pressure. The system pressure, measured with a PX209-300G5V pressure transducer, varied from 101 kPa to 120 kPa. The reactor temperature was varied from 818 to 893 K in 15 K increments. Propane conversions were <1% and were treated as differential for the assessment of catalytic rates. The reactor effluent was analyzed by an online Agilent 7890A gas chromatograph (GC) equipped with a flame ionization detector (FID) and a thermal conductivity detector (TCD). Effluent components were separated in parallel on a dimethylpolysiloxane J&W HP-1 column (50 m long, 320 μm diameter, 0.52 μm film thickness) connected to the FID and a GS-GasPro column (60 m long, 320 μm diameter) preceding the TCD. Ar was quantified using the TCD, and all hydrocarbon species were quantified using the FID.
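The stated feed conditions can be checked with simple arithmetic. With H2/C3H8/Ar/He = 3/3/1.5/60 and 1.125 cm³ s⁻¹ total flow, the propane flow is 0.05 cm³ s⁻¹. If the space velocity refers to propane alone, that flow corresponds to about 0.05 g of catalyst. Under differential conversion, a first-order rate constant per H+ site follows from the conversion, the molar propane flow, the number of H+ sites, and the propane partial pressure. An apparent activation energy then follows from a linear fit of ln k against 1/T over 818-893 K. The sketch below encodes these steps as a minimal illustration under stated assumptions. It is not the authors' analysis code. The helper names are hypothetical. The number of H+ sites is an input that would come from a separate site count. Volumetric flows are assumed to be referenced to 273.15 K and 101.325 kPa.

```python
import numpy as np

R_J_MOL_K = 8.314          # gas constant, J mol^-1 K^-1
CM3_PER_MOL_STP = 22414.0  # assumption: flows referenced to 273.15 K and 101.325 kPa


def feed_summary(total_flow_cm3_s=1.125, ratios=(3.0, 3.0, 1.5, 60.0),
                 whsv_cm3_g_h=3600.0, p_total_kpa=101.0):
    """Propane flow (cm3/s), implied catalyst mass (g), propane pressure (kPa).

    `ratios` follows the H2/C3H8/Ar/He order given in the text.
    """
    x_c3 = ratios[1] / sum(ratios)                   # propane mole fraction
    f_c3_cm3_s = total_flow_cm3_s * x_c3             # propane volumetric flow
    cat_mass_g = f_c3_cm3_s * 3600.0 / whsv_cm3_g_h  # mass implied by the space velocity
    return f_c3_cm3_s, cat_mass_g, p_total_kpa * x_c3


def first_order_k(conversion, f_c3_cm3_s, n_h_mol, p_c3_kpa):
    """Differential-reactor rate constant per H+ site, mol (mol H+)^-1 s^-1 kPa^-1."""
    turnover = conversion * (f_c3_cm3_s / CM3_PER_MOL_STP) / n_h_mol
    return turnover / p_c3_kpa


def apparent_ea_kj_mol(temps_k, k_values):
    """Apparent activation energy (kJ/mol) from a linear fit of ln k versus 1/T."""
    inv_t = 1.0 / np.asarray(temps_k, dtype=float)
    slope, _intercept = np.polyfit(inv_t, np.log(k_values), 1)
    return -slope * R_J_MOL_K / 1000.0
```

For example, `np.arange(818.0, 894.0, 15.0)` reproduces the six reaction temperatures (818, 833, ..., 893 K). Separate rate constants would be computed for the cracking and dehydrogenation products before each is fitted.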

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The data that support the findings of this study are provided in the Supplementary Information and the Source Data file. Source data are provided with this paper. The Supplementary Information details the synthesis procedures, characterization results (XRD patterns, Ar-adsorption isotherms, SEM/TEM images, 29Si solid-state NMR), reactivity analysis, and machine learning methods and results.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.