New insights into hydrogen uptake on porous carbon materials via explainable machine learning

To understand hydrogen uptake in porous carbon materials, we developed machine learning models to predict excess uptake at 77 K based on the textural and chemical properties of carbon, using a dataset containing 68 different samples and 1745 data points. Random forest is selected due to its high per- formance ( R 2 > 0.9), and analysis is performed using Shapley Additive Explanations (SHAP). It is found that pressure and Brunauer-Emmett-Teller (BET) surface area are the two strongest predictors of excess hydrogen uptake. Surprisingly, this is followed by a positive correlation with oxygen content, contrib-uting up to ~0.6 wt% additional hydrogen uptake, contradicting the conclusions of previous studies. Finally, pore volume has the smallest effect. The pore size distribution is also found to be important, since ultramicropores (d p < 0.7 nm) are found to be more positively correlated with excess uptake than mi- cropores (d p < 2 nm). However, this effect is quite small compared to the role of BET surface area and total pore volume. The novel approach taken here can provide important insights in the rational design of carbon materials for hydrogen storage applications. ©


Introduction
Growing environmental concerns have increased the need for environmentally friendly fuels and vehicles. One solution to this is the shift towards a hydrogen economy. Hydrogen possesses highly desirable properties as a clean energy carrier and can be used in fuel cell electric vehicles (FCEVs) without emitting carbon dioxide. Unfortunately, a major challenge remains which still hinders the widespread acceptance of FCEVs, namely on-board hydrogen storage. Although hydrogen gas has good gravimetric energy density, its volumetric density is rather low. To counter this issue, the current state-of-the-art for transport applications is compressed hydrogen at 70 MPa in polymer-lined carbon fiber composite Type IV tanks [1]. However, this option is relatively expensive due to the materials cost, and the energy cost of compression. Alternatives such as cryogenic storage of liquid hydrogen, solid-phase metal hydrides, or liquid-phase organic compounds have been proposed; each with their own set of issues. Liquification of hydrogen is energy intensive, suffers from boil-off losses [2], and does not even meet the US Department of Energy (DOE) targets for volumetric capacity [3]. Metal hydrides can have high capacity, but are expensive, require elevated temperature for hydrogen desorption, and have slow kinetics [4]. Liquid organic hydrides such as methylcyclohexane suffer from a similar problem, with the dehydrogenation temperature being around 400e500 C without the use of platinum catalysts [5].
Another potential candidate which has received less attention is physisorption of hydrogen onto materials with large surface area. This method is deemed to be advantageous mainly due to its high reversibility and fast kinetics [6]. Physisorption can result in higher hydrogen capacity compared to compression alone at the same temperature and pressure. This would allow the use of lower pressure or smaller volumes to store the same amount of hydrogen, improving safety and affording greater flexibility in tank design.
One of the most suitable materials for physisorption of hydrogen is carbon. A common method to generate porous carbon materials with large surface area is heat treatment of a suitable precursor in inert atmosphere, followed by activation. Activation is the development of porosity in carbon materials via etching. This can be achieved either by "physical" activation at e.g. 600 to 1200 C (using gases such as oxygen, steam, or CO 2 to react with the carbon surface) [7,8], or by "chemical" activation at e.g. 250 to 600 C (using acids, strong bases or salts to react with the surface) [9e11], or a combination of the two [12].
Apart from activation, it is possible to synthesize carbon with a specific pore size via sacrificial templating. For example, carbon can be deposited onto zeolite frameworks by vapor phase deposition of a propylene/butylene mixture [13], followed by dissolution of the zeolite template in HF or HCl leaving a carbon negative behind. More recently, novel types of porous carbon materials have also been explored such as carbon xerogels and carbon cloth [14,15]. Several studies have also utilized waste [16,17] or biomass [9,18e20] to produce carbon materials, minimizing the full life cycle CO 2 emissions.
For porous materials to be viable for use as hydrogen storage media in FCEVs, the US DOE has set a target for the gravimetric capacity of 5.5 wt% by 2025, equivalent to a range of 400 miles (644 km), assuming a hydrogen storage system mass of 108 kg [21]. Since this system target includes the tank and auxiliary systems, the porous carbon storage medium should significantly overshoot this value of 5.5 wt%. Unfortunately, most materials have not achieved this to date. Therefore, there is still a need for optimization of the textural properties and chemical composition of porous carbon materials in order to be feasible for real-world applications.
To optimize porous carbon materials to achieve higher hydrogen uptake values, studies typically take an approach of choosing a carbon precursor, carbonizing and activating it using several different treatment parameters, measuring the excess hydrogen uptake at 77 K (or 298 K in some studies), and comparing the results at a selected pressure [9,11,16,20,22e27]. This empirical approach is similar to that taken in the field of CO 2 capture, and as Zhu et al. pointed out, may not necessarily give insights into how each variable affects the final outcome [28]. This is because even methodical alterations to the preparation of porous carbon samples will often result in changes not only in one but in many different textural properties and surface characteristics of the carbon at once (e.g., as were the case in Refs. [10,29e31]). As a result, although it is possible to get some qualitative ideas about what variables are important through experimentation alone, it is extremely challenging to determine clear quantitative structure-property relationships.
For example, the literature has qualitative consensus on the importance of overlapping potential fields in narrow pores in improving excess hydrogen uptake [10,22,23], but it has not been possible to quantify the strength or profile of this relationship. Consequently, the tradeoffs between pore size distribution and other variables cannot be established clearly. Some studies assert that optimized pore size is more important than having large surface area [10], while other studies maintain the opposite [22]. Such contradictions can only be resolved by quantitively and unambiguously establishing the relative importance of the relevant variables.
Worse, in some cases no agreement can has even been reached on whether a particular variable helps, hinders, or has no effect on uptake. One such example is the issue of how oxygen content affects adsorption. It is easy to find both experimental and theoretical studies that support the notion that oxygen content improves adsorption [16,20,32], as well as those claiming that oxygen hinders hydrogen adsorption [29,30,33]. The contradictory conclusions of these studies prevent the optimization of carbon materials through rational design, without a clear answer to which factors are important, and to what extent they matter.
In other fields, this kind of issue can sometimes be resolved through, for example, computational experiments using quantum chemical or molecular simulation models. However, it is difficult to design models that are representative of real porous carbon materials, which means most theoretical studies in the field tend to focus on studying only specific types of carbon nanomaterials (e.g. Refs. [34e38]), for which it is unknown whether they can be generalized to other, especially amorphous, types of carbon. This presents a crucial gap in knowledge which is not likely to be solved through conventional approaches.
To resolve this issue, we have performed a meta-analysis on a wide variety of experimental data on hydrogen uptake in porous carbon materials, using machine learning algorithms to understand the complex non-linear relationships between the different variables. Previously, a similar approach has been taken to understand the quantitative structure-property relationships in gas storage materials [28,39] and, more specifically, hydrogen storage materials [40e44]. However, the hydrogen storage studies are focused on well-defined materials for which large (and mostly simulationbased) datasets already exist, such as metal hydrides, metal organic frameworks (MOFs) and porous crystals. As we have stated earlier, because porous carbon materials are amorphous and difficult to model in a way that would be representative of reality, the more sensible approach is to build a new database using literature experimental data as we do in this study, which is difficult and time consuming. Perhaps for this reason, we have not found other studies using machine learning to study hydrogen physisorption on carbon materials, although there are studies about adsorption of CO 2 . For example, Zhang et al. used a deep learning algorithm to predict the CO 2 uptake capacity of porous carbon materials based on their textural properties as well as the measurement temperature and pressure [39]. They tried to interpret the predictions made by using contour plots, but this approach might not be viable for a dataset with larger dimensionalities (more predictor variables). In another study, Zhu et al. used a random forest model to understand CO 2 adsorption on porous carbon materials [28]. Their approach to understand adsorption at different pressures was by splitting the dataset into four separate pressure bins and then fitting a different model for each of them. The relationship is then analyzed by looking at random forest feature importance weights and matching them with the Pearson correlation r value to see the direction of the relationship. However, this approach might present a problem because although the random forest model itself may not be susceptible to multicollinearity between predictor variables, the signs of the r values may not stay consistent when used in multiple regression [45].
As can be seen in previous studies, some of the biggest challenges in machine learning studies are visualizing and understanding what the machine learning model is doing. Therefore, we utilize a supplemental technique known as Shapley Additive Explanations (SHAP), which enable enhanced analysis of the relative importance of each variable in a machine learning prediction, via game theory. This approach has recently gained popularity and has been used to explain machine learning models in a variety of subjects, such as accident detection [46], structural failure prediction [47], and medical compound activity prediction [48]. To the best of our knowledge, this is the first time that explainable machine learning with SHAP has been used in the context of hydrogen storage. The objective of this study is to establish clear structureproperty relationships for excess hydrogen uptake on porous carbon materials by building an accurate machine learning model and analyzing it using SHAP.

Data collection
Data for 68 different porous carbon samples were collected from 14 different studies performed at various pressures, totaling 1745 data points [9,10,13e17,19,26,49e53]. The criteria for inclusion were that all samples must have had the following characterization techniques performed: chemical composition analysis; pore size distribution analysis via nitrogen physisorption; and excess hydrogen uptake measured at 77 K whilst varying the pressure. The selected hydrogen adsorption data was converted from the published graphs to numerical data using a dedicated software program (Plot Digitizer, http://plotdigitizer.sourceforge.net/). The inputs were: the measured weight percentages of C, H, O, and N; the micropore volume (d p < 2 nm, cm 3 /g); the ultramicropore volume (d p < 0.7 nm, cm 3 /g); the total pore volume (cm 3 /g); the Brunauer-Emmett-Teller specific surface area (BET SSA, m 2 /g); and pressure (MPa). The output variable is excess hydrogen uptake, in wt%.

Training and evaluating the ML models
Five different models were evaluated for their predictive performance: (i) least squares linear regression (LR); (ii) support vector regressor with linear kernel (SVR(L)); (iii) SVR with radial basis function kernel (SVR (RBF)); (iv) extreme gradient boosted trees (XGBT, implemented using the XGBoost library); and (v) random forest regressor (RF). To tune the hyperparameters of each model, we performed group 5-fold cross-validation using either the function GridSearchCV(), or RandomizedSearchCV() in scikit-learn (a free Python library for machine learning), with parameters specified in Table S1. Group 5-fold cross-validation is used here instead of regular cross validation to ensure that the models generalize well to unseen samples. The sample names are used as group labels so that in each fold, every test set will not contain data from carbon samples in its respective training set. If regular K-fold cross validation were used instead, where the test-training split are completely randomized, the model may only have needed to interpolate or complete an isotherm for a known carbon sample, rather than generate an entirely new isotherm for an unknown sample. The difference between the two cross-validation methods is summarized in Fig. 1.
However, the performance estimates from hyperparameter tuning (inner folds) alone would be too optimistic, since the hyperparameters are specifically picked based on them resulting in good cross-validated scores in that step [54]. To counter this, an additional outer 5-fold group cross validation was performed (making this a nested cross validation method) to obtain a more accurate generalization performance of each model after hyperparameter tuning. Nested cross validation was chosen instead of a test/train split to minimize bias in the performance metrics caused by the relatively small dataset [55]. The model with the best overall cross-validated performance is then selected and refit with the whole dataset.

SHAP values
Finally, after refitting, the importance and roles of different predictors are analyzed using SHAP (Shapley Additive Explanations). In this approach, additive feature attribution is performed, wherein the complex machine learning model, Here, M is the total number of input features, 4 0 represents the expected value when all inputs are missing, and 4 i is the measure of contribution of a given feature i to a prediction. From game theory, it is known that Shapley values are the only solutions to 4 i which satisfies the criteria of local accuracy, missingness, and consistency [57]. They are also very intuitive because they adopt the same units as the model output (in this case, excess H 2 wt%). SHAP values are the Shapley values of a conditional expectation function f x , which can be computed as follows: where R is the set containing all possible feature ordering, P i R is the subset of features that come before feature i in ordering R. This study uses an algorithm by Lundberg et al. called TreeExplainer that calculates these values efficiently for tree-based models [58]. TreeExplainer also allows for the decomposition of each SHAP value into a main effect and interaction effects, allowing us to see possible synergistic effects between the variables. SHAP values are calculated locally for each individual prediction in the dataset, then the results are plotted for all predictions to show global explanations.
The overall flowchart of this study can be seen in Fig. S1.

Exploratory data analysis
The univariate statistical distribution of the carbon samples explored in this study is summarized in the violin plots in Fig. S2. The dataset covers a wide range of porous carbon materials and follows a mostly normal distribution (although the C wt% is rightskewed, whilst the N wt%, O wt% and mesopore volume are leftskewed). Bivariate correlation analysis was also performed, and the results are shown in Fig. 2. The values outside the brackets are the Pearson's correlation coefficients, r, and the values inside the brackets are the corresponding p-values (the probability of the two variables having this distribution in the dataset despite having no actual correlation). Hydrogen adsorption has a strong positive correlation with pressure (r ¼ 0.605, p < 0.01) and the BET SSA (r ¼ 0.539, p < 0.01). Meanwhile it is only moderately positively correlated with the total pore volume (r ¼ 0.408, p < 0.01), oxygen content (r ¼ 0.354, p < 0.01), and ultramicropore volume (r ¼ 0.326, p < 0.01). It is also weakly negatively correlated with the carbon (r ¼ À0.100, p < 0.01), hydrogen (r ¼ À0.290, p < 0.01), and nitrogen content (r ¼ À0.084, p < 0.01). Although these results seem compelling on their own, linear correlation coefficients are often biased by multicollinearity amongst the features. For example, the strong correlation between BET SSA and total pore volume (r ¼ 0.912, p < 0.01) prevent us from distinguishing the individual effects of each of those variables on uptake from bivariate analysis alone. Therefore, to understand the relationships more clearly, it is beneficial to use SHAP values, an approach which is robust to multicollinearities [59]. In addition, by looking at the pairwise scatterplot of the input variables (Fig. S3), we see that the relationship between nitrogen content and hydrogen uptake may well be a fluke caused by some extreme values which could have a large influence on the linear correlation coefficient. This issue is addressed through the use of the random forest algorithm, because tree-based methods are known to be robust against extreme values in the input space [60,61].

Model selection results and estimated generalization performance
We tried five different machine learning models to predict excess hydrogen uptake based on the textural and chemical properties of the different carbon materials: (i) least squares linear regression (LR); (ii) support vector regressor with linear kernel (SVR(L)); (iii) SVR with radial basis function kernel (SVR (RBF)); (iv) extreme gradient boosted trees (XGBT, implemented using the XGBoost library); and (v) random forest regressor (RF). The crossvalidated performances of the different models are compared in Table 1. In addition, a comparison between the predicted and actual hydrogen uptake values for different models is shown in Fig. 3. Clearly, linear approximations are not well suited for this prediction task, since LR and SVR(L) performed significantly worse than the non-linear models. This result is to be expected for two reasons: first, the strong relationship between pressure and uptake is nonlinear; second, linear models don't perform as well with multicollinear predictor variables [62]. Based on the performance metrics, the random forest (RF) regression method was selected due to its high performance (R 2 > 0.9), and refit with the entire dataset. Notably, even with nested cross-validation, the difference in runtime is not too severe compared to the other models, taking only several minutes to run. RF also has the advantage of being robust against multicollinearities [28], which we have demonstrated to exist in this dataset.

Feature analysis with SHAP
SHAP values derived from the RF model are shown in Fig. 4, where the variables are ordered based on overall importance (degree of influence on model output). The "importance" here is defined as the mean absolute SHAP value of all the points in the dataset. The colors in the figure denote the value of the input variable, where red means high and blue means low values. This means that red on the right-hand side of the plot indicates positive correlation with the excess hydrogen uptake, whilst red on the lefthand side of the plot indicates negative correlation. Again, the pressure is clearly the most important variable, followed by the BET SSA, as expected. Meanwhile, the oxygen content, total pore volume, and micropore volume are also positively correlated to excess hydrogen uptake. However, the hydrogen and carbon contents are negatively correlated with uptake. According to this analysis, the nitrogen content is not important in determining hydrogen uptake, in contrast to some claims made in the literature [27]. Interestingly, in this graph, oxygen is more important than total pore volume, despite the shorter tail. This means that although the effect of oxygen is more moderate, most samples experienced the effect, whereas the positive effect of total pore volume is only experienced by a few samples with very large pore volumes. This is evidenced by the clump of purple-colored points at the middle with almost 0 SHAP values for V total .
The dependence plot in Fig. 5 shows that the BET SSA has a very clear positive relationship with the excess hydrogen uptake. This is consistent with the well-known "Chahine's Rule" which states  , it is evident that the slope of the red points is much higher than that of the blue data points. This confirms that the BET SSA has a greater influence on excess hydrogen uptake at high pressure than at low pressure [10]. This result makes sense, since the BET SSA is associated with the number of available adsorption sites, which is more important at higher pressures where the coverage is expected to be higher. Fig. 5 also shows that the often-quoted slope of Chahine's rule is a  . The reason for this substantial difference in models is that SHAP estimates the contribution of BET SSA based on multivariate analysis of many variables, whilst for Chahine's rule the BET SSA is used as the sole linear predictor of hydrogen uptake. The BET SSA is positively correlated with the total pore volume and the micropore volume, which themselves are both positively correlated with uptake (Fig. 2, p < 0.01). Consequently, any linear bivariate correlation between BET SSA and uptake will show a larger slope than multivariate models, because BET SSA gets the "credit" for the contributions of total pore volume and micropore volume. SHAP values can be further divided into "main" and "interaction" effect values. SHAP interaction values capture the contribution of a synergistic effect between a pair of variables to their overall SHAP value. The main effect is the SHAP value with all the contributions from synergistic effects removed. Here, the SHAP impact values of oxygen were separated into their constituting main and interaction effects to better understand the impact on hydrogen uptake. The SHAP main effect values (Fig. 6a), show that oxygen content has a positive effect on the excess hydrogen uptake. Specifically, it shows that between 8 and 11 wt% oxygen content there is a roughly linear increase of~0.3 wt% in excess hydrogen uptake. Surprisingly, at~11.5 wt% oxygen content, there is a sudden and discontinuous jump in excess hydrogen uptake. Above this threshold value of 11.5 wt% however, there is little added benefit of adding more oxygen. There is a lack of in-depth theoretical work in the literature on the effect of varying oxygen content on hydrogen adsorption (especially at higher pressures). Therefore, it is difficult at this stage to confirm if this step and saturation is a real physical phenomenon, or an artifact of the decision tree ensemble method. Still, these findings are generally consistent with a theoretical study by Gotzias et al. [32], which shows that oxygenfunctionalized slit pores (simulated by stacked graphene layers) result in increased hydrogen density compared to pristine slit pores. Meanwhile, in another theoretical study by Georgakis et al. [33], it was concluded that an oxygen content of 3 wt% had no beneficial effect. However, since the effect is only observed in SHAP for an oxygen content greater than 8 wt% their conclusion is not in contradiction to our findings. This also seems to confirm the suspicion of Blankenship et al. [16,53]. that the oxygen content is beneficial not only to spillover, but also to molecular physisorption. However, this finding also apparently goes against some other experimental results which concluded that surface oxygen content either hampered or had no effect on hydrogen adsorption [29,30]. In their case, however, the BET SSA and pore volume decreased with increasing oxygen content. From the SHAP values in Fig. 4 we already know that decreasing those two variables can easily mask any benefits obtained from increased oxygen content.
The interaction effects between oxygen content, pressure, and hydrogen content are shown in Fig. 6b and c. There is a positive synergistic interaction between oxygen content and pressure, as observed in Fig. 6b. In this case, the blue points (i.e., at low pressure) follow an inverse trend compared to the main effect in Fig. 6a. Thus, this interaction effect significantly weakens the effects of oxygen content on the hydrogen uptake at lower pressures. At higher pressures (red points) the slope is 0, so the main effect of oxygen can be observed.
Further, a positive interaction effect is also observed between the oxygen content and hydrogen content of the porous carbon materials in Fig. 6c. In the case of low oxygen content (blue points), the hydrogen content is not beneficial towards excess hydrogen uptake, with a small negative effect (as also observed in Fig. 4). In contrast, for higher oxygen content (purple and red points), the hydrogen content has a small positive interaction effect on the excess hydrogen uptake. This is again consistent with the study by Gotzias et al., which concluded that hydroxyl groups may result in slightly higher adsorbed hydrogen density compared to epoxy functional groups [32]. In addition, although performed at a different temperature, the results here are consistent with the findings of Schaefer et al., who experimentally reported that carboxyl groups have a stronger correlation to excess hydrogen adsorption compared to quinoid carbonyl functional groups [24]. However, the machine learning technique utilized here is not able to distinguish between specific functional groups at present to confirm such hypotheses.
Next, considering the effect of pore volume (i.e., the total pore volume, ultramicropore volume (d p < 0.7 nm), and micropore volume (d p < 2 nm)) on the excess hydrogen uptake, some interesting trends emerge. First, Fig. 7a indicates that there is a roughly linear relationship between pore volume and excess hydrogen uptake -an increase of 0.1 cm 3 /g in total pore volume results in ã 0.06 wt% increase in uptake. The machine learning model splits the total pore volume into three different bins, with a step-like increase in excess hydrogen uptake between each group. Again, this is likely an artifact of the random forest decision tree model rather than being attributable to a physical trend. This highlights that care must still be taken when applying such models.
Meanwhile, hydrogen uptake increases linearly with the ultramicropore volume (Fig. 7c) up to 0.3 cm 3 /g with a slope corresponding to an increase in hydrogen uptake of 0.1 wt% for every 0.1 cm 3 /g increase in ultramicropore volume. Despite this higher slope compared to total pore volume, however, the overall feature importance is lower than total pore volume because the distribution is much narrower (i.e., it is difficult to get a large ultramicropore volume) and because it seems to saturate after a point. More specifically, above 0.3 cm 3 /g the adsorption saturates and there is little benefit in further increasing the ultramicropore volume. A similar linear relationship is observed for the micropore volume (Fig. 7e), but the slope is around five times smaller, with a 0.1 cm 3 /g increase in micropore volume corresponding to only ã 0.02 wt% increase in excess hydrogen uptake. As such, the micropore volume does contribute to hydrogen uptake, but the effect is much smaller than that of the ultramicropore volume.
Overall, Fig. 7 shows that the total pore volume is more important to excess hydrogen uptake than either micropore volume or ultramicropore volume. This conclusion contradicts some experimental studies. For example, Sethia and Sayari [10] concluded that pore size distribution is more important than the total pore volume or surface area because the uptake of a material with large surface area (~2400 m 2 /g) was matched by another material with much lower surface area (~1300 m 2 /g) but higher ultramicropore volume (0.2 cm 3 /g). However, this experimental trend was observed at atmospheric pressure, where a 1000 m 2 /g Fig. 7. SHAP values and pressure interaction effect values for total pore volume (a and b), ultramicropore volume (c and d), and micropore volume (e and f). All SHAP values are in the same unit as the output of the model (i.e., excess H 2 wt%). (A colour version of this figure can be viewed online.) difference in surface area would only correspond to a difference of 0.25 wt% in uptake, which could be easily counterbalanced by a combination of better pore size distribution and different chemical composition. Meanwhile, another study claiming that pore size is more important than BET SSA did not account for the multicollinear increase of BET SSA with ultramicropore volume [31]. In contrast, other experimental studies do support our conclusion. For example, Zhao et al. concluded that it is impossible to achieve higher uptake values without utilizing higher BET SSA, which would necessitate broadening the pore size distribution [22]. Fig. 7b, d, and 7f show that the overall effect of each type of pore volume on the hydrogen uptake is weaker at lower pressures (the slope of the blue points is more negative than the slope of the red points). Again, like the case of BET SSA, this correlation makes sense since the improvements resulting from an increase in potential adsorption sites should be more apparent at higher coverage. Interestingly, the slope of the red data points (i.e., high pressure) in Fig. 7f (micropores) is positive whereas the slope in Fig. 7d (ultramicropores) is slightly negative. This may indicate that the advantage of having ultramicropores rather than only micropores diminishes slightly at higher pressures, which would be consistent with the findings other studies [10,64]. Cabria, Lopez, and Alonso [64] explained that the pore size distribution is strongly correlated with adsorption energy (because of potential overlap), which in turn affects the equilibrium constant, but this constant is much less important at determining hydrogen capacity at higher pressures according to the equation of state they used (i.e., Mills-Younglove).
Most of the evaluated parameters in this study display a synergistic effect with pressure. This means that any improvements gained from optimizing the physicochemical properties of porous carbon materials do not have as much impact at low pressure.
Consequently, it is difficult to draw conclusions about the relationship between the physicochemical properties of porous carbon materials and their excess hydrogen uptake by measuring at close to atmospheric pressure. This must be considered in future experimental studies.

Discussion
It is important to discuss how well this model would generalize to a wider set of carbon materials. Although the dataset used here included a wide variety of carbon materials, data from many more studies failed the inclusion criteria by not providing elemental composition analysis and/or not including detailed pore size distribution analysis. Despite the limited dataset, many precautions have been taken in this study to ensure an unbiased performance estimate such as the use of group cross-validated scoring, as well as the inclusion of both synthetic and biomass-derived materials in the study.
We should also be careful to keep in mind that studies such as this are correlational, not causal. There is a possibility that there exist as yet unconsidered confounder variables (i.e., variables that are correlated to both hydrogen uptake and one of the variables here). This is especially true for variables which previously did not hold much attention within the research community, which therefore have little or conflicting theoretical basis (e.g., oxygen content). Still, the trends demonstrated in this study should serve as a guideline as to which variables warrant further controlled experimental and theoretical studies in the future.
In addition, although the SHAP explanations of the model introduced here show important insights, they are in fact still "true to the model" rather than "true to the data." One potential issue, then, is that slight changes of the sample may result in wildly different interpretations if the model used is not truly robust against multicollinearity. To account for this, we conducted a kind of sensitivity analysis on our model by bootstrap sampling (sampling with replacement) our entire dataset many times to generate many different datasets, retraining our random forest model on the bootstrapped datasets, and then reexplaining the new models using SHAP in order to estimate the variability in SHAP relative importance weights (the average absolute SHAP value from all data points). We took 500 different bootstrap samples consisting of 850 data points each (around half the original dataset size), and the results are shown in Fig. 8. Although there is some variability in the values, the feature importance attributed here is generally correct. The effect of model choice was also studied by trying the same bootstrapping method but using the XGBT model, which has a similar accuracy to the RF model. The results are shown in Fig. S4. We see that the importance has not changed much compared to RF. This is likely because the model, which has roughly the same accuracy as RF as well as being in the same family of ensemble methods based on decision trees, has learned in a very similar way. However, this may not always be the case if we try out different models. This is because SHAP uses the models to generate its explanations, not the data, and if the models are sufficiently different from each other, so too will the SHAP explanations. An easy way to understand this is by applying the SHAP explainer to a linear model, in which case SHAP will simply trace out the lines.
Aside from this, we also tested the sensitivity of the model and its resulting SHAP explanations against extreme values. Using boxplots of the input variables (Fig. S5), we identified and removed data points which conventionally would be considered extreme input values (more than 1.5 times the interquartile range away from the first and third quartile). If we were to do this for every value in every descriptor, we end up removing around 20% of the original dataset. The resulting dataset was then used to refit the RF model and the SHAP values were analyzed. Notably, the refitted model now has a lower R 2 value of only 0.866 while the RMSE has gone up to 0.690 wt%, showing a poorer fit which is probably a consequence of the significantly smaller training set.
The direction and strength of the relationships in general did not change, but as can be seen in Fig. 9a the model now puts H wt% and O wt% at larger importance compared to BET SSA. The reason for this was that by removing some extreme values, some data points with SHAP values close to zero for H wt% and O wt% were removed, thus increasing the mean absolute SHAP value for those variables. This phenomenon can be clearly seen by comparing Fig. 9b with 9c, as well as Fig. 9d with 9e. Thus, we believe this change is merely an unfortunate limitation of how we defined the average variable importance. However, we see also through Fig. 9bee that the actual strength and shape of the relationship themselves have not changed much from the original model. We believe that the dependence plots are therefore robust to the removal of those extreme values. We also would like to note that the extreme values recorded here are not the result of input errors to the best of our knowledge, and so we decided not to remove them from the final model as we did not have a legitimate reason to do so.
To further evaluate the adequacy of the RF model, we also tried to identify outliers, which are data points with abnormally large regression errors. To do so, we plotted the distribution of the residuals (regression errors) using a scatterplot of standardized residuals, a histogram, and a normal Q-Q plot as seen in Fig. 10. The plots show that our residuals are homoscedastic (have constant variance) and closely follow a normal distribution with a mean of À0.04 and a standard deviation of 0.54 wt%. There is very little kurtosis (0.1795) and skewness (0.1034) in the residuals. These plots indicate a good fit since they follow the original assumptions of the model fitting process. We see no clear outliers from the error distribution.

Conclusion
In summary, we developed a random forest machine learning model to predict the excess hydrogen uptake of porous carbon materials from their chemical structure and textural properties. This model has good performance (R 2 ¼ 0.910) on a dataset from the literature covering 68 different carbon samples and 1745 data points. SHAP values were used to analyze the relationships between the predictors and excess hydrogen uptake, resulting in the following conclusions: BET SSA is the dominating parameter and is linearly correlated with excess hydrogen uptake, as expected. Chahine's Rule significantly overestimates the effect of BET SSA on hydrogen uptake. Increasing the oxygen content of carbon materials from 8 to 12 wt% could potentially lead to an increase in hydrogen uptake of around 0.6 wt%. The total pore volume and BET SSA have stronger effects than the pore size distribution. Increasing the ultramicropore volume is much more effective than increasing the micropore volume. The above relationships are stronger at higher pressures compared to lower pressures, and therefore it is preferable to perform experiments at >1 MPa to observe trends.
The conclusions we have come to from this study will be helpful in the design of new porous carbon materials for hydrogen storage.
In addition, this work shows the incredible potential that explainable machine learning holds in uncovering new counter-intuitive insights from material science datasets. In the future, the accuracy and power of such studies could be improved by larger datasets, as well as the standardization of experimental characterization methods between different studies and groups, since a major challenge in this work was the fact that most studies in the literature failed the inclusion criteria.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.