Machine learning for investigating the relative importance of electrodes' N:P areal capacity ratio in the manufacturing of lithium-ion battery cells

This work studies the impact of the ratio between the areal capacities of the graphite anode and the NMC622 cathode (the N:P ratio) on the electrochemical properties of lithium-ion batteries, compared with the electrode characteristics of thickness, mass loading and cathode areal capacity. The influence of these factors on energy capacity and gravimetric capacity at various C-rates, from C/20 up to 10C, is quantified by combining experiments obtained via design of experiment techniques with machine learning modelling and explanation techniques. The results highlight that the performance at all C-rates is strongly affected by all features; however, their relative importance, and the linearity or nonlinearity of the dependencies, is unique to each C-rate capacity. The N:P ratio shows a relatively smaller effect on electrochemical performance than thickness, mass loading of active material and cathode areal capacity. It is also concluded that while the impact of the N:P ratio is almost linear at lower C-rates, it is nonlinear with a local optimum at medium and high C-rates. This study offers a methodology for smart selection of the ratio between anode and cathode areal capacity for a balanced performance of cells at all C-rates.


Introduction
Lithium-ion (Li-ion) batteries lead the market for electric vehicles [1] as well as large-scale energy storage systems [2]. As a result, this technology has attracted increasing attention from academic and industrial researchers. The Li-ion cell electrodes, their properties and manufacturing techniques are recognised to be crucial in defining the cell performance. Consequently, revealing the impact of electrode chemistry, structural features, and manufacturing variables on cell characteristics has become a top priority of a considerable number of recent studies [3][4][5].
Balancing the anode and cathode characteristics defined by the active material mass loading, thickness, and the capacity ratio between the negative and positive electrodes, which is called the N:P ratio, is critical but challenging for Li-ion cells [6,7]. The N:P ratio by itself is a decisive factor for battery performance and safety [8][9][10]. It has been shown to strongly affect the rate of lithium plating for Silicon Carbide (Si-C), Lithium Iron Phosphate (LFP) cells, such that a ratio of 0.8 leads to no plating and a long cycle life [6]. The N:P ratio also influences the energy density of cells, as confirmed by Ref. [10]. Graphite (Gr), LFP cells with N:P ratios lower than 0.87 have higher cell-level specific capacity at early cycles of discharge but degrade much faster later on. For Silicon Oxides Graphite (SiOx-Gr), LiNiCoAlO (NCA) cells, the most stable cycling performance after 500 cycles is shown to be achieved by a ratio of 1.03 [9]. For Lithium (Li) metal cells, the optimal N:P for a stable performance was found to be 1, where a balance between the Li consumption, electrolyte depletion and solid electrolyte interphase formation was achieved [11]. The N:P ratio, defined as the anode to cathode effective area [12,13], has also been confirmed to influence the rate capacity, which declines with increased anode area. Discharge capacity retention during 1C/1C charge and discharge remains almost unchanged for increased anode area, and in fact only a minor capacity decrease appears for cells with larger anodes.
Although the impact of the N:P ratio on the performance and characteristics of the cells has been shown to be significant by the abovementioned research, its relative importance compared to the other physical characteristics of the cells has not yet been fully quantified, to the best of the authors' knowledge. Moreover, the vast majority of the existing studies have performed experiments for N:P impact studies based on a trial-and-error approach and have conducted the analysis via conventional visualisations rather than systematic evaluations and investigations.
Motivated by this research gap and in search of a comprehensive understanding of the impact of the N:P ratio on cell performance, this paper conducts a complementary analysis. Specifically, the objective of this paper is to answer the following questions.
1. How important is the N:P ratio in defining the cell characteristics compared to other electrode properties?
2. What is the nature of the dependency between the cell performance and the N:P ratio at various conditions?
To answer these questions, a methodology based on design of experiments (DoE) and machine learning (ML) modelling techniques is developed. ML techniques have already proven to be very powerful in investigating the complex electrochemical characteristics of cells [14,15], especially during the manufacturing process with its large number of control factors [16][17][18]. However, they have not been fully utilised for N:P impact analysis.
In this methodology, the impact of N:P ratio on cell performance and its energy density has been studied via experiments designed to cover the ranges with suitable breakpoints, and then conducted in strictly controlled conditions so that the influences can be distinguished from the effects of other factors effectively.
In general, the correlated characteristics of electrodes, such as mass loading, thickness and the expected capacity, necessitate a large number of experiments under the common approach of "change one factor at a time", which is very costly and time consuming. In response, the novelty of this paper is in taking a DoE approach [19,20], based on the practical ranges of the control variables, to obtain the maximum amount of information from the minimum number of experiments. The design of experiment approach, when tailored to the problem of electrode manufacturing, helps to identify the influential factors and pinpoint their optimal combination for an effective study and analysis. A comprehensive review of various DoE techniques in the context of Li-ion battery electrode preparation can be found in Refs. [21,22].
Based on the data generated via a purposeful DoE, this paper then utilises Explainable Machine Learning (XML) models [23,24] to reveal the correlations, interactions, and dependencies between the control and response variables. While conventional ML techniques have been utilised to solve problems raised in the field of Li-ion battery electrode manufacturing, and are well reviewed in studies such as [25][26][27], XML methods have not been used comprehensively for such analysis, to the best of the authors' knowledge.
With the power of DoE and XML, the study offers a methodology where experiments with more than one factor changing at a time can still facilitate systematic analysis effectively and at affordable experimental and computational costs. While this methodology is discussed for the particular case of Gr/NMC Li-ion battery coin cells, it is highly adaptable to other chemistries as well as different manufacturing processes of Li-ion batteries. This study is intended to guide researchers at laboratories or pilot-scale lines, as well as manufacturers at industry scale, towards smart, optimised, and clean manufacturing of electrodes for high-quality Li-ion batteries.
The structure of the paper is as follows. Section 2 describes the methodology, including the details of the experiments, the cell manufacturing, instrumentation, as well as modelling approach via machine learning and analysis backed up by explainable machine learning methods. Section 3 reports the main results which include prediction by models, correlation, and dependency analysis. This section also offers discussions. Section 4 concludes the findings and elaborates the future works. Appendix includes the formulations and the reproducibility guides.

Methodology
The methodology of this study includes three main steps, as described in Fig. 1: step 1, experiments; step 2, modelling; and step 3, explanation.
Step 1 starts with the design of experiments, where the range and break points of each control variable are decided by the experts. The ranges of the variables are such that the response variables cover a practical range for manufactured cells within the pilot-line. The values are selected to be within the range used in commercial lithium-ion cells, but sufficiently different that significant changes in performance can be anticipated. For example, cathode coatings at 2 mAh/cm² are expected to have a much higher discharge capacity ratio (5C:0.2C) than 4 mAh/cm² cathodes. The full list of control and response variables is given in Table 1.

\[
\text{Expected Capacity (mAh)} = \frac{\text{Coating Mass (g)} \times \text{Dry Composition (\%)}}{100} \times \text{Capacity Active Powder (mAh/g)}
\]

\[
\text{N:P Ratio} = \frac{\text{Expected Areal Capacity Anode (mAh/cm}^2)}{\text{Expected Areal Capacity Cathode (mAh/cm}^2)}
\]

In the second step, the experimental data are fed into a machine learning model to relate the control and response variables. The data as well as the models are utilised in the final step to explain the dependencies and the impact of each factor on the responses. In the upcoming subsections, details of the experiments and models are provided.
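The capacity and N:P definitions used in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, argument names and the numerical values in the usage lines are assumptions made for the example and are not taken from the experimental dataset.

```python
def expected_capacity_mah(coating_mass_g, dry_composition_pct, powder_capacity_mah_per_g):
    """Coating mass (g) x active fraction x specific capacity of the active powder (mAh/g)."""
    return coating_mass_g * dry_composition_pct / 100.0 * powder_capacity_mah_per_g


def np_ratio(anode_areal_capacity, cathode_areal_capacity):
    """Expected anode areal capacity divided by expected cathode areal capacity (mAh/cm2)."""
    if cathode_areal_capacity <= 0:
        raise ValueError("cathode areal capacity must be positive")
    return anode_areal_capacity / cathode_areal_capacity


# Hypothetical numbers, for illustration only.
example_capacity = expected_capacity_mah(0.02, 95.0, 175.0)
example_ratio = np_ratio(3.3, 3.0)
```

Dividing the expected capacity by the coated area would give the expected areal capacity that enters the N:P ratio.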

Experiments
The experiments for this study are conducted in the WMG, University of Warwick, pilot-line electrode and cell manufacturing facilities. They follow the set of variables and break points given in Table 2.
For these experiments, the N:P ratio and the cathode areal capacity are independent control features. The anode areal capacity is calculated for the given N:P ratio, and the active material weights are calculated from the areal capacities. As the table shows, the coating process was planned by considering a range from low (2 mAh/cm²) to high (4 mAh/cm²) coating weights for the cathode (5 target values) and determining the anode coating weight needed for the N:P ratio of interest (9 target values). Experiments 12-16 were designed as repeats of earlier ones to ensure the repeatability and reproducibility of the data of this study.
The coating was performed using a pilot-line coater with three drying zones (Megtec). The anode slurry was coated onto copper foil of 10 μm thickness, and the cathode slurry was coated onto aluminium foil of 15 μm thickness, using the same line parameters for all cases. The parameters of the coating process included a coating ratio of 150% and a line speed of 1.2 m/min. The drying conditions for the anode were 45 °C in the first zone and 60 °C in the second and third zones, with a drying air speed of 5 m/s. For the cathode, the ovens were set at 85 °C for the first zone, 110 °C for the second zone and 95 °C for the third zone, again with a drying air speed of 5 m/s. Electrode sheets were calendered to reach a target porosity of ~30% using a laboratory calender (Innovative Machine Corporation). The anode was calendered at room temperature and the cathode with the rolls heated to 85 °C. After calendering, the electrode discs were cut (15 mm diameter anode and 14.8 mm diameter cathode for 2023 coin-cell). Electrodes with coating weight within ~1% of the target were selected for assembling coin cells. Anode and cathode electrodes were matched to obtain N:P ratios within ~1% of the target value.
After electrode preparation, 3 cells were made for each experiment, the electrode properties were measured (thickness, mass) or calculated (coating weight, density, porosity, areal capacity, N:P ratio), and a database was created. The manufactured cells were finally put through formation processes and various testing conditions for characterisation. The cell formation protocol included one cycle (charge and discharge at C/20 rate) followed by 5 conditioning cycles (charge and discharge at C/5 rate). During the testing protocol, capacity was measured during discharge at different C-rates: C/5, C/2, C, 2C, and 5C (each followed by charging at C/5 rate before the next experiment) at room temperature and within a nominal voltage range of 2.5 to 4.2 V. The gravimetric capacity was calculated by dividing the capacity by the cathode's coating weight. After manufacturing and testing, there were a total of 48 cells ready for modelling and analysis activities.

Machine learning model
Table 1
List of input and output variables.

The machine learning models enable relating the control and response variables. Here the models are utilised for the prediction of cell capacity and gravimetric capacity as response variables, given the impact factors or control variables of cathode active material weight, anode active material weight, cathode thickness, anode thickness, anode areal capacity, and anode to cathode areal capacity ratio (N:P). The machine learning model developed for this study is a Random Forest (RF) [30]. RF is a decision tree-based model with a more efficient development compared to single decision trees [31].
In training an RF model, multiple de-correlated decision trees are created, where each tree is developed from a random subset of all the predictors, which gives the model its name of random forest. The decision trees of an RF are constructed from a finite number of decision nodes and leaf nodes. When training samples are fed into the model, they are evaluated by each decision node through a test function and passed to the different branches based on their features [32].
Consider the dataset O = {(X_n, z_n), n = 1, …, N}, where X is the control vector and z is the response for N samples. The RF model, G, can then be implemented in the three steps below:
• Bootstrap sampling: the data are sampled with replacement to create bootstrap sets O_i, i = 1, …, BS, each with the same size as the original set, O. The purpose of bootstrapping is to sample data with replacement from the original training dataset repeatedly, BS times, to create multiple training sets. Bootstrapping helps the model to reduce its variance when predicting the responses, as decision trees generally tend to be very high variance estimators and the addition of a limited number of training data points may result in a different performance [32].
• Creation of a regression tree, G_i, for each bootstrap set, and obtaining the prediction results of each tree.
• Aggregation of the predictions of the individual trees by averaging to obtain the final RF prediction.
The random forest algorithm is illustrated in Fig. 2.
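The bootstrap, tree-building and averaging steps of a random forest regressor can be illustrated with a small standard-library sketch. It is not the model used in this study (which was built with scikit-learn): each "tree" here is a depth-1 regression stump, the feature subset is drawn once per tree, and all names and the toy data are assumptions made for illustration.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Depth-1 regression tree: the single best split among the given features."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not right:
                continue  # the largest value cannot split the data
            left_mean, right_mean = mean(left), mean(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: fall back to the mean
        overall = mean(targets)
        return lambda x: overall
    _, f, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x[f] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=50, n_features=1, seed=0):
    """Bootstrap the data, then fit one stump per bootstrap set on a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sampling with replacement
        feats = rng.sample(range(d), n_features)     # random subset of predictors
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return trees


def predict_forest(trees, x):
    """Final prediction: average of the individual tree predictions."""
    return mean(tree(x) for tree in trees)


# Toy data (hypothetical): the response depends only on the first feature.
toy_rows = [(i / 10.0, ((i * 7) % 10) / 10.0) for i in range(20)]
toy_targets = [2.0 * row[0] for row in toy_rows]
forest = fit_forest(toy_rows, toy_targets)
low = predict_forest(forest, (0.1, 0.5))
high = predict_forest(forest, (1.8, 0.5))
```

A production implementation would grow deeper trees and typically redraw the feature subset at every split, but the variance-reducing role of bootstrapping and averaging is the same.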
The RF model here is developed and validated by the cross-validation approach [33]. For this purpose, the data are first divided into K non-overlapping batches of equal size. Model training and validation are then carried out for K iterations. At each iteration, K-1 batches are used for model hyperparameter optimisation, and one batch is left for validation and test. The average performance over all iterations is defined as the final performance of the model. The model's performance in estimating the cell responses is evaluated via goodness-of-fit metrics that include root mean squared error (RMSE), mean absolute error (MAE), and R-squared (R²) [34]. While the first two focus on the average error between each sample's experimental value and its model-based estimation, the third metric describes the proportion of the variation in the data that is explained by the ML model.
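The three goodness-of-fit metrics and the K-fold split can be written out explicitly as follows. This is a minimal standard-library sketch; the function names are assumptions, and the split below does not shuffle the samples, which a practical implementation usually would.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k non-overlapping folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# 48 samples split into K = 5 folds.
folds = list(k_fold_indices(48, 5))
```

With 48 samples and K = 5 the folds cannot be exactly equal in size, so each iteration holds out roughly 20% of the data.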

Machine learning explanation
Generally, ML models are black boxes that relate the response variables to the control factors. While they can predict the responses very accurately, understanding their behaviour and internal mechanisms is not straightforward.
Explainable (interpretable) machine learning techniques can help to shed light on the model behaviour [23]. These techniques can show why a particular decision is made and why specific predicted values are obtained as an output of the model. While various XML techniques exist, the explanation of the machine learning model here is performed via accumulated local effects (ALE) [35], relative and global feature contribution (importance) via SHAP (Shapley additive explanations) [36,37], and linear correlation analysis [38]. For each technique, the dataset and the predictions from the ML model are utilised in the framework given by Fig. 1.
All techniques are model agnostic, which gives the opportunity to apply them to any model of preference, including RFs. ALEs are used to quantify how each control variable contributes towards the predicted values of the responses. To show the ranked and relative impact of each feature on the total predictions, SHAP is calculated. SHAP is a particular version of Shapley values called Shapley additive explanations. It is equal to the classical Shapley values of a conditional expectation function of the model, but easier and more efficient to calculate [23,37]. SHAP describes the contribution of each feature to the total prediction. The contribution of a feature is the expected value, across all possible coalitions of features not containing that feature, of the change in prediction caused by adding the feature to the coalition. The final item, linear correlation analysis, shows the strength of linear dependency between control variables and responses. The correlation coefficients are unitless values between −1 and 1, where a positive value shows a direct correlation and means that an increase in the control variable would lead to an increase in the response variable. A negative value means an inverse correlation. When the coefficients are close to zero, positively or negatively, they indicate that there is not sufficient evidence of a strong linear correlation between the variables [38].
The formulations of the three methods mentioned above are included in the Appendix, and readers are referred to Ref. [35] for ALE, Ref. [36] for SHAP and Ref. [38] for linear correlation analysis for further information regarding the formulations and detailed mathematical equations.

Results and discussions
In this section, the focus is on the impact of different N:P ratios on the rated performance of cells, and on the possible causes and implications. Besides the main factor of N:P, the relative effect of other correlating factors on cell performance is quantified, and the possibility of predicting the cell performance given the control factors defined in the previous section is also investigated. It is worth mentioning that the ML model, with the inputs and outputs/responses listed in Table 1, has been run 40 times to obtain a stable output.

Impact of factors on cell capacity
Cell performance in this section is characterised by the cell capacity (mAh) at C/20, C/5, C/2, 1C, and 5C, and accordingly by the cell capacity ratio. The distribution of N:P ratios with respect to cell capacity at the various C-rates of this study is visualised in Fig. 3. As the figures show, the N:P ratio has an impact on the performance of the cell. At each N:P ratio the cell capacity increases for thicker cells at C-rates below 5C.
At 5C the trend is opposite, and thicker electrodes at all N:P ratios result in lower capacity values. A similar trend is seen for the capacity ratio, where thicker electrodes at almost all N:P ratios result in full Li-ion battery cells with lower 5C to C/5 ratios. Furthermore, for all performance characteristics, the N:P ratio of 1.2 shows two clusters at very low and very high capacity, while for the rest of the N:P ratios there is a smoother transition from low to high capacity with cathode coating weight.
Considering these findings, although the impact of cathode thickness at a constant N:P can be inferred visually, determining the N:P ratio's influence on the cell performance is not straightforward due to its correlation with other electrode characteristics such as cathode weight. The ML models and explainability techniques are necessary to reveal this correlation.
The machine learning model's estimation of the cell performance compared to the experiments is illustrated in Fig. 4. According to the results in this figure and the model performance metrics in Table 3, the model can estimate the cell performance given its electrode characteristics. For all metrics, the results are reported as the mean and standard deviation of the multiple runs of the model. The cross validation has been performed with K = 5, which means that at each iteration 80% of the data are used for training and 20% for validation and testing.
For capacity at C/20, C/5 and C/2 the R-squared is above 0.9, which means that the model has captured more than 90% of the variation in the data. For 1C capacity the accuracy is about 89.7%, and for 5C it is lower at 77.9%. The capacity ratio is also predictable with an accuracy of 88.5%. In all cases the RMSE and MAE values are below 1 mAh, which shows the model's ability to relate the input and output variables successfully. When comparing the accuracy of the models for various C-rates, it is worth noting that at lower C-rates, due to the slow rate of the electrochemical reactions, the impact of factors such as N:P ratio and active material weight is more linear; this is shown via the ALE analysis in Fig. 5. As the C-rate increases, the impact of the factors tends to be more nonlinear and more complex due to an increase in side reactions that are not easy to quantify. In such cases the ability of the model to capture the relation between the predictors (factors) and the responses reduces. Therefore, the model's performance is slightly weaker at higher C-rates.
Considering the performance of the ML models, the explainability techniques can now be utilised to reveal the input-output relationships. The effect of the N:P ratio on the cell performance via ALE analysis is depicted at Fig. 5. It is worth noting that while the x-axis of each figure refers to the control variable, the y-axis is centred to the mean value of each response variables.
According to Fig. 5, as the N:P ratio increases the capacity at C/20 decreases linearly. The C/5 capacity decreases with an increase of N:P from 0.9 to 1, it then remains flat till 1.1 and demonstrates further decrease when N:P rises to 1.2. For C/2 and 1C capacity the trends look almost the same, it starts with a plateau between N:P of 0.9-1, a local increase in capacity around 1.05 and then a sharp drop till N:P of 1.2. For capacity ratio, the trend is much like the 5C capacity, a sharp increase is witnessed from 0.9 to 1 N:P, after which the ratio drops slightly till the N:P of 1.2.
To further increase the transparency of the models when estimating the cell performance, a feature contribution (importance) analysis via SHAP, as mentioned in Section 2, is conducted. The global feature importance plots are given in Fig. 6. According to Fig. 6 (a), cathode active material weight has an average impact of 0.5 mAh on the cell C/20 capacity, while the impact of N:P ratio is only 0.04 mAh; therefore, cathode active material weight is more important than N:P ratio in defining the cell C/20 capacity. Similar analysis and justifications apply to Fig. 6 (b)-(f).
As this set of figures demonstrates, the importance of features differs considerably from one C-rate to another. While for lower C-rates cathode active material weight and thickness are more important, for 5C capacity anode active material weight also appears among the more important features. A common finding across all figures concerns the importance of N:P ratio compared to the others: for all C-rates, the N:P ratio is the least important feature. In full cells, the rate performance is usually limited by the cathode, rather than the anode. Most lithium-ion electrodes show a transition from a resistance limited process to a mass transport limited process as the rate of discharge increases. The transition point will depend on the electrode thickness, and hence on the coat weight and areal capacity. For these cells, the transition points were 5C for the 2 mAh/cm² cathodes, 2C for the 3 mAh/cm² cathodes, and 1C for the 4 mAh/cm² cathodes. In thicker coatings and at higher rates, it is more difficult for the electrolyte to provide the flux of ions required, and only a limited proportion of the electrode will be utilised.

Impact of factors on cell gravimetric capacity
The impact of N:P ratio and other cathode and anode characteristics are reported on the cell's gravimetric capacity (GCap) in this subsection. The gravimetric capacity of a cell is defined as its capacity (mAh) divided by the cell's cathode weight in (g) as Eq. (5).

\[
\text{Gravimetric Capacity (mAh/g)} = \frac{\text{Capacity (mAh)}}{\text{Cathode Weight (g)}} \tag{5}
\]
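As a small worked illustration of this definition, the gravimetric capacity can be computed as below. The function name and the numbers in the usage line are hypothetical and are not taken from the dataset of this study.

```python
def gravimetric_capacity(capacity_mah, cathode_weight_g):
    """Cell capacity (mAh) divided by the cathode weight (g), giving mAh/g."""
    if cathode_weight_g <= 0:
        raise ValueError("cathode weight must be positive")
    return capacity_mah / cathode_weight_g


# Hypothetical cell: 3.2 mAh delivered from 0.02 g of cathode.
example_gcap = gravimetric_capacity(3.2, 0.02)
```

Because the response is normalised by cathode weight, cells with different coating weights become directly comparable at a given C-rate.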
According to the analysis and modelling results, while the ML model performance was very good at all C-rates for cell capacity, the accuracy obtained for gravimetric capacity is about 65%. Due to space limitations, this subsection only reports the feature ranking and dependency for the two C-rates of 5C and 10C, where the models have the highest predictability. The 5C model has an accuracy quantified by an RMSE of 15.598 ± 2.026 (mAh/g), MAE of 11.414 ± 3.360 (mAh/g) and R² of 0.885 ± 0.022. The 10C model accuracy is reported as an RMSE of 10.569 ± 5.712 (mAh/g), MAE of 7.587 ± 3.072 (mAh/g) and R² of 0.685 ± 0.018.
The ALE graphs as well as the feature ranking charts for the gravimetric capacity are depicted in Fig. 7. According to these figures, cathode active material mass and thickness appear as the most important features for both 5C and 10C gravimetric capacity, and the contribution of N:P ratio to the 10C capacity is more significant than that for the 5C case.
According to the ALE graphs of Fig. 7 (c) and (d), both C-rate capacities have a nonlinear dependency on N:P. As the figures show, the highest 10C gravimetric capacity values are at medium ranges of N:P and the gravimetric capacity drops at high ratios, while for 5C this happens at slightly higher N:P ratios. The difference in shapes between Fig. 7 (c) and (d) is interesting. At 10C, all the cathodes are operating under mass transport control, whether that be ionic diffusion or migration. At high N:P ratios, this decreases the gravimetric capacity. However, at 5C, the low coat weight cathodes (2 mAh/cm²) are still under resistance control. Since the cathode active material weight is a more significant factor than the N:P ratio, this complicates the interpretation of the N:P ratio. All three of the N:P = 0.9 tests were at medium to high coat weight, while two out of the three N:P = 1.2 tests were at the lower coat weight.
The implication is that extreme N:P ratios have a bigger adverse influence when the cathode is operating under mass transport control. To further highlight the impact of features on the gravimetric capacity at other C-rates, especially the lower C-rates, linear correlation analysis is performed. Results can be found in Fig. 8. Fig. 8 (a) includes the linear correlation coefficients, as calculated in Appendix A.3, while Fig. 8 (b) has the associated p-values [39,40], related to the significance of that correlation. As Fig. 8 shows, while the correlation between all features and gravimetric capacity values at 5C and 10C is significant, the correlation at other C-rates is weaker. This is consistent with the earlier finding regarding the accuracy of the ML models at low and medium C-rates. It confirms that while the impact of N:P at higher C-rates is accurately quantifiable, a better understanding of gravimetric capacity at low and medium C-rates requires further studies with more break points and a more extended set of control variables.

Conclusions
The anode to cathode ratio of expected capacity is a critical factor in defining the electrochemical properties of Li-ion cells with a particular formulation and chemistry. The impact of this factor on the cells' rated performance varies considerably. The cell capacity at low C-rates such as C/20 and C/2 is linearly dependent on N:P ratio, but for higher C-rates, such as 2C and 5C, as well as for the capacity ratio of the cells, a nonlinear relationship is evident. For higher C-rates, the existence of a local optimum for maximising the rated capacity is clear. Similar justifications apply to the gravimetric capacity of cells. The GCap results confirm that the contribution of N:P ratio to the capacity value is not equal at every C-rate. In fact, the higher the C-rate, the clearer the correlation between N:P and gravimetric capacity. The analysis of this study showed that, although the N:P ratio is a crucial factor for the cell characteristics, its relative importance compared to other factors such as anode and cathode active material mass and thickness is much lower. This confirms that the individual electrode features are masking the impact of N:P ratio on the responses, and if this factor is to be investigated in more detail, a DoE that accounts for the masking factors would be necessary. Based on the analysis provided here, several directions for future work can be recommended. A new design of experiment and model-based study with a focus on the gravimetric capacity, as well as other electrochemical features of cells such as internal resistance, is one of the most interesting. It is also worth expanding this study to other materials and combinations of anode and cathode formulations, as well as other cell characteristics such as cycle life.
While the ML-based analysis in this work is tied to the specifications of the data used, and other materials and features are expected to show different behaviour, the proposed methodology, which consists of 1) Design of Experiments, 2) Electrode Preparation, 3) Machine Learning Modelling, and 4) Explainability, is highly transferable from one case study to another. It should be noted that during this transfer a new hyperparameter optimisation is necessary, but this is straightforward via techniques such as the grid search optimisation used within this study.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: James Marco reports financial support was provided by The Faraday Institution.

Data availability
The dataset has been shared in the Supplementary material section.

Acknowledgement
This research was undertaken as part of the NEXTRODE project, funded by the Faraday Institution, UK. Grant Number: FIRG015.

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jpowsour.2022.232124.

A.1. Accumulated local effects
Accumulated local effects [23] are calculated for a single output with respect to a particular feature, or input, x_f. For this purpose, the feature space is first partitioned into intervals P_f(l) with the limits specified by w_{l,f} for l = 1, 2, …, L_f, where l is the indicator of the interval, f refers to the f-th feature and L_f is the total number of features. Here, P is selected by the modelling researcher considering the range and the distribution of the data. In fact, P can be decided considering the number of clusters of the feature values. For the samples whose feature values fall within each interval, the difference between the predictions of the random forest model, RF, with the feature set to the upper and to the lower limit of the interval is computed. The accumulated differences, RF_{f,ALE}(x_f), are obtained via equation (a.1).
The centred value is the main effect of the fth feature at a certain point in comparison to the average prediction of the response via model.
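The interval-wise differencing, accumulation and centring described above can be sketched for a single feature as follows. This is an illustrative implementation of first-order ALE under stated assumptions: `predict` stands for any fitted model mapping a feature row to a prediction, `edges` are the interval limits chosen by the analyst, and all names and the toy model are hypothetical.

```python
def ale_1d(predict, rows, feature, edges):
    """Centred first-order ALE of one feature, evaluated at the upper limit of each interval.

    predict : callable mapping a feature row (list) to a float prediction
    rows    : list of feature rows from the dataset
    feature : index of the feature of interest
    edges   : sorted interval limits w_0 < w_1 < ... < w_P covering the feature range
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for row in rows:
        x = row[feature]
        # Find the interval (w_{l}, w_{l+1}] containing x; the minimum goes to the first one.
        l = 0
        while l < n_bins - 1 and x > edges[l + 1]:
            l += 1
        lower, upper = list(row), list(row)
        lower[feature], upper[feature] = edges[l], edges[l + 1]
        sums[l] += predict(upper) - predict(lower)   # local effect of this sample
        counts[l] += 1
    local = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    accumulated, total = [], 0.0
    for effect in local:                              # accumulate the local effects
        total += effect
        accumulated.append(total)
    # Centre so that the sample-weighted mean effect is zero.
    centre = sum(a * c for a, c in zip(accumulated, counts)) / sum(counts)
    return [a - centre for a in accumulated]


# Toy model (hypothetical): the response is linear in both features.
toy_model = lambda row: 3.0 * row[0] + row[1]
toy_ale = ale_1d(toy_model, [(0.5, 1.0), (1.5, 2.0)], feature=0, edges=[0.0, 1.0, 2.0])
```

For the linear toy model the ALE rises by 3 per unit of the first feature regardless of the second feature, which is the behaviour expected when there is no interaction.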

A.2. SHAP Values
SHAP values quantify the contribution of each feature to the overall prediction achieved by the model for the particular response. Considering L f as the total number of features which is 5 in here, including the N:P ratio, Cathode Areal Capacity (mAh/cm 2 ), Anode Areal Capacity (mAh/cm 2 ), Cathode Active Material Weight (g/m 2 ), Anode Active Material Weight (g/m 2 ), SHAP is calculated via equation (a.3).
Here, S is a set (or coalition) of features, v(S) is the prediction based on this coalition, i.e. the response value obtained from this subset, and φ_f(v) is the Shapley value for feature f. L_f \ {f} refers to all the possible coalitions or subsets excluding the feature f.
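The coalition-based definition can be made concrete with a brute-force computation of classical Shapley values. This sketch is only illustrative: it enumerates every coalition, which is feasible for the five features considered here but is not how SHAP libraries compute values for tree models, and the value function `toy_value` is a hypothetical stand-in for the model's prediction given a coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function value(frozenset) -> float."""
    features = range(n_features)
    phi = []
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution of f
        phi.append(total)
    return phi


def toy_value(coalition):
    """Hypothetical value function: additive terms plus one interaction between features 0 and 1."""
    v = 0.0
    if 0 in coalition:
        v += 2.0
    if 1 in coalition:
        v += 1.0
    if 0 in coalition and 1 in coalition:
        v += 4.0
    return v


toy_phi = shapley_values(toy_value, 3)
```

In the toy example the interaction term is shared equally between the two interacting features, and the values sum to the difference between the full-coalition and empty-coalition predictions.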

A.3. Linear correlation
The correlation analysis between the characteristics of the electrodes, f, which include N:P ratio, Cathode Areal Capacity (mAh/cm²), Anode Areal Capacity (mAh/cm²), Cathode Active Material Weight (g/m²), Anode Active Material Weight (g/m²), and the electrochemical features of the half-cells, y, which indicate the capacity at a specific C-rate, is performed via (a.4) [38]. Here the correlation strength is r_{f,y}, known as the Pearson product-moment correlation coefficient, μ refers to the mean value of each variable, σ is the associated standard deviation and E denotes the expectation.
r_{f,y} is a value within the range [−1, 1], where a positive or negative sign refers to a direct or reverse correlation respectively. For each correlation coefficient obtained via (a.4), a p-value within the range of 0 to 1 is also calculated, which is the probability of observing a correlation at least as strong as the measured one if the null hypothesis were true. The null hypothesis states that there is no relationship between the control and response variables [39]. p-values are necessary to validate the correlation analysis and provide insight into the generalisation of the results from sample to population [40]. A correlation coefficient close to 1 or −1, when accompanied by a low p-value (usually below 0.05), confirms a strong relationship between control and response variables with a confidence of 95%.
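A direct transcription of the Pearson product-moment definition into Python is given below as a minimal sketch. The function name and the sample vectors are illustrative assumptions; the p-value is not computed here, since it requires the t-distribution, and a statistics library routine such as `scipy.stats.pearsonr` returns both the coefficient and its p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical vectors: a perfectly inverse relation and a weaker direct one.
r_inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
r_partial = pearson_r([1, 2, 3], [1, 3, 2])
```

The common 1/n factors in the covariance and standard deviations cancel, so they are omitted from the sums.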

A.4. Model Hyperparameters
To increase the reproducibility of the results reported in this research, the model configuration and the hyperparameters are listed below. All the analyses are performed in Python 3 via scikit-learn. A grid search optimisation approach has been taken to optimise the model.