Design of experiments for optimizing the calendering process in Li-ion battery manufacturing

Calendering is a key yet complex manufacturing process that has varied effects on the Li-ion battery cell performance. Finding the optimal compaction can require many experiments if using the traditional one-factor-at-a-time method, which would be both complex and resource intensive. Design of Experiments (DoE) coupled with modeling via multiple linear regression (MLR) are used in this study to better understand the complex process of electrode calendering. The factors studied in this report are rolling temperature, post-calendered porosity


Introduction
Energy storage is seen as the next key hurdle that must be overcome to fully enable the integration of renewable energy sources into both the grid and clean transportation. For both applications, Li-ion batteries (LIBs) have seen significant use due to their relatively high energy and power density. Over the past 30 years, the cost of LIBs has significantly decreased (from $4500/kWh to $100-250/kWh) while simultaneously increasing in both energy (from 80 to 250 Wh/kg) and power density (from 200 to greater than700 Wh/L) [1,2]. However, the cost of LIBs needs to go at least below $100/kWh before mass adoption into both grid storage and electric vehicles (EVs) [2,3]. Unfortunately, LIBs have reached maturity, and thus the improvements in energy density have slowed down to ~1-2% per year [2]. In order to keep pushing the capabilities of LIBs (or other energy storage technologies), a more holistic understanding of the entire LIB system is needed.
The manufacturing of LIBs is very complex and is done over many steps and chemistries, each of which requires bespoke optimization [4][5][6][7][8][9]. One key step is calendering, where the dry electrode is rolled and pressed between cylinders in order to improve the compaction of the electrode [4]. By compressing the electrode between two cylindrical rollers, the thickness is decreased, resulting in an increase in the volumetric energy capacity (due to the decrease in volume). Given a similar tortuosity, a reduced thickness would translate to a shorter electrode diffusion length, which means that Li + and electrons will need to travel a shorter distance to reach the ends of the electrodes (essentially lowering the critical electrode diffusion length), which translates to superior rate performance [10][11][12]. In this report, the minimum electrode thickness where the Li + is able to easily diffuse through is called the "critical thickness" of the electrode [10]. When the thickness of the electrode is greater than the critical thickness, then there is not enough time for the Li + to traverse the entire electrode, resulting in poorer performance (especially at higher C-rates where the discharge time is short).
Pressing the electrode also has the added effect of improving the contact between the particles of the active material with the other components. Compaction also causes a reduction in the porosity of the electrode. Simulations and experiments show that the Li + gradient is greater within a less porous electrode [13]. Although this gradient difference has been predicted to be negligible at lower C-rates, it becomes detrimental to the electrochemical performance at higher C-rates for both charge and discharge currents. This Li gradient is also exacerbated in thick electrodes, adding to the idea that thick electrodes have poor performance at higher C-rates [10,14].
A change in the porosity would also affect how the electrolyte interacts with the active materials. The voids in the porous network are usually filled with electrolyte, so a reduction in porosity may translate to a reduction in the active surface area (i.e. contact between the electrolyte and active material). Since the surface reaction is often the ratelimiting step, this will have detrimental effects to the rate performance of the cell. If the amount of inactive materials is high, the reduction in porosity may also cause the polyvinylidene fluoride (PVDF) to cover more of the active surface, further reducing the rate performance [15].
Compaction has also been shown to affect many of the mechanical properties of the electrodes. Compaction generally improves both the particle-to-particle adhesion [16] and the adhesion of the electrode to the current collector, though only up to a point [17][18][19]. Adhesion is of interest because it may improve long-term cycling performance, since peeling is often cited as a cause for cell failure. A reduced porosity is also tied to an increased breaking stress, which means the electrodes are less likely to break during manufacturing or operation. However, this also means that the Young's modulus is higher, which translates to a less elastic electrode [20,21]. Too much compaction may also cause the electrodes or particles to crack [22], which generally results in poorer electrochemical performance [23].
Overall, calendering involves optimizing many contradictory requirements, with the maximum electrochemical performance often attained at some intermediate amount of calendering [14,24]. The traditional method of analyzing such a complex system is to optimize each factor one at a time. However, this is both expensive (in terms of cost, time, and effort) and is susceptible to missing any interactions or dependencies between the different factors.
To more efficiently study these complex systems, there has been a growing interest in the use of data-driven models, such as machine learning [25][26][27][28][29][30]. The accuracy of these models is often dictated by the quality and size of the data set used to train them. The use of Design of Experiments (DoE) tools is one efficient method of creating these data sets [31]. DoE is the use of statistical techniques which allow for the creation of information-rich data sets while simultaneously using a relatively small number of experiments. This data set is created by first focusing on the most important parameters (called "factors") and conducting experiments at different key values of these parameters (called "levels") in order to see how each permutation affects the final result (called "response"). The combinations of these factors and levels are called the design space. Using DoE, it is possible to identify which points in the design space would yield the most amount of information and focus the efforts in running experiments in those regions. This allows DoE to also identify the best regions to run additional experiments in order to improve model statistics or to help decide on the first points to remove if the total number of experiments needs to be reduced. The data sets generated from the DoE matrix can be analyzed using regression modeling to extract the greatest amount of insights from them. Specifically, the factors and levels are used to model/predict the responses.
This study is not the first to use DoE in order to analyze and optimize a facet of the LIB manufacturing process. Roman-Ramírez recently published a literature review [32] outlining many of the other publications which used DoE for studying LIBs. Several of the studies mentioned in the review focused on improving the electrochemical performance via different methods, including, but not limited to, formulation optimization [33][34][35][36][37], materials synthesis optimization [38,39], and optimization of experimental parameters [40][41][42][43] or computational models [44,45]. Among these studies, several have used DoE to study the effect of porosity [43][44][45] or thickness [44,45] in various cathode materials, but none have worked on LiNi 0.6 Mn 0.2 Co 0.2 O 2 (NMC622), the active material used in this study. Although there have been several publications on how both porosity and thickness affect the electrochemical performance of NMC [13][14][15]24], these reports did not use DoE nor regression modeling as part of their analyses.
In this study, DoE will be used to generate a data set which will then be analyzed via multiple linear regression. The experimental section will first contain a discussion on the design space, including the selected factors, levels, and the list of experiments. This is followed by a discussion on the modeling methods (Pearson's correlation coefficients and multiple linear regression) and key goodness-of-fit statistics. The results and discussion section contains the important outputs and analysis from each method. Pearson's correlation coefficients are used to show if any linear correlations exist between the calendering factors and the electrochemical performance, specifically capacity, gravimetric capacity, and volumetric capacity. Multiple linear regression is then used to create a more robust model which includes interaction and curvature effects.
This study aims for fulfill three main objectives: (1) quantify how each factor (and their interactions) affects the responses, (2) identify which combination of these factors results in the optimal value of each response and (3) show the use of design of experiments methodologies and multiple linear regression to efficiently yet accurately study a complex system such as Li-ion battery calendering. The first objective is attained by comparing the individual coefficients from the multiple linear regression models. These results are important to any reader attempting to identify and quantify which calendering factors have the greatest effect on the electrochemical performance, which can then be used for further in-depth studies on these factors.
The second objective is attained by using the model to predict the responses over the entire design space and identifying the combination of factors and levels which results in the maximum values for the responses. These results are important to the reader who needs to identify the optimum calendering parameters for their specific use-cases (i.e. the optimum calendering parameters to get the maximum electrochemical performance at a specific C-rate).
The final objective is attained by comparing the results of the models to what is reported in literature. Readers will gain more confidence in the models if the results match what have been previously reported in literature. In addition to reaffirming previous results, the models contribute to these previous results by recommending specific levels (instead of the general recommendation of "high" or "low") of porosity, temperature, and GSM in order to attain the maximum capacity at given use-cases.
This study has three main novelties and contributions to the literature. Firstly, while this study is not the first paper to report the use of DoE for understanding and optimizing LIB manufacturing nor is it the first to analyze the effect of calendering on the electrochemical performance, it is the first report on the application of DoE focused on mapping the effects of the calendering process to the electrochemical performance. Secondly, this study uniquely uses the model coefficients to show at what C-rate each factor overtakes another factor as the driver for electrochemical performance. Finally, instead of simply stating recommendations, this study shows concisely and graphically how the optimal calendering conditions change depending on the specific usecase of the cell.

Design of experiments
The breakdown of the factors and levels are shown in Table 1. The factors for this study are roll temperature (denoted "temperature"), electrode porosity (denoted "porosity"), and coating mass loading (denoted "GSM," or g/m 2 ). Instead of considering the many potential calendering process parameters (i.e. line load, pressure, speed, etc.), porosity is selected as a factor since many of the calendering process parameters mainly affect the performance of the electrode by affecting the compaction (i.e. porosity). This drastically reduces the number of parameters needed to be modeled. Additionally, it makes interpretation of the results simpler and clearer, since changes in performance can directly be attributed to changes in porosity (instead of correlating changes in performance to specific combinations of line load, speed, pressure, etc.). The exception to this is temperature, as this is not tied to porosity and instead affects performance by affecting binder distribution. Although not directly associated with calendering, GSM was included as a factor since the initial thickness of the electrode will affect how the calendering affects the electrode. Both temperature and porosity were run at three levels each in order to capture any curvature in the response surface of the experiment. GSM was set to two levels in order to compare the effects of a low mass loading electrode (thinner electrode) vs a high one (thicker electrode). The limits were decided based on the experts' recommendations and literature [26,46].
Using these factors and levels, the design matrix was prepared using the commercial software Design-Expert. The design matrix consists of a total of 18 different experiments, which are listed in Table 2. The experiments were then split into an 80-20 training-testing set, with the testing set selected at random. Fig. 1 shows the experiments plotted within the design space, clearly showing the different permutations of experiments.

Pearson's correlation coefficients
Prior to modeling, it needs to be checked whether there are any direct correlations between each factor and (1) the other factors as well as (2) the responses. This is specifically shown using Pearson's correlation coefficients (ρ) [47], with the equation shown in Equation S1.
One requirement for multiple linear regression is that the inputs (factors) must not be strongly correlated with one another. In other words, their Pearson's correlation coefficients must be low. The Pearson's correlation coefficients are shown in Fig. S1a in the supplementary section while the p-values are shown in Fig. S1b p-values are a measure of the probability that an effect is due to random noise. Thus, a low pvalue signifies that the correlation is statistically significant. To simplify the analysis, both figures are combined into a single figure, shown in Fig. S1c. In this figure, the upper right half of the correlation matrix (circles) represents the values of the correlation coefficients. The size of the circle denotes the strength of the correlation (a larger circle means a larger correlation), transparency denotes statistical significance (more solid color means more statistically significant, with the statistically significant terms outlined with a thick black outline), and the color denotes the sign of the coefficient (green is positive while red is negative). The sample legend for this figure is shown in Fig. S2. In summary, there is a strong and statistically significant correlation when there is a large circle with a solid outline. The lower left half of the correlation matrix shows the scatter plots for each specific correlation. This will be used mainly to supplement any discussions on the correlation coefficients.   Set   1  85  40  121  45  Train  2  85  36  118  41  Train  3  85  31  116  38  Train  4  120  40  123  46  Test  5  120  35  120  41  Train  6  120  29  119  38  Test  7  145  41  121  46  Train  8  145  34  121  41  Train  9  145  29  123  39  Train  10  85  39  180  67  Train  11  85  34  180  61  Train  12  85  29  176  56  Test  13   From Fig. S1c, the three factors (temperature, porosity, and GSM) are not strongly correlated, meaning they can be used for the models. Additionally, thickness was added to this matrix to show that it is more strongly correlated with GSM and not porosity. In fact, it is possible to model the thickness as a function of porosity and GSM for the cathode, as shown in Equation S2. From this equation, it can be seen that the contribution of GSM (10.39) is more than twice that of porosity (4.00). Since there is a strong correlation between thickness and GSM, any conclusions made on how GSM affects a response may be extended to thickness.

Multiple linear regression (MLR)
The data was modeled via multiple linear regression [32,48], with the full equation shown in Equation (1). The terms A, B, and C represent temperature, porosity, and GSM, respectively. The output (y) is the response: capacity (mAh), gravimetric capacity (mAh/g), or volumetric capacity (mAh/cm 3 ). The levels for these factors were also encoded [46] prior to modeling, with the conversion shown in Table 1. Specifically, the lower level is coded as − 1 and the higher level is coded as +1.
Encoding the levels is an important step prior to modeling so that all the factors will be unitless and in the range of [− 1,+1], making them comparable to one another. Note that the model is made up of the mean of all the experiments (β 0 ) and the contributions (called "terms") of each individual factor (β A , β B , and β C ), their interactions (β AB , β AC , and β BC ), and their curvature (β A2 and β B2 ). Note that GSM does not have a quadratic term (that is, there is no β C2 term), since only two levels were used for GSM, which are not enough to model curvature. The values of β were estimated by minimizing the sum of the squares of the differences between the observations and the predictions from the model [48]. To avoid overfitting (when the model accurately predicts the training data set but is not generalizable to points not part of the training set) and improve model statistics, model reduction was done via minimizing the corrected Akaike Information Criterion (AICc) [49]. No Box-Cox transformations were conducted on the data [48].
One key feature of this model is that β can be used for identifying and comparing the most significant terms since the different values of β show how each response changes with changes in a factor. For example, β A shows how much y changes with each increase in A. Thus, by comparing the values of each β to one another, it is possible to quantitatively show which terms affect the value of y the most. For example, if |β A | > |β B |, then it can be said that factor A has a greater impact on y than factor B.

Model goodness-of-fit
Before using a model, its goodness-of-fit must be confirmed [48]. This is done using R 2 , R 2 adj , and R 2 pred . R 2 is calculated using Equation S3 and is a measure of how much variance is present in the model compared to the average. That is, if the error in the model's prediction (y i -ŷ i ) is low, then R 2 would be close to 1. One drawback of R 2 is that its value also increases with an increase in the number of factors or terms, which can lead to overfitting. Thus, R 2 adj is used. The calculation is shown in Equation S4, which is similar to R 2 but with a penalization factor which causes R 2 adj to decrease when the number of factors or terms increases. The final fitting statistic is R 2 pred , which is calculated using Equation S5. R 2 pred is a form of leave-one-out cross-validation. First the data is modeled without the ith experiment. This model is then used to predict the value of the ith experiment. This is done for all experiments. The sum of squares of these errors is then used to calculate R 2 pred . The rationale is that a generalized model should be able to accurately predict each value in the training data set. If the model significantly changes after removing a single data point, then it is a sign of possible overfitting or lack of enough data points. Model fitting, model reduction, term estimation, and calculation of statistics were done using both Design-Expert 13 and the OLS function from the statsmodels package in Python.

Electrode and cell preparation
Mixing of the cathode slurries was performed using a 1L Eirich mixer, following the procedures described in previous work [28,40]. The details of the formulation, mixing, and coating are described in the supplementary section.
After coating, a lab calendar (Innovative Machine Corporation) with roll diameter of 203 mm diameter was used for the experiment. The temperature for the calendar rolls was set as either 85 • C, 120 • C, or 145 • C. With the aim of reducing the difficulties of calendering thin electrodes, stainless steel shims were used. The calendar was heated and a pair if stainless-steel shims were placed on a hot plate to reach the same temperature. The coating was placed between the shims and the assembly was immediately passed through the heated calendar rolls. The ~28 cm × 11 cm electrodes were calendered to the target porosity (passed through the rolls 1 to 3 times, until the thickness is±1 μm within the target). The electrode thickness was measured on each of the sheets in 15 points in all the stages of calendering: initial (before calendering), intermediary (all passes) and final (after calendering).
The pressure applied on the electrode during the calendering process is controlled by a machine setting (the roll gap), which was manually set for each porosity level. The line speed (0.8 m/min) and the hydraulic pressure set-point (4000 psi) were kept constant for all experiments.
The electrodes were studied in half cells (2032 coin cells), with the cell preparation methods detailed in the supplementary section. The cell tests involved a sequence of charge and discharge cycles at different rates: formation (±C/20), conditioning (five cycles at ± C/5), then rate tests (charge at C/5, successive discharges at C/5, C/2, 1C, 2C, 5C, and 10C). The gravimetric capacities (mAh/g) were calculated from the discharge capacities and the weight of active material in the electrode. The volumetric capacities (mAh/cm 3 ) were calculated using the diameter of the electrode and the thickness of the electrode measured prior to cell making. The electrochemical performance is summarized in Fig. S3.

Pearson's correlation coefficients
Pearson's correlation coefficients are useful at checking for correlations between the factors and the responses prior to modeling. Although this will only capture linear relationships (unlike the multiple linear regression which captures interactions and curvature), it is still a useful tool for quickly checking which factors may arise as being significant later on. Fig. 2 shows a summary of the correlation coefficients for all the electrodes in this study, with the complete data set shown in Fig. S4 to Fig. S9 in the supplementary section. Again, the correlation is significant if the correlation coefficient (shown by the size of the circle) is large and statistically significant (denoted by the solid black outline). Fig. 2a shows that only GSM has a significant correlation with the capacity (mAh). At rates 2C and below, this correlation is positive, meaning an increase in the GSM would result in an increase in the capacity. This is due to the increase in the amount of active material as GSM is increased. At rates 5C and higher, the correlation becomes negative, meaning an increase in GSM results in a decrease to capacity. This is expected, as heavier/thicker electrodes perform poorer at higher C-rates [10][11][12]. Fig. 2b shows that the gravimetric capacity (mAh/g) has a very strong negative correlation with porosity at lower C-rates. This matches with what is seen in electrodes with high active mass loadings [15,16,21]. However, at higher C-rates, GSM seems to drive most of the gravimetric capacity, similar to the capacity results. Temperature seems to have a weak correlation at lower C-rates, but is not statistically significant. Fig. 2c shows that the volumetric capacity (mAh/cm 3 ) has a similar but more statistically significant trend than the gravimetric capacity. That is, porosity drives the volumetric capacity at low C-rates, while GSM drives the volumetric capacity at higher C-rates.
The discussion on the implications of these correlation coefficients will be done along with the discussion on the MLR results.

Goodness-of-fit statistics
Overall, R 2 adj shows the goodness-of-fit of the model while R 2 pred shows whether the model is well generalized. It is generally understood that a model is considered useable if R 2 adj > 0.75 and (R 2 adj -R 2 pred ) < 0.2. Values for R 2 , R 2 adj , and R 2 pred for discharge are shown in Fig. 3, with charge shown in Fig. S10. However, it is noteworthy that specific thresholds vary within different research articles and may be considered context specific to the problem under investigation and the objectives of the model.
All the models are useable, with R 2 adj > 0.75 and R 2 pred within 0.2 units of the R 2 adj . Capacity and volumetric capacity have the best fit models (R 2 adj very close to 1), while the fits for the gravimetric capacity are not as good (R 2 adj at ~0.8) but still deemed to useable for the purpose of estimating coefficients (β) and predicting the desired responses.

MLR model validation and prediction capabilities
Now that the models have been trained, they can be used to predict the values of the test data set. As a reminder, these are manufacturing runs 4, 6, 12, and 17, defined in Table 2.
Two statistics will be used to determine the quality of the model in predicting the test data: the 95% prediction interval (PI) and the percent error. The PI can be seen in the predicted vs actual plots, shown in Fig. 4 for discharge and Fig. S11 for charge. The training data has been greyed out and the testing data is colored as green (if the actual value is within the PI) or red (if it is outside the interval). Additionally, the symbol of each training point is set as either a triangle (if the percent error is greater than 5%) or a circle (if the percent error is less than 5%).
Generally, both the PI and percent error have their relative merits. When the PI is calculated, the expected variations in the data vs the models are already considered. However, the PI does not take into consideration how "important" a prediction or error is. This can be seen in Fig. 4 (capacity, C/20). 2 of the 4 data points are outside of the PI (red). This would normally imply that the model has poor predictability. However, visually it can be seen that all the testing points are close to their predictions (including the red testing points which were outside of the PI). Note that, although the red points are outside of the PI, the percent errors of these points are <5% (as shown by the symbol being a circle). This example shows that looking solely at the PI would have resulted in the model being labeled as poor, when in fact its percent error was very low.
Thus, in order to quantify the quality of the predictions (and by extension, the model), both PI and percent error are analyzed. Table 3 summarizes how many points fall within (1) both the PI and have <5% error, (2) only within the PI but has >5% error, (3) not within the PI but has <5% error, and (4) both outside of the PI and has >5% error.
83.3% of the testing data are within their respective PI, represented by all the green symbols in Fig. 4. This shows that a majority of the data fits within the model's predictions after taking into account possible errors. Note that this value is less than 95%, meaning the PIs of the models do not take into account enough variance in the data and are possibly smaller than they should be. Among all the points that are outside of their respective PIs (red), most have <5% errors (circles), meaning their predictions are still quite reliable despite being outside of  Similarly, 75.0% of the testing data have <5% error, represented by the circles in Fig. 4. Most of the testing data with >5% error are at rates 5C and higher. This is expected, since the values at the higher C-rates are very small and percent error calculations tend to grow in size as the true/actual value decreases. With the exception of those in the capacity (mAh) at 5C, all the points with percent errors >5% are within their respective PIs and are still close to the 45 • line in. This means that their results are still good.
Overall, it can be concluded that models can accurately predict the capacity, gravimetric capacity, and volumetric capacity within a   specified error. In other words, any predictions and conclusions from the models can be considered reliable.

Capacity (mAh)
The terms for the model for the discharge capacity are shown in Table S1 and are plotted in Fig. 5a. As a reminder, the terms are all the terms of the model (β) as shown in Equation (1). In other words, the value of each term is a measure of how that term (and by extension, that factor) affects the response. All discussions will focus on the discharge results while charge results are shown (for completeness) in the supplementary section since the charge results generally mirror those of the discharge. Fig. 5a shows the values of the terms at different C-rates. When a term has a large value, it means that small changes in that term result in larger changes to the response. In a way, this value can be seen as the physical significance of that term. Fig. 5b shows the t-values of each term. The t-values are a measure of the statistical significance. The larger the value, the more statistically significant the term is. For the statistical significance plot, the critical t-values which correspond to pvalues of 0.10, 0.05, and 0.01, are also shown. Any terms above the pval = 0.10, 0.05, and 0.01 lines are considered possibly statistically significant, likely statistically significant, and very likely statistically significant, respectively. In general, when a term has a large contribution to the model, it also has a large t-value. However, there can be times when a term has a larger contribution than another term while also being less statistically significant than that other term. In these cases, both statistical significance and contribution to the model must be considered.
Looking at Fig. 5a and b, capacity is overwhelmingly driven by GSM. That is, GSM (green) has the greatest values and t-values among all terms at all C-rates. Note that, at rates 2C and slower, this coefficient is positive (shown by the green circles). That is, an increase in the GSM would result in an increase in the capacity. This makes sense, as adding more active material results in having more capacity. At higher C-rates (specifically 5C and faster), this coefficient becomes negative (green triangles), meaning increasing the GSM results in a decrease in the capacity. Studies have shown that, once an electrode is past a certain thickness (i.e. the "critical thickness"), Li + ions and electrons start having difficulty travelling through the entire thickness of the electrode, resulting in poorer rate performance [10,13,50]. The value of this critical thickness decreases with C-rate, resulting in the negative correlation between GSM and capacity at higher C-rates. These match with the results from the Pearson's correlation coefficients.
The next significant term is porosity (blue), which has negative coefficients. This means that capacity is maximized with a low porosity. As explained previously, for electrodes with higher % active material, the negative effects of low porosity are not as pronounced [14,15,20], so lower porosities will often result in improved performance due to the improved particle-to-particle contact. It can also be seen that the contribution of porosity shown in Fig. 5a increases at higher C-rates, meaning that porosity becomes more and more significant as the C-rate increases. This is still attributed again to the improved particle-to-particle contact [10], since better contact would mean that more particles are within the critical particle-to-particle diffusion length. Fig. 5a and b shows that temperature is not as significant as GSM and porosity. As discussed previously, the role of temperature is mainly to allow the binder to more easily plastically deform, filling up the voids and better dispersing within the electrode, possibly even combining with other binder particles to form a superior structural conductive network. The lower values of temperature (red) and temperaturesquared (brown) compared to GSM and porosity suggest that this filling of the binder is not as significant as the other factors.
Finally, note the appearance of the porosity-GSM interaction (light blue) at 1C and 2C. This is discussed further in the volumetric capacity section.
Overall, the contribution to the model of GSM is approximately 2 orders of magnitude larger than the other terms, so it is the main driver of capacity. The dominance of GSM becomes less pronounced at higher C-rates, as porosity starts to become more and more significant. However, GSM is still the greatest driver of capacity, even at the higher Crates. There are also several interaction terms which appear as significant at higher rates. However, there is no general trend in these interaction terms.
Although Fig. 5a has the distinct advantage of breaking down how each factor individually affects the capacity, it is difficult to see from this figure how all these factors come together. To show this, the combined effect of all factors on the capacity was calculated over the entire design space and is shown in Fig. 5c for discharge (with charge being shown in the supplementary section). Note that the capacity is scaled for each Crate, with the purple regions showing the zones with the lowest capacity for that specific C-rate and the yellow showing the zones with the highest capacities for that specific C-rate. The design space is encoded in the range of [− 1,+1] for easier comparison.
Overall, at lower C-rates, capacity is maximized at high levels of GSM, regardless of porosity and temperature. That is, even if porosity and temperature were shown to be statistically significant, their contributions were generally overshadowed by the contribution of GSM. At higher C-rates, however, there is a slight shift to lower porosities and temperatures, followed by an inversion of the GSM term to negative.

Gravimetric capacity (mAh/g)
A similar analysis was conducted here for the gravimetric capacity. Fig. 6a and b shows that GSM, while still statistically significant, is no longer the dominant term. This is not surprising, as the gravimetric capacity is the capacity normalized with the active mass. The fact that GSM is still statistically significant shows that the effect of GSM to the gravimetric capacity is not limited just to the mass but is also coupled to other factors, specifically electrode thickness (post-calendering). At low C-rates, this contribution is small. However, this contribution significantly increases at higher C-rates, again due to the effect of thickness to the electrochemical performance [10][11][12]. However, unlike the capacity, GSM has a negative value for the gravimetric capacity for all C-rates. This is, again, tied to the thickness. Increasing the value of GSM causes an increase in the capacity due to the increase in the amount of the active material. However, since the capacity is normalized with the active mass in the gravimetric capacity, any increases in capacity due to having more active material is cancelled. Instead, what is seen here is the effect of thickness on the electrochemical performance, which was discussed previously to be negative.
Instead of GSM, porosity has the greatest contribution to the model while also being the most statistically significant term at lower C-rates. This is, again, likely tied to the improved particle-to-particle contact after calendering. As with the capacity, as the C-rate increases, the contribution of porosity also increases. However, at the highest C-rates, GSM overtakes porosity in terms of contribution to the model, showing that GSM is still the dominant term at extreme C-rates. Porosity also has a negative value, meaning the gravimetric capacity can be maximized when porosity is minimized. These results also match those in the Pearson's correlation coefficient section and of other studies [14,15,20].
Temperature-squared and temperature are also present in these models, both having negative values. However, since temperaturesquared has a greater value, it will overshadow the effect of temperature. That is, for lower C-rates, gravimetric capacity can be optimized by using an intermediate temperature.
Finally, note the presence still of the porosity-GSM interaction at 1C and 2C (approximately the C-rates where GSM overtakes porosity in terms of contribution to the model).
The combined effects of all terms are shown in Fig. 6c. At C/20, capacity is maximized at a lower temperature and lower porosity, as shown by the yellow points being concentrated in the region of low porosity and temperature. At C/5 to 1C, the maximum capacity is attained at intermediate temperatures, low GSM, and low porosities. Then, at the highest C-rates, GSM becomes the dominant term since all the yellow points are found at a low GSM, regardless of porosity and temperature.

Volumetric capacity (mAh/cm 3 )
The volumetric capacity generally follows a similar trend to that of the gravimetric capacity, as shown in Fig. 7a. That is, at low C-rates, porosity is generally the most significant factor, followed by both temperature-squared and GSM. At 5C and above, GSM overtakes porosity as the most significant factor. It is noteworthy that the effect of porosity on the volumetric capacity is overwhelmingly greater than its effect on the gravimetric capacity.
Also note the presence of the porosity-GSM interaction at 1C and 2C, the rates where GSM starts to become more significant relative to porosity. The effect of this interaction term can be clearly seen in the interaction plots shown in Fig. 7d. Here, the x-axis is porosity while the y-axis is the volumetric capacity at that specific C-rate. The black lines show the volumetric capacity when the GSM level is low, while the red lines show the volumetric capacity when the GSM is high. For the two left-most plots (C/5 and C/2), both lines are parallel, meaning there is no interaction between the porosity and GSM. The middle two plots show the model for 1C and 2C. At these rates, the black and red lines are no longer parallel and diverge from one another due to the interaction term. Specifically, the volumetric capacity more strongly diverges at higher porosities. That is, when the porosity is low, the effect of increasing GSM (that is, going from black to red) is low. However, when the porosity is high, the effect of increasing GSM is high.
As the C-rate continues to increase, the difference between the red and black lines continues to increase until they are completely separated, as shown in the two right-most plots. Here, both porosity and GSM have significant contributions to the model. However, no porosity-GSM interaction remains, so the lines are parallel.
Overall, this shows that the porosity-GSM interaction term first affects the volumetric capacity when both porosity and GSM are at a high level. This is likely tied to the effects of thickness and porosity to the critical diffusion lengths [10]. At rates below 1C, the various diffusion lengths are possibly longer than the critical diffusion lengths. At 1C and 2C, the diffusion lengths are possibly close to the critical diffusion length, so small changes in both porosity and thickness have big and synergistic effects on the electrochemical performance. At rates higher than 2C, the interaction terms are no longer significant because all diffusion lengths are likely shorter than the critical diffusion length.
The combined effects of all these terms are shown in Fig. 7c. At lower C-rates, porosity is the main driver for the volumetric capacity since all the maximum points are at a low porosity. However, at higher C-rates, GSM dominates and the maximum volumetric capacity can be attained when GSM is minimized.

Optimization and recommendations
Since the goal of LIB design and manufacturing is often to maximize energy/capacity, Figs. 5c, 6c and 7c can each be simplified by focusing only on the top 10% of values from each simulation (i.e. the yellow points in these figures). These are all shown in Fig. 8a, b, and c for the capacity, gravimetric capacity, and volumetric capacity, respectively. The colors represent the different C-rates, where the C-rate increases when going from blue to purple. Fig. 8a-c can be used to identify the calendering parameters to use when an electrode is expected to be used at a specific C-rate. For example, at low C-rates, Fig. 8a shows that capacity is maximized when GSM is maximized, regardless of porosity and temperature. However, as the C-rate is increased, there is a shift towards preferring lower porosities and eventually lower GSM. The gravimetric capacity is maximized at a low C-rate when both porosity and temperature are low. As the C-rate is increased, there is a shift towards intermediate temperatures, then low GSM and low porosities. For volumetric capacity, it is maximized at low porosities when the C-rate is low. As the C-rate is increased, there is a shift towards also having lower GSM.
Unfortunately, there is no universal set of parameters where all responses are maximized at all C-rates. That is, if good performance at both low and high C-rates is desired, then a compromise will need to be met. Fig. 8d and e shows the change in rate capability at a low and high GSM, where rate capability is calculated as the ratio of the capacity at 10C vs C/5. GSM will, obviously, have the greatest effect on rate capability, since it is the most significant term that negatively affects the capacity at 10C. However, there may be cases where the user would still prefer to have an electrode with a higher energy density for use at higher C-rates. Thus, the rate capabilities at both the low and high GSM cases are shown.
Regardless of the GSM level, Fig. 8d and e shows that a low-tointermediate porosity is desired to maximize rate capability. Decreasing the temperature also allows for more porous electrodes while still maximizing the rate capability. Lower temperatures (which are advantageous for lowering the overall cost of manufacturing) can still result in maximum electrochemical performance by decreasing the porosity.
The common recommendation from literature is to use thick electrodes with low porosities for low C-rate applications and use thin electrodes with high porosities for high C-rate applications [10][11][12][13][14]. Thick electrodes with low porosities are good for low C-rate applications since they take advantage of the high capacity from the high mass loading and the improved contact of a dense electrode. On the other hand, thin and porous electrodes are recommended for high C-rate applications due to the electrode being closer to the critical thickness and the larger active surface area of a porous electrode. The results of this study reaffirm these recommendations except for the porosity for high C-rate applications. Fig. 8 shows that a denser electrode (in other words, having a lower porosity) is, often, superior to a porous electrode. This discrepancy is because most of the recommendations from literature were done using electrodes with lower active mass loadings (~90%). When the active mass loading is higher (96% in this study), the detrimental effects of having a low-porosity cathode are also less pronounced [14,15,20].
One key highlight of this study is the use of DoE to create a relatively small, yet information-rich, dataset which was then modeled using a simple MLR model. The model was possible to quantify the non-linear effects of each feature, including their interactions. The fact that the results match well with literature adds confidence in the combined use of DoE and MLR to efficiently yet accurately understand complex systems. Additionally, the simple mathematical model can be used to find optimum values, given conflicting features. For example, instead of the general recommendation of "intermediate levels of porosity and temperature are recommended" the model is able to recommend specific values of porosity and temperature in order to maximize capacity (or any response).
Overall, this study is able to show that DoE + MLR are simple yet effective and efficient tools to understanding and modeling complex systems, such as Li-ion battery electrode calendering.

Future work
Calendering is only one of many LIB manufacturing processes. It is expected that calendering would have some interactions with all these other processes. A high-level DoE involving calendering and these other processes would be key to further optimizing the electrochemical performance of LIBs. A mixture-process optimal design DoE would especially be of interest, since the active mass loading is expected to have an interaction effect with calendering. The cells used in this study were half-cells. Expanding the study to larger format cells and conducting validation experiments at the manufacturing scale are recommended. Scaling-up from lab-scale to plant-scale electrode manufacturing is not always linear. Thus, a separate DoE + MLR study may be needed when scaling up. This study also focused mainly on NMC622, but the methodology can be replicated for other cell chemistries such as NMC811, LFP, or other promising active materials. If the exact same design is used, then cell chemistries can be set as a categorical factor and difference in effects between chemistries can be quantified. This study also focused on how calendering affects electrochemical performance but did not probe the intermediate properties such as how binder distribution or conductivity change with calendering (and by extension, how these intermediate properties affect the final performance). A study of these intermediate properties would be useful in better understanding the mechanisms behind calendering.

Conclusions
As mentioned previously, this study has three main objectives: (1) quantify how each factor (and their interactions) affects the responses, (2) which combination of these factors results in the optimal value of the response, and (3) demonstrate how DoE + MLR are simple methods which allow for the efficient modeling of complex systems. The first objective is achieved in Figs. 5, Figure 6, and Fig. 7, since the values of the terms show which factors have the greatest effect on the response. These trends were then correlated to physical phenomena within the electrode based on reports from literature. The second objective of this study is summarized by Fig. 8, which shows where in the design space the different responses are maximized. The final objective is achieved by showing that the results from the models match well with results reported in literature while adding to this by providing specific values of porosity, temperature, and GSM to maximize each response.
Overall, this study was able to show the use of Design of Experiments as a tool in order to generate an "information-rich" data set which could then be used to (1) create a quantitative model of how different factors affect different responses, (2) use the model to understand how each factor and their interactions affect each response, (3) use the model to make accurate predictions, within a specified error, and (4) make recommendations on how to maximize each response, all while requiring only a relatively small data set.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.