Piecewise-linear modelling with feature selection for Li-ion battery end of life prognosis

The complex nature of lithium-ion battery degradation has led to many machine learning based approaches to health forecasting being proposed in literature. However, machine learning can be computationally intensive. Linear approaches are faster but have previously been too inflexible for successful prognosis. For both techniques, the choice and quality of the inputs is a limiting factor of performance. Piecewise-linear models, combined with automated feature selection, offer a fast and flexible alternative without being as computationally intensive as machine learning. Here, a piecewise-linear approach to battery health forecasting was compared to a Gaussian process regression tool and found to perform equally well. The input feature selection process demonstrated the benefit of limiting the correlation between inputs. Further trials found that the piecewise-linear approach was robust to changing input size and availability of training data.


Introduction
Using data-driven approaches for modelling lithium-ion battery health degradation has been the focus of a significant amount of recent literature [1]. The bulk of that literature has used machine learning approaches to map to a range of targets, including current state of health (SoH), future SoH and remaining useful life (RUL). Machine learning is a powerful tool but has significant drawbacks. Training a machine learning tool can be computationally challenging, either because it scales poorly or because it requires extensive quantities of data. Storing those models can be a limiting factor for real-world systems and they are hard to use for control purposes [1]. Linear models are a simpler alternative [2]. Here, we propose a piecewise-linear model for forecasting capacity without compromising model performance relative to a machine learning tool.
Battery degradation is complex due to the range of possible degradation modes and is further complicated when use and cell-to-cell variability must be considered [3][4][5]. The flexibility inherent in machine learning approaches produces good results when predicting SoH, RUL and knee points [1][6][7][8][9][10][11]. When used alongside machine learning, feature selection based on correlations can act as a source of modelling flexibility that produces very good results [12][13][14].
These linear relationships were found in specific experimental scenarios and so cannot be described as universal. However, the existence of approximate linear relationships across the range of use cases suggests that there are correlated features to be found which could produce a linear model for lithium-ion battery ageing. The correlation-based selection process which produced good results for machine learning appeared well suited to finding a requisite set of inputs for a flexible but faster linear model.
However, the flexibility of an automated selection step may not be sufficient to accurately map complex degradation profiles. For example, the "knee point" appears in lithium-ion cells undergoing arduous use and manifests as a sudden collapse in health [7][28][29][30]. Such changes in degradation can result in features moving from varying linearly with SoH to non-linear relationships [28,31]. A linear model, even if flexibly selected, needs to be adaptable to changing degradation rates.
Piecewise-linear models, if used correctly, should provide exactly that adaptability. They have previously been used in state of charge models [3][25][32][33][34] and SoH models [2,35,36]. A common approach is to split models apart according to the stage in a cell's lifetime [3,32,34,36]. However, cells degrade at different rates, even if identically used [37]. Other approaches have split their linear functions according to voltage or state of charge regions [33,38,39]. It has even been possible to use separate linear models according to how a cell is being used [40] or to use locally linear regression, where linear models are constructed based on a number of nearest neighbours, to predict capacity loss [41].
Here, we propose the use of piecewise-linear regression (PLR) models to forecast battery SoH to end of life. The approach automated the locations of the boundaries between the linear models and the selection of the variable used to split. The number of linear models was then chosen automatically based on a compromise between complexity and performance. The approach was compared with a machine learning approach, and a number of smaller investigations examined how resilient the proposed piecewise-linear approach is to varying modelling conditions.

Data sources
This work used open-source battery cycling and capacity fade data from two datasets [8,42]. The datasets were produced for fast-charging experiments and used lithium iron phosphate/graphite 18650 lithium-ion cells, manufactured by A123, all cycled in a temperature chamber set at 30 °C. Capacity estimates were calculated from the repeated 4C discharge cycles.
Figure 1: The data used in references [8] and [42]. Image adapted from ref. [13].
Cells were cycled to failure, defined for both datasets as 80% of the 1.1 Ah nominal capacity. The first work contained 135 cells, with peak charging currents varying between 3.6C and 8C [8]. The follow-up work, with a further 45 cells, had a fixed charging window of 10 minutes but different paths to full charge [42]. Here, all cells with lifetimes between 15 and 40 days were chosen. After this, the data for 157 cells remained available, shown in Fig. 1.

Data generation and selection
The raw data was reduced from hundreds of millions of rows down to thousands by producing input features. In each 12 hour time step the input features were calculated based on cell use. The available variables were current, voltage, temperature, power, absolute current and absolute power.
The input features were all proportions of time spent in given regions of those variables within a specific period in time. The regions were bounded by the behaviour of the full data set. Every variable was split up according to how long was spent at each value so that a cumulative distribution could be produced from which to draw the bounds.
The 33rd percentile of a variable, for example, was the value below which the 157 cells cumulatively spent 33% of their lifetimes. The approach aims to capitalise on past literature where linear relationships have been found between time spent in voltage regions and battery ageing [13][14][23][24][25]. The thresholds for all variables are shown in Table 1.
There were 36 potential input features produced for each cell when calculated for all variables over all variable ranges. Another 36 were added by including how those variables change between time intervals, and the final two were experimental time and its square root, giving 74 features. These values were calculated for each 12 hour interval in a cell's life. The approach models the changes in capacity, ∆Q, over the 12 hour time intervals, with capacity estimated from the nearest discharge cycle.
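The feature construction described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names, the choice of the 0th, 33rd, 67th and 100th percentiles as bounds, and the use of every pair of bounds to reach six regions per variable are all assumptions. It also assumes uniformly sampled data, so that the fraction of samples equals the fraction of time.

```python
from itertools import combinations

import numpy as np


def pooled_bounds(pooled_values, percentiles=(0, 33, 67, 100)):
    """Bounds drawn from the pooled distribution of one variable over all cells.

    The percentile choice is an assumption made for illustration.
    """
    return np.percentile(np.asarray(pooled_values, dtype=float), percentiles)


def region_features(values, bounds, name="V"):
    """Proportion of samples in a time window lying between each pair of bounds.

    Features are labelled by the 1-based IDs of the two bounds, e.g. "V_2,3".
    The upper bound of each region is exclusive.
    """
    values = np.asarray(values, dtype=float)
    features = {}
    for i, j in combinations(range(len(bounds)), 2):
        inside = (values >= bounds[i]) & (values < bounds[j])
        features[f"{name}_{i + 1},{j + 1}"] = float(inside.mean())
    return features


# Hypothetical usage: bounds from pooled voltage data, features for one window.
pooled_voltage = np.linspace(3.0, 3.6, 601)
window_voltage = np.array([3.05, 3.30, 3.30, 3.55])
print(region_features(window_voltage, pooled_bounds(pooled_voltage)))
```

Repeating this for each of the six variables would give 36 region features per time step under these assumptions; the change-based features and the two time features would be added separately.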
The input features which correlated best with the changes in capacity, ∆Q, were selected in order to reduce that number down to an appropriate set of inputs for a data-driven model. Pearson's correlation coefficient, $\rho_P$, was used to evaluate the degree of correlation. Unlike some methods, no two selected features were allowed to share a correlation coefficient greater than $\rho_{P,\max} = 0.85$, so that the input features would cover a wider range of variability.
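One way to realise this selection step is a greedy pass over the features ranked by their correlation with ∆Q. The sketch below is an assumption about the mechanics rather than the exact procedure used here: the function name `select_features` is hypothetical, absolute values of the correlation are used for both ranking and the similarity limit, and every feature column is assumed to be non-constant.

```python
import numpy as np


def select_features(X, y, n_select=5, rho_max=0.85):
    """Greedy correlation-based feature selection.

    Features are ranked by |Pearson correlation| with the target. A candidate
    is skipped if its |correlation| with any already selected feature exceeds
    rho_max. Returns the selected column indices in order of selection.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    target_corr = np.array(
        [abs(np.corrcoef(X[:, k], y)[0, 1]) for k in range(X.shape[1])]
    )
    feature_corr = np.abs(np.corrcoef(X, rowvar=False))

    selected = []
    for k in np.argsort(-target_corr):
        if all(feature_corr[k, s] <= rho_max for s in selected):
            selected.append(int(k))
        if len(selected) == n_select:
            break
    return selected
```

With two nearly identical columns and one independent column, the second pick skips the near-duplicate even though it correlates better with the target, which is the behaviour the similarity limit is intended to produce.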
Features were labelled according to the raw variable and the numerical ID of the percentiles which defined the thresholds. For example, the most commonly selected feature was $V_{2,3}$, i.e. the proportion of time spent with voltage between the 33rd and 67th percentiles.
Five input features were used for the main piecewise-linear model here. There is also a small investigation into how many features are required, but its conclusions are specific to this data set.

Piecewise-Linear splitting
The splitting process in piecewise-linear modelling searches for the best break points among the training data.
In all cases, the feature selected first was the input with the best correlation with the loss of capacity. Consequently, that feature was used to calculate the piecewise splits. A function ∆Q = ∆Q(x) was created using a weighted moving average, seen as a black line in Fig. 2a. The second derivative was used to find the points of maximum curvature in Fig. 2b, between which linear models were produced. For clarity, these will be referred to as sub-models from here.
The weights in the moving average were calculated using a squared exponential function of the distance between the target value and the data points. For feature $x$ at value $x_i$, the weight applied to the value of ∆Q at point $x_j$ therefore depended only on the separation between $x_i$ and $x_j$. The function $f_{\Delta Q}(x)$ was then calculated at all points $x_i$ using all $n$ data points in the training set (equation 2).
However, that moving average was likely to have large values for the second derivative at the extreme values of feature $x$ because there were fewer relevant data points. A data density function, $\rho = \rho(x_i)$, was calculated (equation 3) and multiplied by the second derivative of $f_{\Delta Q}$ so that changing gradients in regions with lots of data points were prioritised. The final expression for the break point selection function, $f_{bp}$, was:
$$f_{bp}(x_i) = \rho(x_i)\,\frac{\mathrm{d}^2 f_{\Delta Q}}{\mathrm{d}x^2}\bigg|_{x = x_i}$$
The process is detailed in Fig. 2, where the dotted lines in Fig. 2b are multiplied together to produce the final function for selection. Maxima in that composite function matched the significant changes of gradient in the training set or were close by.
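The break point search can be sketched numerically as below. Several details are assumptions because they are not specified in the text: the kernel length scale `length_scale`, the use of the summed kernel weights as the data density, finite differences for the second derivative, and taking the magnitude of the curvature so that gradient changes in either direction register as peaks. The function names are hypothetical.

```python
import numpy as np


def breakpoint_function(x, dq, grid, length_scale):
    """Weighted moving average of dq over x and the density-weighted curvature.

    Returns (f_dq, f_bp), both evaluated on `grid`.
    """
    x = np.asarray(x, dtype=float)
    dq = np.asarray(dq, dtype=float)
    grid = np.asarray(grid, dtype=float)

    # Squared exponential weight between every grid point x_i and data point x_j.
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / length_scale) ** 2)
    f_dq = (w @ dq) / w.sum(axis=1)

    # Relative data density at each grid point (assumed form).
    density = w.sum(axis=1) / w.sum()

    # Second derivative by repeated finite differences.
    curvature = np.gradient(np.gradient(f_dq, grid), grid)
    return f_dq, density * np.abs(curvature)


def pick_breakpoints(grid, f_bp, n_breaks):
    """Return the locations of the n_breaks largest interior local maxima."""
    grid = np.asarray(grid, dtype=float)
    is_peak = (f_bp[1:-1] > f_bp[:-2]) & (f_bp[1:-1] >= f_bp[2:])
    peaks = np.flatnonzero(is_peak) + 1
    strongest = peaks[np.argsort(f_bp[peaks])[::-1][:n_breaks]]
    return np.sort(grid[strongest])


# Synthetic example: slow linear fade that accelerates at x = 0.5.
x = np.linspace(0.0, 1.0, 2001)
dq = np.where(x < 0.5, -0.1 * x, -0.05 - 1.0 * (x - 0.5))
grid = np.linspace(0.15, 0.85, 141)
_, f_bp = breakpoint_function(x, dq, grid, length_scale=0.05)
print(pick_breakpoints(grid, f_bp, n_breaks=1))
```

The length scale trades off smoothness against resolution: a larger value suppresses noise in the curvature but blurs closely spaced changes of gradient.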
Other approaches to piecewise splitting were considered. K-means and Matlab's fminsearch function were both used as comparisons. K-means was initially used over the full input but that gave too high a weighting to the features selected later. K-means was found to be most successful when using just the first two features, at which point the results were extremely similar to those using curvature. Matlab's fminsearch allowed for completely free break point selection across the range of the first selected feature.

Linear Regression
The sub-model for each partition was calculated using Bayesian linear regression. The target variable $y$ was ∆Q in each time step. It was assumed to be a linear function of input $X$ with associated noise, $\varepsilon$.
Training Bayesian linear regression involves fitting the parameter vector $w$ to create the model $f(X) = Xw$ based on the posterior distribution over parameters. The input $X$ has a column for every input and a row for every data point.
The parameters $w$ were assigned a zero-mean Gaussian prior with covariance $\Sigma_w$:
$$w \sim \mathcal{N}(0, \Sigma_w)$$
Here, $\Sigma_w$ was assumed to be a diagonal matrix with a constant variance, $\sigma_w^2 = 10^2$. This leads to a mean estimate of the parameters $w$ as a function of the input variables $X$, output target ∆Q and estimates of observation noise $\sigma_n$ and covariance $\Sigma_w$ [43,44].
Predictions of capacity loss, $\Delta Q_*$, are therefore produced by multiplying the test set input matrix, $X_*$, by the parameter estimates in equation 5.
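For reference, the standard textbook result for this model, with one row of $X$ per data point, is a posterior mean of $\sigma_n^{-2} A^{-1} X^\top y$ with $A = \sigma_n^{-2} X^\top X + \Sigma_w^{-1}$, and a posterior covariance of $A^{-1}$. The sketch below implements that standard form. It is assumed, not confirmed, to match equation 5. The function names are hypothetical and the observation noise $\sigma_n$ is taken as a known input.

```python
import numpy as np


def fit_bayesian_linear(X, y, sigma_n, sigma_w=10.0):
    """Posterior mean and covariance of the weights for y = Xw + noise.

    Prior: w ~ N(0, sigma_w^2 I). Noise: N(0, sigma_n^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_precision = np.eye(X.shape[1]) / sigma_w**2
    A = X.T @ X / sigma_n**2 + prior_precision
    posterior_cov = np.linalg.inv(A)
    w_mean = posterior_cov @ X.T @ y / sigma_n**2
    return w_mean, posterior_cov


def predict(X_star, w_mean, posterior_cov):
    """Predictive mean and variance of the noise-free function at X_star."""
    X_star = np.asarray(X_star, dtype=float)
    mean = X_star @ w_mean
    # Row-wise quadratic form x^T Sigma x for each test point.
    var = np.einsum("ij,jk,ik->i", X_star, posterior_cov, X_star)
    return mean, var
```

If an intercept is wanted, a column of ones can be appended to `X` and `X_star`; whether the sub-models here included one is not stated.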

Piecewise model construction
The number of sub-models in the final piecewise model, $n_m$, was chosen as a compromise between predictive performance and complexity. The procedure is depicted in Table 2. All available model sizes were trained on the training set up to some maximum, taken as 10 in the work here. The selected $n_m$ was the smallest model size with an $\mathrm{RMSE}_{\Delta Q}$ below the best $\mathrm{RMSE}_{\Delta Q}$ score multiplied by $(1 + \beta_{improv})$. For most models here, $\beta_{improv} = 0.01$ was chosen.
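The selection rule above reduces to a few lines. In this sketch the function name and the example scores are hypothetical, and a non-strict comparison is used so that the rule still returns the best model when the tolerance is zero.

```python
import numpy as np


def select_model_size(rmse_by_size, beta_improv=0.01):
    """Smallest number of sub-models whose RMSE is within beta_improv of the best.

    rmse_by_size[k] is the RMSE obtained with k + 1 sub-models.
    """
    rmse = np.asarray(rmse_by_size, dtype=float)
    threshold = rmse.min() * (1.0 + beta_improv)
    return int(np.flatnonzero(rmse <= threshold)[0]) + 1


# Illustrative scores for 1 to 6 sub-models (made-up numbers).
scores = [0.30, 0.22, 0.181, 0.1805, 0.18, 0.182]
print(select_model_size(scores, beta_improv=0.01))
```

Raising `beta_improv` accepts a larger loss of accuracy in exchange for fewer sub-models.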
The piecewise-linear model produced ∆Q estimates for each time step. Capacity profiles were calculated over the full cycle life of each test cell by summing the forecasted transitions, assuming knowledge of the initial capacity.
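A minimal sketch of this forecasting step is given below, assuming that each time step is assigned to a sub-model by comparing the splitting feature with the sorted break points and that each sub-model is a weight vector applied to the same inputs. The function and argument names are hypothetical.

```python
import numpy as np


def predict_delta_q(X, split_feature, breakpoints, weights):
    """Piecewise-linear prediction of the capacity change in each time step.

    X:             (n_steps, n_features) input matrix.
    split_feature: column index of the feature used for splitting.
    breakpoints:   sorted break point values (n_breaks,).
    weights:       (n_breaks + 1, n_features) array, one row per sub-model.
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Index of the sub-model whose region contains each time step.
    sub_model = np.searchsorted(breakpoints, X[:, split_feature], side="right")
    # Row-wise dot product between inputs and the matching weight vector.
    return np.einsum("ij,ij->i", X, weights[sub_model])


def capacity_profile(initial_capacity, delta_q):
    """Capacity at the start and after each time step, by summing transitions."""
    return initial_capacity + np.concatenate(([0.0], np.cumsum(delta_q)))
```

Because the profile is a running sum, an error in any one predicted transition carries through to all later capacities, which is why both transition and profile errors are assessed.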

Performance metrics
Three performance metrics were calculated for each forecasted capacity profile. Firstly, the root mean squared error (RMSE) between observed and predicted ∆Q, $\mathrm{RMSE}_{\Delta Q}$, provided a measure of performance as a transition model. Secondly, the quality of the capacity profile forecast was measured using the root mean squared error in capacity, $\mathrm{RMSE}_{Capacity}$, expressed in % capacity. The principal metric used to assess performance was the lifetime accuracy, with end of life (EoL) defined as reaching 80% of nominal capacity. The percentage difference between the observed EoL, $t_{EoL}^{obs}$, and the predicted EoL, $t_{EoL}^{pred}$, was taken as the error to create the metric $PE_{EoL}$:
$$\text{EoL Error} = 100\% \times \frac{t_{EoL}^{obs} - t_{EoL}^{pred}}{t_{EoL}^{obs}}$$
Many trials (a minimum of 2140 test cells) were performed at every test point here, so the quoted values of the above metrics are the median and 95th percentile.
[Figure panel: histogram of EoL error (%) against frequency; median = 1.6%, 95th percentile = 6.0%.]
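The lifetime metric can be computed from a capacity profile as sketched below. Two details are assumptions: the end of life time is found by linear interpolation between the time steps either side of the threshold, and the summary statistics are taken over the magnitude of the error. The function names are hypothetical.

```python
import numpy as np


def end_of_life_time(times, capacity, eol_capacity):
    """First time at which capacity falls to eol_capacity (nan if it never does)."""
    times = np.asarray(times, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    below = np.flatnonzero(capacity <= eol_capacity)
    if below.size == 0:
        return float("nan")
    k = below[0]
    if k == 0:
        return float(times[0])
    # Linear interpolation between the last point above and first point below.
    frac = (capacity[k - 1] - eol_capacity) / (capacity[k - 1] - capacity[k])
    return float(times[k - 1] + frac * (times[k] - times[k - 1]))


def eol_error_percent(t_observed, t_predicted):
    """Signed percentage error in predicted lifetime relative to the observed one."""
    return 100.0 * (t_observed - t_predicted) / t_observed


def summarise(errors):
    """Median and 95th percentile of the error magnitude, ignoring nan entries."""
    magnitude = np.abs(np.asarray(errors, dtype=float))
    return float(np.nanmedian(magnitude)), float(np.nanpercentile(magnitude, 95))
```

For the cells used here the threshold would be 0.88 Ah, i.e. 80% of the 1.1 Ah nominal capacity, with times in days at 12 hour spacing.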

Trial setup
The first trial was a comparison in performance against a Gaussian process regression (GPR) tool. The same features were used as the inputs for the piecewise-linear and GPR models, so this trial tests the flexibility of the piecewise-linear approach against that of a machine learning approach. The results for all three performance metrics were calculated for this test. Here, there were 200 repeats of the trial with each repeat using 50 training cells and 107 test cells. Consequently, there are 21,400 predicted capacity profiles to draw from.
The combined piecewise-linear and automated selection approach was also tested to find end of life prediction performance under varying conditions. Smaller tests of end of life error as a function of each of maximum similarity, input features and number of piecewise models were all performed. Each of these held the other

Results
The distribution of results in Fig. 3 suggested that the piecewise-linear models produced accurate forecasted capacity profiles. The median $\mathrm{RMSE}_{\Delta Q}$ of 0.17% capacity for the capacity transitions represents a very good fit for the original model. $\mathrm{RMSE}_{Capacity}$ was typically very tight, with a median value of 1.1% capacity. Finally, the predicted lifetimes of the cells were accurate to within 1.6% of the observed lifetime in half the test cases.
The median performance of the piecewise-linear models was extremely similar to that of the GPR models despite GPR performing better at the lowest percentiles. However, the 95th percentiles were improved by using piecewise-linear modelling relative to GPR in Fig. 4.
The piecewise-linear approach used between 3 and 5 linear models to map the capacity loss of the cells here in 79% of cases. In all cases, the input variable used to calculate the thresholds of those linear models was $V_{2,3}$, which is the proportion of time spent between 3.12 V and 3.51 V.
Median lifetime predictive performance was unaffected over significant changes in control values in Fig. 5. Any $n_m$ greater than 1 appeared equally effective, $\beta_{improv}$ could be raised to 0.5 without impacting performance and the input feature selection procedure was dependent on $0.6 < \rho_{P,\max} < 0.9$. The 95th percentiles of performance varied more as a function of the control variables. Increasing the number of input features reduced the end of life estimation error up to 3 features, from where the improvements were small before weakening as the number reached 10.
The three splitting methods presented in Fig. 6 failed to produce a significant distinction, although calculating using fminsearch was found to be far slower with bigger training sets. All three produced good performance with 20 or more training cells. There was only a small improvement from increasing the training set up to 100 cells' worth of data.
The histogram in Fig. 7a shows that there were preferred thresholds for the linear models. The most common was at around $V_{2,3} \approx 0.37$, which roughly corresponded to the change from slow linear ageing to faster degradation in Fig. 2.

Discussion
The piecewise-linear approach produced tight capacity forecasts and accurate lifetime predictions. For the same sets of inputs, piecewise-linear modelling matched GPR for median performance and outperformed GPR at the 95th percentile. The linear approach also appeared less susceptible to smaller patterns in the training data, potentially reducing overfitting. Using over 8 input features produced weaker performance in Fig. 5, suggesting that the model was beginning to overfit.
The distinction between the two data-driven approaches appeared to be a function of how poor the poor predictions were. Fig. 7c directly compared the results while grouping them by test cell. The GPR results for the majority of cells were distributed more widely than those for piecewise-linear modelling. The same effect was seen at higher percentiles in Fig. 4. Most data-driven approaches struggle to extrapolate, but linear approaches will diverge more slowly, thus reducing the impact of a substandard model.
With a typical value of 4 linear models used, the piecewise-linear approach was sufficiently detailed to map the more complex trajectories of these rapidly decaying cells. The uniformly primary feature $V_{2,3}$ was used to create the boundaries for the linear models. The distribution of those boundaries in Fig. 7a shows that the region about $V_{2,3} \approx 0.37$ was most popular. That point corresponds to the end of purely linear ageing, which is approximately halfway through cell life: over 50% of all data points in the full data set were above that value.
Median performance was consistent across a large range of testing conditions in Fig. 5. It must be acknowledged that the quality of the data set contributed to that success. However, the selection process guaranteed that input features were chosen based on their linear correlation with the target variable and thus contributed to a model that would work for a majority of cells.
Successful prognosis for atypical cells required more generous supplies of input data, as demonstrated by the improving 95th percentiles as more cells, inputs, models and input distinction were introduced. There was a distinct improvement when the maximum correlation between inputs was reduced below 0.90, thereby giving more flexibility to the subsequent linear model. That improvement suggested that the maximum shared correlation constraint in the feature selection process was increasing performance.
According to Fig. 5, 3 input features and 2 sub-models were required for successful modelling of the majority of the cells in the data set used here. All three piecewise splitting techniques produced good performance from as few as 20 training cells, despite those training cells needing to represent a reasonable range of lifetimes. This efficacy was also a function of the quality of the data set, but it suggested that the piecewise-linear model was capable of good performance even without significant amounts of input data.
Detailed evaluation of the performance of the credible intervals produced by Bayesian linear regression is beyond the scope of this work. However, the intervals appeared flexible to changing degradation rates, with faster decaying health being associated with increased uncertainty in Fig. 8. Each grey line in that figure represents the values for a given item in the posterior covariance matrix, equal to $A^{-1}$, within each sub-model. The flexibility afforded by using a piecewise model allowed the posterior covariances to reflect changing uncertainties in addition to the changing degradation rates.

Conclusion
A combined feature selection and piecewise-linear approach to capacity forecasting was detailed and tested. Under the testing conditions used here, the combined approach produced a median RMSE in capacity of 1.1% and a median lifetime error of 1.6%. The piecewise-linear model performed comparably to a Gaussian process regression model when given the same inputs for typical cells, while outperforming the machine learning method at the 95th percentile. The whole approach was robust to reduced input and training set sizes and to limitations being imposed on the piecewise-linear model construction. The feature selection step was shown to improve performance by avoiding input features that correlated too well with each other.
The ability of piecewise-linear models to adapt to wider ranges of use remains uncertain. A user must still be careful to have reliable and appropriate training data. Similarly, uncertainty estimates appeared to be capable of tackling the varied distributions over battery lifetime, but must be validated before confident use in the real world.
The work here combines easily understood inputs with simple mechanisms to construct a flexible degradation model that rapidly computes and is easy to store.The model produces accurate capacity forecasts that hold up to and after the collapse of health in later life.

Appendices

A Alternative Splitting Techniques
The two alternative approaches to finding the break points used the Matlab function fminsearch and K-means.
The method using the Matlab function fminsearch represented completely free selection of the break point positions. The objective function was $\mathrm{RMSE}_{\Delta Q}$ across the entire training set, with the constraint that break points must remain in size order.
K-means was performed using the Matlab function kmeans. A small trial found that performance was best when using K-means with the first two selected input features, instead of the full training set.

B Gaussian Process Regression
Gaussian process regression (GPR) is a non-parametric, probabilistic approach to regression. It has been used in health and lifetime prediction previously [1][6][11][12][13]. GPR was used as a comparison to the piecewise-linear model by performing the same mapping between the five automatically selected inputs and the changes in capacity, ∆Q. The rest of the model is identical to the piecewise-linear approach.
The choice of kernel function was a radial basis function (a.k.a. squared exponential), a commonly used stationary covariance function (equation 7) [43]. Automatic relevance determination allows for a different length-scale hyperparameter, $\sigma_l$, for each input (equation 6).
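The kernel described above has a standard form in the Gaussian process literature, which can be sketched as follows. The signal standard deviation `signal_std` is an assumption, as it is not mentioned in the text, and the function name is hypothetical. Hyperparameter fitting is not shown.

```python
import numpy as np


def rbf_ard_kernel(X1, X2, length_scales, signal_std=1.0):
    """Squared exponential kernel with one length scale per input dimension.

    k(x, x') = signal_std^2 * exp(-0.5 * sum_d ((x_d - x'_d) / l_d)^2)
    Returns a matrix of shape (len(X1), len(X2)).
    """
    length_scales = np.asarray(length_scales, dtype=float)
    A = np.atleast_2d(np.asarray(X1, dtype=float)) / length_scales
    B = np.atleast_2d(np.asarray(X2, dtype=float)) / length_scales
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_std**2 * np.exp(-0.5 * sq_dist)
```

A long length scale for an input flattens the kernel along that dimension, so the fitted length scales indicate how strongly each input influences the prediction.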
The GPR model was the same as the one used in ref. [13], but with 50 training cells and the radial basis function kernel.

Figure 2: Example calculation of three break points using the best correlating feature, $V_{2,3}$.

Figure 3: Full results for the piecewise linear modelling.

Figure 4: Comparison between piecewise modelling and GPR for capacity forecasting.

Figure 5: Median and 95th percentile of the end of life predictive performance of the piecewise-linear approach against the maximum correlation among input features, sub-model performance threshold, the number of input features and the maximum number of sub-models.

Figure 6: Median and 95th percentile of the performance of the piecewise-linear approach against the number of training cells. Results shown for three splitting techniques: curvature (purple), K-means (yellow) and fminsearch (green).

Figure 7: Analysis of the piecewise-linear approach and its results. (a) Histogram of selected breakpoints in the large trial in Fig. 3. (b) Number of sub-models used in the same large trial. (c) EoL errors for GPR and PLR, sorted by test cell number.

Figure 8: Plot of how posterior covariance elements vary according to the piecewise sub-model number. The values are normalised relative to the maximum value found for that item. For comparison, the typical degradation rate is also plotted, also normalised relative to its maximum value.

Table 1: Variable bounds used to generate features.

Table 2: Piecewise model selection by selecting the smallest $n_m$ within $\beta_{improv}$ of the peak performance.