SOH Estimation and SOC Recalibration of Lithium-Ion Battery with Incremental Capacity Analysis & Cubic Smoothing Spline

Conventional state of health (SOH) estimation often requires capacity measurement from a battery's full charge or discharge profile between the fully charged state and the cut-off state. Incremental capacity analysis can improve estimation efficiency by extracting features to estimate SOH or recalibrate state of charge estimation without using the full profile. Because direct numerical derivatives often do not give a smooth result due to measurement noise, this paper uses a robust cubic smoothing spline method to produce the incremental capacity curve. The method is superior to typical filters, whose window size is usually tuned by trial and error, because its smoothing parameter can be determined by cross validation. A comparison on simulated data shows that the proposed method maintains good fidelity to the data and to the feature of interest, with low RMSE values in derivative form. This paper also proposes a peak height ratio feature for SOH estimation. As a linear relationship is observed between SOH and peak height ratio, estimation of SOH from peak height ratio is demonstrated using linear regression. A more generalized SOH estimation method is also demonstrated using multiple linear regression with both the peak height ratio and the height of the peak associated with the "last phase-transition of Li ions intercalation during charging" as covariates.

With the rapid advancement of electronic systems such as electric vehicles and energy storage systems for intermittent renewable energy, the performance of rechargeable batteries is of great concern to users. Performance monitoring, state of health (SOH) estimation and remaining useful life prediction are active research topics in both academia and the battery industry.
Prior to estimating the present SOH or predicting the remaining useful life of a battery cell, many methods, especially data-driven methods, 1-4 require a health indicator or feature to be measured at the current state. The simplest way is to determine the battery capacity by following the capacity definition: discharge the battery with nominal current at nominal temperature from a fully charged state until the battery's cut-off voltage is reached, and take SOH as the ratio between the actual battery capacity at nominal conditions (nominal temperature and nominal discharge current) and the battery's nominal capacity. 5 To improve the efficiency of obtaining SOH, according to Waag et al., 6 some studies develop methods based on measurement of the absolute voltage or voltage drop when a certain load or pulse is applied, 7,8 while others use the measured impedance parameters and/or other measured or calculated battery characteristics to classify the SOH in a fuzzy logic manner, such as rather new, slightly aged, or at the end of life. 9,10 There are also studies inferring capacity or SOH from impedance. For example, Galeotti et al. 11 built a diagnostic map from the relationship between ohmic resistance, state of charge (SOC) and capacity, and then determined the SOH with reference to that map. Hung et al. 12 proposed a projection method that uses the relationship between change in SOC and dynamic impedance (change in voltage over change in current) to estimate SOH. In addition, since the typical capacity measurement requires a full charge or discharge process, incremental capacity analysis (ICA) is a useful method to improve the efficiency of SOH estimation in the sense that it allows SOH estimation without a full charge or discharge profile. Besides SOH estimation, it can also be helpful in SOC estimation, where the maximum available capacity can be updated from time to time with ICA using specific parts of the charge or discharge profile.
As pointed out by the study of Ng et al., 13 there are losses in releasable charge during charging and discharging, and the declination of the releasable capacity should be considered for more precise SOC estimation.
The idea of ICA is illustrated as follows. Given a charge or discharge process, the relationship between capacity (Q) and voltage (V) is shown in Fig. 1a. ICA involves taking the derivative of capacity (Q) with respect to voltage (V), i.e., $\Delta Q / \Delta V$, as shown in Fig. 1b. As observed in Fig. 1b, the derivative, also called the incremental capacity (IC) curve, reveals some characteristic peaks. The IC peaks at different aging states have unique shapes, amplitudes and positions, which is the key idea for indicating the SOH through the change of the peaks. 14 Many studies have demonstrated that the IC peaks show strong correlation with battery degradation mechanism and battery capacity. 15-20 By correlating the characteristic peaks with SOH, we can estimate the SOH and remaining useful life from partial charging or discharging data, as long as the peaks in the IC curve are present in that data. This kind of application motivates the development of ICA for estimating the SOH.
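As a minimal sketch of the idea above, the IC curve can be approximated by finite differences of consecutive (V, Q) samples. The function name and the sample values are hypothetical and serve only as an illustration; on real measurements this direct derivative is typically noisy, which is the motivation for smoothing.

```python
def incremental_capacity(voltage, capacity):
    """Return (midpoint voltages, dQ/dV) from consecutive (V, Q) samples."""
    v_mid, ic = [], []
    for k in range(1, len(voltage)):
        dv = voltage[k] - voltage[k - 1]
        if dv == 0.0:
            # Skip repeated voltage readings to avoid division by zero.
            continue
        dq = capacity[k] - capacity[k - 1]
        v_mid.append(0.5 * (voltage[k] + voltage[k - 1]))
        ic.append(dq / dv)
    return v_mid, ic


# Hypothetical constant-current charge samples: voltage in V, capacity in Ah.
voltage = [3.20, 3.25, 3.30, 3.35, 3.40]
capacity = [0.10, 0.20, 0.50, 0.60, 0.65]

v_mid, ic = incremental_capacity(voltage, capacity)
peak_index = ic.index(max(ic))  # position of the highest IC value
```

In this toy data the steepest rise of Q with V lies between 3.25 V and 3.30 V, so the IC peak appears at the corresponding midpoint voltage.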
When utilizing ICA to estimate the SOH, a general approach is to identify the correlation between the SOH and certain features arising from the ICA, such as the amplitude of peak(s), the area of peak(s) and peak shifts in terms of voltage. Li et al. 21 used a series of points in the IC curve within a specific voltage range as a sequence of health indexes and estimated the SOH by means of the grey relational degree, which represents the correlation in geometric proximity between a reference sequence and the particular sequence of interest. Li et al. 22 identified several points in the IC curve as feature points according to their correlation with capacity and modeled the SOH with those feature points by Gaussian process regression. Li et al. 23 used the voltage positions of two peaks and one valley in the IC curve to estimate the SOH, since they found that each of those voltage values changes linearly with SOH. However, the above-mentioned methods rely on voltage values to identify the feature points of interest, so they require accurate and consistent voltage values. They may not be reliable when there is a voltage shift due to a change in external contact resistance, e.g., the loosening of contacts. Weng et al. 16 identified the "second" peak in the IC curve as a robust signature for battery degradation and used the empirical correlation as a battery degradation model for capacity estimation, but they did not explicitly explain the algorithm for identifying the peak of interest automatically. Riviere et al. 17 used the "IC peak 3" area to estimate the SOH, where the "peak 3" area shows a linear relationship with SOH. Their algorithm identifies "IC peak 3" by scanning the IC curve from the high voltage side to the low voltage side and looking for the first IC slope change. However, their algorithm cannot handle the situation in which "IC peak 3" completely fades, in which case all the peaks will be mis-identified. Instead of using voltage position, Zheng et al. 24 proposed SOC-based IC curves and estimated the capacity from feature points that each indicate a fixed SOC. However, their SOH estimation method has the limitation that feature points at SOC 12% and SOC 92% have to be identified, meaning that the charging or discharging profile over such a wide range is required. Tang et al. 18 used regional capacity, which is equivalent to the area of a peak within a self-specified voltage range in the IC curve, to estimate the SOH, as the regional capacity and SOH show a linear relationship. The challenge in their method lies in choosing the voltage range, which has been shown to have a significant effect on the SOH estimation. Different types of battery, or even the same type of battery from different brands, may require different voltage ranges in order to achieve accurate SOH estimation.

z E-mail: cplin3-c@my.cityu.edu.hk
Note that, as reported in the literature, 16,17,23 the peaks in the IC curve can change in amplitude and shift in voltage under battery aging and degradation. Such a peak shift in voltage can be due to a change in internal resistance, which can be used as an indicator for quantifying battery degradation. However, it can also be due to a resistance change in external components, for example, the loosening of contacts connected to measurement devices, which is irrelevant to battery degradation but interferes with the measurement and our interpretation. When this happens, methods relying on accurate and consistent voltage values may not work well. 21-23 Therefore, we aim to develop a feature and a feature identification algorithm that are robust against peak shift in voltage, to avoid misleading information when the above-mentioned issue occurs.
Besides, an important step before applying features from ICA to estimate the SOH is the construction of the IC curve. Due to measurement error or noise, direct numerical derivatives often do not give an accurate and smooth result. The capacity vs voltage curves have to be pre-processed in order to obtain usable IC curves with characteristic peaks, using methods such as the voltage window method, 21 kernel filters (such as the Gaussian filter), 21-23 the moving average filter, 21 the Butterworth filter, 17 the Kalman filter, 18 support vector regression with a Gaussian kernel, 16 and peak fitting functions such as the Lorentzian function. 25 Every method has its own strengths and limitations. For example, the voltage window method and the moving average filter are simple, easy to implement 21 and fast. However, a limitation of most filtering methods is that parameter tuning is required, such as the window size in the voltage window method and the moving average filter, the cut-off frequency in the Butterworth filter and the noise covariance matrix in the Kalman filter, where subjective judgement is usually needed. In fact, the tuning of filter parameters is non-trivial: over-filtering causes loss of information, while under-filtering produces noisy IC curves that are hard to interpret. For support vector regression, the optimal parameters can be identified by grid search with cross validation. However, it is computationally costly, 26 and multiple parameters have to be dealt with after selecting a particular kernel function in order to produce a desirable fit, including the cost, epsilon and kernel function parameters such as the standard deviation in the Gaussian kernel.
For peak fitting functions, the advantage is that it aims at modeling each peak on the IC curve explicitly and therefore each peak can be analyzed independently, 25 however, it may not be able to accurately capture the intermediate section between consecutive peaks, which can be informative for modeling capacity. Also, peak fitting functions such as Lorentzian function and Gaussian function assume the IC peaks to be symmetric, which may not be accurate enough.
In this paper, to overcome the two issues mentioned above (construction of IC curve and identification of features in IC curve), a robust cubic smoothing spline method is proposed for obtaining IC curves and a peak height ratio is proposed as a feature to estimate SOH. While the capacity vs voltage data is fitted with the proposed spline method, owing to its nature, the derivative form of the spline fit is readily achievable and the first derivative is equivalent to dQ/dV, the IC curve. The main advantage is that the smoothing parameter can be obtained using cross validation method, which is less subjective than the filtering method where trial and error is usually adopted to tune the corresponding smoothing parameter. A comparison under simulated data shows that the proposed spline method can maintain better fitting of data under first derivative (i.e., IC curve) than the other filtering methods. After obtaining the IC curve, the peak height ratio feature is extracted and used for estimating the SOH. With a linear relationship being noted between the peak height ratio and the SOH, robust linear regression is demonstrated to estimate the SOH from peak height ratio. This feature can also be used for updating the maximum available capacity at a cycle to recalibrate the SOC estimation. A more generalized method is also demonstrated considering features of both the peak height ratio and the peak height associated with "last phase-transition of Li ions intercalation during charging" to estimate the SOH with robust multiple linear regression.
The remaining contents are organized as follows. We first describe the battery cycle aging data used in this paper and present the methodologies used for obtaining the IC curve and extracting the feature of interest for SOH estimation. We then present a comparison between the proposed method and several filtering methods in terms of fidelity when acquiring IC curves, followed by a section demonstrating the application of the proposed feature to SOH estimation and SOC recalibration, and a discussion section on some considerations when applying the proposed method and on future research opportunities.

Experimental Data
The data in this paper come from the cycle aging test in our laboratory. Three batteries manufactured by A123 Systems (battery model: A123-18650, nominal capacity: 1.1 Ah, cathode: lithium iron phosphate (LFP), anode: graphite) were tested with an Arbin-BT2000 battery tester and were put under a "C1D1" cycle aging test at 45°C inside a Votsch VC 3 7100 temperature chamber. Figure 2 shows the voltage and current change in one cycle. Under the C1D1 regime, during the charge process, the cells were first charged at a constant current of 1.1 A and then at a constant voltage of 3.6 V. The cut-off current for charging is 0.055 A. During the discharge process, the cells were discharged at a constant current of 1.1 A. The cut-off voltage for discharging is 2 V. For the sake of presentation, we refer to the three LFP battery cells as cell 1, cell 2 and cell 3. In this paper we define SOH as the normalized maximum available discharge capacity under the "C1D1" regime, i.e.,

$$\mathrm{SOH} = \frac{\text{maximum available discharge capacity at current cycle}}{\text{nominal discharge capacity}}$$

Note that in calculating SOH we consider both the constant current and constant voltage charging processes, but in the analysis of incremental capacity (dQ/dV) we consider the constant current charging process only, as the constant voltage charging process does not contain useful information on incremental capacity.
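The SOH definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the discharge capacity is obtained here by trapezoidal integration of logged current over time, the function names are hypothetical, and the 1.1 Ah nominal capacity is taken from the cell description above. The sample log is invented.

```python
def discharge_capacity_ah(current_a, time_s):
    """Integrate discharge current over time (trapezoidal rule); result in Ah."""
    total_ampere_seconds = 0.0
    for k in range(1, len(time_s)):
        dt = time_s[k] - time_s[k - 1]
        total_ampere_seconds += 0.5 * (current_a[k] + current_a[k - 1]) * dt
    return total_ampere_seconds / 3600.0


def state_of_health(max_discharge_capacity_ah, nominal_capacity_ah=1.1):
    """SOH as discharge capacity normalized by the nominal capacity."""
    return max_discharge_capacity_ah / nominal_capacity_ah


# Hypothetical log of a 1.1 A constant-current discharge lasting 3240 s.
time_s = [0.0, 1080.0, 2160.0, 3240.0]
current_a = [1.1, 1.1, 1.1, 1.1]

capacity_ah = discharge_capacity_ah(current_a, time_s)
soh = state_of_health(capacity_ah)
```

For this invented log the discharged capacity is 0.99 Ah, which corresponds to an SOH of 0.9.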

Methodologies
This section presents the methodologies for obtaining the IC curve from the capacity vs voltage data and extracting the feature of interest (peak height ratio) for SOH estimation.
Robust cubic smoothing spline method.-In order to obtain the IC curve, we propose a robust cubic smoothing spline method for modeling the capacity vs voltage data and take the first derivative afterwards. While filtering methods are the widely used methods for constructing IC curve, 17,21-23 the main advantage of the proposed method is that the smoothing parameter can be determined by cross validation method, instead of the subjective trial and error tuning adopted in filtering methods.
The proposed method is described as follows. A (k + 1)-th order spline function f is equivalently a piecewise polynomial function of degree k. It is continuous and has continuous derivatives of orders 1, …, k − 1 at its knot points $p_1 < p_2 < \cdots < p_m$. That is, f has the following properties, 27
• f is a polynomial of degree k (or order k + 1) on each of the intervals $(-\infty, p_1], [p_1, p_2], \ldots, [p_m, \infty)$, where "degree" denotes the highest power defining the polynomial and "order" denotes the number of coefficients defining the polynomial within each interval,
• $f^{(q)}$, the qth derivative of f, is continuous at $p_1, \ldots, p_m$ for each $q = 0, 1, \ldots, k - 1$.
Such a spline can be represented by the truncated power series basis functions $g_i(\cdot)$, where $i = 1, 2, \ldots, m + k + 1$,

$$g_1(x) = 1,\; g_2(x) = x,\; \ldots,\; g_{k+1}(x) = x^k,\quad g_{k+1+l}(x) = (x - p_l)_+^k,\; l = 1, \ldots, m,$$

where $(\cdot)_+$ denotes the positive part. Given a set of observations $\{x_j, y_j\}$, j = 1, …, n, to regress Y on X with splines, the fit takes the form

$$f(x) = \sum_{i=1}^{m+k+1} \theta_i\, g_i(x).$$

Under the above general spline framework, the cubic smoothing spline is a spline of degree 3 (i.e., k = 3). Given a set of observations $\{x_j, y_j\}$, j = 1, …, n, all observation points $(x_1, \ldots, x_n)$ are used as knots (i.e., $m = n$; $p_1 = x_1$, $p_2 = x_2$, …, $p_n = x_n$) in the smoothing spline. In addition to the above properties of the general spline function, the cubic smoothing spline function has the following boundary constraints, 28
• f is a polynomial of degree k = 3 on each of the intervals $[p_1, p_2], \ldots, [p_{n-1}, p_n]$,
• f is a polynomial of degree (k − 1)/2 = 1 on $(-\infty, p_1]$ and $[p_n, \infty)$.
With those boundary conditions, the cubic smoothing spline can be represented by a variant of the truncated power series basis functions. 28 Given a set of observations $\{x_j, y_j\}$, j = 1, …, n, to regress Y on X with cubic smoothing splines, the fit is written as

$$f(x) = \sum_{i=1}^{n} \theta_i\, g_i(x),$$

where $\theta_i$ is the ith coefficient. Note that the above representation is one choice of basis for cubic smoothing splines, with the advantage of easy interpretation; there are other options which can be more computationally efficient, such as the B-spline basis. 29

The estimate of the cubic smoothing spline function is obtained by solving the following minimization problem, which trades off fidelity to the data against roughness of the function estimate,

$$\min_{\hat{f}} \; \sum_{j=1}^{n} \left| y_j - \hat{f}(x_j) \right| + \lambda \int \left( \hat{f}''(x) \right)^2 dx,$$

where $\hat{f}(x)$ is the estimated cubic smoothing spline fit, $\hat{f}''(x)$ is the second derivative of the estimated fit, and λ is the smoothing parameter; the roughness penalty here is based on the second derivative. The smoothing parameter can be determined using generalized cross validation or leave-one-out cross validation. Note that the L1-norm is adopted instead of the L2-norm to increase the robustness of the smoothing spline estimate, in consideration of the high noise level in the measured data.

From a statistical point of view, methods based on the L1-norm are more robust to outliers than methods based on the L2-norm. 30 The general concept is that the L2 norm involves $(y_j - \hat{f}(x_j))^2$ and the L1 norm involves $|y_j - \hat{f}(x_j)|$, so the cost of outliers under the L2 norm is larger than that under the L1 norm. The above minimization problem to obtain the cubic smoothing spline (also known as the L1 smoothing spline) can be solved by the iterative splitting method suggested in the study of Rytgaard. 31 In our application we model the capacity vs voltage curve with the proposed spline method, and hence $x_j$ is the jth voltage value and $y_j$ is the jth charge or discharge capacity value.
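The robustness argument can be illustrated in the simplest possible setting, which is an assumption made for brevity and not the spline fit itself: fitting a single constant to data. The constant minimizing the squared (L2) cost is the mean, while a constant minimizing the absolute (L1) cost is the median. The data values below are invented.

```python
import statistics


def l2_cost(c, ys):
    """Sum of squared residuals when every observation is fitted by c."""
    return sum((y - c) ** 2 for y in ys)


def l1_cost(c, ys):
    """Sum of absolute residuals when every observation is fitted by c."""
    return sum(abs(y - c) for y in ys)


clean = [1.00, 1.01, 0.99, 1.02, 0.98]   # hypothetical readings around 1.0
noisy = clean + [5.0]                     # the same readings plus one outlier

l2_fit = statistics.mean(noisy)           # minimizer of the L2 cost
l1_fit = statistics.median(noisy)         # a minimizer of the L1 cost

l2_shift = abs(l2_fit - statistics.mean(clean))
l1_shift = abs(l1_fit - statistics.median(clean))
```

A single outlier moves the L2 estimate by roughly 0.67 but the L1 estimate by only 0.005, which is the behaviour that motivates an absolute-value fidelity term when the measurements are noisy.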
Feature of interest.-As shown in the example IC curves in Fig. 1b, there are typically several peaks. According to Dahn, 32 Groot 33 and Riviere et al., 17 each peak in the IC curve can be linked to the electrochemistry inside the battery, as it corresponds to a cohabitation of two stages of Li ions intercalation in the graphite electrode. The leftmost peak in the IC curve contains information about the first phase-transition of "Li ions intercalation during charging," i.e., "stage 1 (dilute) & stage 4" (available online at stacks.iop.org/JES/167/090537/mmedia) as described in studies, 17,32,33 and the rightmost peak contains information about the last phase-transition, i.e., "stage 2 & stage 1". 17,32,33 Here "dilute" is a sub-stage of graphite where Li ions occupy randomly under in-plane ordering. A detailed description of the different graphite stages and sub-stages during Li ions intercalation can be found in the studies of Dahn 32 and Groot. 33 Since the IC curves contain information on the Li ions intercalation process, they change along with battery aging and degradation. We can observe how the IC curve changes with cycle number and extract features for estimating the SOH. Two general observations can be noted when comparing the IC curves of the same battery cell at different SOHs in our experimental data. (1) As SOH decreases, only the rightmost peak at the high voltage side, which is associated with the "last phase-transition of Li ions intercalation during charging," fades significantly. According to Riviere et al., 17 this can be attributed to the loss of lithium inventory inside the lithium-ion battery cell. (2) As SOH decreases, the whole curve shifts to a different voltage range. Shifting to a lower voltage range is typically expected because of the increase in internal resistance of the battery cell.
However, there can be other factors affecting the voltage range, e.g., loosening of the contacts connected to measurement devices, which can change the resistance of the electric circuit and hence affect the measured voltage range of the battery. In this sense, the voltage shift may not be reliable for estimating the SOH. Therefore, we focus on the peak value changes and discard the voltage shift information. The idea is illustrated in Fig. 1c, where we align the IC curves based on the absolute maximum and focus on the peak value changes at different SOHs. In this paper, we propose to use the peak height ratio ($h_3 / h_1$) as the feature to estimate SOH (see Fig. 3a).
Similar features proposed in the literature include the rightmost peak height value (i.e., the peak associated with the "last phase-transition of Li ions intercalation during charging") 34 and the area of the rightmost peak. 35 The reason for considering the peak height ratio instead is that it makes use of the relationship between the height of the rightmost peak, the height of the second rightmost peak and the valley between the two peaks. Essentially, the peak height ratio contains more information than the peak height value alone.
Feature extraction.- Figure 4 illustrates the flowchart to extract the peak height ratio feature:

Figure 3. Illustration of the idea of (a) peak height ratio; (b) $P_3$ peak area and $P_3$ peak height.
(a) Constructing ICA plot by fitting cubic smoothing spline on raw capacity vs voltage data
Fit the capacity vs voltage data with the cubic smoothing spline (Eqs. 2 and 3). Due to its piecewise polynomial nature, the derivatives of the cubic smoothing spline fit are readily available, where the first derivative corresponds to the IC curve.
(b) Identifying the reference point ($P_1$) for aligning the IC curves across different voltage ranges
By means of calculus, all the peaks in the IC curve (the first derivative of the fit) can be identified with the help of the second derivative, and then the reference point, usually the absolute maximum point (defined as $P_1$, shown in Fig. 3a), can be identified as well. As mentioned earlier, each peak in the IC curve corresponds to a cohabitation of two stages of Li ions intercalation in the graphite electrode. 17,32,33 Thus, the reference point should consistently correspond to a particular cohabitation of stages. (In our case, the reference point is the coexistence region of "stage 2 (liquid-like, a sub-stage of graphite where Li ions occupy randomly with no in-plane ordering) & stage 2" during Li ions intercalation in the context of the studies of Groot 33 and Dahn, 32 which usually corresponds to the absolute maximum in an IC curve.) In some special cases, the absolute maximum rule alone may mis-identify the desired reference point. For example, in Fig. 5, the peak associated with "stage 2 (liquid-like) & stage 2" is not the maximum, so it would be mis-identified as $P_2$. To deal with such cases and make our algorithm more robust, we impose a further requirement to ensure that the desired reference point is correctly identified. After the absolute maximum ($P_1$) is identified, we search for the second maximum peak (defined as $P_2$) at a lower voltage than $P_1$, provided that the voltage separation between $P_1$ and $P_2$ is greater than a threshold $v_0$:
$$V(P_1) - V(P_2) > v_0,$$
where $V(\cdot)$ denotes the voltage position of a peak. Note that the minimum voltage separation threshold is included so that peaks adjacent to $P_1$ that result from measurement error or noise are not identified as $P_2$. If the ratio of the IC value at $P_2$ to that at $P_1$ is close to 1, i.e., $P_0 \le \mathrm{IC}(P_2)/\mathrm{IC}(P_1) \le 1$, where $P_0$ is a threshold close to 1, then we set $P_1 = P_2$, because in such a case $P_2$ is more likely the desired reference point; this is the situation when the rightmost peak is the absolute maximum, as illustrated in Fig. 5.
(c) Searching for the "$P_3$" peak
From the identified reference point ($P_1$), we search for peaks at a higher voltage than $P_1$. While there can be several peaks on the higher voltage side, we identify the highest peak (defined as $P_3$, shown in Fig. 3a) in that region. To avoid interference from measurement error or noise, the identified peak should have a minimum voltage separation $v_0$ from the reference point ($P_1$):
$$V(P_3) - V(P_1) > v_0$$
(d) Identifying local minimum between $P_1$ and $P_3$ as baseline
After getting the two peaks, $P_1$ and $P_3$, the local minimum point in between is identified. Using this minimum point as the baseline, the heights of the two peaks are measured. We define the height of $P_1$ from the baseline as "$h_1$" and the height of $P_3$ from the baseline as "$h_3$."
(e) Obtaining ratio of peak heights to characterize the degradation status
The ratio of the two heights is extracted as a metric to represent the degradation status of a battery cell at a particular SOH:
$$\text{peak height ratio} = \frac{h_3}{h_1}$$
In our analysis, we adopt $v_0$ = 0.02 V and $P_0$ = 0.9 empirically from our data. Theoretically, $v_0$ can be deduced precisely from the potential difference of the corresponding phase-transitions in the graphite electrode. However, owing to measurement error, it is safer to set it empirically from experimental or historical data. The general idea is to set $v_0$ so that any zigzag around $P_1$ is ruled out as $P_2$ or $P_3$. The threshold $P_0$ is also recommended to be set based on experimental or historical results, as the case in Fig. 5 is likely associated with measurement noise. In general, $P_0$ should be close to 1; otherwise, for example, $P_2$ in Fig. 3a would be incorrectly set as $P_1$.
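The extraction steps (b) to (e) can be sketched as follows for an IC curve that has already been smoothed and sampled on an ascending voltage grid. This is a simplified illustration, not the exact implementation: peaks are located by comparing neighbouring samples instead of using the spline's second derivative, the function names are hypothetical, and the test curves are invented sums of Gaussian-shaped peaks. The defaults for the separation threshold and the ratio threshold follow the values 0.02 V and 0.9 quoted above.

```python
import math


def local_maxima(ic):
    """Indices of interior samples higher than the left neighbour and
    not lower than the right neighbour."""
    return [k for k in range(1, len(ic) - 1) if ic[k - 1] < ic[k] >= ic[k + 1]]


def peak_height_ratio(v, ic, v0=0.02, p0=0.9):
    """Return h3/h1, or None when the peaks of interest cannot be found."""
    peaks = local_maxima(ic)
    if not peaks:
        return None
    # (b) Reference point P1: start from the absolute maximum peak.
    p1 = max(peaks, key=lambda k: ic[k])
    # Highest peak at lower voltage, separated from P1 by more than v0.
    lower = [k for k in peaks if v[p1] - v[k] > v0]
    if lower:
        p2 = max(lower, key=lambda k: ic[k])
        if ic[p2] / ic[p1] >= p0:  # P2 nearly as high as P1: use it as reference
            p1 = p2
    # (c) P3: highest peak at higher voltage, separated from P1 by more than v0.
    higher = [k for k in peaks if v[k] - v[p1] > v0]
    if not higher:
        return None  # P3 is absent from this curve
    p3 = max(higher, key=lambda k: ic[k])
    # (d) Baseline: lowest IC value between P1 and P3.
    baseline = min(ic[p1:p3 + 1])
    h1, h3 = ic[p1] - baseline, ic[p3] - baseline
    if h1 <= 0.0:
        return None
    # (e) Peak height ratio.
    return h3 / h1


def synthetic_ic(heights, centers=(3.28, 3.33, 3.39), width=0.008):
    """Hypothetical IC curve: a sum of three Gaussian-shaped peaks."""
    v = [3.20 + 0.001 * k for k in range(251)]
    ic = [sum(h * math.exp(-0.5 * ((x - c) / width) ** 2)
              for h, c in zip(heights, centers)) for x in v]
    return v, ic


ratio_typical = peak_height_ratio(*synthetic_ic((4.0, 10.0, 6.0)))
ratio_special = peak_height_ratio(*synthetic_ic((4.0, 9.5, 10.0)))
ratio_missing = peak_height_ratio(*synthetic_ic((4.0, 10.0, 0.0)))
```

The second call mimics the special case in which the rightmost peak is the absolute maximum, so the reference point is moved to the slightly lower peak on its left and the ratio exceeds 1. The third call has no peak on the high voltage side and returns `None`.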

Results-Comparison on IC Curve Fitting between the Proposed Method and Filtering Methods with Simulated Data
One challenge associated with ICA is obtaining a genuine IC curve. Taking the direct numerical derivative usually does not work, mainly due to the measurement error or noise in the capacity and voltage data. Smoothing the data is usually needed, and a concern of losing fidelity arises. This section compares data fidelity on simulated data between the proposed spline method and the filtering methods commonly used in the literature. 21-23 In this section we simulate 1000 curves approximating the capacity vs voltage curve from Gaussian kernel regression. The flow chart for generating the simulated data is shown in Fig. 6. First, we pick a cycle of real capacity vs voltage data and fit it with Gaussian kernel functions. Then we introduce variations in the kernel parameters and generate 1000 simulated capacity vs voltage curves. The ground truth of the corresponding IC curves can be obtained analytically by taking the first derivative of the Gaussian kernel functions. An overview of the ground truth IC curves from the simulated data is shown in Fig. 7. Before comparing the ground truth with the IC curves obtained by other methods, we add Gaussian noise to the simulated capacity vs voltage curves ($u(x)$) to mimic the measurement error. Here we consider two different noise scenarios (a low and a high noise level), and the simulated curve becomes $u(x) + \varepsilon$, where $\varepsilon$ is zero-mean Gaussian noise. 36 We compare the proposed method with several conventional filtering methods in terms of the extracted peak height ratio and the model fit of the IC curve. We compare the cubic smoothing spline method with three filtering methods under different window sizes: two kernel filtering methods (linear least square filter with Tukey's biweight and linear least square filter with exponential weight) and the moving average filter with interpolation. Interpolation is required in the moving average filter as the voltage (x-axis) is not equally spaced.
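The simulation idea can be sketched as follows, under loudly stated assumptions: the capacity curve is written here as a plain sum of Gaussian kernels, and the kernel parameters, the voltage grid and the two noise standard deviations are invented for illustration, since the values used in the study are not given in this text. The point of the sketch is that the derivative of such a curve is available in closed form and can serve as ground truth.

```python
import math
import random


def capacity_curve(x, kernels):
    """u(x): sum of Gaussian kernels, each given as (amplitude, center, width)."""
    return sum(a * math.exp(-0.5 * ((x - c) / s) ** 2) for a, c, s in kernels)


def true_ic(x, kernels):
    """Analytic first derivative du/dx, used as the ground-truth IC value."""
    return sum(-a * (x - c) / s ** 2 * math.exp(-0.5 * ((x - c) / s) ** 2)
               for a, c, s in kernels)


def noisy_curve(xs, kernels, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to u(x)."""
    return [capacity_curve(x, kernels) + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical kernel parameters and voltage grid.
kernels = [(0.4, 3.35, 0.03), (0.9, 3.50, 0.06)]
xs = [3.20 + 0.002 * k for k in range(101)]

rng = random.Random(0)                                # fixed seed for repeatability
low_noise = noisy_curve(xs, kernels, 1e-4, rng)       # assumed low noise level
high_noise = noisy_curve(xs, kernels, 1e-3, rng)      # assumed high noise level
ground_truth_ic = [true_ic(x, kernels) for x in xs]
```

Varying the entries of `kernels` from curve to curve would give a family of simulated curves, each with its own exact IC curve for comparison.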
In linear least square filters, the filtering is achieved by linear regression of capacity on voltage with different weightings (Tukey's biweight and exponential weight) within a specified window. Both weights are functions of $x$ and $x_j$, where $x$ is a voltage value (x-axis) within the moving window and $x_j$ is the jth voltage value, which is the point being filtered. Figures 8a and 8b sketch the general shape of the weights within the moving window given by the Tukey's biweight and the exponential weight, respectively.

Table I shows the root mean square error (RMSE) of the peak height ratio under two different levels of noise introduced to the simulated curves and three different window sizes for the filtering methods. The errors are calculated from the difference between the ground truth peak height ratio and the peak height ratio obtained with the different smoothing methods on each of the 1000 simulated capacity vs voltage curves. The minimum values in the low and high noise level scenarios are shown in blue with bold text. It can be observed that the spline method with cross validation performs best in the low noise level scenario, while the linear least square filter with Tukey's biweight performs best in the high noise level scenario. Although the spline method does not perform best in the high noise level scenario, its RMSE value is comparable to the best value, being the second-best value across all methods and window sizes.

Table II shows the discrepancy between the ground truth IC curve and the IC curves obtained from the different methods in terms of RMSE, where errors are calculated from the value difference at each voltage point between the ground truth and the IC curve obtained from each method over the 1000 simulated curves.
There are different RMSE values for spline method under different window sizes as the number of observations for comparison are different. This is because the values at the start and at the end are abandoned when the number of observation points around is less than the corresponding window size. From the result of Table II, the spline method outperforms the other methods at all three different window sizes and in both low and high noise level scenarios, in terms of the fitting of the whole IC curve.
From the above comparison, it should be noted that different window sizes can produce different results. In a particular scenario and with a particular window size, the linear least square filter with Tukey's biweight can obtain the most accurate peak height ratio. However, in reality it is difficult to identify the best window size without knowing the ground truth. In contrast, the spline method does not involve choosing a window size, and its smoothing parameter can be determined by a cross validation method such as generalized cross validation or leave-one-out cross validation. The above results also show that in all cases the spline method gives either the best performance or performance comparable to the best. Nevertheless, in the high noise level scenario, subjective intervention on the smoothing parameter may be needed, as cross validation cannot handle high noise levels well.
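To make the role of the window size concrete, the following sketch shows a windowed, weighted linear least-squares filter and an RMSE against a known derivative. Several points are assumptions: the weight is the textbook form of Tukey's biweight scaled by the window half-width, the window is specified as a number of samples on each side, and the local slope is taken directly as the IC estimate. The exact weight expressions and window definitions used in the comparison are not reproduced in this text, and the data are invented.

```python
import math


def tukey_biweight(u):
    """Textbook Tukey biweight for a scaled distance u; zero for |u| >= 1."""
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def local_slope(x, y, j, half_window):
    """Weighted least-squares slope of y on x in a window around sample j."""
    lo, hi = max(0, j - half_window), min(len(x), j + half_window + 1)
    xs, ys = x[lo:hi], y[lo:hi]
    scale = max(abs(xv - x[j]) for xv in xs)  # window half-width in volts
    w = [tukey_biweight((xv - x[j]) / scale) for xv in xs]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, xs, ys))
    return sxy / sxx


def rmse(estimate, truth):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


# Hypothetical, unevenly spaced voltages with a known derivative:
# Q = V**2, so dQ/dV = 2V.
x = [3.20, 3.21, 3.23, 3.24, 3.26, 3.29, 3.30, 3.32, 3.35, 3.36, 3.38]
y = [xv ** 2 for xv in x]

ic_est = [local_slope(x, y, j, half_window=3) for j in range(len(x))]
ic_true = [2.0 * xv for xv in x]
error = rmse(ic_est, ic_true)
```

Even on noise-free data the estimate is slightly biased near the ends of the record, where the window becomes one-sided, and changing `half_window` changes the result. This is the tuning burden that the cross-validated smoothing parameter is meant to avoid.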

Results-Proposed Feature and Its Potential Application
We apply the proposed method to obtain the peak height ratio feature, which can be used to estimate the SOH or the degradation level, or to recalibrate the SOC. Figure 9a shows the relationship between the peak height ratio and the SOH for the three LFP battery cells (cell 1, 2 and 3). It can be observed that, in general, the peak height ratio and the SOH have a linear relationship. To demonstrate one potential application of the proposed feature to SOH estimation, we make a strong assumption of a linear relationship between SOH and peak height ratio and apply linear regression. In consideration of outliers, we use a robust linear regression with the bisquare method 37 (i.e., regression in which observations are weighted differently based on their tendency to be outliers), and the prediction interval is obtained with the bootstrap method 38 (i.e., the prediction distribution is obtained by resampling prediction errors) based on 1000 replications in our case. Figures 9b-9d show the estimation of SOH from the peak height ratio of cell 1, 2 and 3, respectively, under a cross validation arrangement, where the SOHs of a cell are estimated from the linear model trained on all other cells. For example, Fig. 9b shows the results for cell 1 using cells 2 and 3 for training. The black dots indicate the true relationship between SOH and peak height ratio. The blue solid line shows the prediction of SOH from the peak height ratio with linear regression trained on all other cells. The blue dashed lines show the 90% confidence interval of the prediction.
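The bisquare weighting idea can be sketched with iteratively reweighted least squares for a single covariate. This is a minimal illustration, not the exact procedure used for Fig. 9: the tuning constant 4.685 and the residual scale based on the median absolute residual are conventional choices assumed here, the function names are hypothetical, the bootstrap prediction interval is omitted, and the (peak height ratio, SOH) pairs are invented.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares line; returns (slope, intercept)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, ym - slope * xm


def bisquare_line_fit(x, y, c=4.685, iterations=20):
    """Robust line fit by iteratively reweighted least squares (bisquare)."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        slope, intercept = weighted_line_fit(x, y, w)
        resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        # Robust residual scale from the median absolute residual.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0.0:
            break
        limit = c * scale
        # Observations with large residuals receive small or zero weight.
        w = [(1.0 - (r / limit) ** 2) ** 2 if abs(r) < limit else 0.0
             for r in resid]
    return slope, intercept


# Hypothetical pairs on the line SOH = 0.6 + 0.4 * ratio, with small
# alternating perturbations and one gross outlier in the middle.
ratio = [0.1 * k for k in range(1, 10)]
soh = [0.6 + 0.4 * r + (0.002 if k % 2 == 0 else -0.002)
       for k, r in enumerate(ratio)]
soh[4] += 0.3

ols_slope, ols_intercept = weighted_line_fit(ratio, soh, [1.0] * len(ratio))
slope, intercept = bisquare_line_fit(ratio, soh)
```

On this invented data the ordinary least-squares intercept is pulled upward by the outlier, while the bisquare fit assigns it zero weight and recovers the underlying line closely.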
For comparison, we also study the relationship between SOH and the peak height & peak area associated with "last phase-transition of Li ions intercalation during charging" (i.e., P 3 ), similar to that performed in the literature. 34,35 Figure 3b shows the illustration of extracting P 3 peak area and P 3 peak height from the IC curve. Figure 10a shows the relationship between P 3 peak area and SOH while Fig. 10b shows the relationship between P 3 peak height and SOH. Both of these figures show a general linear relationship. Thus, we compare the RMSE of the SOH prediction with linear regression using different features. Table III shows the RMSE of the SOH prediction using peak height ratio, P 3 peak height value and P 3 peak area for different battery cells, in which the other two cells were used as training. From the result, the SOH prediction with peak height ratio works the best, producing the least RMSE values among the three features. Observing from Fig. 10a, the relatively poor prediction performance from the P 3 peak area is likely due to the noisy and inconsistent relationship with SOH from cell 3. For P 3 peak height, the poor prediction performance is likely due to the discrepancy in slopes among the three cells as observed in Fig. 10b.

Figure 9. (a) Relationship between the SOH and the peak height ratio; Estimation of SOH from peak height ratio for (b) cell 1, (c) cell 2 and (d) cell 3.
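The feature comparison above amounts to a leave-one-cell-out evaluation: for each cell, a line is fitted on the other cells and the RMSE is computed on the held-out cell. The sketch below illustrates this arrangement. It uses ordinary least squares instead of the robust bisquare regression for brevity, and the function names and the dictionary layout are assumptions made for this example.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_cell_out_rmse(cells):
    """cells maps a cell name to (feature values, SOH values).

    Returns a dict mapping each cell name to the RMSE obtained when the
    line is trained on all other cells and evaluated on that cell.
    """
    rmse = {}
    for held_out in cells:
        train_x, train_y = [], []
        for name, (feature, soh) in cells.items():
            if name != held_out:
                train_x.extend(feature)
                train_y.extend(soh)
        a, b = fit_line(train_x, train_y)
        feature, soh = cells[held_out]
        sq_err = [(a + b * f - s) ** 2 for f, s in zip(feature, soh)]
        rmse[held_out] = math.sqrt(sum(sq_err) / len(sq_err))
    return rmse
```

Running the same function once per feature (peak height ratio, P 3 peak height, P 3 peak area) gives one RMSE per cell and feature, which is the layout of a comparison such as Table III.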
Here we assume the features and the SOH possess a linear relationship. Prediction with the above-mentioned linear regression therefore relies on the consistency of the linear relationship between the features and the SOH. We next take a closer look at what can cause a deviation from the linear relationship and affect the prediction result.
Here we look at two types of "outlier." Figure 11a shows the SOH vs peak height ratio plot for cell 3 with some points highlighted, and we examine the IC curves of the corresponding cycles. Firstly, one type of outlier comes from spline modeling. For example, we investigate four cycles, where three cycles (cycle 509, 519 and 520) fall into the major trend and one cycle (cycle 515) falls out of the major trend as shown in Fig. 11a. Note that cycle 515 is regarded as an "outlier" because the point goes back to the major trend in later cycles (e.g., cycle 519, 520). From Fig. 11b, which shows the IC curves of cycle 509, 515, 519 and 520 in cell 3, it can be noted that the peak height ratio in cycle 515 deviates much from adjacent cycles because the peak of interest P 1 increases sharply along with a sharp tip compared with the adjacent cycles, probably due to measurement error or noise. Another type of outlier comes from an abnormal shift in battery voltage. For example, Fig. 11c shows the IC curves of cycle 2614, 2615, 2616 and 2733 in cell 3. As shown in Fig. 11c, the IC curves of cycle 2614, 2615 and 2616 (the outliers) do not show our peak of interest P 3 . However, this is not because of the fading of P 3 , as P 3 reappears in a later cycle (cycle 2733). We suspect that the observation is due to an increase in internal resistance or external resistance, causing the whole charge profile to shift towards a higher voltage range, so the constant current charging process ends prematurely when reaching the cut-off voltage of 3.6 V. Hence, P 3 disappears because P 3 does not take place in the battery cell before the cut-off voltage in such a case. This kind of phenomenon will definitely affect the SOH estimation since our peak of interest P 3 has not appeared in the data. Note that the capacity or SOH does not drop significantly while P 3 is not taking place because there is a constant voltage charging process after the constant current charging.

Figure 10. Relationship between (a) SOH and the P 3 peak area, (b) SOH and the P 3 peak height.
The above two phenomena indicate the limitations of our proposed method. Since the cubic smoothing spline method utilizes cross validation to determine the smoothness of the model fit, it can maintain higher fidelity of the data as illustrated in the comparison on simulated data shown above, but at the same time this means the method is more susceptible to measurement error or noise. This is also consistent with the comparison result shown above that the least squares filter with Tukey's biweight outperforms the proposed spline method under the high noise level scenario. Also, when internal or external resistance increases, the IC curve shifts towards a higher voltage range and the charge process can end prematurely at the cut-off voltage without showing the characteristic peak(s) of interest. A crucial section of data is then lost and the proposed method will not be able to estimate the SOH in this case.
Some general comments here are that when the measurement error or noise is low (e.g., cell 1&2), the cross validation method is able to obtain appropriate model smoothness for extracting features and estimating SOH. But when the measurement error or noise is high (e.g., cell 3), the cross validation method may not be reliable. One way to deal with such high noise level scenario (where the feature values fluctuate much) can be specifying a lower bound on the smoothing parameter λ in Eq. 3 in order to restrict the curvature of the model fit. A reference lower bound can be determined by taking the mean of the smoothing parameters obtained in training data. Besides, when charge process prematurely ends without showing characteristic peak(s) of interest (which can be noticed by observing the individual IC curve explicitly), users may consider conducting maintenance on the applications to resolve the possible external resistance increase. Also, users can consider increasing the cut-off voltage so as to reduce the chance of not showing peak(s) of interest in a charge process, which can be safely applicable especially in LFP battery since it has a wide overcharge tolerance and overcharging up to 4.2 V is safe. 39 However, users have to be cautious that other battery types, such as lithium cobalt oxide battery, can have much narrower tolerance on overcharging. In addition, one may consider the discharge data for estimating the SOH, provided that the discharge has to be at constant and appropriately low rate (to avoid characteristic peaks being masked by charge-transfer related overpotential 6 ). Of course, the model training will have to be done all over again with discharge data for consistency. Figure 12 shows the relationship between SOH and the peak height ratio with discharge data and the linear relationship can still be noted, indicating the feasibility of applying the proposed method with the discharge data.
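The lower bound on the smoothing parameter suggested above can be expressed in a few lines. This is a minimal sketch, assuming that a larger λ gives a smoother fit (as in the use of λ throughout this paper) and that the cross-validated λ values from the training cycles are available as a list. The function name and arguments are hypothetical.

```python
import statistics


def bounded_smoothing_parameter(lam_cv, training_lams):
    """Return the smoothing parameter to use for the current cycle.

    lam_cv: value selected by cross validation for the current cycle.
    training_lams: cross-validated values obtained on the training data.
    The mean of the training values acts as a reference lower bound, so a
    noisy cycle cannot drive the fit towards an overly wiggly curve.
    """
    lower_bound = statistics.mean(training_lams)
    return max(lam_cv, lower_bound)
```

When cross validation already selects a value above the bound, the value is left unchanged, so low-noise cycles are not affected.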
In addition, according to Riviere et al. 17 and Dubarry et al., 40 the fading pattern of the peaks in IC curve is related to the dominant degradation mechanism. Among the two major degradation mechanisms related to LFP battery, 40,41 loss of active material and loss of lithium inventory: when the loss of active material is the dominant degradation mechanism, all the peaks in IC curve will fade proportionally; when the loss of lithium inventory is dominant, the peak (P 3 ) associated with "last phase-transition of Li ions intercalation during charging" will fade much more significantly than the others. Thus, when the loss of lithium inventory is the dominant degradation mechanism, the fading ratio of peaks in IC curve will change significantly and the method with peak height ratio alone will work well, which is the case of our LFP battery as well as many commercial LFP cells where the loss of lithium inventory is reported to be the dominant degradation mechanism. 42-44 However, it is also possible that for some LFP battery cells the dominant degradation mechanism is the loss of active material. In such case, the peak height ratio change will be less significant and the method with peak height ratio alone may not work well. Therefore, in the following paragraphs we extend the originally proposed SOH estimation method to a multiple linear regression using both the peak height ratio and the P 3 peak height as covariates, where the P 3 peak height covariate in the extended method can complement the limitation of peak height ratio when peak height ratio change is not significant.

Figure 12. Relationship between the SOH and the peak height ratio with discharge data.

Figure 13. An illustration of estimating SOH from both peak height ratio and P 3 peak height covariates (cell 2).
An example of applying the extended model is shown in a 3D plot in Fig. 13, where the black dots are the true relationship between SOH and peak height ratio & P 3 peak height, and the blue plane shows the prediction of SOH from a robust multiple linear regression using bisquare method 37 with peak height ratio and P 3 peak height covariates. The confidence interval plane for prediction can again be obtained through bootstrap method, 38 although it is not demonstrated in Fig. 13. Table IV shows the RMSE values of the prediction with peak height ratio alone and with peak height ratio plus P 3 peak height. Compared with the originally proposed method with peak height ratio alone, the prediction performance with the extended method is enhanced in cell 1 and cell 3 but is worse in cell 2, which can be due to the relatively poor linearity between SOH and P 3 peak height in cell 2 as shown in Fig. 10b. Incorporating an additional covariate introduces extra uncertainty and does not improve the prediction result in all circumstances. Nevertheless, we expect the extended SOH estimation method with both peak height ratio and P 3 peak height to generalize better to different degradation mechanisms in LFP battery, as it can complement the limitation of the method with peak height ratio alone when loss of active material is the dominant degradation mechanism.
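The extended model with two covariates can be sketched as a plane fit, SOH ≈ b0 + b1 × (peak height ratio) + b2 × (P 3 peak height). The example below is a simplified illustration that solves the ordinary least-squares normal equations directly. It does not include the bisquare reweighting or the bootstrap interval, and all function names are assumptions made for this example.

```python
def solve_linear_system(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, pivoting)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_two_covariates(ratio, p3_height, soh):
    """Least-squares coefficients [b0, b1, b2] of the plane described above."""
    rows = [[1.0, r, h] for r, h in zip(ratio, p3_height)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, soh)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict_soh(coef, ratio, p3_height):
    """Evaluate the fitted plane at one (peak height ratio, P3 height) pair."""
    b0, b1, b2 = coef
    return b0 + b1 * ratio + b2 * p3_height
```

The two covariates must not be perfectly collinear in the training data; otherwise the normal equations are singular and the plane is not identifiable.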
Besides estimating SOH, the above-mentioned method can also be applied to update the maximum available capacity at a cycle and recalibrate the SOC estimation. According to Chang 45 and Ng et al., 13 one common way to estimate SOC is coulomb counting, where the accumulated charge is compared to a reference capacity:

$$\mathrm{SOC}(t, z) = \frac{Q_{\mathrm{releasable}}(t, z)}{Q_{\mathrm{ref}}}$$

where SOC(t, z) is the SOC at time t of cycle z; Q releasable (t, z) is the released capacity if it is completely discharged at time t of cycle z; Q ref is the reference capacity (e.g., nominal capacity is adopted when there is no updating).
Consider a scenario where the battery has never fully charged and/or fully discharged during its application, the maximum available capacity throughout its life cannot be estimated directly. Note that such scenario is common in applications such as cell phone, electric vehicle etc. in which users tend not to use up all the capacity available for contingency and over-charge/over-discharge prevention. In such scenario, the nominal capacity will be adopted as the reference capacity for SOC estimation when the maximum available capacity cannot be updated due to incomplete charge and discharge process.
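When Q ref is updated from time to time, the SOC can be written as:

$$\mathrm{SOC}(t, z) = \mathrm{SOC}(0, z) + \frac{1}{Q_{\mathrm{ref}}} \int_{0}^{t} I(\tau)\, d\tau$$

where Q ref is the most recently updated reference capacity and I(t) is the current rate at time t, which is positive for charging and negative for discharging.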
With the proposed method demonstrated before, SOC recalibration can be achieved by inferring the SOH from peak height ratio of the previous cycle with linear regression. And from the estimated SOH we can obtain the corresponding maximum available discharge capacity by the definition of SOH and update Q ref in the SOC calculation. The idea is that, concerning SOC estimation at zth cycle, with the SOH and peak height ratio relationship obtained from historical or experimental data as described above, the latest available information on peak height ratio among previous usage, e.g., peak height ratio from (z − 1)-th cycle or peak height ratio from the charge process of zth cycle, can be used to estimate the maximum available discharge capacity at corresponding cycle. And such estimated maximum discharge capacity can subsequently be used as the reference discharge capacity, Q ref , for SOC calculation at zth cycle.
Here, we present a demonstration of SOC recalibration for cell 2 at cycle 2800 using the peak height ratio information from cycle 2799 to update the Q ref . For simplicity, we consider a discharge process with SOC(0, 2800) = 100%, i.e., the battery cell is fully charged initially at the beginning of cycle 2800 discharge. Figure 14 shows the SOC vs discharge time plot at cycle 2800 of cell 2. The SOCs with true Q ref at cycle 2800 are shown with black dots, the SOC estimations with Q ref updated using peak height ratio of cycle 2799 are shown with green circles (90% CIs are shown with blue triangles) and the SOC estimations with nominal capacity (1.1 Ah) are shown with red squares. It can be observed that the estimations with Q ref updated are more accurate as they take into account the drop in maximum available capacity along with battery aging. With the proposed method, the SOC recalibration can be achieved without the need of full discharge or charge process. Note that while Q ref is continuously updating, observation of peak height ratio deviated much from the previous trend should be handled carefully as it can be an outlier which arises from noisy data. In such case, the SOC estimation can be done with Q ref at the nearest cycle with non-noisy data.
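The recalibration step can be illustrated with a short coulomb counting sketch. It assumes that SOH is the ratio of the maximum available capacity to the nominal capacity, that the current is sampled at a fixed time step, and that the current is positive for charging and negative for discharging. The function names are hypothetical, and any SOH value passed in would come from the regression on peak height ratio rather than from this code.

```python
def updated_q_ref(estimated_soh, nominal_capacity_ah):
    """Reference capacity (Ah) implied by an estimated SOH (fraction of nominal)."""
    return estimated_soh * nominal_capacity_ah


def soc_coulomb_counting(currents_a, dt_s, q_ref_ah, soc0=1.0):
    """SOC trajectory by coulomb counting.

    currents_a: current samples in A (positive charging, negative discharging).
    dt_s: sampling interval in seconds.
    q_ref_ah: reference capacity in Ah (nominal or updated).
    soc0: SOC at the start of the process (1.0 means fully charged).
    """
    soc = [soc0]
    for current in currents_a:
        delta_ah = current * dt_s / 3600.0  # charge moved in this step, in Ah
        soc.append(soc[-1] + delta_ah / q_ref_ah)
    return soc
```

As a usage illustration with a hypothetical estimated SOH of 0.85 and the 1.1 Ah nominal capacity, `updated_q_ref(0.85, 1.1)` gives a Q ref of 0.935 Ah. After 30 minutes of discharge at 1.1 A, coulomb counting with the nominal capacity reports an SOC of 50%, whereas the updated Q ref gives roughly 41%, reflecting the reduced maximum available capacity.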
In addition to the above example of applying the peak height ratio to estimate the SOH with linear regression and to recalibrate the SOC estimation, the proposed feature can be used for other applications, such as monitoring the degradation mechanism since each peak in the IC curve corresponds to a different phase-transition of Li ions intercalation of electrode materials (e.g., graphite in typical LFP battery), and the proportion of peaks reduction in IC curve can reveal the dominant degradation mechanism for a battery. 17,40 More discussion about the physical meaning of the IC peaks is given in a later section. Also, the proposed feature can potentially be used for remaining useful life (RUL) prediction by means of certain machine learning methods to capture the degradation trend, as well as for assessing the coulombic efficiency, with specific partial charge or discharge profile.

Table IV. Comparison of RMSE of the prediction between peak height ratio alone and peak height ratio plus P 3 peak height.

| Feature | Cell 1 | Cell 2 | Cell 3 |
| --- | --- | --- | --- |
| Peak height ratio | 0.0206 | 0.0122 | 0.0269 |
| Peak height ratio plus P 3 peak height | 0.0180 | 0.0159 | 0.0256 |

Figure 14. SOC vs discharge time plot at cycle 2800 of cell 2.

Discussion
Physical meaning of IC curve peaks and its significance in revealing degradation status.-As mentioned in an earlier section, each peak in the IC curve corresponds to a cohabitation of two stages of Li ions intercalation in anode material, i.e., graphite in LFP battery. 17,32,33 A peak fade under the same current rate indicates a drop in the amount of charge transfer during the particular intercalation phase-transition. Therefore, the different decreasing patterns observed in the IC curve peaks can be used to diagnose the degradation mechanism, as mentioned in an earlier section with reference to the studies of Riviere et al. 17 and Dubarry et al. 40 Since the peak changes in IC curve indicate an electrochemistry-based phenomenon, they should provide a robust indication of battery degradation.
In our proposed extended model, both peak height ratio and P 3 peak height are considered for estimating the SOH. To some extent, the peak height ratio can reflect the degradation contributed by the loss of lithium inventory, while the absolute P 3 peak height can reflect the degradation contributed by both the loss of active material and the loss of lithium inventory.
Superiority of the feature extraction algorithm.-In capturing the feature from IC curve, one challenge is that when the rightmost (P 3 ) peak fades completely, the second rightmost peak becomes the "rightmost" peak. While using voltage range to characterize the peak(s) of interest is one of the methods to deal with such problem, it will not be reliable when there is change in external resistance, shifting the voltage range of the whole IC curve. Our proposed algorithm, in which we make use of the relative position among the IC peaks, will be able to handle the situation where the rightmost peak fades completely.
IC curve smoothness consideration under different applications.-The smoothness of the spline model fit is associated with the smoothing parameter λ in Eq. 3, which can be adjusted for fulfilling requirements in different situations.
For SOH estimation, when the measurement error or noise is low, the cross validation method is able to obtain appropriate model smoothness for extracting features while maintaining a high fidelity of the data. But when the measurement error or noise is high, the cross validation method may not be reliable as the features extracted can be distorted by the noise. In such a high noise level scenario, the smoothing parameter λ will have to be manually adjusted. Some remedial suggestions for such a scenario have been given in an earlier section.
In addition to SOH estimation and SOC recalibration as discussed in this paper, another widely used application of IC curve is the investigation of battery degradation mechanism. 40,41,46 In such application, the smoothness requirement of the IC curves can be high in order to observe the relative peak changes qualitatively, revealing the degradation mechanism inside the battery. In such case, one can set the smoothing parameter λ manually to a higher value to see a clear relative peak change.
Applicability under different conditions.-To demonstrate the applicability of the proposed method on battery under different operating conditions, we apply our method on additional LFP cell data with different operating temperatures, current rates and brands. The full details of the cycle aging experiments for additional LFP cells are given in Appendix. Figure 15a shows the SOH vs peak height ratio plot with additional charge data. Additional set of LFP cells, cell A4, cell A5 and cell A6 (from A123 system; charged at 1.1 A; 60°C), is shown in orange, cyan and magenta respectively. Another set of LFP cells, cell A7 and cell A8 (from BAK Ltd.; charged at 1 A; 45°C), is shown in yellow and gray respectively. Different sets of battery cells correspond to slightly different conditions (i.e., different manufacturers, different operating conditions). In general, a linear relationship can be noted between SOH and peak height ratio within each set of battery cells, although they are not comparable across different sets. Figure 15b shows the SOH vs peak height ratio plot with additional discharge data. Cell A4 (from A123 system; discharged at 1.1 A; 60°C) is shown in orange. Cell A5 and cell A6 (from A123 system; discharged at 0.55 A; 60°C) are shown in cyan and magenta respectively. Similar to the charge data, a linear relationship can be noted between SOH and peak height ratio among battery cells under the same condition. The results from Figs. 15a and 15b indicate the feasibility of applying our proposed method to estimate SOH under different conditions. Note that the linear relationships between SOH and peak height ratio may not be well aligned even among battery cells under the same condition. It is believed that possible reasons come from measurement error and cell-to-cell variation. Additionally, current rate is also a possible factor affecting the quality of IC curve. 
Because of the charge-transfer related overpotential, the IC peaks, which arise from the change in anode (graphite in LFP) potential, can be superposed and masked by the overpotential, 6 in which the masking effect from overpotential is more pronounced in high current rate. Therefore, it is generally recommended to apply low current rate to produce IC curves, 5,6 but it may not be practical for battery in real applications to operate at low current rate, either in charge or discharge process. Another remedial solution suggested by Riviere et al. 35 is to add a pause in a charge or discharge process before collecting useful data for feature extraction (i.e., peak height ratio in our case), so as to allow the cell (especially the graphite electrode in our case) to reach thermodynamic stability and alleviate the overpotential problem under moderate current rate. This can be equivalent to adding a pause at SOC 30% (see Fig. 16a) in the charge process or SOC 100% (see Fig. 16b) in the discharge process, where we start getting useful data at those SOC value.
Application to SOH estimation or recalibration of SOC estimation.-As mentioned in a previous section, the proposed method can be applied for SOH estimation or recalibration of SOC estimation. In real application, one issue to note is the SOC range that provides enough data to observe the two peaks of interest (P 1 and P 3 ). In our dataset (cell 1, cell 2 and cell 3), having a data range approximately from SOC 30% to SOC 95% for charging (voltage range from 3.37 V to 3.58 V, with charge current at 1.1 A) or from SOC 40% to SOC 100% for discharging (voltage range from 3.02 V to 3.26 V, with discharge current at 1.1 A) will be enough to note the two peaks of interest, as referenced from Fig. 16 which shows the dQ/dV vs SOC plots with charge and discharge data respectively. Note that the SOC range is more reliable, as a change in external contact resistance can cause a significant voltage shift. Nevertheless, both SOC and voltage ranges can vary for different brands of LFP battery. It is recommended that users determine the SOC or voltage range according to the historical data or experimental data of the LFP battery from the same design and the same manufacturer.
Concerning the current rate, in addition to the 1.1 A adopted in our early demonstration (cell 1, cell 2 and cell 3), the proposed method can possibly be applied at a different current rate (0.55 A) as demonstrated with additional data (cell A5 and cell A6). Theoretically, the proposed method involving ICA can be applicable under different constant current rate settings as long as the rate is small enough to suppress the effect from charge-transfer related overpotential.
In practice, the proposed method can be applied in some applications under certain circumstances: (1) Constant charge applications: the proposed method can be applied in applications where constant charge is adopted, except under fast charging arrangement at very high current rate. In particular, most applications such as laptop, cell phone, power tools are charged in a controlled condition with constant current. Note that when nominal charge process is at a controlled constant current environment with appropriate rate, the typical charge profile will already be sufficient for getting characteristic peaks information in IC curve. In such way, the proposed method will put nearly no extra arrangement/measurement/test on the applications, except that the algorithm of the proposed method has to be embedded on the battery management system. (2) Constant discharge applications: the proposed method can also be applied in constant discharge applications such as battery energy storage system for sharing the burden of utility during peak-load period, 47 although constant discharge applications are less common and the discharge rate requirement can be strict because of the inapplicability of our proposed method under high current rate. (3) Even if the nominal operating condition of an application does not meet the requirement of our proposed method, the SOH identification or the update of maximum available discharge capacity can be done occasionally instead of every cycle. That is, from time to time, an appropriate constant current is applied to charge the application. Although full discharge and charge process can also be done occasionally, our proposed method is expected to impose lesser interruption as the battery does not require complete discharge to SOC 0%.
Besides, with LFP battery, Weng et al. 34 spotted a second order correlation between the normalized capacity and the normalized IC peak, while Riviere et al. 35 noted a linear relationship between the normalized capacity and the normalized IC peak area and our study notes a linear relationship as well. This discrepancy is likely due to different configurations and designs of LFP battery with different brands or different environmental conditions such as temperatures. LFP batteries from the same design, same manufacturer and operating at similar condition should undergo similar degradation mechanisms. Thus, to apply the proposed method to estimate SOH or recalibrate SOC estimation with IC peak features, it is more advisable to train the model with historical or experimental battery data coming from the same design, same manufacturer and under similar environmental condition in order to ensure consistent change on the features throughout battery's life.
Limitation of the SOH estimation method and future research opportunity.-In this paper we present an SOH estimation method by means of the peak height ratio and P 3 peak height features obtained from the IC curve, which can also be used for SOC recalibration. The advantage is that we are able to estimate the SOH without having a complete charge or discharge process. However, to reveal the characteristic peaks in the IC curve, a requirement here is that the partial charge (SOC 30% to SOC 95% approximately) or partial discharge (SOC 40% to SOC 100% approximately) process has to be observed under constant current rate, while the rate cannot be too high due to the effect from charge-transfer related overpotential. 6 This means that, in order to apply the proposed SOH estimation method, there should be a partial charge or discharge process under standard or nominal current rate (1C rate in our dataset) from time to time in real application. The charge profile can be readily used with the proposed method when the charge rate is appropriate. However, as mentioned in an earlier section, a potential limitation is that the charge process of a battery cell can end prematurely when reaching the cut-off voltage without displaying the characteristic peak(s) of interest in IC curve due to the increase in internal or external resistance. Although applying a higher cut-off voltage can help alleviate the problem of not displaying characteristic peak(s) of interest, it is only applicable to specific types of battery with high overcharge tolerance such as LFP battery. Additionally, even though the discharge profile can be used, a constant discharge at a specific rate will be required. Unlike the charge process, such a discharge profile may have to be specifically arranged since most applications do not operate at a constant discharge rate. Moreover, in this paper we project the peak height ratio and P 3 peak height on the SOH under C1D1 regime. While the current rate of charging or discharging and the environmental conditions such as temperature may have an effect on the maximum available capacity of a battery, future research can investigate how current rate and temperature affect the correlation between SOH and the peak height ratio & P 3 peak height, and build a universal model connecting SOH and peak height ratio & P 3 peak height with current rate and temperature covariates.
Also, in this paper we make a strong assumption on the linear relationship between features and SOH and demonstrate a SOH estimation using linear regression model due to its simplicity. In the future, more rigorous analysis can be done and other machine learning methods can be explored to assess and capture the correlation between the peak height ratio & P 3 peak height features and the SOH so as to achieve more accurate estimation.
In addition, the voltage shift of peaks in IC curves can provide valuable information on the internal resistance change inside the battery, which is a sign of degradation. Although this piece of information is susceptible to external resistance (e.g., contact resistance) change, which will alter the voltage shift, many electronic applications nowadays have their battery embedded, and this reduces the chance of fluctuation in external resistance, making the voltage shift a reliable feature in such case. Thus, although our dataset in this paper does not show a consistent voltage shifting trend, probably due to fluctuation in external resistance during the experiment, it is worth exploring the voltage shift as a feature for characterizing the degradation status in the future with other datasets.

Conclusions
This paper presents a robust cubic smoothing spline method to obtain the IC curve, with the main advantage that the smoothing parameter can be determined using cross validation instead of the subjective trial and error parameter tuning that appears in certain filtering methods. Comparison between the proposed spline method and several filtering methods using simulated data shows that the spline method can maintain good fidelity of the data. Besides, this paper proposes a peak height ratio feature extracted from the IC curve for estimating SOH. The estimation of SOH from peak height ratio is demonstrated with linear regression. To deal flexibly with the two different dominant degradation mechanisms in LFP batteries, a more generalized version of the SOH estimation method is also demonstrated, considering both the peak height ratio and the P 3 peak height (i.e., peak associated with "last phase-transition of Li ions intercalation during charging") features by means of multiple linear regression. The gains of inferring SOH with the proposed method are that, without full charge or discharge data, estimation of SOH is still possible and the maximum available capacity can be updated for better SOC estimation. In addition, while current rate and environmental conditions such as temperature can affect the maximum available capacity at a cycle, future research can examine the effect of current rate and temperature on the correlation between SOH and peak height ratio & P 3 peak height, and develop a universal model connecting SOH and the IC peak features with current rate and temperature covariates. Also, the voltage shift in IC curves, which is related to the internal resistance change, can be explored as a feature to quantify the degradation with other datasets in the future.