Battery health evaluation using a short random segment of constant current charging

Summary Accurately evaluating the health status of lithium-ion batteries (LIBs) is significant to enhance the safety, efficiency, and economy of LIBs deployment. However, the complex degradation processes inside the battery make it a thorny challenge. Data-driven methods are widely used to resolve the problem without exploring the complex aging mechanisms; however, random and incomplete charging-discharging processes in actual applications make the existing methods fail to work. Here, we develop three data-driven methods to estimate battery state of health (SOH) using a short random charging segment (RCS). Four types of commercial LIBs (75 cells), cycled under different temperatures and discharging rates, are employed to validate the methods. Trained on a nominal cycling condition, our models can achieve high-precision SOH estimation under other different conditions. We prove that an RCS with a 10mV voltage window can obtain an average error of less than 5%, and the error plunges as the voltage window increases.


INTRODUCTION
Lithium-ion batteries (LIBs) have made our daily lives more convenient and colorful by powering our smartphones, computers, electric vehicles, and so forth. Their advantages in energy density, power density, and long lifetime have been accelerating their penetration in various energy storage applications (Larcher and Tarascon, 2015). However, LIBs inevitably age during use and non-use time, mainly owing to the loss of lithium-ion inventory (LLI) and loss of active materials (LAM) inside the batteries (Barré et al., 2013). The direct effect of aging on battery performance is the decrease in capacity and the increase in internal resistance . To improve the safety, efficiency, and economy of using LIBs, it is indispensable to conduct battery safety monitoring (Deng et al., 2018), residual value assessment, and timely maintenance , all of which heavily depend on the battery health evaluation. However, the health degradation of LIBs affected by temperature, current rate, mechanical stress, and historical operational conditions presents highly nonlinear dynamics (Attia et al., 2020;Severson et al., 2019).
From a physical point of view, the most direct method for battery health evaluation is to quantify the microscopic degradation processes of the battery, such as solid electrolyte interphase (SEI) growth (Wang et al., 2019), particle cracking (Yan et al., 2017), and lithium plating (Xiao, 2019). However, these degradation processes are coupled with each other, and their measurement is currently destructive to the battery (Reniers et al., 2019) and non-destructive techniques need to be explored. Moreover, these microscopic inspections usually require high-cost ex situ techniques and cannot be conducted in the field. To avoid these intractable problems, researchers developed various semi-empirical models to quantify the degradation caused by different stress factors (Schimpe et al., 2018), such as temperature, current rate, and depth of discharge. A large number of experiments are required to establish an accurate semi-empirical model, and its adaptivity to other operational conditions is still questionable. Battery electrochemical models (Doyle et al., 1993) and equivalent circuit models (Plett, 2004a) are widely used to simulate battery behaviors. As the battery health-related parameters, such as capacity and internal resistance, can be derived from the battery model, parameter identification (Rahimi-Eichi et al., 2014) and state estimation methods (Plett, 2004b) are used to obtain these parameters. However, owing to the model uncertainty and limited measurements for feedback (only voltage and temperature), it is very difficult to obtain reliable estimation results with clear physical meaning.
proposed for battery health prognostics. According to the input types, these methods can be roughly divided into two categories: feature-based methods and sequence-based methods. Feature-based methods use the extracted features as the inputs and then utilize lightweight machine learning algorithms to model the latent functions between the features and the output target, such as multiple linear regression (MLR) (Deng et al., 2021), support vector machine (Deng et al., 2016), relevance vector machine , and Gaussian process regression (GPR) (Richardson et al., 2019;Roman et al., 2021). Various features can be extracted from the voltage, current, temperature curves during the charging/discharging process, and electrochemical impedance spectrum . For example, incremental capacity (IC) and differential voltage (DV) analysis (Han et al., 2014) are two useful methods to extract features to evaluate battery health, and typical features include the peak values of the IC curves (Jiang et al., 2020;Tang et al., 2021), the valley values of the DV curves (Li et al., 2018), and the curve area within a given voltage range. In contrast, sequence-based methods directly use time-series data as the input and employ deep learning methods to achieve automatically feature extraction and nonlinear modeling, e.g., deep neural network (Roman et al., 2021;Tian et al., 2021), long short-term memory network (Deng et al., 2022b;Li et al., 2020), deep convolutional neural network (DCNN) (Shen et al., 2020), and their variants. These techniques usually use time-series data of battery current, temperature, voltage, and accumulated charge under complete or partial charging/discharging conditions as the input. Many studies have shown that both feature-based and sequence-based methods can achieve outstanding performance under specific conditions. In many applications, the battery discharge process is a bit dynamic, while the charging process is relatively stable and usually pre-defined, such as in electric vehicles and smartphones. Therefore, many researchers developed health evaluation models based on the charging data (Jiang et al., 2020;Li et al., 2018;Shen et al., 2020;Tian et al., 2021). In these studies, a specific voltage range and a fixed start/endpoint are required to ensure that the curves of different cycles have the same reference points. However, in practical applications, the charging behaviors of users are random (Zhao et al., 2021), which means the charging start and endpoints are not fixed, and a complete charging process is very difficult to capture (Deng et al., 2022a). Furthermore, for series battery packs, the inconsistency between battery cells causes them to have different charging voltage curves and a narrow voltage overlap window (Tian et al., 2020), especially for aged cells, which hinders the health evaluation of each cell. In short, it is still a significant challenge to conduct battery health evaluation based on a random and short charging segment in real applications.
In this work, we develop data-driven methods to accurately estimate battery state of health (SOH) using a random charging segment (RCS) extracted from the constant current process. The proposed methods are validated with four types of commercial batteries (75 cells in total) cycling under different temperatures and discharging rates. As schematically shown in Figure 1, we first divide the constant current (CC) charging curve into dozens of segments and extract a capacity increment sequence in each segment. We show that the capacity increment sequence evolves in a certain pattern as the battery ages, and its average value and SD have high correlations with the battery SOH. Then, we analyze the correlations under different numbers of segments and find that even a short segment is highly correlated with battery SOH. Finally, two types of machine learning algorithms (features-based and sequence-based) are used to model the SOH estimators. In the training process of the data-driven models, all RCSs are used as input, but only an RCS is required as input for the online application. We prove that the developed methods can achieve high accuracy using even an RCS with an extremely small voltage window (i.e., 10mV).

Charge capacity evolution
Taking the data of a LiNi x Mn y Co 1ÀxÀy O 2 blended with LiCoO 2 (NMC-LCO) cell as an example, we investigate the evolution of battery charge capacity with the number of cycles. Figure 2A and Figure 2B show the constant current-constant voltage (CC-CV) charging profiles and the corresponding charge capacity (Q) as a function of voltage, where the color denotes the number of cycles. It shows that the capacity curves gradually shift downward as the battery ages. Table 3 presents parameters setting used to extract the Q from different batteries. Furthermore, we divide the voltage range of 3.70-4.29V into 12 segments according to (Equation 2) and extract the capacity increment sequences (6Q seg ) from the segments ( Figure 2C). After this segmentation, each segment has a 0.48V voltage window. Some patterns in the evolution of charge capacity as the battery ages can also be observed in these partial charging segments. To evaluate the usefulness of the partial capacity segments for the battery health evaluation, we calculate two statistical characteristics of 6Q seg and analyze the correlations (r) between them and battery SOH. We choose the average value of 6Q seg (ave_6Q seg ) and the SD of 6Q seg (std_6Q seg ) as features, and their correlations with the SOH of NMC-LCO battery are shown in Figure 3. The correlations when the voltage range is divided into 12 segments are illustrated in Figures  For each segmenting operation, we calculate the average absolute correlation (AAC) of each feature for different segments. Figure 4 compares the variation of the AACs with the number of segments for different batteries. All the batteries are cycling in a 25 C chamber with the same charging policy (0.5C CC-CV). For the discharging policy, the NMC-LCO battery is discharged at 1.5 CC, while the other three types are discharged at 0.5 CC.
Owing to the difference in electrochemistry, the correlations of the four batteries present different patterns of iScience Article change. In contrast, for the same battery chemistry, the correlation variation of its two features is highly consistent. The correlation generally decreases as the number of segments increases, except for a partial recovery in NCA battery. For all types of batteries, high correlations of the two features with the battery SOH appear when the number of segments is less than 20. Even when the number of segments is 59, which corresponds to a voltage window of 10mV, a correlation over 0.5 is obtained for NMC-LCO, NCA, and NMC batteries, and around 0.4 for LFP battery. Compared with other battery types, it is more difficult for LFP battery to extract features highly related to battery SOH based on the 6Q seg , which increases the difficulty of its SOH estimation.

Battery state of health estimation
We use two types of data-driven methods to model the battery SOH estimator. The first one is the featurebased method, which uses selected features as input. By comparison, the second one, i.e., the sequencebased method, can directly use the raw data sequence as input.
In this study, one simple algorithm, (MLR) and two state-of-the-art machine learning algorithms (sparse GPR and DCNN) corresponding to the above-mentioned two types of methods are employed to construct battery SOH estimators. The MLR is a typical method to model the linear relationship between input features and output target. In general, the better the linear correlation between the features and the output, the higher the accuracy of MLR estimation. The sparse GPR (SGPR) can efficiently capture the nonlinear relationships between the inputs features and output and provide a probabilistic prediction of the target. For the DCNN, it can infinitely approximate the nonlinear characteristics of the process owing to its deep learning mechanism. In this SOH estimation problem, the MLR and SGPR take the two features (ave_6Q seg and std_6Q seg ) and the mean value of the corresponding voltage sequence as inputs, while the DCNN uses the capacity increment sequence (6Q seg ) and the corresponding voltage sequence as inputs directly. We convert the high dimensional sequences into images as input to the DCNN, which can automatically extract features from the front layers of the network.
As illustrated in Figure 1, to achieve battery SOH estimation based on any RCS, the training samples need to cover all segments. This increases the sample size by dozens of times (usually up to 10,000), resulting in a regular GPR being unable to complete the training on a regular computer (computational complexity is O(n 3 ), n is the sample size). In this regard, we use a sparse GPR and it can significantly reduce the computational burden by introducing inducing points (Candela and Rasmussen, 2005). The SOH estimation results for NMC-LCO batteries are presented in Figure 5, including the training process, the test process using all segments, and the test process using a random segment for each cycle. The results of the other three types of batteries are shown in Figures S4-S6. The statistical errors of SOH estimation for the four types of batteries are summarized in Table 1. All the mentioned methods are trained on a cell under 25 C, 0.5C CC-CV charging and 0.5C discharging conditions (except for 1.5C for NMC-LCO), and are tested on another cell with the same battery type and cycling condition. We define the above cycling condition as a nominal cycling condition. In the modeling processes, we also divide the voltage range into 12 segments, which means that 12 samples are generated per cycle. Therefore, the x-axis labels in the training ( Figure 5A) and test results ( Figure 5B) are denoted by ''sample'' rather than ''cycle.'' To mimic the random charging behaviors of users, we perform uniformly distributed sampling and select one from 12 segments for each cycle as the final estimated value of the cycle ( Figure 5C). It is worth noting that for the SGPR and DCNN methods, the model obtained after each training is different owing to the random setting of initial parameters. To obtain reliable results, we run 20 times of training and test processes, and take their average values as the final results.
The above results show that all the methods can obtain high-precision SOH estimation for different battery chemistries, with a mean absolute error (MAE) lower than 2% and a root-mean-square error (RMSE) lower than 2.5% in the test process. The DCNN achieves the best performance, and its MAEs and RMSEs are lower than 1 and 1.2%, respectively. Owing to the inability to model nonlinear characteristics, the iScience Article estimation accuracy of the MLR is always lower than the SGPR and DCNN. Besides, we can find that the estimation errors based on random segments are almost the same as that of using all test segments. This is because of the use of uniformly distributed sampling. According to the above results, we demonstrate that a high-accuracy data-driven estimator can be built as long as the data changes in some pattern with the output, whether it is a feature-based method or a sequence-based method.
We further analyze the effect of the number of segments on the accuracy of SOH estimation. Note that with a larger number of segments, we can get a smaller voltage window for each segment. For instance, when the number is 59, the voltage window is only 10 mV. Besides, for battery health evaluation, it is of practical importance to construct a model based on a specific cell but can maintain its accuracy for other cells. Therefore, we further test the accuracy of the models on other cells of the same chemistry. The cycling conditions of the four types of batteries are explained in Table 2. NMC-LCO batteries have the same cycling conditions, while NCA, NMC, and LFP have different cycling temperatures and discharging rates. The relationship between the capacity and the cycle number is shown in Figure S7 for all the batteries. This figure indicates that the rate of capacity degradation is significantly affected by temperature and discharging rates. In addition, we can see that the capacity of parts of batteries decays too fast under certain cycling conditions and reach a capacity range (area under black dasheddotted line in Figure S7), to which the batteries under the nominal cycling condition have not decayed. To examine the accuracy of the models under severe capacity degradation, we still use the models established under nominal cycling conditions to estimate the SOH in this area.
The MAEs of SOH estimation for the four types of batteries are shown in Figure 6, in which the errors are plotted as a function of the number of segments and cycling conditions (except for NMC-LCO, which uses cell numbers). Owing to the requirement of convolutional operation in the DCNN, the length of the input sequence cannot be too small. We set the lowest limit of the length equal to five for the DCNN, thus there are at most 55 segments for each cycle.
For NMC-LCO cells, the estimation accuracy of different cells is almost the same, and the maximum MAEs of the three models are lower than 6%. Even using an extremely small voltage window (10 mV), an acceptable estimation result can still be obtained. And when a voltage window of 100 mV is used, a MAE of SOH lower than 3% can be obtained by the SGPR and the DCNN models. However, the above outstanding performance is owing to the same cycling condition between the trained and test cells.
For NCA, NMC, and LFP batteries, when several cells are cycled under the same condition, the average values of MAEs are used to indicate the results of this cycling condition. We can find that the MAE increases  iScience Article error; while for NCA batteries, both a higher temperature and a lower temperature increases the error. In contrast, the influence of discharging rate on the estimation errors is insignificant. In addition to the metric of MAE, the RMSEs of SOH estimation for four types of batteries are also shown in Figure S8, and the same influence law from temperature can be observed.
To better evaluate the robustness of the models constructed using different numbers of segments and under different cycling conditions, the distributions of MAEs at different segments are shown in Figure 7 (the distributions of RMSEs are shown in Figure S9). The distribution is calculated based on the results under the same number of segments for the four types of batteries under different cycling conditions. It can be observed that when the number of segments is less than 10, a mean MAE of less than 2% can be obtained for each method. Even if the number of segments is up to 55 or 59 (corresponds to a 10mV voltage window), a mean MAE less than 5% and a mean RMSE around 6% can be guaranteed. Note that a larger number of segments means a smaller voltage window for SOH estimation. This proves that an RCS with a 10mV voltage window can achieve an acceptable SOH estimation. Meanwhile, we can observe that a sequence with a big voltage window can capture more battery degradation information and has better generalization for different working conditions. However, it is difficult to obtain a big voltage window in the charging process in many applications. Therefore, we have to sacrifice some accuracy to ensure the availability of the models.

DISCUSSION
In this article, we develop data-driven models to achieve accurate battery health evaluation using an RCS extracted from the constant current charging process. We prove that capacity increment sequences in the iScience Article CC charging process are informative for battery health evaluation. Two features extracted from these sequences are closely related to battery SOH. Two feature-based methods (i.e., the MLR and SGPR) and one sequence-based method (i.e., the DCNN) are used to construct data-driven models for SOH estimation. The proposed models are trained using the data of a cell under the nominal cycling condition and subsequently tested on other cells with the same or different conditions (difference in temperature and discharging current rates). The developed methods are validated using four types of batteries (75 cells in total), with the error lower than 2% when the voltage window is up to 500mV. Moreover, an average estimation error lower than 5% can be obtained even when the voltage window is less than 10mV. Our work substantiates that it is promising to use an RCS with a narrow voltage window to achieve accurate health evaluation for LIBs. As only the short random charging data is needed, the proposed methods can be applied to a variety of scenarios.

Limitations of the study
In this article, the proposed methodology is verified under cycling conditions with constant ambient temperatures, constant discharging currents, and full charge and discharge. It is valuable to conduct more verifications under other cycling conditions, such as changing ambient temperature, dynamic discharge, and shallow charge and discharge. These will be investigated in our future work.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

AUTHOR CONTRIBUTIONS
Z. Deng and L. Xu analyzed the datasets and conceived the study. Z. Deng, X. Hu, and Y. Xie developed the models. Z. Deng, X. Lin, and X. Bian interpreted the results. All authors edited and reviewed the article. X. Hu and X. Lin supervised the work.

Material availability
This study did not generate new materials.

Data and code availability
The raw dataset used in this study is available at https://www.batteryarchive.org.
The code for data processing, features extraction and data-driven modeling is completely available at https://github.com/TengMichael/battery-health-evaluation.

Dataset description
The cell cycling dataset of four different types of batteries is used in this work, and all of them come from an open-source battery data website. The information of batteries and experimental settings are listed in Table 2. All the batteries use graphite material as their negative electrodes, but material of their electrolytes are not disclosed. The battery cells are divided into different types based on the positive materials, namely, LiNi x Co y Al 1ÀxÀy O 2 (NCA), LiNi x Mn y Co 1ÀxÀy O 2 (NMC), LiFePO 4 (LFP), and a blend of NMC and LiCoO 2 (NMC-LCO). The dataset of NMC-LCO is contributed by Hawaii Natural Energy Institute (HNEI) (Devie et al., 2018), and the other three are contributed by Sandia National Laboratories (SNL) (Preger et al., 2020). All the battery cells are cycled under the same constant current-constant voltage (CC-CV) charging profile (0.5C CC-CV), but different CC discharging profiles with different current rates (0.5C, 1.5C, 1C, 2C, and 3C). Although the dataset of SNL including cells cycled under different depths of discharge (DODs), only the data of 100% DOD is used for this study. The benchmark SOH of the cell is calculated by, where C n is the nominal capacity of the battery, and C i is the actual capacity at ith cycle, which is equal to the maximum discharge capacity under CC discharging.

Capacity increment sequence extraction
In a battery charging process, battery charge capacity can be calculated by integrating battery current over time. For constant current (CC) charging, the battery voltage is usually monotonically increasing (or filtered to maintain monotonicity iScience Article with respect to the voltage sequence, where n = (V start -V end )/6V+1 (Deng et al., 2021). Considering the difference in voltage platforms for the four types of batteries, different parameters setting is used to extract the Q from different batteries and are listed in Table 3. The V start is set to a voltage point corresponding to about 5% SOC of the battery because 5% SOC is usually reserved to avoid battery over-discharge in practical applications. The V end is set to a value slightly away from the charging cut-off voltage to avoid the influence of data fluctuations when switching to the constant voltage (CV) charging stage.
To achieve accurate battery SOH estimation based on partial charging data, the charge capacity sequence can be further divided into dozens of segments (Q seg ). Given a fixed length (h) of the segment and a stride size (s), m segments can be extracted, where the function floor(.) gets the largest integer no more than the input value. In this study, s is set to 1, corresponding to one 6V voltage interval. As listed in Table 3, a charge capacity sequence, Q = [Q 1 , Q 1 , ., Q n ], can be extracted with n = 60 for all batteries. Setting h = 49, which corresponds to a 0.48V voltage window, then m = 12, which means Q can be divided into 12 Q seg , as shown in Figure 1 for NMC-LCO battery. Due to the random charging behaviors of users, the charging start point is not fixed, thus it is impossible to calculate the exact (or absolute) charge capacity values in practical applications.
To overcome this problem, the capacity sequence is replaced by the capacity increment sequence (6Q seg = Q seg À Q seg,1 ) in each segment.
For each 6Q seg , its average value (ave_6Q seg ) and standard deviation (std_6Q seg ) can be extracted as features. The Pearson correlation coefficient (r) is used to provide the strength of the linear correlation between the features and battery SOH. The correlation analysis results in four different types of batteries are illustrated in Figures 2 and S1-S3.

Multiple linear regression
To model the linear relationship between the input features and output target, an MLR is a commonly used method. Its expression is, where b y i is the predicted SOH, x j is the input feature, n is the number of features, and b w j is the weight. The objective function of this regression problem is often defined to minimize the mean square error of the output. When there are many features, a regularization technique can be introduced to prevent the model from overfitting during the training process (Severson et al., 2019).

Gaussian process regression
To better capture the nonlinear relationship between the input features and the battery SOH, the GPR technique is employed. GPR is a machine-learning framework with non-parametric modeling and uncertainty evaluation (Williams and Rasmussen, 2006). For a typical regression problem, observations usually contain Gaussian white noises and can be modelled as, where x i is the ith input features, s 2 n is the noise covariance, and f(x)=[f(x 1 ), f(x 2 ), ., f(x n )] is a Gaussian process. f(x) can be described as f(x) $ N (0, K), where K ij =k (x i , x j ) is the covariance kernel function, which is a measure of distance between points x i and x j . A squared exponential kernel function is the most widely used, and is expressed as, ) where s f and l are hyperparameters, which determine the amplitude of the kernel function and the importance of each input feature, respectively. Given training samples, the hyperparameters [s f , l, s n ] of GPR can be optimized by maximizing the marginal likelihood. Due to the matrix inversion in solving the maximum likelihood estimation problem, the computational complexity is O(n 3 ) for a regular GPR. When the size of ll OPEN ACCESS iScience 25, 104260, May 20, 2022 iScience Article the training space is very large, the training of the regular GPR is intractable. To overcome this problem, a sparse GPR with a specific number of inducing points is employed (Candela and Rasmussen, 2005). Only m training samples are selected from the original training set, thus the computational complexity can be reduced to O(m 2 n). In this paper, the GPR-based SOH estimation is realized by using the Gaussian processes for machine learning (GPML) toolbox (Williams and Rasmussen, 2006).

Deep convolutional neural network
The DCNN has been successfully used in image recognition, and the elements, lines, and shapes of a picture can be captured by different layers (Szegedy et al., 2015). Due to its ability to nonlinear modeling and automatic feature extraction (Gao and Lu, 2021), we use it to estimate battery SOH directly based on the 6Q seg . In addition to the 6Q seg , the corresponding voltage sequence is also used as the input of the DCNN. Unlike color images that have an input size equal to n3n33, the above features can only form input with a size equal to n3231, thus 1D CNN is used to construct the network. In this study, the DCNN-based SOH estimation model mainly consists of two 1D convolutional layers, one maximum pooling layer, and one fully-connected layer. The structure of the developed DCNN is shown in Table S1. Since the input size is changing (n varies from 6 to 60 in this case), to ensure that the size of the input to the maximum pooling layer is constant, the size of the first two convolutional layers is also set to be variable. The stride size is default to one, and no padding is used. In each convolutional layer, a batch normalization technique is applied to improve the performance and stability, and a rectified linear unit (ReLU) activation function is subsequently used to learn the nonlinear relationships.

Evaluation criteria
Two statistical characteristics of the SOH estimation errors are chosen to evaluate the model performance. where y i is the observed battery SOH, b y i is the estimated SOH, and n is the total number of samples.