Research Progress on Data-Driven Methods for Battery States Estimation of Electric Buses

Abstract: Battery states are critical to the safe and reliable use of new energy vehicles, and the estimation of power battery states has become a research hotspot in electric bus development and transportation safety management. This paper summarizes the basic workflow of battery states estimation, compares the advantages and disadvantages of three types of data sources, reviews the characteristics and research progress of the three main model families used for estimating power battery states (machine learning models, deep learning models, and hybrid models), and discusses the development trend of estimation methods. Three conclusions are drawn. First, among the many data sources available, onboard sensor data collected under natural driving conditions are objective and authentic, making them the main data source for accurate power battery states estimation. Second, artificial neural networks have promoted the rapid development of deep learning methods, and deep learning models are increasingly applied to power battery states estimation, where they show advantages in accuracy and robustness. Third, hybrid models estimate the states of power batteries more accurately and reliably by combining the characteristics of different types of models, which is an important development trend of battery states estimation methods. Higher accuracy, real-time performance, and robustness are the development goals of power battery states estimation methods.


Introduction
The development of automobile electrification has effectively alleviated the oil energy crisis and the environmental pollution caused by fuel vehicles. Driven by both market demand and policy support, the application scale of electric buses has increased rapidly. As the core energy storage component of electric vehicles, the power battery and its states affect the driving safety performance of the vehicle [1]. Abnormal battery states often lead to vehicle fire incidents. Because their routes are long, electric buses need to be equipped with more power battery modules to meet daily operational needs. At the same time, because electric buses transport more passengers, higher safety is required of the power batteries and the entire vehicle. Battery states refer to the working state of a battery during its service, mainly including state of charge (SOC) [2], state of health (SOH) [3], remaining useful life (RUL) [4], state of power (SOP) [5], state of energy (SOE) [6], state of safety (SOS) [7], etc. Accurate estimation of battery states, early detection of abnormalities, and timely warning and disposal are of great significance for ensuring the safe and reliable use of the battery, prolonging its service life, improving its aging performance, and ensuring the safe use of electric buses.
The main goal of battery management is to estimate the power battery states accurately, providing guidance for the safe use, maintenance, replacement, and retirement of batteries. To achieve this goal, researchers have conducted a large amount of effective research on battery states estimation methods and achieved fruitful results.
Researchers have designed different workflows to estimate the battery states [8,9]. The basic workflow usually includes four steps: data collection and preprocessing, feature engineering, battery model construction, and application, as shown in Figure 1. The first step, data collection and preprocessing, obtains the battery states data with the data acquisition system and then performs preprocessing such as data cleaning, data filtering, and regularization. The second step, feature engineering, selects and extracts the features most closely related to the battery states, which reduces the data dimension and avoids excessive data redundancy; the main methods include principal component analysis, correlation coefficient analysis, and cosine similarity analysis. The third step, which is the core of the workflow, constructs an estimation model that establishes a mapping relationship between input data and output data; the commonly used models include machine learning models, deep learning models, and hybrid models. The final step applies the results of battery states estimation, for example to issue abnormal state warnings and to trigger active intervention and disposal procedures.
When estimating the battery states, the data source has a significant impact on the estimation accuracy, and high-quality data are one of the key factors in data-driven battery states estimation. There are three common types of data sources: test data, simulation data, and natural driving data. Specific data include voltage, current, temperature [10–12], and CAN data [13]. For example, Deng et al. [10] estimated the battery states by comprehensively capturing time series characteristic data of voltage and current. With the development of vehicle sensor technology, 5G communication, and vehicle networking technology, a massive amount of battery states data under natural driving conditions has been recorded and stored, providing important input for data-driven battery states estimation.
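The preprocessing and feature engineering steps of the workflow described above can be illustrated with a small sketch. It assumes min-max scaling as the preprocessing step and the Pearson correlation coefficient as the correlation analysis; the function names, the 0.8 threshold, and the toy samples are hypothetical and are not taken from the cited studies.

```python
import math

def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no linear information
    return cov / math.sqrt(var_x * var_y)

def select_features(features, target, threshold=0.8):
    """Keep features whose |correlation| with the target reaches the threshold."""
    selected = {}
    for name, column in features.items():
        r = pearson(min_max_normalize(column), target)
        if abs(r) >= threshold:
            selected[name] = r
    return selected

# Hypothetical toy samples, not measured data.
features = {
    "voltage_V": [3.2, 3.4, 3.6, 3.8, 4.0],
    "current_A": [10.0, -5.0, 8.0, -7.0, 9.0],
    "temperature_C": [25.0, 25.5, 25.2, 25.8, 25.1],
}
soc = [0.1, 0.3, 0.5, 0.7, 0.9]
selected = select_features(features, soc)
print(selected)
```

In this toy case only the voltage column survives the threshold, because it varies linearly with the SOC values. Real datasets would need a threshold tuned to the task, and possibly nonlinear measures of dependence.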
Another key factor is the construction of the battery states estimation model. Modeling methods fall into two main categories: analytical model-based methods and data-driven model-based methods. Analytical models mainly include electrochemical models and equivalent circuit models. The electrochemical model is based on the internal structure of the battery and simulates the complex chemical reaction mechanism inside the battery. The equivalent circuit model characterizes the battery by analyzing its electrical characteristics during operation and reproducing them with circuit components; it has a simple modeling structure and low computational cost [14,15]. The key models include the Rint model, Thevenin model, PNGV model, GNL model, etc. [16]. The problem with analytical methods is that detailed physical structure parameters of the model must be obtained, and these parameters are highly nonlinear and strongly coupled, which results in low estimation accuracy that is difficult to improve significantly. A data-driven model treats the battery as a black box: it analyzes hidden information and evolution rules in the external feature parameters of the battery and estimates the battery states by mining this hidden feature information from a large dataset. Data-driven models are trained end to end on data and are characterized by a simple modeling process, high estimation accuracy, and strong generalization ability. However, they need a large amount of data for training [17]. In recent years, with the development of internet of vehicles technology, a large amount of battery data has been recorded and stored, laying the data foundation for data-driven models, which are increasingly applied to power battery states estimation. This paper focuses on the research progress of data-driven models.
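To make the equivalent circuit idea concrete, the following sketch simulates a first-order Thevenin model, which is commonly drawn as an ohmic resistance in series with one parallel RC pair. It is only an illustration under stated assumptions: the open-circuit voltage is held constant, discharge current is taken as positive, and all parameter values are hypothetical rather than identified from a real battery.

```python
import math

def simulate_thevenin(currents, dt, ocv, r0, r1, c1):
    """Terminal voltage of a first-order Thevenin circuit for a current profile.

    currents: sampled current in A (discharge positive, an assumed convention)
    dt:       sampling period in s
    ocv:      open-circuit voltage in V (held constant here for simplicity)
    r0:       ohmic resistance in ohm
    r1, c1:   polarization resistance (ohm) and capacitance (F) of the RC pair
    """
    tau = r1 * c1                      # time constant of the RC pair
    decay = math.exp(-dt / tau)
    u1 = 0.0                           # polarization voltage, battery starts relaxed
    voltages = []
    for i in currents:
        voltages.append(ocv - i * r0 - u1)
        # Exact discretization of du1/dt = -u1/(r1*c1) + i/c1 for constant i over dt.
        u1 = decay * u1 + r1 * (1.0 - decay) * i
    return voltages

# Hypothetical parameters: 10 A constant discharge for 600 s.
voltages = simulate_thevenin([10.0] * 600, dt=1.0, ocv=3.7, r0=0.01, r1=0.02, c1=1000.0)
print(round(voltages[0], 3), round(voltages[-1], 3))
```

The first sample shows only the ohmic drop, and the voltage then relaxes toward `ocv - i * (r0 + r1)` as the RC pair charges. In practice the open-circuit voltage depends on SOC and the parameters vary with temperature and aging, which is the parameter identification burden noted above.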
Data-driven models include classical machine learning models, deep learning models, and hybrid models. Early battery states estimation research mainly used classical machine learning models; common models include artificial neural networks (ANN) [18,19], support vector machine (SVM) [20,21], Gaussian process regression (GPR) [22,23], hidden Markov model (HMM) [24,25], random forest (RF) [26,27], fuzzy control [28,29], autoregressive (AR) models [30,31], relevance vector machine (RVM) [32,33], etc. Although classical machine learning models can estimate battery states from a small number of data samples, the estimation quality relies on expert experience to extract features manually, and the estimation accuracy is greatly affected by the selected features.
With the development of neural network technology, some scholars have begun to explore deep learning methods, such as convolutional neural networks (CNN) [34–36], recurrent neural networks (RNN) [37,38], and other models, to further improve the accuracy and robustness of battery states estimation. Deep learning models achieve high-level abstract representation and modeling of data by constructing a deep graph composed of multiple processing layers with linear and nonlinear transformations. Compared with machine learning methods, deep learning methods can automatically extract features of different depths from massive data, achieving end-to-end learning. They are insensitive to data noise, easy to understand, and portable, and they can achieve higher estimation accuracy when the data sample size is sufficient. Their shortcoming is the need for larger data samples and more training time.
To further improve the accuracy and robustness of models and reduce training time, some scholars have combined the characteristics of different models to build hybrid models for battery states estimation [39–41]. For example, Song combined the feature extraction capability of CNN with the time series prediction capability of RNN to build a hybrid CNN-LSTM model for estimating battery SOC: the CNN extracts advanced spatial features from the original data, and the LSTM captures the nonlinear relationship between SOC and measurable data such as current, voltage, and temperature. The hybrid model has better tracking performance than a single LSTM or CNN model; the maximum average error of SOC estimation is less than 1.5%, and the maximum root mean square error is less than 2%. The hybrid model has good application prospects in the field of health estimation of batteries.
Researchers have summarized the research progress of power battery states estimation technology. For example, Toughzaoui et al. [42] summarized the research status of battery health status estimation and remaining life estimation, and Manoharan et al. [43] summarized battery states estimation technology based on traditional machine learning models. Analysis of the existing review literature shows that it mainly covers some battery states based on machine learning models and lacks an analysis of battery states estimation technology based on deep learning models and hybrid models. The main purpose of this paper is to analyze the latest achievements in data-driven power battery states estimation, summarize the main data sources and characteristics of battery states, and compare the mainstream data-driven models and their advantages and disadvantages. In addition, the basic process for estimating the health status of power batteries is proposed, and the future development trends of data sources and data-driven models for estimating the battery states are discussed.
The main contributions of this paper include: (1) Analyzing the main data sources and their characteristics used to estimate battery states, providing guidance for subsequent data collection and application of power batteries; (2) Summarizing the construction methods of data-driven battery states estimation models, providing support for further research on model construction methods; (3) Analyzing the development trend of battery states estimation technology to provide reference for future research on estimation methods.
The structure of this paper is as follows. The second part analyzes the data sources and their characteristics. The third part discusses the models and characteristics of data-driven states estimation of power batteries. Finally, the development trend is analyzed and conclusions are presented.

Data for Battery States Estimation
There are many data sources used to estimate the battery states. According to the data acquisition method, the main data sources are divided into three categories: test data, simulation data, and natural driving data, as shown in Figure 2.

Test Data
Some researchers use experimental data to estimate the states of batteries [44,45]. These data can be further divided into the following categories: electrical data such as voltage, current, resistance, capacitance, and SOC; thermal data such as temperature; and other data such as gas composition and pressure inside the battery box. Some of these data can be measured directly with testing equipment; temperature, for example, can be measured by single-point, multi-point, infrared imaging, or ultrasound methods. Other data, such as the resistance of power batteries, must be obtained through indirect estimation. Test data are authentic and objective, but their collection requires professional equipment.

Simulation Data
Some researchers use simulation data to estimate the battery states [46,47]. For example, Sakile simulated the battery model in the MATLAB/Simulink environment and used simulation data to predict the SOC and RUL of the battery. Simulation can flexibly generate data under various operating conditions, working environments, and meteorological conditions. Its main drawback is a limited ability to reproduce and capture real operating conditions, working environments, and meteorological conditions, so simulation data may not be consistent with data from real operation.

Natural Driving Data
With the development of on-board sensors, internet of vehicles, cloud platforms, and other technologies, a large amount of natural driving data is recorded and stored. Natural driving data are easy to access, objective and realistic, large in volume, and rich in information. They include not only battery states data but also actual vehicle scene data and environmental meteorological data [48]. Wang used real driving data from two electric buses to predict the battery temperature during the charging phase of an electric bus [49]. With the development of artificial intelligence and big data technology, natural driving data may become the main source of data for estimating the status of batteries in engineering practice.
In summary, the three types of data sources used to estimate the battery states have different characteristics, and the advantages and disadvantages of each type of data source are analyzed in Table 1.

Table 1. The characteristics of data sources.

| Data Source | Strengths | Weaknesses |
| --- | --- | --- |
| Test Data | The data are objective and truthful, and extreme working condition data can be collected | Special testing equipment is required, and the testing process requires a large amount of manpower and material resources, with a long acquisition cycle and high requirements for the testing environment |
| Simulation Data | Easy and convenient to obtain, with good repeatability and no environmental constraints | Data quality is greatly affected by model accuracy |
| Natural Driving Data | The data are real and objective, easy to collect, and cover a wide range of scene conditions | High data interference noise; extreme scenario data are difficult to obtain |

Data-Driven Model for Battery States Estimation
According to the characteristics of the model, the data-driven battery states estimation models can be divided into three categories: machine learning models, deep learning models, and hybrid models, as shown in Figure 3.

Machine Learning Model
Machine learning refers to learning general rules from limited observation data and utilizing these rules to predict and analyze unknown data. The model needs to first represent the data as a set of features and then input these features into the prediction classifier to predict the output results. Its feature representation mainly relies on manual experience or feature transformation methods for extraction, and the extracted features have a significant impact on the recognition accuracy of the model. This paper focuses on analyzing the progress of estimating the battery states using ANN, SVM, and GPR models.
(1) ANN model
ANN is an information processing system that imitates the structure and function of brain neural networks. Artificial neural networks have self-learning, self-organizing, adaptive, and strong nonlinear function approximation capabilities as well as strong fault tolerance, which makes them suitable for complex nonlinear modeling problems with multiple related features. The basic structure of ANN includes three layers: input layer, hidden layer, and output layer, as shown in Figure 4. ANN models have significant advantages in estimation accuracy and robustness, and they have been widely applied in the field of battery states estimation. Researchers have conducted extensive research on building ANN models to estimate battery states [50–55]. For example, Wang et al. [50] used ANN models to estimate the temperature changes of lithium-ion batteries. They established temperature prediction models using backpropagation neural networks (BP-NN), radial basis function neural networks (RBF-NN), and Elman neural networks (Elman-NN), and compared the temperature prediction performance of the different neural network modeling techniques. The MSE and MAE values did not exceed 0.3, and the Elman-NN model showed good adaptability and generalization ability as well as fast convergence. Bezha et al. [51] proposed an ANN-based method for estimating the internal impedance parameters of lithium-ion batteries, which achieves accurate estimation of the actual state of the battery within 30 s, with a maximum error of less than 3%, and has good universality. Hussein et al. [52] used the ANN model to estimate the SOC of electric vehicle power batteries with an error of less than 3%. Jaliliantabar et al. [54] constructed an ANN model to predict the SOT of lithium-ion batteries, with a mean absolute percentage error (MAPE) of about 0.331.
Overall, the ANN model performs well in battery state prediction: the calculation is fast and convenient, the prediction result is relatively accurate, and the model is suitable for all kinds of batteries. ANN also has weaknesses. The estimation accuracy is greatly affected by the training sample data; a large amount of data is an important prerequisite for accurate estimates, and the prediction ability on small samples is poor. The parameters of ANN are complex, and training easily falls into a local optimum of the parameters, resulting in overfitting. Because there is no clear method for selecting the network structure, prior experience or a comparison of multiple models is required to determine the final network structure.
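The three-layer structure described above (input layer, hidden layer, output layer) can be sketched as a single forward pass. This is an untrained illustration only: the choice of three normalized inputs (voltage, current, temperature), four hidden neurons, sigmoid activations, and random weights is an assumption made for the example and does not reproduce any of the cited models.

```python
import math
import random

def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, hidden, output):
    """Input layer -> hidden layer -> single output neuron."""
    h = dense(x, *hidden, activation=sigmoid)
    return dense(h, *output, activation=sigmoid)[0]

rng = random.Random(0)                      # fixed seed for repeatability
hidden = init_layer(n_in=3, n_out=4, rng=rng)
output = init_layer(n_in=4, n_out=1, rng=rng)

# Hypothetical normalized inputs: voltage, current, temperature.
sample = [0.62, 0.35, 0.48]
soc_estimate = forward(sample, hidden, output)
print(soc_estimate)
```

The sigmoid on the output keeps the estimate between 0 and 1, which is convenient when the target is an SOC fraction. Training the weights (for example by backpropagation, as in the BP-NN mentioned above) is deliberately left out of this sketch.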
(2) SVM model
The SVM model is a learning machine based on statistical learning theory. Through a kernel function, it maps nonlinear problems in low-dimensional space to linear problems in high-dimensional space to model complex nonlinear systems, and it finds an appropriate hyperplane to classify the data accurately. Its principle is shown in Figure 5. Unlike ANN, the SVM method has stricter mathematical proof, lower computational complexity, and faster convergence speed, and it can effectively prevent the local-optimum problem in parameter optimization. SVM is not sensitive to the dimensions and variability of data and is suitable for classification and regression of complex small sample data. Additionally, it has strong generalization ability and high estimation accuracy [56]. Some researchers use SVM to estimate the states of batteries. For example, Deng et al. [57] applied SVM to diagnose the fault state of electric vehicle power batteries, with an accuracy of over 89%; Wang et al. [58] used an SVM method to model the nonlinear dynamic characteristics of batteries based on a small number of experimental data samples, with an estimated maximum relative error of 3.61%; Chen et al. [59] constructed an SVM model to predict SOH online using charging data, achieving an error of less than 2%; Li et al. [60] proposed a method to indirectly estimate the RUL using an SVM model.
Compared with ANN methods, the SVM model has higher accuracy and shorter computational time, with a maximum error of 5% in battery states estimation. Some researchers have also used a combination of SVM and other methods to estimate battery states [61][62][63][64][65].
However, the SVM model still has some shortcomings in practical applications: feature vectors are difficult to measure and calculate, kernel parameters are difficult to select, and the model is highly dependent on cross-training and regularization methods and is sensitive to missing data during feature vector selection or training.
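The mapping from a low-dimensional space to a high-dimensional space through a kernel function, described above, can be checked numerically on a small case. The sketch below uses the degree-2 polynomial kernel k(x, y) = (x · y)^2 on 2-D inputs, for which the explicit 3-D feature map is known, and also defines an RBF kernel for comparison. The function names, the `gamma` value, and the sample points are illustrative assumptions; this shows the kernel idea only and is not an SVM solver.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def poly_feature_map(x):
    """Explicit map from 2-D to 3-D whose inner product equals (x . y)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def poly_kernel(x, y):
    """Degree-2 polynomial kernel evaluated in the original 2-D space."""
    return dot(x, y) ** 2

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

a, b = [1.0, 2.0], [3.0, -1.0]
explicit = dot(poly_feature_map(a), poly_feature_map(b))  # computed in 3-D
implicit = poly_kernel(a, b)                              # computed in 2-D
print(explicit, implicit, rbf_kernel(a, b))
```

Both routes give the same value, so the kernel evaluates the high-dimensional inner product without ever constructing the mapped vectors. The difficulty of choosing kernel parameters such as `gamma` is the practical shortcoming noted above.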
(3) GPR model
The GPR model is a universal and analytically tractable non-parametric probability model that places a Gaussian process prior on the data to conduct regression analysis. In theory, it can achieve universal approximation of any continuous function in compact space, and it has been applied in the fields of time series analysis, image processing, and automatic control. GPR has the advantage of low computational complexity in solving high-complexity problems. Many researchers have constructed GPR battery states estimation models to solve complex nonlinear problems during the process of electric state changes [22,23,66–77]. Wang et al. [22] established a battery states estimation model based on GPR, which has prominent accuracy and robustness, with a maximum relative error within 2%. Liu et al. [23] established a data-driven GPR model to predict the battery SOH and achieved high accuracy with a small sample size as input; except for some individual points with estimation errors greater than 3%, most errors are less than 1.5%. Zhou et al. [66] designed a cyclic GPR model with a delayed feedback loop to estimate the status of batteries, which has high accuracy and robustness, with an estimation error of 1.12%. Yang et al. [67] proposed a GPR model based on the charging curve to estimate SOH; the model has good robustness and reliability, and the estimation error of SOH is mostly less than 2%. Wang et al. [68] proposed a data-driven integrated GPR model to estimate SOH by comparing and analyzing the influence of different mean and kernel functions on the estimation accuracy of the GPR model, achieving a mean absolute error (MAE) and root mean square error (RMSE) of 1.7% and 2.41%, respectively. Pang et al. [69] proposed a GPR model for estimating battery RUL, which has high estimation accuracy and achieves an RMSE of less than 0.04.
In summary, GPR has the advantages of high prediction accuracy and of providing probability density prediction results. It has two main shortcomings. First, due to the inherent structure of the GPR model, the computational complexity is high when analyzing large amounts of data. Second, the GPR model has many hyper-parameters, and tuning them during training is tedious.
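The probabilistic output of GPR can be illustrated with a minimal one-dimensional sketch that returns a posterior mean and variance. It assumes an RBF kernel with fixed hyper-parameters, a constant mean equal to the sample mean, and a small noise term; the toy SOH values and the cycle axis (in hypothetical units of hundreds of cycles) are invented for illustration. The linear solver is plain Gaussian elimination, chosen only to keep the example free of dependencies.

```python
import math

def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def gpr_predict(x_train, y_train, x_new, noise=1e-4):
    """Posterior mean and variance at x_new for a GP with a constant mean."""
    n = len(x_train)
    y_mean = sum(y_train) / n
    centered = [y - y_mean for y in y_train]
    k = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x_new, xi) for xi in x_train]
    alpha = solve(k, centered)   # (K + noise * I)^-1 (y - mean)
    v = solve(k, k_star)         # (K + noise * I)^-1 k_star
    mean = y_mean + sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(var, 0.0)   # clamp tiny negative round-off

# Hypothetical toy data: SOH against cycle count in hundreds of cycles.
cycles = [0.0, 1.0, 2.0, 3.0, 4.0]
soh = [1.00, 0.98, 0.95, 0.93, 0.90]
mean_mid, var_mid = gpr_predict(cycles, soh, 2.0)    # at a training point
mean_far, var_far = gpr_predict(cycles, soh, 50.0)   # far from the data
print(mean_mid, var_mid, mean_far, var_far)
```

Near the training data the predicted variance is small, and far from the data it returns to the prior variance, which is the uncertainty information that point estimators do not provide. Solving the n × n kernel system costs on the order of n³ operations, which reflects the complexity problem with large datasets noted above; fitting the kernel hyper-parameters is omitted here.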
In addition, some scholars have utilized other machine learning models, such as particle filtering [78], Wavelet [79], Extra tree [80], Gradient boosting method [81], Linear compression [82], KNN [83], etc., to estimate battery states and have achieved certain results. A summary of machine learning models and their corresponding advantages and disadvantages is shown in Table 2.
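The error measures quoted in this subsection (MSE, MAE, RMSE, and MAPE) follow standard definitions, sketched below for reference. The sample SOC values are hypothetical, and MAPE is returned here in percent; the cited studies do not all state whether their figures are fractions or percentages, so the numbers they report should be compared with that in mind.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data."""
    return math.sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    ratios = [abs((p - a) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical reference and estimated SOC values.
actual = [0.80, 0.60, 0.40, 0.20]
predicted = [0.82, 0.57, 0.41, 0.20]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so a model with occasional outliers can look acceptable on MAE and poor on RMSE. MAPE is undefined when the reference value is zero, which matters for states such as SOC near full discharge.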

Deep Learning Model
The concept of deep learning originates from research on artificial neural networks, which stack multiple hidden layers and process data layer by layer, transforming low-level feature representations that are only weakly related to the target into higher-order abstract features that are more closely related to it, in order to discover distributed feature representations of the data. In recent years, deep learning has gradually been applied to battery states estimation and has achieved good results in mapping battery data to state-specific features. Common deep learning models include CNN, RNN, and MLP [84,85]. This paper focuses on research that constructs battery states estimation models using CNN and RNN.
(1) CNN model The CNN model is a deep feed-forward neural network that includes convolution operations and is inspired by the mechanism of the biological receptive field. It is composed of multiple layers, including an input layer, convolution layers, pooling layers, a fully connected layer, and an output layer. The basic structure is shown in Figure 6. Compared with fully connected neural networks, CNN automatically extracts salient features from the input by replacing fully connected layers with convolution and pooling layers and applying multiple convolution and pooling operations. The output is then produced by a statistical method or a classifier after the fully connected layer, completing the nonlinear mapping from input to output. The CNN model uses sparse connections and parameter sharing to reduce model complexity, significantly reducing the number of network weights, and has advantages such as automatic feature extraction, resistance to noise interference, and end-to-end learning. It is widely used in machine vision, state diagnosis, and other fields.
CNN models have been applied to battery states estimation with many research results [34–36,86–90]. For example, Wei et al. [34] constructed a CNN model and trained it on a public battery dataset to predict the remaining life of lithium-ion batteries. The life prediction results are superior to those of other existing methods. Qian et al. [35] designed a 1D-CNN architecture to estimate the battery SOC using random segments of the charging curve as inputs. The model has good robustness and accuracy. Lu et al. [36] proposed a CNN model that estimates battery SOC from partial voltage data during battery discharge. It can accurately estimate battery SOC with limited voltage data, and the MAPE is about 0.55%. Chemali et al. [86] proposed using a CNN model driven by partial charging data to estimate battery SOH with an average error of less than 0.8%. Sohn et al. [88] constructed a CNN model to extract features that reflect the dynamic changes in battery performance and accurately predict the battery SOC. Shen et al. [89] constructed a deep convolutional neural network (DCNN) that can accurately estimate battery SOC from measurement data during charging, with higher accuracy and robustness.
Some researchers use the feature extraction ability of CNN and combine it with other machine learning methods for power battery state estimation [91,92]. For example, Yang et al. [91] estimated the battery SOH by building a hybrid CNN and random forest model, which improved the estimation accuracy and robustness compared with the single CNN model. In addition, some researchers have studied battery health status with adjusted and improved forms of well-known CNN architectures, such as LeNet [93], AlexNet [94], VGGNet [95], ResNet [96], and DenseNet [97].
In summary, CNN models have high recognition accuracy and good robustness and have been applied successfully in the field of battery states estimation. However, as a feed-forward neural network with unidirectional connections from the input at the bottom to the output at the top, a CNN has the disadvantages that the input samples are independent of each other, the output dimension is relatively fixed, and the output depends only on the current input. At present, CNNs tend to have smaller convolutional kernels, deeper network structures, and fewer pooling layers, and are gradually developing towards fully convolutional networks.
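The sparse connectivity and parameter sharing described for the CNN model can be made concrete with a small one-dimensional example. This is a minimal sketch, not an implementation from any cited work: the helper names (`conv1d`, `max_pool1d`), the slope-detecting kernel, and the voltage values are hypothetical.

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation: the same kernel weights are reused
    at every position (parameter sharing), and each output sees only a
    short window of the input (sparse connectivity)."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

def max_pool1d(features, size):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    usable = len(features) - len(features) % size
    return features[:usable].reshape(-1, size).max(axis=1)

# Hypothetical segment of a charging voltage curve, in volts.
voltage = np.array([3.0, 3.1, 3.3, 3.6, 3.7, 3.7, 3.8])
kernel = np.array([-1.0, 0.0, 1.0])     # responds to the local voltage rise

features = np.maximum(conv1d(voltage, kernel), 0.0)   # convolution + ReLU
pooled = max_pool1d(features, size=2)

# A dense layer mapping the same 7 inputs to 5 outputs would need 7 * 5
# weights, whereas the convolution uses only len(kernel) weights.
dense_weights = len(voltage) * len(features)
conv_weights = len(kernel)
print(features, pooled, dense_weights, conv_weights)
```

In a trained network the kernel weights are learned rather than fixed by hand, and many kernels are applied in parallel, but the weight count per kernel stays independent of the input length.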
(2) RNN model The RNN model is a type of neural network that takes sequence data as input and achieves short-term memory through recurrent units applied along the direction in which the sequence evolves. Compared with CNN, the neurons of an RNN accept not only information from other neurons but also their own information. Through this feedback mechanism, important information in the network can be retained and updated for a certain period of time, which is a significant advantage in modeling time series problems. An RNN has a loop structure that connects all nodes in a chain. The simplified model is shown in Figure 7. RNN has the characteristics of memory and parameter sharing, and it can theoretically approximate any nonlinear dynamic system. It has certain advantages in learning the nonlinear characteristics of sequences and has been widely used in speech recognition, machine translation, and other tasks. Some scholars have used the strong ability of RNN to map high-dimensional and strongly nonlinear data to estimate battery states, with good results [8,38,39,98–100]. For example, Catelani et al. [8] used an RNN model to estimate the RUL of lithium-ion batteries with good accuracy. Feng et al. [38] constructed an RNN framework for estimating battery SOC, which showed good estimation performance with an RMSE of less than 1.29%. Hsieh et al. [39] predicted the discharge state of batteries by building an RNN model framework, with an error rate of less than 2%.
The RNN model can only learn information that is close in time, making it difficult to apply to sequence data that require long-term dependence. When the input sequence is relatively long, vanishing and exploding gradients during training cause the long-range dependence problem. The most effective way to address this issue is to introduce a gating mechanism, as in the LSTM unit and the GRU unit.
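The long-range dependence problem can be seen in a deliberately simplified setting. The sketch below assumes a scalar recurrence with a single recurrent weight `w`, which is an illustrative reduction and not a model used in the cited studies; the function name and parameter values are hypothetical. The gradient of the final state with respect to the initial state is a product of one factor per time step, so it shrinks or grows exponentially with sequence length.

```python
import numpy as np

def gradient_through_time(w, steps, h0=0.5, linear=False):
    """Return d h_T / d h_0 for the scalar recurrence
    h_t = tanh(w * h_{t-1}), or h_t = w * h_{t-1} if linear=True."""
    h, grad = h0, 1.0
    for _ in range(steps):
        if linear:
            h = w * h
            grad *= w                       # each step multiplies by w
        else:
            h = np.tanh(w * h)
            grad *= w * (1.0 - h**2)        # chain rule through tanh
    return grad

# With |w| < 1 the product of per-step factors vanishes.
print("vanishing:", gradient_through_time(0.5, 50))
# With |w| > 1 and no saturating nonlinearity it explodes.
print("exploding:", gradient_through_time(1.5, 50, linear=True))
```

In both cases the early inputs have either almost no influence or an unstable influence on the training signal, which is why plain RNNs mainly capture information that is close in time.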
The LSTM unit introduces a specialized memory cell and controls the input, retention, and output of information through an input gate, a forget gate, and an output gate. The basic structure is shown in Figure 8. Some researchers have used LSTM to estimate battery states [101–108]. Yang et al. [101] established an LSTM model to predict the battery SOH with an error of less than 3%, and the model has better accuracy and stability. Zhang et al. [102] built an LSTM model that can accurately estimate the SOC and RUL of lithium-ion batteries. Park et al. [103] proposed an LSTM model to estimate battery RUL; the MAPE of the model reached 0.47–1.88%.
Unlike LSTM, the GRU unit simplifies the gates by merging the input gate and the forget gate into an update gate, and it addresses the long-range dependency problem by using a reset gate to control the balance between new input and retained memory. The basic structure is shown in Figure 9. Researchers have applied GRU to estimate battery states [109–114]. For example, Yang et al. [109] used a GRU model to estimate the battery SOC from current, voltage, and temperature data, with a maximum root mean square error of 3.5%. The model has good robustness. Guo et al. [110] used a GRU model to predict the RUL of lithium batteries under different charging strategies and obtained accurate predictions in each case. The root mean square error of prediction can be kept within 1%, and the prediction response is very fast. Lyu et al. [111] estimated the battery SOC based on a GRU model, noting that GRU outperforms LSTM and RNN in network performance and estimation accuracy.
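The gating mechanisms described above can be written out as single-step update functions. This is a minimal sketch using the commonly used formulations of the LSTM and GRU cells; the stacked weight layout, the variable names, and the random untrained weights are assumptions made for illustration, and some references swap the roles of the update gate and its complement in the GRU.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4H, D + H) and b has shape (4H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])              # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])         # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])     # output gate: how much memory to expose
    g = np.tanh(z[3 * H:])          # candidate memory content
    c = f * c_prev + i * g          # additive memory update
    h = o * np.tanh(c)
    return h, c

def gru_step(x, h_prev, W_zr, W_h, b_zr, b_h):
    """One GRU step. W_zr has shape (2H, D + H); W_h has shape (H, D + H)."""
    H = h_prev.shape[0]
    zr = sigmoid(W_zr @ np.concatenate([x, h_prev]) + b_zr)
    z, r = zr[:H], zr[H:]           # update gate and reset gate
    h_cand = np.tanh(W_h @ np.concatenate([x, r * h_prev]) + b_h)
    return (1.0 - z) * h_prev + z * h_cand

# Hypothetical input: 5 time steps of (current, voltage, temperature).
rng = np.random.default_rng(0)
D, H = 3, 4
sequence = rng.normal(size=(5, D))
W = rng.normal(scale=0.1, size=(4 * H, D + H))
W_zr = rng.normal(scale=0.1, size=(2 * H, D + H))
W_h = rng.normal(scale=0.1, size=(H, D + H))

h_lstm, c_lstm, h_gru = np.zeros(H), np.zeros(H), np.zeros(H)
for x in sequence:
    h_lstm, c_lstm = lstm_step(x, h_lstm, c_lstm, W, np.zeros(4 * H))
    h_gru = gru_step(x, h_gru, W_zr, W_h, np.zeros(2 * H), np.zeros(H))
print(h_lstm, h_gru)
```

The additive form of the memory update (`c = f * c_prev + i * g`) is what lets information persist over many steps when the forget gate stays close to one. The GRU achieves a similar effect with one fewer gate and no separate memory cell, which reduces the number of parameters.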

The above are only some typical application cases, and there are many other successful applications of RNN in building health estimation models for power batteries [126–128], which will not be repeated here. It can be seen that RNN and its improved models can be used for estimating battery states, are suitable for learning from high-dimensional and large data samples, and can extract deep spatiotemporal features. The error of battery states estimation is relatively low. However, CNN requires a large amount of sample data and faces difficulties in obtaining sample data. In summary, models based on deep learning have achieved significant results in battery states estimation. The application fields and corresponding characteristics are summarized in Table 3.
Table 3. Application and characteristics of CNN model.

| Battery States | Application | Strength | Weakness |
| --- | --- | --- | --- |
| SOH [86,87,91,101,102,104–106,114,116,117,122,125] | Used for battery state of health estimation | Suitable for high-dimensional big data samples, reducing the number of parameters through weight sharing and aggregation, automatically extracting features and integrating them with classifiers to achieve end-to-end learning, with high accuracy and resistance to noise interference. | Due to the constraints of the model structure, the input and output dimensions cannot be arbitrarily changed, and the sample length is required to be fixed. It is difficult to process time series data, and overfitting occurs easily when the data sample size is large. |

Hybrid Model
A hybrid model refers to a high-precision model constructed by combining the different characteristics of several models. In recent years, hybrid models have attracted the attention of researchers, some of whom have used hybrid models to estimate battery states and achieved good results [39–41,48,129–142]. For example, Song et al. [39] built a hybrid CNN-LSTM model to estimate the battery SOC by combining the feature extraction capability of CNN with the time series prediction capability of RNN: CNN extracted advanced spatial features from the original data, and LSTM captured the nonlinear relationship between SOC and measurable data such as current, voltage, and temperature. The hybrid model performed better than the single LSTM or CNN model, with a maximum average error and a maximum root mean square error of SOC estimation of less than 1.5% and 2%, respectively. Xu et al. [40] added skip connections to the hybrid CNN-LSTM model (as shown in Figure 10), which solved the problem of neural network degradation caused by multi-layer LSTM, significantly improved the SOH estimation accuracy, and reduced the computational cost of the model. RMSE was below 0.004 on both the NASA and Oxford battery datasets. Ren et al. [41] estimated the battery RUL using a hybrid CNN-LSTM model; the accuracy and RMSE of the hybrid model were 94.97% and 5.03%, respectively, much better than those of the SVM model (81.77% and 18.23%, respectively), and the model has good generalization ability and robustness. Chen et al. [129] proposed a hybrid model, ELM-BSASVM, composed of an extreme learning machine and an SVM model to predict the battery RUL. The hybrid model has good robustness, and its RMSE is 79.68% better than that of the SVM model. Zhao et al. [130] and Liu et al. [131] explored a hybrid CNN and GRU model for battery SOH estimation and validated the model by reconstructing feature series samples on the Oxford battery dataset. The average values of RMSE and MAE reached 0.582% and 0.524%, respectively. Wang et al. [48] proposed a hybrid CNN and LSTM model for predicting short-term future battery temperature changes during the charging phase of electric buses. The model not only has excellent accuracy and robustness but also reduces time and space costs. Mei et al. [132] estimated the battery state of energy (SOE) by building a hybrid model framework consisting of LSTM and CNN, with an estimation error of less than 3%. Zhang et al. [133] proposed a hybrid model, IPSO-CNN-ILSTM, for estimating the battery RUL. The mapping between fusion features and RUL was established through CNN and an improved LSTM, and improved particle swarm optimization (IPSO) was used to optimize the weights and learning factor parameters of the mapping network, achieving a battery life estimation MAPE of about 0.7%, which is 5.0% better than that of the CNN-LSTM model. Zraibi et al. [134] estimated the battery RUL by constructing a hybrid CNN-LSTM-DNN model that combines the advantages of CNN, LSTM, and DNN.
The error and robustness of the hybrid model estimation are better than those of the single models. Yanwen et al. [135] used CNN to extract health status characteristics, mined time series features with long short-term memory (LSTM), and combined these with a GRU cell to build a hybrid model, which significantly improved the accuracy of battery SOH estimation. In summary, hybrid models allow component models to be chosen according to target requirements and data resources, and learning networks of different depths and widths to be designed, yielding higher accuracy and stronger robustness.
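The division of labor in the CNN and recurrent hybrids discussed above (convolution for local feature extraction, a gated recurrent layer for temporal dependence, and a final regression layer) can be sketched as a forward pass. This is an untrained illustration under stated assumptions: it uses a GRU rather than an LSTM for brevity, all weights are random, and the function names, layer sizes, and input window are hypothetical rather than taken from any cited architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_features(seq, kernels):
    """seq: (T, C) samples; kernels: (F, K, C).
    Returns ReLU feature maps of shape (T - K + 1, F)."""
    F, K, _ = kernels.shape
    T = seq.shape[0]
    out = np.empty((T - K + 1, F))
    for t in range(T - K + 1):
        window = seq[t:t + K]                              # (K, C)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)

def gru_last_state(features, W_zr, W_h, H):
    """Run a bias-free GRU over the feature sequence; return the final state."""
    h = np.zeros(H)
    for x in features:
        zr = sigmoid(W_zr @ np.concatenate([x, h]))
        z, r = zr[:H], zr[H:]
        h_cand = np.tanh(W_h @ np.concatenate([x, r * h]))
        h = (1.0 - z) * h + z * h_cand
    return h

def hybrid_soc(seq, params):
    """CNN feature extractor -> GRU -> sigmoid output in (0, 1)."""
    feats = conv_features(seq, params["kernels"])
    h = gru_last_state(feats, params["W_zr"], params["W_h"], params["H"])
    return float(sigmoid(params["w_out"] @ h + params["b_out"]))

# Hypothetical window: 20 samples of (current, voltage, temperature).
rng = np.random.default_rng(0)
T, C, F, K, H = 20, 3, 4, 3, 5
window = rng.normal(size=(T, C))
params = {
    "kernels": rng.normal(scale=0.3, size=(F, K, C)),
    "W_zr": rng.normal(scale=0.3, size=(2 * H, F + H)),
    "W_h": rng.normal(scale=0.3, size=(H, F + H)),
    "H": H,
    "w_out": rng.normal(scale=0.3, size=H),
    "b_out": 0.0,
}
soc_estimate = hybrid_soc(window, params)   # meaningless until trained
print(soc_estimate)
```

With untrained weights the output carries no physical meaning; the sketch only shows how the stages connect. In practice all parameters would be fitted jointly by gradient descent on labeled battery data, typically with a deep learning framework.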
All in all, data-driven states estimation technology for power batteries is relatively mature at present, and different methods have their own advantages and applicable scenarios. Traditional machine learning models typically require statistical features to be extracted from data based on expert experience before classifiers are used for battery states estimation, so their accuracy is greatly affected by manual feature selection. Deep learning models automatically extract distributed features from massive data and enable end-to-end learning. Their application in battery states estimation is becoming increasingly popular, with high accuracy and great robustness. A hybrid model consists of multiple different models and can fully exploit their respective advantages to further enhance generalization ability and recognition robustness, so it has good research and application prospects. The performance characteristics of machine learning models, deep learning models, and hybrid models are summarized in Table 4. Different models have different characteristics and should be selected reasonably based on goals, data resources, and other factors.
Table 4 (excerpt):
Deep learning models: GRU [109–114]; others [115–128]. Characteristics: accurate estimation of SOC, no initial SOC needed, easily filters noise in data, uses the gating mechanism to consider the influence of the time dimension, high accuracy, large computational power and resource occupation, and slow convergence speed for multi-feature data. Limitations: requires big data, a large amount of calculation, and is complicated.
Hybrid models [39–41,48,129–142]: CNN + RNN. Characteristics: considers the advantages of different models, with strong generalization ability and good robustness.

Future Development Trends
With the application of technologies such as artificial intelligence, network communication, and advanced sensors, automobiles are accelerating towards electrification, intelligence, and networking. Data-driven power battery states estimation methods have broad application prospects in the field of battery safety management for electric buses. Looking ahead, power battery states estimation methods show the following development trends.
(1) The accuracy and reliability of data-driven battery states estimation methods largely depend on the quality and quantity of sample data, as well as the diversity of working scenarios. During actual operation of electric vehicles, environmental conditions such as road slope and climate, as well as abnormal voltage and temperature, can cause the battery modules to deviate from their equilibrium state. On-board sensor data collected under natural driving conditions have gradually become the main source of data for battery states estimation. Using on-board sensors and network technology to obtain battery states data under actual scenarios has the advantages of objectivity, accuracy, reliability, massive data volume, and low acquisition cost, which can provide reliable data support for accurate estimation of battery states. Meanwhile, transfer learning and generative adversarial learning will provide new solutions for generating massive training data samples [95,140,143–147]. Tang et al. [147] combined industrial data with accelerated aging tests through transfer-based machine learning to recover large-scale battery aging datasets and reduce the cost of aging tests. Li et al. [93] used transfer learning to migrate a CNN model pre-trained on a large battery dataset to a small target battery dataset, improving the capacity estimation accuracy.
(2) The design of models is crucial for achieving accurate estimation of battery states. Further development of battery states estimation models with higher accuracy, stronger robustness, and better real-time performance is a research hotspot. Deep learning models overcome the rapid decline in estimation accuracy and generalization ability that changes in working scenarios and the selection of key feature parameters cause in traditional machine learning models. By automatically learning and extracting spatiotemporal features of battery states data under different working scenarios, they offer high accuracy and good robustness and can achieve a good balance between accuracy and generalization ability. Data-driven battery states estimation based on deep learning will become one of the mainstream methods. This is especially true of hybrid models, which can combine the unique advantages of various models and use multiple complementary methods at different stages of the battery states estimation task to improve accuracy. They have great value for further investigation.
(3) It is an important research direction to further improve the accuracy and robustness of battery states estimation by considering multiple scenarios and multiple feature parameter constraints, combined with advanced intelligent algorithms. Artificial intelligence algorithms such as deep learning and transfer learning are widely used in battery states estimation, and traditional battery states estimation methods are being reshaped and upgraded. Driven by breakthroughs in algorithms, powerful computing power, and massive data, artificial intelligence has the ability to represent knowledge and make inferential decisions at multiple levels, distributions, and tasks. In some fields, artificial intelligence has reached or surpassed human level. However, in power battery applications, the in-depth application of deep learning faces enormous challenges due to the complexity and variability of the environment, the lack of annotated data, and the difficulty of addressing practical pain points.
(4) The accuracy and real-time requirements of engineering applications place higher demands on the software and hardware of battery management systems. Some application scenarios require real-time and accurate estimation of battery states, which poses new challenges to the performance of battery management systems. In addition to low-cost microprocessors with high computing power and high-precision sensors, more in-depth research is needed on the collection and processing of on-board sensor data, the fusion of multi-source, high-dimensional, heterogeneous data, the design of efficient deep learning network architectures, and the development of high-performance hybrid models. Software and hardware technologies complement each other to further improve the accuracy, reliability, and real-time performance of battery states estimation while reducing costs. How to deeply mine multi-source, heterogeneous, and massive vehicle sensor data, use information fusion technology to classify and process information at different levels, combine the characteristics of different models, and more accurately identify the true status of various batteries will become a research hotspot in the field of battery management system technology.

Summary
The states estimation of power batteries is an important research direction in the field of new energy vehicles and plays an important role in engineering fields such as energy storage management and safety management. This paper summarizes data-driven battery states estimation methods, including data sources and estimation models. First, the current research progress of battery states estimation methods is summarized. Following the basic workflow of battery states estimation tasks, mainstream data sources and typical models used for battery states estimation are analyzed and discussed. A comparison of the advantages and disadvantages of three types of data sources shows that on-board sensor data contain rich information and that natural driving data can be used as the main data source for battery states estimation. Second, the models for battery states estimation are divided into three categories: machine learning models, deep learning models, and hybrid models. The analysis of the research progress of these models shows that deep learning models, represented by CNN and RNN, and hybrid models have advantages in accuracy, generalization ability, and other aspects compared with traditional machine learning models. Hybrid models have received significant attention from researchers and are becoming the mainstream of data-driven battery states estimation. Additionally, battery states estimation methods face multiple challenges, and higher accuracy, real-time performance, and robustness are important development trends. Scholars need to continue in-depth theoretical and applied research on data-driven battery states estimation methods, which is of profound significance for improving the safety of new energy vehicles and for the healthy development of the automobile industry.