Review of "grey box" lifetime modeling for lithium-ion battery combining physics and data-driven methods

Lithium-ion batteries are a popular choice for a wide range of energy storage system applications. The current motivation to improve the robustness of lithium-ion battery applications has stimulated the need for in-depth research into aging effects and the establishment of lifetime prediction models. This paper reviews different approaches for combining physics-based models and data-driven models. The three basic physics-based battery lifetime models are introduced, and their requirements and features are compared from an application perspective. Then, state-of-the-art approaches for integrating physics and data-driven methods are systematically reviewed. Flowcharts present each approach to offer readers a clear understanding. Next, the publication trends are represented by line graphs and pie charts, covering data-driven assisted physical models and physics-guided data-driven methods, different physical model applications, and data-driven approaches. It is concluded that electrochemical models have great potential to describe complex aging behavior under various conditions. Moreover, machine learning is a promising tool to overcome mechanistic absence and highly nonlinear performance, accounting for 78 % of all data-driven methods. The physics-guided data-driven approach started to emerge as an innovative lifetime prediction method after 2020. The application advantages and limitations are compared according to the description of the different methods. Furthermore, future perspectives are discussed, with opportunities


Introduction
With the increasing focus on clean and renewable resources, lithium-ion batteries (LiBs) have attracted considerable attention as a replacement for fossil fuels because of their high energy density, high charging efficiency, long lifetime, low maintenance, and low self-discharge. In recent years, the development of LiB energy storage technology (EST) has been emphasized by the transportation and energy sectors of different countries. In the transportation sector, LiB ESTs are the first choice for powering EVs/HEVs. For instance, Tesla is using LFP-based prismatic cells as its power source. In the energy sector, LiB ESTs can enhance the grid integration of renewables by acting as a power and energy buffer [1]. However, the capacity/power capability of LiBs gradually decreases during operation [2], leading to reduced service life and even creating safety hazards [3].
During the long-term operation of LiBs, their performance degrades, which can be quantified as capacity fade, resistance increase, and power decrease. Based on the variation of these parameters, a battery's State of Health (SOH) is defined as the ratio between the current value of a parameter and its value at the beginning of life (BOL). The lifetime is defined as the length of time between the BOL and the end of total useful life (i.e., when the battery reaches a predefined threshold value such as SOH = 80 %). Accurate estimation of SOH and lifetime is essential for lifetime modeling of LiBs and their reliable operation in a given system or application [4,5]. By assessing SOH and predicting lifetime, the performance of each cell can be identified, and information on battery lifespan can be obtained in advance, thus ensuring the safe and reliable operation of the battery system and supporting the planning of maintenance tasks.
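The SOH and lifetime definitions above can be illustrated with a minimal sketch. It assumes capacity is the tracked quantity and that the end of life is reached at SOH = 80 %; the function names and the capacity values are hypothetical and serve only as an example.

```python
def state_of_health(current_capacity_ah, bol_capacity_ah):
    """SOH as the ratio of the present capacity to the beginning-of-life (BOL) capacity."""
    return current_capacity_ah / bol_capacity_ah


def index_of_end_of_life(capacity_history_ah, threshold=0.80):
    """Index of the first measurement whose SOH is at or below the threshold, else None."""
    bol = capacity_history_ah[0]
    for index, capacity in enumerate(capacity_history_ah):
        if state_of_health(capacity, bol) <= threshold:
            return index
    return None


# Hypothetical capacity check-ups (Ah); the first entry is the BOL value.
capacities = [2.50, 2.40, 2.25, 2.10, 1.95]
soh_values = [state_of_health(c, capacities[0]) for c in capacities]
eol_index = index_of_end_of_life(capacities)
```

With these illustrative numbers the threshold is first crossed at the last check-up, so the lifetime would be the time elapsed between the first and the last measurement.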
LiBs are complex electrochemical systems with strong nonlinearity and time-varying properties, where performance degradation at the cell level is mainly driven by chemical degradation reactions at the electrode and electrolyte levels. The different degradation mechanisms can be divided into lithium inventory loss and active material loss, resulting in capacity fade and resistance increase [6]. It has been shown that the solid electrolyte interphase (SEI) and lithium-plated film layers formed on the anode are generated by the consumption of cyclable lithium ions [7] and can grow to hundreds of nanometres in thickness [8]. A similar deposition occurs at the cathode as a cathode electrolyte interphase [9]. In addition, graphite exfoliation, adhesive decomposition, electrical contact loss due to current collector corrosion, and electrode particle cracking due to mechanical stresses [10] lead to the loss of active material [11]. From this point of view, an underlying analysis of degradation mechanisms and the construction of physical models can help to improve the assessment of LiBs' SOH and lifetime. Despite the challenges of identifying complex model parameters and of online application, interest in physical modeling to guide battery management system (BMS) predictions is growing [12]. When cells are discharged at high C-rates, where the C-rate is a measure of the charging/discharging current relative to the rated capacity, the temperature rises dramatically and couples with the electrochemical reactions that affect battery performance. Studies on battery thermal management systems (BTMS) [13][14][15] and cooling technology [16][17][18] focus on maintaining cell temperatures within working ranges to increase service life.
Data-driven approaches have come to the fore in recent years and are well positioned to address the shortcomings of physics-based approaches, as they can learn from high-quality data to accurately capture the dynamic behavior of batteries with a reasonably low computational burden. Data-driven approaches have been classified into machine learning methods, filtering techniques, stochastic methods, and time series methods [19,20]. However, some stochastic methods can be regarded as a type of probabilistic machine learning [21,22]. The main limitation of data-driven methods is their reliance on sufficient training data that are closely related to battery degradation. With the advent of the Big Data era, physical approaches combined with data-driven approaches are favored by researchers. Some review papers have presented different ways of combining the two aforementioned approaches, five of which are summarized in Table 1. These five papers focus on the state of the art, comparison, and future prospects of the different integration strategies, and mainly discuss interdisciplinary hybrid approaches from the viewpoint of computer science and materials science. The physical models for battery lifetime prediction covered there are mainly electrochemical models and a few equivalent circuit models; not enough attention is given to semi-empirical models. For data-driven prediction methods, the main focus is on machine learning, and the way other methods are combined is not described in sufficient depth. The main contributions of this paper are listed below:

• Physics-based lifetime modeling for lithium-ion batteries is classified into three broad categories, and the corresponding model flowcharts are presented. The requirements and capabilities of these models are compared from an application perspective.
• The combination of physical and data-driven approaches is divided into two main categories. The first is data-driven assisted physical models, in which the physical model prediction is the primary driver and data-driven methods assist it. The other is physics-guided data-driven methods, in which a physical model is used to guide and constrain data-driven predictions. The different approaches are illustrated with flowcharts.
• The publication trend of the selected papers is presented as line graphs. Trends in the application of physical models and of data-driven methods are discussed. The requirements, advantages, and disadvantages of the different integration methods are compared, so that readers can select an appropriate method based on their available resources.
• Future developments based on physics-guided data-driven methods are proposed. EM-PINN is recognized as a promising direction. It remains challenging to simultaneously overcome the high complexity of EMs and combine them with machine learning to improve the computational efficiency of online applications.
The rest of this review is organized as follows. Section 2 introduces different physics-based models for battery lifetime prediction. Section 3 focuses on the status of different combinations of physical and data-driven forecasting. Research trends, a comparison of resource requirements, the pros and cons of each combination method, and future perspectives are then presented in Section 4. Finally, Section 5 gives conclusions followed by prospects.

Physics-based battery lifetime modeling
Dynamic modeling is an essential element of battery health management. The SOH of LiBs is influenced by many factors, such as temperature, charge/discharge current rate, cycle depth, state of charge (SOC), and cut-off voltage. It cannot be measured directly but can be obtained through model-based assessment. Accurate lifetime prediction requires consideration of the current SOH, historical usage data, and failure mechanisms, and remains a challenge. Since this review mainly focuses on dynamic lifetime prediction from a physical perspective, this section discusses the commonly used physics-based models: the electrochemical model (EM), the equivalent circuit model (ECM), and the semi-empirical model.

Electrochemical model
The most widely used electrochemical model is the pseudo-two-dimensional (P2D) model. It was designed by Doyle and Newman [28,29] to simulate the behavior of the whole battery, covering all the essential components of lithium-ion batteries. The model can be understood as a cut through the cell across five layers in sequence: the negative current collector, the anode, the separator, the cathode, and the positive current collector. The "2D" refers to the x-direction along the electrode thickness and the r-direction along the particle radius inside the porous electrode. The basic modeling process simplifies the reactions inside the cell through four assumptions:
• The conductivity of the positive and negative current collectors tends to infinity, and there is no significant change in conductivity in the y-axis and z-axis directions.
• The active electrode material consists of a porous structure with a uniform distribution of spherical particles, which avoids an inhomogeneous structure and distribution of active material particles.
• The double-layer effect is ignored to simplify the distribution of lithium ions on the electrolyte and electrode surfaces.
• Ionic transport in the electrolyte only includes diffusion and electromigration; convection is not considered.
The P2D model follows mass conservation, where the amount of substance is constant before and after the reaction, and charge conservation, where the current is equal to the sum of the solid and liquid phase currents at any given moment [30]. Mass transfer refers to the motion of lithium ions within the electrolyte and the active material particles: 1) the Nernst-Planck equation describes the diffusion and migration process in the liquid phase, where the diffusion process is related to the lithium-ion concentration gradient and the liquid phase diffusion coefficient, and the migration process is related to the liquid phase potential distribution and concentration distribution; 2) Fick's law describes the diffusion process of lithium ions within the solid phase particles, where the rate of the process is related to the solid phase diffusion coefficient and the lithium-ion concentration gradient in the solid phase. The charge transfer originates at the surface of the electrode active material particles. The Butler-Volmer equation describes the relationship between the local current density, the exchange current density, and the overpotential. It is the bridge between the reactions occurring in the electrolyte and those in the active electrode material. The model and reaction equations are shown schematically in Fig. 1. Table 2 lists the P2D model's governing equations. For all material properties and parameter values used in the models, we refer to [31,32] for different chemistries. In several applications of the P2D model, researchers have developed continuous scale models from 1D to 3D. The distinction between the different model dimensions is shown in Table 3. 2D and 3D models can also be integrated with 1D EMs for analysis, for which we refer to [33,34]. The P2D model is powerful in that it simplifies the modeling of batteries at multiple micro-macro and time-space scales and achieves high computational accuracy. Many studies [47]
Guo et al.
contributions: one from the graphite particle fraction covered by the microporous SEI layer and the other from the cracked graphite particle fraction of the SEI layer. However, the above is only applicable to describing the aging process of graphite electrodes for Li-ion batteries at moderate currents up to 1C. Further research [48] has found that aging is linearly related to the number of cycles in the early stage of cycling, and this linear aging regime is dominated by SEI growth. As cycling proceeds, the SEI grows and the anode porosity decreases, resulting in a more significant electrolyte gradient in the anode and, therefore, a lower lithium deposition potential. In turn, the appearance of lithium metal further accelerates the reduction in local anode porosity. This positive feedback leads to an exponential increase in the lithium plating rate and a dramatic decrease in the local porosity at the anode/separator interface. The cell aging characteristics thus shift from linear to nonlinear.
The single-particle model (SPM) is a common simplification of P2D, which assumes that the current distribution is uniform in the electrodes. At the single-particle scale, it analyzes the kinetics of solid diffusion and intercalation reactions in electrode particles. SPM is a 0D model in COMSOL simulation. Considering coupled chemical and mechanical degradation, an advanced aging model was derived to alleviate the low accuracy at high C-rates [49]. In situations where performance prediction is computationally intensive, SPM can be used instead of the complex P2D model [38].
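The Butler-Volmer relation between local current density, exchange current density, and overpotential, which governs the intercalation kinetics in both P2D and SPM, can be sketched as follows. This is a minimal illustration of the textbook form; the symmetric transfer coefficients (0.5) and the numerical inputs are assumptions and are not taken from Table 2.

```python
import math

FARADAY = 96485.33212       # C/mol
GAS_CONSTANT = 8.314462618  # J/(mol K)


def butler_volmer(exchange_current_density, overpotential, temperature=298.15,
                  alpha_a=0.5, alpha_c=0.5):
    """Local current density, in the same units as exchange_current_density.

    overpotential is in volts and temperature in kelvin; alpha_a and alpha_c
    are the anodic and cathodic transfer coefficients (assumed symmetric here).
    """
    f = FARADAY / (GAS_CONSTANT * temperature)
    return exchange_current_density * (
        math.exp(alpha_a * f * overpotential) - math.exp(-alpha_c * f * overpotential)
    )


# Hypothetical inputs: exchange current density of 1 A/m^2 and 50 mV overpotential.
current_density = butler_volmer(exchange_current_density=1.0, overpotential=0.05)
```

For small overpotentials the expression reduces to a linear relation, which is one reason why simplified models remain accurate at low C-rates.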

Equivalent circuit model
A common phenomenological approach used to describe the behavior of batteries is the ECM, a model consisting of electrical components such as RC networks, voltage sources, and resistors that represent the main electrochemical processes. In contrast to the EM, an ECM does not require an in-depth analysis of the electrochemical reactions inside the battery. The external characteristics of the battery can be modeled by describing the open-circuit voltage, the DC internal resistance, and the polarization internal resistance through a circuit. Typical ECMs include the Rint model [50], the Thevenin model [51,52], the second-order RC network [53], and their variants [54][55][56].
The polarization phenomenon of batteries consists of ohmic polari-
(1)
When n = 1, Q is equivalent to C; when n = 0.5, it is equivalent to an infinite solid-state diffusion process; when n = 0, it is equivalent to a resistance.
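The limiting cases listed for Q are consistent with a constant phase element (CPE). The sketch below assumes the standard CPE form Z = 1 / (Q (jω)^n); this form is an assumption used for illustration and is not an expression taken from the text, and the numerical values are hypothetical.

```python
import cmath
import math


def cpe_impedance(q, n, omega):
    """Impedance of a constant phase element, Z = 1 / (Q * (j*omega)**n)."""
    return 1.0 / (q * (1j * omega) ** n)


omega = 10.0  # angular frequency in rad/s (hypothetical)
q = 2.0       # CPE coefficient (hypothetical)

z_capacitor = cpe_impedance(q, 1, omega)    # n = 1: behaves like a capacitor C = Q
z_diffusion = cpe_impedance(q, 0.5, omega)  # n = 0.5: Warburg-like, phase of -45 degrees
z_resistor = cpe_impedance(q, 0, omega)     # n = 0: purely real, R = 1/Q
phase_deg = math.degrees(cmath.phase(z_diffusion))
```

Intermediate values of n between 0.5 and 1 give the depressed semicircles typically observed when an RQ element replaces an ideal RC element.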
To account for the ohmic resistance, lithium-ion diffusion and migration, and the charge accumulation (intercalation capacitance) of the host material, the identification and parameterization of the ECM are usually performed using electrochemical impedance spectroscopy (EIS) in the frequency domain.
As shown in Fig. 2, the impedance spectrum shows a tail of inductive behavior at high frequencies, which is attributed to the porous nature of the cell electrodes and the connecting leads of the jelly-roll structure; the intercept on the real axis represents the total ohmic resistance of the cell, including electrolyte resistance, contact resistance, and electronic contacts. The depressed semicircle in the mid-to-high frequency range is attributed to the solid electrolyte layer at the membrane electrode-solution interface. The semicircle in the mid-frequency range characterizes the charge transfer kinetics at the electrode-electrolyte interface.
The low-frequency portion of the impedance is derived from the solid-state Warburg diffusion of lithium ions into the porous electrode matrix. At extremely low frequencies, the impedance response is related to the differential intercalation capacitance of the electrode.
As mentioned above, when modeling the behavior of a cell using an ECM, a model is first pre-selected based on the shape of the measured impedance spectrum; this generally consists of a series ohmic resistor, a Warburg diffusion element, and several RC/RQ elements depending on the number of semicircles in the spectrum. Based on Fick's law, the Warburg diffusion element describes the diffusion process in electrochemical systems and is classified into the Finite Space Warburg (FSW) element and the Finite Length Warburg (FLW) element.
Considering that the behavior of electrodes can be modeled by double-layer capacitive effects and solid-phase diffusion, Randles developed an impedance model structure representing the combination of charge transfer and diffusion processes. It consists of a series connection of a charge transfer resistor and a diffusion element, in parallel with a double-layer capacitor. The Randles circuit implies that charge-transfer overpotentials are directly related to solid-state diffusion and cannot be treated as independent loss processes. Therefore, the dynamic behavior of diffusion and charge transfer cannot be decoupled by separate RC or RQ elements. Nevertheless, it was demonstrated [57] that an RQ element and a Warburg element in series did not differ significantly from the Randles circuit at different diffusion time constants, which means that a good combination of RQ and Warburg elements can accurately characterize the cell.
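The Randles structure described above can be evaluated numerically with a short sketch. It assumes a semi-infinite Warburg element of the form Z_W = σ(1 − j)/√ω and adds the series ohmic resistor mentioned earlier; the parameter values are hypothetical and chosen only for illustration.

```python
import math


def warburg_impedance(sigma, omega):
    """Semi-infinite Warburg element with coefficient sigma (ohm s^-0.5)."""
    return sigma * (1 - 1j) / math.sqrt(omega)


def randles_impedance(omega, r_ohm, r_ct, c_dl, sigma):
    """Ohmic resistor in series with (R_ct + Warburg) in parallel with C_dl."""
    z_faradaic = r_ct + warburg_impedance(sigma, omega)
    z_parallel = 1.0 / (1j * omega * c_dl + 1.0 / z_faradaic)
    return r_ohm + z_parallel


# Hypothetical parameter values (ohm, ohm, farad, ohm s^-0.5).
params = dict(r_ohm=0.01, r_ct=0.02, c_dl=1.0, sigma=0.005)
frequencies_hz = (1000.0, 10.0, 0.1, 0.001)
spectrum = [randles_impedance(2 * math.pi * f, **params) for f in frequencies_hz]
```

At high frequency the impedance approaches the ohmic resistance, which corresponds to the real-axis intercept of the spectrum, while the low-frequency points are dominated by the diffusion element.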
The ability of ECMs to reproduce and predict cell voltages by optimizing models and parameter values is convenient for the design of control algorithms in BMS [58,59]. Yet the prediction accuracy remains limited because of the variety of factors influencing battery aging behavior, and how these factors affect impedance is not fully understood.

Semi-empirical model
In addition to the above modeling approaches based on physical perspectives of aging, fitting a mathematical relationship between output characteristics and different stress factors can also yield predictions of reasonable accuracy. As illustrated in Fig. 3, this model aims to quantify the effects of aging factors (i.e., temperature, cycle number, C-rate, etc.) and obtain a descriptive expression for the variation of battery performance over the lifetime [60]. The investigation needs to be based on a large amount of accelerated experimental data [61], seeking approximate expressions for battery performance versus time from the data trends. Although the predictive capability of the semi-empirical model is lower than that of EMs, it is preferred for industrial applications owing to its low computational complexity and easy integration within a BMS. Semi-empirical modeling is divided into cycling aging modeling [62,63], calendar aging modeling [64], and global performance modeling [65].
Cycling aging factors include the number of cycles, temperature, C-rate, average SOC, and cycle depth [68]. The modeling process generally uses the cycle number as a time metric [69]. Laboratory efforts have shown that long-term capacity loss follows a t^0.5 dependence [66], and models in the literature usually attribute this dependence to diffusion limitations through the SEI layer [67] for the electrolyte reactants that participate in the formation of the SEI layer. As ΔSOC changes from 3 % to 6 %, the power decay mechanism changes significantly [70]. The power-law exponent of time drops below 0.5, also indicating that mechanisms more complex than SEI growth alone influence cycling aging. Quantitative analysis of the acceleration effects of different influencing factors on battery aging has concluded that high-temperature stress and high charge rates are promising candidates for forced battery degradation [71]. The experimental findings of Ref. [72] indicate two degradation mechanisms in the tested cells, depending on whether the capacity is above or below 70 % of the initial capacity, each expressed quantitatively as a power law of time.
The main factors influencing calendar aging are time, temperature, and storage SOC [73]. According to experimental data [70], the area-specific impedance growth and power loss obey a power-law function of time and Arrhenius kinetics. The exponent of time is approximately 0.5 [74]. This relationship can be interpreted as SEI growth. Following test results on two types of LiBs [75], high temperature and high SOC appear to be promising for accelerated calendar aging. The capacity fade and resistance increase in the calendar test obey the Arrhenius law in the temperature range of 30 °C to 50 °C at 60 % SOC [76]. Therefore, it is concluded that the capacity fade and resistance increase are caused by a thermal activation process linearly related to time.
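A calendar aging law that combines a power law of time with Arrhenius kinetics can be sketched as below. The functional form loss = A · exp(−Ea / (R T)) · t^z is a common semi-empirical assumption rather than the specific model of any cited work, and the pre-factor, activation energy, and storage times are hypothetical. The helper recovers the time exponent from data by linear regression in log-log space.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K)


def calendar_capacity_loss(time_days, temperature_k, pre_factor,
                           activation_energy, exponent=0.5):
    """Relative capacity loss from a power law of time scaled by an Arrhenius term."""
    arrhenius = math.exp(-activation_energy / (GAS_CONSTANT * temperature_k))
    return pre_factor * arrhenius * time_days ** exponent


def fit_time_exponent(times, losses):
    """Least-squares slope of log(loss) versus log(time), i.e. the time exponent."""
    xs = [math.log(t) for t in times]
    ys = [math.log(q) for q in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    return covariance / variance


# Hypothetical parameters, used only to generate illustrative data.
PRE_FACTOR = 2.0e6
ACTIVATION_ENERGY = 50_000.0  # J/mol
days = [10, 50, 100, 200, 400]
loss_30c = [calendar_capacity_loss(t, 303.15, PRE_FACTOR, ACTIVATION_ENERGY) for t in days]
loss_50c = [calendar_capacity_loss(t, 323.15, PRE_FACTOR, ACTIVATION_ENERGY) for t in days]
fitted_exponent = fit_time_exponent(days, loss_30c)
```

On real data, a fitted exponent clearly below 0.5 would hint at mechanisms beyond diffusion-limited SEI growth, as discussed for cycling aging.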
A global performance model can be expressed by adding cycling and calendar aging empirical models. As seen in the End-of-Discharge Voltage (EODV) degradation curve [57], three phases indicate that different aging mechanisms dominate each stage. In the early stage, the EODV shows an exponentially decreasing trend caused by the polarization effect. The linear decline phase is mainly due to the gradual increase in the internal resistance of the battery. The sharp drop phase corresponds to an exponential degradation of cell performance, resulting from electrolyte drying, electrode dissolution, and degradation of active materials. Considering the interaction of these different aging phases, the degradation model is described by summing the empirical models with different weights for each stage. Additionally, it is possible to apply the same type of model as previously explained to fit the overall capacity degradation behavior [77] or to obtain an SOH degradation model [62] by considering both long-term and short-term aging.
The main drawback of semi-empirical models is that they do not interpret the processes of capacity decline and impedance rise, relying instead on independent studies of the influence trend of each stress factor, which leads to a large amount of experimental work.

Requirements and applicable features
In this part, the knowledge requirements, measurements, computation capabilities, and application features of various physics-based models are compared and summarized in Table 4.
P2D is intended to provide a clear understanding of the specific physical and chemical phenomena that occur during the operation of batteries. A reliable and accurate model can be built given sufficient background in electrochemistry and physics. However, many variables are challenging to measure owing to a lack of facilities or insufficient measurement precision. Moreover, complex modeling imposes a significant computational burden. Although it has very good generalization performance and the most accurate results, its high testing requirements and computational burden make it unsuitable for online prediction.
SPM is a simplification of P2D that retains many essential battery properties. The SPM is derived directly from P2D and consists of ordinary differential equations. In addition to the series of measurements applied in P2D, it is common to use assumptions or to fit Arrhenius behavior in the parameter determination process. Moreover, its simple structure enhances its computational utility and has made it a popular model for SOH estimation [89]. However, it is not capable of describing batteries' nonlinear behavior at high C-rates because electrolyte physics and degradation are missing.
ECM allows the modeling to be incorporated with training algorithms at the system level owing to its conceptual simplicity and has the potential to be applied onboard in vehicles. With lumped models featuring relatively few parameters, users do not need an in-depth understanding of the physical mechanisms, only of how the time and frequency domain tests relate to the model structures. However, the accuracy of the model tends to drop significantly in the low SOC region of the cell or in high current situations, where the nonlinear characteristics of the cell are evident.
The semi-empirical model is based on simple correlations between stress factors and capacity degradation/impedance increase, derived from aging tests carried out under several conditions. Therefore, adequate data to develop an awareness of the impact of accelerated calendar and cycle life are fundamental. Meanwhile, a power law of time and Arrhenius kinetics are often used as assumptions for the initial structures. Hence, understanding the definition of these models and the physical meaning of their coefficients helps to establish semi-empirical forms with good generalization capabilities. It is important to note that appropriate acceleration conditions must be chosen to ensure that the extrapolation is successful with limited time and cost.
Based on the above analysis, only some methods can be applied online. The fact that no physical model is perfect has inspired researchers to use algorithms or a mixture of different physical models to fill in the gaps. Interested readers can refer to [89][90][91][92].

"Grey box" lifetime modeling
Given that the aging of LiBs is caused by the evolution of multiple interfaces and materials over a wide range of use conditions, models capable of successfully predicting battery degradation should account for potential spatial, temporal, and chemical complexities. As this evolution can be described using thermodynamic and kinetic laws of physics, the solution to these problems requires a combination of traditional physics-based modeling methods and flexible data-driven techniques. Why do we need "grey box" lifetime modeling? Data-driven methods (black box) have the drawback of relying heavily on training data, and if only the capacity is used as input, then the prediction results only include the capacity, while the other internal characteristics remain unknown. Traditional physical models (white box) are computationally demanding and highly challenging to apply online since they rely on many different material properties and parameters. Consequently, there is a rising need for hybrid models (grey box modeling) that compensate for the drawbacks of both strategies and integrate their advantages. Data-driven and physics-based methods can be combined in two possible ways: (1) the data-driven method assists the physics-based method by estimating and optimizing the parameters of the physical model, by downscaling the first-principles-based physical model, or by quantifying the uncertainty of the applied physical model; and (2) the data-driven method is guided by data that carry physical meaning, by the error between a physical model and the data-driven prediction, or by embedding a physical model directly in the data-driven method. This section reviews the development of these methods, divided into data-driven assisted physical models and physics-guided data-driven methods.

Data-driven assisted physical models
This section deals with methods that use physical models as the main prediction method and data-driven methods to improve the model's accuracy or quantify its uncertainty. Precise prediction depends on whether the physical model sufficiently captures the relevant physical properties of the aging process. Furthermore, the most important aspects of these methods are parameter identification and fast prediction.

Parameter identification
Estimating unknown parameters in a model is also known as model calibration, and a common method is to use a grid search over the space of parameter value combinations to obtain the best match between predicted and observed values. The parameter identification process is shown in Fig. 4. First, a physical model is chosen or built, and its parameters are given nominal values. Then, by using sensitivity analysis, the set of unknown parameters that should be precisely evaluated is reduced. Data-driven methods are then used to optimize the parameter values to produce a well-parameterized physical model and forecast. Typically, this is an iterative process, which is verified against actual performance and then terminated.
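The grid search calibration mentioned above can be sketched in a few lines. The two-parameter fade model, the grids, and the synthetic observations are hypothetical stand-ins for a real physical model and measured data; the root mean square error is used here as the mismatch criterion.

```python
import itertools
import math


def fade_model(cycle, a, b):
    """Hypothetical two-parameter model: normalized capacity after `cycle` cycles."""
    return 1.0 - a * cycle ** b


def rmse(params, cycles, observed):
    """Root mean square error between model predictions and observations."""
    errors = [(fade_model(n, *params) - y) ** 2 for n, y in zip(cycles, observed)]
    return math.sqrt(sum(errors) / len(errors))


def grid_search(cycles, observed, a_grid, b_grid):
    """Return the (a, b) combination on the grid with the smallest RMSE."""
    candidates = itertools.product(a_grid, b_grid)
    return min(candidates, key=lambda p: rmse(p, cycles, observed))


cycles = [100, 200, 400, 800]
observed = [fade_model(n, 0.002, 0.5) for n in cycles]  # synthetic "measurements"
best_a, best_b = grid_search(cycles, observed, [0.001, 0.002, 0.003], [0.4, 0.5, 0.6])
```

The cost of a grid search grows exponentially with the number of parameters, which is why the sensitivity analysis step is used to shrink the parameter set before calibration.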
The electrochemical model is built from a series of partial differential equations (PDEs), and not all parameters can be determined from experimental observations. As the model parameters vary with use case and time, the capacity state estimates will deviate from the truth. It has been shown that diffusion and conductivity change with aging [93]. This has led to the incorporation of data-driven approaches to update time-sensitive parameters in real time. Filtering methods [91,94] or adaptive observers [95] that combine state and parameter estimation can generate aging-relevant parameter estimates. Furthermore, they can strengthen self-correction schemes for quantities including the Li-ion concentration in the electrode, total cell capacity, anode diffusion coefficient, and SEI layer conductivity. This ensures that the model-based capacity prediction remains accurate over time. Additionally, machine learning can be used to enhance physics-based models, using current [96], voltage [97], anode expansion rate [98], and capacitance [99] as aging predictors. Miguel et al. [100] give a comprehensive review of computational parameter estimation and optimization methods for EMs. These include single optimization analysis (SOA) and multi optimization analysis (MOA), depending on whether the parameters are evaluated using one or more optimization procedures. SOA uses a specific EM (P2D or SPM) after collecting data to identify parameters using nonlinear least squares [101] or a genetic algorithm [90,102]. The highly complex optimization scheme of this method leads to a loss of accuracy. MOA designs test curves [103] to isolate specific parameters or sets of parameters. Fisher information [104] is applied to measure and optimize the solvability of a given parameter estimation problem, thereby increasing the speed of parameter identification. This method expands the amount of information collected to determine accurate estimates of certain parameter values. To shorten the parameter identification process, sensitivity analysis [104] and deep learning [105] are used to identify the parameters with the highest impact on the prediction results, which decreases the number of parameters and speeds up convergence. The ensemble Kalman filter (EnKF) performs parameter identification independently of the initial state and avoids computing the Jacobian matrix of the P2D model, which reduces the computational difficulty. Other data-driven methods, such as elastic net algorithms [98], which penalize the size of the coefficients to reduce the risk of overfitting, and nonlinear least squares (LS) with dynamic bounds [106], used to track the evolution of individual parameters, have been applied to reduce model overfitting and prediction uncertainty over the entire battery life cycle. With the goal of online prediction, a neural network (NN) [97] is used to obtain the parameters of an SPM. The NN proposed by Kim [107] is expected to be implemented in BMS, since the NN model can flexibly adapt to numerous input variables and output parameters.
Battery parameter estimation using ECMs relies on large experimental designs to account for the change in parameters due to C-rate, temperature, and degradation. However, when measurements are acquired in real time, C-rate, temperature, and degradation also change in real time. Data-driven methods have been used to account for this change when estimating the parameters of the ECM from the measured current and voltage of the battery. Improvements of the classical Kalman filter (KF) for nonlinear systems have been successfully used in the BMS of electric/hybrid vehicles. By far the most common methods are the joint and dual extended Kalman filter (EKF) [108] and the sigma-point Kalman filter (SPKF) [109]. However, the EKF has some drawbacks: if the assumption of local linearization is not satisfied, it leads to a highly unstable observer. SPKFs include variants such as the central difference Kalman filter (CDKF) and the unscented Kalman filter (UKF) [110]. Such strategies rely on statistical linearization, require fewer samples than particle filters, and exhibit better performance. The weighted recursive least squares (RLS) algorithm [111] is often used in combination with a KF, EKF, or UKF. In addition to the above filtering methods, a genetic algorithm (GA) [99] is proposed to estimate the SOH of a battery online using the diffusion capacitance of a second-order RC circuit model. Using the GA, the diffusion capacitance of the battery can be monitored in real time by measuring the battery current and terminal voltage. The disadvantage is that the GA takes some time to find the optimal solution. In [112], terminal sliding mode observers are utilized to estimate three variables (open circuit voltage, polarization voltage, and terminal voltage) and two variables (capacity and internal resistance) in the ECM, which is then adapted to make a robust estimation of SOC and SOH. This observer allows a continuous output injection signal, which attenuates chattering and eliminates the low-pass filter. However, these advantages are demonstrated only on a single cell. Furthermore, a study reported by Hu et al. [58] implements a multi-swarm particle swarm optimization algorithm to identify the optimal parameters of twelve lumped ECMs. According to the RMSE comparison results, the first-order RC model is favored for LiNMC cells, while the first-order RC model with one-state hysteresis appears to be the best option for LiFePO4 cells.
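As a minimal sketch of the recursive least squares idea mentioned above, the following estimates two parameters of a deliberately simplified, hypothetical ECM, V_k = OCV - R0 * I_k. The constant OCV over the estimation window, the discharge-positive sign convention, the function name `rls_ecm`, and the numerical values are all assumptions made for illustration and are not taken from the cited works.

```python
def rls_ecm(currents, voltages, forgetting=1.0, p0=1e6):
    """Recursive least squares for the assumed model V_k = OCV - R0 * I_k.

    The regression form is V_k = theta0 + theta1 * I_k with
    theta0 = OCV and theta1 = -R0 (illustrative simplification).
    """
    theta = [0.0, 0.0]                      # parameter estimate [theta0, theta1]
    P = [[p0, 0.0], [0.0, p0]]              # large initial covariance = weak prior
    for i_k, v_k in zip(currents, voltages):
        phi = [1.0, i_k]                    # regressor vector
        # P * phi (P is symmetric, so this also equals (phi^T P)^T)
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = forgetting + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        K = [Pphi[0] / denom, Pphi[1] / denom]          # gain vector
        err = v_k - (phi[0] * theta[0] + phi[1] * theta[1])  # prediction error
        theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
        # Covariance update: P <- (P - K phi^T P) / forgetting
        P = [[(P[r][c] - K[r] * Pphi[c]) / forgetting for c in range(2)]
             for r in range(2)]
    ocv_est, r0_est = theta[0], -theta[1]
    return ocv_est, r0_est


if __name__ == "__main__":
    currents = [0.5, 1.0, 2.0, 1.5, 0.2, 3.0]
    voltages = [3.7 - 0.05 * i for i in currents]   # synthetic, noise-free data
    print(rls_ecm(currents, voltages))
```

A forgetting factor below 1 would let the estimate follow slowly drifting parameters, which is the usual reason for pairing RLS with a state filter in online settings.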
One way to improve long-term forecasting is to combine semi-empirical models with filtering algorithms [113][114][115] in order to dynamically update the model parameters. As illustrated in [116], a particle filter (PF) takes the aging parameters given by the physical scaling laws to account for the impact of physical variation and to correct the results obtained by assuming constant physical attributes, with capacity decay and internal resistance increase described as part of the state vector (given by Eq. (2)). The PF algorithm adapts online the parameters of a superposition of two exponential degradation feature models (given by Eq. (3)) to track and predict battery life.
where $x_k$ is the system state vector at time $k$, $y_k$ is the measurement at time $k$, $f$ and $h$ are the state transition and measurement functions, and $q$ and $v$ are the process noise and measurement noise.
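For orientation, a generic nonlinear state-space form that is consistent with these symbol definitions can be written as follows. The additive-noise structure and the time indexing are assumptions, and this is not necessarily the exact expression used in [116].

$$x_k = f(x_{k-1}) + q_{k-1}, \qquad y_k = h(x_k) + v_k$$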
where $F_k$ is the defined battery health parameter, and $Q_k$ and $R_k$ are the measured capacity and internal resistance at the $k$th cycle, respectively. $Q_{ref}$ is 80 % of the nominal capacity and $R_{ref}$ is equivalent to about 133 % of the nominal internal resistance. Since semi-empirical models are generally low-order algebraic equations with a small number of fitted parameters, data-driven methods can assist in parameter estimation without adding great complexity, which preserves computational efficiency. A GA can fit the battery cycle life model very accurately [63] by using the root mean square error (RMSE) between the predicted and tested battery capacity as the objective function; minimizing the RMSE yields the parameter estimates. The empirical model, which consists of two exponential models, is anticipated to be applied to on-board prognostics and health management (PHM) systems and, with Bayesian Monte Carlo enhancements, can deliver precise predictions starting from the early stages of battery life [117]. The PF updates the parameters in accordance with Bayes' law, bringing them closer to their real values over time. A numerical Monte Carlo method is used to solve the recursive propagation of the posterior density in the Bayesian update process.
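The Bayesian update described above can be sketched with a bootstrap particle filter. The sketch below assumes a double-exponential capacity model of the form Q_k = a exp(b k) + c exp(d k), a random-walk evolution of the four parameters, and a Gaussian measurement likelihood; the function names, noise levels, and particle count are hypothetical choices for illustration rather than the settings of the cited studies.

```python
import math
import random


def capacity_model(params, k):
    """Assumed double-exponential capacity model: a*exp(b*k) + c*exp(d*k)."""
    a, b, c, d = params
    return a * math.exp(b * k) + c * math.exp(d * k)


def particle_filter(measurements, init_params, n_particles=500,
                    param_noise=(1e-3, 1e-5, 1e-3, 1e-5),
                    meas_std=0.01, seed=0):
    """Bootstrap PF that tracks the model parameters cycle by cycle."""
    rng = random.Random(seed)
    # Initial particle cloud scattered around the initial parameter guess
    particles = [[p + rng.gauss(0.0, s) for p, s in zip(init_params, param_noise)]
                 for _ in range(n_particles)]
    estimates, weights = [], []
    for k, q_meas in enumerate(measurements):
        # Predict: random-walk evolution of each parameter
        particles = [[p + rng.gauss(0.0, s) for p, s in zip(part, param_noise)]
                     for part in particles]
        # Update: weight particles by a Gaussian measurement likelihood
        weights = [math.exp(-0.5 * ((q_meas - capacity_model(part, k)) / meas_std) ** 2)
                   for part in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Posterior-mean estimate of the four parameters
        estimates.append([sum(w * part[j] for w, part in zip(weights, particles))
                          for j in range(4)])
        # Resample in proportion to the weights (multinomial resampling)
        particles = [list(p) for p in
                     rng.choices(particles, weights=weights, k=n_particles)]
    return estimates, weights


if __name__ == "__main__":
    true_params = (0.9, -0.002, 0.1, -0.03)
    data = [capacity_model(true_params, k) for k in range(30)]
    est, _ = particle_filter(data, true_params, n_particles=200)
    print(est[-1])
```

Extrapolating `capacity_model` with each resampled particle until it crosses an end-of-life threshold would give a sample-based RUL distribution instead of a single point estimate.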
Most of the effort spent on combining physical models and data-driven methods has aimed at enabling online parameter estimation of the physics-based models using real-time measurements [118], avoiding tedious and expensive laboratory measurements. Although parameter identification is simple to comprehend and apply, it depends heavily on the model and on a priori knowledge.

Reduced-order physical model
Reduced-order models (ROMs) attempt to capture the most important properties of higher-fidelity physical models by reducing the dimensionality of the system, thereby reducing the computational complexity and cost. This reduction ignores weak responses to which the global system is insensitive, in order to obtain a 'dominant' sub-model whose response is similar to that of the full-order model. Fig. 5 depicts the ROM prediction flow. In the first category (Fig. 5(a)), the EM is discretized using four different techniques, which greatly reduces the model order while maintaining physical significance and parameter accuracy. To achieve prediction, data-driven methods are applied to correlate the model with actual outcomes. The second category (Fig. 5(b)) utilizes test data to build a library of descriptors with various algebraic equations that can forecast battery behavior. Data-driven methods are then used to select the best sub-descriptors and merge them into a global ROM to achieve prediction.
A common approach is to project the governing equations of the system onto a linear subspace of the original state space using methods such as orthogonal decomposition [119], time-step discretization [120], spectral methods [121], or Padé approximation [122]. In [123], an electrochemical P2D model was discretized using Chebyshev orthogonal collocation. The cell region is subdivided into three sub-domains, where the model equations are solved over the anode thickness $x_a$, separator thickness $x_s$, and cathode thickness $x_c$ at different sets of Chebyshev collocation nodes. The P2D model discretized by orthogonal collocation comprises a set of nonlinear differential algebraic equations (DAEs) in time. Such a state-space representation can be treated as a stochastic state-space model, and a modified extended Kalman filter (EKF) algorithm is applied to achieve optimal state estimation of the battery model. The benefit is that the state error is estimated at each time step using a time-varying linear approximation of the model DAEs. A drawback is that convergence of the state estimate cannot be ensured. Another study [124] implements a reduced-complexity battery model developed from an SPM, where the final SOC estimate is obtained using the iterated extended Kalman filter (IEKF), an upgraded variant of the EKF that refines the state estimate around the current point at each time step in order to solve nonlinear problems more effectively. However, the computational complexity increases and needs to be kept at a tolerable level. Smiley et al. [125] present a method for predicting battery performance that uses an interacting-multiple-model (IMM) Kalman filter to select, from a pre-computed set of physics-based ROMs, the one closest to the observed output voltage measurements for a given input current. The ROMs are created using the discrete-time realization algorithm (DRA) [126]. In contrast to the more commonly implemented adaptive methods, this method guarantees a stable model that accurately represents the internal and external battery dynamics at each stage of life. An optimal reduced-order discrete-time state-space model satisfies Eq. (4) and (5).

$$X(t + t_s) = A X(t) + B I_{app}(t) \qquad (4)$$

$$Y(t) = C X(t) + D I_{app}(t) \qquad (5)$$

where $t_s$ is the discrete-time ROM sampling period (integer), $X(t)$ is the model's "state" vector at time $t$, $Y(t)$ is the vector of the model's "output" at time $t$, $I_{app}(t)$ is the applied input current, and $A$, $B$, $C$, and $D$ are matrices.

An alternative approach to generating ROMs is to use a data-driven approach to select and automatically identify the basic set of parameters that capture the aging characteristics. Machine learning methods have the potential to greatly enhance ROM identification, as they typically have fast forward execution times and can exploit data to model a larger number of generated descriptors [127]. Descriptors here refer to algebraic expressions that accurately predict battery behavior, such as Arrhenius, Tafel, and polynomial expressions. Based on physical observations of calendar fade, Gasper et al. [128] combine ROM and machine learning by using symbolic regression to identify local parameter sub-models, replacing the local parameters with their respective sub-models, and performing regression to assemble a global model. This approach speeds up the model development process and assists in the construction of reduced-order models through sensitivity analysis, bootstrap resampling, and long-term extrapolation and analysis of unused validation data. The convergence of descriptors, such as Arrhenius and Tafel-like sub-models, for the local parameter sub-models identified by LASSO has also been investigated in that work to provide insight into the learning behavior of the models. One branch of future work could be to create a larger pool of descriptors in the hope of obtaining better-performing models with fewer parameters.
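Returning to the state-space form of Eq. (4) and (5), the following is a minimal sketch of how such a ROM is stepped forward for a sequence of applied currents. It assumes a single scalar input, so B and D are stored as vectors, and the function name `simulate_rom` and the numerical values are purely illustrative; they do not come from a DRA realization of any real cell.

```python
def simulate_rom(A, B, C, D, x0, currents):
    """Step X(t + ts) = A X(t) + B I(t) and Y(t) = C X(t) + D I(t).

    A and C are lists of rows; B and D are vectors because the
    applied current is assumed to be the only (scalar) input.
    """
    x = list(x0)
    outputs = []
    for i_app in currents:
        # Output equation, evaluated with the current state
        y = [sum(C[r][j] * x[j] for j in range(len(x))) + D[r] * i_app
             for r in range(len(C))]
        outputs.append(y)
        # State update for the next sampling instant
        x = [sum(A[r][j] * x[j] for j in range(len(x))) + B[r] * i_app
             for r in range(len(A))]
    return outputs, x


if __name__ == "__main__":
    # One-state toy model with made-up matrices
    outs, x_final = simulate_rom(A=[[0.5]], B=[1.0], C=[[1.0]], D=[0.0],
                                 x0=[0.0], currents=[1.0, 1.0, 1.0])
    print(outs, x_final)   # [[0.0], [1.0], [1.5]] [1.75]
```

In an IMM setting, several such models (one per aging stage) would be run in parallel and compared against the measured voltage.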
There are also investigations that address mathematical reformulations based on physical insights to generate reduced-order models, but they fall outside the reduced-order approaches of interest in this section. Interested readers can refer to [44,129].
In general, ROMs that rely only on engineering physics models (i.e., semi-empirical/empirical models) and confine their dynamics to a lower-dimensional space have less freedom in terms of parameter variation within the system they represent, and retaining less information from the original space may lead to a loss of accuracy in the numerical solution. Combining and assembling electrochemical principles may narrow the search area and produce more reliable trained models with less data.

Uncertainty quantification
How reliable are predictions that track a battery's age? Quantifying the uncertainty of a prediction requires characterization of the entire distribution, $p(y|x)$, rather than just $y = f(x)$. This allows analysis of the degree to which the predicted values cover the true value, $y$, or of the sensitivity to the input features, $x$. Fig. 6 depicts the uncertainty quantification (UQ) procedure. A physical model is constructed or chosen based on the test data. Data-driven methods are used to adjust the model parameters and track the prediction up to the currently observed period while taking the uncertainty in the degradation process into account. Depending on the threshold set, the remaining useful life or the distribution of the underlying parameters can be obtained to describe the uncertainty. Because of transient fluctuations, cell-to-cell variations, and measurement errors, UQ incorporates random variance to characterize the uncertainty in the battery's degradation behavior.
Traditional methods such as Monte Carlo (MC) allow uncertainty quantification to be applied to P2D physical models. For instance, a porous electrode model is used to estimate battery life based on charge/discharge curves, where the probability density of the effective solid-phase diffusion coefficient $D_s$ quantified by MCMC shows, with very high confidence, a monotonic reduction of $D_s$ with increasing cycle number [93]. Similar quantification of uncertainties in design-related parameters [130,131] is used to meet the need for a framework for assessing the effects of the internal parameters of EMs and their relative impact on cell behavior. With such UQ, it is possible to lower cell-to-cell variation and create more focused quality control procedures that lower the cost of cell manufacture [132]. Using an extended P2D model, the nested point estimate method (PEM) and MC techniques assess sub-cell level bias and cell-to-cell variation [133]. The nested PEM is applied to a large number of independent, normally distributed parameters, and the global sensitivity analysis (SA) concludes that the sensitivity of the studied parameters depends on the applied C-rate. Both aforementioned methods are sampling-based UQ methods, and comparison studies reveal that the PEM is computationally more affordable but has a lower sensitivity. The PEM fails at low C-rates because of the system's non-differentiability. In Ref. [134], a stochastic LiB modeling approach based on non-intrusive polynomial chaos (PC) [132] is proposed to study the effect of uncertainties in the EM parameters on the capacity, voltage, and concentration of Li-ion batteries. The PC relies on the sparsity of the expansion coefficients, and a modest number of battery simulations can yield precise statistics for the quantity of interest. However, the stochastic LiB model created using PC has the drawback of being sampling-based, and as the number of necessary cell simulations rises, so does the overall computing cost.
The management of uncertainty for battery health prognostics based on ECMs has generally fallen into two categories: particle filters [135] and machine learning methods [136]. An integrated method to estimate capacity and RUL based on a lumped ECM is proposed in [137]. In that paper, a Gauss-Hermite particle filter (GHPF) is applied to model the capacity decay and infer future capacity values to predict RUL. Furthermore, the GHPF method has been experimentally validated over the past 10 years, showing its accuracy in capturing the uncertainty in RUL prediction. Saha et al. [138] first demonstrate the usefulness of Bayesian theory in managing uncertainty as a powerful tool for integrated battery health diagnosis and prediction through relevance vector machines (RVMs) and state estimation with particle filters (PFs). Furthermore, they propose an RUL prediction method, a Rao-Blackwellized PF (RBPF), using the correlation between battery performance and ECM parameters [114]. The results demonstrate that the particle distribution that represents the system state probability density function (PDF) can be quantified in terms of the contributing factors. When there are deterministic relations in the system model, the particle cloud distribution analysis can then be utilized to greatly reduce the spread of the RUL distribution while retaining the convergence properties of the underlying PF.
Based on the semi-empirical formula for capacity fade obtained from regression analysis of experimental data, the RUL prediction can be given in the form of a probability distribution using data-driven methods such as nonlinear mixed effects [139], particle filters [140], and Gaussian processes [141], so that the confidence in the prediction can be assessed. A narrower PDF indicates a higher confidence in the RUL prediction. In [142], Xing et al. present how two models, a double exponential model and a polynomial model, can be incorporated into a single degradation model. Because it divides the capacity degradation data into three sections to meet the ideal global and local regression characteristics of the ensemble model, this integrated model is more effective than either of the two individual models. A PF is employed to account for the battery aging process. The RUL prediction is assessed by measuring the narrowness of the probability distribution. Reliable predictions can be made with this ensemble model for an additional set of cells with a different rated capacity, using a wide variety of initializations. Guha [116] proposes a method for estimating the RUL of Li-ion batteries based on capacity degradation and internal resistance growth. A semi-empirical model is developed based on battery capacity measurement data, and another semi-empirical model for internal resistance growth is developed based on EIS data. A PF is used to predict the RUL based on a fusion of the capacity and internal resistance degradation models.
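To make the idea of an RUL probability distribution concrete, the sketch below propagates uncertainty in one fitted parameter through a hypothetical power-law fade model, Q(k) = 1 - a k^z, by plain Monte Carlo sampling. The model form, the Gaussian parameter uncertainty, the 80 % end-of-life threshold, and all names and numbers are assumptions chosen for illustration; they are not the models of the cited studies.

```python
import random


def eol_cycle(a, z=0.5, threshold=0.8):
    """Cycle at which the assumed model Q(k) = 1 - a * k**z reaches the threshold."""
    return ((1.0 - threshold) / a) ** (1.0 / z)


def rul_distribution(a_mean, a_std, current_cycle, n_samples=2000, seed=0):
    """Monte Carlo RUL percentiles under Gaussian uncertainty in parameter a."""
    rng = random.Random(seed)
    ruls = []
    for _ in range(n_samples):
        a = rng.gauss(a_mean, a_std)
        if a <= 0.0:
            continue                      # discard non-physical fade rates
        ruls.append(max(eol_cycle(a) - current_cycle, 0.0))
    if not ruls:
        raise ValueError("no physically valid samples were drawn")
    ruls.sort()

    def pct(p):
        return ruls[min(int(p * len(ruls)), len(ruls) - 1)]

    return {"p05": pct(0.05), "median": pct(0.50), "p95": pct(0.95), "n": len(ruls)}


if __name__ == "__main__":
    print(rul_distribution(a_mean=0.01, a_std=0.001, current_cycle=100))
```

The gap between the 5th and 95th percentiles plays the role of the PDF width discussed above: a smaller `a_std` yields a narrower interval and hence a more confident RUL prediction.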
Uncertainty quantification provides guidelines for assessing the confidence of physical model forecasts. It is also used both for quantitative validation of simulations and for optimizing robust designs. High computation times and a reliance on trustworthy priors are problems for models that apply UQ tasks to physics. Traditional MC lacks sufficient flexibility. Since the computational cost of a Gaussian process scales as $O(N^3)$ with $N$ data points, this basic machine learning technique requires significant computing resources when applied to higher dimensions or larger data sets.

Physics-guided data-driven
Many physics-based models cannot precisely represent the battery aging trend over the entire lifetime, implying that there are physical laws we have yet to fully understand. To close this gap, the combination of physical prior information and data-driven methods has been developed to fuse the benefits of both. This section categorizes physics-based learning for batteries into three parts: (1) physics-based data generation, (2) physics-based residual learning, and (3) physics-based embedding.

Physics-based data generation
Data generation constrained by physical laws provides prior knowledge when training data-driven models. Currently, two main approaches are applied when generating scientific data: the first relies on simulation [143] and the second on experiments. As shown in Fig. 7, physical simulations are used to create a large number of new input/output data. Capacity fade, voltage, current, and temperature response are typical outputs, whereas the inputs are normally the voltage, current, and temperature applied to the battery. By first training on these simulation data, machine learning can incorporate aging mechanisms into the model. A small number of experimental data inputs, such as current/voltage curves, EIS, and QV curves, can be utilized to improve the accuracy of model predictions. It is important to consider how the data are merged, because too much simulation data could mask valuable experimental data.
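One simple way to keep a large simulated set from masking a small experimental set is per-sample weighting. The sketch below is an illustrative assumption rather than a method from the cited works: it gives the experimental points a chosen share of the total weight and fits a one-parameter model y = m x by weighted least squares. The function names and the toy numbers are hypothetical.

```python
def balanced_weights(n_sim, n_exp, exp_share=0.5):
    """Per-sample weights so experimental data carry `exp_share` of total weight."""
    return [(1.0 - exp_share) / n_sim] * n_sim + [exp_share / n_exp] * n_exp


def weighted_slope(x, y, w):
    """Weighted least-squares slope of y = m * x (line through the origin)."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


if __name__ == "__main__":
    # Nine simulated points suggest slope 2, one experimental point suggests 4
    x = [1.0] * 9 + [1.0]
    y = [2.0] * 9 + [4.0]
    print(weighted_slope(x, y, [1.0] * 10))               # unweighted merge
    print(weighted_slope(x, y, balanced_weights(9, 1)))   # balanced merge
```

With equal weights the fit is dominated by the simulated points; with balanced weights the single experimental point pulls the estimate halfway toward its own value. The appropriate share depends on how much the simulation is trusted.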
The high computational cost of electrochemical models limits their applicability in predicting SOH. Meanwhile, machine learning methods have been found to be successful in predicting, analyzing, and optimizing SOH at lower computational costs [144]. As battery design needs to reduce risks while increasing performance, machine learning-based multivariate optimization of design parameters is used to address battery capacity and performance degradation. In [145], a numerical model based on a Newman model and a 2D current preservation model is adopted to create a nail penetration simulation database that serves as training data for a Gaussian process. An augmented Lagrangian genetic algorithm is then combined with this regression model to target optimal design conditions for Li-ion batteries. Furthermore, [146] shows that neural networks are highly valuable in battery design. Data from finite element analysis have been used to train and construct two neural networks. The first is a classifier, which determines whether a group of input variables is physically feasible. The second is a calculator, which targets a specific energy and power. Statistical models also contribute to optimizing and extending battery service life. One case is reported by Li et al. [147], who propose an electrochemical-thermal model and use it to generate training data. The internal concentrations and potentials of the electrodes and electrolyte at different spatial positions are then estimated using the generated training data as input to a deep neural network. It is shown that the proposed method can bridge the spatial, temporal, and chemical complexity. Additionally, physics-based simulation data have been used to train Gaussian process regression (GPR) [148]. This approach uses only the ambient temperature and C-rate as input features to an EM; the finite element method is used to simulate the capacity degradation and SEI thickness, and the charging voltage curves are then linked to the GPR model.
Conducting physical experiments also creates meaningful datasets for forecasting battery lifetime using machine learning, even without complete knowledge. As clarified in Section 2.2, electrochemical impedance spectroscopy (EIS) is coupled with the internal electrochemical reactions and contains rich information on the material properties used to describe battery aging. Zhang et al. [149] collected 20,000 EIS measurements of Li-ion batteries under different SOC, SOH, and temperature conditions. An accurate battery prognostic system was built with Gaussian process regression to achieve a real-time, non-invasive, and information-rich diagnosis. The suggested model not only indicates which frequencies are dominant but also outperforms conventional prediction techniques [150] that make use of discharge curve characteristics. Jiang et al. [151] trained a machine learning model using a series of nano-tomographic slices of NMC composite electrodes as experimental data. A mask regional convolutional neural network (Mask R-CNN) was used to identify and segment NMC particles in each slice. The benefit of this technique is that, especially when the image signal-to-noise ratio is low and the boundaries are hazy, it can address the over- or under-segmentation caused by using the conventional internal distance map as a signal function. Visualizing the microstructure evolution of the electrode material aids understanding of the electrochemical effects of the changing battery particles with the conducting matrix. Lastly, Ricardo et al. [152] utilized three machine learning methods to predict the performance of NMC-based cathodes from manufacturing parameters. It was revealed that support vector machines can predict the influence of these manufacturing parameters with high accuracy. Physics-based data generation can also consider both simulated and experimental data. Machine learning models are upgraded separately and in combination using simulated data based on a half-cell model and dQ/dV curves from cycle experiments in the early-life stage [153]. Either data augmentation or the bias-correction method can produce more precise degradation predictions. However, the extrapolation capability of different machine learning methods needs to be carefully addressed, as the efficacy of the performance improvement varies. Data generation methods face stringent requirements when almost entirely new data must be produced under given circumstances. Today's physics-based data creation relies largely on executing simulations or carrying out experiments, both of which take some time. It remains challenging to use machine learning techniques to learn data distributions in an unsupervised manner in order to produce new data that cannot be produced using conventional techniques.
Physics-based data generation supplies realistic synthetic data by directly using mechanistic equations such as the P2D model. Numerical simulations or experimental datasets from physics-based methodologies can be used to support data-driven methods to achieve more accurate results.

Physics-based residual learning
With a comprehensive understanding of the aging mechanisms of Li-ion batteries, research on aging-conscious modeling, including EMs [39,40] and ECMs [154,155] coupled with different degradation phenomena, has increased in popularity. However, no single model can accurately describe all degradation factors. Residual learning, in which a machine learning model learns to forecast the errors made by a physics-based model, is the most established and widely used method for directly addressing the flaws of physics-based models. The workflow is shown in Fig. 8. The physical model and the machine learning model are run simultaneously. The physical model essentially represents the electrochemical, electrical, or thermal behavior of a representative Li-ion battery, and the physical model mismatch is learned using machine learning. To reproduce the predicted value of the battery, the final output is $P_{hybrid} = P_{phy} + \Delta P$. Features include voltage, current, resistance, temperature, and SOC; predictions can be terminal voltage, capacity loss, and resistance increase.
The fundamental idea is to correct model predictions by learning the physical model's residuals (in relation to the observations). One way is to use machine learning to fill the gaps in our understanding of degradation mechanisms that introduce errors. A potential method is universal ordinary differential equations (UODEs), an extension of neural ordinary differential equations (NODEs) [156], which act as a function over all state variables of the system. The UODE is used to describe the capacity fade and resistance increase triggered by unknown physical mechanisms [157]. A combination of a physics-based model and the UODE is proposed to create a degradation model assessing charge loss (due to SEI growth, lithium plating [48], and active material isolation [35]) and resistance increase. This physics-informed machine learning approach has the potential to improve the accuracy of Li-ion battery degradation models. The equations governing the UODE framework for degradation mechanisms are given in Eq. (6) [157], where $Y$ describes the aging characteristics (i.e., capacity, internal resistance, power), $\mu$ represents the collection of physical parameters such as ohmic overpotential and equilibrium potential, $X$ is the vector of operating conditions, and $\theta$ denotes the parameters of the NN. A similar concept was also proposed and studied under the name 'data-driven error compensation' by Gesner et al. [158].

Fig. 8. Physics-based residual learning workflow. Given the features, the physics-based model predicts an initial solution, and the machine learning method reduces the total error by learning the mismatch between physics-based prediction and observation to obtain a more accurate prediction. The cubes indicate the input features X and output prediction Y. The yellow, blue, and green boxes represent the physics-based model, the machine learning method, and the battery, respectively.
Another approach to improving the physics-based solutions is to deploy algorithmic control. State information calculated from physical principles, such as the anodic surface and bulk SOC, is added to the output, and machine learning is used to minimize the error between the model-based solution and the true value. In [159], a feedforward neural network (FNN) is used to capture the residuals of physical models. The first effort concatenates an SPMT (SPM coupled with thermal effects) and an FNN, and the second integrates the ECM [160], developed in [161], with an FNN. The FNN monitors the ongoing state evolution of the physical model and learns what is missing in the physics-based model using the measurement data. This hybrid approach is validated using simulations and experiments, revealing a high predictive accuracy over a wide range of C-rates. Recently, Park et al. [162] also showed support for hybrid electrochemical modeling with recurrent neural networks (RNNs), which outperforms other reduced-order battery models in most situations. In this approach, an SPM describes the terminal voltage output, governed by equations based on electrode thermodynamics, electric overpotential, and Butler-Volmer kinetics. The current is used as input and the SPM outputs the voltage result, while the RNN learns the difference between the P2D model (considered as the true value) and the SPM result, and the final output is $V_{model} = V_{SPM} + \Delta V_{RNNs}$. Nevertheless, one drawback of this structure is its inability to incorporate physical constraints into the machine learning component.
The investigations show that data-driven error compensation outside restricted boundaries leads to improvements and robustness in the predictive accuracy. However, these aging prediction models need to be validated under more conditions when applied to LiBs.
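The residual learning structure, in which the hybrid output is the physics-based prediction plus a learned mismatch, can be illustrated with a deliberately small sketch. Here the physics-based model is a hypothetical ohmic voltage model and the learner is ordinary least squares on the residual, standing in for the neural networks used in the cited studies. All names and numbers are assumptions for illustration.

```python
def physics_voltage(current, ocv=3.7, r0=0.05):
    """Hypothetical physics-based model: terminal voltage with ohmic drop only."""
    return ocv - r0 * current


def fit_residual(currents, measured):
    """Fit a linear model to the mismatch between measurement and physics."""
    residuals = [v - physics_voltage(i) for i, v in zip(currents, measured)]
    n = len(currents)
    mean_i = sum(currents) / n
    mean_r = sum(residuals) / n
    sxx = sum((i - mean_i) ** 2 for i in currents)
    sxy = sum((i - mean_i) * (r - mean_r) for i, r in zip(currents, residuals))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_i
    return lambda i: intercept + slope * i     # learned mismatch as a function of current


def hybrid_voltage(current, residual_model):
    """Hybrid prediction: physics-based output plus learned residual."""
    return physics_voltage(current) + residual_model(current)


if __name__ == "__main__":
    currents = [0.5, 1.0, 1.5, 2.5, 3.0]
    # Synthetic "measurements" with an unmodeled extra resistance and offset
    measured = [3.7 - 0.05 * i - 0.02 * i - 0.01 for i in currents]
    delta = fit_residual(currents, measured)
    print(physics_voltage(2.0), hybrid_voltage(2.0, delta))
```

Because the learner only has to represent the mismatch, it can be much simpler than a model that predicts the voltage from scratch, but it also inherits no physical constraints of its own.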

Physics-based embedding
Physics-based embedding incorporates physical models into the model optimization loop, where the physical models act as part of the skeleton and the machine learning component is responsible for tracking the trend and accelerating the calculation. The workflow is shown in Fig. 9. One structure feeds the physics-based model's output into a machine learning model that predicts the target directly. In another, the machine learning model is applied to forecast an intermediate quantity that is challenging to represent with physics, or to replace one or more parts of the physics-based model. Extracted features include voltage, current, temperature, and SOC. Predictions can be battery voltage, total resistance, maximum charge, and charge/discharge power, all of which are motivated by the aging process.
One approach to embedding physics into a machine learning algorithm is to feed the output of the physics-based model as input to the data-driven model, as illustrated in Fig. 9(a). A demonstration of this approach is carried out by Tu et al. [160]. The HYBRID-2 model employs FNNs to predict terminal voltages based on an SPMT and an NDC (an ECM developed by Tian et al. [163]), both of which provide state information to the FNNs. For SPMTNet (a combination of an SPMT model and an FNN), the FNN takes $SoC_{bulk}$, $SoC_{surf}$, and $T$ derived from the SPMT model as its input variables and exploits the SPMT's state to make predictions. For NDCNet (a combination of an NDC model and an FNN), state variables such as the voltage of the bulk inner region of the electrodes, the voltage of the surface region of the electrode, and the transient voltage caused by ion diffusion dynamics are fed to the FNN to embed physical laws in its predictions. In [164], a simplified SPM and a lumped thermal model are used as sub-models of an ETNN (electrochemical-thermal-neural network) to forecast the core temperature and offer estimates of the approximate terminal voltage. The electrochemical-thermal sub-model is parameterized to give an approximation of the terminal voltage, and a neural network with $I$, $T$, and $V_{SP}$ inputs is then used in series with the sub-model to improve the accuracy of the predicted voltage. Validation of the ETNN model indicates that it can accurately predict the battery terminal voltage and core temperature over a wide range of ambient temperatures (from −10 °C to 40 °C). An earlier systematic study of physics-embedded machine learning methods by Refai et al. in 2011 [165] shows that a sparse recurrent neural network can incorporate the output of a physics-based model as additional input. However, this hybrid neural network can only be applied after the physics-based model has been run. Recently, Hu et al. [166] propose a physics-informed data-driven model in which an ECM is exploited to capture the physical features of the Li-ion battery during charging and discharging, and a tensor-network-based Volterra model is used to predict the SOC. The results show that this method can reduce the risk of overfitting. Li et al. [167] apply an EM to monitor and iteratively predict the internal electrochemical condition of Li-ion batteries in real time to determine safe operating conditions. This research uses GPR, which accelerates online prediction by employing a window of historical maximum charge and discharge currents that moves forward step by step over time, as opposed to using all historical data as the training data set. A recent study [168] that incorporates an LSTM for capacity fade prediction is mostly centered on the semi-empirical equation of [169]. The benefit is that, by understanding how operational stress factors and battery health conditions affect battery degradation, capacity fade can be properly predicted. However, it is still necessary to validate the predictions for various battery types, loads, and temperatures.
Another embedding approach is to replace a part of the physics-based model with a data-driven method [170,171]. Encoding the loss of physics-based models in a machine learning method such as a neural network has yielded positive results [172]. In a recent paper by Nascimento et al. [173], a physics-based model (based on the Nernst and Butler-Volmer equations) is embedded into an RNN, thereby generating physically driven hidden constraints for the RNN. Part of the physics-based model is replaced by a multilayer perceptron (MLP), which is flexible enough to capture the dynamic changes of non-ideal voltages. The hybrid model is easy to adapt and interpret, since most of the computations in the RNN are driven by the physics-based model (i.e., the Nernst and Butler-Volmer equations). The same approach has been applied to redox flow batteries [174]. He et al. establish a physics-constrained deep neural network (PCDNN) using a 0D cell model of the vanadium redox flow battery, which learns the model parameters as a function of operating conditions. DNNs are used to replace the parameter function of the physics-based model. Physics-informed machine learning and visual tracking are employed to predict the thermal conductivity of the heat pipe in battery thermal management systems as a function of temperature and position [175]. The multiphysics numerical simulation of the heat pipe, which supplies the variable thermal conductivity, can contribute useful insights into the efficiency of the heat pipe. The disadvantages of the embedded structure include the necessity to modify the hybrid structure based on the predictor variables and the need for additional electrochemical domain knowledge to select which machine learning units to embed into the physical structure.
Embedded predictions insert intermediate variables between a physics-based model and a machine learning method during training to ensure that the acquired parameters carry a real physical interpretation.

Publication trend
Battery lifetime modeling publications from the past 20 years that blend physics and data-driven methods, including journal articles and conference proceedings, are reviewed and illustrated in Fig. 10. It should be noted that the publications for 2022 are only covered until June 1st. The number of publications on hybrid approaches is growing rapidly. Before 2017, data-driven assisted physical models dominated physics-based lifetime modeling. After 2020, however, what is striking in this figure is the rapid growth of physics-guided data-driven approaches. Physics-informed data-driven methods first appeared in 2011, originating from computer science, and have breathed new life into the battery aging prediction area. The pie chart in Fig. 10(b) shows that machine learning has the largest share of data-driven methods in "grey box" lifetime modeling. This indicates that machine learning (especially non-probabilistic machine learning) is the most popular data-driven method to combine with physics-based models, which will advance our understanding of battery aging.
Regarding the usage of physics-based battery lifetime models (Fig. 10(a)), P2D models and SPMs have become the most popular physical models for lithium-ion batteries, and the success of these models depends on an accurate understanding of the electrochemical properties of the battery. EM-based aging prediction has been a research hot spot in recent years, attracting increasing attention from academia and industry. Despite the greater complexity of EMs, EM-based BMS is regarded as an encouraging trend for future BMS as research advances [167]. Moreover, ECMs and semi-empirical models are the second most applied models. Their benefits include the ability to explain how external stressors affect aging and a minimal parameter complexity that is conducive to online applications. The drawback, however, is that they convey less physical insight than EMs when depicting nonlinear behavior under dynamic operating conditions.

Comparison
The work on battery lifetime prediction modeling that combines physics and data-driven methods, discussed in this part, covers a great deal of ground. Table 5 summarizes the distinctions by listing the synopsis, strengths, and weaknesses of the different methods. Different approaches can be chosen depending on the resources available and the problem-solving objectives.
Parameter identification uses data-driven methods to estimate the parameter values of physical models by regression against measured data. A high-performance physical model is necessary for accurate predictions, and suitable algorithms need to be chosen to balance computational resources and accuracy. In on-board applications, prediction is completely dominated by the physical model, and forecasting speed is also related to model complexity. Additionally, it is essential to ensure that these internal variables are represented adequately before they can be safely used in BMS applications.
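A minimal sketch of parameter identification is given below, assuming a hypothetical power-law capacity-loss model, loss = k * t^z, whose two parameters are estimated by regression. A closed-form least-squares fit in log-log space is swapped in here for the iterative optimizers used in the reviewed works, and the data are synthetic; the function name and all numbers are illustrative assumptions.

```python
import math

def fit_power_law(times, losses):
    """Least-squares fit of loss = k * t**z in log-log space.

    Taking logarithms gives ln(loss) = ln(k) + z * ln(t), a straight
    line whose slope and intercept follow from ordinary least squares.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(q) for q in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx
    k = math.exp(mean_y - z * mean_x)
    return k, z

# Synthetic "measurements" generated from known (hypothetical) parameters
true_k, true_z = 0.02, 0.5
times = [10.0, 50.0, 100.0, 200.0, 400.0]          # e.g. days
losses = [true_k * t ** true_z for t in times]      # fractional capacity loss

k_hat, z_hat = fit_power_law(times, losses)
```

With noise-free synthetic data the fit recovers the generating parameters; with real measurements the identified values depend on which parameters the data are actually sensitive to, which is why the adequacy of the physical model matters.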
The reduced-order physical model approach provides a simplified physical model while preserving accuracy. One route focuses on simplifying high-fidelity EMs, such as P2D or coupled EMs. Considering the complexity of EMs, decomposition methods are used to reduce expensive representations, and data-driven algorithms are then applied to the outputs of the ROMs. Moreover, machine learning appears to be a powerful tool for automatically identifying ROMs from the sub-models' descriptor library. However, a ROM requires more computation time if parameter values are altered, making it unsuitable for system applications.
Uncertainty quantification expresses the battery remaining useful life (RUL) as a probability distribution, describing the uncertainty due to measurement tolerances, parameter fluctuations, and cell-to-cell variations. Filtering, stochastic, or probabilistic machine learning methods are required. Another purpose is to describe parametric uncertainty through statistical theory. Quantifying the effects of these uncertainties is essential for reliable physics-based model prediction. However, prediction accuracy relies on priors and online evaluation, and dynamic training strategies are still inadequate.
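The idea of expressing RUL as a probability distribution can be sketched with a simple Monte Carlo propagation of parametric uncertainty. The sketch assumes a hypothetical power-law loss model, loss = k * n^z, with a Gaussian spread on k standing in for cell-to-cell variation; it is a plain sampling scheme rather than any of the filtering or probabilistic machine learning methods reviewed here, and all names and numbers are illustrative assumptions.

```python
import random
import statistics

def cycles_to_threshold(k, z, loss_limit):
    """Cycle count at which loss = k * n**z reaches loss_limit."""
    return (loss_limit / k) ** (1.0 / z)

def rul_samples(current_cycle, k_mean, k_std, z, loss_limit, n_samples, seed=0):
    """Draw RUL samples by propagating uncertainty in k through the model."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = []
    for _ in range(n_samples):
        k = rng.gauss(k_mean, k_std)
        if k <= 0.0:
            continue  # discard non-physical draws
        end_of_life = cycles_to_threshold(k, z, loss_limit)
        samples.append(max(end_of_life - current_cycle, 0.0))
    return samples

# Hypothetical cell: 200 cycles completed, end of life at 20 % capacity loss
samples = rul_samples(current_cycle=200, k_mean=0.01, k_std=0.001,
                      z=0.5, loss_limit=0.2, n_samples=5000)

# 19 cut points at 5 % steps: index 0 is the 5th percentile,
# index 9 the median, index 18 the 95th percentile
cuts = statistics.quantiles(samples, n=20)
p5, p50, p95 = cuts[0], cuts[9], cuts[18]
```

Reporting the interval from `p5` to `p95` alongside the median conveys how strongly the parameter spread widens the RUL estimate, which a single point prediction would hide.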
Physics-based data generation can output large quantities of computational data and reduce the cost of acquiring experimental observations. Machine learning techniques are trained on these data, or on a combination of experimental and computational data, to ensure a partially physics-constrained prediction result. This is a simple integration of physics and data-driven methods and demonstrates excellent performance in real-time state assessment and battery system cloud optimization. However, this method incurs a high computational cost in data generation. With more data from different applications, the accuracy of the method can be significantly improved.
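The workflow of generating training data from a physical model and then fitting a data-driven surrogate can be sketched as follows. The physical model is a hypothetical semi-empirical expression (Arrhenius factor times a power law in time), and an inverse-distance-weighted k-nearest-neighbour regressor stands in for the machine learning methods used in the reviewed works; the coefficients, grid, and function names are assumptions for illustration only.

```python
import math

R_GAS = 8.314462618  # universal gas constant, J/(mol K)

def physics_loss(temp_k, days, a=1.0e3, ea=3.0e4, z=0.5):
    """Hypothetical semi-empirical model: Arrhenius factor times power law."""
    return a * math.exp(-ea / (R_GAS * temp_k)) * days ** z

def generate_dataset(temps, durations):
    """Run the physical model over a grid of conditions to label training data."""
    return [((t, d), physics_loss(t, d)) for t in temps for d in durations]

def knn_predict(dataset, query, k=4, scales=(5.0, 10.0)):
    """Inverse-distance-weighted k-nearest-neighbour regression.

    scales normalises each feature by its grid spacing so that temperature
    and time contribute comparably to the distance.
    """
    def dist(x):
        return math.sqrt(sum(((xi - qi) / s) ** 2
                             for xi, qi, s in zip(x, query, scales)))
    nearest = sorted(dataset, key=lambda item: dist(item[0]))[:k]
    weights = [1.0 / (dist(x) + 1e-12) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

temps = [283.0 + 5.0 * i for i in range(9)]      # 283 K to 323 K
durations = [10.0 * i for i in range(1, 51)]     # 10 to 500 days
train = generate_dataset(temps, durations)

# Query a condition that is not on the training grid
prediction = knn_predict(train, (300.0, 205.0))
reference = physics_loss(300.0, 205.0)
```

The surrogate answers queries without re-running the physical model, at the price of the up-front cost of generating the grid, which mirrors the trade-off noted in the paragraph above.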
Physics-based residual learning captures unmodeled dynamics in physical models. Machine learning improves prediction accuracy by reducing the errors between observations and model outputs. The physical model is typically an electrochemical-based derivation to ensure solid physical consistency. This approach reduces the data requirements compared with pure data-driven methods. Although it is faster than existing complex physical models, a more straightforward model form is favorable for online applications, and the prediction speed is limited by model complexity.
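Residual learning can be illustrated with a minimal sketch in which a deliberately simplified physical model misses one degradation term and a data-driven model is fitted only to the mismatch. The square-root physical model, the extra linear term in the synthetic observations, and the straight-line residual learner are all assumptions chosen to keep the example small; the reviewed works use electrochemical models and machine learning regressors instead.

```python
def physics_model(cycle):
    """Simplified physical model: square-root fade only (hypothetical)."""
    return 0.004 * cycle ** 0.5

def observed_loss(cycle):
    """Synthetic 'measurements' with an extra linear term the physics omits."""
    return 0.004 * cycle ** 0.5 + 0.0001 * cycle

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Train the residual learner on the gap between observation and physics
cycles = [float(c) for c in range(50, 501, 50)]
residuals = [observed_loss(c) - physics_model(c) for c in cycles]
slope, intercept = fit_line(cycles, residuals)

def hybrid_model(cycle):
    """Physics prediction plus the learned residual correction."""
    return physics_model(cycle) + slope * cycle + intercept

# Compare both models on a cycle count outside the training range
test_cycle = 600.0
physics_error = abs(observed_loss(test_cycle) - physics_model(test_cycle))
hybrid_error = abs(observed_loss(test_cycle) - hybrid_model(test_cycle))
```

Because the learner only has to capture the residual rather than the full degradation curve, far fewer data points are needed than for a purely data-driven model of the same accuracy.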
The physics-based embedding approach takes the form of either a physics-informed machine learning architecture or physics-constrained machine learning. This method requires governing equations and a suitable algorithm (preferably an ANN). The physical model can feed some intermediate parameters into NNs. Meanwhile, machine learning can learn the nonlinear PDEs or ODEs so that the output is partially constrained by physical insights. Such connections can be alternated in a cycle until satisfactory results are obtained. This approach serves as a cornerstone for online applications. At the same time, more work needs to be done to lay the groundwork in this area.

Future perspectives
Physics-guided data generation is an important study area. Data can provide breakthrough technologies and powerful new forces to bridge experts from different disciplines. Experimental and simulation-based high-fidelity datasets with physical perspectives are in demand. Accelerated experimental datasets are an important basis for developing prediction methods in the battery field. Commonly cited examples are the battery data published by NASA [176] and CALCE [177], and the dataset of 124 commercial LFP/graphite cells under fast-charging scenarios published by MIT in 2019 [150]. Recently, Pozzato et al. [178] and Xia et al. [179] have also released experimental data from cells subjected to an EV discharge profile and a deep aging process. More comprehensive public datasets are encouraged.
The above trends emphasize the significance of datasets. On the one hand, the identification of reasonable accelerated experimental conditions and the investigation of standardized test procedures are worthy of continued development to minimize the test matrix and test costs. Thermal [180], mechanical [181], and other test instruments [182] can also be combined with electrical tests to enrich the data dimension. On the other hand, the generation of multi-physics simulation datasets [183,184] through physical mechanisms is also a valuable area that can support the study of underlying parameters that are not easily measured, to guide cell design optimization and iterative production.
Data-driven approaches led by physical models have achieved a lot and have become practical for improving accuracy. Looking ahead, long-term attention should be focused on physics-guided machine learning approaches to prognostics. As biology embraces data-driven algorithms, machine learning has emerged as the most promising tool. In physically based high-dimensional models [185,186], physics-guided machine learning can estimate the parametric functional form, which will improve the accuracy of the model compared to standard LS or other optimal regression algorithms. Introducing more physical crossover factors supports degradation prediction, but requires the PDEs to be solved numerically or approximately.
The PINN method [187], a set of deep learning algorithms for seamlessly integrating data and extracting mathematical operators, can solve for the spatial derivatives of these fields in the PDEs and the boundary condition residuals by embedding multi-physics field loss functions in the NN loss function. Considering the complex dynamic degradation of lithium-ion batteries in EV applications, PINNs with EM models or some principal equations (Butler-Volmer, conservation laws, etc.) would be a promising solution in the future.
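For orientation, the composite loss that a PINN typically minimizes can be written in a generic form. The notation below is an assumption introduced here and is not taken from [187]: $u_\theta$ is the network output with weights $\theta$, $\mathcal{N}[\cdot]$ is the governing PDE operator, $\mathcal{B}[\cdot]$ is the boundary-condition operator, and $\lambda_d$, $\lambda_r$, $\lambda_b$ are user-chosen weights.

$$
\mathcal{L}(\theta) =
\frac{\lambda_d}{N_d} \sum_{i=1}^{N_d} \left| u_\theta(x_i, t_i) - u_i \right|^2
+ \frac{\lambda_r}{N_r} \sum_{j=1}^{N_r} \left| \mathcal{N}[u_\theta](x_j, t_j) \right|^2
+ \frac{\lambda_b}{N_b} \sum_{k=1}^{N_b} \left| \mathcal{B}[u_\theta](x_k, t_k) \right|^2
$$

The first term fits the $N_d$ measured data points, the second penalizes the PDE residual at $N_r$ collocation points, and the third enforces the boundary conditions at $N_b$ boundary points. For a battery model, $\mathcal{N}$ could stand for a conservation law and $\mathcal{B}$ for a Butler-Volmer flux condition at the particle surface, although the specific choice depends on the EM being embedded.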
With neural networks studied for their ability to incorporate physical concepts well [188,189], the development of physics-based neural network architectures that can adapt to changes in physical correctness or quality of training data is promising. We expect experts from computing, physics, mathematics, chemistry, etc., to work together to make this happen. On-board prediction for BMS is another focus [190]. The ideal model needs to be constantly updated and optimized by combining battery design, manufacturing factors, historical usage data, and online monitoring data to form a closed-loop update mechanism and achieve a wide range of applications in electric vehicles. We want to test the model under real load conditions in EVs and extend the physics-based neural network model to more powerful components to build a complete hybrid model that is useful not only for predicting the end of discharge but also for fault detection and isolation within the EV system.
In practice, however, blending physics-based and data-driven techniques into a single accurate model has its challenges, which relate to identifying the merging point. Which physical model can we choose to inject into machine learning networks? How should the optimal machine learning method and its architecture be chosen to avoid overfitting or underfitting? Which form of structural embedding can fully exploit the guidance and constraints of physics-based models, while allowing machine learning to flexibly track the aging trajectory and give accurate predictions? These hopes and challenges will inspire life prediction and troubleshooting of lithium-ion batteries to go even further.

Conclusions
Battery lifetime modeling deals with a nonlinear and time-varying process. Accurate lifetime assessment is a hot but challenging topic in the battery field. Moving from a single model to a hybrid physical and data-driven prediction approach improves the generalization and accuracy of battery aging prediction and can solve many of the issues of purely physical or purely data-driven approaches.
This review gives a systematic overview of battery lifetime modeling combining physics and data-driven methods on the basis of 190 related papers. Three physics-based battery models are introduced, and these models' requirements and application features are presented. Through parameter identification, reduced-order models, and uncertainty quantification, data-driven methods can assist physical models in obtaining results closer to observations. Constraining and feeding data-driven algorithms via physical equation fusion significantly increases confidence in the results while reducing the training data requirements. Across both approaches, the growing role of electrochemical models is noticeable, with a share of more than 50 % in the physical part of "grey box" modeling options. At the same time, the 78 % share of machine learning demonstrates its better predictive power compared to other data-driven methods. To develop a highly sophisticated life model that describes the battery aging phenomenon, the temporal and spatial complexity of electrochemical processes needs to be considered simultaneously. Therefore, developing physics-based models is an ongoing required effort. Furthermore, open-source application data covering multiple conditions are expected. Finally, deriving physical explanations to inject into data-driven lifetime predictions will help guide accurate lifetime prediction and safe battery operation. We believe that using physics-guided machine learning, such as PINNs or physics-informed machine learning architectures built on EMs, to predict battery degradation is very promising.
In addition to building high-fidelity models, implementing and updating model prediction capabilities by integrating models into the BMS is an expected development direction. We hope to inspire more researchers to keep enhancing the online application of "grey box" lifetime modeling.

Fig. 1. Schematic of the P2D model and governing equations. A lithium-ion battery consists of current collectors, an anode, a separator, a cathode, and an electrolyte. The electrochemical behavior in the cell is calculated in two dimensions: the direction of the sandwich stack thickness and the direction of the particle radius.

Fig. 3. Schematic of semi-empirical models, experiments, and reactions. Lithium-ion battery lifetime follows the Arrhenius law [66] and a power law, as determined through accelerated degradation tests. The power-law coefficient of time depends on the SEI growth reaction. The SEI growth schematic is reproduced from Ref. [67].

Fig. 4. Parameter estimation flowchart. The physical model carries out predictions using nominal parameter values. Sensitivity analysis reduces the dimension of the parameter space and identifies which parameters are sensitive. Data-driven methods evaluate the prediction error compared with experimental measurements, update the parameter estimates, and iteratively optimize until the error is less than the tolerance to obtain the optimal parameter values and battery life prediction results. Yellow boxes indicate physical models and physics-based predictions. Blue boxes represent data-driven assessments. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Flowcharts of two types of reduced-order models. (a) The EM is reduced using four numerical operations to obtain the ROMs, and the predictions are obtained with data-driven assistance. (b) The ROM descriptor library is extracted from battery measurement data, and data-driven identification of optimal local and global ROM models is used to predict battery life. The green box represents test data. The white box denotes the intermediate process of model prediction. Blue indicates data-driven prediction, and the white cube on the yellow side indicates physics-related intermediate processes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Uncertainty quantification workflow. The physical model uses measurement data to estimate the battery state. Data-driven methods can be adopted to track predictions and update the state estimate. Uncertainty quantification consists of parametric uncertainty and RUL distribution results. Green indicates measurement data. Yellow represents physical models. Blue shows the data-driven related process. Red denotes the threshold. White boxes in blue indicate predicted results. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Physics-based data generation schematic. Physics-based models can be adopted for generating training data. Machine learning methods are trained on data obtained from physics-based models and from physical experiments with electrochemical significance to acquire a mapping between measured features and predicted outcomes. The cubes indicate the input feature X, physics solution S, and output prediction Y. The yellow box represents the physical model, and the blue box denotes machine learning methods. The green box refers to physical experiment data. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 9. Two types of physics-based embeddings. (a) The physics-based model takes input features and feeds its output to a machine learning method. The machine learning method uses the original input features and the output of the physics-based model as inputs, and outputs the predicted battery life. (b) Machine learning replaces a part of the physical model, while the physics-based model is used to constrain the machine learning method to obtain a physically meaningful mapping between the features and the prediction results. The cubes indicate the input feature X, intermediate vector, and output prediction Y. The blue and yellow boxes refer to the machine learning methods and the physics-based model, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. Publication trends of the literature reviewed in this paper. (a) Trends of the three physical models reviewed in Section 2 and of the two categories of 'grey box' lifetime modeling reviewed in Section 3. (b) The application percentages of different data-driven models used in the hybrid approaches in Section 3.

Table 1
Summary of published literature related to integration of physics and data-driven methods.
W.Guo et al.

Table 2
Governing equations of the P2D model.

Table 3
Different dimensional model characteristics.

Relationship between ECM and impedance spectrum. The ECM consists of ohmic resistance, n-RQ elements, and an RQ-Warburg element. The EIS result corresponds to the respective ECM components. Warburg elements can be classified as Finite Space Warburg (FSW) elements and Finite Length Warburg (FLW) elements. [Figure labels: multi-layer surface film; charge transfer process; solid-state diffusion; electrolyte, separator, current collector, and contact resistance; frequency ranges: middle to low, middle to high, and high.]

Table 4
Summary of requirements and functional features from different physics-based modeling.The left column corresponds to the model types described earlier.
▪ Understanding of power-law relation with time, Arrhenius kinetics, and accelerated tests; preferably some physical insights corresponding to models
▪ Accelerated degradation tests (cycling and calendar aging profiles)
▪ Electrical analysis (capacity, power, or resistance [76])
▪ Heavy sets of experimental research before modeling
▪ Speedy computing capability
▪ Simple to implement online
▪ Can lead to significant errors [71] unless combined with other physical models [88]

Table 5
Comparison of different methods regarding description, needs, advantages and limitations.