A comprehensive approach to sparse identification of linear parameter-varying models for lithium-ion batteries using improved experimental design

This paper proposes a comprehensive approach to the identification of battery models in the linear parameter-varying (LPV) framework inspired by the equivalent-circuit model structures. The proposed LPV model structure is formulated using the input–output representation, wherein the model parameters are considered to depend on the state-of-charge, the current magnitude, and the current direction. The aforementioned dependence is explained using a suitable set of basis functions motivated with appropriate physical insight. Furthermore, the optimal experimental design problem is discussed to propose an improved input design for the model identification procedure. Subsequently, the Least Absolute Shrinkage and Selection Operator (LASSO) along with the ridge regression are employed for the selection of significant model terms and parameter estimation to yield a sparse model with adequate simulation capabilities. Finally, several battery models with varying model order and basis-function complexity are identified, which are subsequently validated and compared using a real drive-cycle dataset. The corresponding voltage simulation results yield the root-mean-squared error (RMSE) value for the best-performing model to be around 24.5 mV when a 1 A h NMC battery gets discharged with the lower voltage cutoff as low


Introduction
Owing to the increased global focus on reducing carbon emissions, the electrification of various transport systems using lithium-ion batteries has been at the forefront of recent research.To meet the desired safety and performance criteria in such battery-powered systems, battery management systems (BMS) are employed to accurately describe the nonlinear battery behaviour using appropriate algorithms given a particular set of measurements, such as current, voltage, and temperature.Quite generally, model-based techniques have gained much traction in understanding various battery behaviours under dynamic operating conditions.The commonly studied battery models can be mainly classified into physics-based analytical models and data-driven empirical models.The physics-based models utilize mathematical formulations of the electrochemistry and thermodynamics principles, involving a system of partial differential equations.A few examples of such models include (i) the Doyle-Fuller-Newman (DFN) model [1] based on porous-electrode theory, (ii) Single-Particle Model (SPM) [2] which is a simplification of the DFN model, (iii) lattice-gas models [3,4], and (iv) the models based on the theory of nonequilibrium thermodynamics [5,6].Although the physics-based models can accurately physical processes, rendering the resulting ECM physically insightful.For instance, an ECM can be identified using the so-called theory of the distribution of relaxation times (DRT) along with the Electrochemical Impedance Spectroscopy (EIS) measurements [11].Reportedly, the ECM parameters can then represent the battery internal resistance, the SEI layer, the charge-transfer kinetics, and the diffusion processes inside the battery.In addition, an application-specific approach to the identification of an ECM also exists, which involves experimental data representing a subset of the battery dynamics in a limited frequency range, for instance, related to a drive-cycle application [12].The estimated parameters of such an ECM become less representative of the underlying physical processes as the size of the chosen subset of the battery dynamics shrinks.Nevertheless, such ECMs have been the preferred choice from the practical perspective due to their adequate accuracy, ease of parameterization, and reduced computational complexity.
Primordially, the ECMs have been represented within a linear timeinvariant (LTI) framework, wherein the model parameters remain constant during the battery operation [13][14][15].Since the battery is better described as a nonlinear system, the LTI framework cannot capture the time-varying dynamics, even if a higher-order ECM is chosen [16].To mitigate several issues with the LTI framework, an intermediate paradigm between the LTI and the nonlinear systems has been presented in the literature giving rise to a class of linear parameter-varying (LPV) models [17].Essentially, the nonlinear battery dynamics are approximated by a measurable time-varying signal -referred to as the scheduling variable -over the entire battery operating regime.Such mathematical description leads to the time-varying but linear relations between the inputs and the outputs of the battery, thereby allowing better generalizability for the LPV-ECMs than their LTI counterparts.
A brief literature review of the ECMs suggests an increased focus on the incorporation of unique battery characteristics in the prescribed LPV models [18][19][20], such as the state-of-charge (SOC) dependence, the current-direction dependence, and the temperature dependence of the model parameters.Such an approach effectively provides additional degrees of freedom to capture the complex battery dynamics.Yet the question of generalizability persists whether the adopted model identification approach can lead to better approximative models of the battery, thereby enhancing the model applicability over a wide range of the intended battery operation.Often, the principle of parsimony gets adopted while modelling various physical processes based on the presumption that nature admits simplistic descriptions in general [21].Accordingly, a parsimonious or sparse model gets preferred over a complex model even if the latter performs slightly better than the former [22].Furthermore, the quest to obtain sparse models relies profoundly on the informativity of the experimental datasets used for the model identification.Remarkably, the battery literature is devoid of any work that focuses on the identification of sparse battery models while emphasizing informative identification experiments.As such, there is a need to emphasize a proper structure while prescribing novel battery models along with informative experiments and efficient model identification procedures to obtain generalizable models with adequate prediction (or simulation) capabilities.
In this paper, we demonstrate a comprehensive approach to identifying a sparse battery model by extending the equivalent-circuit model structure in the LTI framework to the LPV framework.Firstly, we motivate the dependence of various model components on the SOC, the current magnitude and the current direction in the form of the so-called scheduling variable.Secondly, the functional form of the scheduling-variable dependence has been explored using a set of suitable basis functions with appropriate physical insight.Moreover, we briefly discuss the required informativity of the experimental datasets used in the model identification procedure and subsequently propose a suitable design for the input current profile.Next, we employ the Least Absolute Shrinkage and Selection Operator (LASSO) and the ridge regression for the selection of significant model terms and the parameter estimation.Finally, we validate several models with varying model orders and basis-function complexities using a real drive-cycle dataset and compare the respective voltage simulation performances.
This paper has been organized as follows.Section 2 provides preliminaries regarding common linear parameter-varying system representations.Section 3 introduces the proposed battery model structure in the LPV framework.Section 4 motivates a suitable input signal design from the perspective of the optimal experimental design problem.Section 5 describes a model selection and estimation procedure to yield a sparse battery model.Section 6 presents the conditions regarding the battery experiments presented in this paper.Section 7 discusses the obtained results regarding the identified battery models.Section 8 concludes the work presented herein.

LPV model representations
This section gives a brief introduction to the two commonly studied model representations of linear parameter-varying systems, namely the state-space and the input-output representations.The respective subsections introduce some notation required in the context of this paper, which has been presented in a similar fashion as in the literature related to the linear parameter-varying systems (see [17] for a detailed presentation).

LPV state-space representation
In contrast to linear time-invariant (LTI) systems, the state-space representation of a linear parameter-varying (LPV) system introduces a so-called scheduling variable in the system matrices.More formally, the discrete-time LPV state-space representation of a system in a singleinput single-output (SISO) setting can be given by where  ∶ Z → R   is the state vector,  ∶ Z → R the input,  ∶ Z → R the output,  ∶ Z → P ⊆ R   the scheduling variable,  ∈ Z the discrete time instant, and (, , , ) the collection of -dependent matrices which represent a state-space realization of the LPV system.Note that the notation (□ ⋄ )  corresponds to having an arbitrary functional dependence on the variable  at different time instants, e.g., ( ⋄ )  =  (… ,  −1 ,   ,  +1 , … ).

LPV input-output representation
In addition to the state-space representation, the discrete-time LPV representation of a system can be given using a difference equation relating only the system inputs and the outputs where

Proposed model structure
The proposed battery model structure decomposes the battery terminal voltage  into (1) an a priori known component function  EMF where  is the battery SOC,  the total battery overpotential,  the battery input current, and  the scheduling variable.

Motivation behind model components
The above model equation can be motivated using the physical intuition behind the component functions  EMF , and  , which would also facilitate the direct formulation of the input-output representation of an equivalent-circuit model.As mentioned earlier,  EMF represents the battery EMF which, in this paper, has been considered to be a static nonlinear function depending only on the battery SOC , and assumed to be available a priori through various techniques [23], such as Galvanostatic Intermittent Titration Technique (GITT), extrapolation to zero current, or low-current cycling.More precisely, three trivial choices for the function  EMF exist, namely the charge  chg EMF , the discharge  dchg EMF , or the average  avg EMF EMF function; the latter being the average of the first two functions.In this paper,  avg EMF has been chosen to represent the battery EMF assuming negligible hysteresis, which has been obtained using the GITT experiments in both the charge and the discharge directions (see Fig. 1).
The component function  essentially represents the overpotentials associated mainly with the battery dynamics due to (i) the mass transport (diffusion) limitations inside the battery resulting in a slow overpotential, (ii) the battery internal ohmic resistance resulting in a fast overpotential, and (iii) the charge-transfer resistance at the electrode-electrolyte interface resulting in an intermediate overpotential compared to the first two overpotentials.Note that the term slow (respectively fast ) refers to a physical process having the characteristic time constant sufficiently larger (respectively smaller) than the model sampling time.For most practical purposes, such as during the drive-cycle application, it suffices to consider a finite-order difference equation as given by (2) to approximate the underlying overpotentials, namely where  is the order of the difference equation, and   ,   the parameters of the difference equation.Note that the explicit -dependence of the parameters (  ⋄ ), (  ⋄ ) has been omitted for the sake of readability.An interesting idea is to describe the functional dependence of the parameters using the SOC , the magnitude of the input current ||, and the direction of the input current .Such an argument can be easily motivated by the reported dependence of the model parameters on the aforementioned variables [24][25][26].Note that the dependence of battery behaviour on the temperature and hysteresis phenomenon can also be incorporated into the model structure using appropriate experimental design and basis-function selection, but has not been considered in this paper for the sake of simplicity; see [27] for strategies to incorporate the hysteresis phenomenon into the proposed model structure.

Scheduling-variable dependence
The SOC , the current magnitude ||, and the current direction  dependence introduced in the model component function  can be combined into a single variable -the so-called scheduling variable.Formally, the scheduling variable  can be defined as a three-dimensional vector  ∶ Z → P ⊂ R 3 given by where the scheduling space P is a subset of R 3 , namely The corresponding SOC trajectory   can be given as a solution to the following widely accepted coulomb-counting difference equation in the literature, namely where  corresponds to the model sampling time, and  the battery capacity.Moreover, the trajectory   can be obtained by defining  using the sign of the input current through a first-order difference equation as given by and sgn(⋅) corresponds to the signum function.In addition,  ∶ R → E ⊆ R can be considered as a hybrid parameter which takes on different values depending on the current   , where has been used to denote a specific current-direction trajectory for the given values of  0 and  1 .For instance, the values  0 = 0 and  1 = 1 yield  [0,1]  = sgn(  ) when the current is nonzero, and −1 otherwise.In effect,  can be considered to have a memory such that it retains its previous value when the current becomes zero.Another interesting case can correspond to choosing  0 and  1 close but not equal to 0 and 1, respectively.In such a case,  has two notable properties, namely (i) it has a finite memory so that the value of  slowly decays to zero when the current becomes zero, which can model extremely slow relaxation effects inside the battery, and (ii) it can be considered as a continuous analogue of the otherwise discontinuous sgn(⋅) function when the current is nonzero, which effectively renders the trajectories of the scheduling variable  in the scheduling space continuous.Simply put, the model parameters change their values ''smoothly'' when the current direction is changed, which is based on the practical assumption that the real physical processes are continuous.

Input-output representation
Quite generally, the static nature of the  EMF function in (3) gets exploited in the literature to explain the battery terminal voltage by modelling the battery overpotentials separately.Concretely, where   represents the total effective overpotential at time instant , which can be regarded as the output of the overpotential model, and  the model order.In addition, the equivalent-circuit models for battery overpotentials have been ubiquitously realized using the modal state-space representation, wherein the -matrix takes the diagonal form, and the state vector  represents the collection of voltages across the resistor-capacitor networks.In this paper, the input-output model representation has been readily adopted as the starting point by noting the fact that an th-order state-space representation of an equivalent-circuit model in the LTI framework can be transformed into an th-order difference (or differential) equation of the form given by (4) using transfer-function-based approaches (see [28] as an example).Now, the scheduling-variable dependence discussed previously can be formally incorporated in the model structure by rendering the parameters {  }  =1 , {  }  =0 in (4) as functions of the variable .In this regard, several possibilities exist to introduce the -dependence of model parameters in the LTI framework, thereby yielding LPV model structures that can involve varying degrees of mathematical complexity but with little practical difference in the present context.Accordingly, the socalled shifted-form input-output representation will be considered in this paper, which incorporates a special dynamic -dependence as can be given by It is worth mentioning that the chosen shifted-form for the LPV inputoutput representation in ( 9) simplifies the derivation of the equivalent LPV state-space representation for an arbitrary model order , which otherwise may become intractable as the model order is increased (see [29] for an introduction to LPV equivalence transformations).

State-space representation
Even though the proposed methodology does not rely on the statespace representation for model identification, it may be beneficial to present the equivalent state-space representation of the proposed model structure for the sake of completeness.A possible application of such representation can be given during the design of an extended Kalman Filter for the estimation of the battery SOC [30].Accordingly, the statespace representation of the shifted-form input-output representation in ( 9) can be obtained using the so-called cut-and-shift procedure as detailed in [31].The resulting state-space representation of the proposed model can be given by the matrix quadruple (, , , ) as follows where the special form of the dynamic -dependence considered in the input-output representation gets converted into the static dependence in the state-space representation.Moreover, since the above state-space representation is not in the modal (diagonal) form, the corresponding state vector does not directly represent the collection of voltages across the resistor-capacitor networks of an ECM.An important point to note here is that a physical interpretation of the identified resistors and capacitors can be quite misleading since there do not exist any established guarantees that such resistors and capacitors would not take ''physically impossible'' values during a black-box model identification procedure.Nevertheless, the existence of such a diagonal form can reveal the analytical relations describing the functional dependence of the corresponding resistors and capacitors on the scheduling variable .More formally, given the state-space representation (, , , ), the corresponding modal representation requires the formulation of the so-called dynamic eigenvalue problem [32] involving a possibly parameter-varying transformation matrix ( ⋄ )  as given by (, , , ) where the diagonalized matrix  should contain elements on the diagonal as real-valued functions, which can then be related to the resistor-capacitor networks of an LPV-ECM.Unfortunately, the above problem formulation requires diagonalization of a time-varying matrix, which is a non-trivial task, and therefore has been considered out of the scope of this paper.

Functional dependence of model parameters
Even though the -dependence of the model parameters {  } and {  } has been well recognized using either experimental observations or appropriate physical reasoning (see Section 3.2), the exact form of the functional dependence remains unknown.This inherently constitutes a black-box identification problem.In effect, the parameters {  } and {  } can then be expanded in a completely flexible manner by allowing numerously many possibilities for the so-called basis functions, namely polynomials, exponentials, (hyperbolic) trigonometric functions, and so on.Often, it is advisable to blend some physical reasoning with the choice of the basis functions to avoid the black-box approach and rather adopt a somewhat dark-greyish1 approach.Fortunately, some physical insight can be given in this case to relate the behaviour of the model parameters with the individual components of , that is, the SOC , the current magnitude ||, and the current direction .Accordingly, the basis function expansion in terms of each variable can be motivated as follows.
SOC .Generally, the LPV battery models proposed in the literature assume polynomial basis functions to explain the SOCdependence of the model parameters.A bit of careful experimental observations suggests that certain battery parameters, such as the internal resistance, may exhibit asymptotic blow-up behaviour as the SOC approaches zero.Therefore, the considered basis functions should exhibit a slope approaching infinity as the SOC approaches zero to model such behaviour -a property that the polynomials do not possess.Consequently, the basis function expansion in terms of polynomials would require several higherorder polynomial terms to model the battery behaviours at low SOCs.To overcome this issue, an extended class of polynomials referred to as fractional polynomials [34], is considered in this paper, which can include the terms containing fractional and negative exponents.More formally, the family of fractional polynomials can be given by {… ,  − 1 ,  − 2 , ln(),   3 ,   4 , …} where   ∈ R. The above set can be significantly restricted to only a few terms, namely { −1 , ln(), }, to essentially simplify the model identification problem.Such a restriction can be motivated using a hopeful argument that the inclusion of the  −1 term would lead to the generation of sufficiently representative candidate model terms, thereby allowing sparse model identification.Current magnitude ||.In a similar fashion as the SOC, it can be argued that the model parameters may exhibit increased nonlinear behaviour as the current magnitude is increased.Again, a careful examination of various function classes suggests that the introduction of exponential terms may result in a more parsimonious representation than using only the polynomial terms.For instance, using a term ||  for  ≥ 1 as a possible basis function would result in a candidate model term ||   which is less than  for the values of  between 0 and 1.Consequently, at least two terms are needed to explain the nonlinear behaviour related to higher current values, for instance,  + || for some values of ,  ∈ R. Ideally, a single candidate term that behaves linearly for smaller current values and increases nonlinearly for higher current values may be more representative.Such a functional behaviour can be achieved by considering a term of the form  ||  as a basis function, which would result in a candidate term  ||   possessing the desired behaviour, that is,  ||   >  for  = (0, ∞] and ,  > 0. Now, it can be easily examined that for  =  = 1, (1) the function  ||  exhibits severe nonlinear growth, and (2) the range of  for which the function can be considered practically linear is quite small, which may not be sufficient to explain the linear battery operating regime.A possible solution can be given by using a comparatively less nonlinear function, such as   √ || , where  acts as the normalization factor such that the linear battery operating regime can be matched with that of the function itself.Accordingly, a suitable basis function set chosen in this paper can be given by {  √ || } with  = 1  20 to explain the current-magnitude dependence of the model parameters.Note that when the battery models are expected to explain severe nonlinear behaviour, which can be the case for an aged battery, the functions such as  ||  with  ≥ 1 may serve as appropriate basis-function candidates.Current direction .A relatively straightforward approach has been adopted to explain the current-direction dependence of the model parameters, that is, only the linear terms of the form  [ 0 , 1 ] have been studied.In principle, a two-dimensional manifold can be searched to generate the -trajectories for various values of  0 and  1 .However, only the values ( 0 ,  1 ) = (0.01, 0.99) have been considered herein to approximate the obvious direction-dependent battery behaviour.Accordingly, the basis function set can be given by { [0.01,0.99]}.
Once the suitable basis functions have been selected for each component of the scheduling variable, an extra step can be taken to generate the combinations of the individual functions with replacement.More formally, considering  as the nonlinearity order and taking an arbitrary value  = 3, a dictionary (set) of basis functions which includes the combinations can be given by where the cardinality || =  represents the number of elements in .
Note that the redundant combinations in  can be discarded, such as  −1 ,  −1 ln(), and so on.Furthermore, the combinations containing multiple -trajectories have also been ignored for the sake of simplicity.
It can be noted that even a quite conservative selection of the individual basis functions can lead to a massive dictionary of the possible basis functions when the cross-terms (or combinations) are allowed.Finally, the parameters {  } and {  } can be expanded in terms of the basis functions as given by where {  } and {  } represent the to-be-determined free model parameters, and  () the th element in .
Remark.The incorporation of physical insight during the formulation of the dictionary  reduces the flexibility (or complexity) of the model class through the restrictive inclusion of certain basis functions, thereby rendering the black-box identification setting into dark-greyish.As pointed out in [33], system identification is not just a science but an art of solving real problems, and using adequate physical insights ''to shrink the model set as much as possible" may protect against the failure of a model identification attempt.

Optimal experimental design
During the model identification procedure, an important concern related to designing optimal experiments arises so that the quality of the resulting parameter estimate can be maximized.Several design variables can be considered while formulating the optimal experimental design problem, which include the input frequency spectrum, the sampling interval, and the size of the collected dataset [35].In this paper, an analytically motivated suggestion presented in [36] (see Result 4.1) has been readily adopted as the guiding principle, which advocates for the design of an identification experiment to resemble the intended model application ''as much as possible''.Herein, the problem has been confined specifically to designing an input signal with appropriate frequency-domain and time-domain properties, while considering the model application as in a practical battery management system.
The explicit realization of the desired input signal for the identification experiment can be motivated by formulating several arguments based on the frequency spectrum and the amplitude distribution of a typical drive cycle.Ideally, the input signal should appropriately excite all the battery dynamics corresponding to the expected frequencies during a practical drive cycle at each point in the scheduling space.From the time-domain perspective, since the practical drive cycles involve varying current amplitudes, the amplitude distribution of the input signal should resemble that of the drive cycle to capture the nonlinear effects to the relevant extent.Likewise, the input signal should also contain sufficiently large regions of zero current (  ∼ 0) to explore the necessary diffusive dynamics inside the battery.It may be worth mentioning that even though some works have quite rightly advocated against the use of pulse current tests during the model identification intended for practical BMS applications [16,37], the absence of appropriate zero-current regions in the input signal can suppress the relevant dynamics during the identification experiment, thereby affecting the quality of the parameter estimate.
In light of the above statements, a possible input signal design can be given using a rectangular pulse train (RPT) , 0 with period  and zero-input regions of length  0 as follows where  represents the stochastic input signal having the following characteristics.Namely, (i) the initial amplitude of the pulse train can be specified using a fixed value  0 , (ii) the time period of each pulse in the pulse train can be given by a random variable sampled from a normal distribution  ∼  (  ,   ) such that the frequency spectrum of sufficiently many pulses resembles that of a practical drive cycle, (iii) a zero-current region of length given by a random variable  0 ∼  (  0 ,   0 ) can be inserted following each phase of the pulses in the pulse train, (iv) a zero-mean random signal  ∼  (0,   ) can be superimposed on the pulse train to excite the high-frequency dynamics, (v) a fixed DC offset  can be introduced so that the pulse train effectively discharges or charges the battery, thereby spanning the entire SOC range during the identification experiment, and (vi) the amplitude of the final signal can be modified by a random variable  ∼  ( − ,  + ) to match the resulting amplitude distribution with that of the drive cycle.Note that the random signal  and the offset  become piecewise zero during the zero-current regions, thereby not affecting the rest period following each phase of the pulses in the pulse train.Moreover, from the practical perspective, the offset  should be small enough so that the battery experiences a sufficient number of pulses for an SOC range wherein the model parameters {  }, {  } as functions of only the SOC can be regarded constant.Admittedly, such nonanalytical arguments cannot lead to a unique optimal input signal, and several other signal designs can be motivated, for instance, Nevertheless, this paper demonstrates the utility of the signal  as it leads to adequate identification results, which will be discussed in Section 7.

Model selection and estimation
Given an informative experimental dataset and a set of suitable basis functions  as given in (12), the subsequent steps in a model identification procedure can be given by the model selection and the estimation.The former step includes the model-order selection as well as the selection of the significant model terms -also referred to as the variable selection, feature selection, or support recovery [38].Indeed, the model selection procedure becomes redundant when the model structure is known a priori, that is, in the case of white-box or lightgreyish identification settings where appropriate physical insights are available.However, in the case of black-box identification, the modelselection step becomes important to ensure only the relevant terms in the model structure while respecting the principle of parsimony.In the following, the model selection problem will be limited to variable selection while assuming the model order  to be given.
Various methods have been proposed in the literature to tackle the variable selection problem, which include (i) the exhaustive subset selection methods which identify the best subset by comparing all possible combinations of the candidate model terms using a suitable quantitative measure, such as the Akaike or the Bayesian information criterion, (ii) the stepwise selection methods which iteratively add the significant terms or drop the insignificant terms from the model structure based on some quantitative measure, and (iii) the shrinkage methods which solve a penalized regression problem forcing the estimated parameters corresponding to the insignificant terms towards zero.Understandably, the best subset selection approach quickly becomes intractable as the number of candidate model terms increases.Furthermore, it has been reported that the shrinkage methods, such as the Least Absolute Shrinkage and Selection Operator (LASSO), may parallel with or even outperform the subset selection and the stepwise selection methods for variable selection [39,40].In this paper, the LASSO has been employed, which shrinks the parameter vector such that several parameters turn out to be exactly zero, thereby performing the desired variable selection automatically.Accordingly, the inputoutput representation in ( 9) can be used to consider a linear regression setting as given by, where  ∈ R ×1 represents the collection of the output measurements {  }  =1 ,  ∈ R × the collection of the regression vectors {  }  =1 ,  ∈ R  the unknown parameter vector, and  ∈ R ×1 the error component.In addition, the regression vector   ∶ Z → R  contains all the candidate model terms which can be generated using the set , that is, where ⊗ denotes the Kronecker product,   ∶ Z → R 2+1 represents the vector containing the output and the input terms,   ∶ Z → R  the vector containing the basis functions, and  = (2 + 1).Furthermore, the parameter vector  can be expressed in terms of the free model parameters as given by where {  }, {  } correspond to the free model parameters.Accordingly, the variable selection involves filtering out the columns of the matrix  corresponding to the insignificant model terms, which can be carried out using LASSO resulting in several parameters   and   in (13) to be exactly zero.More formally, the LASSO regression can be given by where  1 is the tuning parameter controlling the amount of shrinkage (or the regularization) via the  1 -norm, and θLASSO the LASSO estimate.Note that a suitable value of the parameter  1 can be determined using the -fold cross-validation approach [41].Furthermore, given  ′ <  nonzero values in the obtained LASSO estimate θLASSO , the corresponding  ′ model terms can be regarded as the selected model terms.Unfortunately, several issues have been reported which render the LASSO estimate as unsuited for the final model estimate, namely (1) the selection of noisy terms in the model structure [42], and (2) the large bias in the estimate θLASSO due to constant shrinkage, which can affect the model prediction capabilities [43].Accordingly, several LASSO variants have been proposed in the literature which augment the original LASSO with a second estimator, such as relaxed LASSO [42], adaptive LASSO [44], LARS-OLS hybrid [45], and LASSO-Ridge [46].Herein, the ridge regression has been carried out following LASSO to mitigate the aforementioned issues, which has then been considered to yield the final model estimate.Formally, a filtered regression vector  ′  ∶ Z → R  ′ can be constructed using the selected model terms leading to the following model estimate, where  2 is the tuning parameter controlling the amount of shrinkage via the  2 -norm,  ′ ∈ R × ′ the collection of the regression vectors { ′  }  =1 , and θRidge the final model estimate.Note that the parameter  2 can also be determined similarly to  1 using the cross-validation approach.

Experimental setup
In this paper, a non-commercial lithium-ion NMC/graphite pouchshaped battery cell with 1 Ah rated capacity has been studied.The battery was kept in an isothermal environment at 25 • C using an Espec SU-642 environmental chamber having temperature control of 0.1 • C. In addition, the battery was connected simultaneously with a Delta Elektronika SM 30-200 power supply and an H&H ZS electronic load to either charge or discharge the battery, respectively.Furthermore, GITT experiments to characterize the battery EMF function  EMF were carried out using 0.1 C current pulses corresponding to roughly 1% SOC followed by adaptive rest periods, such that the variation in the battery terminal voltage  at the end of each rest period was  < 1 mV/h.Moreover, the upper and lower voltage cutoff limits were set at 4.2 V and 2.5 V, respectively, during the application of both the identification as well as the test-input current profile.Finally, the battery current and voltage measurements were recorded with the sampling time  = 1 s.

Results and discussion
Several battery models with varying nonlinearity order  and model order  have been identified to generate a two-dimensional grid of the battery models  , with the corresponding values ,  ∈ {1, 2, 3, 4}.Fig. 2 shows the current profile used to identify the models  , , which can be considered as a realization of the input signal design  given in (14) with the associated parameters given as  0 = 0.4 A,   = 64 s,   = 32 s,   0 = 16 s,   0 = 8 s,   = 0.1 A,  + = 1,  − = 0.1, and  = −0.2 A. Note that the parameters have been chosen roughly keeping in mind the specifications of the studied battery and the arguments presented in Section 4. Subsequently, the voltage simulation capabilities of the identified models have been compared using a test dataset comprising of a real drive-cycle current profile as shown in Fig. 3. Table 1 summarizes the obtained results quantitatively using the root-meansquare error (RMSE) and the mean absolute error (MAE) corresponding to each model.In addition, the RMSE and MAE values for certain models { 2,1 ,  2,2 ,  3,3 ,  3,4 ,  4,3 ,  4,4 } have been highlighted in Table 1, whose corresponding simulated voltage curves are shown in Fig. 4. Note that the grey regions in Fig. 4 represent severely nonlinear battery operating regimes since the lower cutoff voltage limit had been set purposefully lower than usual to demonstrate the usefulness of the proposed model structure.
Several remarks can be given in light of the results presented in Table 1 and Fig. 4. It can be seen that the increase in the model order  tends to improve the model simulation performance in general, which can be attributed to the application of efficient model selection and estimation techniques to mitigate the multicollinearity issues in an illconditioned regression setting.Another important thing to consider    is the possibility of the identification of unstable models which can diverge during the simulation, for instance, the models { 2,4 ,  3,3 ,  4,2 } in this case.A possible solution can be given by the restriction of basis functions for the autoregressive terms (the past output terms) to enforce stability during the model identification.Nevertheless, the stability of simulations has been deemed as an open area in system identification [33] and is not explored further in this paper.The utility of selecting the basis functions based on physical insight can be further emphasized by identifying another set of battery models  poly , using conventional polynomial basis functions for explaining the SOC-dependence of the model parameters, similar to the approach adopted in [12].More precisely, the models  poly , can be identified in a similar fashion as the models  , using the same basis functions for the current magnitude || and the current direction  but with different basis functions for the SOC , that is, {}.Note that the nonlinearity order  allows the generation of the higher-order SOC terms up to the degree , for instance, the term  3 can be generated using  ≥ 3. Table 2 summarizes the simulation performance of the identified models quantitatively using the test input shown in Fig. 3, which can be directly compared with Table 1.It can be seen that the error values corresponding to the models  poly , are multiple orders of magnitude larger than that for the models  , due to the inability of the terms   for  ≥ 1 to sparsely explain the battery voltage behaviour.Furthermore, the increase in the nonlinearity order  and the model order  does not necessarily improve the error values but instead can result in unstable model behaviours.Indeed, a battery model containing several higher-order SOC terms of the form   for  ≥ 1 can be identified in a non-sparse fashion (in this case, by skipping the LASSO regression), which may result in improved voltage simulation accuracy.Such an argument can be confirmed by identifying another set of models  poly,ridge , in a similar fashion as  poly , but employing only the ridge regression, thereby skipping the variable selection using LASSO.Table 3 summarizes the corresponding simulation performances of the models  poly,ridge , showing that (1) the corresponding error values are significantly lower than that for the models  poly , , and (2) the increase in the nonlinearity order  generally improves the simulation performance.Moreover, it can be argued that further increase in the value of  beyond  = 4 may lead to additional improvements in the voltage simulation accuracy for the models  poly,ridge , .However, based on our initial presumption about the principle of parsimony, the models identified in a non-sparse fashion may not be close to the true mathematical description of the battery as compared to a well-performing sparse model.Thus, it can be concluded that the usage of conventional polynomial basis functions to explain the SOC-dependence cannot lead to sparse battery models having appropriate simulation capabilities.
Another important contribution of this paper, namely the design of a synthetic identification input signal, can be analysed by comparing the models  , with another set of battery models  dc , identified using a real drive-cycle current profile as considered a common practice in the literature [12,16,47].Fig. 5 shows the drive-cycle current profile used to identify the models  dc , .Table 4 summarizes the error values between the simulated and the measured voltage curves for the models  dc , using the test input shown in Fig. 3. Using Tables 1 and 4, it can be seen that the models  dc , perform quite poorly as compared to the models  , .In essence, the usage of real drive-cycle current profiles can be risky since the informativity of different drive-cycle datasets   can vary, and therefore, a sufficiently informative drive-cycle dataset is needed for a successful battery model identification attempt.Consequently, if a real drive-cycle dataset needs to be used for the model identification (and update) procedure, then the informative segments of the dataset should be selected instead of using the whole dataset as is.Alternatively, a synthetic signal design, such as the one proposed in this paper, that excites all the relevant dynamics can be used instead of a real drive-cyle dataset to obtain suitable battery models.
Finally, as hinted previously in Section 3.4, the translation of the identified LPV model into the ECM components requires diagonalization of the parameter-varying matrix (  ).Fortunately, contrary to the remaining ECM components, the feedthrough resistance (subsequently denoted with  0 ) can be directly related to the scheduling variable  using an analytical relation, that is,  0 (  ) = (  ) = (  ) =  0 (  ).For instance, the functional relation for  0 corresponding to the model  4,3 has been identified as follows  0 (  ) ≈ 0.104 + 0.019 −1  + 0.058 where  0 does not exhibit dependence on the current-direction  for the set of dynamics excited during the identification experiment.All in all, it can be remarked that a successful identification attempt has been demonstrated in this paper aided with suitable physical insights to yield sparse battery models.It must be emphasized again that the sparsity of an identified model may not always lead to a better approximation of the true system, and therefore should be accompanied with appropriate prediction/simulation capabilities to enhance the trust of the user on the identified model.

Conclusions
In this paper, a battery model structure has been proposed by extending the archetypal equivalent-circuit model (ECM) structure in the linear time-invariant (LTI) framework to the linear parameter-varying (LPV) framework.Subsequently, the need for a sufficiently informative experimental dataset has been emphasized in light of the widely accepted notion that the identification experiments should resemble the model application as much as possible, thereby rendering the model application as interpolation and not extrapolation.Furthermore, the principle of parsimony has been embraced to obtain the simplest possible battery model for a given set of dynamics using a model identification strategy that exploits the sparsity-inducing properties of the  1 -norm penalization (LASSO).It has been shown that the increase in the model order generally enhances the simulation capabilities of the identified battery models -a result that intuitively makes sense but often remains contested in the literature.Finally, the proposed framework can be extended to investigate several other factors affecting the battery behaviour, namely temperature and various ageing phenomena, and will be considered for future studies.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1 .
Fig. 1.Charge and discharge EMF curves as a function of SOC for the 1 A h lithium-ion NMC/graphite battery cell studied in this paper (see Section 7) obtained using GITT experiments.
A.M.A.Sheikh et al.

Fig. 2 .
Fig. 2. A realization of the proposed input signal design  used to identify the battery models; (b) shows the highlighted portion of the current profile in (a).

Fig. 3 .
Fig. 3.A real drive-cycle current profile used as the test input to validate the identified battery models.

Fig. 5 .
Fig. 5.A real drive-cycle current profile used to identify the battery models  dc , .
and (  ⋄ ) ∶ P → R are the -dependent model parameters, and   and   the model orders with   ≥   ≥ 0. An important point to note here is that a parameter   is considered to have static dependence on  if it depends only on the instantaneous value of , i.e., (  ⋄ )  =   (  ), otherwise, the dynamic dependence is implied.The same holds for a matrix in the state-space representation above, for instance, ( ⋄ )  = (  ) implies static -dependence.

Table 1
Error description of the simulation results corresponding to the battery models  , with model order  and nonlinearly order .

Table 2
Error description of the simulation results corresponding to the battery models 

Table 3
Error description of the simulation results corresponding to the battery models 

Table 4
Error description of the simulation results corresponding to the battery models  dc