Battery digital twins: Perspectives on the fusion of models, data and artificial intelligence for smart battery management systems


Effective management of lithium-ion batteries is a key enabler for a low-carbon future, with applications including electric vehicles and grid-scale energy storage. The lifetime of these devices depends greatly on the materials used, the system design and the operating conditions. This complexity has made real-world control of battery systems challenging. However, with recent advances in understanding battery degradation, modelling tools and diagnostics, there is an opportunity to fuse this knowledge with emerging machine learning techniques to create a battery digital twin. In this cyber-physical system, there is a close interaction between a physical and a digital embodiment of a battery, which enables smarter control and longer lifetime. This perspectives paper therefore presents the state-of-the-art in battery modelling, in-vehicle diagnostic tools and data-driven modelling approaches, and shows how these elements can be combined in a framework for creating a battery digital twin. The challenges, emerging techniques and perspective comments provided here will give scientists and engineers from industry and academia a framework for more intelligent and interconnected battery management in the future.

Introduction: scale of the challenge, where we are and the potential
Reducing greenhouse gas emissions and slowing the onset of climate change is a global priority. One essential technology in our low-carbon future is the lithium-ion battery (LIB), which enables applications ranging from electric vehicles to grid-scale energy storage for balancing renewable electricity from wind and solar. However, barriers to wider adoption of the technology still exist, including limited energy density, poor lifetime and thus a higher system cost. Understanding, quantifying and predicting battery performance in real-world conditions is therefore essential for future consumer electronics, electric vehicles and grid energy storage batteries.
The simplest approach to operating a battery safely limits its operation to the manufacturer's prescribed voltage, temperature and current ranges. However, these limits are often overly conservative and sub-optimally defined, leading to inefficiencies and premature failure. A more attractive approach to control is through model-driven methods, which focus on State-of-X (SOX) estimation. Here, key states include state-of-charge (SOC), state-of-available power (SOAP) and state-of-health (SOH), the last of which leads to the estimation of the remaining useful lifetime (RUL). In recent years, these state estimation approaches have been extended to increasingly physical metrics, such as individual electrode potentials and lithium concentration profiles, which can be used to optimise functionality such as fast-charging [2]. By coupling these states with models which describe degradation processes, dynamic limits can be imposed that account for variable real-world operating conditions.
Within the academic literature, the most commonly captured degradation processes in LIBs are growth of the solid-electrolyte interphase (SEI) layer, lithium plating, particle cracking, pore clogging and active material dissolution [3]. However, one of the challenges with model-driven approaches is that the parameters are often difficult to measure and vary subtly due to manufacturing inconsistencies [4]. When this is compounded by degradation mechanisms that are not captured in the model, it can lead to significant variation in the predicted RUL.
In recent years, given the challenges around model parameterisation and the highly non-linear and coupled nature of battery degradation processes, researchers have renewed their efforts in data-driven approaches to state-estimation problems. Commonly used machine learning (ML) approaches include linear regression, decision trees and artificial neural networks (ANN). In this sense, ML methods are defined as algorithms which are able to take data, learn about its behaviour and ultimately improve the performance of the system automatically. However, major drawbacks of these ML approaches are the vast amount of experimental training data needed to produce an accurate model and their uncertain reliability beyond the experimental training set.
Within this context, there has been interest in fusing model-driven and data-driven approaches into hybrid models that combine the best aspects of both, as well as leveraging deeper electrochemical diagnostics. Furthermore, there is an emerging opportunity to increase the volume and diversity of data collected from real-world battery systems, due to the rapidly increasing uptake of highly connected battery electric vehicles (BEVs). However, the data collected in the field will be highly dynamic compared with the well-posed laboratory experiments traditionally used to parameterise these models. Handling the volume of data and curating it effectively also remain challenges, in a similar vein to other big data problems.
Thus, there are huge potential benefits for more intelligent control of LIBs, with exciting developments in model-driven control, ML approaches and big data sets containing a wealth of information. However, to date these developments have mostly been analysed and developed in isolation. With recent advances in ML, data science and the internet-of-things (IoT), the concept of a digital twin has emerged: a digital replica of a physical entity with a close connection between the two. Given the usage-dependent degradation and highly non-linear behaviour of LIBs, there is an opportunity to create a battery digital twin framework which fuses data, models and artificial intelligence (AI) for next-generation energy storage devices. This is represented diagrammatically in Fig. 1.
This perspectives paper covers the functional requirements of LIBs, the factors affecting their performance, and the modelling and control aspects of batteries (Section 2). It then covers current and emergent on-board sensing and diagnostic techniques (Section 3) and applications of AI to LIBs (Section 4). Finally, it shows how these individual elements can be combined to create a battery digital twin, along with the potential benefits of this approach and the remaining challenges (Section 5). The aim of the paper is not to provide an exhaustive review of this area. Rather, it is to highlight key works and concepts and curate them into a roadmap for more intelligent battery management.

Models: current state-of-the-art in battery modelling and control
A LIB consists of an anode, a cathode and an electronically insulating separator, all of which are porous to allow infiltration by an ionically conductive electrolyte. These are sandwiched between metallic current collectors. Whilst there is an increasing diversity of electrode materials, most batteries currently use a graphite anode and a transition-metal-based cathode. This is typically nickel manganese cobalt oxide (NMC), nickel cobalt aluminium oxide (NCA), lithium iron phosphate (LFP) or lithium cobalt oxide (LCO). In the fully charged state, lithium is intercalated into the anode. When the battery is discharged, lithium at the surface of the anode deintercalates, separating into a lithium-ion and an electron. The lithium-ion is then transported through the electrolyte towards the cathode, whereas the electron has to flow through an external circuit, where useful electrical work can be extracted. At the cathode, the lithium-ion and electron recombine and are intercalated into the cathode material. This process is reversed during charge.
For effective management of these batteries, performance, lifetime and safety are the three core considerations. Intelligent control of a battery system relies on a battery management system (BMS) which is able to sense its environment, understand its current and future state, and adapt accordingly. This level of intelligence is essential for next-generation energy storage devices. It enables functionality such as fast charging and supports multiple use cases, such as combined automotive load profiles and vehicle-to-grid (V2G) power management. To achieve these functions, different states need to be estimated in the battery, the three most basic being SOC, SOH and SOAP. Furthermore, state estimation must account for environmental operating conditions, as well as for differences in chemistry, form factor and cell type (power vs energy). This section thus outlines the key factors affecting battery performance, approaches to estimating internal states and the challenges in traditional, purely model-driven approaches.

Key functionality and factors affecting performance
For automotive applications, batteries have to be able to meet the load profile demanded by a drive cycle. In the case of pure BEVs, the average discharge rate (C-rate) is relatively low. However, peak charge/discharge rates can be much higher due to acceleration/deceleration events. In hybrid vehicles, such as the Toyota Prius, which combines a LIB with an internal combustion engine, these C-rates can be considerably higher. The challenge with these high C-rate events is that they can accelerate degradation of the battery. This is especially important as the current trend is towards batteries with higher specific energy (Wh/kg), and one of the solutions is the use of thick electrodes. Conventional LIB electrodes typically have thicknesses in the region of 50 μm. Increasing the thickness to 100 μm can raise cell specific energy by ca. 25% [5], because it reduces the proportion of electrochemically inactive components such as current collectors and separators. However, thick electrodes often suffer from poor lifetime due to mechanical instabilities. When a LIB is charged/discharged, the electrode materials undergo volume expansion and contraction, which can lead to mechanical fracture of the electrode and accelerated degradation. At high C-rates, this fracture typically occurs more aggressively at the electrode-separator interface. This is due to the uneven reaction current density associated with long ionic diffusion pathways in the electrolyte [6, 7]. Therefore, as electrode thicknesses increase, understanding and minimising these stresses becomes increasingly important.
Another key consideration is that LIBs are highly sensitive to their operating temperature and to thermal gradients, which primarily affect the resistance of the cell [8] and the rate of degradation [9]. The sources of this resistance generally include electronic, ionic and charge-transfer contributions. These have characteristic time constants:

- the high-frequency component is characterised by the electronic and ionic resistance;
- medium-frequency components are representative of charge-transfer reactions;
- low-frequency components are characteristic of diffusion processes.
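As a rough illustration of how these contributions separate by frequency, the sketch below evaluates the impedance of a simplified Randles-type circuit. It places an ohmic resistance in series with a double-layer capacitance, which is itself in parallel with a charge-transfer resistance plus a semi-infinite Warburg diffusion element. The circuit topology and all parameter values are illustrative assumptions, not values taken from the works cited here.

```python
import math

# Hypothetical parameter values, chosen for illustration only.
R_OHM = 0.010   # electronic + ionic (ohmic) resistance [ohm]
R_CT = 0.015    # charge-transfer resistance [ohm]
C_DL = 1.0      # double-layer capacitance [F]
SIGMA = 0.002   # Warburg coefficient [ohm s^-0.5]


def randles_impedance(freq_hz: float) -> complex:
    """Impedance of R_ohm in series with C_dl || (R_ct + Z_W)."""
    omega = 2.0 * math.pi * freq_hz
    # Semi-infinite Warburg element: a simple stand-in for diffusion.
    z_warburg = SIGMA * (1 - 1j) / math.sqrt(omega)
    z_faradaic = R_CT + z_warburg
    z_cdl = 1.0 / (1j * omega * C_DL)
    z_parallel = 1.0 / (1.0 / z_faradaic + 1.0 / z_cdl)
    return R_OHM + z_parallel


if __name__ == "__main__":
    # High frequency -> ohmic only; medium adds charge transfer; low adds diffusion.
    for f in (10_000.0, 1.0, 0.001):
        z = randles_impedance(f)
        print(f"{f:>10.3f} Hz: Re(Z) = {z.real * 1e3:6.2f} mOhm, "
              f"-Im(Z) = {-z.imag * 1e3:6.2f} mOhm")
```

Reading the real part at decreasing frequency shows the ohmic, charge-transfer and diffusion contributions accumulating in turn. In measured spectra these features overlap and are usually fitted rather than read off directly.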
The SEI layer, often cited as one of the main degradation modes, consumes cyclable lithium through the formation of a passivating film, mainly on the anode but also, to a smaller extent, on the cathode. Furthermore, the SEI layer generally restricts the movement of lithium-ions and electrons, thus increasing the resistance of the cell. The rate of this parasitic side reaction is highly temperature- and SOC-dependent. The most detrimental impact occurs at high temperatures and high SOCs, which correspond to low anode potentials.
Despite these challenges, LIB lifetimes have advanced significantly. Notable work from Harlow et al. [10] presented a comprehensive analysis of a LIB with a widely reported "million mile" lifetime. This was achieved by combining single crystal cathode particles, which exhibit superior resistance to mechanical fracture, with operating the cell at an optimum temperature. These improvements in battery lifetime, together with the low utilisation of consumer BEVs, have created increasing opportunities and renewed interest in V2G applications. V2G has typically been perceived as having a negative impact on battery lifetime due to the additional cycling; however, if controlled appropriately, it can be beneficial. Uddin et al. [11], for instance, highlighted the potential lifetime improvements that V2G could bring to a BEV battery pack if the time spent at high SOCs could be reduced. In practice, however, effective implementation would require several further inputs: predictions of grid-side load requirements, the consumer's constraints on the vehicle, and a deeper understanding of optimum operating conditions as the battery degrades.
Furthermore, as consumer uptake of BEVs continues, automotive manufacturers are targeting reduced LIB charging times. However, higher charging currents lead to increased rates of degradation due to lithium metal plating on the anode. They can also cause the formation of dendrites, which have the potential to short-circuit the anode and cathode [2]. This problem is exacerbated at low temperatures, where the mobility of lithium/lithium-ions is reduced, and also by the development of thicker electrodes. Efforts towards solving this problem have centred on understanding the behaviour of lithium plating, which can occur when the anode potential falls below 0 V vs Li/Li+. This is influenced by the solid-state lithium diffusivity, the ionic conductivity of the electrolyte, the exchange current density and the electronic conductivity of the electrode. Solid-phase lithium diffusion is often quoted as one of the rate-limiting steps. However, lithium-ion depletion in the electrolyte should also be noted, due to the significantly larger diffusion distances involved. This is especially true as the trend towards thicker electrodes continues.
The most common approach to battery charging is the constant current-constant voltage (CC-CV) profile. Here, an initial constant charging current is applied up to an upper voltage limit which depends on the battery chemistry. Once this voltage limit has been reached, a constant voltage hold is applied. During this hold the current decays approximately exponentially, allowing sufficient time for the lithium to fully saturate the anode. One of the challenges with this approach, however, is that the anode potential is difficult to measure directly. Inserting a reference electrode is possible, but this is impractical in commercial cells. It adds cost, and it is difficult to minimise the influence of the reference electrode on cell performance. Beyond the conventional CC-CV approach, various other techniques have been explored, such as multi-stage constant current-constant voltage, boost charging, constant power-constant voltage, pulse charging and variable current profiles [2]. However, these all share the same estimation challenges.
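The two phases of CC-CV can be sketched with the simplest possible cell model: an open-circuit voltage (OCV) in series with a single internal resistance. The linear OCV curve, capacity, resistance and cut-off current below are hypothetical. With a linear OCV, the CV-phase current decays exactly exponentially, whereas real cells only approximate this. Note also that the model says nothing about the anode potential, which is the quantity the text identifies as hard to access.

```python
def ocv(soc: float) -> float:
    """Hypothetical linear open-circuit voltage [V]; real cells need a measured table."""
    return 3.0 + 1.2 * soc


def simulate_cccv(capacity_ah=5.0, r_int=0.02, i_cc=5.0, v_max=4.2,
                  i_cutoff=0.25, soc0=0.1, dt=1.0):
    """Simulate a CC-CV charge of an OCV + R model (charge current positive).

    Returns a list of (time_s, current_a, terminal_voltage_v, soc) tuples.
    """
    t, soc, history = 0.0, soc0, []
    # CC phase: constant current until the terminal voltage reaches v_max.
    while ocv(soc) + i_cc * r_int < v_max:
        history.append((t, i_cc, ocv(soc) + i_cc * r_int, soc))
        soc += i_cc * dt / (capacity_ah * 3600.0)
        t += dt
    # CV phase: hold v_max; the accepted current falls as the OCV rises.
    while True:
        i = (v_max - ocv(soc)) / r_int
        if i < i_cutoff:
            break
        history.append((t, i, v_max, soc))
        soc += i * dt / (capacity_ah * 3600.0)
        t += dt
    return history


if __name__ == "__main__":
    h = simulate_cccv()
    cv_start = next(k for k, row in enumerate(h) if row[2] == 4.2)
    print(f"CV phase starts at {h[cv_start][0]:.0f} s, SOC {h[cv_start][3]:.3f}")
    print(f"charge ends at {h[-1][0]:.0f} s, SOC {h[-1][3]:.3f}")
```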
Beyond fast-charging, automotive companies are also including functionality that allows vehicles to operate briefly outside their recommended temperature limits. Tesla, for instance, offers a "Ludicrous" mode which allows normal operating power limits to be exceeded for a short period of time. This enables above-average acceleration by allowing the cells to operate at a higher temperature. However, this cannot be sustained and comes at the cost of battery lifetime.
The demands on batteries in terms of expected cost, lifetime and performance are therefore increasing year-on-year. The Faraday Battery Challenge in the United Kingdom, for instance, has set targets for 8 key metrics of automotive batteries, moving from current values to 2035 values:

- cost: 130 $/kWh to 50 $/kWh at cell level, and 280 $/kWh to 100 $/kWh at pack level;
- energy density: 700 Wh/L to 1400 Wh/L, and 250 Wh/kg to 500 Wh/kg;
- power density: 3 kW/kg to 12 kW/kg;
- safety: eliminate thermal runaway at pack level;
- 1st life: 8 years to 15 years;
- operating temperature: −20 °C to 60 °C, extended to −40 °C to 80 °C;
- predictability: fully predictive models for performance and ageing;
- recyclability: 10-50% to 95% at pack level [12].

Modelling approaches
Models are often used to address the issues of performance and lifetime prediction of LIBs. These typically describe the voltage response to a current load and the evolution of capacity/resistance over the cell's lifetime. A diversity of approaches currently exists; some describe only one of these attributes and some describe both. One of the most common forms of battery model is the equivalent circuit model (ECM). In an ECM, the voltage response of a battery is replicated using a combination of resistors and capacitors, together with a voltage vs capacity profile described by a look-up table. ECMs of different complexity can be found in the literature, mostly characterised by the number of resistor-capacitor (RC) pairs. A higher number of RC pairs generally replicates the battery voltage better. However, ECMs generally lack physical meaning, which makes them unsuitable for control applications requiring the estimation of internal physical states such as electrode potentials. An exception is the physics-based ECMs that some researchers have developed, which are constructed so as to describe physical processes such as diffusion [13, 14]. Whilst this approach is promising, it has yet to be widely adopted in practice.
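A minimal 1-RC ECM can be written as a discrete-time update. The sketch below makes three assumptions: discharge current is positive, the RC branch uses the exact zero-order-hold update, and the OCV is a user-supplied function standing in for the look-up table. All numbers are hypothetical.

```python
import math
from typing import Callable, List


def simulate_ecm_1rc(currents: List[float], dt: float, soc0: float,
                     capacity_ah: float, r0: float, r1: float, c1: float,
                     ocv: Callable[[float], float]) -> List[float]:
    """Terminal voltage of a 1-RC ECM for a current sequence (discharge positive)."""
    soc, v_rc = soc0, 0.0
    # Exact discretisation of the RC branch for current held constant over dt.
    alpha = math.exp(-dt / (r1 * c1))
    voltages = []
    for i in currents:
        # Terminal voltage = OCV - ohmic drop - RC polarisation.
        voltages.append(ocv(soc) - i * r0 - v_rc)
        v_rc = alpha * v_rc + r1 * (1.0 - alpha) * i
        soc -= i * dt / (capacity_ah * 3600.0)
    return voltages


if __name__ == "__main__":
    # 10 s rest followed by a 60 s, 5 A discharge pulse (hypothetical values).
    profile = [0.0] * 10 + [5.0] * 60
    v = simulate_ecm_1rc(profile, dt=1.0, soc0=0.8, capacity_ah=5.0,
                         r0=0.010, r1=0.015, c1=2000.0,
                         ocv=lambda s: 3.4 + 0.7 * s)
    print(f"rest {v[9]:.4f} V, pulse onset {v[10]:.4f} V, pulse end {v[-1]:.4f} V")
```

The instantaneous drop at pulse onset is set by R0 alone, and the slower relaxation by the R1·C1 time constant. Adding further RC pairs is what gives higher-order ECMs their additional fidelity.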
A more comprehensive approach to describing the performance of LIBs is through continuum-level physics models, pioneered by Doyle, Fuller and Newman [15]. In these physics-based models, equations governing mass and charge transport in both the electrode and electrolyte phases are coupled through a Butler-Volmer equation, which describes the local reaction current density. These models come in various levels of complexity. The most common variant is the Pseudo-2D (P2D) model, which describes the localised behaviour of the battery through its thickness. In recent years, the accessibility of these models has increased with the publication of open-source codes. Notable examples include the Python Battery Mathematical Modelling platform (PyBaMM) [16] and LIONSIMBA [17].
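For reference, the Butler-Volmer relation in its common form, j = i0 [exp(αa F η / (R T)) - exp(-αc F η / (R T))], can be evaluated as below. The symmetric transfer coefficients (0.5) and the example values are assumptions for illustration. In a P2D model this expression is evaluated at every point in each electrode, and i0 itself depends on the local concentrations.

```python
import math

F = 96485.33212   # Faraday constant [C/mol]
R = 8.314462618   # molar gas constant [J/(mol K)]


def butler_volmer(i0: float, eta: float, temperature: float = 298.15,
                  alpha_a: float = 0.5, alpha_c: float = 0.5) -> float:
    """Reaction current density [A/m^2] from exchange current density i0 [A/m^2]
    and surface overpotential eta [V] (positive eta -> anodic/oxidation current)."""
    f = F / (R * temperature)
    return i0 * (math.exp(alpha_a * f * eta) - math.exp(-alpha_c * f * eta))


if __name__ == "__main__":
    for eta in (-0.05, -0.01, 0.0, 0.01, 0.05):
        print(f"eta = {eta:+.2f} V -> j = {butler_volmer(2.0, eta):+.3f} A/m^2")
```

For small overpotentials the expression is approximately linear, j ≈ i0 F η / (R T), which is why a charge-transfer resistance appears in simpler models.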
Whilst the P2D modelling framework is very powerful in its ability to describe the local properties of a cell, it comes with increased computational cost and a large number of parameters which need to be extracted. This was highlighted by Chen et al. [18], who presented a comprehensive parameterisation of an NMC811-Gr/SiOx cell. They extracted 35 different parameters (physical, chemical and electrochemical) for the electrodes and the separator/electrolyte. However, obtaining these values required a range of characterisation tests, many of which were destructive. Ecker et al. [19] presented similar efforts and made three observations. First, the measured electronic conductivity of their electrodes differed from literature values. Second, the solid-phase diffusion coefficient is critical. Third, it is important to allow this coefficient to vary with the state of lithiation in the electrode. These extensive and destructive tests make it impractical to parameterise every cell, even though slight cell-to-cell variations are known to exist.
However, whilst there are many parameters in these electrochemical models, some are more critical than others. Towards understanding this, Li et al. [20] investigated the identifiability of P2D models through a sensitivity study of 26 physical parameters of an NMC-Gr cell. These parameters were broadly categorised into geometric, transport, kinetic and concentration-based parameters. Their sensitivity was assessed not only for the terminal voltage but also for the anode potential and the surface/bulk SOC of the cathode, which are key metrics for a model-driven control system. Their results indicate increased identifiability at high C-rates and a general insensitivity to electrolyte and separator parameters. However, the authors acknowledged that these findings may differ for other chemistries. This highlights the lack of universal approaches.
Another challenge associated with the parameterisation of these models is that general methods for extracting parameters are widely used, but their implementation varies from user to user, which can affect the end result. This issue was highlighted by Nickol et al. [21], who investigated the general assumptions and the different data-analysis approaches for extracting the solid-phase diffusion coefficient. They noted that, at lower temperatures and higher pulse C-rates, different commonly used methods gave diffusion coefficients varying by over an order of magnitude, even with the same data. The authors also noted the high degree of polarisation in half-cells due to the lithium counter electrode, which causes significant discrepancies at higher current densities. This highlights a conflicting challenge. As suggested by Li et al. [20], the system becomes more identifiable at high C-rates. Yet half-cell measurements at these rates suffer from the influence of the lithium counter electrode unless it is compensated for with a reference electrode.
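To make the sensitivity to analysis choices concrete, the sketch below applies the widely used Weppner-Huggins approximation for GITT to one hypothetical pulse. The approximation is D ≈ 4/(π τ) · L² · (ΔE_s/ΔE_t)², valid for τ ≪ L²/D. The sketch reads the transient voltage change ΔE_t both with and without the instantaneous IR drop. This is only one example of how an analyst's choice alters the result, and it is not claimed to be the specific comparison made by Nickol et al. All numbers are invented for illustration.

```python
import math


def gitt_diffusivity(tau_s: float, length_m: float,
                     delta_e_s: float, delta_e_t: float) -> float:
    """Weppner-Huggins estimate of the solid-phase diffusion coefficient [m^2/s]."""
    return 4.0 / (math.pi * tau_s) * length_m ** 2 * (delta_e_s / delta_e_t) ** 2


# Hypothetical single GITT step (illustrative numbers only).
tau = 600.0          # pulse duration [s]
radius = 5e-6        # particle radius [m]
L = radius / 3.0     # volume-to-surface ratio of a sphere, used as diffusion length
dE_s = 0.005         # change in relaxed (steady-state) voltage [V]
dE_t_total = 0.040   # transient voltage change including the instantaneous IR drop [V]
ir_drop = 0.025      # instantaneous ohmic drop at pulse onset [V]

d_with_ir = gitt_diffusivity(tau, L, dE_s, dE_t_total)
d_without_ir = gitt_diffusivity(tau, L, dE_s, dE_t_total - ir_drop)
print(f"D (IR drop included): {d_with_ir:.2e} m^2/s")
print(f"D (IR drop excluded): {d_without_ir:.2e} m^2/s")
print(f"ratio: {d_without_ir / d_with_ir:.1f}")
```

Because ΔE_t enters squared, a choice that changes it by a factor of about 2.7 changes D by a factor of about 7. This shows how easily analysis conventions alone can approach an order-of-magnitude spread.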
Motivated by this, a number of variants of the P2D model have been created, such as the single particle model (SPM), which simplifies each electrode domain into a single particle. This reduces the computational burden by removing the iterative step of calculating the reaction current density distribution. However, it generally limits the accuracy of the SPM to C-rates below 2C. To overcome this problem, authors have modified the SPM to reintroduce electrolyte concentration gradients. The resulting formulation, termed the enhanced SPM (SPMe), is more accurate than the SPM [22, 23].
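At the core of the SPM is Fickian diffusion inside one spherical particle per electrode, driven by a surface flux set by the applied current. The sketch below solves this with an explicit finite-volume scheme on equally thick shells. The scheme, the shell count and all parameter values are assumptions chosen for clarity rather than efficiency. Production codes such as PyBaMM use more efficient discretisations and solvers.

```python
import math
from typing import List


def shell_volumes(radius: float, n_shells: int) -> List[float]:
    """Volumes of equally thick concentric shells of a sphere [m^3]."""
    dr = radius / n_shells
    return [4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
            for k in range(n_shells)]


def spm_particle(d_s: float, radius: float, c0: float, flux: float,
                 t_end: float, n_shells: int = 20) -> List[float]:
    """Explicit finite-volume solution of Fick's law in a spherical particle.

    flux: molar flux leaving the surface [mol m^-2 s^-1] (positive = delithiation).
    Returns the concentration in each shell [mol m^-3], centre first.
    """
    dr = radius / n_shells
    vols = shell_volumes(radius, n_shells)
    areas = [4.0 * math.pi * (k * dr) ** 2 for k in range(n_shells + 1)]
    # Explicit scheme: keep dt well inside the stability limit (~dr^2 / (3 D)).
    n_steps = max(1, math.ceil(t_end / (0.2 * dr ** 2 / d_s)))
    dt = t_end / n_steps
    c = [c0] * n_shells
    for _ in range(n_steps):
        # Outward molar flow [mol/s] through each face; zero at the centre by symmetry.
        flow = [0.0] * (n_shells + 1)
        for k in range(1, n_shells):
            flow[k] = -d_s * (c[k] - c[k - 1]) / dr * areas[k]
        flow[n_shells] = flux * areas[n_shells]   # boundary condition at r = R
        c = [c[k] + dt * (flow[k] - flow[k + 1]) / vols[k] for k in range(n_shells)]
    return c


if __name__ == "__main__":
    # Hypothetical particle: R = 5 um, D = 1e-14 m^2/s, 600 s of constant delithiation.
    c = spm_particle(d_s=1e-14, radius=5e-6, c0=20000.0, flux=1e-5, t_end=600.0)
    print(f"centre {c[0]:.0f} mol/m^3, outer shell {c[-1]:.0f} mol/m^3")
```

The finite-volume form conserves lithium exactly: the inventory lost equals the surface flux times area times time. The gap between the outer-shell and centre concentrations is what limits the SPM at high rates once electrolyte effects also matter.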
Approaches to lifetime estimation are divided into physics-based, data-driven and hybrid approaches [24]. Physics-based approaches use a set of differential equations to describe the physical degradation modes occurring in a battery. Whilst this approach has a sound physical rationale, no current model captures all degradation modes in a battery. For the modes that are captured, the required parameters are often difficult to extract, leading to inaccuracies over the operating lifetime. Perhaps one of the most comprehensive descriptions of physics-based battery degradation models is that of Reniers et al. [3]. They present and compare models with degradation mechanisms including SEI layer growth, loss of active material, pore blocking, lithium plating and active material dissolution. However, in this work the authors use an SPM approach. This means that localised degradation effects, which become critical at extremes of operation, are not captured.
Many empirical and semi-empirical approaches exist, and these are precursors to the purely data-driven approaches currently receiving extensive investigation. Given the electrochemical nature of battery operation, cell lifetime clearly correlates with both temperature and SOC. The temperature dependency follows an exponential Arrhenius-type relationship, whereas the SOC dependency is typically more non-linear. Whilst these approaches are simple, they tend to lose predictive power over their lifetime of use and in some cases are overly simplified. The definition of SOC, for instance, can vary between users, so relating degradation models to more physically relevant states is important moving forward. Schimpe et al. [25], for instance, presented a comprehensive lifetime study of an LFP-Gr cell. In their approach, the SOC-dependent calendar ageing was coupled to an anode potential curve, a Tafel equation and an Arrhenius rate law. This semi-empirical approach has the advantage of capturing some of the physics occurring in the cell at low computational cost. However, it starts to lose predictive power under more aggressive modes of operation.
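A semi-empirical calendar-ageing law in this spirit can be sketched as the product of two factors, with capacity loss growing with the square root of time. The first is an Arrhenius factor in temperature. The second is a Tafel-type factor in the anode potential, so that ageing accelerates at high SOC, where the graphite potential is low. The square-root time dependence, the linear anode potential curve and every parameter value are assumptions for illustration. This is not the model fitted by Schimpe et al.

```python
import math

R_GAS = 8.314462618   # molar gas constant [J/(mol K)]
F = 96485.33212       # Faraday constant [C/mol]


def anode_potential(soc: float) -> float:
    """Hypothetical graphite potential vs Li/Li+ [V]; a real model uses a measured curve."""
    return 0.25 - 0.17 * soc


def calendar_capacity_loss(t_days: float, temp_k: float, soc: float,
                           k_ref: float = 0.002, ea: float = 50e3,
                           t_ref: float = 298.15, alpha: float = 0.5,
                           u_ref: float = 0.12) -> float:
    """Fractional capacity loss after t_days of storage (all parameters hypothetical)."""
    # Arrhenius factor: equals 1 at the reference temperature.
    arrhenius = math.exp(-ea / R_GAS * (1.0 / temp_k - 1.0 / t_ref))
    # Tafel-type factor: ageing accelerates as the anode potential drops below u_ref.
    tafel = math.exp(alpha * F * (u_ref - anode_potential(soc)) / (R_GAS * temp_k))
    # Square-root-of-time growth, a common assumption for SEI-dominated calendar ageing.
    return k_ref * arrhenius * tafel * math.sqrt(t_days)


if __name__ == "__main__":
    for temp_c in (25.0, 45.0):
        for soc in (0.3, 0.9):
            loss = calendar_capacity_loss(365.0, temp_c + 273.15, soc)
            print(f"{temp_c:.0f} C, SOC {soc:.1f}: {100 * loss:.1f}% loss after 1 year")
```

Expressing the SOC dependence through an anode potential, rather than through SOC directly, is one way of tying the law to the more physically relevant states the text argues for.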

State-estimation and control
Controlling the battery effectively often requires states as inputs that cannot be directly measured and must instead be inferred from other measurements. SOC estimation is the most basic function and indicates the remaining capacity in a cell. The simplest approach to estimating this value is Coulomb counting from a datum point (i.e. the fully charged, 100% SOC point). The current signal, most commonly taken from a shunt resistor or a Hall effect sensor, is integrated to give the extracted charge. This is then compared against the known overall capacity to estimate the SOC. Whilst this approach is simple, it has a number of drawbacks. One is that the accessible capacity is often lower than the theoretical capacity, especially at low temperatures and high C-rates. This is due to slow diffusion of lithium in the solid phase, which leads to additional losses. Furthermore, Coulomb counting often suffers from signal noise issues, whereby the true current passed to the cell differs from the value measured by the sensor. Because the current is integrated, these errors accumulate over time.
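The sketch below shows Coulomb counting and why sensor errors matter. Because the current is integrated, a small gain or offset error produces an SOC error that keeps growing with time. The 1% gain error, 50 mA offset and 50 Ah capacity are hypothetical.

```python
from typing import List


def coulomb_count(soc0: float, currents_a: List[float], dt_s: float,
                  capacity_ah: float) -> List[float]:
    """Integrate current (discharge positive) from a known starting SOC."""
    soc = soc0
    history = [soc]
    for i in currents_a:
        soc -= i * dt_s / (capacity_ah * 3600.0)
        history.append(soc)
    return history


# Hypothetical 1 h, 10 A discharge of a 50 Ah cell, measured by a sensor
# with a 1% gain error and a 50 mA offset.
true_current = [10.0] * 3600
measured_current = [1.01 * i + 0.05 for i in true_current]
true_soc = coulomb_count(1.0, true_current, 1.0, 50.0)
est_soc = coulomb_count(1.0, measured_current, 1.0, 50.0)
print(f"true SOC {true_soc[-1]:.4f}, counted SOC {est_soc[-1]:.4f}")
```

In this scenario the counted SOC is 0.3 percentage points low after one hour, and the error continues to grow for as long as the sensor error persists. This is why the datum needs to be re-established periodically, or the estimate corrected using voltage feedback.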
There are also challenges around SOC estimation when the system is initialised. The simplest approach to initialising at a non-100% SOC is to compare the open circuit voltage (OCV) against a look-up table. However, this is very challenging for certain battery chemistries such as LFP, since the OCV-SOC curve is generally quite flat and also exhibits voltage hysteresis. Furthermore, as the cell ages, the accuracy of this approach generally decreases due to effects such as stoichiometric drift between the anode and cathode.
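The effect of a flat OCV curve can be quantified with a simple look-up-table inversion. The two tables below are hypothetical shapes, loosely "NMC-like" and "LFP-like", and are not measured data. The sketch compares the SOC error caused by the same 10 mV voltage error near mid-SOC.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical OCV-SOC tables as (soc, ocv_volts), sorted by increasing voltage.
NMC_LIKE: List[Tuple[float, float]] = [(0.0, 3.0), (0.2, 3.55), (0.5, 3.75), (0.8, 4.0), (1.0, 4.2)]
LFP_LIKE: List[Tuple[float, float]] = [(0.0, 2.8), (0.1, 3.2), (0.9, 3.33), (1.0, 3.5)]


def soc_from_ocv(v: float, table: List[Tuple[float, float]]) -> float:
    """Invert a monotonic OCV-SOC table by linear interpolation (clamped at the ends)."""
    volts = [o for _, o in table]
    if v <= volts[0]:
        return table[0][0]
    if v >= volts[-1]:
        return table[-1][0]
    k = bisect_left(volts, v)
    (s0, v0), (s1, v1) = table[k - 1], table[k]
    return s0 + (s1 - s0) * (v - v0) / (v1 - v0)


# Effect of a 10 mV voltage error near mid-SOC on the initial SOC estimate.
nmc_err = abs(soc_from_ocv(3.74, NMC_LIKE) - soc_from_ocv(3.75, NMC_LIKE))
lfp_err = abs(soc_from_ocv(3.255, LFP_LIKE) - soc_from_ocv(3.265, LFP_LIKE))
print(f"SOC error from 10 mV: NMC-like {nmc_err:.3f}, LFP-like {lfp_err:.3f}")
```

The flatter plateau roughly quadruples the SOC error here. Hysteresis in LFP would add a further, direction-dependent offset that this sketch does not model.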
The most common approach to addressing the effect of sensor noise on SOC estimation is the use of Kalman filters (KF). KFs have been used successfully in many engineering applications involving trajectory and guidance systems, and their use for battery SOC estimation was popularised through a series of papers by Plett [26][27][28]. The SOC is appended as a time-dependent variable to a state vector, together with several internal voltage drops that occur in a battery. The terminal voltage is then used as a feedback corrector signal to iteratively update the estimates of the SOC and its uncertainty. Unlike open-loop Coulomb counting, this feedback corrects for sensor noise and for incorrect initial conditions. The algorithms have since been modified to handle non-linear battery models and to improve the SOC uncertainty estimate, for example using the unscented KF and particle filters.
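The structure described above can be illustrated with a minimal extended Kalman filter (EKF) over the state [SOC, V_RC] of a 1-RC ECM, with the terminal voltage as the measurement. The sketch assumes several hypothetical elements: the cell parameters, the quadratic OCV curve, the noise variances and the synthetic pulsed test profile. It follows the general scheme in this literature and is not a reproduction of Plett's specific formulation. The filter is deliberately started from a wrong initial SOC to show the voltage feedback correcting it.

```python
import math
import random

# Hypothetical 1-RC cell (illustrative values only).
Q_AH, R0, R1, C1, DT = 5.0, 0.010, 0.015, 2000.0, 1.0
A_RC = math.exp(-DT / (R1 * C1))


def ocv(soc):
    """Hypothetical OCV-SOC curve [V]."""
    return 3.4 + 0.6 * soc + 0.2 * soc ** 2


def docv_dsoc(soc):
    return 0.6 + 0.4 * soc


def step(soc, v_rc, i):
    """Propagate the state one time step (discharge current positive)."""
    return soc - i * DT / (Q_AH * 3600.0), A_RC * v_rc + R1 * (1.0 - A_RC) * i


def measure(soc, v_rc, i):
    return ocv(soc) - v_rc - R0 * i


def ekf_soc(currents, voltages, soc_guess, p_soc=0.1, q_soc=1e-10,
            q_rc=1e-8, r_v=1e-4):
    """Extended Kalman filter over the state [SOC, V_RC]; returns SOC estimates."""
    x = [soc_guess, 0.0]
    P = [[p_soc, 0.0], [0.0, 1e-4]]
    estimates = []
    for i, v_meas in zip(currents, voltages):
        # Measurement update: linearise the voltage equation, H = [dOCV/dSOC, -1].
        h = [docv_dsoc(x[0]), -1.0]
        ph = [P[0][0] * h[0] + P[0][1] * h[1], P[1][0] * h[0] + P[1][1] * h[1]]
        s = h[0] * ph[0] + h[1] * ph[1] + r_v
        k = [ph[0] / s, ph[1] / s]
        innovation = v_meas - measure(x[0], x[1], i)
        x = [x[0] + k[0] * innovation, x[1] + k[1] * innovation]
        P = [[P[r][c] - k[r] * ph[c] for c in range(2)] for r in range(2)]
        estimates.append(x[0])
        # Time update: the state transition Jacobian is diag(1, A_RC).
        x = list(step(x[0], x[1], i))
        f = [1.0, A_RC]
        P = [[f[r] * P[r][c] * f[c] for c in range(2)] for r in range(2)]
        P[0][0] += q_soc
        P[1][1] += q_rc
    return estimates


def simulate_truth(soc0, n_steps, noise_v, seed=0):
    """Pulsed discharge of the 'true' cell with Gaussian voltage noise."""
    rng = random.Random(seed)
    soc, v_rc = soc0, 0.0
    currents, voltages, socs = [], [], []
    for k in range(n_steps):
        i = 5.0 if (k // 60) % 2 == 0 else 0.0   # 60 s on / 60 s off
        currents.append(i)
        voltages.append(measure(soc, v_rc, i) + rng.gauss(0.0, noise_v))
        socs.append(soc)
        soc, v_rc = step(soc, v_rc, i)
    return currents, voltages, socs


if __name__ == "__main__":
    cur, volt, true_soc = simulate_truth(soc0=0.8, n_steps=3600, noise_v=0.01)
    est = ekf_soc(cur, volt, soc_guess=0.5)   # deliberately wrong initial guess
    print(f"initial error {abs(0.5 - true_soc[0]):.3f}, "
          f"final error {abs(est[-1] - true_soc[-1]):.4f}")
```

Here the filter's model matches the simulated "truth" exactly, which flatters its performance. With a real cell, model mismatch and parameter drift with ageing would limit the achievable accuracy.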
State-of-the-art SOC algorithms focus on estimating the concentrations (or SOCs) of the individual electrodes, using terminal voltage as the feedback corrector signal. The problem then becomes one of estimating a concentration function (in space and time) rather than a finite-dimensional state vector. Techniques such as the backstepping partial differential equation (PDE) observer have been developed to track the surface concentrations of the electrodes successfully [22].
During operation of the vehicle, it is also important to estimate the SOAP of the battery pack to inform the control system about the maximum accessible power. For vehicle applications, this affects the peak acceleration of the vehicle. Most SOAP algorithms focus on keeping the cells within their voltage windows, temperature limits and current limits; however, more degradation-aware algorithms are needed. Suthar et al. [29], for instance, modified a P2D model to include stress generation in the electrodes. They used this stress as a control metric for testing and designing new fast-charging algorithms. Whilst this is an interesting approach, no experimental validation was provided.
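One ECM-based way to estimate available power, sketched here with assumed 1-RC parameters, proceeds in three steps. First, find the largest constant current that keeps the terminal voltage above its minimum over a chosen prediction horizon. Second, cap it at the current limit. Third, multiply it by the resulting voltage. The sketch assumes the cell starts from rest and neglects the SOC change during the horizon. As the text notes, an estimate of this kind enforces only voltage and current limits and is not degradation-aware.

```python
import math


def peak_discharge_power(ocv_v: float, v_min: float, r0: float, r1: float,
                         c1: float, horizon_s: float, i_max: float) -> float:
    """Peak constant discharge power [W] sustainable for horizon_s seconds from rest,
    subject to a minimum terminal voltage and a maximum current (1-RC ECM)."""
    # Effective resistance seen by a constant-current step after horizon_s seconds.
    r_eff = r0 + r1 * (1.0 - math.exp(-horizon_s / (r1 * c1)))
    i_voltage_limited = (ocv_v - v_min) / r_eff
    i = min(i_voltage_limited, i_max)
    # Power at the end of the horizon, where the terminal voltage is lowest.
    return i * (ocv_v - i * r_eff)


if __name__ == "__main__":
    # Hypothetical cell at rest with OCV 3.6 V and a 3.0 V lower limit.
    for horizon in (2.0, 10.0, 30.0):
        p = peak_discharge_power(3.6, 3.0, 0.010, 0.015, 2000.0, horizon, i_max=60.0)
        print(f"{horizon:4.0f} s horizon: {p:6.1f} W")
```

Longer horizons yield lower available power because more of the RC polarisation develops. A degradation-aware variant would add constraints on internal states, such as electrode stress or anode potential, alongside the voltage limit.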
To address the problem of lithium plating during fast charging, researchers have developed a number of model-informed approaches to minimise the risk of it occurring. Physics-based models that capture the electrochemical reactions in the cell have recently been used to estimate the anode potential indirectly, and this estimate can then serve as a control metric. Mai et al. [30], for instance, used a P2D approach to investigate the limits of standard CC-CV charging algorithms by estimating the anode potential. They subsequently used this approach to propose alternative charging profiles, such as a voltage-ramping profile and a multiple CC-CV profile. However, whilst promising, this work and others often do not capture all degradation modes or fully validate the results. A further challenge is reliable parameterisation of the model, not just at the start of life but also as the cell degrades. Poor parameterisation is detrimental to accuracy, because plated lithium is known to cause a number of effects as the cell ages. These include pore blockage, shifting of the electrode stoichiometry and loss of cyclable lithium. Furthermore, many lithium plating models implemented for control applications have used 0D, 1D or P2D approaches. These do not capture the complexities of large-format cells, where through-plane and in-plane thermal gradients can have a significant impact at the cell and pack level. This leads to further uncertainty in the results.
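As a deliberately simplified illustration of using an estimated anode potential as a control metric, the sketch below approximates the anode potential during charge. The estimate is the anode's open-circuit potential minus a lumped anode overpotential, I·R_a. The charge current is then limited so that this estimate stays above a safety margin vs Li/Li+. The linear graphite potential curve, the lumped resistance R_a and the 20 mV margin are hypothetical. A P2D-based approach such as that of Mai et al. resolves these quantities through the electrode thickness rather than lumping them.

```python
def anode_ocp(soc: float) -> float:
    """Hypothetical graphite open-circuit potential vs Li/Li+ [V]."""
    return 0.25 - 0.17 * soc


def plating_safe_current(soc: float, r_anode: float, margin_v: float = 0.02,
                         i_limit: float = 10.0) -> float:
    """Largest charge current [A] keeping the estimated anode potential,
    U_a(SOC) - I * r_anode, above margin_v vs Li/Li+."""
    headroom = anode_ocp(soc) - margin_v
    if headroom <= 0.0:
        return 0.0
    return min(headroom / r_anode, i_limit)


if __name__ == "__main__":
    # A larger lumped resistance is used here to mimic low-temperature operation.
    for label, r_a in (("warm (assumed R_a)", 0.02), ("cold (assumed R_a)", 0.05)):
        profile = [round(plating_safe_current(s / 10, r_a), 2) for s in range(10)]
        print(label, profile)
```

The resulting current tapers as SOC rises and is lower overall when R_a is larger. This mirrors, qualitatively, why plating risk grows at high SOC and low temperature. It also shows that the profile is only as good as the parameter estimates behind it.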

Challenges
Significant progress has been made towards understanding the factors which influence battery performance and towards developing models and control algorithms. However, these are underpinned by effective state estimation, where significant challenges remain. For instance, physics-based models can provide insights into the inner workings of a battery. Yet estimating the numerous parameters inside the battery, and how they evolve over time, remains a challenge. This is further compounded by inherent cell-to-cell variations arising from subtle differences in the manufacturing process. These may not be immediately apparent but emerge later in life with the onset of non-linear and rapid degradation. Furthermore, these models are often sensitive to certain parameters, such as the solid-state diffusion coefficient, for which researchers use inconsistent extraction procedures. Thus, two things are critical for intelligent battery systems moving forward: capturing subtle inconsistencies between cells, and developing consistent and easily accessible parameterisation methods.
The academic and industrial community continues to present solutions to these problems; however, the nature of the problem is also evolving. Advances in electrode synthesis towards single crystal NMC cathodes, for instance, have unlocked significant lifetime benefits in some applications, leading to the much-reported "million-mile" battery [10]. Such claims have led to renewed interest in V2G integration, which has the potential to increase the functionality of the battery but creates a more complex load profile. This level of intelligent control becomes increasingly important as the specific capacity of cells increases. This increase comes from thicker electrodes and larger form factor cells, both of which minimise the inactive parasitic mass in a cell.

Data: on-board sensing and diagnostics
A critical element of an intelligent battery system is what data can be collected about the system and what information can be inferred from its analysis. Furthermore, as ML approaches are increasingly applied, the quality and diversity of data become critical enablers. A wide range of spectroscopic techniques are available to characterise and understand the dynamic behaviour of battery materials. However, the vast majority of these are limited to laboratory studies, because they require special cells, use expensive equipment and/or require disassembly of the cell. Thus, for real-world implementation, low-cost and portable sensing techniques are needed. Currently, voltage, current and temperature are the data types collected in the majority of applications. Whilst this might seem limited, a wealth of electrochemical insight can be inferred with the right excitation and post-processing. Moving forward, combining these data with other data types, such as stress and strain, could be a powerful addition for state estimation. This section therefore presents the current state of, and perspectives on, the insights that can be inferred from current and emerging on-board sensing techniques. These are highlighted diagrammatically in Fig. 2.

Current approaches
Faults in cells/packs can occur during use due to a number of factors, including mechanical, electrical and thermal abuse [31]. The ability to detect such faults would therefore allow operating limits to be adjusted, or users to be notified to service their vehicle, before catastrophic failure occurs. One of the simplest approaches to detecting a battery module fault is the current interrupt technique. Here, the voltages of parallel cell strings are monitored while a current pulse is applied; comparing the voltage before and during the pulse gives a differential resistance. Using this approach, Offer et al. [32] identified faulty interconnection resistances in a 12P7S pack. If such problems are not detected, significant load heterogeneities can build up in battery packs, causing accelerated degradation [33, 34]. However, one of the challenges in estimating the differential resistance with this approach is choosing the pulse duration. The resistance of a cell is frequency dependent, with a larger polarisation resistance experienced for longer pulses. This parameter sensitivity was highlighted by Nickol et al. [21] when conducting galvanostatic intermittent titration tests (GITT) to extract diffusion coefficients, where different values were obtained for different applied currents. Thus, unless measurement conditions are standardised, inconsistencies in the measured resistance will emerge.
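The current interrupt calculation itself reduces to a differential resistance R = ΔV/ΔI for each monitored group. The sketch below is a minimal illustration, assuming voltages are sampled at one fixed time after pulse onset (the standardisation issue noted above) and using a hypothetical 20 % deviation-from-median threshold to flag suspect groups; the pack data are invented for illustration.

```python
import numpy as np

def differential_resistance(v_before, v_during, i_before, i_during):
    """Differential resistance dV/dI (ohm) for each monitored parallel group.

    Voltages are sampled just before the pulse and at a fixed time after its onset;
    discharge current is taken as positive.
    """
    dv = np.asarray(v_before, float) - np.asarray(v_during, float)
    di = np.asarray(i_during, float) - np.asarray(i_before, float)
    return dv / di

def flag_suspect_groups(resistance, tol=0.2):
    """Flag groups whose resistance deviates from the pack median by more than `tol` (fraction)."""
    r = np.asarray(resistance, float)
    median = np.median(r)
    return np.abs(r - median) / median > tol

# Hypothetical 7S pack: every series group sees the same 20 A pulse.
v_before = [3.70] * 7
v_during = [3.66, 3.66, 3.66, 3.64, 3.66, 3.66, 3.66]
r = differential_resistance(v_before, v_during, i_before=0.0, i_during=20.0)
flags = flag_suspect_groups(r)
```

Here the fourth group shows 3 mΩ against a 2 mΩ median and is flagged; in practice the threshold would have to be set from the expected cell-to-cell spread.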
Another commonly used variant of these pulsed approaches is the hybrid pulse power characterisation (HPPC) test, in which the resistance as a function of SOC is determined by pulsing the battery with both charge and discharge currents at varying C-rates. This approach is commonly used to parameterise ECMs, where regression is used to fit the parameters of the ECM's RC elements to each pulse. The fitted parameters are then stored as static look-up tables for state estimation. Whilst this is a simple and easy approach, the true resistance and capacitance evolve as the cell ages, reducing the accuracy of the ECM prediction unless the tables are adaptively updated.
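A one-RC fit of a single HPPC pulse can be sketched as follows. This is an illustration under stated assumptions, not a standard HPPC procedure: the cell is assumed to start at rest at a known OCV, R0 is taken from the first loaded sample, and the time constant is found by a simple grid search (a nonlinear least-squares solver would normally be used). Because the fitted values drift with ageing, such a routine would have to be rerun periodically to keep look-up tables current.

```python
import numpy as np

def fit_1rc_pulse(t, v, ocv, current, taus=np.logspace(-1, 3, 400)):
    """Fit R0, R1 and tau of a one-RC equivalent circuit to a discharge pulse.

    Model: v(t) = ocv - I*R0 - I*R1*(1 - exp(-t/tau)), with t = 0 at the first loaded
    sample and discharge current I > 0. tau is grid-searched; R1 is solved by linear
    least squares for each candidate tau.
    """
    t = np.asarray(t, float)
    v = np.asarray(v, float)
    r0 = (ocv - v[0]) / current            # instantaneous (ohmic) drop
    y = (ocv - v) / current - r0           # remaining polarisation resistance vs time
    best = (np.inf, 0.0, 0.0)
    for tau in taus:
        basis = 1.0 - np.exp(-t / tau)
        r1 = basis @ y / (basis @ basis)   # closed-form one-parameter least squares
        sse = np.sum((y - r1 * basis) ** 2)
        if sse < best[0]:
            best = (sse, r1, tau)
    return r0, best[1], best[2]

# Synthetic 30 s, 5 A pulse from a cell with R0 = 15 mOhm, R1 = 10 mOhm, tau = 20 s
t = np.linspace(0.0, 30.0, 301)
v = 3.65 - 5.0 * 0.015 - 5.0 * 0.010 * (1.0 - np.exp(-t / 20.0))
r0, r1, tau = fit_1rc_pulse(t, v, ocv=3.65, current=5.0)
```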
Thus, many efforts are now focusing on understanding physical degradation modes. Here, slow-rate discharge/charge data can be used to infer information about the anode and cathode in a cell. The technique rests on the premise that the cell voltage is the difference between the cathode and anode potentials. Each electrode undergoes various electrochemically driven phase transitions as it is lithiated/delithiated. In general, a phase transition produces a plateau in the voltage-capacity curve, which appears as a peak in incremental capacity (dQ/dV) data and as a valley in differential voltage (dV/dQ) data. If half-cell curves for the electrodes can be established, cathode- and anode-specific degradation modes can typically be decoupled. The most common modes are loss of lithium inventory (LLI), loss of active material in the anode in the delithiated/lithiated state (LAM_an(de), LAM_an(li)) and loss of active material in the cathode in the delithiated/lithiated state (LAM_ca(de), LAM_ca(li)).
This differential voltage analysis (DVA), or incremental capacity analysis (ICA), is now a commonly used electrochemical diagnostic tool. Notable work includes that of Dubarry et al. [35], who analysed degradation in LFP-Gr cells under different degradation modes and with varying electrode loadings, highlighting how the different modes of degradation (LLI and LAM) can be inferred from the cells' ICA profiles. The authors note that most cells have an excess of anode capacity relative to cathode capacity to accommodate the growth of the SEI and the subsequent loss of lithium. However, if the anode degrades faster than the cathode as the cell ages, the anode can become fully lithiated before the cathode is fully delithiated, which increases the probability of lithium plating. Being able to track the loss of active material in the anode and cathode is therefore an important control metric for informing charging algorithms.
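The decomposition described in the two paragraphs above can be illustrated with a toy capacity-balance model: the full-cell voltage is the cathode potential minus the anode potential, LLI reduces the cyclable lithium inventory, and LAM shrinks one electrode's capacity. The half-cell curves, capacities and voltage limits below are hypothetical placeholders rather than real NMC or graphite data, and the anode LAM case assumes the lost material was delithiated, so no lithium is lost with it.

```python
import numpy as np

# Hypothetical half-cell open-circuit potentials (V vs Li/Li+). They are smooth
# placeholders with the right trends, not fits to real electrode data.
def u_cathode(y):
    """Cathode potential; y is the lithium fraction in the cathode (1 = fully lithiated)."""
    return 4.3 - 0.9 * y - 0.2 * y**8

def u_anode(x):
    """Anode potential; x is the lithium fraction in the anode (1 = fully lithiated)."""
    return 0.1 + 0.6 * np.exp(-20.0 * x) - 0.02 * x

def full_cell(cap_cathode_ah, cap_anode_ah, li_inventory_ah, v_min=2.8, v_max=4.2, n=20001):
    """Usable capacity (Ah) and whether the anode saturates before the cell reaches v_max."""
    q = np.linspace(0.0, cap_anode_ah, n)             # Ah of lithium held in the anode
    x = q / cap_anode_ah
    y = (li_inventory_ah - q) / cap_cathode_ah        # lithium conservation between electrodes
    v = u_cathode(y) - u_anode(x)                     # full-cell voltage = cathode - anode
    ok = (y >= 0.0) & (y <= 1.0) & (v >= v_min) & (v <= v_max)
    capacity = q[ok].max() - q[ok].min()
    # A fully lithiated anode while the cell is still below v_max means further
    # charging would have to deposit lithium on the anode surface (plating risk).
    anode_limited = bool(x[ok][-1] > 0.999 and v[ok][-1] < v_max - 1e-3)
    return capacity, anode_limited

fresh = full_cell(5.0, 5.5, 4.9)    # anode/cathode capacity ratio of 1.1
lli = full_cell(5.0, 5.5, 4.4)      # 0.5 Ah loss of lithium inventory
lam_an = full_cell(5.0, 4.4, 4.9)   # 20 % loss of delithiated anode material
```

In this toy example both LLI and anode LAM reduce usable capacity, but only the LAM case ends with the anode fully lithiated while the cell is still below its upper cut-off voltage, which is the situation linked above to an increased risk of lithium plating.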
In addition, further insights into the physical degradation modes occurring in a cell can be inferred from ICA data. Conventionally, ICA peaks are tracked and correlated with bulk LLI and LAM. However, as cell form factors increase in size, understanding heterogeneous degradation also becomes important. Sieg et al. [36], for instance, cycled two large-format pouch cells and harvested their electrodes to make 25 coin cells in a 5 × 5 matrix to understand the localised degradation. One cell was operated in a pulsed mode, whereas the other was operated with a load cycle representative of what a BEV would experience. Their results showed much higher localised degradation in the centre of the pouch cell under pulse loading than under the BEV cycle. Furthermore, the characteristic width of the ICA peaks increased more in the more heterogeneously degraded cell than in the homogeneous case. Through spatial mapping of performance with their 25 coin-cell measurements, they showed that this broadening was caused by the distribution of localised performance.
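The broadening mechanism reported by Sieg et al. can be reproduced qualitatively by averaging locally shifted peaks. In this sketch each of 25 regions contributes a Gaussian-shaped ICA peak (a hypothetical shape), heterogeneity is modelled as an assumed random spread of local peak positions, and the bulk peak width is measured as a full width at half maximum.

```python
import numpy as np

def fwhm(v, dqdv):
    """Full width at half maximum (V) of an ICA peak sampled on the grid v."""
    above = v[dqdv >= dqdv.max() / 2.0]
    return above.max() - above.min()

v = np.linspace(3.2, 3.6, 4001)

def local_peak(centre, width=0.01):
    """Hypothetical Gaussian-shaped ICA peak for one region of an electrode."""
    return np.exp(-0.5 * ((v - centre) / width) ** 2)

# 25 regions, as in a 5 x 5 sampling grid: identical regions vs regions whose
# peaks are shifted by an assumed 15 mV spread of local state.
rng = np.random.default_rng(0)
homogeneous = np.mean([local_peak(3.40) for _ in range(25)], axis=0)
heterogeneous = np.mean([local_peak(3.40 + s) for s in rng.normal(0.0, 0.015, 25)], axis=0)

width_hom = fwhm(v, homogeneous)
width_het = fwhm(v, heterogeneous)
```

The homogeneous case recovers the single-peak width (about 23.5 mV for a 10 mV Gaussian), while the heterogeneous average is noticeably wider even though every local peak has the same shape.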
Beyond the traditional ICA and DVA techniques, the time-varying temperature profile of a cell under constant-current load also contains information about the battery state. With this in mind, Wu et al. [37] demonstrated how the ratio of the time derivatives of temperature and voltage (dT/dt divided by dV/dt) could be used to track battery SOH. This technique was termed Differential Thermal Voltammetry (DTV), and it works primarily because phase transitions in the battery electrodes cause entropic heat generation that can be detected. A voltage plateau typically marks a phase transition, and a corresponding feature appears in the temperature signal. In a full-cell configuration, however, the variation in cell voltage is typically dominated by the cathode, so ICA measurements must be made at relatively low C-rates, resulting in lengthy measurements. By contrast, the entropic heat contributions from the anode and cathode are of the same order of magnitude, providing another means of discerning anode and cathode processes. Merla et al. [38] later demonstrated how features of the DTV spectra, such as peak position, width and height, could be correlated with the SOH of the cell.
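Since dT/dV = (dT/dt)/(dV/dt), the DTV signal can be computed directly from logged time series. A minimal sketch, assuming uniformly sampled, already filtered constant-current data; raw logs would first need the kind of preprocessing discussed in the next paragraph.

```python
import numpy as np

def dtv(t, voltage, temperature):
    """Differential thermal voltammetry signal dT/dV = (dT/dt) / (dV/dt), in K/V.

    Points where dV/dt is exactly zero are returned as NaN rather than infinity.
    """
    dT_dt = np.gradient(np.asarray(temperature, float), t)
    dV_dt = np.gradient(np.asarray(voltage, float), t)
    out = np.full_like(dT_dt, np.nan)
    nonzero = dV_dt != 0.0
    out[nonzero] = dT_dt[nonzero] / dV_dt[nonzero]
    return out

# Synthetic constant-current charge: voltage and surface temperature rising linearly
t = np.arange(0.0, 600.0, 1.0)
signal = dtv(t, 3.5 + 1e-3 * t, 25.0 + 2e-3 * t)
```

For this synthetic input the DTV signal is a constant 2 K/V; with real data the peaks of interest appear where the voltage changes slowly relative to the temperature.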
Whilst these differential techniques are useful, for real-world implementation their data processing must also be considered, along with the logging rate and resolution limits of on-board sensing technologies. The resolution of the BMS used to acquire the voltage curves can affect the quality of the ICA spectra. Towards this end, Feng et al. [39] showed that several smoothing methods can under- or over-fit raw voltage and temperature data, producing ICA curves with additional, non-physical peaks. To solve this problem of inconsistent, and in some cases incorrect, fitting, they proposed a method of processing ICA, DVA and DTV data termed the Level Evaluation ANalysis (LEAN) method. Here, the voltage/temperature data are binned into windows to produce histograms, avoiding potential under- or over-fitting. This work highlights an often neglected step in applying electrochemical diagnostic data to data driven problems: consistent and robust preprocessing of the raw data.
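The binning idea can be sketched as follows: under constant current every sample carries the same charge, so counting samples per voltage window gives dQ/dV without numerically differentiating noisy data. This is an illustration in the spirit of the histogram approach, not the published LEAN implementation; the 5 mV default bin width and the synthetic data are assumptions.

```python
import numpy as np

def histogram_ica(voltage_v, current_a, dt_s, bin_width_v=0.005):
    """Estimate dQ/dV (Ah/V) by counting constant-current samples per voltage bin.

    Each sample represents |I|*dt of charge, so the count in a bin is proportional
    to the charge passed within that voltage window; no numerical derivative is taken.
    """
    v = np.asarray(voltage_v, float)
    n_bins = max(1, int(np.ceil((v.max() - v.min()) / bin_width_v)))
    counts, edges = np.histogram(v, bins=n_bins, range=(v.min(), v.max()))
    charge_per_sample_ah = abs(current_a) * dt_s / 3600.0
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, counts * charge_per_sample_ah / np.diff(edges)

# Synthetic 1 A, 1 Hz log with a slightly noisy linear voltage ramp (1 Ah in total)
rng = np.random.default_rng(1)
t = np.arange(3600.0)
v = 3.0 + t / 3600.0 + rng.normal(0.0, 2e-4, t.size)
centres, dqdv = histogram_ica(v, current_a=1.0, dt_s=1.0)
```

Because every sample falls into exactly one bin, the area under the estimated dQ/dV curve always equals the total charge passed, which makes the result easy to sanity-check.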

Emerging techniques
Besides the aforementioned techniques, there is also a range of other approaches which show significant promise but have yet to be fully deployed in on-board vehicle applications. For instance, electrochemical impedance spectroscopy (EIS) is extensively used in lab-based studies of battery condition monitoring [40], but has seldom been used in vehicle applications due to the high cost of the potentiostat required to make the measurements. However, work by Howey et al. [41] showed that EIS measurements could be made on a vehicle through cell excitation driven by the motor controller. They demonstrated the applicability of this approach by fitting the EIS spectra of LFP-Gr and NMC-Gr cells, but noted the difficulty of correlating variations in EIS parameters with SOC for the LFP cells. In the NMC cells there was a stronger correlation with parameters associated with the charge transfer resistance and SEI layer resistance; however, they again noted challenges arising from these measurements being made with no direct current (DC) basis. These points highlight that, whilst these approaches are attractive for diagnostics, there are challenges in generalising them across different cell chemistries and cell types.
Nevertheless, the use of EIS as a diagnostic tool continues to attract interest for a range of state-estimation problems. For instance, Richardson et al. [42] used EIS to estimate the internal temperature of a 26650 LFP-Gr cell; the limited sensitivity of this chemistry's impedance to SOC made it an ideal candidate for decoupling SOC and temperature effects. By combining impedance measurements at a single frequency (215 Hz) with surface temperature measurements, they estimated the internal core temperature to within 3% error under their testing conditions. Common approaches to using EIS data involve fitting the data to an equivalent circuit, which requires the user to define the circuit based on knowledge of the system. Osaka et al. [40], for instance, presented and compared a number of equivalent circuits used to describe battery EIS and showed how anode and cathode processes could be decoupled, helping to increase the identifiability of a system. Analysing a single lock-in frequency avoids the need for a fitting circuit but loses the information available at other frequencies. An approach more commonly used in the solid oxide fuel cell field, but becoming increasingly topical in batteries, is the distribution of relaxation times (DRT) [43]. Its advantage is that processes can be identified without an assumed equivalent circuit, which removes potential subjectivity. However, DRT analysis requires extremely stable EIS measurements that extend to low frequencies in order to resolve processes with long relaxation times.
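A single-frequency impedance measurement of the kind described above can be sketched as a lock-in style projection of the voltage and current signals onto one frequency. The sketch assumes linear, time-invariant cell behaviour and a record spanning an integer number of excitation periods; the 215 Hz value follows the example above, while the impedance and signal amplitudes are hypothetical.

```python
import numpy as np

def impedance_at(freq_hz, t, v, i):
    """Complex impedance V/I at one excitation frequency (single-bin Fourier projection).

    Assumes the record spans an integer number of excitation periods and that the
    cell responds linearly to the small excitation.
    """
    ref = np.exp(-2j * np.pi * freq_hz * np.asarray(t, float))
    v_ac = np.asarray(v, float) - np.mean(v)
    i_ac = np.asarray(i, float) - np.mean(i)
    return np.sum(v_ac * ref) / np.sum(i_ac * ref)

# Synthetic check at 215 Hz: 10 periods sampled at 200 points per period,
# superimposed on a DC load current (as with motor-controller excitation).
f = 215.0
t = np.arange(2000) / (200 * f)
z_true = 0.020 - 0.005j                          # hypothetical cell impedance, ohm
i = 3.0 + 0.5 * np.sin(2 * np.pi * f * t)
v = 3.7 + 0.5 * abs(z_true) * np.sin(2 * np.pi * f * t + np.angle(z_true))
z = impedance_at(f, t, v, i)
```

Removing the means discards the DC operating point, so only the response at the excitation frequency contributes; with a non-integer number of periods, spectral leakage would bias the estimate and a window would be needed.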
The value of temperature measurements was highlighted above; these normally use thermocouples or thermistors for data collection. The challenge, however, is that a battery pack often contains thousands of cells, and measuring the temperature of every cell would require thousands of individual sensors. To simplify this, researchers have investigated the use of fibre Bragg gratings: optical fibres inscribed with a periodic refractive-index pattern that reflects a narrow band of wavelengths and transmits the rest. The advantage of this approach is that, with the right preparation, a single optical fibre can measure multiple temperatures and even strains. Furthermore, these Bragg gratings can also be integrated into a cell to infer internal cell temperature [44]. The challenge, however, is that the enabling equipment for this measurement is currently quite expensive.
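The Bragg condition, λ_B = 2·n_eff·Λ, links the reflected wavelength to the effective refractive index n_eff and grating period Λ, both of which change with temperature and strain. A minimal sketch of separating the two with a strain-isolated reference grating follows; the sensitivity values (about 10 pm/K and 1.2 pm/µε near 1550 nm) are assumed typical figures for bare silica fibre and would need calibration for any real installation.

```python
# Assumed sensitivities for a bare silica grating near 1550 nm; real values must be
# calibrated for each installation (bonding and packaging change them).
K_T_PM_PER_K = 10.0     # wavelength shift per kelvin, pm/K
K_EPS_PM_PER_UE = 1.2   # wavelength shift per microstrain, pm/ue

def bragg_wavelength_nm(n_eff, grating_period_nm):
    """Bragg condition: the reflected wavelength is 2 * n_eff * grating period."""
    return 2.0 * n_eff * grating_period_nm

def temperature_and_strain(shift_free_pm, shift_bonded_pm):
    """Separate temperature and strain using two gratings written on one fibre.

    shift_free_pm: shift of a strain-isolated grating (responds to temperature only)
    shift_bonded_pm: shift of a grating bonded to the cell (temperature + strain);
    both gratings are assumed to share the same temperature sensitivity.
    """
    delta_t_k = shift_free_pm / K_T_PM_PER_K
    strain_ue = (shift_bonded_pm - shift_free_pm) / K_EPS_PM_PER_UE
    return delta_t_k, strain_ue

d_temp, strain = temperature_and_strain(shift_free_pm=50.0, shift_bonded_pm=110.0)
```

With these assumed coefficients, a 50 pm shift on the free grating and 110 pm on the bonded one correspond to a 5 K temperature rise and about 50 µε of strain.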
Furthermore, most of these sensors are positioned on the surface of cells and thus do not directly measure the internal state of the battery. To this end, researchers have been developing 'smart' cells with in-built sensors. Fleming et al. [45], for instance, demonstrated how thin thermocouples could be embedded into the centre of a cylindrical cell to measure the internal cell temperature. The challenges with this approach, however, are the cost and complexity of integrating sensors into cells, as well as the chemical stability of the sensor, which still needs to be improved.
The combination of various sensing techniques towards an intelligent cell was also demonstrated by Amietszajew et al. [46], who used an in-situ fibre Bragg grating together with a lithium reference electrode to investigate the limits of fast charging. They showed that charging rates more than 6 times the manufacturer-recommended values could be used without lithium plating. This highlights both that an operating strategy driven purely by manufacturer-prescribed limits can be highly inefficient and the potential benefits of a more intelligent cell.
Beyond current, voltage and temperature data vectors, other promising techniques include the measurement of mechanical states. One such approach, presented by Hsieh et al. [47], used acoustic time-of-flight measurements to infer changes in the density of electrodes during lithiation/delithiation. Bommier et al. [48] later extended this approach to detect lithium plating. Other approaches to measuring the mechanical properties of cells include pressure and volume change sensing. Such measurements capture the mechanical changes caused by lithiation/delithiation of the electrodes [49] and have the potential to detect lithium plating and other degradation modes.
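Acoustic time-of-flight shifts are often estimated by cross-correlating a new echo with a reference echo. The sketch below takes that approach as an assumption (other arrival-time pickers exist) and uses synthetic 1 MHz bursts sampled at 10 MHz; a positive shift indicates a slower transit through the cell, which in the studies above is linked to changes in electrode density and stiffness.

```python
import numpy as np

def tof_shift_s(reference, signal, fs_hz):
    """Delay (s) of `signal` relative to `reference`, from the cross-correlation peak.

    A positive value means the acoustic pulse now takes longer to cross the cell.
    """
    ref = np.asarray(reference, float) - np.mean(reference)
    sig = np.asarray(signal, float) - np.mean(signal)
    corr = np.correlate(sig, ref, mode="full")   # index k maps to lag k - (len(ref) - 1)
    lag = int(np.argmax(corr)) - (len(ref) - 1)
    return lag / fs_hz

# Synthetic echoes sampled at 10 MHz: a 1 MHz Gaussian-windowed burst arriving 37 samples later
fs = 10e6
n = np.arange(100)
burst = np.sin(2 * np.pi * 0.1 * n) * np.exp(-0.5 * ((n - 50) / 15.0) ** 2)
reference = np.zeros(1000)
reference[100:200] = burst
later = np.zeros(1000)
later[137:237] = burst
shift = tof_shift_s(reference, later, fs)
```

The resolution of this estimate is one sample period (0.1 µs here); sub-sample interpolation of the correlation peak would be needed to track the very small shifts produced during cycling.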

Challenges and opportunities in data and sensing
Manufacturer-prescribed limits on cell operation are often overly conservative, leading to inefficient operation. Clearly, EVs and battery systems of the future will generate vast amounts of data, which, if used intelligently, could extend the operating window of a battery without significant increases in degradation. The most common types of time-domain data, including voltage, temperature and current, will continue to be important. However, there are emerging opportunities to take certain lab-based sensing and diagnostic techniques and implement them on vehicles, fusing different data vectors to create deeper electrochemical insights and increase the identifiability of these systems. For example, EIS and DRT have been shown to be powerful techniques for decoupling anode and cathode resistances, which is a challenge for HPPC-based resistance extraction. Fusing these resistance extraction techniques with approaches such as ICA for understanding LLI and LAM could be a powerful combination for data driven models.
However, there is a risk that, without proper curation of the data in terms of standardised collection methods, naming conventions and suitable data structures to store it with the appropriate meta-data, subsequent data driven approaches will not be as effective. One notable effort towards this is the battery evaluation and early prediction (BEEP) software package by Herring et al. [50], which provides a framework for inputting and analysing battery testing data in a consistent way. This includes not just the raw data but also the meta-data relating to the cell testing, enabling easy application in ML codes. Furthermore, the alignment of different standards and their broader use also needs to be considered. A good resource for this is BatteryStandards.info [51], which contains approximately 400 searchable battery-related standards, including characterisation, ageing and safety/abuse tests at cell, module and system level, for application areas ranging from transport to stationary. Keeping these considerations at the forefront of thinking as emerging on-board diagnostics mature will be critical for effective data driven solutions.
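To make the curation point concrete, a record that keeps raw time series and cell meta-data together, with simple consistency checks before serialisation, might look like the sketch below. The field names and schema are hypothetical and are not the BEEP data format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class CellMetadata:
    cell_id: str
    chemistry: str                  # e.g. "NMC811-Gr"
    form_factor: str                # e.g. "21700"
    nominal_capacity_ah: float
    manufacturer: str = "unknown"

@dataclass
class CellTestRecord:
    metadata: CellMetadata
    protocol: str                   # identifier of a standardised test procedure
    time_s: List[float] = field(default_factory=list)
    voltage_v: List[float] = field(default_factory=list)
    current_a: List[float] = field(default_factory=list)
    temperature_c: List[float] = field(default_factory=list)

    def validate(self):
        """Reject records whose channels disagree in length or whose time axis is not increasing."""
        n = len(self.time_s)
        for name in ("voltage_v", "current_a", "temperature_c"):
            length = len(getattr(self, name))
            if length != n:
                raise ValueError(f"{name} has {length} samples, expected {n}")
        if any(t1 <= t0 for t0, t1 in zip(self.time_s, self.time_s[1:])):
            raise ValueError("time_s must be strictly increasing")

    def to_json(self):
        """Serialise raw data and meta-data together after validation."""
        self.validate()
        return json.dumps(asdict(self))

example = CellTestRecord(
    CellMetadata("cell-001", "NMC811-Gr", "21700", 5.0),
    protocol="hypothetical-hppc-v1",
    time_s=[0.0, 1.0, 2.0],
    voltage_v=[3.60, 3.59, 3.58],
    current_a=[0.0, 5.0, 5.0],
    temperature_c=[25.0, 25.1, 25.1],
)
payload = example.to_json()
```

Encoding units in the field names and validating before storage are simple conventions, but they remove a common source of silent errors when data from different test rigs are later merged for ML.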

Artificial intelligence: machine learning and data driven approaches
Clearly, there are many challenges around the lengthy time needed to parameterise physics based models, which, when compounded with the inherent variability of cells, reduce the prediction accuracy of purely physics based approaches in real-world applications. ML allows computers to learn from data and improve their performance without additional manual programming, which makes it well suited to handling the inherent variability in batteries. Whilst data driven approaches such as ANNs have existed for some time, only recently have increased processing power and the abundance of data inspired a resurgence of interest in them in the battery field. Various works applying ANNs, support vector machines (SVM), Gaussian/Bayesian regression, random forests and KF based approaches to predict states such as SOC, SOH and RUL have been reported, and are well summarised by Ng et al. [52] and Li et al. [53].
A notable application of a data driven approach to RUL estimation in batteries is shown by Severson et al. [54], who demonstrated how a simple linear regression model, using features derived from the discharge voltage curves of LFP-Gr cells cycled under fast charging profiles, could estimate the RUL with a test error of 9.1% using only the first 100 cycles. Other authors, such as Li et al. [55], have shown that there is a linear correlation between specific peak locations in the ICA spectra and the SOH of a cell. By tracking only specific peaks, partial charge curves could be used to estimate SOH, closing the gap between lab-based and real-world applications. However, care should be exercised when using linear models for RUL estimation, since under extreme use conditions the battery capacity fade can accelerate. To address this problem, Fermin-Cueto et al. [56] showed how knee points, where the degradation rate increases, could be identified using a combination of a Bacon-Watts model and an SVM.
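The knee-point idea can be sketched with a Bacon-Watts model, which joins two straight lines through a smooth tanh transition at the knee. This is a simplified illustration rather than the method of Fermin-Cueto et al.: the transition sharpness is fixed, the knee location is found by grid search with linear least squares for the remaining coefficients, no SVM is involved, and the capacity-fade data are synthetic.

```python
import numpy as np

def bacon_watts_knee(cycles, capacity, gamma=5.0):
    """Locate a knee point with a Bacon-Watts (smooth two-line) model.

    capacity ~ a0 + a1*(n - n1) + a2*(n - n1)*tanh((n - n1)/gamma).
    The transition sharpness gamma (cycles) is fixed; n1 is grid-searched and,
    for each candidate, a0..a2 are solved by linear least squares.
    """
    n = np.asarray(cycles, float)
    q = np.asarray(capacity, float)
    best_sse, best_n1 = np.inf, None
    for n1 in n[5:-5]:                       # keep a few points on each side of the knee
        d = n - n1
        design = np.column_stack([np.ones_like(d), d, d * np.tanh(d / gamma)])
        coef, *_ = np.linalg.lstsq(design, q, rcond=None)
        sse = float(np.sum((design @ coef - q) ** 2))
        if sse < best_sse:
            best_sse, best_n1 = sse, n1
    return best_n1

# Synthetic normalised capacity: 0.01 % fade per cycle until cycle 800, then 0.08 % per cycle
cycles = np.arange(1, 1201)
capacity = np.where(cycles < 800, 1.0 - 1e-4 * cycles, 0.92 - 8e-4 * (cycles - 800))
knee = bacon_watts_knee(cycles, capacity)
```

Far from the knee, d·tanh(d/γ) behaves like |d|, so the fitted slopes before and after the knee are a1 - a2 and a1 + a2; a linear model fitted only to early cycles would miss the change in slope entirely.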
Whilst these approaches using voltage and capacity data for RUL estimation are extremely promising, there is potential to combine these simple regression approaches with deeper electrochemical insights, models which capture more modes of degradation, and more diverse data types. Hu et al. [57], for instance, demonstrated a data driven approach for RUL estimation based on five metrics associated with battery charging: initial charge voltage, constant current charge capacity, constant voltage charge capacity, final charge voltage and final charge current. These were combined using a k-nearest neighbours algorithm, with particle swarm optimisation used to determine the feature weights. The use of k-nearest neighbours in particular highlights the future potential of synthesizing different diagnostic data types for more accurate RUL estimation, as suggested in Fig. 2.
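A weighted k-nearest neighbours regressor of the kind described can be sketched in a few lines. Here the feature weights are set by hand purely to show their effect (Hu et al. tuned them with particle swarm optimisation), the features are assumed to be standardised already, and the training cells are hypothetical.

```python
import numpy as np

def weighted_knn_rul(features, rul, query, weights, k=3):
    """Predict RUL as the mean RUL of the k most similar training cells.

    Similarity is a weighted Euclidean distance over (pre-standardised) charging features.
    """
    X = np.asarray(features, float)
    w = np.asarray(weights, float)
    dist = np.sqrt(np.sum(w * (X - np.asarray(query, float)) ** 2, axis=1))
    nearest = np.argsort(dist)[:k]
    return float(np.mean(np.asarray(rul, float)[nearest]))

# Hypothetical training cells described by two standardised charging features
X = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
rul = [100.0, 200.0, 800.0, 900.0]
equal = weighted_knn_rul(X, rul, [10.0, 0.2], weights=[1.0, 1.0], k=2)
second_only = weighted_knn_rul(X, rul, [10.0, 0.2], weights=[0.0, 1.0], k=2)
```

With equal weights the query is matched to the second and third training cells (predicted RUL 500), whereas weighting only the second feature matches it to the first two cells (predicted RUL 150), which is why learning the weights matters when features of different diagnostic origin are fused.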
Furthermore, these RUL estimation methods are especially powerful when combined with closed-loop optimisation algorithms. Attia et al. [58], for instance, used a linear model trained via elastic net regression to predict the final lifetime of LFP-Gr batteries from the first 100 cycles under different fast charging protocols. The main advance was the combination of this ML driven RUL estimation with Bayesian optimisation in a closed loop, which significantly reduced the number of experiments needed to explore the parameter space. This allowed high-cycle-life protocols to be identified among 224 candidate charging protocols in 16 days, compared with over 500 days without early lifetime prediction, showing the potential of ML for accelerating the development of smarter battery functionality.
Beyond the use of differential voltage measurements as an indicator for RUL, other authors have suggested using EIS spectra as an SOH indicator. Zhang et al. [59], for instance, showed that accurate RUL estimations can be achieved by taking the entire EIS spectrum and combining it with Gaussian process regression and automatic relevance determination. Two specific frequencies, 17 Hz and 2 Hz, were identified; however, whilst it was suggested that they indicate changing interfacial properties, the physical reason why these frequencies should be strong indicators of degradation is not clear. Moreover, whilst the potential of EIS measurements for data driven RUL estimation is apparent, the quality of the EIS data is also a critical consideration in real-world problems. Towards this end, Liu et al. [60] reformulated some of the core EIS criteria within a Bayesian framework and created metrics to assess how well the data satisfy the constraints of its derivation, namely linearity, time-invariance and causality. Through this, more reliable EIS, and thus DRT, measurements can be enabled, opening the possibility of real-world implementation where signal noise and sub-optimal measurement conditions might be present.
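Gaussian process regression with an automatic relevance determination (ARD) kernel can be sketched with NumPy alone: each input gets its own lengthscale, and a short lengthscale (large inverse) marks an input the model relies on. In this sketch the hyperparameters are fixed by hand rather than learned by maximising the marginal likelihood, as would normally be done, and the two inputs are hypothetical impedance features with a synthetic target that depends only on the first.

```python
import numpy as np

def ard_rbf(A, B, lengthscales, signal_var=1.0):
    """Squared-exponential kernel with one lengthscale per input dimension (ARD)."""
    A = np.asarray(A, float) / lengthscales
    B = np.asarray(B, float) / lengthscales
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def gp_predict(X, y, X_new, lengthscales, noise_var=1e-4, signal_var=1.0):
    """Posterior mean and variance of zero-mean GP regression at X_new."""
    K = ard_rbf(X, X, lengthscales, signal_var) + noise_var * np.eye(len(X))
    K_new = ard_rbf(X, X_new, lengthscales, signal_var)
    L = np.linalg.cholesky(K)                          # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_new.T @ alpha
    v = np.linalg.solve(L, K_new)
    var = signal_var - np.sum(v**2, axis=0)
    return mean, var

# Two hypothetical impedance-derived inputs; the synthetic target depends on the first only.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 2))
y = np.sin(3.0 * X[:, 0])
lengthscales = np.array([0.5, 100.0])   # fixed by hand here; normally learned
mean, var = gp_predict(X, y, np.array([[0.5, 0.3]]), lengthscales)
relevance = 1.0 / lengthscales          # larger value = more relevant input
```

When the lengthscales are learned from data, an irrelevant input drifts towards a very long lengthscale, which is how ARD can single out a few informative frequencies from a full spectrum.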

Challenges and opportunities
Whilst interest in data driven approaches for battery applications has increased significantly in recent years, generating the parameterisation data still remains one of the main challenges, since this is lengthy and expensive. The solution to this problem could well come from the wealth of data which will emerge from IOT enabled BEVs; however, care needs to be exercised in assessing its quality. Alternatively, other emerging ML techniques such as generative adversarial networks (GANs) could provide a route for generating synthetic data to augment real data. The potential of this approach was demonstrated by Gayon-Lombardo et al. [61], who trained a GAN on battery and fuel cell microstructural data to create a generator capable of producing new microstructures with statistically identical properties. This data gap could also be filled through the application of surrogate models. For instance, Wang et al. [62] showed how a high fidelity multi-physics model of a fuel cell could be used to train an ANN and an SVM to mimic the model's performance at significantly reduced computational cost, opening the possibility of more accurate optimisation.
Moving forward, combining new data types, for example by fusing ICA and EIS data, with approaches such as k-nearest neighbours within the existing framework of ML based approaches for RUL estimation is promising. Further opportunities exist in combining these with new sensor types, such as pressure and acoustic measurements, as well as in processing existing data in different ways, for example DRT analysis. Beyond RUL estimation, ML techniques are also finding use for other purposes. Closed-loop optimisation, for instance, when combined with RUL estimations from these emerging techniques, could also pave the way for the accelerated development of new fast-charging algorithms, control approaches and thermal management systems. ML approaches could also be used to improve the quality of data types such as EIS, unlocking additional data processing functionalities such as DRT.

Battery digital twins: the fusion of models, data and artificial intelligence
In many fields where an abundance of data is now becoming available due to low-cost sensing and the increased deployment of IOT enabled devices, practitioners have sought to create cyber-physical systems.
Here, remote sensing of a physical device over its lifetime of use is combined with cloud-based models which monitor and optimise its use within a network of systems, creating a virtual representation of the physical system. This was previously represented in Fig. 1, with some of the mirrored functionality highlighted. This approach has been termed a digital twin, but has also been referred to as a computational mega-model, device shadow, mirrored system, avatar or synchronized virtual prototype [63], and has seen application in aerospace, product design and, increasingly, new fields [64, 65]. The potential of this approach lies in the close interaction between the physical object, its digital equivalent and the aggregation of data from agents operating in a diversity of conditions. Individually, these agents might not collect enough data to provide statistical significance for a data driven RUL model; when aggregated, however, their data can improve the underpinning ML models, which can then be fused with closed-loop optimisers to update vehicle control continuously.
BEVs already have BMSs which log sensor data and perform a degree of onboard processing. Within the digital twin framework, researchers have also been exploring the use of reduced order models and a degree of offline processing to make best use of the powerful P2D framework [66]. In the majority of cases, however, BMS data are stored locally, though researchers are starting to propose cloud-enabled systems that both minimise local computational needs and aggregate large data-sets to improve the performance of ML based algorithms [67].
Beyond vehicle-level data logging, researchers are also proposing a whole-system approach of tracking key data from material synthesis all the way to vehicle applications. Yang et al. [68], for instance, proposed the cyber hierarchy and interactional network (CHAIN) framework, which suggests uploading key physical and electrochemical parameters of a cell during manufacturing to a cloud-based server to perform closed-loop optimisation for full lifetime management of battery systems. The CHAIN framework decomposes a complex system into hierarchical, interdependent layers with various functions. These include:
• Multi-scale mapping: Seamlessly integrating digital models across the different lifecycle stages and length scales, freeing researchers from time-consuming experiments and achieving increased resource efficiency, shorter development times and enhanced flexibility.
• Cyber-physical linking: Here, sensing data are wirelessly and seamlessly uploaded to servers, where they can be easily accessed to create a series of desired models which can be quickly trained, providing guidance for manufacturing, product design and optimisation. The real-time nature of this data logging enables rapid upgrading of existing assets as well as rapid dissemination of insights into future products.
• Multi-stage prognostics and health management: An end-to-end approach capturing states from materials, cell formation, automotive use and 2nd life applications, and ultimately the closed-loop cycle back to raw materials, can be realised by applying a variety of estimation methods on cloud-based servers. Data from each stage of the cycle can be uploaded and shared on the cloud servers for analysis, providing cross-stage insights; for example, the impact of 1st life on 2nd life performance and, in the future, the assessment of the quality of different recycled battery materials.
• "End-Edge-Cloud" multi-layer collaborative layout: The future smart system will also be multi-scale in nature, with processing occurring in on-board end applications, edge computing nodes and cloud computing servers. Optimising data flows across this network and ensuring proper data security will be emerging challenges as these types of frameworks are implemented.
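The End-Edge-Cloud idea can be illustrated with a deliberately small sketch in which an edge step compresses a raw log into a summary and a cloud step aggregates summaries across a fleet. All class and function names are hypothetical; networking, storage and the data security concerns raised above are omitted.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class CycleSummary:
    vehicle_id: str
    charge_throughput_ah: float
    v_min: float
    v_max: float
    t_mean_c: float

def edge_summarise(vehicle_id, current_a, voltage_v, temperature_c, dt_s):
    """End/edge step: compress a raw, uniformly sampled log into a small summary for upload."""
    throughput_ah = sum(abs(i) for i in current_a) * dt_s / 3600.0
    return CycleSummary(vehicle_id, throughput_ah, min(voltage_v), max(voltage_v), mean(temperature_c))

class CloudStore:
    """Cloud step: collect summaries from many vehicles and compute fleet-level statistics."""

    def __init__(self):
        self.records = []

    def upload(self, summary):
        self.records.append(summary)

    def fleet_mean_throughput_ah(self):
        return mean(r.charge_throughput_ah for r in self.records)

cloud = CloudStore()
cloud.upload(edge_summarise("ev-1", [10.0, 10.0, 10.0], [3.6, 3.7, 3.8], [25.0, 26.0, 27.0], dt_s=360.0))
cloud.upload(edge_summarise("ev-2", [5.0, 5.0], [3.5, 3.6], [20.0, 22.0], dt_s=360.0))
fleet_mean = cloud.fleet_mean_throughput_ah()
```

Summarising at the edge trades raw-data fidelity for bandwidth; which features are worth keeping is exactly the selective-storage question raised for these frameworks.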
As multi-disciplinary cyber-physical systems, battery digital twins play a transformative role in the multi-scale design and intelligent management of battery systems. The proposed battery digital twin can be continuously updated using knowledge generated from data reflecting both known and unknown physics. Furthermore, key computing and networking technologies for cloud computing, such as virtualisation and service-orientated architectures, will bring the promise of better digital twins, and their impact on the energy area, closer to reality.
Whilst the potential benefits of improved traceability, leading towards more accurate performance modelling, were highlighted, a number of challenges were also noted. In addition to Yang et al. [68], Rasheed et al. [63] provided a perspective on some of the challenges for digital twins. Collectively these include: the need for multi-physics models, the need for nano/micro scale characterisation, low latency future communication networks, the importance of effective data pre-processing leading to computationally efficient algorithms and, ultimately, increased data security towards mass uptake of these approaches.
Fig. 3. Evolution of approaches for battery modelling and the potential eco-system for battery digital twin data aggregation.
Thus, over the years, battery control and lifetime estimation have evolved from a largely empirical approach towards more model driven techniques. With the advent of increased computational processing power, data driven and ML approaches have seen a resurgence; however, challenges in their applicability to real-world applications remain. This progression is highlighted in Fig. 3, which also points to a proposed hybrid model/data approach that leverages real-time data collection from IOT enabled systems towards a battery digital twin.

Conclusions
Batteries will clearly be an essential technology in our low-carbon future, with key applications such as electric vehicles and grid scale energy storage. Maximising the lifetime and effective use of these devices remains a major challenge. However, with the recent advances in understanding battery performance/lifetime, the diversity of diagnostic techniques and the advent of ML approaches, there is clearly an opportunity for more intelligent control of battery systems. These control systems have evolved from largely empirical relationships towards today's physics based modelling approaches. However, whilst physics based models have many advantages, such as the estimation of anode potentials for fast charging algorithms, challenges with their real-world implementation are becoming apparent. These include the large number of parameters in the model, many of which require disassembly of the cell to measure, and the inherent cell-to-cell variation, which gets amplified as the battery degrades.
Diagnostic techniques have undoubtedly helped improve our current understanding of how batteries perform, with many scientific works leveraging a combination of spectroscopic, physical and electrochemical methods. However, for real-world and in-operando implementation of diagnostic techniques, the focus is shifting towards low-cost on-board methods, which currently rely on voltage, current and temperature measurements. These can be analysed in a number of ways to gain electrochemical insights into the system. Beyond these metrics, emerging data types such as stress and strain also have the potential to increase the functionality of on-board diagnostics, as does lowering the cost of techniques such as EIS and thus DRT analysis. Yet these developments need to be made with due consideration of well-designed data logging and curation, to allow the effective fusion of different data types. With this wealth of data, and the uncertainty in predicting the real-world performance of batteries, ML based approaches have received renewed attention. Many of these approaches have shown significant promise in correlating key features in the charge/discharge profiles of batteries with their capacity loss under well controlled lab conditions. However, data generation remains a challenge for the more mainstream adoption of these approaches. This has motivated the development of hybrid and surrogate models, which leverage the predictive power of multi-physics and multi-scale models that are traditionally very computationally expensive. Fusing these models with ANNs therefore offers a route to faster model predictions that retain high-order physics, thus enabling closed-loop optimisation of key functionality such as fast charging algorithms.
As this diversity of diagnostics continues to mature, fusing these data types with approaches such as k-nearest neighbours and SVMs will likely increase accuracy and open up new application areas.
These 3 core elements of models, data and ML tools therefore provide the foundations of battery digital twins, in which there is a close interaction between the physical entity and its digital equivalent. Here, the aggregation of multiple data sets, real-time monitoring of key states and the fusion of these with hybrid models unlock the potential for longer-life battery systems. However, this new emerging field brings multi-disciplinary challenges around the curation of the data, how it is shared and the security of these systems.
These increasingly digital and connected tools at the disposal of battery scientists and engineers are therefore a powerful enabler for the future of battery systems, opening new ways of operating. Whilst this perspectives paper presents some of the current thinking about elements of this integrated digital landscape, much is still to be done, and the battery community coming together will be essential. This is helped by various online community-building efforts, where platforms such as online webinars, Twitter and Slack, used to share ideas and solve problems, are key enablers [69].
In summary, battery digital twins have a huge role to play in the future development of battery technologies in a diversity of applications, yet there are still significant challenges, and thus opportunities, to be solved. Some of these are itemised below:
• Wider use of standardised and transparent testing/data processing procedures across academia and industry for parameterisation and diagnostics.
• Standardised and transferrable approaches for data storage and database management.
• Multi-scale physics models capturing nano-scale effects on macroscopic metrics towards the development of surrogate models to aid with high fidelity digital data generation for training of ML models.
• The development of hybrid models, fusing physics and data driven models towards real world implementation and increased accuracy.
• Combination of deeper electrochemical insights and new sensing approaches with data/hybrid model based approaches for lifetime estimation.
• Efficient ML algorithms, data pre-processing and selective storage of data for effective curation.
• Low latency cyber-physical systems, enabling real-time adaptive control.