Quantitative assessment of machine-learning segmentation of battery electrode materials for active material quantification

, where 10–25% of each slice was trained. This approach is applied to lab-based X-ray CT data and compared with data obtained by focused-ion beam/scanning electron microscopy slice-and-view tomography. Variation in active material volume fraction between users is lower for at least one of these two approaches (10% or 25% coverage) when applied to raw LIB cathode tomograms than for unsupervised techniques such as simple thresholding and watershed segmentation. On average, the absolute volume fraction values are closer to those obtained by the correlative technique, with the closest match for high-resolution data. The present analysis provides an optimised approach for using open-source software to apply machine-learning segmentation when quantifying active material volume fractions in cutting-edge LIB electrodes, offering a more robust route to active material quantification.


Introduction
Lithium-ion batteries (LIBs) play an increasingly vital role in our everyday lives, from personal electronics to portable power, and they also feature prominently in the global transition towards net-zero emissions in the face of climate change, energy security concerns, and rising local air pollution [1]. LIBs are a mature technology that will be applied even more widely as an energy storage solution as more intermittent renewable energy generation is implemented, alongside increasing demand from electric vehicle uptake [2]. Understanding how their microstructure and material properties influence performance and durability, in terms of thermal, electrochemical, and mechanical aspects, is vital in the pursuit of next-generation batteries that remain safe and affordable but have greater energy and power densities [3].
To inspect the internal microstructure of lithium-ion battery components, such as electrodes and separators, it is important to use techniques that provide reliable three-dimensional information and do not depend on stochastic approximations or stereological inferences. This is particularly the case for transport properties and the percolation of the constituent phases. Although valuable insights have been drawn from early two-dimensional work on the solid electrolyte interphase [4], intercalation [5], and estimated volume fractions [6,7], more recent analysis has demonstrated that stereological techniques are likely to be associated with ambiguity and significant error [8]. Therefore, to accurately capture the three-dimensional microstructure of LIB electrodes, tomographic techniques such as transmission electron microscopy (TEM) tomography [9,10], focused-ion beam/scanning electron microscopy (FIB-SEM) slice-and-view tomography [11], and X-ray computed tomography (CT) [12] are increasingly being implemented.
Since TEM tomography has a very limited field-of-view, many studies have combined the high resolving power of scanning electron probes with ion milling, by use of FIB-SEM slice-and-view tomography. Wilson et al. first captured the three-dimensional microstructure of a LiCoO2 positive LIB electrode, showing evidence of internal particle cracking from the cell formation step [13]. Nonetheless, the analysis was restricted to two phases and centred on only three particles. Further FIB-SEM studies have since elucidated the three-phase microstructure of different chemistries, such as LiCoO2 [14], LiFePO4 [15], and LiNi1/3Mn1/3Co1/3O2 (NMC) [16], and have provided input for simulations [17] or hierarchical approaches combined with X-ray CT [18]. One method of note is the use of alternative impregnation materials to enhance the contrast between the active material (AM), carbon binder domain (CBD), and pore space [15,19]. Despite providing insight into connectivity and tortuosity factor, as well as access to fine microstructural features, the inherently destructive nature of FIB-SEM slice-and-view means that initial studies examined only one microstructure, and even the more recent "3.5D" studies [16] suffer from inescapable sample-to-sample variations.
Most data acquired by X-ray CT must undergo a segmentation step to convert grayscale images into phase-defined objects that are subsequently analysed. The most common type is semantic segmentation, whereby each voxel (3D pixel) of an image is associated with a class label. Often an interim processing step is employed to reduce noise and improve the likelihood of reliable phase distinction, but both the processing and segmentation steps vary widely from study to study, and sometimes go unreported. A significant number of studies employ "unsupervised" segmentation techniques, such as simple thresholding or watershed segmentation. Simple thresholding involves the selection of a single (binary) or multiple (ternary, quaternary etc.) thresholds, such that voxels lying below or above each threshold are allocated to a particular phase. The next level of unsupervised complexity is to rely on the grayscale gradient, or the difference between adjacent voxels, which often takes the form of a watershed segmentation [55,56], wherein voxels that can be reliably allocated to certain phases become "seeds", which are "grown" to gradient boundaries following the watershed algorithm. However, these methods are often not powerful enough to capture the microstructure from the acquired tomogram to a satisfactory level of accuracy; both are open to systematic errors deriving from globally defined thresholds (for the entire process in the case of simple thresholding, or for seed selection in the case of watershed segmentation). This is particularly true for data with a low signal-to-noise ratio (SNR), where the distinction between the features of interest and the background is less pronounced, as a result of either unoptimised imaging parameters or dynamic scanning with reduced scan times. Although shading corrections [57] may help, more local information is often needed to correctly identify features that are poorly segmented using only voxel intensity or nearest-neighbour gradients.
Consequently, researchers have looked to machine learning (ML) [58] to perform semantic segmentation [59], often based on convolutional neural networks [60] or random forest techniques [61] that learn from user training inputs on selected image slices (so-called "supervised" learning). In particular, the biomedical field has accelerated the use of ML segmentation for a wide variety of applications [62], from brain structure scans [63,64] to cell image analysis [65]. There are recent examples in the geological field comparing unsupervised and supervised segmentation approaches on X-ray CT images of concrete [66], cements [67], various rocks [68,69], and the mineral phases therein [70]. In both fields, results show that these supervised methods, although more computationally demanding, often give rise to more visually convincing segmentations, even with minimal user input. Although significant progress has been made in both materials discovery and design using ML techniques [71], there is currently limited literature on semantic segmentation for materials science applications. While ML has aided the analysis of some four-dimensional imaging experiments in recent years [72][73][74], its potential for widespread use in the field has not yet been fully realised. With regard to batteries in particular, research interest in ML is nonetheless growing across a wide range of use cases [75]. Li-ion diffusion mechanisms in solid-state batteries have been explored by data mining of molecular dynamics simulation results [76], and low-strain cathode materials for LIBs have been screened by establishing a quantitative structure-activity relationship for volume changes based on data from ab initio calculations [77].
With regard to image-based classification tasks, recent work has included automatic crack detection [78,79], analysis of particle-carbon binder detachment [80], material identification in all-solid-state batteries [81], and grain boundary enhancement [82]. Very recent examples of semantic segmentation now include pores in Li-metal batteries [83] and graphite-silicon LIB anodes [84]. To date, however, there has been no dedicated analysis of the application of ML techniques to the semantic segmentation of traditional LIB electrode tomograms, nor a direct comparison with unsupervised approaches, which is the focus of this work.
First, an approach to ML-based semantic segmentation is developed and applied to open-source data previously collected by the authors [85,86]. This optimised approach is subsequently applied to lab-based X-ray CT tomograms acquired of a fabricated electrode, which is also subjected to FIB-SEM slice-and-view tomography for correlative analysis. To examine the variability attributable to human interaction with the data, and therefore to assess the impact of the segmentation approach on producing binarised datasets for extracting the important microstructural metric of volume fraction in LIB electrodes, three individual users each applied traditional segmentation and variants of the developed ML approach. To the authors' knowledge, this is the first in-depth statistical comparison of ML segmentation with more traditional techniques for distinguishing active from non-active materials in LIB electrode microstructures.

Data sources
Four open-source battery electrode microstructures from NREL's Battery Microstructures Library [87] were subjected to a processing, segmentation, and analysis pipeline, which was used to develop an understanding of the influence of each of these stages on the extracted metrics. The sample set consisted of two calendered Li-ion NMC cathodes (Toda NMC532: 1-CAL and 2-CAL) and two calendered Li-ion graphite anodes (Conoco Phillips A12 Graphite: 5-CAL and 6-CAL). The cathode microstructures were acquired at a nominal isotropic voxel dimension of 397 nm and the anode microstructures at a nominal isotropic voxel dimension of 126 nm. A subvolume of 200 × 200 × 200 voxels was extracted from each dataset to minimise computational load, giving an analysed volume of ca. 500,000 μm³ per cathode subvolume and 16,100 μm³ per anode subvolume. Three users independently carried out the segmentation procedures to assess the degree of error due to human subjectivity. The application of ML semantic segmentation was the focus of this work; however, simple thresholding and watershed segmentation were also carried out on all volumes, acting as standard benchmark procedures against which to compare the results derived from ML segmentation. Various parameters, such as the level of image coverage and the amount of training data, were explored, yielding specific subroutines identified as giving more satisfactory segmentation results.
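The analysed volumes quoted above follow directly from the subvolume size and the nominal voxel dimensions. The short Python sketch below reproduces that arithmetic; the function name is illustrative only. At exactly 397 nm the cathode subvolume comes to about 500,600 μm³, and at exactly 126 nm the anode subvolume comes to about 16,000 μm³, so the slightly larger anode figure quoted above presumably reflects rounding of the nominal voxel dimension.

```python
# Physical volume of a cubic subvolume from its edge length in voxels and
# the nominal isotropic voxel size (both values taken from the text above).

def subvolume_um3(n_voxels_per_side: int, voxel_size_nm: float) -> float:
    """Return the physical volume of an n x n x n voxel cube in cubic micrometres."""
    voxel_size_um = voxel_size_nm / 1000.0
    return (n_voxels_per_side * voxel_size_um) ** 3


cathode = subvolume_um3(200, 397.0)  # NMC cathode subvolume, 397 nm voxels
anode = subvolume_um3(200, 126.0)    # graphite anode subvolume, 126 nm voxels
print(f"cathode: {cathode:,.0f} um^3, anode: {anode:,.0f} um^3")
```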
This led to a more refined methodology, which used the most promising subset of the previous procedures to segment new X-ray CT data acquired by the authors. This sample set consisted of tomograms of an NMC622 cathode scanned at two different resolutions: with a nominal isotropic voxel dimension of 371 nm on a Zeiss Xradia 520 Versa instrument, and with a nominal isotropic voxel dimension of 126 nm on a Zeiss Xradia 810 Ultra instrument. The cathodes were fabricated by slurry casting following the procedures reported in previous studies [88][89][90]. The cathode slurry consisted of 96 wt% LiNi0.6Mn0.2Co0.2O2 (NMC622, BASF), 2 wt% PVDF (Solvay), and 2 wt% C65 (Imerys). A THINKY mixer (ARE-20, Intertronics) was used to mix the cathode binder solution, NMC622, and C65 to form a slurry with a solid content of ~60 wt%. The homogeneous slurry was degassed in the THINKY mixer at 2000 rpm for 2 min before being coated onto aluminium foil of ~16 μm thickness (PI-KEM) using a doctor-blade thin-film applicator (calibrated with a metal shim). The slurry-cast coatings were subsequently dried on a pre-heated hotplate (Nickel-Electro Clifton HP1-2D) at 60 °C. X-ray CT sample preparation was carried out using laser micro-machining, as described in a previous publication [33]. A summary of the datasets examined in this work is shown in Table 1.
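As a worked example of the slurry formulation above, the Python sketch below converts the 96:2:2 wt% dry composition and the ~60 wt% solid content into component masses. The 10 g batch of solids is a hypothetical choice for illustration, the solvent is not identified in the text, and the "solvent" entry lumps together all liquid in the slurry (including any introduced with the binder solution).

```python
# Hypothetical batch calculation for the NMC622 slurry.
# From the text: dry composition 96:2:2 wt% and ~60 wt% solid content.
# Assumption: the 10 g solids batch size is arbitrary, for illustration only.

DRY_COMPOSITION = {"NMC622": 0.96, "PVDF": 0.02, "C65": 0.02}
SOLID_CONTENT = 0.60  # mass fraction of solids in the slurry


def slurry_batch(total_solids_g: float) -> dict:
    """Return the mass (g) of each dry component plus the liquid needed."""
    masses = {name: frac * total_solids_g for name, frac in DRY_COMPOSITION.items()}
    slurry_mass = total_solids_g / SOLID_CONTENT
    # All liquid in the slurry, including any carried in by the binder solution
    masses["solvent"] = slurry_mass - total_solids_g
    return masses


for name, grams in slurry_batch(10.0).items():
    print(f"{name}: {grams:.3f} g")
```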

Sample preparation
For the new sample set investigated by X-ray CT in this work, disks of ca. 750 μm in diameter were cut from an electrode sheet using an A Series/Compact Laser Micro-machining System (Oxford Lasers, Oxford, UK) and then glued onto steel dowels, as described previously [33]. After micro-scale imaging, these pillars were milled to ca. 60-100 μm disks for nano-scale imaging with the same laser micro-machining tool.
For the sample investigated by FIB-SEM, the electrode was mounted in cross-section using a metal clip and epoxy-impregnated (EpoFix, Struers, UK) under vacuum.The sample was left to cure overnight in a desiccator before grinding (SiC of progressively finer grade) and polishing (3, 1, and 0.5 μm diamond paste).The top surface was Au-coated using a SC7620 Mini Sputter Coater/Glow Discharge System (Quorum Technologies, UK) to reduce charging.

Image acquisition
Micro-scale CT imaging was performed using a Zeiss Xradia Versa X-ray micro-CT instrument (Carl Zeiss, CA, USA) with an accelerating tube voltage of 80 kVp. The machine utilises a stationary tungsten anode on a copper substrate, producing a polychromatic beam with a characteristic emission peak at 58 keV (W-Kα). An exposure time of s was used for all 601 projections, with a 20× magnification lens. Reconstruction of the data was carried out via Zeiss Scout-and-Scan Reconstructor (Carl Zeiss, CA, USA), utilising cone-beam filtered back-projection algorithms, resulting in a nominal isotropic voxel dimension of ca. 371 nm.
Nano-scale CT was performed using a Zeiss Xradia 810 Ultra X-ray nano-CT instrument (Carl Zeiss, CA, USA) with a quasi-monochromatic beam at the Cr characteristic emission energy of 5.4 keV and a 64 μm × 64 μm field-of-view. An exposure time of 47 s was used for all 1601 projections, with a camera binning of 2. After imaging, each projection was reference- and centre-shift-corrected. Reconstruction was carried out using parallel-beam filtered back-projection algorithms within Zeiss Scout-and-Scan Reconstructor, resulting in a nominal isotropic voxel dimension of ca. 126 nm.
For FIB-SEM slice-and-view tomography, the Au-coated epoxy puck was loaded into a JIB-4700F MultiBeam FIB-SEM instrument (JEOL Ltd., Japan) at the Research Complex at Harwell. FIB milling was performed with a beam current of 10 nA and SEM imaging at an accelerating voltage of 15 kV, giving 338 slices with a nominal thickness of 256 nm and x-y voxel dimensions of ca. 55 nm.

Image processing
Each of the four open-source sample datasets was filtered using a 3D Gaussian filter with a kernel size factor of 2 and a standard deviation of 1.1 voxels, creating eight datasets (unfiltered and filtered images) to which the various segmentation approaches were applied. Each of the two newly acquired datasets underwent the same filtering step (yielding four datasets in total). The SEM micrographs acquired by FIB-SEM were aligned, shear-corrected, cropped, and 'decurtained' in GeoDict (Math2Market GmbH, Germany) before watershed segmentation in Avizo (Thermo Fisher Scientific, U.S.). The anisotropic (non-cubic) voxels were resampled with interpolation and cropped to give a volume of 568 × 456 × 1180 voxels with a nominal isotropic dimension of ca. 55 nm.
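The filtering and resampling steps above were performed in Avizo and GeoDict; the NumPy sketch below only illustrates what the two operations do and is not a reproduction of those packages. Two assumptions should be flagged: the kernel half-width is taken as ceil(size factor × σ), which may not match Avizo's definition of "kernel size factor", and linear interpolation is assumed for resampling the 256 nm slice spacing onto a 55 nm grid, since the interpolation scheme is not stated. Function names are hypothetical.

```python
import math

import numpy as np


def gaussian_kernel_1d(sigma: float, size_factor: float = 2.0) -> np.ndarray:
    """Normalised 1D Gaussian kernel.

    Assumption: half-width = ceil(size_factor * sigma); Avizo's exact
    definition of "kernel size factor" may differ.
    """
    radius = math.ceil(size_factor * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_filter_3d(volume, sigma=1.1, size_factor=2.0):
    """Separable Gaussian smoothing along every axis, with edge-replicated borders."""
    kernel = gaussian_kernel_1d(sigma, size_factor)
    radius = len(kernel) // 2
    out = np.asarray(volume, dtype=float)
    for axis in range(out.ndim):
        pad = [(radius, radius) if a == axis else (0, 0) for a in range(out.ndim)]
        padded = np.pad(out, pad, mode="edge")
        # "valid" convolution of each padded line returns the original length
        out = np.apply_along_axis(
            lambda line: np.convolve(line, kernel, mode="valid"), axis, padded
        )
    return out


def resample_axis_linear(volume, axis, old_spacing, new_spacing):
    """Linearly interpolate `volume` along `axis` from old_spacing to new_spacing (same units)."""
    volume = np.asarray(volume, dtype=float)
    n_old = volume.shape[axis]
    old_pos = np.arange(n_old) * old_spacing
    new_pos = np.arange(0.0, old_pos[-1] + 1e-9, new_spacing)
    lines = np.moveaxis(volume, axis, -1)   # put the resampled axis last
    flat = lines.reshape(-1, n_old)         # one row per 1D line of voxels
    resampled = np.stack([np.interp(new_pos, old_pos, row) for row in flat])
    resampled = resampled.reshape(lines.shape[:-1] + (new_pos.size,))
    return np.moveaxis(resampled, -1, axis)
```

With σ = 1.1 and a factor of 2, this gives a 7-tap kernel per axis. Applied along the slice axis of the 338-slice FIB-SEM stack, `resample_axis_linear(stack, 0, 256.0, 55.0)` would yield 1569 slices before cropping.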

Segmentation approaches
Standard unsupervised segmentation approaches include simple thresholding, whereby either a user selects one or more grayscale thresholds or a threshold is selected automatically by Otsu's algorithm [91], and watershed segmentation. The latter, a morphological segmentation approach, relies on a flooding algorithm of "catchment basins" defined by thresholding the gradient transform of the acquired tomogram [92]. The supervised segmentation procedures in this work were carried out in Ilastik [93] (University of Heidelberg) using a random forest classifier in the learning step, whereby voxel neighbourhoods are characterised by a set of non-linear features in 3D. Details of the subroutines within the ML segmentation procedure are given in Section 2.5.3.
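Otsu's algorithm, mentioned above as the automated alternative to a user-chosen threshold, picks the grayscale value that maximises the between-class variance of the histogram. A minimal NumPy sketch follows; the function name and the 256-bin default are illustrative choices, and libraries such as scikit-image provide an equivalent `threshold_otsu`.

```python
import numpy as np


def otsu_threshold(values: np.ndarray, n_bins: int = 256) -> float:
    """Return the bin-centre grayscale value maximising between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    p = hist.astype(float) / hist.sum()   # normalised histogram
    omega = np.cumsum(p)                  # probability of the lower class
    mu = np.cumsum(p * centres)           # cumulative first moment
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Empty classes (omega = 0 or 1) give nan/inf; treat them as zero variance
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)
    return float(centres[np.argmax(sigma_b2)])


# Synthetic bimodal "tomogram" histogram: dark and bright phases
rng = np.random.default_rng(0)
voxels = np.concatenate([rng.normal(50, 10, 5000), rng.normal(200, 10, 5000)])
t = otsu_threshold(voxels)
print(f"Otsu threshold: {t:.1f}")
```

Voxels above the returned value would be assigned to the brighter phase.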

Simple threshold segmentation
Simple thresholding was carried out manually in Avizo by assigning a discrete range of grayscale values to a label corresponding to the desired phase. The remaining grayscale values were assigned to the other phase. The choice of this threshold was left to the individual user, not to an algorithm. Segmentations were made at the user-preferred threshold (S2), as well as at 10 grayscale values below (S1) and 10 grayscale values above (S3), to explore the impact of a systematic error by eye.
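The S1/S2/S3 scheme can be expressed as three binary segmentations at the user's threshold and ±10 grayscale values. The sketch below assumes that the active material is the brighter (more attenuating) phase and that voxels at or above the threshold are assigned to it; labels 1 and 2 follow the convention used for image subtraction in this work. The function name is hypothetical.

```python
import numpy as np


def threshold_variants(volume, user_threshold, offset=10):
    """Binary segmentations at the user threshold (S2) and at -/+ offset (S1, S3).

    Assumption: active material is the brighter phase.
    Label 1 = active material (voxel >= threshold), 2 = pore and CBD.
    """
    variants = {}
    for name, t in (("S1", user_threshold - offset),
                    ("S2", user_threshold),
                    ("S3", user_threshold + offset)):
        variants[name] = np.where(np.asarray(volume) >= t, 1, 2).astype(np.uint8)
    return variants


vol = np.array([0, 95, 100, 105, 110, 200])
for name, seg in threshold_variants(vol, 100).items():
    print(name, "active voxels:", int((seg == 1).sum()))
```

A lower threshold (S1) can only add active-material voxels relative to S2, and a higher one (S3) can only remove them, so the spread between S1 and S3 bounds the effect of a ±10 grayscale misjudgement.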

Watershed segmentation
The watershed segmentation was carried out in Avizo; to apply the algorithm, seeding of the two phases (active and non-active material) was required. The seeds were generated by allocating labels to each phase using thresholding, while a region of the grayscale histogram was omitted and left for assignment by the watershed algorithm. The internal variability of this method was assessed by altering the extent of the omitted region when thresholding. Two watershed segmentations were therefore carried out. The first used a small window of omitted grayscale values (generous watershed, gWS), giving the watershed algorithm very few voxels to segment. The second used a larger region of omitted grayscale values (conservative watershed, cWS), giving the watershed algorithm a larger window in which to predict the labelling of voxels. The sizes of the windows in both were at the discretion of the user, and two approaches were taken to reduce the likelihood that a systematic under- or over-seeding would mask the efficacy of the watershed approach.
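To make the seeding logic concrete, the Python sketch below builds seeds from a lower and an upper grayscale bound (the omitted window lies between them) and then floods the unassigned voxels over the gradient magnitude, starting from the seeds. This is a simplified priority-flood variant of marker-based watershed written for illustration, not Avizo's implementation; the function names and the use of face-connected neighbours are assumptions. A generous watershed corresponds to a narrow low-high window, a conservative watershed to a wide one.

```python
import heapq

import numpy as np


def make_seeds(volume, low, high):
    """1 = active-material seed (>= high), 2 = non-active seed (<= low), 0 = omitted window."""
    seeds = np.zeros(volume.shape, dtype=np.uint8)
    seeds[volume >= high] = 1
    seeds[volume <= low] = 2
    return seeds


def gradient_magnitude(volume):
    grads = np.gradient(volume.astype(float))
    if volume.ndim == 1:  # np.gradient returns a bare array for 1D input
        grads = [grads]
    return np.sqrt(sum(g ** 2 for g in grads))


def seeded_watershed(volume, seeds):
    """Unlabelled voxels are claimed by whichever seed region reaches them first
    when neighbours are processed in order of increasing gradient magnitude."""
    grad = gradient_magnitude(volume)
    labels = seeds.copy()
    ndim = volume.ndim
    offsets = [tuple(step if a == axis else 0 for a in range(ndim))
               for axis in range(ndim) for step in (-1, 1)]
    heap, counter = [], 0  # counter breaks ties so index tuples are never compared
    for idx in zip(*np.nonzero(labels)):
        heapq.heappush(heap, (grad[idx], counter, idx))
        counter += 1
    while heap:
        _, _, idx = heapq.heappop(heap)
        for off in offsets:
            nb = tuple(i + o for i, o in zip(idx, off))
            inside = all(0 <= c < s for c, s in zip(nb, volume.shape))
            if inside and labels[nb] == 0:
                labels[nb] = labels[idx]
                heapq.heappush(heap, (grad[nb], counter, nb))
                counter += 1
    return labels


vol = np.array([[0, 0, 50, 200, 200]] * 3, dtype=float)
seg = seeded_watershed(vol, make_seeds(vol, low=10, high=150))
print(seg)
```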

Machine-learning segmentation
ML segmentation was carried out using the open-source software Ilastik [93]. To use the ML-based segmentation algorithm, a training dataset is required, which establishes a starting point for the algorithm and is based on a priori knowledge of the microstructure. It should be noted that, compared with ML techniques used, for example, for lattice constant prediction, the input data is restricted to a dimensionality of four (three spatial co-ordinates, x, y, and z, and the grayscale value of each voxel), thus reducing the computational complexity of model generation. To provide training data, the user has two choices: (a) use thresholded seeds to train the algorithm, or (b) input phases manually by 'drawing in' one or multiple slices of the tomogram. The ML segmentation procedures were first carried out using method (a), but subsequent analysis using method (b) almost always led to reduced variation in extracted phase fraction across users. During preliminary segmentations, the location and degree of coverage appeared to have a marked effect on the resulting segmentations, and it was also found that a central slice, rather than the first or last slice, should be favoured in the training datasets to avoid edge effects. Hereafter, only segmentations produced via method (b) are discussed, albeit with important differences between subroutines. It is also worth noting that a balance between user time and accuracy was pursued: larger training datasets for ML model generation may give rise to improved reproducibility, but at the cost of greater user time.
In this second method, all seeds were manually allocated within Ilastik based on users' inferences from selected 2D slices, with the location and extent of seeding as primary foci. Another notable parameter was whether or not 'iterative interaction' with pre-segmented data was used. Once a training dataset is established, the ML algorithm can be run and a segmented image produced. However, the ML algorithm can be trained further through interaction with the initial output overlaid on the raw data, improving the predicted segmentation by learning from previous attempts. This approach was found to reduce variability in almost all cases. This interim interactivity allows for model improvement by giving the user the ability to correct glaring segmentation errors before applying the generated model, for example, to further datasets. Therefore, in each subroutine, the middle slice was trained and a segmented image was produced. Further interaction was carried out on six more slices (equally spaced either side of the central slice), a balance between increasing segmentation fidelity and minimising laborious user input. The workflow for this procedure is shown in Fig. S1. Preliminary work showed a small reduction in variance between three and seven slices of user input, and although further reduction may lie beyond seven slices, this was deemed too time-consuming; analysis of further interaction is beyond the scope of this work. The level of coverage was varied: annotations covering 10%, 25%, 35%, 50%, and 65% of each of the seven slices were applied to the unfiltered NMC-2 and unfiltered GRA-2 datasets to ascertain which level minimised user variance and where the best compromise between variance and user effort lay. A summary of approaches is given in Table 2, where 'MLX##' represents the 'iterative interaction' approach with ##% slice coverage in the training step.
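The training layout described above (seven equally spaced slices including the central one, with a fixed fraction of each slice annotated) can be summarised numerically. The helper names below are hypothetical, and the spacing rule is an assumption chosen to reproduce the slice positions used for a 200-slice stack (25, 50, ..., 175); the sketch does not generate annotations, only the slice indices and the number of annotated pixels each coverage level implies.

```python
def training_slices(n_slices: int, n_train: int = 7) -> list[int]:
    """Equally spaced slice indices that include the centre and avoid the first/last slices.

    Assumption: spacing = n_slices // (n_train + 1), which reproduces
    25, 50, ..., 175 for a 200-slice stack.
    """
    step = n_slices // (n_train + 1)
    return [step * (k + 1) for k in range(n_train)]


def annotation_budget(slice_shape, coverages=(0.10, 0.25, 0.35, 0.50, 0.65)):
    """Pixels to annotate per slice for each 'MLX##' coverage level."""
    n_pixels = slice_shape[0] * slice_shape[1]
    return {f"MLX{round(c * 100)}": round(c * n_pixels) for c in coverages}


print(training_slices(200))
print(annotation_budget((200, 200)))
```

For a 200 × 200 slice, 10% coverage corresponds to about 4,000 annotated pixels per slice, or 28,000 over seven slices, which conveys why the lower coverages are attractive in terms of user time.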

Volume fraction analysis
Particle volume fraction is typically of significant interest for battery electrode materials, as it tends to dictate the total amount of lithium that can be stored in an electrode. Herein, we refer to this phase as the active material and place emphasis on its volume fraction. Volume fractions were extracted in Avizo by counting the voxels in each labelled phase and normalising by the total number of voxels in each image. Two complementary methods were used to assess the user variability of each segmentation approach: analysis of the variation in bulk volume fraction (standard deviation), and comparison of pairs of images from different users (image subtraction).
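Volume fraction by voxel counting reduces to a label histogram normalised by the image size. A minimal NumPy sketch, with a hypothetical function name, is shown below; it is not the Avizo module used in this work.

```python
import numpy as np


def volume_fractions(labels: np.ndarray) -> dict:
    """Fraction of voxels carrying each label value (e.g. 1 = active material)."""
    values, counts = np.unique(labels, return_counts=True)
    total = labels.size
    return {int(v): float(c) / total for v, c in zip(values, counts)}


seg = np.array([[1, 1, 2, 2],
                [1, 2, 2, 2]])
print(volume_fractions(seg))
```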

Standard deviation
Each of the three users (n = 3) produced a value for the active material volume fraction, and these values were used to calculate a standard deviation, $S_x$, using Equation (1):

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad (1)$$

where $x_i$ are the individual values and $\bar{x}$ is their arithmetic mean. This bulk standard deviation served as a proxy for segmentation accuracy in the absence of a base truth, acknowledging that significant variation between users renders a method unreliable, but little variation does not necessarily imply a high-fidelity segmentation, only a reproducible one.
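As a numerical check of Equation (1), the sketch below computes the standard deviation for three hypothetical volume-fraction values, assuming the sample (n − 1) form of the equation; these numbers are invented for illustration and are not results from this work. Python's `statistics.stdev` uses the same n − 1 convention, so the two calculations should agree (both give 0.02 here).

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation with an (n - 1) denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


fractions = [0.60, 0.62, 0.64]  # hypothetical values from users A, B and C
print(round(sample_std(fractions), 6), round(statistics.stdev(fractions), 6))
```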

Image subtraction
To compare the difference in segmentation between users, a MATLAB script was developed. In this script, the full TIFF stack of the segmented volume for user B is subtracted from that of user A. The voxels in each segmented stack have a value of either 1 (active material) or 2 (pore and CBD). After subtraction, a new stack is created containing the following values: −1 (active material for user A and pore/CBD for user B), 0 (same phase assigned), or 1 (pore/CBD for user A and active material for user B). Taking the absolute values of the new stack identifies the voxels where the two stacks differ; summing these values and dividing by the total number of voxels gives the percentage of voxels for which the two users' segmentations differ. This is repeated for the A-C and B-C pairs, and the differences for the three pairs are then averaged. Fig. 1 shows example slices from segmented data for the open-source and acquired data using the MLX25 approach, as well as the associated subtraction images.
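The subtraction procedure above was implemented as a MATLAB script; the sketch below re-expresses the same arithmetic in Python with NumPy for illustration and is not the authors' script. One practical detail: label stacks are often stored as unsigned 8-bit integers, so they are cast to a signed type before subtraction to avoid wrap-around. Function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def percent_difference(seg_a, seg_b):
    """Percentage of voxels labelled differently by two users.

    Labels: 1 = active material, 2 = pore and CBD, so A - B is -1, 0 or +1.
    """
    diff = np.asarray(seg_a).astype(np.int16) - np.asarray(seg_b).astype(np.int16)
    return 100.0 * np.abs(diff).sum() / diff.size


def mean_pairwise_difference(segmentations: dict) -> float:
    """Average percentage difference over all user pairs (A-B, A-C, B-C for three users)."""
    pairs = combinations(sorted(segmentations), 2)
    return float(np.mean([percent_difference(segmentations[a], segmentations[b])
                          for a, b in pairs]))


users = {
    "A": np.array([[1, 1], [2, 2]], dtype=np.uint8),
    "B": np.array([[1, 2], [2, 2]], dtype=np.uint8),
    "C": np.array([[2, 2], [2, 2]], dtype=np.uint8),
}
print(f"{mean_pairwise_difference(users):.2f}% of voxels differ on average")
```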

Open-source data
Despite a lack of access to the samples themselves, and therefore no opportunity for FIB-SEM tomographic investigation, the open-source data from NREL's Battery Microstructures Library [87] provided ample opportunity to compare unsupervised approaches (simple and watershed) with various ML subroutines (see Table 2) on data produced on lab-based X-ray CT instruments, some with significant levels of noise.Thus, the results in this section are for internal comparison, aiming to identify the route to the least variation across users as an imperfect proxy for segmentation accuracy.

Simple thresholding and watershed segmentation
A single central xy-orthoslice from each of the open-source data subvolumes and their filtered counterparts is shown in Fig. 2 a)-h), which displays the different morphologies inherent to the materials used in typical LIBs (metal oxide and graphite) and highlights the varying contrast between the active and non-active materials in each sample (see Fig. S1 for histograms). Moreover, it illustrates that the level of noise differs between samples, likely as a function of sample thickness as well as of inhomogeneities inherent to lab-based X-ray sources. From the histograms in Fig. S1, it is clear that the application of the Gaussian filter enhances the distinction between the two peaks (corresponding to active and inactive material) in each case, but that the improvement is not equal across all cases. In fact, the raw NMC-1 and GRA-1 data do not have clearly distinguishable peaks in their histograms, consistent with lower SNRs than those of NMC-2 and GRA-2. The visually noisier NMC-1 and GRA-1 data, shown in Fig. 2 a) and c), are most improved by the application of the filter (Fig. 2 e) and g) and Fig. S1).
Simple thresholding (S1, S2, and S3) and two watershed segmentation approaches (gWS and cWS) were applied to all eight datasets by three users; segmented central xy-orthoslices for NMC-1 and GRA-1 obtained by simple thresholding (S2) and by watershed segmentation (cWS) are shown in Fig. 2 i)-p). These images highlight the importance of applying a denoising filter to noisy raw data when segmenting by simple thresholding; otherwise, high-grayscale noise from the background is erroneously segmented as small NMC particles. These segmented 'speckles' can be removed by reviewing particle size distribution histograms after segmentation and selecting an appropriate cut-off size, but this introduces further user subjectivity and processing time. The S1 and S3 datasets showed similar variation to S2; only S2 is discussed hereafter. The impact of noise is less evident for the watershed segmentation datasets (Fig. 2 m)-p)), where conservative seed selection reduces the over-segmentation of high-grayscale noise in the first place.
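Speckle removal by a size cut-off, as described above, amounts to discarding connected active-material components smaller than a chosen voxel count. The pure-Python sketch below uses face connectivity and a breadth-first search; the function name, the connectivity choice, and the cut-off value are assumptions, and in practice a library labeller such as `scipy.ndimage.label` would be far faster on full tomograms. The choice of `min_voxels` is exactly the extra user-dependent decision noted in the text.

```python
from collections import deque

import numpy as np


def remove_small_particles(mask, min_voxels):
    """Set face-connected foreground components smaller than min_voxels to background."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros(mask.shape, dtype=bool)
    cleaned = mask.copy()
    ndim = mask.ndim
    offsets = [tuple(step if a == axis else 0 for a in range(ndim))
               for axis in range(ndim) for step in (-1, 1)]
    for start in zip(*np.nonzero(mask)):
        if visited[start]:
            continue
        # Breadth-first search collects one connected component
        visited[start] = True
        component, queue = [start], deque([start])
        while queue:
            idx = queue.popleft()
            for off in offsets:
                nb = tuple(i + o for i, o in zip(idx, off))
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not visited[nb]:
                    visited[nb] = True
                    component.append(nb)
                    queue.append(nb)
        if len(component) < min_voxels:
            for idx in component:
                cleaned[idx] = False
    return cleaned


m = np.zeros((5, 5), dtype=bool)
m[0, 0] = True        # single-voxel "speckle"
m[2:5, 2:5] = True    # 9-voxel particle
print(remove_small_particles(m, min_voxels=2).astype(int))
```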
There was significant variation between users when applying a single threshold value to segment any of the four unfiltered datasets, which is quantified as the standard deviation across the three users, as shown in Fig. 3.The individual data can be seen in Fig. S2 and Fig. S3.For the noisier NMC-1 case (Fig. 2 a)), the application of the filter reduces user variability for all segmentation approaches, but this is not the case for NMC-2, which is less noisy to begin with.This is replicated in GRA-2 but not GRA-1, illustrating that filtering data does not guarantee a segmentation less biased by the user.In fact, this highlights that user variability plays a significant role in determining the resultant volume fraction when applying either of these unsupervised approaches to data with low SNR.
It should be noted that low variability between users may still reflect a reproducibly erroneous segmentation, so close inspection of the segmented volume and comparison with the processed volume must also be factored in. Comparison with a "base truth" and the balance between reproducibility and accuracy will be explored further in Section 3.1.2.

Machine-learning segmentation
With the aim of reducing user variability in the extracted volume fraction, an ML approach to segmentation was refined using the open-source software Ilastik. The first-generation ML approach used thresholded seeds as training inputs (method (a) described in Section 2.5.3); it was applied to all samples by three users and then compared with manual training on selected samples (method (b) in Section 2.5.3). The results from method (a) are summarised in terms of standard deviation in Fig. S4 and given explicitly in Fig. S5. Regardless of whether they were applied to a single slice, multiple slices, or the full volume, thresholded seeds yielded a decrease in user variation for only half of the samples examined. As a result, method (b), using user-defined seeds, is followed hereafter.
A similar approach to ML1 was carried out, except that seeds were input by the user manually "painting" a single slice; the central slice was chosen to minimise edge effects. It was not time-effective for the user to interact with every slice manually. However, since the user variation was observed to be lower for ML2 (multiple slices) than for ML1 (one slice only), a multi-slice approach was adopted. Balancing improved accuracy against the time required of the user, a total of seven equally spaced slices, including the central slice, was chosen in the z-direction (e.g., for a 200-slice stack, slices 25, 50, 75, 100, 125, 150, and 175). It is important to note that variation between users may be reduced further by providing a larger sample for model generation (i.e., annotations on a greater number of slices), but there is a compromise to be made between accuracy and user time, given the manual nature of the training and the potential need to repeat this approach for any new dataset acquired with different parameters. Iterative training was used to improve segmentation fidelity: an initial segmentation was output after "painting" the central slice (slice 100), after which the other six slices were annotated to correct discrepancies between the raw data and the initial output. A total of five different levels of coverage were applied by three users on two samples, giving rise to 30 datasets in this second-generation ML approach. Each user added training annotations to the central slice (and subsequently to the other six slices), covering 10%, 25%, 35%, 50%, and 65% of the total pixel count of each slice. These approaches were applied to unfiltered and filtered versions of both NMC-2 and GRA-2, as a balance between user-interaction time and understanding how these supervised approaches work on different electrode microstructures. Fig. 4 a) shows the standard deviations in active material volume fraction for all five coverage values within this ML approach for all four samples. The variation was lowest, on average, for coverage values of 10% and 25%, with the variation in this metric across the three users shown in more detail in Fig. S6.
The standard deviation in active material volume fraction across users was <0.06 in all cases, whereas it was as high as 0.13 for the unsupervised analogues. There was no clear monotonic trend towards lower variation with increasing coverage, suggesting that the quality of the user-training input is more important than its quantity. For this metric, the coverage that gave the single lowest standard deviation across all four samples was 10% for unfiltered NMC-2; the same coverage gave the second-lowest variation for unfiltered GRA-2, after 35%. In the filtered cases, the trend was different, with the highest coverage (65%) giving a slightly lower standard deviation than the 10% coverage subroutine. This implies a likely interdependency between processing (filtering) and optimum user-training coverage. These results suggest that less user variation may be achieved by applying lower-coverage user training to unfiltered datasets with lower SNR, thus reducing user workload while optimising the segmented output, at least in terms of phase fraction for NMC cathodes and graphite anodes imaged by lab-based XCT. The implication is that ML segmentation may be most effective when applied to noisy data, which may result from unoptimised imaging parameters, relatively large sample sizes, or short scanning times used to capture dynamic processes or increase sample throughput.
Fig. 4 b) shows the average variation between users (mean of three image subtraction calculations) in terms of segmented voxels. As with the bulk standard deviation metric, the single lowest percentage difference across all four samples was obtained with 10% coverage for unfiltered NMC-2, with 3.5% of voxels differing between users on average. However, unlike the bulk metric, for which 65% coverage led to much greater variation, the average percentage difference in voxels at 65% coverage was approximately the same (also 3.5%). Conversely, 50% coverage, despite only a marginally increased standard deviation in phase fraction, presented a significantly higher percentage difference in voxels (10.1%), demonstrating that both metrics are required for robust analysis of segmentation variability. Four of the five ML approaches gave lower percentage differences than the unsupervised approaches (simple, 7.6%; watershed, 4.6%). Higher coverages might be expected to offer greater opportunity for erroneous classification in the training step, but the low variability at 65% coverage suggests that this is not necessarily the case. For filtered NMC-2, the trend in percentage difference mirrors that of standard deviation, and the absolute values are similarly higher than in the unfiltered case. This supports the hypothesis that this ML approach is most effective on unfiltered data with low SNR, and suggests that the procedure facilitates segmentation of high-throughput, short-acquisition-time scans.
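The voxel-level metric, the mean of three image subtractions, can be sketched as the average percentage of voxels labelled differently in each pair of user segmentations (A vs B, A vs C, B vs C). This is a hedged illustration: for binary active-material segmentations it equals the fraction of non-zero voxels in each absolute-difference image, while for multi-label volumes it counts any label disagreement. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def mean_pairwise_difference(segmentations: list[np.ndarray]) -> float:
    """Mean percentage of voxels labelled differently by each pair of users.

    For three users this averages the A-B, A-C and B-C comparisons, i.e. the
    non-zero voxels of each absolute-difference image |A - B|.
    """
    pair_diffs = [100.0 * np.mean(a != b) for a, b in combinations(segmentations, 2)]
    return float(np.mean(pair_diffs))


# Three toy segmentations that all have a volume fraction of 0.5,
# so their standard deviation in phase fraction is zero.
a = np.array([1, 1, 0, 0])
b = np.array([0, 0, 1, 1])
c = np.array([1, 1, 0, 0])
print(mean_pairwise_difference([a, b, c]))  # about 66.7
```

The toy case illustrates why both metrics are needed: identical phase fractions can hide large voxel-level disagreement.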
For the unfiltered and filtered GRA-2 samples, all ML approaches yielded lower percentage differences than the unsupervised approaches (see Table S1). Here, the coverage trend differed from that of the NMC-2 samples, with 35% coverage giving the lowest percentage difference (3.8%), suggesting that each microstructure may have its own optimal coverage. Although the lowest percentage differences were found for 35% and 65% coverage for the unfiltered and filtered GRA-2 samples, respectively, mirroring the standard deviation results, the lower-coverage approaches showed greater variation in voxel segmentation than their relatively low standard deviation values indicated.
In summary, the ML segmentations with thresholded seeds did not show a net improvement over the unsupervised segmentation approaches. However, with manually applied seeds, the multi-slice ML approaches on average yielded lower standard deviations and lower percentage differences in voxels than the simple thresholding or watershed segmentations. Overall, low coverages of 10–25% gave the most promising results, combining less user interaction with mostly lower variability, in particular when applied to unfiltered datasets with lower signal-to-noise ratio, such as unfiltered NMC-2 (see the comparison plots in Fig. 5). Since each training dataset comprised several slices from a single X-ray CT tomogram, the resultant models can only be expected to perform well on related tomograms acquired on the same instrument with similar acquisition parameters. An extended model generated by exposure to datasets acquired with different parameters on different instruments is beyond the scope of this work.
In the next section, these approaches (MLX10 and MLX25) are applied to in-house data, for which a "base truth" dataset was also collected by FIB-SEM slice-and-view tomography.

Establishing a baseline using FIB-SEM
To establish a baseline against which to compare the volume fractions extracted by different segmentation approaches from the newly acquired tomograms, higher-resolution FIB-SEM tomography was performed. The large difference in secondary electron yield between the active (NMC622 particles) and non-active constituents (CBD or pore) gave rise to high contrast, facilitating straightforward segmentation by a watershed approach. Fig. S7 a) shows a top-down scanning electron micrograph displaying the current collector and the NMC622 particles; Fig. S7 b) displays the milled U-shaped trench and both the top and face of the volume of interest; and Fig. S7 c) shows an ion-generated image that also gives contrast between the CBD and pore regions in the inactive areas, highlighted red and blue, respectively. A representative image from the slice-and-view tomography after processing is given in Fig. S7 d).
A volume rendering of the watershed segmentation of the FIB-SEM slice-and-view tomography of the NMC622 samples is shown in Fig. S8. For the FIB-SEM data, the voxels were anisotropic (55 nm × 55 nm × 256 nm), whereas the voxels of the acquired X-ray CT data were isotropic (126 nm × 126 nm × 126 nm). Even so, the volume per voxel in the FIB-SEM case was less than half that of the X-ray CT data (approximately 7.7 × 10⁵ nm³ versus 2.0 × 10⁶ nm³, a ratio of about 0.39), and the active particle surfaces were resolved more finely in the x-y plane. Owing to the higher resolution and improved contrast, internal pores were detected more often by FIB-SEM than by X-ray CT. For a fairer comparison between the datasets, the internal pores were virtually filled in the FIB-SEM data and the voxel size was resampled to 126 nm × 126 nm × 126 nm.
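The voxel-volume comparison and the resampling of the FIB-SEM labels onto the 126 nm grid can be sketched as below. The resampling scheme actually used is not specified in the text, so nearest-neighbour interpolation is an assumption here, chosen so that phase labels are never blended into non-existent intermediate values; the internal-pore filling step is not shown (one option would be `scipy.ndimage.binary_fill_holes` on the active-material mask). The constants and function names are hypothetical.

```python
import numpy as np

FIB_SEM_NM = (55.0, 55.0, 256.0)   # anisotropic FIB-SEM voxel (x, y, z)
XCT_NM = (126.0, 126.0, 126.0)      # isotropic lab X-ray CT voxel


def voxel_volume(dims_nm) -> float:
    """Volume of one voxel in nm^3."""
    x, y, z = dims_nm
    return x * y * z


def resample_nearest(labels: np.ndarray, src_nm, dst_nm) -> np.ndarray:
    """Nearest-neighbour resampling of a label volume onto a new voxel grid.

    Nearest-neighbour is used so that phase labels are never blended.
    """
    index_arrays = []
    for n_src, s, d in zip(labels.shape, src_nm, dst_nm):
        n_dst = int(round(n_src * s / d))
        # Centre of each output voxel, expressed in source-voxel index units.
        centres = (np.arange(n_dst) + 0.5) * d / s - 0.5
        index_arrays.append(np.clip(np.rint(centres).astype(int), 0, n_src - 1))
    return labels[np.ix_(*index_arrays)]


ratio = voxel_volume(FIB_SEM_NM) / voxel_volume(XCT_NM)
print(f"FIB-SEM / X-ray CT voxel volume: {ratio:.2f}")  # 0.39
```

Resampling shrinks the FIB-SEM stack in x and y (by 55/126) and stretches it in z (by 256/126), so a 126 × 126 × 63 voxel FIB-SEM block becomes 55 × 55 × 128 voxels at 126 nm.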
The volume fraction of active material (NMC622) was 50.8%, which can be used as a benchmark for comparison with the values extracted by the simple, watershed, and refined ML segmentation approaches applied to the acquired lab-based X-ray CT datasets. Although the NMC in the open-source data was NMC-532, this discrepancy is not expected to significantly affect the application of the refined ML segmentation approaches, because the X-ray attenuation of particles with a 5:3:2 Ni:Mn:Co ratio is very similar to that of particles with a 6:2:2 ratio, and the particle morphology is comparable.

Simple thresholding and watershed segmentation
A central xy-orthoslice from each newly acquired data subvolume, together with its filtered and segmented counterparts, is shown in Fig. 6 for both simple thresholding (S2) and watershed segmentation (cWS). Comparing NEX-V (voxel dimension = 371 nm) with NMC-1 (voxel dimension = 397 nm), data quality is clearly improved, possibly owing to smaller sample dimensions or optimised imaging parameters, resulting in a tomogram with higher SNR. Nonetheless, the same phenomenon is observed, whereby high-grayscale background noise is erroneously segmented as small NMC particles (Fig. 6 e)); this is eliminated by applying the Gaussian filter (Fig. 6 f)).
As shown in Fig. 7, the variation across users is lower for these newly acquired datasets than for the open-source datasets when using the unsupervised S2 and cWS methods. The raw tomograms of NEX-V and NEX-U are less noisy than all of the raw open-source tomograms, though applying the Gaussian filter still reduces user variability for the S2 approach. However, for cWS, filtering increases user variability in the lower-resolution case (NEX-V) but reduces it in the higher-resolution case (NEX-U), implying a complex interplay between voxel resolution, SNR, and segmentation approach.

Machine-learning segmentation
For the ML segmentation of the newly acquired data, the MLX10 and MLX25 approaches were applied to unfiltered datasets, since the greatest improvement of ML over traditional approaches was found for low-SNR datasets. The standard deviations for the simple, watershed, and both ML methods are shown in Fig. 8 a) for the unfiltered data (NEX-V and NEX-U). For NEX-V, the volume fraction variability was large for MLX10 (~0.04) but lowest for MLX25 (<0.01). For the higher-resolution NEX-U data, both ML approaches showed improvement over the unsupervised approaches, each with a standard deviation of <0.01. The higher-coverage ML method consistently outperformed the simple and watershed segmentations in terms of repeatability between users, although the variability of the unsupervised methods was lower than for the open-source datasets, likely due to improvements in imaging parameters between the acquisition of the two sample sets. The lower-coverage method (MLX10) showed 2–4× more variation than the unsupervised methods for NEX-V, but 2–3× less variation for NEX-U. This indicates that, at lower coverage, higher-resolution data are preferable for extracting volume fractions reproducibly, whereas for lower-resolution data, user variation can be reduced through greater coverage. In summary, user variability for both MLX10 and MLX25 depends on the noise level and resolution of the acquired data. For lower-resolution data, or data acquired with fewer projections or short exposure times, 25% coverage is advantageous over 10%; for higher-resolution data, 10% coverage is sufficient to reduce user variability, albeit with diminishing gains as SNR increases.
Fig. 8 b) shows the average percentage difference in voxels across the three users for the two datasets. In the lower-resolution NEX-V case, an interesting result emerges when comparing the simple and MLX25 results. Whereas MLX25 yields the lowest standard deviation in bulk phase fraction, the image subtraction analysis reveals greater variation than simple thresholding in the voxels segmented as active material, indicating that the user segmentations differ voxel by voxel yet result in similar phase fractions. This supports the need for comparison with the base truth. In the higher-resolution NEX-U case, the ML approaches yield the same or lower variability at the voxel level, but the order changes: MLX25 appears the least variable by phase fraction, whereas MLX10 results in segmentations with fewer voxel differences between the three users (10.3% versus 15.3%). The absolute values are also significantly higher in the higher-resolution case than in the lower-resolution case, which is thought to reflect a much greater number of voxels representing the boundary between active and non-active material. This observation indicates that the choice of segmentation method can appreciably affect absolute values of volume-specific surface area, even where a similar volume fraction is determined. Further details of how the unsupervised and supervised approaches compare in terms of average percentage differences in voxels are given in Table S2.
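One way to probe the explanation offered above, that voxel-level differences track the number of voxels on the active/non-active interface, would be to count such interface voxels in each segmentation. The sketch below is a hypothetical check and not a procedure used in this work: it assumes a face-neighbour (6-connected in 3D) definition of the interface and uses edge padding so that the outer faces of the subvolume are not counted as interface.

```python
import numpy as np


def interface_voxel_fraction(seg: np.ndarray) -> float:
    """Fraction of all voxels that are active material with >= 1 non-active face neighbour.

    Uses 6-connectivity in 3D (4 in 2D); edge padding means the volume
    boundary itself is not counted as an interface.
    """
    active = seg.astype(bool)
    padded = np.pad(active, 1, mode="edge")
    core = tuple(slice(1, -1) for _ in range(active.ndim))
    interface = np.zeros_like(active)
    for axis in range(active.ndim):
        for shift in (-1, 1):
            # Shifted copy: each core voxel sees its neighbour along this axis.
            neighbour = np.roll(padded, shift, axis=axis)[core]
            interface |= active & ~neighbour
    return float(interface.sum() / active.size)
```

Comparing this fraction between the NEX-V and NEX-U segmentations, alongside the pairwise voxel differences, would indicate how much of the disagreement is confined to particle boundaries.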
For the recently acquired samples, the extracted volume fractions could be compared with a "base truth" obtained by a complementary technique, FIB-SEM. The FIB-SEM volume fraction was compared with the average result from the three users, and the percentage difference between them was calculated and averaged over both length-scales (NEX-V and NEX-U). Across all users, the NEX-U data gave results closer to this value, as expected for higher-resolution imaging. Both ML methods performed better than the unsupervised approaches, with MLX25 within 1.5% of the base truth on average (2.4% for NEX-V and 0.6% for NEX-U) and MLX10 within 1.6% (3.1% for NEX-V and 0.1% for NEX-U). The traditional simple and watershed methods were 2.6% (4.1% and 1.0%) and 2.7% (4.9% and 0.5%) from the base truth, respectively. Despite the large user variation for MLX10 applied to NEX-V, the users' average value was closer to the base-truth value than those of the unsupervised methods. For this metric, ML is expected to provide a route to more reproducible segmentation, but voxel resolution is found to have a more dominant effect in the case of high-quality data.
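The comparison with the FIB-SEM value can be sketched as follows: average the three users' volume fractions, take the difference from the reference, then average that deviation over the two datasets. The text does not say whether "percentage difference" means an absolute difference in percentage points or a difference relative to the reference, so both conventions are offered through a `relative` flag; the function name is hypothetical.

```python
import numpy as np


def deviation_from_reference(user_fractions, reference: float, relative: bool = False) -> float:
    """Deviation of the users' mean volume fraction from a reference value.

    With relative=False the result is an absolute difference in the units of
    the inputs (percentage points if fractions are given in %); with
    relative=True it is expressed as a percentage of the reference. Which
    convention the study used is an assumption here.
    """
    diff = abs(float(np.mean(user_fractions)) - reference)
    return 100.0 * diff / reference if relative else diff


# Averaging the per-dataset deviations reported above for MLX25 (NEX-V, NEX-U):
print(round(float(np.mean([2.4, 0.6])), 1))  # 1.5
```

The same averaging reproduces the other summary values quoted above, e.g. (3.1 + 0.1) / 2 = 1.6 for MLX10.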
The deviation from the base truth highlights that each segmentation methodology must be considered individually. ML approaches appear to perform best for volume fraction calculations when applied to data with low SNR, perhaps because the user can iteratively refine the segmentation as it is carried out.

Conclusion
A comprehensive quantitative assessment of applying ML segmentation to extract active material volume fractions was carried out on open-source X-ray CT data and, from the insights gained, two ML approaches were applied to newly acquired X-ray CT data and compared with results from FIB-SEM tomography. Although applying a filter to noisy raw data reduces erroneous segmentation of small features, denoising the data in this way does not guarantee lower user variation in the extracted phase fraction when performing unsupervised segmentation. Variation between users on noisy data can be reduced by applying supervised ML segmentation, though the training annotations must be applied manually, not as thresholded seeds, to achieve this. These ML approaches were only appreciably effective when applied to unfiltered data, and approaches with training coverages of 10% and 25% were identified for application to the acquired data, giving low standard deviations and minimised user time. Application of these techniques to recently acquired data with higher signal-to-noise ratio indicated only minimally lower variation than unsupervised approaches, and only with greater coverage in the lower-resolution case. Importantly, of the simple, watershed, and ML segmentation approaches, the ML approaches yielded values closer, on average, to the "base truth" volume fraction, with as little as 0.1% deviation in the higher-resolution case. This work suggests that, for identifying the volume fraction of LIB electrode active material, an iterative ML segmentation approach involving manual training that covers 10–25% of each annotated slice, on ca. 3% of the total number of slices, yields a more reliable result than simple or watershed segmentation, but only for data with lower signal-to-noise ratio. The additional user effort, using accessible open-source software, is therefore most justified when processing noisier data, which may result from intentionally short scans, for example in time-lapse tomography experiments or high-throughput imaging. Moreover, greater coverage in ML segmentation was required when applied to coarser data to achieve lower variability than unsupervised techniques. These conclusions apply to pristine LIB electrodes, and further work would be required to explore the applicability of this methodology to cycled electrode materials.
Overall, it has been shown that a segmented image can exhibit significant variation amongst users and across the various segmentation methodologies demonstrated. Given that metric extraction and microstructure-level simulations rely on accurate representations of the electrode active material, it is imperative that segmentation procedures are carefully considered to minimise uncertainty. This may, for example, consist of repetitive segmentation by an individual or team (as

Fig. 1. Central 2D segmented orthoslices from each user from MLX25 segmentation of a)–c) open-source data (scale bar is 40 μm) and their associated absolute difference images: d) User A − User B; e) User A − User C; f) User B − User C; and g)–i) acquired data (scale bar is 20 μm) and their associated absolute difference images: j) User A − User B; k) User A − User C; l) User B − User C.

Fig. 3. Standard deviation plot across all simple and WS segmentation approaches for active material volume fraction.

Fig. 4. a) Standard deviation and b) image subtraction plots across second-generation machine-learning segmentation approaches for active material volume fraction.

Fig. 7. Standard deviation plot across simple (S2) and WS (cWS) segmentation approaches for active material volume fraction in NEX-V and NEX-U samples.

Fig. 8. a) Standard deviation and b) image subtraction plots comparing S2, cWS, MLX10, and MLX25 segmentation approaches for active material volume fraction in NEX-V and NEX-U samples.

Table 1
Provenance, type, and voxel dimension information for all X-ray CT data analysed in this work.
J.J.Bailey et al.

Table 2
Details of the various segmentation types and subroutines applied in this work.