Reduced-order hybrid modelling for powder compaction: Predicting density and classifying diametrical hardness



Abstract
Drawing on machine learning (ML) techniques and physics-based modelling, two feature-based reduced-order models are presented: one for the quantitative prediction of density and another for the classification of the diametrical hardness of pellets from a powder compaction process (pelleting). For interpretability, the models use as input only the parameters from a modified Drucker-Prager Cap (DPC) model calculated from process data monitoring and the applied maximal compression force. For quantitative density prediction, 8 features linked to first-principles models of powder compaction are generated, and the final model uses only 2. A critical part of the modelling, and also one of the main contributions, is a data augmentation step for the primary data set of this study by leveraging much smaller supplementary data sets that have measurements not present in the primary data set.
The final results imply a significant reduction in the quantity of data needed for model input and cut down the cost of data acquisition, storage, and computational time. Additionally provided is a detailed analysis of the impact and relevance of the generated features on the model performance.
The density prediction model, using only 2 features, reaches a mean absolute scaled error (MASE) of 12.9% and a mean absolute error (MAE) of 0.10 (where r² = 0.975). The scaled (diametrical) hardness classifier has an F1 score of 0.915 using 4 features.


Introduction
Powder compaction is a ubiquitous manufacturing process found in a number of industries, from pelleting metallic materials to anode and cathode coatings for batteries [1, Table 1], to producing pharmaceutical products, detergents, ceramics and many others. For example, the production of chemical catalysts relies on this process, and the catalyst industry was valued at $33.9 billion in 2019 [2]. This is expected to grow by an estimated 4.4% per year, indicating a significant economic incentive to improve the pelleting process. In addition to economic implications, controlling this process more effectively will reduce unsustainable waste and generate actionable insights for manufacturers in terms of quality control [3]. In the context of this work, two key properties of the pelleting process are essential to monitor: pellet density and hardness.
The pelleting process typically involves 3 main stages [4]: die filling, powder compaction, and ejection. The die fill refers to the process of inserting the powder into the pellet mould, and the filling method aims to spread the powder evenly inside the mould. Often this is achieved using a paddle or a vibration feeder. Compaction of the powder further rearranges the particles and then deforms the material under increased pressure to form a pellet. Finally, the ejection stage applies a force to the underside of the pellet to remove it from the mould. Errors in any of these stages propagate through the manufacturing process and can produce pellets that are not fit for purpose.
Previous studies have primarily focused on physics-based constitutive models, for example plasticity models such as the Drucker-Prager Cap (DPC) model to describe plastic deformation [5]. Broadly speaking, the goal of this manuscript is to shed light on the relationships between elements of the manufacturing process and the resulting density and/or hardness and to contextualise these relationships by drawing on established first principles models such as the DPC model [6,7].
There are two broad model paradigms in the literature: particulate models and continuum models. The discrete element method (DEM) views the powder as elastic-plastic spheres [8], and basic contact laws between particles allow the behaviour of the macroscopic pellet to be simulated [9]. However, this approach has its limitations when dealing with high deformation and multiple contacts [4]. An alternative is the finite element method (FEM), used to simulate the powder compaction behaviour with a pressure-dependent yielding plastic model (the modified Drucker-Prager Cap model). This method requires time-consuming mechanical testing for calibration, and even its lab-scale models have extensive computing requirements. Another major limitation is that each simulation must be validated, which is a significant challenge [6,7].
These observations motivate the need for a faster model for quality control, with real-time response. The solution to be explored comes from machine learning (ML) modelling and associated reduced-order models. ML modelling has already made significant contributions to compaction models in the pharmaceutical industry, addressing die filling performance [10], and relationships between input variables (such as process parameters and powder properties) and critical quality attributes (CQAs) [11][12][13]. Recent work has also shown that hybrid data-driven and theoretical approaches are of interest due to their interpretability. For example, the authors of [14] derived a hybrid modelling approach which took into account the throughput of the tableting process, as this was found to impact density and hardness in addition to the compaction pressure.
In contrast to the previous pharmaceutical studies, the work contained within our paper concerns inorganic catalyst material, which is generally harder than pharmaceutical powders. However, the processing equipment and the underpinning physics are sufficiently similar to draw useful inferences from the pharmaceutical literature. This can be seen, for example, when comparing [15], which shows that compaction force was the most influential factor in the study of a pharmaceutical product, and [3], which mirrors this finding for inorganic catalyst material.
The goal in [3] was to understand the impact of the pelleting process parameters on the density and strength of the resulting pellets. This was achieved by developing a partial least squares (PLS) regression model and employing a sequential feature selection process to reduce the model inputs to the predictive sections of the time-series profiles of the various process parameters (including punch force, speed and displacement). The model utilised latent features from the PLS, which makes tractable causality analysis difficult to carry out. From that study, an accurate quantitative prediction of pellet density was achieved, whereas the hardness prediction was less reliable. In this manuscript, the aim is to build on [3] (see also [16], which clarifies factual information regarding the machines used in [3]) by asking the question: can one improve the predictive capability of ML models by using physics-informed feature engineering? This approach offers the benefits of reducing the time-series data to a set of interpretable features with clear physical meaning, while avoiding the random element of data-driven feature selection methods that add complexity to model interpretation. For the hardness modelling, it is argued that a classification approach is more suitable than the quantitative prediction of [3]. This is a credible alternative given the poor reliability of the hardness measurements in the region of average hardness for the available data.
This work is structured as follows: Section 2 gives information about the data used in this study, and Section 3 provides details of physics-informed feature generation and the data augmentation procedure. The ML target problems and the associated modelling process are described in Section 4. Section 5 presents the results and their discussion.

Data description and data pre-processing steps
There are 3 data sets in this study: a primary data set and two supplementary data sets, where the primary data set was previously studied in [3,16]. These 3 data sets were collected from different experiments on the same catalytic material: the powders have the same size range and characteristics. Nonetheless, since each experiment is different, each data set contains different measurements (with overlaps). Table 1 summarises the data sets and the measurements contained in each.
This section provides a detailed description of the 3 data sets (see also [3,16, Sections 2.1 and 2.2]) and also outlines the data pre-processing, including outlier removal and scaling. Specific descriptions are given afterwards.

Materials, experimental design and data collected
The primary data set was generated by Johnson Matthey. The feed powder was a co-precipitated base metal plus inorganic support material catalyst precursor. This has not been calcined, so it comprises essentially classic low-temperature, high surface area crystalline morphologies of the oxides or hydrated oxides. The feed was pre-compacted using a roll compactor and screened for under- and over-sized fractions. Representative data for the particle size are shown in Table 2. It is noted that the size distribution is rather broad, but this is influenced by production economics: it is impractical in this case to require a homogeneous particle size distribution, as this would be too expensive for this industrial application; in particular, the low pass yield of the upstream roll compaction would result in prohibitive costs. The pelleting data were obtained using a STYL'One Evolution (Medelpharm SAS, France) compaction simulator [3,16]. The STYL'One Evolution is designed to simulate production-scale pelleting with an instrumented single punch. The machine is fitted with an array of sensors to record data during the pelleting process (see Table 1).
In addition, an instrumented die (a unit fitted within the STYL'One Evolution) was equipped with sensors to record the diametrical stress and axial stress (further used for the secondary data sets discussed below) against other variables during the compaction/pelleting process, such as time, punch displacement and pellet volumetric density (the method is described below and in [6]). The recorded data from the instrumented die were used to calibrate the solids' mechanical properties, such as cohesion, internal friction angle, Young's modulus and Poisson ratio, for developing the Drucker-Prager Cap model (DPC; detailed in Section 3). The pelleting process involves four main steps, in order: (a) filling of the die; (b) pre-compaction of the solids (rearranging the particles); (c) main compaction of the solids (forming the pellet); and (d) ejection of the pellet from the die. The measurements (described in Table 1) are recorded every 0.01 ms over pre-compaction, main compaction, and ejection.
The compaction simulator was used to produce 200 pellets (200 mg per pellet) for each of the six experimental runs, and each run used a different solids feeder set-up. Two different solids feeder systems were used: (a) a force feeder, which pushes solids over the die to allow it to fill; and (b) a vibration feeder, which vibrates so that solids fall into the die. The force feeder was used in both a left and a right orientation, which changes the angle of the blade that pushes the solids over the die. The feeders were operated at different speeds, and the vibration feeder at different vibration intensities. The pre-compression thickness was kept constant, the main compaction depth was set at 2.9 mm, and the machine speed was set at 25 cycles per minute. Furthermore, the SOTAX ST50 (SOTAX AG, Switzerland), a semi-automatic pellet hardness tester (diametric crush strength test), was used to measure four properties of the pellets: weight, diameter, thickness and hardness.
Although the experiment generated data for 1,200 pellets, the primary data set contains compaction measurements for only around 800 pellets; the measurements recorded are shown in Table 1.

Data pre-processing
Significant amounts of data processing are required for the primary data set. Initially, various summary statistics and non-numerical entries were present, which were identified and removed. Exploration of the data showed that pellets did not undergo compression at the same time: the stages in the process occurred over a range of times (see Figs. 1(a) and 1(b)). Plotting the time series profiles for all variables revealed that the lower punch force shows the pre-compression, main compression, and ejection stages most clearly. Therefore, the pellets are aligned based on the time at which this variable reaches its maximum, which corresponds to aligning pellets so that the main compression occurs at a pre-determined time point. This shift is applied to all variables, so that the main stages of the process happen at easily identifiable time points.
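As a sketch, this alignment step can be written as follows; the container `profiles`, the variable name `lower_punch_force` and the target index are illustrative placeholders, not names from the study.

```python
import numpy as np

def align_profiles(profiles, key="lower_punch_force", target_idx=5000):
    """Shift each pellet's time series so the peak of the lower punch
    force lands at a common, pre-determined time index.

    `profiles` is a hypothetical dict: pellet_id -> {variable: 1-D array}.
    The same shift is applied to every variable of a pellet, so all
    process stages stay mutually consistent after alignment.
    """
    aligned = {}
    for pellet_id, series in profiles.items():
        shift = target_idx - int(np.argmax(series[key]))
        aligned[pellet_id] = {
            name: np.roll(values, shift) for name, values in series.items()
        }
    return aligned
```

Note that `np.roll` wraps values around the array ends; on real profiles, zero-padding the shifted ends may be preferable.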
It is clear that there are large time intervals during which no compression or ejection occurs. For example, Fig. 1(b) shows that measurements recorded before 2300 ms or after 3000 ms do not contain useful information. Removing these data points reduces the size of the primary data set very significantly (and improves the performance of the models discussed in later sections). Additionally, it is possible to remove all measurements in the time intervals between compression and ejection events without negatively impacting the performance of the models. That is, when no compression or ejection event is occurring, the behaviour of the punches has no impact on the resulting density or hardness of the pellet, as the punches are no longer in contact with the material. Therefore, removing the measurements recorded during these intervals does not remove useful information and hence does not increase the prediction error. This method reduces the number of time points by more than 70%, from approximately 70,000 to fewer than 19,000.
A further step in the pre-processing operations for the data is to identify and remove outliers. Inspection of feature profiles after alignment allows clear outliers to be easily identified, but an initial linear regression on the variables indicated that further significant outliers were present. Therefore, assuming that the data points are normally distributed, all points outwith a 95% confidence interval are identified as outliers and removed.
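Under the stated normality assumption, the 95% confidence interval corresponds to a z-score threshold of 1.96; a minimal sketch (array layout is illustrative):

```python
import numpy as np

def remove_outliers(X, z_crit=1.96):
    """Drop rows where any column falls outside the central 95% of a
    fitted normal distribution, i.e. |z-score| > 1.96.

    X: 2-D array, rows = pellets, columns = variables (hypothetical
    layout). Returns the filtered rows and a boolean keep-mask.
    """
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    keep = (np.abs(z) <= z_crit).all(axis=1)
    return X[keep], keep
```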
Due to the method of data collection, it is highly likely that there are mismatches between the compaction data and the response variables. The volumetric density (in g/cm³) and hardness (in newtons, N) of the pellets were measured by collecting compacted pellets from the simulator and manually transferring them to a hardness tester. During the transfer, any change to the order of the compacted pellets results in a pellet being associated with density and hardness measurements that are not its own. To identify these, one observes that the maximum compression force is highly correlated with the density of the pellet, and thus an inspection of Figs. 1(c) and 1(d) shows clearly identifiable mismatches in the data, which are then removed. After removing the outliers, the primary data set contains 771 observations (see Table 1). Throughout, the density and hardness measurements are scaled to have zero mean and unit variance. Lastly, Fig. 1(d) evidences a scatter effect in the hardness measurements, and this influences the modelling choices. The issue is discussed in Section 4.3, including a literature perspective.
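One way to automate this visual screening is to fit a line to the force-density relation and flag large residuals; the threshold below is an illustrative choice, not the paper's procedure.

```python
import numpy as np

def flag_mismatches(f_max, density, n_sigma=3.0):
    """Flag pellets whose density is inconsistent with the strong linear
    relation between maximum compression force and density (cf. Fig. 1(c)).

    A simple least-squares line is fitted, and pellets with residuals
    beyond `n_sigma` standard deviations are flagged as likely
    force/density mismatches.
    """
    slope, intercept = np.polyfit(f_max, density, 1)
    residuals = density - (slope * f_max + intercept)
    return np.abs(residuals) > n_sigma * residuals.std()
```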

Secondary data sets
Two further smaller data sets were also provided by Johnson Matthey: a secondary compaction data set and a hardness data set. These data sets correspond to independent experiments, each with data from 140 pellets (after pre-processing), that were prepared using the same feed material as for the primary data set: the same catalytic material, with powders of the same size range and characteristics.
The secondary compaction data set contains largely the same compaction information as the primary data set (see Table 1) and underwent the same data pre-processing steps. There are critical differences, namely that the compaction data set: (a) contains an additional measurement of the radial pressure in the die during compaction (measured on a different machine); (b) has significantly fewer observations than the primary data set (140 pellets compared to 771); (c) contains 20 pellets for each of 7 pre-selected compression forces (hence averages can be computed); and (d) its pellets do not have any hardness or density measurement associated with them.
The other supplementary data set, the hardness data set, contains results of axial and diametrical hardness tests, and sufficient additional information to compute the volumetric density for each of its 140 pellets. Note that it contains the same number of pellets as the compaction data set, again 20 per pre-set compression force; of these 20, 10 pellets have an associated axial hardness measurement and the other 10 a diametrical hardness measurement. It is not possible to record both hardness measurements for each pellet due to the destructive nature of the test. Again, the hardness data set contains significantly fewer observations than the primary data set, and the two supplementary data sets are from different experiments. For emphasis, the hardness data set only contains hardness and (computed) density measurements, but in contrast to the primary data set, it contains both axial and diametrical hardness (see Table 1). Finally, like the compaction data set, there are multiple pellets with the same (relative) density, and thus averages may be used (see Fig. 4 below).
The experimental description of the hardness data set confirms that its pellets underwent the same type of compression force profiles as the other two data sets (but the profiles themselves are not provided). Importantly, the measurements in the hardness data set cannot be matched directly to the compaction measurements of the compaction data set; instead, it contains data about pellets of the same material as those in the other data sets, spanning a larger range of densities. Nonetheless, it is possible to approximately match the data in the hardness data set with the measurements in the compaction data set using the procedure detailed in Section 3.2, and it is these approximate pairings that are later used to enhance the primary data set. This procedure is a primary contribution of this work.

Further remarks on the secondary data sets
The compaction data set was collected using a gravity feeder in the Styl'One compaction simulator (see Section 2.1), with an instrumented die capable of measuring the radial pressure during compaction. The diameter of the pellets was 11.28 mm, and for each given tablet thickness (or compaction pressure) twenty pellets were produced. Seven pressure levels were explored, and no hardness or density information was recorded for these tablets.
The hardness data set contains only hardness measurements, again for seven distinct pressure levels. For each pressure level, twenty pellets are tested: 10 pellets are used to measure diametrical hardness, and the remaining 10 for axial hardness. This is because each of the tests is destructive, so both cannot be performed on the same pellet. Diametrical hardness was measured using the Brazilian test, which is standard practice in the pharmaceutical industry [17]. The test applies increasing compressive force across the diameter of the pellet to induce a fracture. The pressure at which the tablet fractures indicates the diametrical or tensile strength of the pellet. Similarly, the axial hardness is measured by exerting a compressive force at a 90 degree angle to the diameter of the pellet. Again, the pressure at which the fracture occurs can be used to compute the axial strength of the pellet. Both of these measurements are recorded in the hardness data set.

Modelling and methods: data augmentation
The goal of Section 3.1 and Section 3.2 is to use the two secondary data sets to augment the primary data set via the Drucker-Prager Cap (DPC) model. These dependencies are summarised in Table 3.
In this section, it is shown how to compute estimates of DPC parameters as functions of the relative density of the pellets. Here, the relative density is defined as the ratio of the actual density of the pellet to the maximum density of the pellet (measured when the maximum compression force is applied). Note that in Sections 4-6, the text refers to the scaled density of the primary data set, which is distinct from the relative density. The scaled density refers to the actual density scaled to have zero mean and unit variance, and this scaling is done using Python's StandardScaler. Scaling is also applied to the diametrical hardness of the primary data set.
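The scaling step uses scikit-learn's StandardScaler; a minimal sketch (the density values below are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative density values; the real targets come from the primary data set.
density = np.array([[1.95], [2.01], [2.10], [2.25], [2.31]])

scaler = StandardScaler()                 # zero mean, unit variance
scaled_density = scaler.fit_transform(density)

# The transform is invertible, so predictions made on the scaled target
# can be mapped back to physical units:
restored = scaler.inverse_transform(scaled_density)
```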

Feature generation: DPC model parameters in the secondary data sets
The Drucker-Prager Cap model is a plasticity model capable of representing the densification and hardening of a powder during compaction [18]. In this model, the powder is viewed as a continuum, and properties such as cohesion, internal friction, and material stresses are averaged over all particles; the validity of this representation is generally accepted [19]. The idea is to express the properties of the material, as the compaction progresses, as a function of a specified state variable; typically the chosen state variable is the relative density. Fig. 2 shows the components of a DPC model with the relevant parameters, and an in-depth discussion can be found in [18]. The two distinct segments, the cap surface and the shear failure surface, represent the two behaviours expected during powder compression. A third surface is introduced to smooth the transition between them, and hence is known as the transition surface. These surfaces are described in (A.1)-(A.3).
When the hydrostatic pressure is low, the model shows that the powder will follow a Mohr-Coulomb shear failure line: shear failure occurs when the stress state (p, q) lies on the Mohr-Coulomb line, where p is the hydrostatic pressure and q is the von Mises equivalent stress. Note that throughout this section, stress variables are measured in megapascals (MPa). When the hydrostatic pressure is sufficiently high, the behaviour of the material is instead described by the cap surface. This represents the hardening of the material as a result of compaction: if the state (p, q) lies on the cap surface, the powder undergoes densification and hardening, caused by plastic deformation in the material. The transition surface exists solely to smooth the transition between the shear failure and cap surfaces. Note that the DPC parameters, summarised in Table 3, are typically density-dependent and are determined as a function of the volumetric plastic strain; [20] corroborates that DPC model-based material properties for the compaction of dry granulated powders are density-dependent.
To determine the parameters of the DPC model, one begins by computing the stress and strain of the powder as it undergoes compaction. These can be inferred directly from the compaction data set and are then transformed into stress invariants as in (1)-(2) [22, Chapter 10]. The quantities entering (1)-(2) are the compression force exerted on the powder, the surface area of the pellet, and the relative density, all computed from the compaction data set.
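For context, a common form of these invariants for uniaxial die compaction, with axial stress obtained from the compression force over the pellet area and radial stress measured by the instrumented die, is shown below; these standard expressions may differ in detail from the paper's exact (1)-(2).

```latex
% Common stress invariants for uniaxial die compaction (context only;
% the paper's exact Eqs. (1)-(2) may differ in detail).
% sigma_z: axial stress; sigma_r: radial stress; F: compression force;
% A: pellet cross-sectional area; rho: density.
\sigma_z = \frac{F}{A}, \qquad
p = \frac{\sigma_z + 2\sigma_r}{3}, \qquad
q = \left|\sigma_z - \sigma_r\right|, \qquad
\tilde{\rho} = \frac{\rho}{\rho_{\max}}
```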

Shear failure surface parameters: cohesion and internal friction angle
Consider the DPC model shown in Fig. 2. The shear failure surface is determined by the cohesion and the internal angle of friction, which can be computed from the hardness data set via (3). In (3), the compression and tensile stresses are defined in (4), where the inputs are the diameter and the measured thickness of the pellet, together with the axial and diametrical hardness forces. Additionally, the internal friction angle is converted from radians to degrees. Fig. 4(a) illustrates the relationship between the relative density of a given pellet and the internal friction angle, and a similar computation is undertaken for the cohesion. These computations aim to determine both parameters as functions of the relative density; the approximate function for the friction angle is also shown in Fig. 4(a). One can then use these approximations to find estimates of the cohesion and friction angle for the pellets in the compaction data set by computing the relative density of each pellet and then evaluating the fitted functions at each of these values. These estimates are essential for the determination of the cap parameters in the following section.
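For context, the calibration formulae commonly used in the DPC literature to obtain the cohesion d and internal friction angle β from the diametrical (Brazilian) tensile strength and the axial compressive strength are reproduced below; these standard forms are a hedged stand-in for the paper's exact (3)-(4), and the symbols F_hd, F_ha, D and t denote the diametrical and axial crush forces and the pellet diameter and thickness.

```latex
% Standard DPC calibration formulae (context only; the paper's exact
% Eqs. (3)-(4) may differ). F_hd, F_ha: diametrical and axial crush
% forces; D: pellet diameter; t: pellet thickness.
\sigma_t = \frac{2 F_{hd}}{\pi D t}, \qquad
\sigma_c = \frac{4 F_{ha}}{\pi D^2},
\qquad
d = \frac{\left(\sqrt{13} - 2\right)\sigma_c\,\sigma_t}{\sigma_c - 2\sigma_t},
\qquad
\beta = \tan^{-1}\!\left(\frac{3\left(\sigma_c - d\right)}{\sigma_c}\right)
```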

Cap parameters
The cap surface is described by the cap parameters listed in Table 3. Again, following [18, Section 4] and as described above, one estimates their values for the compaction data set via fitted functions of the relative density determined from the hardness data set. Combining these with the shear surface parameters (cohesion and friction angle), the cap parameters fully determine the DPC model via the compaction data set. All cap parameters are computed by following the formulae in [22, Chapter 10] (see (A.7)) and, just as for the cohesion and friction angle, one produces estimated functions of the relative density for each of them.
Note that, since the parameters are density-dependent, increasing the compression force will change the shape and magnitude of the DPC surfaces. Fig. 4(b) shows the relationship between the Young's modulus and the relative density of the pellets in the compaction data set, and it is clear that there is a strong positive correlation. This provides good evidence that computing estimates of the DPC parameters for the primary data set as model inputs is likely to generate a well-performing predictor.

Young's modulus and Poisson ratio
Using (1)-(2), one is able to compute the Young's modulus and the Poisson ratio of the powder for the pellets in the compaction data set. This is done by following [18, Section 4]; see (A.4)-(A.5). Note that the results depend on the pellet's relative density. Therefore, compressing the powder at varying maximum compression forces generates a range of values for both the Young's modulus and the Poisson ratio for the same material, so that these properties can be expressed as functions of the relative density. Fig. 4(b) shows the true values of the Young's modulus computed from the compaction data set together with its polynomial estimate.
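The polynomial-estimate step can be sketched as follows; the polynomial degree is an illustrative choice, as the paper does not state the orders used.

```python
import numpy as np

def fit_parameter_curve(rel_density, param_values, degree=3):
    """Fit a polynomial expressing a DPC parameter (e.g. Young's modulus)
    as a function of the relative density.

    Returns a callable that evaluates the fitted curve at new relative
    densities.
    """
    coeffs = np.polyfit(rel_density, param_values, degree)
    return np.poly1d(coeffs)

# Hypothetical usage, with rho_tilde and youngs_modulus taken from the
# compaction data set:
#   E_of_rho = fit_parameter_curve(rho_tilde, youngs_modulus)
```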

Enhancing the primary data set
Using the procedure outlined in the previous section, it is straightforward to compute the cohesion and friction angle from the hardness data set. One can then use the corresponding estimated functions to find approximations of these two parameters for each pellet in the compaction data set. These parameters are in turn used to find the cap parameters for those pellets, for which one then produces polynomial estimates as discussed above. Together with the estimated polynomials for the Young's modulus and the Poisson ratio, one now has all seven DPC parameters expressed as functions of the relative density. Contingent on a justification provided below, one finds the DPC parameters for the primary data set simply by evaluating these functions at the relative density of its pellets.
Inspection of Fig. 5(a) reveals that the relative densities of the pellets in the primary data set range from 0.84 to 0.99. From this information one can retrospectively analyse the polynomial estimates and confirm that a good fit to the true data is indeed observed in this range. For example, in Fig. 5(b), the polynomial fits the true Poisson ratio well in this range, but has a steep negative gradient as the relative density decreases below 0.6. Although this is not likely to be a good fit for real data, given the physical interpretation of the Poisson ratio, this work is not concerned with the fit outside the range of interest determined by the primary data set.
At this point, one has generated, via extrapolation, estimates for all features displayed in Table 3 for the primary data set (with the exception of the transition surface parameter, which is arbitrarily chosen). These estimated features encode information from a significant proportion of the original features in the primary data set, which themselves have no established physical interpretation from a first-principles standpoint. This physics-informed augmentation procedure is motivated by an interest in contextualising the input features to the machine learning models developed next, so that the predictions can be clearly linked to first-principles models of powder compaction. The real gain of this methodology is that one is able to extract increased value from data that already exist, without the need for additional testing or data generation.
In conclusion, this section shows how to augment the primary data set by including the DPC parameters of Table 3. To highlight the gain of the augmentation procedure, note that in the next modelling step only the density, hardness, and maximum compression force of the primary data set, together with the 7 estimated DPC parameters (justified by the strong relation seen in Fig. 1(c)), are considered. All other force measurements of the primary data set (see Table 1) are discarded: this reduces the dimension of the feature space from more than 70,000 features to just 8.
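The augmentation step amounts to evaluating the seven fitted parameter curves at each pellet's relative density and appending the maximum compression force; a sketch, where `fitted_curves` is a hypothetical list of the seven fitted callables:

```python
import numpy as np

def build_dpc_features(rho_tilde, f_max, fitted_curves):
    """Augment the primary data set: evaluate the seven fitted
    DPC-parameter curves at each pellet's relative density and append
    the maximum compression force, giving an 8-column feature matrix.

    `fitted_curves` is a hypothetical list of seven callables, e.g. the
    np.poly1d objects produced when fitting each parameter against the
    relative density.
    """
    dpc_cols = [curve(rho_tilde) for curve in fitted_curves]
    return np.column_stack(dpc_cols + [f_max])
```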

Modelling and methods: Machine learning modelling
The next sections describe: (i) a feature selection mechanism (Section 4.1) used to ascertain which features highlighted in Table 3 are most relevant to the model, leading to parsimonious models; and (ii) the machine learning modelling carried out for the task of quantitative prediction of density (Section 4.2) and the task of classification of hardness (Section 4.3).
Our modelling gain: an alternative to the approach used in this paper for predicting density would be to inject the estimated DPC parameters into the DPC model itself and solve it for the density. Using ML and statistics instead gives further interpretability through feature importance loadings and the small number of parameters needed to predict/estimate density (model parsimony). Relying solely on the DPC model forfeits the ability to fine-tune the model scores on the train/test and fit/overfit balance, and, as is shown below, density can be estimated with a small number of parameters, much smaller than the 7 DPC parameters estimated in the previous section.

Feature selection process: mRMR
Effective feature selection has a number of benefits, including improved model interpretability [23]. In this work, Minimum Redundancy Maximum Relevance (mRMR) selection (introduced in [24]) is deployed. It chooses a subset of the original features by maximising the correlation with the target variable (relevance) while minimising shared information within the selected subset (redundancy). Specifically, the algorithm computes the mutual information between each of the original features and the target, and subtracts from this the mean of the pairwise mutual information between the features [25] (see (A.8)). The implementation used is that of [26]. Crucially, mRMR performs better than feature selection mechanisms based solely on Pearson's correlation.
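A greedy mRMR pass can be sketched as follows; this is a simplified illustration rather than the implementation of [26] used in the paper.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mrmr_select(X, y, k):
    """Greedy minimum-redundancy maximum-relevance selection.

    Relevance: mutual information between each feature and the target.
    Redundancy: mean mutual information between a candidate feature and
    the already-selected features. Score = relevance - redundancy.
    """
    n_features = X.shape[1]
    relevance = mutual_info_regression(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]   # start with most relevant
    while len(selected) < k:
        best_score, best_j = -np.inf, None
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean(
                [mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                 for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
    return selected
```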

Quantitative prediction model for density
A number of machine learning models for density prediction were developed and analysed, including classical linear regression (LR), partial least squares regression (PLS) [3,27], support vector machine regression (SVM) [28], and Gaussian process regression (GPR) [29,30]. All models performed similarly but GPR was able to reduce overall model variance and over-fitting, capture non-linear relationships in the data, and had the best performance (both in sample and out of sample) in terms of the metrics considered. For brevity, only GPR is discussed as it is not in the scope of this work to compare model performances.
Model description. GPR takes a Bayesian approach to predictive modelling and generates a probabilistic output; a prior is specified, which one then uses with the training data to define a posterior function. The GPR model then finds a probability distribution over all posteriors that fit the data, and predictions can be made using each of these. The mean of the predictions is taken as the point prediction for the GPR model, and one can also use the standard deviations to generate the prediction intervals of each prediction [31]. This provides a reliable uncertainty quantification element for every density prediction, which is a major advantage of this model.
Input features and prediction target. It is known that GPR performs poorly when too many input features (more than 20-30) are used [29]. Using only the DPC estimates and the maximum force F_max as features, a total of 8 features are used; this set is referred to as the Density Physics-Informed Features (D-PIF) set. Clearly, this set of features generates a model that is within the efficiency region of GPR with respect to the number of input variables. The feature selection mechanism of Section 4.1 reduces these further. The prediction target is the scaled density of the pellets in D_2.
Model performance metrics. Two metrics are considered: the mean absolute scaled error (MASE), given in (A.9), and the classical statistical coefficient of determination, r². The models are evaluated based on their scores on an independent testing set, and the metrics are also computed during model training to identify and correct overfitting.
Details on model training and tuning. The data are split into training and testing sets in a 70/30 ratio. The model error is estimated using 5-fold cross-validation [32, Section 7.10]. A grid search over GPR kernels is performed, and the kernel minimising the MASE of the model is implemented. This kernel is the sum of a Matérn kernel with a length scale of 1000 and a smoothing parameter of 1.5, and a white noise kernel with noise level 0.5.
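The training setup described above can be sketched as follows. This is a minimal illustration using scikit-learn with synthetic data (the paper's data set is proprietary); the variable names and the two stand-in features are assumptions, while the kernel, split ratio, and cross-validation settings follow the text.

```python
# Sketch of the GPR setup: 70/30 split, 5-fold cross-validation, and a
# Matern(length_scale=1000, nu=1.5) + WhiteKernel(noise_level=0.5) kernel.
# The synthetic data stand in for the 2 selected features (F_max and E).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

kernel = Matern(length_scale=1000.0, nu=1.5) + WhiteKernel(noise_level=0.5)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

cv_scores = cross_val_score(gpr, X_train, y_train, cv=5)  # 5-fold CV estimate
gpr.fit(X_train, y_train)
# The posterior mean is the point prediction; the standard deviation
# yields the per-pellet prediction intervals discussed in the text.
mean_pred, std_pred = gpr.predict(X_test, return_std=True)
```

The `return_std=True` output is what makes GPR attractive here: every density prediction carries its own uncertainty estimate.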

Classification model for diametrical hardness
The results from [3,16], the evident scatter in hardness measurements seen in Fig. 1(d), and initial experimentation showing poor MASE and r² metrics, indicate that the quantitative prediction of hardness is a significantly more challenging problem than the quantitative prediction of density.
A perspective on the high scatter in hardness data. The literature indicates that catalyst crush strength data are subject to significant scatter [33][34][35]. Variability in excess of 25% is commonly seen in published data sets, even where the pellets are taken from an ostensibly homogeneous sample. This is irrespective of the test method used, whether horizontal crush strength (as used for the data set analysed herein), cutting, or the three-point bend test [36]. The key origin of the variance is the random alignment of microcracks, which propagate during pellet fracture, with the tensile stress lines imposed during the test [37], and the data are commonly fitted using the fracture-theory-derived Weibull distribution [34,35,38,39]. ''Repeat'' tests from comparable pellets may, however, equally fit a Gaussian distribution in accordance with the noted random microcrack orientation [35,40,41]. Moreover, it has been shown that a large number of test samples is required to confidently fit the parameters of the distribution [38], implying that production-environment strength data will always suffer high scatter or a low signal-to-noise ratio, which is difficult to fit by regression without risk of over-fitting.
Modelling choices. In order for the hardness model to function as a quality control tool, it is thus not necessary to generate a point prediction of the hardness. A classification model is a natural option if one is interested only in knowing whether or not a pellet falls within an acceptable hardness range.
Input features and prediction target. The input data for the classification model are the 8 features used for the quantitative prediction of density discussed in Section 4.2, together with the predicted density generated by the GPR model. A total of 9 features are used initially; this collection is referred to as the Hardness Physics-Informed Features (H-PIF) set. The mRMR feature selection algorithm is also applied to these features.
Next, a region of acceptability must be established. Fig. 1(d) (also Fig. 9(c)) shows that the majority of pellets have a scaled hardness measurement of approximately −1 to 1, and in-field informed discussion indicated that this cluster would be a reasonable acceptability region for hardness in D_2.
Model description. As in Section 4.2, several classification methods were implemented and analysed. Only the Multilayer Perceptron (MLP) model is discussed here, as this achieved the best performance according to the relevant metrics (see below). The MLP model belongs to the class of (deep) neural network models [42]: it contains an input layer, through which the features are passed to the hidden layers. These hidden layers are composed of perceptrons, simple artificial neurons that take an input, calculate a weighted sum, apply an activation function, and produce an output [43]. The final layer of an MLP model is known as the output layer. The model is trained efficiently using backpropagation to compute the gradient of the error in the network [42], and iterates standard gradient descent to converge to a minimal loss.
Model performance metrics. Analyses of classification models require a choice of metric to evaluate against. By default, many classifiers use an accuracy score, but in practice alternative metrics can be significantly more insightful. One such metric, the F1 score, is often more informative as it balances the precision and the recall of the model, providing a more complete picture of the model performance [43]. This is the metric against which the MLP model is evaluated, and it is calculated as the harmonic mean of the precision and the recall:
F1 = 2 · (precision · recall) / (precision + recall). (5)
This metric gives a high score to classifiers that perform well in terms of both precision and recall [43]. This is useful since one would like to minimise both the false negatives and the false positives. False negatives refer to pellets that are classed as not acceptable when in fact they are fit for purpose; these should be minimised because the model will be used to flag problems in the compaction process, and a high rate of false negatives reduces trust in the model and may obscure genuine errors. On the other hand, false positives are pellets that are not acceptable, but the model determines that they are. These misclassified pellets would be at risk of fracturing during packaging or transportation, or may be too hard to dissolve or react when used. Both errors have significant impact, and the F1 score in Eq. (5) provides a good balance of the precision-recall trade-off.
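The computation in Eq. (5) can be made concrete with a short sketch. The label vectors below are illustrative, not from the paper's data:

```python
# Minimal sketch of the F1 score in Eq. (5): the harmonic mean of
# precision and recall, computed from confusion-matrix counts.
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)   # fraction of predicted-acceptable that are acceptable
    recall = tp / (tp + fn)      # fraction of acceptable pellets that are found
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 0, 0, 1, 0, 1]   # 1 = acceptable hardness
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(y_true, y_pred)    # precision = recall = 0.8 here, so F1 = 0.8
```

Because the harmonic mean is dominated by the smaller of the two terms, a classifier cannot achieve a high F1 by sacrificing either precision or recall.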
Details on model training and tuning. A number of parameters must be specified prior to training the MLP model. These choices are detailed in Table 4. Since a logistic activation function is used, the cross-entropy loss (also known as the logistic loss) function is minimised [44]. With this setup, the model is trained and outputs a binary classification of '1' for a pellet that falls in the acceptability region, and '0' otherwise.
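A hedged sketch of this classifier setup follows. The hidden-layer width and the synthetic data are assumptions for illustration (the paper's hyperparameters are in its Table 4); the logistic activation, single hidden layer, and cross-entropy loss follow the text, using scikit-learn, whose `MLPClassifier` minimises cross-entropy by default for classification.

```python
# Sketch of the MLP classifier: one hidden layer, logistic activation,
# trained by backpropagation on the cross-entropy (logistic) loss.
# Hidden-layer size (16) and the synthetic data are assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                  # stand-in for the 4 selected H-PIF features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = inside the acceptability region

clf = MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                    max_iter=2000, random_state=0)
clf.fit(X, y)
labels = clf.predict(X)  # binary output: '1' acceptable, '0' otherwise
```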

Results
As discussed throughout the previous sections, understanding feature interpretability and relevance is a primary motivation for this work. In particular, the aim is to identify the smallest possible subset of physically meaningful features that produces a well-performing, parsimonious model according to the specified metrics. For each problem, the feature selection and relevance are discussed first and the results of the model itself are then presented. Fig. 6(a) shows the Pearson correlation between the 8 features and both targets (density and hardness); this measure indicates how strongly related the features are in terms of their linear relationship, and suggests which may be most relevant for predicting a given target [45]. This correlation analysis is limited to linear relationships in the data; more complex nonlinear relationships are not highlighted here. One sees that the strongest positive correlation is between the density and the maximum compression force, F_max, with the strongest negative correlation occurring between the density and the Young's modulus, E. See Table 3 for feature symbols and descriptions.

Feature selection
A further notable finding from Fig. 6(a) is that the Poisson ratio, ν, has low correlation with the density. This suggests that this feature is the least relevant of the D-PIF set when building a predictive model for density. Section 4.1 describes how the correlation can quantify the relevance of each feature and weight this against the measured redundancy in order to select the best feature set. Fig. 6 displays the effect of feature selection on the model performance on the left (Fig. 6(b)) and the mRMR feature relevance of all D-PIF on the right (Fig. 6(c)).
For density prediction, selecting only 2 features improves both the MASE and the r² score for the GPR model. As suggested by Fig. 6(a), the most relevant features are the Young's modulus, E, and the maximum compression force, F_max. These are exactly the two features selected by mRMR, which indicates that there is minimal shared information (redundancy) between these variables. The selection of F_max is in line with the findings of [3,15] and the strong correlation highlighted by Fig. 1(c). This latter feature has a clear physical interpretation and is easily controlled during manufacture.
The Young's modulus is a pellet property that is known to cross-correlate with pellet porosity [46,47]. When the feed material is constant, as it is in this manuscript, the Young's modulus makes for a (mathematically) good ''soft sensor'', or marker, for density. Zhang et al. [47], in a univariate analysis of experimentally derived data, note additionally that the hydrostatic yield stress, the internal friction, and the cap eccentricity (related to the evolution pressure) all show a significant monotonic relationship with pellet density and will therefore be statistically confounded. This may explain why the feature importance ranking in this study indicates only minor discrimination between their influence (Fig. 6(c)).
Fig. 7(a) summarises the performance of several models for density prediction in terms of both MASE and r², trained on only the estimated DPC features (Table 3) and the maximum compression force. The results from the LR, PLS, and SVM models are displayed without detail and for reference only. It is clear that the GPR model is the best-performing model in terms of the MASE score on the independent testing set, achieving a 13.1% scaled error. Comparatively, the simple LR model has a MASE of 15.1%. The GPR parity plot is displayed in Fig. 7(b) and compares the density prediction to the actual measured density. The black line indicates 1:1 parity and provides a reference point for the quality of the predictions. That is, the closer to the line a point falls, the closer the predicted density is to the true value. Residual errors are shown in the histogram inset in Fig. 7(b); these errors are distributed approximately normally with zero mean, which is indicative of a good model fit. Fig. 7(c) shows the prediction intervals for GPR, which are smaller than those of the other models implemented. This indicates that the prediction generated by GPR has less uncertainty associated with it.
The relatively low level of uncertainty also promotes explainability in the results. Table 5 summarises the findings. It indicates that choosing the 2 features (F_max and E) suffices for improved model performance in terms of MASE. This is likely due to correlations and noise present in the other features, which can cause overfitting; removing the extra features improves the model's ability to generalise to testing data. The number of features appears to have no significant impact on the model's r² score. Finally, the last row of Table 5 is discussed in Remark 5.1.3.

Remark: using statistical features vs physics-informed features
The last row of Table 5 shows, for comparison purposes, the metrics when using only compaction data as input, where 'compaction data' refers to statistical features generated from only the original data in D_2 (see Appendix A.2). Table 5 demonstrates that one is able to derive physically relevant features from existing data and train a maximally explainable model, whilst also improving upon model performance.

Feature selection
The analysis for feature selection for the hardness classifier is similar to that presented in Section 5.1.1 for the density prediction task. Fig. 6(a) shows that, for these data, the D-PIF features are more significantly related to the density than to the hardness. This may relate to the observation that a quantitative hardness prediction is a more difficult problem than one for density [3], as the data do not contain a sufficient amount of information. The Poisson ratio, ν, is weakly correlated with the hardness compared to the other H-PIF features; the same was found when considering density correlations.
This observation is also borne out in Fig. 8(b), which illustrates that the Poisson ratio has minimal mRMR relevance, so it is clear that this feature is not informative for either prediction task. The F1 score for the classification model is not impacted significantly by reducing the number of features (see Fig. 8(a)). Therefore 4 features are chosen, reducing the feature space by roughly 55%. Again, F_max is selected. The selection of F_max and the Young's modulus, E, for the model is particularly interesting, as F_max is a quantity that is easily monitored and adjusted in situ while the Young's modulus is a pellet property.
From a theoretical point of view, given Griffith's fracture theory [46] and the eponymous equation, which shows that the fracture tensile stress scales with √E, it is not surprising that the Young's modulus demonstrates a strong influence. Evidently, though, the particle-particle powder friction and the onset of plastic deformation during compaction are able to capture this important relationship statistically just as well (see Fig. 8(b)).

Model performance
Training the MLP classifier yields an F1 score of 0.915 on the independent test set, which signifies that there are very few false negatives or false positives. This indicates that the model has very good predictive capabilities in terms of accurately classifying pellets according to their hardness (see Table 3 for feature descriptions).
It is known that machine learning models can be susceptible to overfitting [48], which is characterised by scores that are significantly better on the training data than on the validation set. This occurs when the model learns the training data too well, for example by learning the noise present in those data. Two ways to reduce the likelihood of overfitting are to reduce the complexity of the model and to increase the number of observations in the training set. A single hidden layer was chosen in Section 4.3 for simplicity, and Fig. 9(a) shows that overfitting does not appear to be a problem. However, one notes that using 242 training instances already produces a model that generalises well to the validation set, with a sufficiently narrow confidence interval on the F1 score. It is clear that additional training data have little impact on the performance of the classifier, so 242 pellets are selected randomly to train the MLP model. This promotes computational efficiency without compromising on performance.
Fig. 9(b) shows the confusion matrix for the trained MLP model. This illustrates the errors made by the model, and it is noteworthy that the most common error is a false positive. That is, when the model makes an error, it is more likely to classify an out-of-specification pellet as acceptable. The false positives identified in the validation set are shown in Fig. 9(c). This highlights the fact that the majority of false positives lie above the acceptability region, with a cluster of such errors falling just outwith the boundary.

Comparison with the literature
The main data set of this work, D_2, is also the data set D_2 from [3], although the DPC-derived features used here were not used there. In [3], the authors use Partial Least Squares (PLS) regression, a covariance-based regression model, to quantitatively predict scaled density and hardness. The inputs for the model are the complete force profiles of D_2 (see the first four lines in Table 1), and during model training the inputs are reduced via PLS's Variable Importance in Projection scores. Critically, the features generated by PLS are latent variables, which diminishes the understanding of why certain parts of the time-series are selected and how the selected features relate to rejected features. For the quantitative prediction of scaled density, their PLS model with 4 latent variables reports an MAE between 0.09 and 0.18, and r² = 0.86. Comparatively, the 2-feature GPR model in this work reports an MAE of 0.10 and r² = 0.975 (with a MASE of 12.9%). No comparison can be established between the hardness models, as here one deals with a classification problem while there it is a quantitative prediction problem.
The main advantage that the model here offers in terms of the quantitative prediction of density is a construction based on physics-informed features, ultimately motivated by an interest in contextualising the input features to the models. The models in [3] cannot provide this.

Conclusions
In this manuscript, two predictive models were presented, as well as a reproducible mechanism to enhance data sets by leveraging smaller ones from independent experiments. Regarding the models, the first is for the quantitative prediction of pellet density and the second is for the classification of pellet hardness as 'in-specification' or 'out-of-specification'. The analysis provided demonstrates that it is possible to derive physically relevant features from existing data and train a maximally explainable model, whilst also improving upon model performance; the selection of compaction forces and peak force is in line with either the DPC literature or early data-driven modelling [3,16]. The models developed improve significantly upon existing findings and provide insights into the relationships between compaction process parameters and pellet CQAs (density and hardness). This is a step towards optimisation of the pelleting process parameters to achieve reduced variability in pellet density and hardness. Critically, the models were built consistently with the relevant quality control metrics of high recall (minimising false negatives) and improved precision (minimising false positives). To the best of the authors' knowledge, these are the most competitive data-driven physics-informed reduced-order models in the current literature, as seen in their low errors, better explainability, and reduced amount of training data.
From an industrial perspective, the models developed here are in line with the trend of sustainable advanced manufacturing, aiming to capture complex material properties and mechanical processing requirements. These models deliver on the possibility of controlling the process from design to manufacturing for less well-studied materials, and thus provide savings in high-value materials, energy consumption, working risks, and other eco-environmental costs of practical trial and error. The mantra is the same as in [3]: ''Control of quality leads to improved economics and sustainability''.
From a practical perspective, the information provided by these models can be supplied to plant operators or embedded in an automated control system to guide and monitor the production process, ensuring on-specification outputs. From a holistic point of view, such models help overcome the challenges of scale-up and scale-down by acting as digital-twin component proxies for the multiple variables across the production scales. Left open is further research into the impact of the processes upstream of pelleting, such as roller compaction and granulation, and the influence of formulation and manufacturing process disturbances. These would help to bridge the gap between R&D and commercial-scale production.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The data used for this research is proprietary to Johnson Matthey, and remains confidential outwith. For commercial reasons, the specific type of catalyst cannot be disclosed including compositional, morphological, or intended reaction duty characteristics. However, we have added some details concerning the nature of the material.

Data availability
The data that has been used is confidential.

Additional information
This work has no supplementary information file.

A.1. Drucker-Prager cap equations
The equations for the DPC surfaces described in Section 3 are given in this Appendix; see Table 3 for the symbol definitions. Next are the formulae to calculate the Young's modulus and the Poisson ratio. These quantities require the calculation of the bulk modulus, K, and the shear modulus, G, which is done by solving the equations below. Recall that the axial stresses and strains are those at the points indicated in Fig. 3(a), and the hydrostatic pressures are those at the corresponding points in Fig. 3(b).
The Young's modulus, E, and the Poisson ratio, ν, are computed from K and G through the standard isotropic elasticity relations E = 9KG/(3K + G) and ν = (3K − 2G)/(2(3K + G)).
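A quick numeric sanity check of these conversions can be made by verifying the round trip back to K and G. The moduli values below are arbitrary illustrative numbers, not measurements from this work:

```python
# Check of the standard isotropic relations E = 9KG/(3K+G) and
# nu = (3K-2G)/(2(3K+G)); they invert K = E/(3(1-2nu)), G = E/(2(1+nu)).
def young_poisson_from_bulk_shear(K, G):
    E = 9 * K * G / (3 * K + G)
    nu = (3 * K - 2 * G) / (2 * (3 * K + G))
    return E, nu

E, nu = young_poisson_from_bulk_shear(K=100.0, G=40.0)  # illustrative moduli
```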

A.2. Machine learning elements
Feature generation: statistical features from compaction data. Observing the compaction data in D_2, it is clear that the force profiles of different pellets have varying skews, and that the gradient of the compression force at each time point is steeper when the maximum force recorded is larger. For these reasons, statistical features are generated from the curve profiles in D_2 (e.g., Fig. 1(b)) that correlate with the density variable. Table A.6 overviews the generated features. Table 5 contains the results of applying the ML model to the statistical features. In particular, the mRMR algorithm is applied to a set of features containing exactly the measurements from data set D_2 and the features in Table A.6. Note that the number in each feature name refers to the time at which the measurement is recorded (in ms), and that in this case maximum compression occurs at time 2595.1. That is, the punch force features selected appear to relate to post-compression relaxation.

Details on the mRMR feature selection algorithm
The mRMR algorithm utilises the F-statistic, F(⋅, ⋅), and the Pearson correlation, ρ(⋅, ⋅), to rank features according to their importance score, as shown [23]:
mRMR(f_i) = F(f_i, y) − (1/|S|) Σ_{f_s ∈ S} |ρ(f_i, f_s)|, (A.8)
where f_i for i ∈ {1, …, n} are the original features, y is the target, and S is the set of selected features. This version of mRMR is known as FCD and is implemented for continuous features.
To select features, one computes the score of each candidate feature using (A.8) and then adds the feature with the highest score to the set of selected features, S. This selected feature is removed from the candidate set. These steps are performed iteratively until |S| reaches the desired number of features, at which point the algorithm terminates and the output is the selected feature subset.
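The iteration above can be sketched in pure NumPy. This is an illustration of the FCD scoring in (A.8), not the implementation of [26] used in the paper; the univariate F-statistic formula and the synthetic data are assumptions for the sketch.

```python
# Sketch of FCD-mRMR: relevance is the univariate regression F-statistic
# of each feature against the target; redundancy is the mean absolute
# Pearson correlation with the already-selected features.
import numpy as np

def f_statistic(x, y):
    # Univariate F-test for regression: F = r^2 (n-2) / (1 - r^2).
    r = np.corrcoef(x, y)[0, 1]
    n = len(x)
    return (r ** 2) * (n - 2) / (1 - r ** 2)

def mrmr_fcd(X, y, k):
    n_features = X.shape[1]
    relevance = np.array([f_statistic(X[:, j], y) for j in range(n_features)])
    selected, candidates = [], list(range(n_features))
    while len(selected) < k:
        scores = []
        for j in candidates:
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected]) if selected else 0.0
            scores.append(relevance[j] - redundancy)   # (A.8): relevance minus redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=200)  # feature 1 duplicates feature 0
y = X[:, 0] + X[:, 2]
chosen = mrmr_fcd(X, y, 2)  # only informative, non-redundant features survive
```

Because feature 1 is nearly a copy of feature 0, its redundancy penalty keeps the pair from being selected together, which mirrors why mRMR picked F_max and E rather than two confounded DPC parameters.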

Performance metrics
The metrics considered in this study are the coefficient of determination, r², the mean absolute error (MAE), and the mean absolute scaled error (MASE). They are defined as follows [49, Section 3.4]:
MAE = (1/n) Σ_i |y_i − ŷ_i|, MASE = Σ_i |y_i − ŷ_i| / Σ_i |y_i − ȳ|, r² = 1 − Σ_i (y_i − ŷ_i)² / Σ_i (y_i − ȳ)², (A.9)
where y_i and ŷ_i are the actual and predicted values for sample i respectively, n is the number of samples, and ȳ is the mean of the actual values.
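These definitions translate directly into code. The MASE scaling by the error of the mean predictor is an assumption consistent with the appearance of ȳ in the definitions above:

```python
# Plain-Python sketch of the three metrics: MAE, MASE (scaled by the
# error of predicting the mean of the actual values), and r^2.
def mae(y, y_hat):
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def mase(y, y_hat):
    y_bar = sum(y) / len(y)
    return (sum(abs(a - p) for a, p in zip(y, y_hat))
            / sum(abs(a - y_bar) for a in y))

def r2(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

A perfect model gives MAE = 0, MASE = 0, and r² = 1; a model no better than predicting the mean gives MASE around 1 and r² around 0.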