Analyses of used engine oils via atomic spectroscopy – Influence of sample pre-treatment and machine learning for engine type classification and lifetime assessment

The analysis of used engine oils from industrial engines enables the study of engine wear and oil degradation in order to evaluate whether an oil change is necessary. As the matrix composition of an engine oil strongly depends on its intended application, meaningful diagnostic oil analyses pose considerable challenges. Owing to the broad spectrum of available oil matrices, we have evaluated the applicability of an internal standard and/or a preceding sample digestion for the elemental analysis of used engine oils via inductively coupled plasma optical emission spectroscopy (ICP OES). Elements originating from wear particles and from additives, as well as the influence of particle size, could be clearly recognized by their distinct digestion behaviour. While most wear elements can be determined precisely in the oily matrix, additives are preferably measured after sample digestion. Considering a dataset of physicochemical parameters and elemental compositions for several hundred used engine oils, we have further investigated the feasibility of predicting the identity and overall condition of an unknown combustion engine using the machine learning system XGBoost. Based purely on common oil check data, a maximum accuracy of 89.6% was achieved in predicting the engine type, together with a mean error of less than 10% of the observed timeframe in predicting the oil running time and of less than 4% for the total engine running time. Furthermore, obstacles to and possibilities for improving the performance of the machine learning models were analysed, and the factors that enabled the prediction were explored with SHapley Additive exPlanation (SHAP). Our results demonstrate that both the identification of an unknown engine and a lifetime assessment can be performed for a first estimation of the actual sample without requiring meticulous


Introduction
Lubricants are essential construction components of combustion engines and hence contribute to meeting the strict requirements regarding the emissions, performance and efficiency of modern engines. Their main tasks include reducing wear of the moving parts and transferring heat while maintaining their viscosity, thus contributing to a longer lifetime and reliable operation of the machine. However, they also need to protect the engine from corrosion [1] and ensure durability by resisting degradation [2,3], even under immense loads and extreme working conditions in modern high-output engines. Consequently, engine oils need to be subject to the same continuous development and oversight processes as the more prominent metallic construction parts.
Characteristically, engine oil can be taken as a universal indicator of the mechanical and thermal history of the machine it was used in [4,5] and thus provides valuable information about the condition of a usually sealed and inaccessible engine. Standard oil analytics include a wide range of parameters such as viscosity, total base number, particle size of debris, soot content and elemental composition [6,7]. For the latter, three main groups of elements are distinguished (see Fig. 1): additives (e.g. Ca, P, Zn) [8,9], wear elements (e.g. Al, Cu, Fe) and contaminants (e.g. K, Na, Si). While the amount of wear elements mainly characterizes the state of the moving, and therefore rubbing, parts, contaminants may indicate leaks in the air induction or engine cooling system. Additives are used to maintain and control the physicochemical parameters of the engine oil [10,11] and to improve its anti-oxidative properties [12]. Furthermore, they can provide a characteristic 'fingerprint' that allows specific types of oil to be distinguished spectroscopically [13,14].
Regular laboratory analyses of engine oils can point to engine malfunctions and make it possible to foresee and prevent severe damage. Thus, they may provide an early warning system for potential engine failures if the analytical results are interpreted properly [17]. This is particularly relevant for industrial engines, as they often operate stationary in the non-road sector, where oil exchanges are directly linked to machine downtime and additional costs. For certain engine operations under low load, an extension of oil exchange intervals is desired but requires the manufacturer's clearance, which can only be obtained by thorough oil analysis [15].
Complexity arises from the fact that the composition of an engine oil is highly specific to its intended use, considering e.g. the type of engine, site of operation and available fuel [18,19]. A direct comparison and assessment of different oil types is therefore challenging. In particular, difficulties arise from engines of varying age, which make it necessary to compare oils with partially degraded additives and differing levels of wear particles and contaminants. However, since many engine types are mechanically similar, it can be assumed that they exhibit a comparable yet distinct ageing behaviour [20] that might be extractable from the underlying spectroscopic information.
For a typical ICP OES measurement, the sampled engine oil is diluted with kerosene to obtain the desired viscosity before introducing it into the ICP torch. Alternatively, the oil sample can undergo a pre-treatment [27][28][29] and be analysed indirectly (see e.g. DIN 51460-1:2007-11), i.e. by incineration followed by an acid digestion and measurement in aqueous solution [30]. Additionally, for both cases, a suitable internal standard (IS) can be used to further compensate for losses during the sample pre-treatment or other influences which may lead to a lower accuracy.
When dealing with complex matrices such as engine oil, chemometrics and machine learning can be applied to find patterns or build classification and regression models. Unsupervised methods like principal component analysis (PCA) and hierarchical cluster analysis have already been used on spectroscopic data to study degradation compounds in engine oils [20,27,31] as well as to detect adulterated engine oils or classify oil service conditions [32,33]. Apart from that, supervised methods have previously been used to predict engine oil characteristics, e.g. the content of cheap engine oil in adulterated oils via partial least squares regression [32], thermophysical and physicochemical properties via support vector regression [34,35] or the total acid number of various engine oils using several machine learning algorithms [36,37]. Furthermore, artificial neural network models have been employed to predict physicochemical properties and oil ageing from infrared spectroscopy [38].
Although different machine learning algorithms may be suitable for specific analytical questions, the best performance on large datasets can be achieved by model ensembles that combine many learner variations. One such powerful technique is boosting, which sequentially adds classifiers, each focussing on the mistakes made by its predecessors [39]. Extreme gradient boosting (XGBoost) is a scalable boosting system based on a gradient tree boosting algorithm [40]; it employs tree ensembles and has been shown to give state-of-the-art results on various machine learning problems [41]. XGBoost has recently been used to predict engine oil pressure levels during normal operation [42].
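To make the boosting idea concrete, the following is a minimal sketch of gradient tree boosting for a squared-error regression loss with depth-1 trees (stumps), written in plain Python. It is not XGBoost, which additionally uses second-order gradient information, regularized leaf weights, deeper trees and many engineering optimizations [40]; all function names are hypothetical.

```python
from typing import List, Tuple

# A stump: (feature index, threshold, value if x[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Return the single split that fits the targets r best in the least-squares sense."""
    mean_r = sum(r) / len(r)
    best: Stump = (0, float("inf"), mean_r, mean_r)  # constant fallback (no split)
    best_sse = sum((v - mean_r) ** 2 for v in r)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    return best


def predict_stump(s: Stump, row: List[float]) -> float:
    j, t, lv, rv = s
    return lv if row[j] <= t else rv


def boost(X: List[List[float]], y: List[float], n_rounds: int = 50, learning_rate: float = 0.3):
    """Gradient boosting for the loss 0.5 * (y - f)^2; returns (base, learning_rate, stumps)."""
    base = sum(y) / len(y)  # initial constant prediction
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # The negative gradient of 0.5 * (y - f)^2 with respect to f is the residual y - f,
        # so each new stump is fitted to the errors left by the current ensemble.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [pi + learning_rate * predict_stump(s, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, row: List[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
    model = boost(X, y)
    print(round(predict(model, [1.0]), 3), round(predict(model, [4.0]), 3))
```

For this loss, fitting each new tree to the residuals is exactly what "focussing on the mistakes of the previous classifiers" means; the learning rate shrinks each correction so that the ensemble converges gradually instead of overfitting in a single step.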
Many machine learning algorithms produce models that are difficult to interpret. For this reason, SHapley Additive exPlanation (SHAP) can be applied to facilitate their interpretation, as this tool quantifies the relative importance of model features [43,44].
In this study, we provide a comprehensive comparison of four sample preparation configurations (oily, oily with IS, aqueous, aqueous with IS) by examining the analytical performance for various used engine oil samples from different engine types. By identifying the influences of sample pre-treatment, our findings aim at a higher validity of oil analyses in general; in addition, we evaluate the potential of machine learning to facilitate individual case decisions for lifespan estimation based on common oil analysis data. Ultimately, the goal of this study is to examine the feasibility of identifying the engine type and predicting both the oil running time and the total operating time of an engine by applying XGBoost to regular laboratory oil check data, and of using SHAP for a comprehensive assessment of the models.

Spectroscopic analysis using IS and sample digestion
For the comparison of the four sample pre-treatment methods, oil samples were taken from engine test stands. The test engines were L4, L6 and V8 common rail diesel engines (L = in-line, V = V-shaped engine) intended for industrial applications, which were run for up to 1300 h of operation. Fifteen representative oil samples from the beginning, middle and end of a common oil exchange interval were used. Oil samples were acquired according to DIN 51574:2017-04.
For the optional sample pre-treatment via incineration and wet acid digestion according to DIN 51460-2:2016-12, 10 g of used engine oil were heated on a rapid incinerator (Gestigkeit, Düsseldorf, Germany), ignited and allowed to burn down. After being kept in a muffle furnace (Nabertherm, Lilienthal, Germany) at 800 °C to remove residual soot, the cooled ash samples were dissolved in boiling hydrochloric acid (VWR, Radnor, PA, USA). If an internal standard was used, 1% (w/w) of a 1000 mg kg⁻¹ yttrium standard solution (Bernd Kraft, Duisburg, Germany) was added.
ICP OES analyses were carried out according to DIN 51399-1 using a SpectroBlue (Spectro Ametek, Kleve, Germany) in radial plasma observation, equipped with a standard Scott spray chamber and cross-flow nebulizer. Separate sample introduction systems were used for oily and digested samples. Operating conditions for the elemental quantifications are listed in Table 1.
(Figure caption fragment: … [15,16]. Values were obtained by averaging over all engine oil data used within the study.)
For calibration, certified oil-dissolved standard solutions (Conostan®, SPS Science, Baie D'Urfé, Canada) and aqueous standard solutions (Bernd Kraft, Duisburg, Germany) were used for oily and digested samples, respectively. Stock solutions and calibration ranges were selected according to DIN 51399-1.

Dataset pre-processing for engine type classification
In order to identify variables which contain the most essential information about the engine type, state of wear and oil degradation, additional common physicochemical oil analyses were performed. Kinematic viscosity at 40 °C and 100 °C (denoted "viscosity40" and "viscosity100") as well as the viscosity index (VI) were determined according to DIN EN ISO 3104:2017-11 and DIN ISO 2909:2004-08 using an HVU 490 viscometer (PAC GmbH, Lauda-Königshofen, Germany). The density was determined according to DIN EN ISO 12185:1997-11 using a DMA 4500 density meter (Anton Paar, Graz, Austria).
An anonymized dataset was provided by DEUTZ AG (Cologne, Germany), consisting of a broad variety of oil samples originating from both engine test stands and field trials. All engines were DEUTZ AG L3, L4 and L6 industrial engines, which were run for up to approx. 9000 h of operation. The engine types vary in fuel (gas according to DIN EN 589:2019-03, diesel according to DIN EN 590:2017-10), displacement and forced induction. All oil samples were acquired and analysed via ICP OES (oily without IS) as described in section 2.1.
As the original dataset included a broad variety of samples with different origins and partially limited oil analysis data, pre-processing was necessary. To enable machine learning, engine types with fewer than 30 samples were excluded from the original dataset. Additionally, variables that were not available for the majority of samples, as well as samples with missing features, were removed, leaving 713 samples with 21 features. A further 180 samples lacked only information about the running time; they were therefore included in the engine classification step but left out of the running time regression (see section 2.3.). In total, the samples represent 8 different engine types, as can be seen in Table 2.
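The filtering steps described above can be sketched as follows. This is a minimal sketch in Python rather than the R code actually used in the study; the record layout (one dict per sample, with None for missing values), the key names engine_type, oil_running_time and total_running_time, and the reading of "majority" as more than 50% of samples are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Dict[str, Optional[object]]  # hypothetical record: variable name -> value (None = missing)


def preprocess(
    samples: List[Sample],
    features: Sequence[str],
    min_class_size: int = 30,
    time_keys: Tuple[str, ...] = ("oil_running_time", "total_running_time"),
) -> Tuple[List[str], List[Sample], List[Sample]]:
    """Return (retained features, classification set, regression set)."""
    # 1) Exclude engine types with fewer than `min_class_size` samples.
    counts = Counter(s["engine_type"] for s in samples)
    kept = [s for s in samples if counts[s["engine_type"]] >= min_class_size]

    # 2) Keep only variables that are available for the majority (> 50 %) of samples.
    retained = [f for f in features
                if sum(s.get(f) is not None for s in kept) > len(kept) / 2]

    # 3) Remove samples with any missing retained feature -> classification set.
    classification = [s for s in kept if all(s.get(f) is not None for f in retained)]

    # 4) The regression set additionally requires both running times.
    regression = [s for s in classification
                  if all(s.get(k) is not None for k in time_keys)]
    return retained, classification, regression
```

Keeping the running times out of the feature list is what allows samples that lack only these values to stay in the classification set while being dropped from the regression set.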
All data analysis and machine learning were performed using R [45] with specialized packages [46][47][48]. Model interpretation within the SHAP framework was achieved using the Python SHAP package, accessed in R through the "reticulate" package [49].
The dataset was randomly divided into a training dataset containing 90% of the samples (803 samples) and a test dataset containing 10% (90 samples). The training dataset was used for hyperparameter optimization and the test dataset for a final validation. Hyperparameters were optimized by Bayesian optimization with 10 times repeated 10-fold cross validation (CV), minimizing the multi-class log loss. For the final evaluation, 100 times repeated 10-fold CV was performed on the training dataset and a 40 times repeated validation on the test dataset. All other calculations were based on 10 times repeated 10-fold CV.
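The evaluation scheme above, repeated k-fold cross validation scored with the multi-class log loss, can be sketched in plain Python as follows. This is an illustrative sketch, not the R implementation used in the study: it uses plain (non-stratified) shuffled folds and clips probabilities at a hypothetical eps, both of which are assumptions, and it omits the Bayesian optimization loop.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def repeated_kfold(n: int, k: int = 10, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for `repeats` shuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                 # new random partition for every repetition
        for fold in range(k):
            test = idx[fold::k]          # every k-th shuffled index -> near-equal folds
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test


def multiclass_log_loss(y_true: Sequence[int],
                        proba: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class (lower is better)."""
    total = 0.0
    for label, p in zip(y_true, proba):
        total -= math.log(min(max(p[label], eps), 1.0 - eps))
    return total / len(y_true)
```

With n = 803, k = 10 and repeats = 10 this yields 100 train/test splits, and within each repetition every sample appears in exactly one test fold. The log loss penalizes confident wrong class probabilities heavily, which makes it a stricter tuning target than plain accuracy.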

Lifetime regression with XGBoost
For regression, the dataset used for classification was reduced to samples from test blocks where the running time of the engine was measured and both the total running time and the running time since the last oil exchange (here denoted "oil running time") were available, leaving a total of 713 samples for the regression (Table 3).
Regression models based on the oil analytics were built with XGBoost for both the total running time and the oil running time. Evaluation followed the same procedure as for the engine type classification. For a visual representation of the regression model, an additional 10-fold CV was performed on the complete dataset, and the observed values were plotted against the predicted values as recommended [50].

Influence of sample preparation and IS usage on elemental determination
In order to compare and assess each sample preparation technique, the analytical performance of all calibrations has been investigated. A … to be a particularly difficult additive element to analyse in aqueous matrix, since its regression function exhibits low linearity and hence leads to analyses of low precision. The obtained values for LOD and LOQ are of the same order of magnitude as those reported in previous works [51][52][53], confirming that the chosen methodology does not limit the analyses of the datasets. Apart from that, almost all calibrations for both matrices could be fitted to linear functions with coefficients of determination R² > 0.999 and precision coefficients of less than 3%. It can therefore be concluded that, despite the lower LOD values for aqueous matrices, oily measurements still enable a sufficiently reliable quantification of all analytes that indicate substantial ageing of the respective engine.
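For readers who wish to reproduce such calibration figures of merit, the following sketch computes an ordinary least-squares calibration line with its R², the relative standard deviation of replicate signals, and LOD/LOQ values. The section does not state how LOD and LOQ were derived or how the precision coefficient was defined; the common 3σ/10σ criteria based on blank replicates and the use of the relative standard deviation are assumptions here, as are all function names.

```python
import statistics
from typing import Sequence, Tuple


def linear_calibration(conc: Sequence[float],
                       signal: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit signal = intercept + slope * conc; returns (slope, intercept, R^2)."""
    n = len(conc)
    mx, my = sum(conc) / n, sum(signal) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, signal))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(conc, signal))
    ss_tot = sum((y - my) ** 2 for y in signal)
    return slope, intercept, 1.0 - ss_res / ss_tot


def rsd_percent(replicates: Sequence[float]) -> float:
    """Relative standard deviation (%) of replicate measurements (assumed precision measure)."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def lod_loq(blank_signals: Sequence[float], slope: float) -> Tuple[float, float]:
    """Assumed 3-sigma / 10-sigma criteria on blank replicates, converted to content via the slope."""
    sd = statistics.stdev(blank_signals)
    return 3.0 * sd / slope, 10.0 * sd / slope
```

Because the LOD scales inversely with the calibration slope, a matrix with higher sensitivity (steeper slope) and quieter blanks yields lower detection limits, which is consistent with the lower LOD values reported for aqueous matrices.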
To enable a comparison of the analytical results for different oil samples $i$ and elements $j$ obtained with the four sample pre-treatment procedures $k$ under investigation (oily, oily with IS, aqueous, aqueous with IS), standard scores $z_{i,j,k}$ have been calculated according to equation (1):
$$ z_{i,j,k} = \frac{\omega_{i,j,k} - \mu_{i,j}}{\sigma_{i,j}} \qquad (1) $$
Herein, $\omega_{i,j,k}$ denotes the elemental content determined for a given configuration, expressed as a mass fraction, while $\mu_{i,j}$ and $\sigma_{i,j}$ represent the mean value and standard deviation of all four methods for a given sample and element, respectively.
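The standard score of equation (1) can be sketched for a single sample and element as follows. The method labels are hypothetical, and the section does not state whether σ is the sample or the population standard deviation, so both are offered; with only four methods the choice noticeably changes the scale of the scores.

```python
import statistics
from typing import Dict

# Hypothetical labels for the four pre-treatment configurations k.
METHODS = ("oily", "oily_IS", "aqueous", "aqueous_IS")


def standard_scores(contents: Dict[str, float], population_sd: bool = False) -> Dict[str, float]:
    """z_{i,j,k} = (omega_{i,j,k} - mu_{i,j}) / sigma_{i,j} for one sample i and element j.

    `contents` maps each method k to the determined mass fraction omega_{i,j,k}.
    """
    values = [contents[m] for m in METHODS]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values) if population_sd else statistics.stdev(values)
    return {m: (contents[m] - mu) / sigma for m in METHODS}


if __name__ == "__main__":
    # Made-up Ca contents in mg/kg, with aqueous results higher than oily ones.
    ca = {"oily": 1000.0, "oily_IS": 1000.0, "aqueous": 1200.0, "aqueous_IS": 1200.0}
    print(standard_scores(ca, population_sd=True))
```

Since the scores are centred per sample, they always sum to zero across the four methods; this is what makes patterns such as "+1 for aqueous, −1 for oily" comparable between samples with very different absolute contents.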
In order to explore how the four methods compare to one another depending on the respective analyte, the standard scores of each sample are depicted for all configurations individually (Figs. 2 and 3 for the most prominent elements, Figs. A.1 and A.2 in the appendix for further elements under investigation). For each element, the calculated values for all four methods were connected with lines to offer an …
Table 1. Operation conditions for ICP OES measurements of oily and aqueous samples in radial configuration. (Table body not recoverable; surviving column headers: Parameter, Oily.)
When comparing the score plots in Fig. 2, it can be noted that, in general, additive elements exhibit higher score values (and hence higher measured elemental contents) for aqueous determinations than for oily measurements. This is most obvious for typical additive elements such as Ca or Zn, where standard scores amount to +1 for the aqueous method compared to −1 for the oily method, nearly independent of the actual elemental content of the oil. For other additive elements, this is especially true if they are present in higher concentrations in the used engine oil. Since recovery rates for ashing methods range from 55% to 102% [54,55], this implies that the direct measurement of oily samples entails even higher losses for additive elements.
Characteristically, wear elements like iron tend to be underdetermined in oily samples, as they are typically present as solid particles in used engine oil and thus cannot be quantified correctly by direct injection [56]. For additives, however, this behaviour can be ascribed to the structural properties of the organometallic compounds, which allow them to adsorb onto soot and wear particle surfaces. As those particles cannot be completely atomized, this leads to an underdetermination of the additives. To partially account for these losses, yttrium can be used as an internal standard, since its physicochemical behaviour is comparable to that of many major additives.
Since Mg, B and Mo are part of additive compounds that are not used in every type of engine oil, the score plots show two distinct patterns, depending on the overall oil composition: engine oils which contain significant amounts of these additives exhibit the aforementioned trend concerning the different determination techniques (dark blue lines), whereas oils that are free of these elements show a differing behaviour due to higher deviations in the determination of trace amounts (bright yellow lines).
Sulfur is an exceptional analyte in that it cannot be accurately quantified after incineration and acid digestion. This is because sulfur forms volatile SO₂ gas during the pre-treatment process and is thus lost in a manner that yttrium as an internal standard cannot account for. This property also becomes apparent in the standard score plot, where both aqueous methods show significantly lower score values than the oily methods. Fig. 2 further shows that for Ca, Mg and Zn (additives with elemental contents exceeding 500 mg kg⁻¹), the use of an internal standard with an aqueous determination leads to lower values than a conventional external calibration. This is a consequence of calibrating with multi-element stock solutions, which are subject to substantial matrix effects. Due to the higher amount of energy-consuming substance introduced into the plasma, both self-absorption phenomena and excitation interferences occur, decreasing the sensitivity for the determination of a single element [57].
The amounts of wear elements present in used engine oils are much lower than those of additive elements, which can exceed 2000 mg kg⁻¹. As becomes apparent in Fig. 3, elemental contents of less than 20 mg kg⁻¹ are typically measured for wear elements, which places much higher demands on the analytical method.
Considering the score plots for typical wear elements such as Mn or V, it can be noted that the lowest score values are obtained with aqueous measurements due to losses during the pre-treatment process. However, adding an internal standard leads to significantly higher measured elemental contents, so that the score values match the results of oily methods or even surpass them. Furthermore, it can be observed that in general, oily determinations allow for a precise measurement of trace levels of wear elements, even if no internal standard was used.
Iron, as the most prominent wear element, behaves differently at high and low concentrations in used engine oils: while low amounts of Fe result in low score values with the aqueous determination and high score values with the oily method, the exact opposite is obtained for high amounts of iron. This appears to be a consequence of differing particle sizes, since chipped debris is not sufficiently atomized within the ICP torch, thus compromising the elemental quantification with oily methods [58,59]. As the lifetime of an engine oil progresses, an increasing amount of gradually larger Fe particles is abraded within the engine [60], causing an underdetermination of iron at higher mass fractions. The opposite can be observed for aluminium, suggesting that smaller aluminium debris is produced with increasing engine running time [61,62]. This could be due to differing mechanical properties of construction parts made of various friction-resistant alloys and coatings, e.g. the crank drive component materials.
Nonetheless, an internal standard should be used for digestion procedures.
As the selected sample preparation technique strongly affects the obtained results, the operator must consider the goal of the respective oil analysis. If the precise determination of additive elements is of highest importance, sample digestion is strongly advised. If wear elements are to be determined within the same analysis, the use of an internal standard is recommended. Although a sufficiently correct determination of most wear elements can be obtained by a simple, quick and direct oily measurement, iron levels suffer from severe deviations.
This peculiarity of wear metal determination could be considered a methodological disadvantage for a single used engine oil sample. Nevertheless, when oil analytics are carried out in order to determine oil exchange intervals or predict looming engine failures, only the consistency of the results is important, and minor errors in elemental content levels may be considered negligible.
Furthermore, this effect may even be regarded as a feature in a multivariate dataset, which benefits from a much larger sample size. For this reason, the most common and simplest type of elemental oil analysis (oily without IS) was chosen to access a large dataset of used engine oil analyses. Using this dataset and considering all method advantages and disadvantages, an intra-method evaluation of all engine oil samples can be obtained by a multivariate approach. This further enables the extraction of information concerning the engine type and condition by employing a time- and cost-efficient method.
Fig. 3. Standard score plots for four representative wear elements under investigation. In general, oily measurements enable an accurate determination of trace level wear elements, while acid digestion processes require an internal standard. For iron, the effect of debris particle size within the used engine oil becomes particularly apparent.

Engine type classification
In this section, we investigate the possibility of predicting the engine type based on the measured properties of engine oil samples. XGBoost was selected for this study as it achieves excellent accuracies, requires comparatively little computing time and comes with inherent mechanisms against overfitting [39,41].
With classification accuracies of 81.1% ± 1.5% in the test dataset and 83.7% ± 4.0% in the training dataset, the engine type classification proved to be applicable. As the mean accuracy of the test dataset lies within one standard deviation of the training accuracy, overfitting appears to have been successfully prevented. Since gas engines are underrepresented within the dataset, it was necessary to check the classification model for distortions. The classification sensitivity for gas engines amounted to 82.5%, which is close to the overall model accuracy. Moreover, regarding the specificity of this class, 97.1% of samples from diesel engines were correctly rejected as engine type 1, which demonstrates the robustness of the classification model.
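The per-class check described above amounts to a one-vs-rest evaluation of a single class against all others. A minimal sketch, assuming true and predicted labels are available as equal-length sequences (the function names and label encoding are hypothetical):

```python
from typing import Hashable, Sequence, Tuple


def one_vs_rest_rates(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                      positive: Hashable) -> Tuple[float, float]:
    """Sensitivity and specificity of one class (e.g. engine type 1) against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # share of class members recognized as such
    specificity = tn / (tn + fp)  # share of non-members correctly rejected
    return sensitivity, specificity


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Overall share of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Comparing the sensitivity of an underrepresented class with the overall accuracy, as done above for the gas engines, reveals whether the model simply ignores small classes; a high specificity additionally shows that other engines are rarely misassigned to that class.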
Considering the overall performance, improvements are still desirable. It can be expected that a larger number of samples will increase the accuracy and lower the relatively high standard deviation of the cross-validated accuracy. The former expectation rests on XGBoost being designed to perform best with extremely large datasets of several hundreds of thousands of samples [63][64][65].
In a first approach, the model accuracy is examined for consistency over the entire oil running time range. Fig. 4(a) shows to what extent the accuracy of the classification model depends on the oil running time of the engines included in the model dataset. Due to inhomogeneously spread sampling times, samples were binned with a constant number of samples per bin for easier interpretation. Apparently, engine classification accuracy strongly fluctuates with the oil running time. For engine oils with more than 20 operating hours, the mean accuracy improves dramatically to over 85%. The classification of samples with low oil running times, however, faces a problem: engines using the same fresh engine oil should theoretically not be distinguishable at the beginning of an oil exchange interval. Yet, there is still a basic classification capacity for samples with running times below 1 h. This is because, within the first operating hour after an oil exchange, the refill oil mixes with residues of the remaining used oil, enabling a minimum of discrimination. However, the classification of these samples merely attained a mean accuracy of 65.2%.
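Binning with a constant number of samples per bin (equal-frequency binning) and computing the accuracy within each bin, as described for Fig. 4(a), can be sketched as follows. How a remainder is distributed across bins is an assumption, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple


def equal_frequency_bins(values: Sequence[float], n_bins: int) -> List[List[int]]:
    """Split sample indices, sorted by value, into n_bins bins of (nearly) equal size."""
    if not 1 <= n_bins <= len(values):
        raise ValueError("n_bins must be between 1 and the number of samples")
    order = sorted(range(len(values)), key=lambda i: values[i])
    size, remainder = divmod(len(values), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        end = start + size + (1 if b < remainder else 0)  # first bins absorb the remainder
        bins.append(order[start:end])
        start = end
    return bins


def accuracy_per_bin(running_time: Sequence[float], correct: Sequence[bool],
                     n_bins: int) -> List[Tuple[float, float, float]]:
    """Return (lowest running time, highest running time, accuracy) for each bin."""
    result = []
    for idx in equal_frequency_bins(running_time, n_bins):
        acc = sum(correct[i] for i in idx) / len(idx)
        result.append((running_time[idx[0]], running_time[idx[-1]], acc))
    return result
```

Unlike fixed-width bins, equal-frequency bins give every accuracy value the same statistical weight, which matters when most samples cluster at short running times.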
Wear and degradation behaviour become more apparent and distinguishable with increasing running time. Thus, the classification performance should improve if oil samples from engines with oil running times below a selected lower threshold are omitted from the model. To verify this effect, XGBoost classification models based on the entire dataset (including both the former training and test datasets) with varying minimum oil running times were trained and evaluated, using the same hyperparameters as before. Whereas Fig. 4(a) refers to the unmodified model, Fig. 4(b) displays the overall model accuracy obtained for different lower threshold levels with the given dataset.
By removing oil samples with less than 1 h of operation, a jump in accuracy from 84.1% ± 0.1% to 85.8% ± 0.3% could already be achieved for the overall model, increasing steadily and plateauing at around 20-40 h at 87.4% ± 0.2%.
With more than 40 h as a lower threshold, the accuracy decreases as a result of too small class sizes. Since the highest accuracy is achieved for sample acquisition performed after 20 h of operation, this can be seen as a guide value for a lower threshold of oil running time if the goal of an oil analysis is the determination of the originating engine type.
Another obstacle for classification is the inconsistency of the samples included in the dataset. Engines from field trials used a variety of engine oils that differ substantially in their composition and physicochemical properties, introducing additional variance. In addition, difficulties arise from the inhomogeneous distribution of oil types within the dataset. While oil types A and B were found in 541 and 246 samples, respectively, the other 106 samples are spread over more than 10 different oil types. The effect on the classification performance becomes visible when all samples except those of oil types A and B are removed from the otherwise unrestricted dataset, which increases the accuracy to 86.0% ± 0.2%.
At this point, the limits of the engine type classification become clear. With more classes and more engine oils, the classification accuracy is expected to decrease. Conversely, limiting the dataset to fewer different engine oils improves the identification reliability. When using the two most strongly represented oil types and a minimum oil operating time of 20 h, a maximum accuracy of 89.6% ± 3.8% can be achieved.
So far, the classification model has not revealed the general relationships between the features that led to the generated predictions, which makes it impossible to assess the plausibility of the model. The SHAP framework provides a tool for understanding black-box machine learning models by calculating SHAP values for each sample and feature within the classification model. These SHAP values measure the relative importance of a feature with respect to the model output for each observation.
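To illustrate what a SHAP value is, the following sketch computes exact Shapley values by enumerating all feature coalitions for a toy model. It is a simplification under stated assumptions: features absent from a coalition are replaced by a single hypothetical baseline vector, whereas the SHAP package's tree explainer estimates expectations over training or background data and exploits the tree structure to avoid the exponential enumeration. The sketch is therefore only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to `baseline`."""
    m = len(x)

    def value(coalition: frozenset) -> float:
        # Model output when only the features in `coalition` take their observed values.
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = [0.0] * m
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for coalitions S not containing j
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi
```

By construction the values sum to f(x) − f(baseline) (local accuracy), so each SHAP value can be read as that feature's share of the deviation of one prediction from a reference; for an interaction such as z0 · z1, the joint effect is split equally between both features.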
To demonstrate exemplarily how the SHAP framework can be used for the interpretation of an engine type classification based on common oil analysis data, SHAP values for the training dataset were calculated based on the optimized XGBoost model. As this is a multi-class classification, a matrix of SHAP values (n samples × m features) is generated for each of the 8 engine type classes. Fig. 5 depicts the cumulated means of the absolute SHAP values for each feature. This parameter represents the overall importance of a feature for the classification of a certain engine type.
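The aggregation behind an importance plot of this kind can be sketched as follows: for each class, the absolute SHAP values are averaged over all samples, and the per-class means are then summed (stacked) per feature. The input layout (one n × m list of rows per class label), the feature names and the function names are assumptions; in practice the matrices would come from a SHAP explainer.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # n samples x m features


def mean_abs_shap(shap_by_class: Dict[Hashable, Matrix]) -> Dict[Hashable, List[float]]:
    """Mean |SHAP value| per feature, separately for each class."""
    result = {}
    for label, mat in shap_by_class.items():
        n, m = len(mat), len(mat[0])
        result[label] = [sum(abs(row[j]) for row in mat) / n for j in range(m)]
    return result


def cumulated_importance(per_class: Dict[Hashable, List[float]],
                         feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Sum the per-class means per feature and rank features by this total."""
    totals = [sum(vals[j] for vals in per_class.values()) for j in range(len(feature_names))]
    return sorted(zip(feature_names, totals), key=lambda t: t[1], reverse=True)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling out, so a feature that pushes some samples towards a class and others away from it still registers as important.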
Whilst the Si content is the most important feature overall, it can be noted that the class discrimination is highly specific for different engine type classes. For instance, considering the Cr content as the most important wear element feature, it is predominantly relevant for classification as engine type 1, followed by types 4 and 8, while it is of negligible importance for the identification of engine types 3, 5 and 6. This can be ascribed to the lower amount of chromium found in oil samples of the gas-operated engine type 1.
Regarding oil additives, Mg is the most important element, especially for the classification of engine type 5. This might be due to the use of a Mg-free engine oil that is distinctive for this particular engine type. In contrast, common additive elements like Ca and S are only of minor importance. Notably, silicon, a contaminant, stands out as the element with the highest impact on the engine type classification model. As for the physicochemical parameters, the oil density contributes less to the engine type classification than the viscosity values. Engine type 6 in particular can be identified by its distinct rheological behaviour.
Overall, wear elements possess the highest SHAP values, indicating that the intrinsic ageing fingerprint caused by engine wear is more relevant for the presented classification model than the use of oils with specific additives. With Si showing such a prominent yet undifferentiated importance for the model, there might be an additional engine-specific silicon source. This is supported by the fact that for engine type 6, both Al and Si are of increased importance, implying the use of materials characterized by the particular lack or presence of these two elements (e.g. AlSi pistons or Al-containing bearings). Engine type 1, on the other hand, is primarily identified by distinct Fe and Cr contents, indicating a peculiarity in the ageing behaviour of stainless-steel components.
The importance plot in Fig. 5 does not reveal how the features contribute to the classification model. However, this information can be extracted from the calculated matrices of SHAP values and depicted as a summary plot (see Fig. 6(a)). Herein, each data point represents a sample with its SHAP value plotted for each feature. Since we performed a multi-class classification, this plot can only display the SHAP values for one class at a time, shown here exemplarily for engine type 1. A positive value indicates that the corresponding feature contributes to classifying the sample as belonging to the selected engine type class. Conversely, a negative SHAP value argues against the sample being a member of the considered class. With a SHAP value close to zero, the feature is not deemed important for the classification of the particular oil sample. The dot colour displays the relative value of the respective feature.
Since elevated levels of Fe and Cr are related with negative SHAP values for the classification as engine type 1, it can be assumed that in this specific gas engine type, no stainless-steel components are exposed to significant wear. This hints at a low abrasion level caused by low engine load, a generally reduced power level or the avoidance of this material. Furthermore, a high content of zinc qualifies the oil sample as belonging to engine type 1, whereas low values contribute to the contrary. This implies that engine type 1 uses an engine oil with Zn-based friction modifiers [66], indicating an engine constructed for durability.
The SHAP dependence plot in Fig. 6(b) provides more detailed insight into the influence of a feature on the prediction of the classification model. Each data point represents one sample, with its SHAP value on the y-axis and the feature value (here, the measured Cr content) on the x-axis. Whereas the summary plot only revealed a general correlation, this plot examines the nature of this correlation.
Again taking chromium for engine type 1 as an example, it becomes obvious that a Cr content close to 0 mg kg⁻¹ is specific for the classification as engine type 1. All samples with higher Cr contents have negative SHAP values and hence count against membership in this class. However, low viscosity40 values dampen the effect of an elevated chromium content and lead to slightly less negative SHAP values (Fig. 6(b), green sphere).

Oil running time regression
In order to reasonably assess the condition of an unknown engine oil, it is essential to estimate its previous duration of operation. For this reason, the following section discusses the feasibility of regressing the oil running time from the dataset.
Including all eight engine types in one model, we obtained a mean absolute error (MAE) of 121 h for the training dataset, which is about 10% of the full range of observed oil running times (1250 h), and 117 h for the test dataset. The coefficients of determination were 0.658 ± 0.131 and 0.772 ± 0.026, respectively. Overfitting of the model was ruled out, as the root-mean-square errors (RMSE) of (192 ± 35) h for the training dataset and (173 ± 9) h for the test dataset are consistent within their margins of error. Considering only the underrepresented gas engine samples, all regression performance values (MAE = 125 h, RMSE = 193 h, R² = 0.660) lie within the error range of the overall model, indicating no distortion. Thus, our findings show that predicting the engine oil running time from engine oil parameters using XGBoost is generally possible.
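The performance figures above follow standard definitions. A minimal, dependency-free Python sketch of MAE, RMSE, R² and the MAE expressed as a fraction of the observed range is given below; the running times used to exercise it are invented for illustration and are not taken from the dataset.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the unit of the target (here: hours)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error; squaring weights large residuals more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_mae(y_true: Sequence[float], y_pred: Sequence[float], value_range: float) -> float:
    """MAE as a fraction of the observed running-time range (e.g. 1250 h)."""
    return mae(y_true, y_pred) / value_range
```

Because RMSE squares the residuals, it is always at least as large as the MAE and reacts more strongly to outliers, which is why both are reported; comparing training and test RMSE is the overfitting check used in the text.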
Nonetheless, the scatter plots in Fig. 7 reveal a rather high spread of the data points. For both the training and the test dataset, these graphs show the observed versus the predicted values of the oil running time regression model.
As the scatter increases for higher oil running times, the prediction accuracy decreases accordingly. This results from major inconsistencies within the dataset. Most notably, handling all eight engine types and multiple engine oils in the same model creates major difficulties for the oil running time prediction, as each engine type exhibits different wear characteristics. Furthermore, including samples from several oil exchange intervals poses challenges, since the engine wear and oil degradation behaviour also differ between intervals. This becomes most obvious in Fig. 7(a) (green sphere) for oil samples acquired immediately after an oil exchange, where the real oil running time is 0 h whilst the model predicts oil running times of 0 h to 250 h. As mentioned earlier, residual oil and wear particles from the previous oil exchange interval mix with the new engine oil and thus yield a practically unused engine oil that already shows signs of wear. While this circumstance was a benefit during engine type classification, it now impairs the regression model.
Taking these challenges into account, the prediction accuracy is deemed reasonable. A better regression model would require a larger and more consistent dataset; for the present study, an engine-specific regression model with XGBoost would not be beneficial due to the insufficient number of oil samples per engine type. While simpler chemometric algorithms might yield practicable results for a single engine type, we chose to focus on the selected machine learner to obtain a more robust and versatile model that can more easily handle engine data not included in the original dataset.
For the SHAP analysis of the regression model, only one matrix of SHAP values (n samples × m features) results, in contrast to the previously performed multi-class classification. In this case, the SHAP values are more intuitive to interpret: negative SHAP values contribute to a prediction of lower oil running times and vice versa. Fig. 8(a) gives the summary plot for each feature, with each dot representing one sample.
It can be noted that the oil density is considered the most important feature for the regression model and correlates positively with the SHAP value for the oil running time prediction. For many additive elements (e.g. P, B, S), a decrease in elemental content contributes to an estimation as fresh oil, as do elevated levels of certain wear elements (e.g. Fe, Ni). Lara et al. previously considered additive elements to be of major importance for oil running time estimation [27]; however, our results show that wear and abrasion elements are equally important. During long-term and high-load operation, volatile fractions of the oil can degrade or evaporate and impair the physicochemical properties. This leads to lower viscosity40 and higher density values, which correlate with high, positive SHAP values.
The influence of a feature on the output of the regression model is mostly non-linear, as observed for the most prominent wear indicator, iron, in the dependence plot (Fig. 8(b)). There is a positive correlation between the iron content and the SHAP value for iron contents below 30 mg kg⁻¹ and above 80 mg kg⁻¹, reflecting the anticipated increase of iron content with running time due to wear. However, between 30 and 80 mg kg⁻¹ (green sphere), the SHAP values remain relatively stable, between 0 and 50. This indicates that oil samples within this range cannot be substantially distinguished by their iron content, as it provides no sufficient discrimination in oil running time. Furthermore, the oil running time regression model also needs to be sensitive to the total running time of the engine, as the wear behaviour of an engine changes during its lifespan. For instance, iron wear abates with total running time; thus, while a low iron content in the first oil exchange interval may correspond to a low oil running time, the same value implies a higher oil running time in a later oil exchange interval. The dependence plot in Fig. 8(b) reveals that the XGBoost model is able to adjust to this behaviour, as can be gathered from the different colours representing the total running time. When comparing samples with the same iron content, the SHAP values are generally higher for samples with a high total running time than for samples with a lower total running time. In this way, the model accounts for the gradually decreasing wear in an engine.
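For a regression model, SHAP values are additive: a sample's prediction equals a base value plus the sum of its SHAP values over all features. The Python sketch below demonstrates this property by brute-force enumeration of exact Shapley values for a tiny toy model. It is a swapped-in illustration, not the method used here: it sets features outside a coalition to a single baseline input (so the base value is the model output at that baseline rather than the mean prediction), it scales exponentially with the number of features, and XGBoost models are normally explained with the much faster TreeSHAP algorithm instead.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to one baseline input.

    Features outside a coalition take their baseline value (a simplified,
    single-baseline value function; not TreeSHAP).
    """
    m = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return model(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for coalitions without i
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Toy "running time" model with an interaction term between two features.
    toy_model = lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[1]
    print(exact_shapley(toy_model, [1.0, 2.0], [0.0, 0.0]))  # → [3.0, 7.0]
```

The two values sum to 10, exactly the difference between the prediction for x (10) and for the baseline (0); the interaction term is split equally between both features.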
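The colour-coded dependence plot can also be summarized numerically. The Python sketch below bins a feature such as the iron content, splits the samples at the median of a second feature such as the total running time, and averages the SHAP values per bin and group. The function name, the median split and the bin edges are illustrative assumptions, not part of the original analysis, and the toy numbers merely mimic the qualitative pattern described above.

```python
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple

Bin = Tuple[float, float, float]  # (lower edge, upper edge, mean SHAP value)


def stratified_dependence(
    feature: Sequence[float],
    shap: Sequence[float],
    interaction: Sequence[float],
    bin_edges: Sequence[float],
) -> Dict[str, List[Bin]]:
    """Mean SHAP value per feature bin, split at the median of an interaction feature.

    Returns {"low": [...], "high": [...]}; bins without samples are skipped.
    """
    cut = median(interaction)
    groups: Dict[str, List[Bin]] = {"low": [], "high": []}
    for label in groups:
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            vals = [
                s
                for f, s, g in zip(feature, shap, interaction)
                if lo <= f < hi and ((g <= cut) if label == "low" else (g > cut))
            ]
            if vals:
                groups[label].append((lo, hi, mean(vals)))
    return groups
```

With hypothetical iron contents binned at 0, 30, 80 and 200 mg kg⁻¹, a higher mean SHAP value in the "high" group of each bin would correspond to the colour pattern described for Fig. 8(b).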

Total running time regression
Since engine oils are regularly exchanged after several hundred hours of operation, according to the manufacturer's guidelines or following an oil laboratory recommendation, regressing the total running time of an engine was presumed to be particularly challenging. Contrary to our expectations, however, the regression led to satisfying results: we obtained an MAE of 348 h for the 90% training dataset and 310 h for the 10% test dataset. The coefficients of determination were 0.735 ± 0.143 and 0.771 ± 0.028, respectively. The RMSE of (658 ± 170) h for the training dataset and (573 ± 29) h for the test dataset overlap within their margins of error, indicating that the model does not overfit. Considering only the underrepresented gas engine samples, all regression performance values (MAE = 353 h, RMSE = 655 h, R² = 0.774) lie within the error range of the overall model, again indicating no distortion. To visually assess the regression model, Fig. 9 shows the observed versus predicted values of a 10-fold cross-validation (CV), with the oil running time highlighted (Fig. 9(a)) and with the training and test dataset distinguished (Fig. 9(b)).
Both the engine oil and the total engine running time regression show comparable coefficients of determination, while the MAE of the total running time is around 3 times higher than that of the oil running time. This is a direct result of the oil exchanges impeding the total running time regression: each exchange virtually resets the physicochemical oil parameters and the level of additives, so the model has to cope with more complex interactions between all features for a precise running time estimation. Despite this major obstacle, the total running time of an engine can be estimated with a relative MAE of merely 4% within a range of 9000 h of operation. This order of magnitude is sufficient to verify the operational history of an engine and potentially identify erroneous maintenance documentation, regardless of how often the oil has been exchanged. Therefore, the regression model still enables a sufficiently conclusive evaluation of an unknown engine with respect to its lifespan.
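We assume here that the ± values quoted for the training and test metrics represent the spread across the folds of the 10-fold CV; the exact procedure is not specified in this section. Under that assumption, a minimal Python sketch of a shuffled k-fold split (each fold holding out about 10% of the samples) and a mean ± standard deviation summary could look as follows. `k_fold_indices` and `mean_pm_std` are hypothetical helper names; in practice, scikit-learn's `KFold` provides the same kind of splitting.

```python
import random
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and split them into k (train, test) pairs.

    With k = 10, each test fold holds ~10% of the samples and the matching
    training set the remaining ~90%; every sample is tested exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # round-robin assignment to folds
    splits = []
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((train, test))
    return splits


def mean_pm_std(scores: Sequence[float]) -> Tuple[float, float]:
    """Summarize per-fold scores as (mean, sample standard deviation)."""
    return mean(scores), stdev(scores)
```

A model would be trained on each `train` index list and scored (e.g. MAE, RMSE, R²) on the matching `test` list; `mean_pm_std` then condenses the ten per-fold scores into a value ± spread.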
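Using the training-set MAEs reported above and the observed ranges of 9000 h (total running time) and 1250 h (oil running time), the ratios behind the "around 3 times" and "4%" statements work out as follows:

```latex
\frac{\mathrm{MAE}_{\text{total}}}{\mathrm{MAE}_{\text{oil}}} = \frac{348\ \text{h}}{121\ \text{h}} \approx 2.9,
\qquad
\frac{\mathrm{MAE}_{\text{total}}}{9000\ \text{h}} = \frac{348\ \text{h}}{9000\ \text{h}} \approx 3.9\,\%,
\qquad
\frac{\mathrm{MAE}_{\text{oil}}}{1250\ \text{h}} = \frac{121\ \text{h}}{1250\ \text{h}} \approx 9.7\,\%
```

Although the absolute error of the total running time model is larger, it is spread over a range more than seven times wider, which is why its relative error is the smaller of the two.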
SHAP interpretation is even more important when the total running time is considered, since the relationship between the oil parameters and the total running time is not self-evident due to oil exchanges. In Fig. 10(a), the SHAP values for each feature are shown in a summary plot, while the dependence plot in Fig. 10(b) displays the SHAP values for copper as a function of the measured Cu content and the viscosity40 values for each sample.
The summary plot shows that, in contrast to the oil running time regression, copper is the most important feature for the engine lifetime assessment. A low copper content is more likely to be associated with an older engine, whereas high copper abrasion is a notable indicator of a younger engine. Moreover, increasing levels of Zn in the engine oil appear to be characteristic of older engines. Notably, Cr is of minor importance for the regression model, whereas it was essential for the engine type classification.
Considering the dependence plot (Fig. 10(b)), one can note that while Cu values below 2 mg kg⁻¹ are related to engines with longer estimated running times, the opposite is true for higher contents. This is expected, as engines undergo a running-in period during which Cu abrasion originating from bearing material decreases over time. Thus, with frequent oil exchanges, the Cu content in the engine oil gradually approaches 0 mg kg⁻¹. This behaviour is the key to predicting the total running time of an engine even though the engine oil is replaced regularly. However, the copper content cannot be considered a single, ideal estimator because its uncertainty increases with total running time: for higher total running times, no significant changes in Cu levels are observed, so a sensitive regression cannot be achieved.
The dependence plot further allows distinguishing between the two oil types A and B and indicates that the model considerably adapts its prediction of the total running time based on their respective viscosities. When comparing two oil samples with identical, very low Cu contents, a more viscous oil sample (dark blue/purple data points) is associated with a much younger engine than a less viscous oil sample (bright yellow/orange data points). The model achieves this differentiation by ascribing more extreme SHAP values to more viscous oil samples, which enables it to estimate the total running time more sensitively despite similar copper contents.

Conclusion
In the present work, we demonstrated the influence of sample preparation and the use of an internal standard on the determined elemental content. Measuring oily samples allows for a more precise determination of most wear elements, but leads to an inherent underestimation of additives. In contrast, elements that can be present as larger wear particles in used engine oil yield substantially higher values when determined from an aqueous solution. Oily measurements do not benefit from an internal standard to the same extent as aqueous samples, which justifies its omission. The sufficient performance of the commonly used oily measurement confirms its widespread use and raison d'être, owing to its unrivalled simplicity in everyday laboratory work. An automated implementation of sample digestion on a larger scale could potentially provide more complementary information.
Even with the simplest method for elemental analysis, accurate classification and regression models were built using XGBoost. Based solely on common oil check data, a maximum prediction accuracy of 89.6% was achieved for classifying engines of eight different types, while the oil running time and the total engine running time could be determined with a mean error of less than 10% and 4% of the observed timeframe, respectively. Although heavily deviating engine oils pose a difficulty for the classification, this approach may constitute a helpful first screening step before further in-depth investigation is performed. SHAP analyses have shown that, in general, wear elements contribute the most to the classification model, although every engine type is characterized by a specific feature pattern. Regarding the running time, wear was observed to decrease across successive oil exchange intervals. The oil running time regression has to cope with this behaviour, which ultimately makes the less affected oil density its most important feature. In contrast, copper proved to be the most important analyte for assessing the total running time of an unknown engine. Thus, by accounting for changes in wear behaviour, lifetime regression even across oil exchanges has been enabled.
These machine learning models provide a differentiated engine lifetime assessment tool that can be useful in product portfolio development. Knowing the wear behaviour of specific engine types, targeted optimization and development of construction parts can be advanced. Future studies could benefit from a larger dataset and the inclusion of more physicochemical parameters, allowing for an even more comprehensive insight into systematic oil ageing and the prediction of looming engine failures.
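For context on what is omitted in oily measurements: an internal standard is an element added at a known concentration to every sample and calibration solution, and each analyte result is corrected by the recovery of that element, compensating for matrix effects such as differing viscosity or nebulization efficiency. The Python sketch below shows only this basic ratio principle; the function name and numbers are hypothetical, and ICP OES software typically applies the ratio to emission intensities rather than to final concentrations.

```python
def internal_standard_correction(
    measured_analyte: float, is_measured: float, is_nominal: float
) -> float:
    """Correct an analyte result by the recovery of the internal standard (IS).

    recovery = measured IS / nominal IS; a suppressed IS signal (recovery < 1)
    scales the analyte result up by the same factor.
    """
    if is_measured <= 0 or is_nominal <= 0:
        raise ValueError("internal standard values must be positive")
    recovery = is_measured / is_nominal
    return measured_analyte / recovery
```

For example, if the internal standard is recovered at only 80% of its nominal value, a raw analyte result of 50 mg kg⁻¹ would be corrected to about 62.5 mg kg⁻¹.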

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Analytical parameters for ICP OES regression functions of all observed elements using an oily determination without an internal standard. For each element and analytical method, limits of detection (LOD) and quantification (LOQ) as well as the coefficient of determination (R²) are given and show satisfactory values. Additionally, the method precision is given as the coefficient of variation of the procedure (V_x0).

Analytical parameters for ICP OES regression functions of all observed elements using an aqueous determination without an internal standard. For each element and analytical method, limits of detection (LOD) and quantification (LOQ) as well as the coefficient of determination (R²) are given and show satisfactory values. Additionally, the method precision is given as the coefficient of variation of the procedure (V_x0).
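The supplementary tables described above report LOD, LOQ, R² and V_x0 for each calibration. One simplified way to derive these figures from a linear least-squares calibration is sketched below in Python. It assumes LOD = 3·s_x0 and LOQ = 10·s_x0, with the process standard deviation s_x0 = s_y/b (residual standard deviation divided by the slope) and V_x0 = s_x0 divided by the mean standard concentration; the convention actually behind the tables is not stated and may differ (for instance, the calibration method of DIN 32645).

```python
import math
from typing import Dict, Sequence


def calibration_figures(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Figures of merit for a linear calibration y = a + b * x (simplified conventions)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three calibration standards")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope (sensitivity)
    a = y_mean - b * x_mean            # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    s_y = math.sqrt(ss_res / (n - 2))  # residual standard deviation
    s_x0 = s_y / b                     # process standard deviation (concentration units)
    return {
        "slope": b,
        "intercept": a,
        "r2": 1.0 - ss_res / ss_tot,
        "lod": 3.0 * s_x0,             # assumed 3-sigma convention
        "loq": 10.0 * s_x0,            # assumed 10-sigma convention
        "vx0_percent": 100.0 * s_x0 / x_mean,
    }
```

With these assumed factors, the LOQ is always 10/3 times the LOD; a lower V_x0 indicates a more precise calibration relative to its working range.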