Predictive Models for the Characterization of Internal Defects in Additive Materials from Active Thermography Sequences Supported by Machine Learning Methods

The present article addresses the generation of predictive models that assess the thickness and length of internal defects in additive manufacturing materials. These models use data from the application of active transient thermography numerical simulation. In this manner, the proposed procedure is an ad-hoc hybrid method that integrates finite element simulation and machine learning models using different predictive feature sets and algorithms (i.e., linear regression, Gaussian process regression, support vector machines, multilayer perceptron, and random forest). The performance results for each model were statistically analyzed, evaluated, and compared in terms of predictive performance, processing time, and outlier sensitivity to facilitate the choice of a predictive method for obtaining the thickness and length of an internal defect from thermographic monitoring. The best model to predict defect thickness with six thermal features was interaction linear regression. The best model to predict defect length was Gaussian process regression. However, models such as support vector machines also offered significant advantages in terms of processing time and adequate performance for certain feature sets. In this way, the results showed that the predictive capability of some types of algorithms could allow for the detection and measurement of internal defects in materials produced by additive manufacturing using active thermography as a non-destructive test.


Introduction
Every industrial manufacturing process aims for the highest possible quality. Generally speaking, decreases in quality standards are linked to a wide range of defects that are inherent to manufacturing processes. These defects may be internal and may lead to the failure and collapse of structures, devices, or machines with additive-manufactured functional parts. Dealing with defects implies prior actions to detect them and to repair the parts in which they appear, or to discard those parts, especially if the repair costs exceed the manufacturing costs of new parts. Active thermography provides thermal features related to the internal defect geometry, which can be used as input for the prediction model. These models could serve to design intelligent, automated, and non-destructive inspection protocols for additive-manufactured parts using active thermography.
Performance results for several generated multiparameter models are statistically compared. In this way, a predictive technique based on the latest advances in ML is proposed for the estimation of the geometric parameters of internal defects, using the thermal features acquired with AT.

Materials and Methods
An ad-hoc hybrid strategy that integrates FEM and ML was designed to address this research. The two phases are described in the workflow outlined in Figure 1.

Numerical Model Design
Additive manufacturing (AM) procedures use different techniques for material deposition. Depending on these techniques, pores, which can be confused with small defects, may appear, causing variations in the thermomechanical behavior at different points and in different working directions. This problem was studied in laminated object manufacturing (LOM) [27] and fused filament fabrication (FFF) [28]. However, addressing this issue would imply a study of material properties after deposition at the mesostructural level and the preparation of a FEM model capable of simulating it. Although the study of this problem could be extremely interesting, it would enormously complicate this work. Thus, in this study, a simpler approach was carried out in which the macrostructural thermomechanical behavior of the material was considered continuous and isotropic. This approach was previously considered in several works [7,12,14].

Sensors 2020, 20, 3982
A FEM model was designed to study the effect that an internal defect (e.g., a hole-like cavity) provokes on the heat flux and the temperature distribution. The geometry and the principal dimensions of the FEM model proposed in this work can be seen in Figure 2b. In addition, four points, P1-P2 and P3-P4, were located on the upper and lower surfaces, respectively, in order to study the evolution of temperatures through time. P1 and P3 are close to the defect, while P2 and P4 are far from it. The distance between P1-P2 and P3-P4 is 0.025 mm. The comparison between these points allows us to see the effect of the defect together with its superficial temperature distribution, using different thermal loads applied to the model. Considering that the model was prepared with a small thickness, it was possible to study the effect of the defect on the upper surface (reflection case) and lower surface (transmission case). The reflection case studies the temperature trace on the upper surface, where the heat excitation is applied. For its part, the transmission case studies the temperature trace on the lower surface, i.e., on the side opposite to where the heat excitation is applied. This model was used to study the effect of the principal thermal properties (i.e., conductivity, specific heat, density, film coefficient, and emissivity coefficient) on both surfaces. The properties of the material were those corresponding to the polymeric material Nylon PA-12, a widely used material in 3D printing. All these material properties, the geometry, and the heat process were proposed following [7] and can be seen in Table 1.
The model was subjected to a heating process (heating-step) on its upper face from 24 °C to 120 °C through a linear ramp for 20 s. Once the highest temperature was reached, the heat source moved away and the model started to exchange heat with the external environment through convection and radiation heat transfer processes (cooling-step). The studied values of this interaction can be seen in Table 1.
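The thermal load just described can be sketched as a simple function of time. This is only an illustrative approximation, not the FEM boundary condition itself: the heating ramp (24 °C to 120 °C over 20 s) follows the text, while the cooling step is reduced here to an assumed exponential decay with a placeholder time constant tau (the actual cooling is governed by the convection and radiation parameters of Table 1).

```python
import math

def surface_temperature(t, t_heat=20.0, T0=24.0, T_max=120.0, tau=60.0):
    """Idealized thermal load: linear ramp from T0 to T_max over t_heat
    seconds, then free cooling toward ambient. The exponential time
    constant tau is a placeholder, not a value from the simulation."""
    if t <= t_heat:
        # Heating-step: linear ramp.
        return T0 + (T_max - T0) * t / t_heat
    # Cooling-step: schematic exponential relaxation toward ambient.
    return T0 + (T_max - T0) * math.exp(-(t - t_heat) / tau)
```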
Finally, the model was meshed with DC3D8 heat transfer 3D 8-node linear isoparametric elements using the commercial FEM software Abaqus 2019® [11,12]. A biased, non-uniform meshing was defined to increase the density of elements in defective areas, improving the precision of the data, and to reduce the density of elements in the background area. The number of elements was reduced to 25% of the number corresponding to a uniform mesh, maintaining the same precision in the areas close to the defect (Figure 2). To complete the meshing design, different convergence analyses were conducted in order to obtain a mesh size that gives accurate results in the defect areas without considerably penalizing the time needed to compute the models. An element size of 1 mm was considered precise enough without penalizing the computational cost. Finally, each model had 5020 elements. Several command lines in Python were added to the file created by Abaqus. These command lines were programmed to obtain the temporal evolution of the temperatures at points P1 to P4. More command lines were used to plot contrast curves of temperature versus time between points P1-P2 and P3-P4. The higher this contrast, the easier it is to detect a defect, as well as its size and location.
After all these steps were completed, the obtained results were used to apply ML techniques that allowed us to estimate the geometrical features of the defects using AT data.
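The contrast-curve post-processing described above can be sketched as follows. The function names and the list-based temperature histories are illustrative, not the actual Abaqus/Python script used in the study.

```python
def contrast_curve(T_near, T_far):
    """Pointwise temperature contrast between a point near the defect
    (e.g., P1) and a reference point far from it (e.g., P2)."""
    return [a - b for a, b in zip(T_near, T_far)]

def max_contrast(times, T_near, T_far):
    """Return (time, value) of the maximum absolute contrast, i.e., the
    instant at which the defect is easiest to detect."""
    c = contrast_curve(T_near, T_far)
    i = max(range(len(c)), key=lambda k: abs(c[k]))
    return times[i], c[i]
```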

Machine Learning Modelling
Different regression learners were applied and trained to compare their performances. The same model type was trained using different sets of features and different k-fold validations and/or hyperparameters in order to obtain the best performance setup.
MATLAB® [29] was used to train the following model types: linear regression, GPR, and SVM, while the open-source software Weka [30] was applied to train the random forest (RF) and multilayer perceptron (MLP) models. All the models were trained considering different feature sets and parameters. The results of unsuccessful models are not reported, although some of them are indicated in Appendix A. The different predictive model typologies used are widely defined in the literature, yet in order to contextualize the present research, a brief description of each of them is given below.

Linear Regression Model
Linear regression models are predictive algorithms which are easy to interpret and fast to predict. However, these models provide low flexibility, and their highly constrained form means that they usually have poor predictive accuracy compared to other, more complex models. In this case, three different linear regression models were applied: (i) linear regression, which uses a constant and linear terms; (ii) interaction linear regression, which applies interactions between predictors; (iii) stepwise linear regression, which analyses the significance of each variable [31]. In this work, we considered stepwise linear regression to prioritize the detection potential of the algorithm with respect to the physical significance of the statistical relationships between variables.

Gaussian Process Regression
In the last decade, the GPR model has attracted considerable attention, especially in ML approaches [32]. These methods apply non-parametric kernel functions based on probabilistic models (Bayesian inference) [20]. These non-parametric methods are usually more rigorous than the standard regression methods described above, especially for the treatment of complex and noisy non-linear functions [33] and their cross validation [34].
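Two of the ingredients above can be sketched briefly: the pairwise interaction expansion used by the interaction linear model, and the squared exponential (RBF) kernel at the core of the square exponential GPR. The actual models were trained in MATLAB, so this is only a conceptual sketch.

```python
import math
from itertools import combinations

def with_interactions(x):
    """Augment a feature vector with all pairwise interaction terms,
    as an interaction linear regression model does before the fit."""
    return list(x) + [a * b for a, b in combinations(x, 2)]

def squared_exponential(x1, x2, length_scale=1.0):
    """Squared exponential (RBF) kernel: the covariance function behind
    the square exponential GPR model."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-d2 / (2.0 * length_scale ** 2))
```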

Support Vector Machine
SVM are supervised learning models initially used for classification problems but also applied to robust regression solutions [31]. SVM are non-parametric techniques that are still affected by outliers [35]. SVM robust regression can incorporate robust estimators based on variable weight functions [31]. The flexibility of SVM methods is due to the kernel functions (radial basis function (RBF), quadratic, cubic, or linear) [36]. In this research, the four kernel functions were used. Furthermore, for RBF, three different kernel scales were used: fine, medium, and coarse. Prediction errors that were smaller than the threshold (ε) were ignored and treated as equal to zero. The epsilon mode was calculated automatically, using a heuristic procedure to select the kernel scale.
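The ε-insensitive treatment mentioned above (prediction errors smaller than ε ignored and treated as zero) corresponds to the standard SVM regression loss, which can be sketched as:

```python
def epsilon_insensitive_loss(y_true, y_pred, eps):
    """SVM regression loss: residuals inside the epsilon tube are
    treated as zero; larger residuals are penalized linearly."""
    return sum(max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred))
```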

Random Forest
RF [37] is a well-known ensemble method that can be used for both classification and regression. It is built from decision trees, where each tree is generated from a different bootstrapped sample of the training data [38], enabling many weakly-correlated learners to form a strong one. RF is usually easy to implement and computationally fast, and performs well in many real-world tasks.
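The bootstrapped sampling that underlies RF can be sketched in a few lines; this is a generic illustration of the resampling step, not the Weka implementation.

```python
import random

def bootstrap_sample(data, rng=None):
    """Draw a bootstrap sample (same size, with replacement) from the
    training data, as each tree of a random forest does."""
    rng = rng or random.Random()
    return [rng.choice(data) for _ in data]
```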

Multilayer Perceptron
MLP is an ANN method that uses backpropagation to train a multilayer perceptron to classify instances. The MLP can represent smooth measurable functional relationships between the inputs (predictor features) and the outputs (responses). MLP is a massively parallel, distributed information-processing system successfully applied to the generation of models for non-linear problems [39,40]. The processing is based on three different layers of neurons: input layers (N neurons), hidden layers (S neurons), and output layers (L neurons), where each layer has a group of connected points (neurons). Each connection has a numerical weight, and each neuron of the network performs a weighted sum of its inputs and thresholds the result. The momentum rate for the backpropagation algorithm was set to 0.2 (its standard value) and the learning rate to 0.3, and a nominal-to-binary filter was applied. The hidden layer size was established as (attributes + classes)/2 for each test.
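The hidden-layer sizing rule and the basic per-neuron computation described above can be sketched as follows; this is an illustrative fragment, not the Weka MLP.

```python
def hidden_neurons(n_attributes, n_classes):
    """Hidden layer size rule used in the study: (attributes + classes) / 2."""
    return (n_attributes + n_classes) // 2

def neuron_output(inputs, weights, bias):
    """A perceptron neuron: weighted sum of its inputs plus bias,
    passed through a simple step threshold."""
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if s > 0 else 0
```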

Evaluation of the Model Performance
The evaluation of the models can be implemented by assessing the difference between the observed values (y j ) and the predicted values (ŷ j ) [20]. The performance of the regression learning models can be evaluated using classical performance results [41]. In this research, the following statistics were obtained for each model:

• Coefficient of determination (R2) between observed and predicted values (1). The closer it is to 1, the better the agreement between observed and predicted values. A theoretical value of 1 means a perfect correlation, which could be interpreted as a perfect prediction (graphically, all points in the predicted vs. actual plot would lie on the regression line).
• Mean absolute error (MAE): this error describes the typical magnitude of the residuals and is robust to outliers (2). MAE was used to independently evaluate the accuracy of the model.
• Mean square error (MSE): this error is computed from the squares of the differences, making it more sensitive to outliers than MAE.
• Root mean square error (RMSE): calculated as the square root of the MSE (3). In this way, the error is converted to the units of the response variable, making the magnitude of the error more intuitive to interpret.
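For reference, the four statistics can be written out explicitly. This is a generic sketch of the standard definitions referred to as (1)-(3) above, not code from the study.

```python
import math

def r2(y_obs, y_pred):
    """Coefficient of determination between observed and predicted values."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot

def mae(y_obs, y_pred):
    """Mean absolute error: typical residual magnitude, robust to outliers."""
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / len(y_obs)

def mse(y_obs, y_pred):
    """Mean square error: squared residuals, more sensitive to outliers."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs)

def rmse(y_obs, y_pred):
    """Root mean square error: MSE converted back to the response units."""
    return math.sqrt(mse(y_obs, y_pred))
```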
Finally, the training time was reported for each model in order to compare the response speed of each algorithm. To this end, all the trainings of the different models were implemented on an Intel Core i7-5700HQ, 2.7 GHz CPU without parallel computing. Additionally, the distribution and morphology of the residuals were evaluated as a further model performance indicator.

Simulation Results
Some of the calculations carried out and the results achieved in this study are shown below. With the initial values of the geometric variables, the thermal properties of the material and the thermal load curves applied, a calculation of the temperature distribution along the whole model was performed. Figure 3 shows the temperature distribution in the model at 53 s.

Using the script developed in Python, the variation of the temperatures through time at points P1-P4 was recorded; Figure 4a shows these temperature curves.
Also, the temperature contrast curves, P1 minus P2 and P4 minus P3, through time were calculated and plotted (Figure 4b); these show the difference in temperature between areas near to and far from the defect. Contrast curve P1-P2 shows how the presence of the defect affects the upper surface (the reflection case), while contrast curve P4-P3 exhibits the effect of the defect on the rear surface, that is, the difference in transmitted temperature between zones with defects and zones without them. It can also be observed that the maximum of the contrast curves appears at 53 s of the total time, that is, 33 s after the start of the cooling step.
The point of maximum contrast is of great interest, since it allows the presence of the defect and its characteristics to be determined. Therefore, these points were used to analyze the effect of the input variables (i.e., thermal properties, size, and thickness of the defect) on the upper face (reflection case, Contrast Front (∆T F )) and on the lower face (transmission case, Contrast Rear (∆T R )). The ranges of variation of the input variables established for this research are outlined in Table 1. The sets of values used in each simulation were automatically selected by the software within the thresholds indicated in this table. In this manner, two datasets were generated: the first repeated the simulation 100 times and the second 500 times.
A design of experiment (DOE) study was carried out using the Latin hypercube technique with 500 points. Figure 5 shows the Pareto plot for the responses "Contrast Front" and "Contrast Rear". The size of the bars indicates the proportion in which each of the input variables affects the variation of the output variables. The blue color indicates that the relationship is direct, while the red color indicates that it is inverse, i.e., if the value of the input variable increases, the value of the output decreases and vice versa. Figure 5 shows how the most influential variable in both cases was the maximum heating temperature (T H ). This indicates the need to carry out a good design of the thermal loading process, adjusting this temperature as much as possible. Moreover, the size of the defect (L D ) had a high weight, which indicates that the magnitude of the contrast could be used to estimate the size of the defect. On the other hand, it seems significant that, in both cases, the thickness of the defect (t D ) had a low effect, especially in the "Contrast Rear" case.
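The Latin hypercube technique used for the DOE can be sketched with a minimal stdlib implementation. The study relied on the DOE software's own sampler, so this is only illustrative: each of the n strata per dimension is sampled exactly once, with the stratum order shuffled independently per dimension.

```python
import random

def latin_hypercube(n, dims, rng=None):
    """Latin hypercube sample of n points in [0, 1)^dims: each dimension
    is split into n equal strata and each stratum is hit exactly once."""
    rng = rng or random.Random()
    columns = []
    for _ in range(dims):
        strata = list(range(n))
        rng.shuffle(strata)
        # One uniform draw inside each stratum.
        columns.append([(s + rng.random()) / n for s in strata])
    return list(zip(*columns))
```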
Finally, Figure 6 shows two of the many possible approximated surfaces that can be prepared to study the variation of the output values as a function of the input values. Figure 6b shows the variation of "Contrast Front" with the maximum heating temperature and with the size of the defect. Since both input variables have a high effect, the surface varies almost equally along both base coordinates. Instead, Figure 6a shows the "Contrast Rear" in relationship with the length and thickness of the defect. Because the thickness of the defect has a smaller effect, the approximated surface changes more along the length-related coordinate.


Machine Learning
First, an exploratory data analysis was applied to both the 100-value and the 500-value datasets. This was implemented using scatter plots, which showed similar trends in the relationships between the features for the two data collections. An apparent collinearity was detected between the "Contrast Front" (∆T F ) and "Contrast Rear" (∆T R ), while both are independent with respect to the rest of the features. This phenomenon could be due to the heat transfer and the presence of the defect, which makes the difference in temperature between the defect and non-defect zones very similar on both sides of the model. However, the relationship between the two features is not rigorously linear because it has a non-constant variability. The rest of the variables are independent since they are inputs for the simulation processes (Table 1). Therefore, in the following sections, different tests were implemented in order to find an adequate parsimonious model with the fewest assumptions. For the different trained models, MAE, R2, and RMSE are reported; for readability, the rest of the parameters analyzed for each model are reported in Appendix A. Please note that in the finite element part, the geometric variables and the thermal properties are inputs, while the temperature contrasts ∆T F and ∆T R are outputs. This changes in the machine learning part of the study: the contrasts ∆T F and ∆T R, together with the thermal properties, become the inputs, while the geometric variables (length and thickness of the defect) are the outputs, i.e., the variables to predict using the model.

Defect Thickness Predictor
Firstly, models were trained using only the 100 sets of values to analyze what happens with a small sample size, but the predictive capacity was low. Even the most accurate model yielded poor prediction results (e.g., the stepwise regression model was the most effective, yielding R2 = 0.45). Then, the predictor model was calculated using the 500 sets of values of the dataset; the results for the models trained using 500 sets of values are shown in Table 2. The "Contrast Rear" feature had to be included in the model to get suitable results. Otherwise, the predictive performance decreased significantly and the models obtained were not adequate (maximum R2 = 0.35).
The best results were achieved by the following two models: the stepwise regression model and the interaction regression model, although the training time is much longer for the stepwise model, as shown in Appendix A. The best predictor was the one that used all features (MAE = 5.148 × 10−5, R2 = 0.79). The error obtained was acceptable considering the range and order of magnitude of the predicted variable (5 × 10−4 mm ± 50% in Table 1). However, when only five features (k, h, T H , ∆T F , ∆T R ) were considered (the rest excluded), MAE increased by 36.71% and R2 was 0.63. For the thickness predictor, it was always necessary to consider "Contrast Rear" (∆T R ) to obtain suitable results.
Additionally, all the models were calculated using three different k-fold validation parameters (5, 10, and 15). The results for the 10-fold validation are reported, and the MAE values of the other two k-fold validations are indicated as deviations in Appendix A. These deviation values for the MAE were not meaningfully high. Only the regression models are reported in this section because the rest of the models (i.e., GPR, SVM, RF, and MLP) did not provide suitable results due to the small size of the dataset. An appropriate linear relationship between the predicted response and the observed response was observed for all the predictions. In Figure 7, this regression line is shown for the predictor model that provides the lowest MAE (Table 2). Residuals are close to a symmetrical distribution around zero.
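The k-fold splits used for validation (5, 10, and 15 folds) can be sketched as a simple index partition; this is a generic illustration, not the MATLAB/Weka cross-validation routines.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size;
    each fold serves once as the validation set while the rest train."""
    folds, start = [], 0
    for i in range(k):
        # Earlier folds absorb the remainder when k does not divide n.
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```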

Defect Length Predictor
Firstly, the experiment was implemented using 100 sets of values. Unlike in the predictive thickness model, in this case significantly different results were obtained depending on whether or not the "Contrast Rear" feature (∆T R ) was considered, so this aspect allowed us to compare the model performance when ∆T R was considered or excluded. Consequently, the results for the two configurations are reported (Table 3), with the different predictive features removed in order to analyze the model performance for each feature setup. The best MAE result (1.398 × 10−3) was obtained using the interaction linear model when the minimum number of features was included. This result could be considered adequate given the small size of the dataset and the order of magnitude of the response: defect length (0.01 mm ± 50% in Table 1).
However, the models that provided the best predictive potential were the stepwise and the interaction regression models. On the other hand, the performance results were not very suitable (maximum R2 is 0.65). The decrease of the error when the "Contrast Rear" was excluded is shown in Table 4: when "Contrast Rear" (∆T R ) was not considered, the model performance increased in terms of MAE and RMSE (both were reduced) (Table 4). Moreover, in this case, the deviation values for MAE when different k-fold parameters were applied can be higher in some cases (up to a 26.86% increase for the stepwise regression model). Once the 100 sets of values were studied, the experiment was repeated considering 500 sets of values (Tables 5-8). In this case, the model typologies that provided the least error were the GPR, specifically the square exponential (MAE = 6.665 × 10−4, R2 = 0.92) and the rational quadratic GPR (MAE = 6.666 × 10−4, R2 = 0.92), when the feature "Contrast Rear" (∆T R ) was considered and the defect thickness and emissivity coefficient features were excluded (Tables 5 and 6). Note that the training time when the rational quadratic kernel was chosen is three times higher than for the square exponential kernel, as shown in Appendix A. In this way, a model based on the rest of the features using GPR provided a high performance. However, its training time was also higher in comparison with the other methods, especially the SVM (considering that four kernel functions, and specifically three for RBF as a function of kernel scale-fine, medium, and coarse-were considered), but these last ones provided a lower predictive model performance for the same setup (e.g., quadratic SVM provided MAE = 1.302 × 10−3 and R2 = 0.66). RMSE results calculated using SVM demonstrated that the outliers had an important effect (RMSE was significantly higher than MAE).
In addition, these regression models were shown to be the least sensitive to sample size, since they were the only ones that provided at least acceptable results with 100 sets of values.
The interaction and stepwise regression models also provided adequate performance results, specifically the interaction regression when all features were considered (MAE = 9.588 × 10−4, R2 = 0.81) (Table 5). The results showed a higher error than the GPR models, which is consistent with complex noisy non-linear functions [33]. Nevertheless, interaction regression required significantly less computational time (unlike the stepwise regression model, which took much longer). The difference between the different k-fold validations was smaller than in the previous dataset for the same type of model, possibly due to the larger size of the dataset, as shown in Appendix A.
Once we observed that both the regression models and the GPR models provided the most adequate predictive results, the correlation between the observed and the predicted response was plotted. The two models of each type with the lowest error and the best fit are shown in Figure 8. The residuals were approximately symmetrically distributed for the regression models (a favorable aspect for the suitability of the model), and non-linearly distributed for the GPR.
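The residual-symmetry check described above can be sketched as follows for a well-specified linear model; the data are synthetic and merely illustrate the diagnostic, not the paper's models:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
# Synthetic linear response with Gaussian noise (illustrative only)
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.05, 500)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)
share_positive = (residuals > 0).mean()

# For a well-specified linear model fitted by least squares with an intercept,
# the residuals sum to zero and should be roughly balanced in sign
print(f"mean residual: {residuals.mean():.2e}")
print(f"share of positive residuals: {share_positive:.2f}")
```

A strongly asymmetric or curved residual pattern in such a plot would instead suggest a misspecified model, which is why the symmetric distribution observed in Figure 8 supports the regression models.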
Finally, the MLP and RF models were the fastest algorithms to train (Tables 6-8), but their MAE was significantly higher than that of the other models, indicating that they tend to improve in cases where fewer predictive features are used. Moreover, the rest of their performance parameters were less suitable than those of the other models for the chosen configuration and setup.
Table 5. Performance results for defect length predictor models using a dataset of 500 sets of values. Part 1: regression and Gaussian process regression (GPR) models when "Contrast Rear" is contemplated as a feature. The models with the best predictive performance are indicated in bold type.

When the "Contrast Rear" feature was excluded (Tables 7 and 8), the highest-performance model was the GPR (Table 7), especially for both the square exponential (R2 = 0.86, MAE = 8.513 × 10−4) and rational quadratic (R2 = 0.86, MAE = 8.531 × 10−4) kernels. When the MAE results were compared with those of the models that included the "Contrast Rear" feature (∆T_R), an increase in the MAE was detected for almost all trained models (Table 9). In this way, some models "suffered less" from the loss of that feature: the MAE increased most when GPR models were used, while the models where the MAE increased least were SVM, RF, and MLP. The GPR models were therefore more sensitive to the absence of this feature than the regression models. In this manner, we can state, in general terms, that the "Contrast Rear" feature increased predictive model performance, but this increase was not always significant (Table 9).
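The outlier sensitivity inferred from the gap between RMSE and MAE can be demonstrated numerically: because RMSE squares the residuals, a single large residual inflates it far more than the MAE. The values below are synthetic, chosen only to illustrate the effect:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.zeros(100)
y_pred = np.full(100, 1e-3)        # uniform small residuals
y_out = y_pred.copy()
y_out[0] = 5e-2                    # a single large outlier residual

for label, pred in (("no outlier", y_pred), ("one outlier", y_out)):
    mae = mean_absolute_error(y_true, pred)
    rmse = mean_squared_error(y_true, pred) ** 0.5
    print(f"{label}: MAE = {mae:.2e}, RMSE = {rmse:.2e}, "
          f"RMSE/MAE = {rmse / mae:.1f}")
```

With no outlier the two metrics coincide; a single outlier in 100 samples roughly triples the RMSE/MAE ratio, which is the signature observed for the SVM models.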

Conclusions
Using Python, a parametric FEM model was prepared to study the effect that the presence of an internal defect has on the temperature distribution of a thermally loaded solid. To check that the model worked correctly, a first battery of tests was carried out using the same geometries and materials employed by other authors [7,9,14], and it was verified that the thermal distribution results of these tests coincided with those of the authors in both shape and value. The model was then used to study how the thermal properties of the material (c, k, ρ, T_E, ε, h, T_H) and the geometric variables of the defect (t_D, L_D) affected the contrast values of interest (∆T_F, ∆T_R), which were defined in Section 3.1. As a result, the influence of each geometric and thermal parameter (Table 1) on the contrast values was obtained (Figure 5).
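The physical mechanism behind the rear contrast can be illustrated with a much simpler model than the paper's FEM: a 1-D explicit finite-difference sketch of transient conduction through a slab containing a low-diffusivity layer. All dimensions, properties, and boundary conditions below are illustrative assumptions, not the simulated geometry:

```python
import numpy as np

nx, dx, dt, nt = 100, 1e-4, 4e-4, 25000   # 10 mm slab, 10 s of heating

def simulate(alpha):
    """Explicit 1-D transient conduction: heated front face, insulated rear."""
    T = np.zeros(nx)
    r = alpha * dt / dx**2
    assert r.max() < 0.5                  # explicit-scheme stability limit
    for _ in range(nt):
        T[0] = 100.0                      # heated front face (fixed temperature)
        T[1:-1] += r[1:-1] * (T[2:] - 2.0 * T[1:-1] + T[:-2])
        T[-1] = T[-2]                     # insulated rear face
    return T

sound = np.full(nx, 1e-5)                 # base thermal diffusivity (m^2/s)
defective = sound.copy()
defective[45:55] = 1e-6                   # internal low-diffusivity "defect"

rear_sound = simulate(sound)[-1]
dT_rear = rear_sound - simulate(defective)[-1]
print(f"rear-face contrast dT_R = {dT_rear:.2f} K")
```

The defect impedes heat flow, so the rear face of the defective slab lags behind the sound one; that temperature difference is the kind of contrast signal (∆T_R) the predictive models take as input. Note this sketch applies the diffusivity pointwise rather than conserving flux at the interface, which is acceptable only for qualitative illustration.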
In a second step, the simulation output frames were used as input to train 474 different prediction models, in order to assess the possibility of using the thermal parameters (c, k, ρ, T_E, ε, h, T_H, ∆T_F, ∆T_R) to predict the geometric features of the defect (t_D, L_D). Different models based on different feature sets were established, trained, evaluated, and, finally, compared. The comparison of the different algorithms is the main contribution of this work.
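The train-evaluate-compare procedure over model types and feature subsets can be sketched as below. This is a drastically reduced stand-in (two model types, synthetic data, simplified feature names) for the 474 models actually trained:

```python
import itertools
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(3)
features = ["k", "h", "T_H", "dT_F", "dT_R"]   # simplified feature names
X = rng.uniform(0.0, 1.0, size=(500, len(features)))
# Hypothetical response dominated by the two contrast features
y = X[:, 3] + 0.5 * X[:, 4] + rng.normal(0.0, 0.05, 500)

results = {}
# Train each model type on every feature subset of size >= 2, recording CV MAE
for r in range(2, len(features) + 1):
    for subset in itertools.combinations(range(len(features)), r):
        for name, mdl in (("linear", LinearRegression()),
                          ("svm", SVR(kernel="rbf"))):
            mae = -cross_val_score(mdl, X[:, list(subset)], y, cv=5,
                                   scoring="neg_mean_absolute_error").mean()
            results[(name, subset)] = mae

best = min(results, key=results.get)
print("best model/features:", best, f"MAE = {results[best]:.3e}")
```

Ranking all (model, feature subset) pairs by cross-validated MAE is the essence of the comparison reported in Tables 3-9, there extended to GPR, stepwise regression, MLP, and RF as well.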
Regarding defect thickness, it was possible to provide predictive models with moderate predictive performance. In particular, the interaction linear regression and stepwise regression models provided adequate results; however, the stepwise model was slower to train. The best model for defect thickness prediction using five features (k, h, T_H, ∆T_F, ∆T_R) was interaction linear regression (MAE = 7.038 × 10−5, R2 = 0.63). Using all the features, the model gave an MAE of 5.148 × 10−5 (R2 = 0.79). In this case, the "Contrast Rear" feature (∆T_R) necessarily had to be included in the model to obtain adequate results.
It was also possible to build predictive models for the defect length. In this case, adequate results were obtained for a larger number of model types. When 100 sets of values were used to train the models, only the regression models provided adequate results, whereas with 500 sets of values, several types of models gave adequate results. These models can be established both with and without the "Contrast Rear" feature (∆T_R). However, when it is considered, the error tends to decrease (Table 10) and, consequently, model performance improves, despite the possible tendency towards collinearity between the "Contrast Rear" and "Contrast Front" features. When the "Contrast Rear" feature was considered, the best model was a GPR based on a square exponential kernel, which provided an MAE of 6.665 × 10−4 when defect thickness and the emissivity coefficient were also excluded.
Regression models were also tested; these gave adequate performance results, although more unfavorable than those provided by the GPR models (the interaction regression model gave an MAE of 9.588 × 10−4 and an R2 of 0.81 when all features were used). The MAE increased slightly when the "Contrast Rear" feature was not considered (MAE = 1.183 × 10−3, R2 = 0.74) and increased further as the different variables were excluded (for the minimum number of features: MAE = 1.628 × 10−3, R2 = 0.53). For this case, the stepwise regression model did not provide significantly better results than the interaction regression models but significantly increased the computational training time. However, the predicted versus actual plots showed adequate linearity and constant variability for the interaction and stepwise regression models.
SVMs also allowed the prediction of the defect length, and their training times were very low, but their performance was lower than that obtained with GPR. However, a strong influence of outliers was detected for the SVM models, based on the RMSE and MAE results, on the predicted versus observed plots, and on the residual plots. If less weight is given to the more extreme residuals, these models can be useful [31]. Additionally, the MLP and RF methods provided predictions very quickly, but their performance was significantly worse than that of the other indicated methods. A qualitative comparison based on the information obtained in this research is outlined in Table 10.
The key variables for establishing an adequate predictive model in the different experiments performed were consistent with the weights given by the simulation results (Figure 5). The predictive performance was improved by using both front and rear contrast data (∆T_F, ∆T_R). Monitoring both sides of the sample improved predictive performance but, in the case of defect length prediction, adequate results could also be obtained by monitoring only the front surface (reflection).
Future lines of work will address the testing of the calculated algorithms against experimental results and a deeper study of the regression models by modifying different parameters, especially in the case of the multilayer perceptron. Moreover, a mesostructural model should be proposed to take into account the presence of pores caused by the material deposition process, which can be confused with small defects and cause variations in the mechanical properties at different points and in different directions.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
All the parameters analyzed for the different predictive models with the different configurations are outlined in this section (Tables A1-A6). Please note that only MAE, R2, and RMSE are reported in the main manuscript; the rest of the parameters for each trained model are given here to facilitate reading.